Pentaho's First-class Integration with MongoDB - The Future of BI

Matt Asay

Open source has moved from imitator to innovator, and modern Business Intelligence (BI) tools like Pentaho reflect this. Today most BI tools are still tightly integrated with relational databases (RDBMS). But as Pentaho showed with its latest release, Pentaho Business Analytics 5.0, this is set to change. BI tool vendors that persist in offering second-class integration to first-class data sources like NoSQL databases are going to fall behind.

After all, data today is messy. While there is and always will be structured data, the Big Data explosion is being driven by diverse semi-structured and unstructured data: untidy data that doesn't fit neatly into rows and columns.

Rebuilding For The Future

It's no surprise to anyone, including the BI vendors, that NoSQL databases like MongoDB are booming. In fact, MongoDB now ranks among the 10 most popular databases in the world, and is on track to surpass all but Oracle, MySQL and SQL Server by the end of the year.

Fighting this shift toward NoSQL is like fighting gravity. It can't end well.

The problem is that rearchitecting BI tools to support NoSQL databases, and particularly document-oriented databases like MongoDB, is hard work. The very flexibility that makes MongoDB such a joy to work with also makes it harder for a traditional BI tool to ingest data from it. It's possible to accomplish a low-fidelity integration using ODBC, but such an abstract approach abandons the rich functionality a document data model enables.

Pentaho's Deep, Native Integration

This is why I'm so excited by what Pentaho has done. Pentaho put in the work to create a deep, first-class integration with MongoDB. We were already working with Pentaho, but this expanded level of native integration between Pentaho Business Analytics 5.0 and MongoDB provides the first analytics capability with full support for MongoDB Replica Sets, Tag Sets and Read and Write Preferences.

So, for example, most BI tools don't pay attention to MongoDB Read Preferences or Tag Sets, and don't expose these settings to the end user. As a result, users can't specify a preference at all, even if they want to. An ODBC connector, while an understandable compromise, offers no way to configure such preferences. It's a low-fidelity way to pull data from MongoDB or any NoSQL database.
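To make these settings concrete, here is a minimal sketch of the replica-set options a native client can pass in a MongoDB connection string. It is an illustration only, not Pentaho's implementation: the helper name build_mongo_uri, the hostnames, the database name, the replica set name and the tag values are all hypothetical, while replicaSet, readPreference, readPreferenceTags and w are standard MongoDB connection string options.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, database, replica_set, read_preference, tag_sets, write_concern):
    """Assemble a MongoDB connection string with replica-set routing options.

    Hypothetical helper for illustration; only the URI option names are MongoDB's.
    """
    params = [
        ("replicaSet", replica_set),
        ("readPreference", read_preference),
    ]
    # Each tag set becomes its own readPreferenceTags parameter.
    # Order matters: the driver tries the tag sets in sequence.
    for tags in tag_sets:
        encoded = ",".join(f"{key}:{value}" for key, value in tags.items())
        params.append(("readPreferenceTags", encoded))
    params.append(("w", write_concern))
    # Keep ':' and ',' unescaped so the tag sets stay readable.
    query = urlencode(params, safe=":,")
    return f"mongodb://{','.join(hosts)}/{database}?{query}"


# Assumed example values: send reporting reads to tagged secondaries in one
# data center, fall back to any eligible member, and acknowledge writes
# only once a majority of members have them.
uri = build_mongo_uri(
    hosts=["db1.example.com:27017", "db2.example.com:27017"],
    database="analytics",
    replica_set="rs0",
    read_preference="secondaryPreferred",
    tag_sets=[{"dc": "east", "use": "reporting"}, {}],
    write_concern="majority",
)
print(uri)
```

The empty final tag set is a deliberate fallback: if no member carries the requested tags, the read can still be served by any eligible member instead of failing. A generic ODBC layer has no place to express routing choices like these.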

Pentaho went further. Much further. Pentaho engineers actually spent a year working with MongoDB's engineers to iterate on a tight, native integration that gives Pentaho customers full access to their MongoDB data. Given the ever-increasing flood of information rushing into MongoDB databases, this is forward-thinking and sets Pentaho apart. I suspect we'll see the rest of the industry follow Pentaho's lead. They'll have to.