MongoDB and HBase Compared

Overview

Relational databases are a well-established technology in most organizations, underpinning existing applications that meet current business needs. But IT teams are increasingly considering alternatives to legacy relational infrastructure as they build modern applications, driven by the need to:

  • Achieve faster time to market, enabled by agile development and flexible data models.

  • Handle rapidly growing volumes of structured, semi-structured and unstructured data.

  • Scale beyond the capacity constraints of existing systems.

  • Free themselves from expensive proprietary database software and hardware.

To meet these requirements, companies are building operational applications with a new class of non-tabular databases. MongoDB and HBase are leading technology options. To help IT teams select the solution that best addresses their application requirements, we compare both technologies in the sections below.

What is MongoDB?

MongoDB is a non-relational database developed by MongoDB, Inc. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Related information is stored together for fast query access through the MongoDB query language. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the collection, without updating a central system catalog, and without taking the system offline. Optionally, schema validation can be used to enforce data governance controls over each collection.

MongoDB’s document data model maps naturally to objects in application code, making it simple for developers to learn and use. Documents can represent hierarchical relationships, arrays, and other complex structures directly. Native, idiomatic drivers are provided for 10+ languages, and the community has built dozens more. Together with ad-hoc queries, real-time aggregation, and rich indexing, they provide powerful programmatic ways to access and analyze data of any structure.
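As a rough illustration of the flexible, self-describing documents described above, the sketch below uses plain Python dictionaries as stand-ins for BSON documents. No database or driver is involved; the `users` list, the field names, and the `field_names` helper are all hypothetical.

```python
import json

# Two documents in the same hypothetical "users" collection.
# They share no declared schema: each carries its own field names.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {
        "_id": 2,
        "name": "Lin",
        # Hierarchical data and arrays live inside the document itself.
        "address": {"city": "Oslo", "zip": "0150"},
        "tags": ["admin", "beta"],
    },
]

# Adding a field to one document does not touch any other document.
users[0]["last_login"] = "2024-01-01T09:30:00Z"


def field_names(doc):
    """Return the top-level field names of a document, sorted."""
    return sorted(doc)


for doc in users:
    print(doc["_id"], field_names(doc))

# JSON is used here only as a readable rendering; MongoDB itself stores BSON.
print(json.dumps(users[1], indent=2))
```

The nested `address` and the `tags` array would typically need separate child tables in a relational schema, which is the mapping difference the paragraph above refers to.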

Because documents can bring together related data that would otherwise be modeled across separate parent-child tables in a relational schema, MongoDB’s atomic single-document operations already provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.
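A minimal sketch of such a single-document write is shown below, assuming `orders` is a PyMongo `Collection`. The function name, the field names (`lines`, `total`, `status`), and the shape of `line` are hypothetical. The array append, the counter increment, and the field assignment are all sent in one `update_one` call, so they apply to the matched document together.

```python
def add_order_line(orders, order_id, line):
    """Append a line item and adjust the order total in one atomic update.

    `orders` is assumed to be a pymongo Collection; `line` is a dict with
    hypothetical "sku", "qty" and "price" fields.
    """
    return orders.update_one(
        {"_id": order_id},
        {
            "$push": {"lines": line},                        # append to an array
            "$inc": {"total": line["qty"] * line["price"]},  # adjust a counter
            "$set": {"status": "open"},                      # set a plain field
        },
    )
```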

MongoDB 4.0 added support for multi-document transactions, making it the only database to combine the ACID guarantees of traditional relational databases, the speed, flexibility, and power of the document model, and an intelligent distributed systems design that scales out and places data where you need it. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. Transactions in MongoDB feel just like the transactions developers are familiar with in PostgreSQL. They are multi-statement, with similar syntax (e.g. start_transaction and commit_transaction), and are therefore easy for anyone with prior transaction experience to add to any application.
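The sketch below shows what a multi-document transaction can look like with the PyMongo driver, assuming `client` is a `MongoClient` connected to a deployment that supports transactions. The `bank` database, the `accounts` collection, and the `transfer` function are hypothetical names chosen for illustration.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents, all-or-nothing.

    `client` is assumed to be a pymongo MongoClient. The "bank" database
    and "accounts" collection are hypothetical.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # Used as a context manager, start_transaction() commits when the
        # block exits normally and aborts if the block raises.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; an operation issued without the session would run outside it.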

Learn more from the What is MongoDB page.

What is HBase?

Apache HBase is a wide column store inspired by Google BigTable, written in Java and using HDFS (Hadoop Distributed File System) as its storage layer. HBase is designed for Key-Value workloads with random read and write access patterns.

HBase is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Applications store rows in labelled tables. A row has a sortable key and an arbitrary number of columns. Rows are kept sorted in lexicographic order of their row keys. The table is stored sparsely, so that rows in the same table can have varying columns. To provide resilience to failures, data is replicated across a number of participating nodes in the cluster.
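The data model described above can be sketched as a toy in-memory structure. This is not HBase code and does not use any HBase API; the `SparseSortedTable` class and its method names are hypothetical, and it ignores distribution, persistence, and replication entirely.

```python
import bisect


class SparseSortedTable:
    """Toy model of a map from (row key, column key, timestamp) to bytes."""

    def __init__(self):
        self._rows = {}      # row key -> {column key -> {timestamp -> value}}
        self._row_keys = []  # row keys, kept in sorted byte order

    def put(self, row, column, timestamp, value):
        """Store one versioned cell; rows only hold the columns they use."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row key, sorted column keys) for start_row <= key < stop_row."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        for key in self._row_keys[lo:hi]:
            yield key, sorted(self._rows[key])


table = SparseSortedTable()
table.put(b"user#2", b"info:name", 1, b"Lin")
table.put(b"user#1", b"info:name", 1, b"Ada")
table.put(b"user#1", b"info:name", 2, b"Ada L.")   # newer version of the cell
table.put(b"user#1", b"info:email", 1, b"ada@example.com")

print(table.get(b"user#1", b"info:name"))
print(list(table.scan(b"user#1", b"user#3")))
```

Because row keys are kept in sorted order, a range scan is a contiguous slice, which is why row key design matters so much for access patterns in this kind of store.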

HBase began as part of the Apache Software Foundation's Apache Hadoop project and is now a top-level Apache project, with commercial distributions available from most Hadoop vendors.

Key Concepts

Many relational database concepts have analogues in MongoDB and HBase. This table outlines some of the common concepts in each system.

| RDBMS | MongoDB | HBase |
| --- | --- | --- |
| Table | Collection | Table |
| Row | Document | Column Family |
| No Equivalent | Shard | Region |
| GROUP BY | Aggregation Pipeline | MapReduce |
| Multi-record ACID transactions | Multi-document ACID transactions | None. Requires implementation in app layer |

Feature Comparison

As non-relational databases, both MongoDB and HBase offer data model flexibility, scale-out with sharding and high read/write performance. There are, however, some fundamental differences between them that are highlighted in the tables below.

Considerations for Developers

| Feature | MongoDB | HBase | Result |
| --- | --- | --- | --- |
| Data Model | Document | Wide Column | Documents match the structures of objects in programming languages, providing greater simplicity to developers. |
| Supported Data Types | Multiple, including strings, 32- and 64-bit integers, floats, Decimal128, dates, timestamps and geospatial | Data converted to uninterpreted bytes | Support for multiple types allows for efficient data comparisons, sorting, and processing with lower application development effort. |
| Query Model | Expressive query language with powerful query operators, comparison, equality, projections, filtering and aggregations | Key-Value | An expressive query language enables running more complex queries to support advanced operational and real-time analytics workloads. |
| Secondary Indexes | Native feature of the database, including text, geospatial, compound, TTL indexes, and more | Materialized views maintained by developers in code, or with coprocessors | Native secondary indexes enable greater developer productivity, while supporting richer data access patterns to answer complex queries. |
| Aggregations | Aggregation pipeline including JOINs, graph traversals, search facets and more | Data must be moved into dedicated analytics infrastructure for any queries beyond key-value lookups | Native database aggregations enable real-time analytics on live operational data, without ETL into dedicated analytics systems. |
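To make the aggregation row above more concrete, here is a small sketch of a MongoDB aggregation pipeline expressed through the PyMongo driver. It assumes `orders` is a PyMongo `Collection`; the field names (`status`, `customer_id`, `amount`) and the `totals_by_customer` function are hypothetical. The pipeline plays roughly the role of a filtered SQL GROUP BY with an ORDER BY.

```python
# Each stage transforms the stream of documents produced by the previous one.
PIPELINE = [
    {"$match": {"status": "shipped"}},                                   # filter
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},   # group and sum
    {"$sort": {"total": -1}},                                            # largest first
]


def totals_by_customer(orders):
    """Run the pipeline server-side and return the result documents.

    `orders` is assumed to be a pymongo Collection.
    """
    return list(orders.aggregate(PIPELINE))
```

The computation runs inside the database, so only the grouped results travel back to the application.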

Market Adoption

Choosing a database is a major investment. Once an application has been built on a given database, it is costly and challenging to migrate it to a different database. Companies usually invest in a small number of core technologies so they can develop expertise, integrations and best practices that can be amortized across many projects.

Non-tabular databases are still relatively new, and while there are many options in the market, only a small number of technologies will stand the test of time. To reduce risk, users should consider the health of the company and community standing behind the database. It is important not only that the product continues to exist, but also that it evolves and gains new features. A database with a strong community of users makes it easier to find and hire developers who are familiar with the product. It makes it easier to find information, documentation and code samples. It also helps organizations retain key technical talent. Lastly, a strong community encourages other technology vendors to develop integrations and to participate in the ecosystem.

When measuring market adoption, MongoDB ranks much higher than HBase in both the DB-Engines Ranking and the 451 Group’s NoSQL skills index. MongoDB’s broader adoption and skills availability reduce risk and cost for new projects.

What to Use

HBase is well suited to key-value workloads with high volume random read and write access patterns, especially for those organizations already heavily invested in HDFS as a common storage layer. The leading Hadoop distributor positioned HBase for “super-high-scale but rather simplistic use cases”.

Comparing HBase to MongoDB, the positioning goes on to state the following: “HBase offers very fast random reads and random writes if you want to look up users on a particular key, but MongoDB provides a much richer model through which you could track user behavior all the way through an online application.”

MongoDB’s design philosophy blends key concepts from relational technologies with the benefits of emerging NoSQL databases. While HBase is highly scalable and performant for a subset of use cases, MongoDB can be used across a broader range of applications. The latter’s intuitive data model, multi-document ACID transactions, rich query framework, native drivers, and lower operational overhead will often enable users to ship new applications faster and more easily than with HBase.

Learn More About Selecting Your Next Database

Download the Top 5 Considerations for Selecting Non-Relational Databases where you will discover how to evaluate technology options based on the data and query model, consistency guarantees, APIs, community strength, and level of commercial support.
