Relational databases are a well-established technology in most organizations, underpinning existing applications that meet current business needs. But IT teams are increasingly considering alternatives to legacy relational infrastructure as they build modern applications, driven by the need to:
Achieve faster time to market, enabled by agile development and flexible data models.
Handle rapidly growing volumes of structured, semi-structured and unstructured data.
Scale beyond the capacity constraints of existing systems.
Free themselves from expensive proprietary database software and hardware.
To meet these requirements, companies are building operational applications with a new class of non-tabular databases. MongoDB and HBase are leading technology options. To help IT teams select the solution that best addresses their application requirements, we compare both technologies in the sections below.
MongoDB is a non-relational database developed by MongoDB, Inc. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Related information is stored together for fast query access through the MongoDB query language. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the collection, without updating a central system catalog, and without taking the system offline. Optionally, schema validation can be used to enforce data governance controls over each collection.
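As a rough illustration of self-describing documents, the sketch below uses plain Python dicts in place of BSON documents and a list in place of a collection. No MongoDB server or driver is involved, and the `validate` helper is a hypothetical stand-in for optional schema validation, not MongoDB's actual validation mechanism.

```python
# Illustrative only: dicts stand in for documents, a list for a collection.
products = [
    {"_id": 1, "name": "Laptop", "price": 1200},
    # A new field ("tags") is added to one document; the other is unaffected
    # and no central schema definition has to change.
    {"_id": 2, "name": "Phone", "price": 800, "tags": ["mobile", "5g"]},
]


def validate(document, required_fields):
    """Hypothetical validation rule: return the required fields that are missing."""
    return [field for field in required_fields if field not in document]


# Each document carries its own structure.
field_sets = [sorted(doc.keys()) for doc in products]

# Optional governance: check every document against a minimal rule set.
problems = {doc["_id"]: validate(doc, ["name", "price"]) for doc in products}

print(field_sets)
print(problems)
```

Both documents pass the minimal rule even though only one of them has a `tags` field, which is the flexibility the paragraph above describes.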
MongoDB’s document data model maps naturally to objects in application code, making it simple for developers to learn and use. Documents let you represent hierarchical relationships and store arrays and other more complex structures easily. Native, idiomatic drivers are provided for 12+ languages (and the community has built dozens more), enabling ad-hoc queries, real-time aggregation and rich indexing to provide powerful programmatic ways to access and analyze data of any structure.
Because documents can bring together related data that would otherwise be modeled across separate parent-child tables in a relational schema, MongoDB’s atomic single-document operations already provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.
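The all-or-nothing behavior of a single-document update can be sketched in plain Python. This is a conceptual model only, assuming a dict as the collection and a hypothetical `atomic_update` helper. It does not reflect how MongoDB implements isolation internally.

```python
import copy


def atomic_update(collection, doc_id, mutate):
    """Hypothetical helper: apply `mutate` to a private copy of the document
    and publish the copy only if every change succeeds."""
    working = copy.deepcopy(collection[doc_id])
    mutate(working)  # may raise; the stored document is untouched if it does
    collection[doc_id] = working


orders = {42: {"status": "open", "items": [{"sku": "A1", "qty": 1}], "total": 10}}


def add_item(doc):
    # Several fields, including an array element, change in one operation.
    doc["items"].append({"sku": "B2", "qty": 2})
    doc["total"] += 30
    doc["status"] = "updated"


def failing_change(doc):
    doc["total"] = 0
    raise ValueError("simulated failure midway through the update")


atomic_update(orders, 42, add_item)

try:
    atomic_update(orders, 42, failing_change)
except ValueError:
    pass  # the partial change to "total" was discarded

print(orders[42])
```

Readers of `orders[42]` see either the document before the update or the document after it, never the half-modified state produced by the failing change.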
MongoDB 4.0 added support for multi-document transactions, making it the only database to combine the ACID guarantees of traditional relational databases with the speed, flexibility, and power of the document model and an intelligent distributed systems design that scales out and places data where you need it. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. Transactions in MongoDB feel just like the transactions developers are familiar with in PostgreSQL. They are multi-statement, with similar syntax (e.g. start_transaction and commit_transaction), and are therefore easy for anyone with prior transaction experience to add to any application.
Learn more from the What is MongoDB page.
Apache HBase is a wide column store inspired by Google Bigtable, written in Java and using HDFS (Hadoop Distributed File System) as its storage layer. HBase is designed for key-value workloads with random read and write access patterns.
HBase is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Applications store rows in labelled tables. A row has a sortable key and an arbitrary number of columns. Rows are kept sorted lexicographically by row key. The table is stored sparsely, so that rows in the same table can have varying columns. To provide resilience to failures, data is replicated across a number of participating nodes in the cluster.
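The data model described above can be approximated with a small in-memory structure. The following is a toy sketch, not the HBase client API: the class name, its methods, and the sample keys are all illustrative assumptions, and distribution, persistence and replication are left out.

```python
import bisect


class SparseSortedTable:
    """Toy model of the map: row key -> column key -> timestamp -> bytes."""

    def __init__(self):
        self._rows = {}      # row key -> {column key -> {timestamp -> value}}
        self._row_keys = []  # row keys kept in lexicographic byte order

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        # Sparse storage: only cells that were actually written exist.
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the value with the newest timestamp, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Return row keys in sorted order with start <= key < stop."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        return self._row_keys[lo:hi]


table = SparseSortedTable()
table.put(b"user#002", b"info:name", 1, b"Bob")
table.put(b"user#001", b"info:name", 1, b"Alice")
table.put(b"user#001", b"info:name", 2, b"Alicia")  # newer version of the same cell
table.put(b"user#001", b"info:email", 1, b"a@example.com")
table.put(b"video#001", b"meta:title", 1, b"Intro")

latest_name = table.get(b"user#001", b"info:name")
missing_cell = table.get(b"user#002", b"info:email")
user_rows = table.scan(b"user#", b"user$")
```

Because rows are ordered by key, a range scan over a key prefix is cheap, which is why row key design matters so much for key-value workloads of this kind. Values are opaque bytes, so any typing or interpretation is left to the application.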
HBase originated as part of the Apache Hadoop project and is now developed as a top-level project of the Apache Software Foundation, with commercial distributions available from most Hadoop vendors.
Many relational database concepts have similarities with MongoDB and HBase. This table outlines some of the common concepts in each system.
| RDBMS | MongoDB | HBase |
|---|---|---|
| Multi-record ACID transactions | Multi-document ACID transactions | None. Requires implementation in app layer |
As non-relational databases, both MongoDB and HBase offer data model flexibility, scale-out with sharding and high read/write performance. There are, however, some fundamental differences between them that are highlighted in the tables below.
| Feature | MongoDB | HBase | Why it matters |
|---|---|---|---|
| Data Model | Document | Wide Column | Documents match the structures of objects in programming languages, providing greater simplicity to developers. |
| Supported Data Types | Multiple types, including strings, 32- and 64-bit integers, floats, Decimal128, dates, timestamps and geospatial | Data converted to uninterpreted bytes | Support for multiple types allows for efficient data comparisons, sorting, and processing with lower application development effort. |
| Query Model | Expressive query language | Key-Value | An expressive query language enables running more complex queries to support advanced operational and real-time analytics workloads. |
| Secondary Indexes | Native secondary indexes | Materialized views maintained by developers in code, or with coprocessors | Native secondary indexes enable greater developer productivity, while supporting richer data access patterns to answer complex queries. |
| Aggregations | Native database aggregations | Data must be moved into dedicated analytics infrastructure for any queries beyond key-value lookups | Native database aggregations enable real-time analytics on live operational data, without ETL into dedicated analytics systems. |
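To make the "maintained by developers in code" entry concrete, here is a minimal sketch of what hand-maintaining a secondary index on top of a key-value store involves. The class and its `by_city` index are hypothetical and use plain Python dicts. This is not HBase or MongoDB code.

```python
class KeyValueStoreWithManualIndex:
    """Toy key-value store where the application keeps its own lookup by city."""

    def __init__(self):
        self.rows = {}     # primary key -> record
        self.by_city = {}  # hand-maintained index: city -> set of primary keys

    def put(self, key, record):
        old = self.rows.get(key)
        if old is not None:
            # The application must remember to remove the stale index entry.
            self.by_city[old["city"]].discard(key)
        self.rows[key] = record
        self.by_city.setdefault(record["city"], set()).add(key)

    def find_by_city(self, city):
        """Answer a non-key query using the hand-maintained index."""
        return sorted(self.by_city.get(city, set()))


store = KeyValueStoreWithManualIndex()
store.put("u1", {"name": "Ann", "city": "Oslo"})
store.put("u2", {"name": "Ben", "city": "Lima"})
store.put("u1", {"name": "Ann", "city": "Lima"})  # Ann moves; index must follow

in_oslo = store.find_by_city("Oslo")
in_lima = store.find_by_city("Lima")
```

Every write path has to update the index correctly, and each new query pattern needs another such structure. With native secondary indexes, that bookkeeping moves out of application code and into the database.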
Choosing a database is a major investment. Once an application has been built on a given database, it is costly and challenging to migrate it to a different database. Companies usually invest in a small number of core technologies so they can develop expertise, integrations and best practices that can be amortized across many projects.
Non-tabular databases are still relatively new, and while there are many options in the market, only a small number of technologies will stand the test of time. To reduce risk, users should consider the health of the company and community standing behind the database. It is important not only that the product continues to exist, but also that it continues to evolve and gain new features. A database with a strong community of users makes it easier to find and hire developers who are familiar with the product. It makes it easier to find information, documentation and code samples. It also helps organizations retain key technical talent. Lastly, a strong community encourages other technology vendors to develop integrations and to participate in the ecosystem.
When measuring market adoption, MongoDB occupies a much higher position than HBase in the DB-Engines Ranking and is classified in the Leader category in both the Forrester Big Data NoSQL Wave and Database-As-A-Service Wave reports, while HBase was not included in either. Finally, in 2020 MongoDB was voted Stack Overflow's 'Most Wanted' database for the fourth year running. MongoDB’s broader adoption and skills availability reduce risk and cost for new projects.
HBase is well suited to key-value workloads with high-volume random read and write access patterns, especially for those organizations already heavily invested in HDFS as a common storage layer. The leading Hadoop distributor positioned HBase for “super-high-scale but rather simplistic use cases”.
Comparing HBase to MongoDB, the positioning goes on to state the following: “HBase offers very fast random reads and random writes if you want to look up users on a particular key, but MongoDB provides a much richer model through which you could track user behavior all the way through an online application.”
MongoDB’s design philosophy blends key concepts from relational technologies with the benefits of emerging NoSQL databases. While HBase is highly scalable and performant for a subset of use cases, MongoDB can be used across a broader range of applications. The latter’s intuitive data model, multi-document ACID transactions, rich query framework, native drivers, and lower operational overhead will often enable users to ship new applications faster and more easily than with HBase.
Download the Top 5 Considerations for Selecting Non-Relational Databases where you will discover how to evaluate technology options based on the data and query model, consistency guarantees, APIs, community strength, and level of commercial support.