When NoSQL databases first launched in the early 2000s, they were immediately adopted by some of today's software giants. These organizations understood the growing power of data and the implications it would have on modern applications. However, in those days, NoSQL databases were not considered general-purpose databases, as we describe MongoDB today. Each one has been developed to answer a demand for some specific workload requirement.
One of these early NoSQL databases is Cassandra, a distributed database that blends a tabular store and a key-value store. MongoDB's distributed document data model is a proven alternative to Cassandra, as it can be adapted to serve many different use cases. In this article, we will discuss the differences between Cassandra and MongoDB.
Since Cassandra has different distributions, we will focus specifically on Apache Cassandra in this article.
What is Cassandra?
Cassandra was initially started by Facebook in 2008 and was intended to power the company’s inbox search capabilities. It is considered to be a distributed, wide-column store running as clusters of nodes called “rings.” Each node in a Cassandra ring stores some ranges of the data and replicates other ranges as a scaling and fault tolerance mechanism. Cassandra is an eventually consistent database with limited tunable consistency, but in general, it favors availability over consistency.
Over the years, Cassandra was open-sourced and is now part of the Apache Foundation, while there are additional commercially available versions.
Cassandra stores data in tabular-like tables with a unique identifier (or set of combined unique identifiers) acting as a key to a stored row, operating mostly as a key-value store with the ability to add a large number of columns to the key. The amount of columns can vary from row to row and therefore, Cassandra has to manage some metadata attributes for each row.
Cassandra's query language is called CQL, and it has its share of similarities to SQL. However, Cassandra does not support joins or subqueries and therefore requires a developer to denormalize the data or duplicate data for efficient access.
Comparing Key Differences
The table below highlights the main differences between Cassandra and MongoDB.
Cassandra vs. MongoDB: Architecture
The basic unit of MongoDB clusters is the replica set. A replica set consists of a primary node, which serves reads and writes to the application, and two or more secondary nodes that serve as replicas of the primary. Out of the box, this architecture enables users to use MongoDB for many types of workloads, depending on their consistency and performance requirements.
Cassandra offers a leaderless ring architecture, allowing users to write to any node. Therefore, Cassandra prioritizes write performance by default. Like MongoDB, Cassandra’s performance will be limited by consistency requirements. However, Cassandra’s data is spread across partitions according to the primary key of the data model. For reads to be performant, partitions and the data inside them must meet specific conditions. Therefore, advanced data modeling skills become critical to ensure acceptable read performance.
Supported Languages
Both Cassandra and MongoDB recommend using language-native drivers to interact with their databases. MongoDB offers a set of officially supported drivers for almost every modern language. It also offers an extension of community drivers and ODMs (object-to-document mapping).
Cassandra offers different flavors of drivers with each language, as they are mostly supported by third-party vendors.
MongoDB publishes an up-to-date compatibility matrix to verify that clients use an optimal driver version for each associated server version. In addition, MongoDB supports a stable API, which allows driver and server versions to be decoupled as long as the API version is aligned. This provides better and safer upgrade guarantees with no breaking changes.
Cassandra documentation does not offer a clear compatibility picture and developers are dependent on third-party documentation to understand the version requirements.
Cassandra vs. MongoDB: Query Language
MongoDB has a single and robust query API called MQL (MongoDB Query Language). It uses different CRUD methods with JSON object inputs to describe queries, write operations, and aggregations.
Cassandra uses a query language called CQL, which has similarities to SQL, as it uses similar keywords like “SELECT,” “INSERT,” “UPDATE,” etc., to interact with Cassandra tables.
Let's compare similar database commands between Cassandra and MongoDB.
Create a row/document
In Cassandra:
In MongoDB:
Comparison notes: With both methods, we submit the record to be stored in “planets” (table/collection).
Query a document by ID
In Cassandra:
In MongoDB:
Comparison notes: Retrieving the data in MongoDB is done with a query parameter, whereas in Cassandra, it's done via a SELECT statement.
Update a document
In Cassandra:
In MongoDB:
Comparison notes: To update a record in MongoDB, we need to issue an updateOne command and specify the new field values under the $set operator in the update clause.
In Cassandra, we use the UPDATE command. The where clause must include the entire primary key. Otherwise, the update won't work.
Aggregate a group by tag
In Cassandra:
In MongoDB:
Comparison notes: Aggregations in MongoDB consist of a query language API receiving a pipeline of stages.
Cassandra can use a GROUP BY clause in a SELECT statement. It only accepts primary key columns in defined order as arguments. Otherwise, other methods are required (like custom functions and analytical nodes).
Cassandra vs. MongoDB: Data Model
MongoDB's data modeling is done through its document data model. This allows documents to maintain a flexible and polymorphic structure, as they can adapt to changing requirements in an application's code. Documents are stored in a binary JSON format called BSON and are grouped into collections within the same database. This is a departure from the tabular data model, where the schema usually drives how developers must build their application. For example, a soccer gaming application might have the following document:
As shown above, a complex state and detailed description can all fit into one logical object. If a data model calls for more structure and finer schema controls, developers can leverage MongoDB's schema validation feature to enforce such rules in the collections that need them.
Cassandra uses a tabular-like structure, where each table resides in a namespace called “keyspace.” Keyspaces are similar concepts to databases in MongoDB and provide a grouping level for tables. Tables are structured in a key and wide-column representation, where the primary key defines the key structure. Without a secondary index, data can be either fully scanned or queried via a primary key filter. Columns can vary from row to row, but require the primary key as part of each row.
A representation of the previous presented document as a Cassandra row could be defined as:
The data represented in this single MongoDB document must be distributed across roughly 17 columns in a Cassandra row.
Secondary Indexes
Secondary indexes are a vital consideration for any application's queries. These are important for enabling an application to performantly fetch data that is not tied to a specific unique identifier or a subset of a primary key.
MongoDB allows developers to build secondary indexes as part of basic collection administration on any field, including objects, arrays, geographic data, and even in conjunction with wildcard and compound wildcard indexes. Additionally, the modern data platform allows users to build secondary full-text and vector search indexes. Secondary indexes require no administration once created.
Cassandra has the ability to create storage-attached indexes (SAI) on other columns than the defined primary key. However, SAI acts as a filter. This means that SAI Cassandra will only allow filtering columns with a secondary index, while in MongoDB, the query language can filter on non-indexed fields as well. This dramatically impacts query flexibility as developers must predict all columns used for filtering up front or create indexes on the fly, risking system stability. Furthermore, although Cassandra’s SAI are able to index more than 15 data types, they only support 15 query operators. MongoDB supports 30+ operators, allowing more query flexibility.
These limitations will lead to Cassandra being more difficult to manage at scale than MongoDB, and especially MongoDB Atlas.
Availability
MongoDB uses replication and replica sets to assure cluster availability. We recommend that each replica set consists of an odd number of nodes (three as a production minimum), allowing it to be fault tolerant to server failures. Whenever a primary becomes unavailable, an election automatically takes place to allow another replica set member to take over as the new primary. Having a single point that controls and performs all writes allows MongoDB to have tunable consistency levels, varying from strong to no conflict resolution requirements through using read and write concerns. MongoDB also allows reads from secondary replica set members with no block to replication for use cases when performance is valued over consistency.
Cassandra uses a replication factor set on a keyspace level to distribute replicas of the keyspace across the different nodes in a ring. The specific factor can be configured by the user. Distributing the replicas in a production environment requires additional architecture considerations as the cluster needs to identify “availability racks” and associate replication to different racks to avoid outages. Since each Cassandra node acts as a partition for one set of data and replicates another, there are consistency and coordination mechanisms that need to govern data distribution among nodes. Those mechanisms, such as read repair, can cause performance issues and blocking write periods for replication.
Scalability
MongoDB was designed to support horizontal scaling and elasticity through sharding. A sharded cluster consists of many replica sets, each holding a piece of the collection. The data distribution is decided by the sharding key and sharding configuration. The developer has the flexibility to choose the shard key and the type of sharding they wish to use (range, hash, etc.). Additionally, via zone sharding, data can be associated to a specific group of shards in accordance with the even distribution concept. Live resharding allows you to change shard keys with no downtime, enabling data distribution to evolve along with an application's evolving needs. Shard key advisor commands generate metrics that will help you refine your shard keys.
Cassandra uses the concept of a partition key, which is the table primary key, to evenly distribute data across the cluster. Essentially, Cassandra's partitioning functionality only supports hash sharding, with no ability to change the shard key over time without undergoing a time-consuming and costly process of manually re-sharding.
Due to this limitation, Cassandra cannot offer cross partition transactions, joins, or data referencing keys.
Aggregation Framework
3MongoDB offers a robust and rich aggregation framework to reshape, calculate, and output your data to the client or other collection. The aggregation framework serves as a pipeline of stages where each stage gets an input stream of the previous stage's documents, manipulating them and passing to the next stage. You can find operations like grouping, counting, restructuring objects and arrays, and many more. MongoDB Atlas also offers a built-in aggregation builder as part of its data explorer, support for Atlas Search's $search stage, and aggregations on Atlas Data Federation's data lakes.
Cassandra offers basic aggregations from version 3.0+ as part of its CQL SELECT statement, such as GROUP BY and COUNT. However, more sophisticated aggregations will need to be handled by the development team and built into an application's logic. It is also common for Cassandra users to have to rely on third-party utilities in order to effectively analyze and aggregate large amounts of data.
Cassandra vs. MongoDB: Read Performance
MongoDB is optimized for both reads and writes. If a user ensures that the correct indexes are in place for their application's common queries, and especially if the indexes fit in memory, they can expect high read performance capable of supporting most modern applications. MongoDB's performance can also be tuned for specific workloads through proper document schema design and cluster topology planning.
High read performance also does not need to come at the cost of strong consistency. Users can tune their cluster's workloads to read from the primary node, which eliminates the need for queries to be dependent on data coordination between multiple nodes. If the primary fails, the replica set will quickly elect a new primary from one of the secondaries and allow reads to be issued before the new primary is fully operational for writes.
Cassandra's architecture is optimized for writes, rather than complex read patterns. The database uses a leaderless model where the read of a primary key combination (partition-key) is coordinated across nodes and performed against the relevant node. This process adds overhead to the read operations, depending on which node receives what query range.
Cassandra vs. MongoDB: ACID Transactions
MongoDB multi-document ACID transactions allow you to perform all-or-nothing operations. Transactional databases fit a specific use case requirement, and we still encourage developers to consider alternate schema design prior to using them in order to avoid adding unnecessary complexity to their applications. Transaction support is one of the main differentiators between MongoDB and Cassandra.
Cassandra does not support multi-row ACID transactions and allows only isolation and durability to be tuned based on a single row operation. Cassandra's consistency model is, by default, in preference of availability over consistency.
Cassandra vs. MongoDB: Use Cases
MongoDB is a general-purpose document database. Therefore, it has a rich set of features and design patterns to cover almost all of today's modern applications. Many common modernizations to legacy SQL applications use MongoDB, as its flexible schema can be customized to transform existing data into the document model. Developers can also port over existing functionality from their SQL application, such as ACID transactions, secondary indexes on any field, and rich aggregation and query capabilities. In addition, it also supports horizontal scalability for big data as a core feature, as well as high availability and data segregation for meeting service level agreements for latency and compliance needs.
Cassandra is mostly used for key-value columnar use cases. It is more suitable for very predictable read and write patterns, especially in write-heavy workloads—for example, a logging or tracking system where there are no or a very small number of in-place updates.
Conclusion
Apache Cassandra is a wide-column store designed for specific use cases where the writes are done by a single primary key and are the vast majority of workloads. Scaling in Cassandra is only applicable to fairly niche workloads.
MongoDB is a general-purpose database that can support multiple use cases with its flexible document model, rich aggregation language, and robust features such as sharding and ACID-compliant transactions. Therefore, it can cover the vast majority of Cassandra's most popular use cases, and much more.
Your real question should be what limits you from using MongoDB for your next application. Since there is no easier way to run a database in the cloud than MongoDB Atlas, you should get started today.


