Comprehensive Guide to Optimizing MongoDB Performance
MongoDB is celebrated for its high performance and scalability, making it a popular choice among NoSQL databases. However, to fully leverage its potential, fine-tuning your MongoDB deployment is essential. This guide outlines various strategies and best practices for enhancing MongoDB performance, covering everything from identifying bottlenecks to optimizing queries and hardware.
Before diving into performance tuning, it's crucial to understand your workload. MongoDB's performance can vary significantly depending on whether your application is read-heavy, write-heavy, or a balanced mix. Use tools such as the Atlas Query Profiler or the open-source mongostat utility to analyze your database operations and gain insights into your workload.
Effective indexing is one of the most impactful ways to enhance query performance in MongoDB. Here are key practices:
- Create relevant indexes: Tailor indexes to match your application's query patterns. Use the explain() method to understand query behavior and optimize accordingly.
db.collection.find({ field: value }).explain("executionStats")
MongoDB Compass presents the same information visually in its Explain Plan view.
- Avoid over-indexing: While indexes improve query speed, every index must be updated on each write and consumes disk space and memory. Regularly review index usage (for example, with the $indexStats aggregation stage) and remove unused or unnecessary indexes.
db.collection.dropIndex("indexName")
- Use compound indexes: For queries involving multiple fields, compound indexes can significantly boost performance.
db.collection.createIndex({ field1: 1, field2: -1 })
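The raw explain("executionStats") output is long. The helper below is a minimal sketch, in plain JavaScript that can be pasted into mongosh, that extracts the signals most useful for spotting a missing index: the plan stages, whether a COLLSCAN occurred, and how many documents were examined per document returned. The function names are hypothetical, and the plan walker assumes the classic stage/inputStage/inputStages layout plus the queryPlan nesting used by the slot-based engine, so check it against the explain output of your server version.

```javascript
// Hypothetical helpers for explain("executionStats") output.
// Walks the winning plan, assuming the classic stage/inputStage/inputStages
// layout; plans from the slot-based engine nest it under queryPlan.
function collectStages(plan, stages = []) {
  if (!plan || typeof plan !== "object") return stages;
  if (typeof plan.stage === "string") stages.push(plan.stage);
  if (plan.queryPlan) collectStages(plan.queryPlan, stages);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  if (Array.isArray(plan.inputStages)) {
    for (const child of plan.inputStages) collectStages(child, stages);
  }
  return stages;
}

function summarizeExplain(explainResult) {
  const stats = explainResult.executionStats;
  const stages = collectStages(explainResult.queryPlanner.winningPlan);
  return {
    stages,
    // Index-backed access stages (IXSCAN variants and the _id fast path).
    usesIndex: stages.some((s) => s.includes("IXSCAN") || s === "IDHACK"),
    collectionScan: stages.includes("COLLSCAN"),
    // Documents examined per document returned: close to 1 is ideal,
    // large values mean many documents were read and then discarded.
    docsPerResult:
      stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : stats.totalDocsExamined,
    executionTimeMillis: stats.executionTimeMillis,
  };
}
```

In mongosh, call it as summarizeExplain(db.collection.find({ field: value }).explain("executionStats")). A docsPerResult value far above 1 together with a COLLSCAN stage is a strong hint that the query needs an index, such as the compound index above.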
Optimizing your query patterns is crucial for reducing execution time and resource usage:
- Projection: Use projection to limit the fields returned by your queries, minimizing data transfer and processing load. The _id field is returned by default, so if the application does not need it (for example, when it is only the ObjectId that MongoDB auto-generates), exclude it explicitly with _id: 0.
db.collection.find({ field: value }, { field1: 1, field2: 1, _id: 0 })
- Aggregation framework: Leverage MongoDB's aggregation framework for complex data processing. Place $match (and $sort) stages at the start of the pipeline so that they can use indexes and reduce the number of documents later stages must process.
db.collection.aggregate([ { $match: { field: value } }, { $group: { _id: "$field", total: { $sum: "$amount" } } } ])
- Avoid $where: The $where operator executes JavaScript for each document and cannot use indexes, so it is slow and resource-intensive; use it sparingly and only when necessary. $expr with aggregation operators that do not use JavaScript (that is, operators other than $function and $accumulator) is faster than $where because it does not execute JavaScript, so prefer it when possible. If you must write custom expressions, $function is preferred over $where.
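The three query practices above can be combined in one short sketch. The objects below are plain JavaScript that mongosh accepts directly; the field names (status, field1, field2, spent, budget, amount) are hypothetical. The $expr filter is the usual rewrite of a field-to-field $where comparison, and startsWithIndexableStage is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical field names: field1, field2, status, spent, budget, amount.

// Projection: return only the fields the application needs.
// _id is included by default, so exclude it explicitly when it is not needed.
const projection = { field1: 1, field2: 1, _id: 0 };

// $where runs JavaScript for every document and cannot use an index...
const whereFilter = { $where: "this.spent > this.budget" };
// ...whereas the equivalent $expr filter uses native aggregation operators.
const exprFilter = { $expr: { $gt: ["$spent", "$budget"] } };

// Aggregation: $match first, so it can use an index on status and the
// $group stage only processes matching documents.
const pipeline = [
  { $match: { status: "active" } },
  { $group: { _id: "$field1", total: { $sum: "$amount" } } },
];

// Illustrative check: does the pipeline start with a stage that can use
// an index when it comes first?
function startsWithIndexableStage(stages) {
  if (stages.length === 0) return false;
  const [firstOperator] = Object.keys(stages[0]);
  return firstOperator === "$match" || firstOperator === "$sort";
}

console.log(startsWithIndexableStage(pipeline)); // true
```

In mongosh these would be used as db.collection.find(exprFilter, projection) and db.collection.aggregate(pipeline). A field-to-field $expr comparison generally still cannot use an index; its advantage over $where is that it avoids running JavaScript.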
The hardware on which MongoDB runs plays a crucial role in its performance:
- RAM: MongoDB relies heavily on RAM to hold the working set (the data and indexes your application accesses most frequently). If the working set exceeds available RAM, more reads must go to disk; consider adding memory.
- Storage: Utilize SSDs for storage to enhance I/O throughput and data access speeds.
- Network: Ensure your network bandwidth and latency are sufficient, especially in distributed deployments.
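To judge whether memory is sufficient, it helps to know how much of it WiredTiger uses by default. The sketch below encodes the documented default internal cache size for MongoDB 3.4 and later, the larger of 50% of (RAM - 1 GB) or 256 MB, and compares it with a working-set estimate. The function names are hypothetical, the working-set sizes are inputs you must estimate yourself, and the check ignores the operating-system file cache, which MongoDB also benefits from.

```javascript
// Documented default WiredTiger internal cache size (MongoDB 3.4 and later):
// the larger of 50% of (RAM - 1 GB) or 256 MB. Setting
// storage.wiredTiger.engineConfig.cacheSizeGB overrides this default.
function defaultWiredTigerCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

// Rough, hypothetical check: does the estimated working set (frequently
// accessed documents plus the indexes they use) fit in that cache?
function workingSetFits(ramGB, hotDataGB, hotIndexGB) {
  const cacheGB = defaultWiredTigerCacheGB(ramGB);
  const workingSetGB = hotDataGB + hotIndexGB;
  return { cacheGB, workingSetGB, fits: workingSetGB <= cacheGB };
}

console.log(defaultWiredTigerCacheGB(16)); // 7.5
console.log(workingSetFits(8, 3, 1.5)); // { cacheGB: 3.5, workingSetGB: 4.5, fits: false }
```

For a 16 GB server the default cache is 7.5 GB; if your hot data and indexes together approach that figure, adding RAM is worth considering.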
Replication and sharding
MongoDB supports replication and sharding to improve availability and scalability:
- Replication: This ensures data redundancy and high availability. Configure read preference settings to effectively route read operations across replicas.
rs.initiate()
MongoDB supports the following read preference modes, which you can configure at the application level:
- primary: Reads from the primary only
- primaryPreferred: Reads from the primary if available, otherwise from a secondary
- secondary: Reads from a secondary only
- secondaryPreferred: Reads from a secondary if available, otherwise from the primary
- nearest: Reads from the nearest node based on network latency and operational health
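In application code, the mode is usually set through the driver's readPreference option or the readPreference parameter of the connection string; the Node.js driver supports both.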
- Sharding: This distributes data across multiple servers and is crucial for managing large datasets and high throughput operations. Choose a shard key that evenly distributes data and query load.
sh.enableSharding("mydatabase")
sh.shardCollection("mydatabase.mycollection", { shardKey: 1 })
Choosing a shard key in MongoDB can significantly impact performance depending on whether your workload is read-heavy or write-heavy. Here are some guidelines for selecting a shard key based on your workload:
Read-heavy workloads
Shard key selection: Choose a shard key that evenly distributes read operations across shards.
Considerations:
Use a high-cardinality field that ensures even distribution of reads.
Avoid shard keys that can cause hot spots where most reads target a single shard.
Example: Use a user ID if user-related queries are common.
sh.shardCollection("mydatabase.mycollection", { userID: 1 })
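For a read-heavy workload, the shard key also determines whether mongos can route a query to the shard(s) that own the relevant data or must broadcast it to every shard (a scatter-gather query). The function below is a deliberately simplified, hypothetical sketch of that rule: a query can be targeted only if it filters on the shard key's leading field, and in this sketch only an equality match qualifies for a hashed key. The real router also considers chunk ranges, $in lists, and other details, so treat it as an illustration.

```javascript
// Simplified sketch (hypothetical helper) of query routing in a sharded
// cluster. Real mongos routing also accounts for chunk ranges, $in lists,
// collation, and more.
function routingFor(filter, shardKey) {
  const [leadField] = Object.keys(shardKey);
  // Without the shard key's leading field, every shard must be asked.
  if (!(leadField in filter)) return "broadcast";

  const value = filter[leadField];
  const operators =
    value !== null && typeof value === "object" && !Array.isArray(value)
      ? Object.keys(value).filter((k) => k.startsWith("$"))
      : [];
  const isEquality = operators.length === 0 || operators.every((op) => op === "$eq");

  // Hashing scatters neighboring values, so this sketch targets only equality.
  if (shardKey[leadField] === "hashed" && !isEquality) return "broadcast";
  return "targeted"; // one shard, or only the shards whose ranges match
}

console.log(routingFor({ userID: 42 }, { userID: 1 })); // targeted
console.log(routingFor({ status: "active" }, { userID: 1 })); // broadcast
```

To see what the router actually does, run explain() through mongos: a SINGLE_SHARD stage in the winning plan indicates that one shard answered, while SHARD_MERGE indicates that results from several shards were merged.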
Write-heavy workloads
Shard key selection: Choose a shard key that balances the write load across shards.
Considerations:
Use a field whose values vary widely across new documents, so that consecutive writes are spread across shards.
Avoid monotonically increasing keys (e.g., timestamps) as they can lead to a single shard being a bottleneck.
Example: Use a hashed shard key to distribute writes evenly when no field with naturally well-distributed values is available (for example, when the best candidate key increases monotonically).
sh.shardCollection("mydatabase.mycollection", { hashedField: "hashed" })
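Why monotonically increasing keys create a write bottleneck, and why hashing helps, can be shown with a small simulation. In the sketch below, ranged sharding is modeled as fixed key ranges per shard, and hashed sharding uses MD5 from Node.js's crypto module only as a stand-in for MongoDB's internal hash function, which is not reproduced. The split points and key values are hypothetical.

```javascript
const crypto = require("crypto");

// Toy model of ranged sharding: shard i owns keys below splitPoints[i];
// the last shard owns everything from the final split point up to MaxKey.
function rangedShardFor(key, splitPoints) {
  let shard = 0;
  while (shard < splitPoints.length && key >= splitPoints[shard]) shard++;
  return shard;
}

// Toy model of hashed sharding. MD5 is only a stand-in: MongoDB's own hash
// of the BSON value is not reproduced here.
function hashedShardFor(key, shardCount) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

function distribute(keys, shardCount, pickShard) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[pickShard(key)] += 1;
  return counts;
}

// Hypothetical setup: 3 shards split at 1000 and 2000, existing keys up to 3000,
// and 1000 new inserts whose keys keep increasing (like timestamps).
const splitPoints = [1000, 2000];
const newKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);

const rangedCounts = distribute(newKeys, 3, (k) => rangedShardFor(k, splitPoints));
const hashedCounts = distribute(newKeys, 3, (k) => hashedShardFor(k, 3));
console.log("ranged:", rangedCounts); // ranged: [ 0, 0, 1000 ]
console.log("hashed:", hashedCounts); // roughly a third on each shard
```

With ranged sharding, every new key exceeds the existing split points, so all 1,000 inserts land on the shard that owns the top range; chunk splits and balancer migrations do not change this, because whichever chunk holds the maximum range keeps receiving the writes. With hashing, the same keys spread roughly evenly across the three shards.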
Additional considerations:
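Monitor and adjust: Continuously monitor performance and revisit the shard key if needed; MongoDB 5.0 and later can reshard a collection with sh.reshardCollection(), and MongoDB 4.4 added refineCollectionShardKey for extending an existing key.
Indexing: A sharded collection needs an index that starts with the shard key, and queries that include the shard key can be routed to specific shards instead of being broadcast to all of them.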
By selecting the appropriate shard key and considering the nature of your workload, you can optimize your MongoDB deployment for both read and write operations.
Regular monitoring and maintenance are vital for sustained performance:
- Monitoring tools: Utilize MongoDB Atlas, mongostat, and mongotop to monitor database performance and resource usage.
mongostat --host <host>
mongotop --host <host>
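- Routine maintenance: Periodically check for storage fragmentation and run compact on collections that hold a large amount of reusable space, ideally during low-traffic periods, and keep the balancer enabled on sharded clusters so that chunks stay evenly distributed. Do not rely on db.repairDatabase(): it was removed in MongoDB 4.2, and repair (mongod --repair) is a recovery procedure for damaged data files, not routine maintenance.
db.runCommand({ compact: "mycollection" })
sh.getBalancerState()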
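To decide whether compact is worth running on a collection, a common approach is to compare the space WiredTiger reports as reusable with the collection's storage size. The sketch below reads the wiredTiger["block-manager"]["file bytes available for reuse"] and storageSize fields from db.collection.stats() output; the function name is hypothetical, and the 20% threshold is an arbitrary cut-off, not a MongoDB recommendation.

```javascript
// mongosh may return some values as Long objects; convert them to numbers.
const toNumber = (v) =>
  v !== null && typeof v === "object" && typeof v.toNumber === "function" ? v.toNumber() : Number(v);

// Hypothetical helper: estimate how much space compact might reclaim,
// using db.collection.stats() output from the WiredTiger storage engine.
function compactionCandidate(stats, minReusableFraction = 0.2) {
  // Bytes inside the collection's data file that WiredTiger can reuse.
  const reusableBytes = toNumber(
    stats.wiredTiger?.["block-manager"]?.["file bytes available for reuse"] ?? 0
  );
  const storageSize = toNumber(stats.storageSize);
  const reusableFraction = storageSize > 0 ? reusableBytes / storageSize : 0;
  return {
    reusableBytes,
    reusableFraction,
    // The default 20% cut-off is an arbitrary illustration.
    worthCompacting: reusableFraction >= minReusableFraction,
  };
}
```

In mongosh, call it as compactionCandidate(db.collection.stats()). The helper only reads statistics, so it is safe to run at any time; the compact command itself is what should be scheduled for a quiet period.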
The choice of write concern can influence both the performance and the durability of the data.
A lower write concern (e.g., w: 0) can enhance performance by reducing the latency of the write operation. However, it risks data durability.
Impact on latency
Lower write concern (e.g., w: 0):
Latency reduction:
- The client does not wait for any acknowledgment from the server.
- The operation is sent to the server and considered complete from the client's perspective.
- There is no network round-trip latency as there is no need for the server to respond.
Trade-off:
- There's an increased risk of data loss since the client receives no confirmation of write success.
- It's suitable for non-critical data or scenarios where high write throughput is needed with minimal latency.
Higher write concern (e.g., w: 1 or w: "majority"):
Latency increase:
- The client waits for acknowledgment from the server.
- For w: 1, it waits for acknowledgment from the primary node.
- For w: "majority", it waits for acknowledgment from a majority of the replica set's voting members.
- Network round-trip time and server processing time add to the overall latency.
Benefit:
- Enhanced data durability and consistency.
- The client receives confirmation that the write was applied and, for w: "majority", replicated to a majority of members.
db.collection.insertOne({ field: "value" }, { writeConcern: { w: 1 } })
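The latency trade-off described above can be made concrete with a toy model. The function below is an illustrative sketch with a hypothetical name, not a benchmark: it assumes every member votes and holds data, ignores server processing and journaling time, and treats each secondary's extra delay as a fixed, hypothetical number. Because the primary's own acknowledgment counts toward w, w: "majority" in a three-member replica set needs only the fastest secondary.

```javascript
// Toy model, not a benchmark. Assumptions: all members vote and hold data,
// server processing and journaling time are ignored, and each secondary's
// extra delay to replicate and acknowledge is a fixed, hypothetical number.
function estimatedAckLatencyMs(w, rttMs, secondaryDelaysMs) {
  // w: 0 is unacknowledged, so the client does not wait for any reply.
  if (w === 0) return 0;
  const members = 1 + secondaryDelaysMs.length;
  const required = w === "majority" ? Math.floor(members / 2) + 1 : w;
  if (required > members) throw new Error(`w: ${w} exceeds the ${members} members`);
  // The primary's own acknowledgment counts toward w.
  const secondariesNeeded = required - 1;
  if (secondariesNeeded === 0) return rttMs;
  // The write is acknowledged once the fastest N secondaries have it.
  const sorted = [...secondaryDelaysMs].sort((a, b) => a - b);
  return rttMs + sorted[secondariesNeeded - 1];
}

console.log(estimatedAckLatencyMs(0, 2, [5, 40])); // 0
console.log(estimatedAckLatencyMs(1, 2, [5, 40])); // 2
console.log(estimatedAckLatencyMs("majority", 2, [5, 40])); // 7
```

To bound how long a majority write may wait in practice, add wtimeout to the write concern, for example db.collection.insertOne({ field: "value" }, { writeConcern: { w: "majority", wtimeout: 5000 } }).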
The choice of read preference can influence both the performance and the availability of the data.
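Performance: Distributing read operations to secondary members can enhance performance by reducing the load on the primary. Because replication is asynchronous, reads from secondaries may return slightly stale data, so route only reads that can tolerate this.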
To distribute read operations to secondary members, set the read preference. Here are examples of how to configure it:
MongoDB Shell (mongosh)
db.getMongo().setReadPref("secondaryPreferred")
Connection URI
mongodb://host1,host2,host3/?readPreference=secondaryPreferred
Application code example (Node.js)
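The Node.js driver reads the read preference from the readPreference connection-string option, so one minimal approach is to add that option to the URI before creating the client. The helper below is a sketch with a hypothetical name; it validates the mode against the five read preference modes listed earlier and optionally sets maxStalenessSeconds, which MongoDB requires to be at least 90 seconds and does not allow with the primary mode.

```javascript
// Hypothetical helper: add read preference options to a MongoDB connection string.
const READ_PREFERENCE_MODES = [
  "primary",
  "primaryPreferred",
  "secondary",
  "secondaryPreferred",
  "nearest",
];

function withReadPreference(uri, mode, maxStalenessSeconds) {
  if (!READ_PREFERENCE_MODES.includes(mode)) {
    throw new Error(`Unknown read preference mode: ${mode}`);
  }
  if (maxStalenessSeconds !== undefined) {
    // MongoDB rejects maxStalenessSeconds with primary and requires at least 90 s.
    if (mode === "primary") throw new Error("maxStalenessSeconds cannot be used with primary");
    if (maxStalenessSeconds < 90) throw new Error("maxStalenessSeconds must be at least 90");
  }
  const [base, query = ""] = uri.split("?");
  // Options must follow a "/" after the host list, e.g. mongodb://h1,h2/?...
  const schemeEnd = base.indexOf("://") + 3;
  const prefix = base.indexOf("/", schemeEnd) === -1 ? `${base}/` : base;
  const params = new URLSearchParams(query);
  params.set("readPreference", mode);
  if (maxStalenessSeconds !== undefined) {
    params.set("maxStalenessSeconds", String(maxStalenessSeconds));
  }
  return `${prefix}?${params.toString()}`;
}

console.log(withReadPreference("mongodb://host1,host2,host3/", "secondaryPreferred"));
// mongodb://host1,host2,host3/?readPreference=secondaryPreferred
```

The resulting string can be passed to new MongoClient(uri) from the official mongodb npm package; the driver also accepts a readPreference entry in the MongoClient options object.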
By setting the read preference to secondaryPreferred, you direct read operations to secondary members when they are available, reducing the load on the primary node and enhancing overall performance.
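Checks to identify the common causes of performance issues:
- Run mongotop and mongostat, and check which namespace is causing the issue.
- Replication: Is there secondary lag, and how long is the oplog window? (rs.printSecondaryReplicationInfo() and rs.printReplicationInfo() report both.)
- Application level: Check for any batch loads at the application level.
- Are there slow or long-running operations? Check db.currentOp() and the database profiler.
- Are there proper indexes?
- Sharded cluster: Are the majority of the queries using the shard key?
- WiredTiger cache: How full is it, and is there eviction pressure? (See the wiredTiger.cache section of db.serverStatus().)
- Do you see write contention?
- Open files: Check the limits with ulimit -a; MongoDB recommends an open-files limit (ulimit -n) of at least 64000.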
- Check whether the mongod process alone is causing the server load, or whether other processes contribute.
- top or htop: Monitor CPU and memory usage of mongod and other processes.
- ps and grep: Run ps aux | grep mongod to view mongod resource usage.
- iostat: Run iostat -x 1 10 to check disk I/O metrics.
- vmstat: Run vmstat 1 10 for overall system performance snapshots.
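For the WiredTiger cache check in the list above, the wiredTiger.cache section of db.serverStatus() holds the relevant counters. The sketch below computes how full the cache is and how much of it is dirty, and flags values at WiredTiger's default eviction triggers (95% of the cache, 20% dirty), above which application threads are drawn into eviction and user operations slow down. It assumes those defaults have not been changed, and the function name is hypothetical.

```javascript
// WiredTiger's default eviction triggers, assumed unchanged in your deployment.
const EVICTION_TRIGGER_PCT = 95;
const DIRTY_EVICTION_TRIGGER_PCT = 20;

// mongosh may return some counters as Long objects; convert them to numbers.
const toNumber = (v) =>
  v !== null && typeof v === "object" && typeof v.toNumber === "function" ? v.toNumber() : Number(v);

// Hypothetical helper for the output of db.serverStatus().
function wiredTigerCacheHealth(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  const maxBytes = toNumber(cache["maximum bytes configured"]);
  const usedPct = (100 * toNumber(cache["bytes currently in the cache"])) / maxBytes;
  const dirtyPct = (100 * toNumber(cache["tracked dirty bytes in the cache"])) / maxBytes;
  // Cumulative since startup: compare two samples to see whether it is still growing.
  const appThreadEvictions = toNumber(cache["pages evicted by application threads"] ?? 0);
  const warnings = [];
  if (usedPct >= EVICTION_TRIGGER_PCT) warnings.push("cache usage at or above eviction trigger");
  if (dirtyPct >= DIRTY_EVICTION_TRIGGER_PCT) warnings.push("dirty data at or above dirty eviction trigger");
  return { usedPct, dirtyPct, appThreadEvictions, warnings };
}
```

In mongosh, call it as wiredTigerCacheHealth(db.serverStatus()). An appThreadEvictions count that keeps rising between samples confirms that application threads are doing eviction work.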
Write contention in MongoDB can be identified by the following indicators:
High queue lengths: Use mongostat to monitor the qrw (queued reads|writes) and arw (active reads|writes) columns. A write queue that keeps growing indicates contention.
Slow write operations: Check for slow or waiting write operations with db.currentOp(), which may indicate contention.
Frequent write conflicts: Review the logs, and the metrics.operation.writeConflicts counter in db.serverStatus(), for write conflicts.
Increased latency: Observe increased latency in write-heavy operations or applications.
Example command to monitor queued and active operations:
mongostat --host <hostname>
Careful schema design, appropriate indexes, and spreading writes across documents and shards to avoid hot spots can help mitigate write contention.
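A single db.serverStatus() snapshot shows counters accumulated since startup, so contention is easier to judge as a rate. The sketch below takes two serverStatus samples and computes write conflicts per second from metrics.operation.writeConflicts and uptime (in seconds); the function name is hypothetical, and the toNumber helper is there because mongosh may return some counters as Long values.

```javascript
// mongosh may return some counters as Long objects; convert them to numbers.
const toNumber = (v) =>
  v !== null && typeof v === "object" && typeof v.toNumber === "function" ? v.toNumber() : Number(v);

// Hypothetical helper: write conflicts per second between two db.serverStatus()
// samples. metrics.operation.writeConflicts is cumulative since startup, and
// uptime is the server's uptime in seconds.
function writeConflictsPerSecond(before, after) {
  const elapsedSeconds = toNumber(after.uptime) - toNumber(before.uptime);
  if (elapsedSeconds <= 0) throw new Error("Take the second sample after the first one.");
  const conflicts =
    toNumber(after.metrics.operation.writeConflicts) -
    toNumber(before.metrics.operation.writeConflicts);
  return conflicts / elapsedSeconds;
}
```

In mongosh: const a = db.serverStatus(); sleep(10000); const b = db.serverStatus(); writeConflictsPerSecond(a, b). A rate that climbs together with write latency suggests that concurrent operations are repeatedly trying to modify the same documents.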
Achieving optimal MongoDB performance involves a comprehensive approach, including query optimization, proper indexing, sufficient hardware resources, and continuous monitoring. By implementing the strategies outlined in this guide, you can significantly enhance the efficiency and responsiveness of your MongoDB deployment, ensuring it meets the demands of your applications.