
The Cost of Not Knowing MongoDB, Part 3: appV6R0 to appV6R4

Welcome to the third and final part of the series "The Cost of Not Knowing MongoDB." Building upon the foundational optimizations explored in Part 1 and Part 2, this article delves into advanced MongoDB design patterns that can dramatically transform application performance. In Part 1, we improved application performance by concatenating fields, changing data types, and shortening field names. In Part 2, we implemented the Bucket Pattern and Computed Pattern and optimized the aggregation pipeline to achieve even better performance.

In this final article, we address the issues and improvements identified in appV5R4. Specifically, we focus on reducing the document size in our application to alleviate the disk throughput bottleneck on the MongoDB server. This reduction will be accomplished by adopting a dynamic schema and modifying the storage compression algorithm. All the application versions and revisions in this article would be developed by a senior MongoDB developer, as they build on all the previous versions and use the Dynamic Schema pattern, which isn't commonly seen.

Application version 6 revision 0 (appV6R0): A dynamic monthly bucket document

As mentioned in the Issues and Improvements of appV5R4 from the previous article, the primary limitation of our MongoDB server is its disk throughput. To address this, we need to reduce the size of the documents being stored. Consider the following document from appV5R3, which has provided the best performance so far:

const document = {
  _id: Buffer.from("...01202202"),
  items: [
    { date: new Date("2022-06-05"), a: 10, n: 3 },
    { date: new Date("2022-06-16"), p: 1, r: 1 },
    { date: new Date("2022-06-27"), a: 5, r: 1 },
    { date: new Date("2022-06-29"), p: 1 },
  ],
};

The items array in this document contains only four elements, but on average, it will have around 10 elements, and in the worst-case scenario, it could have up to 90 elements. These elements are the primary contributors to the document size, so they should be the focus of our optimization efforts.

One commonality among the elements is the presence of the date field; in the previous document, every date falls within the year and quarter already encoded in the _id. By rethinking how this field and its value are stored, we can reduce storage requirements. An unconventional solution we could use is:

- Changing the items field type from an array to a document.
- Using the date value as the field name in the items document.
- Storing the status totals as the value for each date field.

Here is the previous document represented using the new schema idea:

const document = {
  _id: Buffer.from("...01202202"),
  items: {
    20220605: { a: 10, n: 3 },
    20220616: { p: 1, r: 1 },
    20220627: { a: 5, r: 1 },
    20220629: { p: 1 },
  },
};

While this schema may not significantly reduce the document size compared to appV5R3, we can further optimize it by leveraging the fact that the year is already embedded in the _id field. This eliminates the need to repeat the year in the field names of the items document. With this approach, the items document adopts a Dynamic Schema, where field names encode information and are not predefined.

To demonstrate various implementation possibilities, we will revisit all the bucketing criteria used in the appV5RX implementations, starting with appV5R0. For appV6R0, which builds upon appV5R0 but uses a dynamic schema, data is bucketed by year and month. The field names in the items document represent only the day of the date, as the year and month are already stored in the _id field.
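To make the bucketing idea concrete, here is a minimal sketch of how an event could be mapped to its monthly bucket _id and its dynamic day field name. The helper names match the getDD and buildId functions referenced below, but the bodies shown here are only illustrative, not the application's actual implementation:

// Day of the month used as the dynamic field name inside `items`, e.g. "05".
const getDD = (date) => String(date.getUTCDate()).padStart(2, "0");

// Bucket identifier: key + year + month, e.g. "...01" + "2022" + "01".
// The real buildId also converts this concatenation into a compact binary value.
const buildId = (key, date) => {
  const YYYY = String(date.getUTCFullYear());
  const MM = String(date.getUTCMonth() + 1).padStart(2, "0");
  return Buffer.from(`${key}${YYYY}${MM}`);
};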
A detailed explanation of the bucketing logic and functions used to implement the current application can be found in the appV5R0 introduction. The following document stores data for January 2022 (2022-01-XX), applying the newly presented idea:

const document = {
  _id: Buffer.from("...01202201"),
  items: {
    "05": { a: 10, n: 3 },
    "16": { p: 1, r: 1 },
    "27": { a: 5, r: 1 },
    "29": { p: 1 },
  },
};

Schema

The application implementation presented above would have the following TypeScript document schema, denominated SchemaV6R0:

export type SchemaV6R0 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specification presented, we have the following updateOne operation for each event generated by this application version:

const DD = getDD(event.date); // Extract the `day` from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + month
    update: {
      $inc: {
        [`items.${DD}.a`]: event.approved,
        [`items.${DD}.n`]: event.noFunds,
        [`items.${DD}.p`]: event.pending,
        [`items.${DD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

- filter: Targets the document where the _id field matches the concatenated value of key, year, and month. The buildId function converts the key+year+month into a binary format.
- update: Uses the $inc operator to increment the fields corresponding to the same DD as the event by the status values provided. If a field does not exist in the items document and the event provides a value for it, $inc treats the non-existent field as having a value of 0 and performs the operation. If a field exists in the items document but the event does not provide a value for it (i.e., undefined), $inc treats it as 0 and performs the operation.
- upsert: Ensures a new document is created if no matching document exists.

Get reports

To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria in the $match stage:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupSumTotals },
  { $project: { _id: 0 } },
];

The complete code for this aggregation pipeline is quite involved, so only pseudocode is shown here.

1: { $match: docsFromKeyBetweenDate }

Range-filters documents by _id to retrieve only buckets within the report date range. It has the same logic as appV5R0.

2: { $addFields: buildTotalsField }

The logic is similar to the one used in the Get Reports of appV5R3. The $objectToArray operator is used to convert the items document into an array, enabling a $reduce operation. Filtering the items fields within the report's range involves extracting the year and month from the _id field and the day from the field names in the items document. The following JavaScript code is logically equivalent to the real aggregation pipeline code.
// Equivalent JavaScript logic:
const MM = _id.slice(-2).toString(); // Get month from _id
const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]

const totals = items_array.reduce(
  (accumulator, [DD, status]) => {
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a || 0;
      accumulator.n += status.n || 0;
      accumulator.p += status.p || 0;
      accumulator.r += status.r || 0;
    }
    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);

3: { $group: groupSumTotals }

Groups the totals of each document in the pipeline into final status totals using $sum operations.

4: { $project: { _id: 0 } }

Formats the resulting document to have the reports format.

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R0, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R0 | 95,350,431 | 19.19GB | 217B | 5.06GB | 1 | 2.95GB |
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R0 | 95,350,319 | 11.1GB | 125B | 3.33GB | 1 | 3.13GB |

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

| Collection | Data Size/Events | Index Size/Events | Total Size/Events |
|---|---|---|---|
| appV5R0 | 41.2B | 6.3B | 47.5B |
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R0 | 23.8B | 6.7B | 30.5B |

It is challenging to make a direct comparison between appV6R0 and appV5R0 from a storage perspective. The appV5R0 implementation is the simplest bucketing possible, where event documents were merely appended to the items array without bucketing by day, as is done in appV6R0. However, we can attempt a comparison between appV6R0 and appV5R3, the best solution so far. In appV6R0, data is bucketed by month, whereas in appV5R3, it is bucketed by quarter. Assuming document size scales linearly with the bucketing criteria (though this is not entirely accurate), the appV6R0 document would be approximately 3 * 125 = 375 bytes, which is about 2.6% smaller than the 385 bytes of appV5R3.

Another indicator of improvement is the Data Size/Events metric in the Event Statistics table. For appV6R0, each event uses an average of 23.8 bytes, compared to 25.7 bytes for appV5R3, representing a 7.4% reduction in size.

Load test results

Executing the load test for appV6R0 and plotting it alongside the results for appV5R0 and the desired rates, we have the following results for Get Reports and Bulk Upsert.

Get Reports rates

The two versions exhibit very similar rate performance, with appV6R0 showing slight superiority in the second and third quarters, while appV5R0 is superior in the first and fourth quarters.

Figure 1. Graph showing the rates of appV5R0 and appV6R0 when executing the load test for Get Reports functionality. Both have similar performance, but without reaching the desired rates.
Get Reports latency

The two versions exhibit very similar latency performance, with appV6R0 showing slight advantages in the second and third quarters, while appV5R0 is superior in the first and fourth quarters.

Figure 2. Graph showing the latency of appV5R0 and appV6R0 when executing the load test for Get Reports functionality. appV5R0 has lower latency than appV6R0.

Bulk Upsert rates

Both versions have similar rate values, but it can be seen that appV6R0 has a small edge compared to appV5R0.

Figure 3. Graph showing the rates of appV5R0 and appV6R0 when executing the load test for Bulk Upsert functionality. appV6R0 has better rates than appV5R0, but without reaching the desired rates.

Bulk Upsert latency

Although both versions have similar latency values for the first quarter of the test, for the final three quarters, appV6R0 has a clear advantage over appV5R0.

Figure 4. Graph showing the latency of appV5R0 and appV6R0 when executing the load test for Bulk Upsert functionality. appV6R0 has lower latency than appV5R0.

Performance summary

Despite the significant reduction in document and storage size achieved by appV6R0, the performance improvement was not as substantial as expected. This suggests that the bottleneck in the application when bucketing data by month may not be related to disk throughput. Examining the collection stats table reveals that the index size for both versions is close to 3GB. This is near the 4GB of available memory on the machine running the database and exceeds the 1.5GB allocated by WiredTiger for cache. Therefore, it is likely that the limiting factor in this case is memory/cache rather than document size, which explains the lack of a significant performance improvement.

Issues and improvements

To address the limitations observed in appV6R0, we propose adopting the same line of improvements applied from appV5R0 to appV5R1. Specifically, we will bucket the events by quarter in appV6R1. This approach not only follows the established pattern of enhancements but also aligns with the need to optimize performance further. As highlighted in the Load Test Results, the current bottleneck lies in the size of the index relative to the available cache/memory. By increasing the bucketing interval from month to quarter, we can reduce the number of documents by approximately a factor of three. This reduction will, in turn, decrease the number of index entries by the same factor, leading to a smaller index size.

Application version 6 revision 1 (appV6R1): A dynamic quarter bucket document

As discussed in the previous Issues and Improvements section, the primary bottleneck in appV6R0 was the index size nearing the memory capacity of the machine running MongoDB. To mitigate this issue, we propose increasing the bucketing interval from a month to a quarter for appV6R1, following the approach used in appV5R1. This adjustment aims to reduce the number of documents and index entries by approximately a factor of three, thereby decreasing the overall index size. By adopting a quarter-based bucketing strategy, we align with the established pattern of enhancements applied in the appV5RX revisions while addressing the specific memory/cache constraints identified in appV6R0.

The implementation of appV6R1 retains most of the code from appV6R0, with the following key differences:

- The _id field will now be composed of key+year+quarter.
- The field names in the items document will encode both month and day, as this information is necessary for filtering date ranges in the Get Reports operation (see the sketch after this list).
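As a rough illustration of these two differences, the monthly helpers shown for appV6R0 could be adapted as follows. Only the helper names are taken from the application; the bodies are assumptions for the sake of the example:

// Quarter of the year, "01" to "04", derived from the month.
const getQuarter = (date) =>
  String(Math.floor(date.getUTCMonth() / 3) + 1).padStart(2, "0");

// Bucket identifier: key + year + quarter; the real buildId converts it to a binary value.
const buildId = (key, date) =>
  Buffer.from(`${key}${date.getUTCFullYear()}${getQuarter(date)}`);

// Month and day used as the dynamic field name inside `items`, e.g. "0605".
const getMMDD = (date) =>
  `${String(date.getUTCMonth() + 1).padStart(2, "0")}${String(date.getUTCDate()).padStart(2, "0")}`;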
The following example demonstrates how data for June 2022 (2022-06-XX), within the second quarter (Q2), is stored using the new schema:

const document = {
  _id: Buffer.from("...01202202"),
  items: {
    "0605": { a: 10, n: 3 },
    "0616": { p: 1, r: 1 },
    "0627": { a: 5, r: 1 },
    "0629": { p: 1 },
  },
};

Schema

The document schema is the same SchemaV6R0 type introduced in appV6R0, as only the meaning of the _id and of the items field names changes:

export type SchemaV6R0 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specification presented, we have the following updateOne operation for each event generated by this application version:

const MMDD = getMMDD(event.date); // Extract the month (MM) and day (DD) from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
    update: {
      $inc: {
        [`items.${MMDD}.a`]: event.approved,
        [`items.${MMDD}.n`]: event.noFunds,
        [`items.${MMDD}.p`]: event.pending,
        [`items.${MMDD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

This updateOne operation has a similar logic to the one in appV6R0, with the only differences being the filter and update criteria.

- filter: Targets the document where the _id field matches the concatenated value of key, year, and quarter. The buildId function converts the key+year+quarter into a binary format.
- update: Uses the $inc operator to increment the fields corresponding to the same MMDD as the event by the status values provided.

Get reports

To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria in the $match stage:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupSumTotals },
  { $project: { _id: 0 } },
];

This aggregation operation has a similar logic to the one in appV6R0, with the only difference being the implementation of the $addFields stage.

{ $addFields: buildTotalsField }: A similar implementation to the one in appV6R0. The difference lies in extracting the value of the year (YYYY) from the _id field and the month and day (MMDD) from the field name. The following JavaScript code is logically equivalent to the real aggregation pipeline code.

const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]

const totals = items_array.reduce(
  (accumulator, [MMDD, status]) => {
    let [MM, DD] = [MMDD.slice(0, 2), MMDD.slice(2, 4)];
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a || 0;
      accumulator.n += status.n || 0;
      accumulator.p += status.p || 0;
      accumulator.r += status.r || 0;
    }
    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R1, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier.
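These statistics can be gathered with the collStats command; the following is a minimal sketch, with an illustrative collection name and assuming a connected `db` handle from the Node.js driver:

// Collect the per-collection and per-event statistics used in the tables (sketch).
const stats = await db.command({ collStats: "appV6R1" }); // collection name is illustrative
const events = 500_000_000;

console.log({
  documents: stats.count,
  dataSize: stats.size, // uncompressed BSON size, the "Data Size" column
  avgDocumentSize: stats.avgObjSize, // the "Document Size" column
  storageSize: stats.storageSize, // size on disk after compression
  indexSize: stats.totalIndexSize,
  dataSizePerEvent: stats.size / events,
  indexSizePerEvent: stats.totalIndexSize / events,
});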
For comparison, the tables below also include statistics from previous comparable application versions:

| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R0 | 95,350,319 | 11.1GB | 125B | 3.33GB | 1 | 3.13GB |
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

| Collection | Data Size/Events | Index Size/Events | Total Size/Events |
|---|---|---|---|
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R0 | 23.8B | 6.7B | 30.5B |
| appV6R1 | 17.6B | 2.6B | 20.2B |

In the previous Initial Scenario Statistics analysis, we assumed that document size would scale linearly with the bucketing range. However, this assumption proved inaccurate. The average document size in appV6R1 is approximately twice as large as in appV6R0, even though it stores three times more data. That is already a win for this new implementation.

Since appV6R1 buckets data by quarter at the document level and by day within the items sub-document, a fair comparison would be with appV5R3, the best-performing version so far. From the tables above, we observe a significant improvement in Document Size, and consequently Data Size, when transitioning from appV5R3 to appV6R1. Specifically, there was a 31.4% reduction in Document Size. From an index size perspective, there was no meaningful change, as both versions bucket events by quarter.

Load test results

Executing the load test for appV6R1 and plotting it alongside the results for appV5R3 and the desired rates, we have the following results for Get Reports and Bulk Upsert.

Get Reports rates

For the first three quarters of the test, both versions have similar rate values, but, for the final quarter, appV6R1 has a notable edge over appV5R3.

Figure 5. Graph showing the rates of appV5R3 and appV6R1 when executing the load test for Get Reports functionality. Neither version reaches the desired rates.

Get Reports latency

Figure 6. Graph showing the latency of appV5R3 and appV6R1 when executing the load test for Get Reports functionality.

Bulk Upsert rates

Figure 7. Graph showing the rates of appV5R3 and appV6R1 when executing the load test for Bulk Upsert functionality.

Bulk Upsert latency

Figure 8. Graph showing the latency of appV5R3 and appV6R1 when executing the load test for Bulk Upsert functionality.
Issues and improvements

appV6R1 delivered the expected reductions in document, storage, and index size, but the application still does not reach all the desired rates. Following the same line of improvement applied from appV5R3 to appV5R4, the next revision applies the Computed Pattern at the document level by adding a pre-computed totals field to each bucket document. Maybe the extra bytes per document will not hurt write performance, and maybe the readily available totals will reduce the computation needed by Get Reports; appV6R2 puts these two "maybes" to the test.

Application version 6 revision 2 (appV6R2): A dynamic quarter bucket document with totals

As discussed in the previous Issues and Improvements section, appV6R2 builds on appV6R1 by adding a totals field to each document, trading extra storage for less computation in the Get Reports operation. The bucketing criteria and the dynamic items schema follow appV6R1; the main change is in the Bulk Upsert, which must also keep the document-level totals up to date, as sketched below.
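The full appV6R2 implementation is not reproduced here; the following is a minimal sketch of how its Bulk Upsert could maintain the totals field, reusing the buildId and getMMDD helpers from appV6R1 and assuming a totals sub-document with one counter per status:

const MMDD = getMMDD(event.date); // month + day, as in appV6R1

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
    update: {
      $inc: {
        // Per-day counters, exactly as in appV6R1.
        [`items.${MMDD}.a`]: event.approved,
        [`items.${MMDD}.n`]: event.noFunds,
        [`items.${MMDD}.p`]: event.pending,
        [`items.${MMDD}.r`]: event.rejected,
        // Document-level totals for the Computed Pattern (assumed field layout).
        "totals.a": event.approved,
        "totals.n": event.noFunds,
        "totals.p": event.pending,
        "totals.r": event.rejected,
      },
    },
    upsert: true,
  },
};

With the totals kept in sync on every write, the Get Reports aggregation can, for example, use them directly whenever an entire bucket falls inside the report date range instead of re-accumulating every item.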
Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R2, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
| appV6R2 | 33,429,207 | 9.11GB | 293B | 2.8GB | 1 | 1.26GB |

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

| Collection | Data Size/Events | Index Size/Events | Total Size/Events |
|---|---|---|---|
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R1 | 17.6B | 2.6B | 20.2B |
| appV6R2 | 19.6B | 2.7B | 22.3B |

As expected, we had an 11.2% increase in the Document Size by adding a totals field to each document of appV6R2. When comparing to appV5R3, we still have a reduction of 23.9% in the Document Size. Let's review the Load Test Results to see if the trade-off between storage and computation cost is worthwhile.

Load test results

Executing the load test for appV6R2 and plotting it alongside the results for appV6R1 and the desired rates, we have the following results for Get Reports and Bulk Upsert.

Get Reports rates

We can see that appV6R2 has better rates than appV6R1 throughout the test, but it's still not reaching the top rate of 250 reports per second.

Figure 9. Graph showing the rates of appV6R1 and appV6R2 when executing the load test for Get Reports functionality.
appV6R2 has better rates than appV6R1, but without reaching the desired rates.

Get Reports latency

As shown in the rates graph, appV6R2 consistently provides lower latency than appV6R1 throughout the test.

Figure 10. Graph showing the latency of appV6R1 and appV6R2 when executing the load test for Get Reports functionality. appV6R2 has lower latency than appV6R1.

Bulk Upsert rates

Both versions exhibit very similar rate values throughout the test, with appV6R2 performing slightly better than appV6R1 in the final 20 minutes, yet still failing to reach the desired rate.

Figure 11. Graph showing the rates of appV6R1 and appV6R2 when executing the load test for Bulk Upsert functionality. appV6R2 has better rates than appV6R1, almost reaching the desired rates.

Bulk Upsert latency

Although appV6R2 had better rate values than appV6R1, their latency performance is not conclusive, with appV6R2 being superior in the first and final quarters and appV6R1 in the second and third quarters.

Figure 12. Graph showing the latency of appV6R1 and appV6R2 when executing the load test for Bulk Upsert functionality. Both versions have similar latencies.

Performance summary

The two "maybes" from the previous Issues and Improvements lived up to their promise, and appV6R2 performed better than appV6R1. This is the redemption of the Computed Pattern applied at the document level. This revision is one of my favorites because it shows that the same optimization on very similar applications can lead to different results. In our case, the difference was caused by the application being heavily bottlenecked by disk throughput.

Issues and improvements

Let's tackle the last improvement at the application level. Those paying close attention to the application versions may have already questioned it: in every Get Reports section, we have "To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval." Do we really need to run five aggregation pipelines to generate the reports document? Isn't there a way to calculate everything in just one operation?

The answer is yes, there is. The reports documents are composed of the fields oneYear, threeYears, fiveYears, sevenYears, and tenYears, where each one was generated by its respective aggregation pipeline until now. Generating the reports this way is a waste of processing power because we are doing part of the calculation multiple times. For example, to calculate the status totals for tenYears, we also have to calculate the status totals for the other fields, as from a date range perspective, they are all contained in the tenYears date range. So, for our next application revision, we'll condense the five Get Reports aggregation pipelines into one, avoiding wasted processing power on repeated calculations.

Application version 6 revision 3 (appV6R3): Getting everything at once

As discussed in the previous Issues and Improvements section, in this revision, we'll improve the performance of our application by changing the Get Reports functionality to generate the reports document using only one aggregation pipeline instead of five. The rationale behind this improvement is that when we generate the tenYears totals, we have also calculated the other totals: oneYear, threeYears, fiveYears, and sevenYears.
As an example, when we request Get Reports with the key ...0001 and the date 2022-01-01, the totals will be calculated over the following date ranges:

- oneYear: from 2021-01-01 to 2022-01-01
- threeYears: from 2020-01-01 to 2022-01-01
- fiveYears: from 2018-01-01 to 2022-01-01
- sevenYears: from 2016-01-01 to 2022-01-01
- tenYears: from 2013-01-01 to 2022-01-01

As we can see from the list above, the date range for tenYears encompasses all the other date ranges.

Although we successfully implemented the Computed Pattern in the previous revision, appV6R2, achieving better results than appV6R1, we will not use it as a base for this revision. There were two reasons for that:

- Based on the results of our previous implementation of the Computed Pattern at the document level, from appV5R3 to appV5R4, I didn't expect it to get better results.
- Implementing Get Reports to retrieve the reports document through a single aggregation pipeline, utilizing the pre-computed field totals generated by the Computed Pattern, would require significant effort. By the time of the latest versions of this series, I just wanted to finish it.

So, this revision will be built based on appV6R1.

Schema

The document schema is the same SchemaV6R0 type used by the previous revisions:

export type SchemaV6R0 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specifications, the following bulk updateOne operation is used for each event generated by the application:

const YYYYMMDD = getYYYYMMDD(event.date); // Extract the year (YYYY), month (MM), and day (DD) from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
    update: {
      $inc: {
        [`items.${YYYYMMDD}.a`]: event.approved,
        [`items.${YYYYMMDD}.n`]: event.noFunds,
        [`items.${YYYYMMDD}.p`]: event.pending,
        [`items.${YYYYMMDD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

This updateOne has almost exactly the same logic as the one for appV6R1. The difference is that the names of the fields in the items document are now created based on year, month, and day (YYYYMMDD) instead of just month and day (MMDD). This change was made to reduce the complexity of the aggregation pipeline of the Get Reports operation.

Get reports

To fulfill the Get Reports operation, one aggregation pipeline is required:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupCountTotals },
  { $project: format },
];

This aggregation operation has a similar logic to the one in appV6R1, with the only difference being the implementation of the $addFields stage.

{ $addFields: buildTotalsField }

It follows a logic similar to the previous revisions, where we first convert the items document into an array using $objectToArray and then use a reduce operation to iterate over the array, accumulating the statuses. The differences lie in the initial value and in the logic of the reduce function. The initial value in this case is an object/document with one field for each of the report date ranges. Each of these fields is also an object/document, with its fields being the possible statuses set to zero, as this is the initial value. The logic checks the date range of the item and increments the totals accordingly:

- If the item isInOneYearDateRange(...), it is also in all the other date ranges: three, five, seven, and 10 years.
- If the item isInThreeYearsDateRange(...), it is also in all the other wider date ranges: five, seven, and 10 years.

The following JavaScript code is logically equivalent to the real aggregation pipeline code. Senior developers could make the argument that this implementation could be less verbose or more optimized. However, due to how MongoDB aggregation pipeline operators are specified, this is how it was implemented.

const itemsArray = Object.entries(items); // Convert the object to an array of [key, value]

const totals = itemsArray.reduce(
  (totals, [YYYYMMDD, status]) => {
    const YYYY = YYYYMMDD.slice(0, 4); // Get year
    const MM = YYYYMMDD.slice(4, 6); // Get month
    const DD = YYYYMMDD.slice(6, 8); // Get day
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);

    if (isInOneYearDateRange(statusDate)) {
      totals.oneYear = incrementTotals(totals.oneYear, status);
      totals.threeYears = incrementTotals(totals.threeYears, status);
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInThreeYearsDateRange(statusDate)) {
      totals.threeYears = incrementTotals(totals.threeYears, status);
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInFiveYearsDateRange(statusDate)) {
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInSevenYearsDateRange(statusDate)) {
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInTenYearsDateRange(statusDate)) {
      totals.tenYears = incrementTotals(totals.tenYears, status);
    }

    return totals;
  },
  {
    oneYear: { a: 0, n: 0, p: 0, r: 0 },
    threeYears: { a: 0, n: 0, p: 0, r: 0 },
    fiveYears: { a: 0, n: 0, p: 0, r: 0 },
    sevenYears: { a: 0, n: 0, p: 0, r: 0 },
    tenYears: { a: 0, n: 0, p: 0, r: 0 },
  },
);

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R3, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
| appV6R2 | 33,429,207 | 9.11GB | 293B | 2.8GB | 1 | 1.26GB |
| appV6R3 | 33,429,694 | 9.53GB | 307B | 2.56GB | 1 | 1.19GB |

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.
| Collection | Data Size/Events | Index Size/Events | Total Size/Events |
|---|---|---|---|
| appV6R1 | 17.6B | 2.6B | 20.2B |
| appV6R2 | 19.6B | 2.7B | 22.3B |
| appV6R3 | 20.5B | 2.6B | 23.1B |

Because we are adding the year (YYYY) information to the name of each items field, we got a 16.3% increase in data size compared to appV6R1 and a 4.8% increase compared to appV6R2. This increase may be compensated by the gains in the Get Reports function, as we saw when going from appV6R1 to appV6R2.

Load test results

Executing the load test for appV6R3 and plotting it alongside the results for appV6R2, we have the following results for Get Reports and Bulk Upsert.

Get Reports rate

We achieved a significant improvement by transitioning from appV6R2 to appV6R3. For the first time, the application successfully reached all the desired rates in a single phase.

Figure 13. Graph showing the rates of appV6R2 and appV6R3 when executing the load test for Get Reports functionality. appV6R3 has better rates than appV6R2, but without reaching the desired rates in every phase.

Get Reports latency

The latency saw significant improvements, with the peak value reduced by 71% in the first phase, 67% in the second phase, 47% in the third phase, and 30% in the fourth phase.

Figure 14. Graph showing the latency of appV6R2 and appV6R3 when executing the load test for Get Reports functionality. appV6R3 has lower latency than appV6R2.

Bulk Upsert rate

Unlike the previous version, the application was now able to reach all the desired rates.

Figure 15. Graph showing the rates of appV6R2 and appV6R3 when executing the load test for Bulk Upsert functionality. appV6R3 has better rates than appV6R2, and reaches the desired rates.

Bulk Upsert latency

Here, we have one of the most significant gains in this series: The latency has decreased from seconds to milliseconds. We went from a peak of 1.8 seconds to 250ms in the first phase, from 2.3 seconds to 400ms in the second phase, from 2 seconds to 600ms in the third phase, and from 2.2 seconds to 800ms in the fourth phase.

Figure 16. Graph showing the latency of appV6R2 and appV6R3 when executing the load test for Bulk Upsert functionality. appV6R3 has lower latency than appV6R2.

Issues and improvements

The main bottleneck in our MongoDB server is still the disk throughput. As mentioned in the previous Issues and Improvements, that was the last application-level improvement. How can we further optimize on our current hardware? If we take a closer look at the MongoDB documentation, we'll find out that, by default, it uses block compression with the snappy compression library for all collections. Before the data is written to disk, it'll be compressed using the snappy library to reduce its size and speed up the writing process. Would it be possible to use a different and more effective compression library to reduce the size of the data even further and, as a consequence, reduce the load on the server's disk? Yes, and in the following application revision, we will use the zstd compression library instead of the default snappy compression library.

Application version 6 revision 4 (appV6R4)

As discussed in the previous Issues and Improvements section, the performance gains of this version will be provided by changing the algorithm of the collection block compressor.
By default, MongoDB uses snappy, which we will change to zstd to achieve better compression at the expense of more CPU usage. All the schemas, functions, and code from this version are exactly the same as in appV6R3. To create a collection that uses the zstd compression algorithm, the following command can be used:

db.createCollection("<collection-name>", {
  storageEngine: { wiredTiger: { configString: "block_compressor=zstd" } },
});

Schema

The document schema is the same SchemaV6R0 type used by the previous revisions:

export type SchemaV6R0 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specifications, the following bulk updateOne operation is used for each event generated by the application:

const YYYYMMDD = getYYYYMMDD(event.date); // Extract the year (YYYY), month (MM), and day (DD) from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
    update: {
      $inc: {
        [`items.${YYYYMMDD}.a`]: event.approved,
        [`items.${YYYYMMDD}.n`]: event.noFunds,
        [`items.${YYYYMMDD}.p`]: event.pending,
        [`items.${YYYYMMDD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

This updateOne has exactly the same logic as the one for appV6R3.

Get reports

Based on the information presented in the introduction, we have the following aggregation pipeline to generate the reports document:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupCountTotals },
  { $project: format },
];

This pipeline has exactly the same logic as the one for appV6R3.

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R4, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV6R3 | 33,429,694 | 9.53GB | 307B | 2.56GB | 1 | 1.19GB |
| appV6R4 | 33,429,372 | 9.53GB | 307B | 1.47GB | 1 | 1.34GB |

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total storage size and index size by the 500 million events.

| Collection | Storage Size/Events | Index Size/Events | Total Storage Size/Events |
|---|---|---|---|
| appV6R3 | 5.5B | 2.6B | 8.1B |
| appV6R4 | 3.2B | 2.8B | 6.0B |

Since the application implementation of appV6R4 is the same as appV6R3, the values for Data Size and Document Size remain the same. The difference lies in Storage Size, which represents the Data Size after compression. Going from snappy to zstd decreased the Storage Size by a jaw-dropping 43%. Looking at the Event Statistics, there was a reduction of 26% in the storage required to register each event, going from 8.1 bytes to 6 bytes. These considerable reductions in size will probably translate to better performance in this version, as our main bottleneck is disk throughput.

Load test results

Executing the load test for appV6R4 and plotting it alongside the results for appV6R3, we have the following results for Get Reports and Bulk Upsert.
Get Reports rate

Although we didn't achieve all the desired rates, we saw a significant improvement from appV6R3 to appV6R4. This revision allowed us to reach the desired rates in the first, second, and third quarters.

Figure 17. Graph showing the rates of appV6R3 and appV6R4 when executing the load test for Get Reports functionality. appV6R4 has better rates than appV6R3, but without reaching the desired rates in the final quarter.

Get Reports latency

The latency also saw significant improvements, with the peak value reduced by 30% in the first phase, 57% in the second phase, 61% in the third phase, and 57% in the fourth phase.

Figure 18. Graph showing the latency of appV6R3 and appV6R4 when executing the load test for Get Reports functionality. appV6R4 has lower latency than appV6R3.

Bulk Upsert rate

As had happened in the previous version, the application was able to reach all the desired rates.

Figure 19. Graph showing the rates of appV6R3 and appV6R4 when executing the load test for Bulk Upsert functionality. Both versions reach the desired rates.

Bulk Upsert latency

Here, we also achieved considerable improvements, with the peak value being reduced by 48% in the first phase, 39% in the second phase, 43% in the third phase, and 47% in the fourth phase.

Figure 20. Graph showing the latency of appV6R3 and appV6R4 when executing the load test for Bulk Upsert functionality. appV6R4 has lower latency than appV6R3.

Issues and improvements

Although this is the final version of the series, there is still room for improvement. For those willing to try them by themselves, here are the ones that I was able to think of:

- Use the Computed Pattern in appV6R4.
- Optimize the aggregation pipeline logic for Get Reports in appV6R4.
- Change the zstd compression level from its default value of 6 to a higher value.

Conclusion

This final part of "The Cost of Not Knowing MongoDB" series has explored the ultimate evolution of MongoDB application optimization, demonstrating how revolutionary design patterns and infrastructure-level improvements can transcend traditional performance boundaries. The journey through appV6R0 to appV6R4 represents the culmination of sophisticated MongoDB development practices, achieving performance levels that seemed impossible with the baseline appV1 implementation.

Series transformation summary

From foundation to revolution: The complete series showcases a remarkable transformation across three distinct optimization phases.

- Part 1 (appV1-appV4): Document-level optimizations achieving 51% storage reduction through schema refinement, data type optimization, and strategic indexing.
- Part 2 (appV5R0-appV5R4): Advanced pattern implementation with the Bucket and Computed Patterns, delivering 89% index size reduction and first-time achievement of target rates.
- Part 3 (appV6R0-appV6R4): Revolutionary Dynamic Schema Pattern with infrastructure optimization, culminating in sub-second latencies and comprehensive target rate achievement.

Performance evolution: The progression reveals exponential improvements across all metrics.

- Get Reports latency: From 6.5 seconds (appV1) to 200-800ms (appV6R4), a 92% improvement.
- Bulk Upsert latency: From 62 seconds (appV1) to 250-800ms (appV6R4), a 99% improvement.
- Storage efficiency: From 128.1B per event (appV1) to 6.0B per event (appV6R4), a 95% reduction.
- Target rate achievement: From consistent failures to sustained success across all operational phases.
Architectural paradigm shifts

The Dynamic Schema Pattern revolution: appV6R0 through appV6R4 introduced the most sophisticated MongoDB design pattern explored in this series. The Dynamic Schema Pattern fundamentally redefined data organization by:

- Eliminating array overhead: Replacing MongoDB arrays with computed object structures to minimize storage and processing costs.
- Single-pipeline optimization: Consolidating five separate aggregation pipelines into one optimized operation, reducing computational overhead by 80%.
- Infrastructure-level optimization: Implementing zstd compression, achieving 43% additional storage reduction over default snappy compression.

Query optimization breakthroughs: The implementation of intelligent date range calculation within aggregation pipelines eliminated redundant operations while maintaining data accuracy. This approach demonstrates senior-level MongoDB development by leveraging advanced aggregation framework capabilities to achieve both performance and maintainability.

Critical technical insights

Performance bottleneck evolution: Throughout the series, we observed how the optimization focus shifted as bottlenecks were resolved.

- Initial phase: Index size and query inefficiency dominated performance.
- Intermediate phase: Document retrieval count became the limiting factor.
- Advanced phase: Aggregation pipeline complexity constrained throughput.
- Final phase: Disk I/O emerged as the ultimate hardware limitation.

Pattern application maturity: The series demonstrates the progression from junior to senior MongoDB development practices.

- Junior level: Schema design without understanding indexing implications (appV1)
- Intermediate level: Applying individual optimization techniques (appV2-appV4)
- Advanced level: Implementing established MongoDB patterns (appV5RX)
- Senior level: Creating custom patterns and infrastructure optimization (appV6RX)

Production implementation guidelines

When to apply each pattern: Based on the comprehensive analysis, the following guidelines emerge for production implementations.

- Document-level optimizations: Essential for all MongoDB applications, providing 40-60% improvement with minimal complexity
- Bucket Pattern: Optimal for time-series data with 10:1 or greater read-to-write ratios
- Computed Pattern: Most effective in read-heavy scenarios with predictable aggregation requirements
- Dynamic Schema Pattern: Reserved for high-performance applications where development complexity trade-offs are justified

Infrastructure considerations: The zstd compression implementation in appV6R4 demonstrates that infrastructure-level optimizations can provide substantial benefits (40%+ storage reduction) with minimal application changes. However, these optimizations require careful CPU utilization monitoring and may not be suitable for CPU-constrained environments.

The true cost of not knowing MongoDB

This series reveals that the "cost" extends far beyond mere performance degradation.

Quantifiable impacts:

- Resource utilization: Up to 20x more storage requirements for equivalent functionality
- Infrastructure costs: Potentially 10x higher hardware requirements due to inefficient patterns
- Developer productivity: Months of optimization work that could be avoided with proper initial design
- Scalability limitations: Fundamental architectural constraints that become exponentially expensive to resolve

Hidden complexities: More critically, the series demonstrates that MongoDB's apparent simplicity can mask sophisticated optimization requirements.
The transition from appV1 to appV6R4 required a deep understanding of:

- Aggregation framework internals and optimization strategies.
- Index behavior with different data types and query patterns.
- Storage engine compression algorithms and trade-offs.
- Memory management and cache utilization patterns.

Final recommendations

For development teams:

- Invest in MongoDB education: The performance differences documented in this series justify substantial training investments.
- Establish pattern libraries: Codify successful patterns like those demonstrated to prevent anti-pattern adoption.
- Implement performance testing: Regular load testing reveals optimization opportunities before they become production issues.
- Plan for iteration: Schema evolution is inevitable; design systems that accommodate architectural improvements.

For architectural decisions:

- Start with fundamentals: Proper indexing and schema design provide the foundation for all subsequent optimizations.
- Measure before optimizing: Each optimization phase in this series was guided by comprehensive performance measurement.
- Consider total cost of ownership: The development complexity of advanced patterns must be weighed against performance requirements.
- Plan infrastructure scaling: Understand that hardware limitations will eventually constrain software optimizations.

Closing reflection

The journey from appV1 to appV6R4 demonstrates that MongoDB mastery requires understanding not just the database itself, but the intricate relationships between schema design, query patterns, indexing strategies, aggregation frameworks, and infrastructure capabilities. The 99% performance improvements documented in this series are achievable, but they demand dedication to continuous learning and sophisticated engineering practices.

For organizations serious about MongoDB performance, this series provides both a roadmap for optimization and a compelling case for investing in advanced MongoDB expertise. The cost of not knowing MongoDB extends far beyond individual applications; it impacts entire technology strategies and competitive positioning in data-driven markets.

The patterns, techniques, and insights presented throughout this three-part series offer a comprehensive foundation for building high-performance MongoDB applications that can scale efficiently while maintaining operational excellence. Most importantly, they demonstrate that with proper knowledge and application, MongoDB can deliver extraordinary performance that justifies its position as a leading database technology for modern applications.

Learn more about MongoDB design patterns! Check out more posts from Artur Costa.

October 9, 2025
Developer Blog

The 10 Skills I Was Missing as a MongoDB User

When I first started using MongoDB, I didn't have a plan beyond "install it and hope for the best." I had read about how flexible it was, and it felt like all the developers swore by it, so I figured I'd give it a shot. I spun it up, built my first application, and got a feature working. But I felt like something was missing. It felt clunky. My queries were longer than I expected, and performance wasn't great; I had the sense that I was fighting with the database instead of working with it. After a few projects like that, I began to wonder if maybe MongoDB wasn't for me.

Looking back now, I can say the problem wasn't MongoDB, but was somewhere between the keyboard and the chair. It was me. I was carrying over habits from years of working with relational databases, expecting the same rules to apply. If MongoDB's Skill Badges had existed when I started, I think my learning curve would have been a lot shorter. I had to learn many lessons the hard way, but these new badges cover the skills I had to piece together slowly. Instead of pretending I nailed it from day one, here's the honest version of how I learned MongoDB, what tripped me up along the way, and how these Skill Badges would have helped.

Learning to model the MongoDB way

The first thing I got wrong was data modeling. I built my schema like I was still working in SQL: every entity in its own collection, always referencing instead of embedding, and absolutely no data duplication. It felt safe because it was familiar. Then I hit my first complex query. It required data from various collections, and suddenly, I found myself writing a series of queries and stitching them together in my code. It worked, but it was a messy process.

When I discovered embedding, it felt like I had found a cheat code. I could put related data together in one single document, query it in one shot, and get better performance. That's when I made my second mistake. I started embedding everything. At first, it seemed fine. However, my documents grew huge, updates became slower, and I was duplicating data in ways that created consistency issues. That's when I learned about patterns like Extended References, and more generally, how to choose between embedding and referencing based on access patterns and update frequency. Later, I ran into more specialized needs, such as pre-computing data, embedding a subset of a large dataset into a parent, and tackling schema versioning. Back then, I learned those patterns by trial and error. Now, they're covered in badges like Relational to Document Model, Schema Design Patterns, and Advanced Schema Patterns.

Fixing what I thought was "just a slow query"

Even after I got better at modeling, performance issues kept popping up. One collection in particular started slowing down as it grew, and I thought, "I know what to do! I'll just add some indexes." I added them everywhere I thought they might help. Nothing improved. It turns out indexes only help if they match your query patterns. The order of fields matters, and whether an index covers your query shapes affects performance. Most importantly, just because you can add an index doesn't mean that you should. The big shift for me was learning to read an explain() plan and see how MongoDB was actually executing my queries. Once I started matching my indexes to my queries, performance went from "ok" to "blazing fast." Around the same time, I stopped doing all my data transformation in application code.
Before, I'd pull in raw data and loop through it to filter, group, and calculate. It was slow, verbose, and easy to break. Learning the aggregation framework completely changed that. I could handle the filtering and grouping right in the database, which made my code cleaner and the queries faster. There was a lot of guesswork in how I created my indexes, but the new Indexing Design Fundamentals badge covers that now. And when it comes to querying and analyzing data, Fundamentals of Data Transformation is there to help you. Had I had those two skills when I first started, I would have saved a lot of the time I wasted on trial and error. Moving from "it works" to "it works reliably" Early on, my approach to monitoring was simple: wait for something to break, then figure out why. If performance went down, I'd poke around in logs. If a server stopped responding, I'd turn it off and on again, and hope for the best. It was stressful, and it meant I was always reacting instead of preventing problems. When I learned to use MongoDB's monitoring tools properly, that changed. I could track latency, replication lag, and memory usage. I set alerts for unusual query patterns. I started seeing small problems before they turned into outages. Performance troubleshooting became more methodical as well. Instead of guessing, I measured: breaking down queries, checking index use, and looking at server metrics side by side. The fixes were faster and more precise. Reliability was the last piece I got serious about. I used to think a working cluster was a reliable cluster. But reliability also means knowing what happens if a node fails, how quickly failover kicks in, and whether your recovery plan actually works in practice. You can now learn those things in the Monitoring Tooling, Performance Tools and Techniques, and Cluster Reliability skill badges. If you are looking at deploying and maintaining MongoDB clusters, these skills will teach you what you need to know to make your deployment more resilient. Getting curious about what's next Once my clusters were stable, I stopped firefighting, and my mindset changed. When you trust your data model, your indexes, your aggregations, and your operations, you get to relax. You can then spend that time on what's coming next instead of fixing what's already in production. For me, that means exploring features I wouldn't have touched earlier, like Atlas Search, gen AI, and Vector Search. Now that the fundamentals are solid, I can experiment without risking stability and bring in new capabilities when a project actually calls for them. What I'd tell my past self If I could go back to when I first installed MongoDB, I'd keep it simple: Focus on data modeling first. A good foundation will save you from most of the problems I ran into. Once you have that, learn indexing and aggregation pipelines. They will make your life much easier when querying. Start monitoring from day one. It will save you a lot of trouble in the long run. Take a moment to educate yourself. You can only learn so much from trial and error. MongoDB offers a myriad of resources and ways to upskill yourself. Once you have established that base, you can explore more advanced topics and uncover the full potential of MongoDB. Features like Vector Search, full-text search with Atlas Search, or advanced schema design patterns are much easier to adopt when you trust your data model and have confidence in your operational setup. MongoDB Skill Badges cover all of these areas and more.
They are short, practical, and focused on solving real problems you will face as a developer or DBA, and most of them can be taken over your lunch break. You can browse the full catalog at learn.mongodb.com/skills and pick the one that matches the challenge you are facing today. Keep going from there, and you might be surprised how much more you can get out of the database once you have the right skills in place.

October 2, 2025
Developer Blog

Top Considerations When Choosing a Hybrid Search Solution

Search has evolved. Today, natural language queries have largely replaced simple keyword searches when addressing our information needs. Instead of typing “Peru travel guide” into a search engine, we now ask a large language model (LLM) “Where should I visit in Peru in December during a 10-day trip? Create a travel guide.” Is keyword search no longer useful? While the rise of LLMs and vector search may suggest that traditional keyword search is becoming less prevalent, the future of search actually relies on effectively combining both methods. This is where hybrid search plays a crucial role, blending the precision of traditional text search with the powerful contextual understanding of vector search. Despite advances in vector technology, keyword search still has a lot to contribute and remains essential to meeting current user expectations. The rise of hybrid search By late 2022 and particularly throughout 2023, as vector search saw a surge in popularity (see image 1 below), it quickly became clear that vector embeddings alone were not enough. Even as embedding models continue to improve at retrieval tasks, full-text search will always remain useful for identifying tokens outside the training corpus of an embedding model. That is why users soon began to combine vector search with lexical search, exploring ways to leverage both precision and context-aware retrieval. This shift was driven in large part by the rise of generative AI use cases like retrieval-augmented generation (RAG), where high-quality retrieval is essential. Figure 1. Number of vector search vendors per year and type. As hybrid search matured beyond basic score combination, the main fusion techniques emerged - reciprocal rank fusion (RRF) and relative score fusion (RSF). They offer ways to combine results that do not rely on directly comparable score scales. RRF focuses on ranking position, rewarding documents that consistently appear near the top across different retrieval methods. RSF, on the other hand, works directly with raw scores from different sources of relevance, using normalization to minimize outliers and align modalities effectively at a more granular level than rank alone can provide. Both approaches quickly gained traction and have become standard techniques in the market. How did the market react? The industry realized the need to introduce hybrid search capabilities, which brought different challenges for different types of players. For lexical-first search platforms, the main challenge was to add vector search features and implement the bridging logic with their existing keyword search infrastructure. These vendors understood that the true value of hybrid search emerges when both modalities are independently strong, customizable, and tightly integrated. On the other hand, vector-first search platforms faced the challenge of adding lexical search. Implementing lexical search through traditional inverted indexes was often too costly due to storage differences, increased query complexity, and architectural overhead. Many adopted sparse vectors, which represent keyword importance in a way similar to traditional term-frequency methods used in lexical search. Sparse vectors were key for vector-first databases in enabling a fast integration of lexical capabilities without overhauling the core architecture. Hybrid search soon became table stakes and the industry focus shifted toward improving developer efficiency and simplifying integration. 
This led to a growing trend of vendors building native hybrid search functions directly into their platforms. By offering out-of-the-box support to combine and manage both search types, the delivery of powerful search experiences was accelerated. As hybrid search became the new baseline, more sophisticated re-ranking approaches emerged. Techniques like cross-encoders, learning-to-rank models, and dynamic scoring profiles began to play a larger role, providing systems with additional alternatives to capture nuanced user intent. These methods complement hybrid search by refining the result order based on deeper semantic understanding. What to choose? Lexical-first or vector-first solutions? Top considerations when choosing a hybrid search solution When choosing how to implement hybrid search, your existing infrastructure plays a major role in the decision. For users working within a vector-first database, leveraging their lexical capabilities without rethinking the architecture is often enough. However, if the lexical search requirements are advanced, commonly the optimal solution is served with a traditional lexical search solution coupled with vector search, like MongoDB. Traditional lexical - or lexical-first - search offers greater flexibility and customization for keyword search, and when combined with vectors, provides a more powerful and accurate hybrid search experience. Figure 2. Vector-first vs Lexical-first systems: Hybrid search evaluation. Indexing strategy is another factor to consider. When setting up hybrid search, users can either keep keyword and vector data in separate indexes or combine them into one. Separate indexes give more freedom to tweak each search type, scale them differently, and experiment with scoring. The compromise is higher complexity, with two pipelines to manage and the need to normalize scores. On the other hand, a combined index is easier to manage, avoids duplicate pipelines, and can be faster since both searches run in a single pass. However, it limits flexibility to what the search engine supports and ties the scaling of keyword and vector search together. The decision is mainly a trade-off between control and simplicity. Lexical-first solutions were built around inverted indexes for keyword retrieval, with vector search added later as a separate component. This often results in hybrid setups that use separate indexes. Vector-first platforms were designed for dense vector search from the start, with keyword search added as a supporting feature. These tend to use a single index for both approaches, making them simpler to manage but sometimes offering less mature keyword capabilities. Lastly, a key aspect to take into account is the implementation style. Solutions with hybrid search functions handle the combination of lexical and vector search natively, removing the need for developers to manually implement it. This reduces development complexity, minimizes potential errors, and ensures that result merging and ranking are optimized by default. Built-in function support streamlines the entire implementation, allowing teams to focus on building features rather than managing infrastructure. In general, lexical-first systems tend to offer stronger keyword capabilities and more flexibility in tuning each search type, while vector-first systems provide a simpler, more unified hybrid experience. The right choice depends on whether you prioritize control and mature lexical features or streamlined management with lower operational overhead. 
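Whichever architecture you choose, the fusion step itself is small. Here is a minimal JavaScript sketch of reciprocal rank fusion as described earlier; the k constant of 60 is a commonly used default, and the sample result lists are purely illustrative.

```javascript
// Reciprocal rank fusion: each result list contributes 1 / (k + rank) per
// document, so documents ranked highly by both lexical and vector search
// rise to the top of the fused list.
function reciprocalRankFusion(resultLists, k = 60) {
  const fused = new Map();
  for (const results of resultLists) {
    results.forEach((doc, index) => {
      const id = String(doc._id);
      const entry = fused.get(id) ?? { doc, score: 0 };
      entry.score += 1 / (k + index + 1); // ranks are 1-based
      fused.set(id, entry);
    });
  }
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

// Illustrative input: top results from a keyword query and a vector query.
const keywordResults = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];
const vectorResults = [{ _id: "b" }, { _id: "d" }, { _id: "a" }];
console.log(reciprocalRankFusion([keywordResults, vectorResults]).map((r) => r.doc._id));
// -> [ "b", "a", "d", "c" ]
```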
How does MongoDB do it? When vector search emerged, MongoDB added vector search indexes to the existing traditional lexical search indexes. With that, MongoDB evolved into a competitive vector database by providing developers with a unified architecture for building modern applications. The result is an enterprise-ready platform that integrates traditional lexical search indexes and vector search indexes into the core database. MongoDB recently released native hybrid search functions to MongoDB Atlas and as part of a public preview for use with MongoDB Community Edition and MongoDB Enterprise Server deployments. This feature is part of MongoDB’s integrated ecosystem, where developers get an out-of-the-box hybrid search experience to enhance the accuracy of application search and RAG use cases. As a result, instead of managing separate systems for different workloads, MongoDB users benefit from a single platform designed to support both operational and AI-driven use cases. As generative AI and modern applications advance, MongoDB gives organizations a flexible, AI-ready foundation that grows with them. Read our blog to learn more about MongoDB’s new Hybrid Search function. Visit the MongoDB AI Learning Hub to learn more about building AI applications with MongoDB.
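To see what the two legs of a hybrid query look like on MongoDB, here is a hedged sketch using the Atlas $search and $vectorSearch aggregation stages with the Node.js driver. The index names, field paths, query embedding, and the imported reciprocalRankFusion helper (from the earlier sketch) are assumptions; the native hybrid search functions described above can replace the manual fusion step.

```javascript
import { MongoClient } from "mongodb";
import { reciprocalRankFusion } from "./rrf.js"; // helper from the earlier sketch

const client = new MongoClient(process.env.MONGODB_URI);
const products = client.db("store").collection("products");

const queryText = "wireless noise-cancelling headphones";
const queryVector = [/* embedding of queryText from your embedding model */];

// Lexical leg: Atlas Search index on the description field.
const lexical = await products.aggregate([
  { $search: { index: "default", text: { query: queryText, path: "description" } } },
  { $limit: 20 },
  { $project: { _id: 1, name: 1 } },
]).toArray();

// Semantic leg: Atlas Vector Search index on the embedding field.
const semantic = await products.aggregate([
  { $vectorSearch: { index: "vector_index", path: "embedding", queryVector, numCandidates: 200, limit: 20 } },
  { $project: { _id: 1, name: 1 } },
]).toArray();

// Fuse the two ranked lists in application code.
const hybrid = reciprocalRankFusion([lexical, semantic]);
await client.close();
```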

September 30, 2025
Developer Blog

Endian Communication Systems and Information Exchange in Bytes

Imagine two people trying to exchange phone numbers. One starts from the country code and moves to the last digit, while the other begins at the last digit and works backwards. Both are technically right, but unless they agree on the direction, the number will never connect. Computers face a similar challenge when they talk to each other. Deep inside processors, memory chips, and network packets, data is broken into bytes. But not every system agrees on which byte should come first. Some start with the “big end” of the number, while others begin with the “little end.” This simple difference, known as endianness, quietly shapes how data is stored in memory, transmitted across networks, and interpreted by devices. Whether it’s an IoT sensor streaming temperature values, a server processing telecom call records, or a 5G base station handling billions of radio samples, the way bytes are ordered can determine whether the data makes perfect sense—or complete nonsense. What is endianness? An endian system defines the order in which bytes of a multi-byte number are arranged. Big-endian: The most significant byte (MSB) comes first, stored at the lowest address. Little-endian: The least significant byte (LSB) comes first, stored at the lowest address. For example, the number 0x12345678 would be arranged as: Big-endian → 12 34 56 78 Little-endian → 78 56 34 12 While this looks simple, the implications are huge. If one system sends data in little-endian while another expects big-endian, the values may be misread entirely. To avoid this, networking standards like IP, TCP, and UDP enforce big-endian (network byte order) as the universal convention. Industries where endianness shapes communication From the cell tower to the car dashboard, from IoT devices in our homes to high-speed trading systems, endianness is the silent agreement that keeps industries speaking the same digital language. Endianness may sound like a low-level detail, but it silently drives reliable communication across industries. In telecommunications and 5G, standards mandate big-endian formats so routers, servers, and base stations interpret control messages and packet headers consistently. IoT devices and embedded systems also depend on fixed byte order—sensors streaming temperature, pressure, or GPS data must follow a convention so cloud platforms decode values accurately. The automotive sector is another example: dozens of ECUs from different suppliers must agree on byte order to ensure that speed sensors, braking systems, and infotainment units share correct data. In finance and high-frequency trading, binary protocols demand strict endian rules—any mismatch could distort price feeds or disrupt trades. And in aerospace and defense, radar DSPs, avionics systems, and satellites require exact endian handling to process mission-critical data streams. Across all these domains, endian consistency acts as an invisible handshake, ensuring that machines with different architectures can still speak the same digital language. Use case architecture: From endian to analytics Figure 1. Architecture Diagram for the flow of data. The diagram above illustrates how low-level endian data from IoT devices can be transformed into high-value insights using a modern data pipeline. IoT devices (data sources): Multiple IoT devices (e.g., sensors measuring temperature, vibration, or pressure) generate raw binary data. To remain efficient and consistent, these devices often transmit data in a specific endian format (commonly big-endian). 
However, not all receiving systems use the same convention, which can lead to misinterpretation if left unhandled. Endian converter: The first processing step ensures that byte ordering is normalized. The endian converter translates raw payloads into a consistent format that downstream systems can understand. Without this step, a simple reading like 25.10°C could be misread as 52745°C—a critical error for industries like telecom or automotive. Apache Kafka (data transport layer): Once normalized, the data flows into Apache Kafka, a distributed streaming platform. Kafka ensures reliability, scalability, and low latency, allowing thousands of IoT devices to stream data simultaneously. It acts as a buffer and transport mechanism, ensuring smooth handoff between ingestion and storage. Atlas Stream Processing (real-time processing): Inside the MongoDB ecosystem, the Atlas Stream Processor consumes Kafka topics and enriches the data. Here, additional transformations, filtering, or business logic can be applied—such as tagging sensor IDs, flagging anomalies, or aggregating multiple streams into one coherent dataset. MongoDB Atlas (storage layer): Processed records are stored in MongoDB Atlas, which provides a flexible, document-oriented database model. This is especially valuable for IoT, where payloads may vary in structure depending on the device. MongoDB’s time-series collections ensure efficient handling of timestamped sensor readings at scale. Analytics & visualization : Finally, the clean, structured data becomes available for analytics tools like Tableau. Business users and engineers can visualize patterns, track equipment health, or perform predictive maintenance, turning low-level binary signals into actionable business intelligence. Endianness may seem like an obscure technicality buried deep inside processors and protocols, but in reality, it is the foundation of digital trust. Without a shared agreement on how bytes are ordered, the vast networks of IoT devices, telecom systems, cars, satellites, and financial platforms would quickly collapse into chaos. What makes this powerful is not just the correction of byte order, but what happens after. With pipelines that normalize, stream, and store data—like the one combining Endian conversion, Kafka, MongoDB Atlas, and Tableau—raw binary signals are elevated into business-ready insights. A vibration sensor’s byte sequence becomes an early-warning alert for machine failure; a packet header’s alignment ensures 5G base stations stay synchronized; a GPS reading, once correctly interpreted, guides a connected car safely on its route. In short, endianness is the invisible handshake between machines. When paired with modern data infrastructure, it transforms silent signals into meaningful stories—bridging the gap between the language of bytes and the language of decisions. To learn more, please check out the video of the prototype I have created. Boost your MongoDB skills by visiting the MongoDB Atlas Learning Hub .
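To make the byte-order problem concrete, here is a small Node.js sketch. It assumes the temperature is transmitted as hundredths of a degree in a 16-bit big-endian integer, which reproduces the 25.10°C versus 52745 misreading described above.

```javascript
// The same four bytes read with different endianness assumptions.
const word = Buffer.alloc(4);
word.writeUInt32BE(0x12345678, 0);
console.log(word);                              // <Buffer 12 34 56 78>
console.log(word.readUInt32BE(0).toString(16)); // "12345678"
console.log(word.readUInt32LE(0).toString(16)); // "78563412"

// Sensor reading: 25.10 °C sent as 2510 hundredths of a degree, big-endian.
const reading = Buffer.from([0x09, 0xce]);  // 0x09CE = 2510
console.log(reading.readUInt16BE(0) / 100); // 25.1   (correct interpretation)
console.log(reading.readUInt16LE(0));       // 52745  (misread as 0xCE09)
```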

September 25, 2025
Developer Blog

Build AI Agents Worth Keeping: The Canvas Framework

Why 95% of enterprise AI agent projects fail Development teams across enterprises are stuck in the same cycle: They start with "Let's try LangChain" before figuring out what agent to build. They explore CrewAI without defining the use case. They implement RAG before identifying what knowledge the agent actually needs. Months later, they have an impressive technical demo showcasing multi-agent orchestration and tool calling—but can't articulate ROI or explain how it solves actual business needs. According to McKinsey's latest research, while nearly eight in 10 companies report using generative AI, fewer than 10% of use cases deployed ever make it past the pilot stage . MIT researchers studying this challenge identified a " gen AI divide "—a gap between organizations successfully deploying AI and those stuck in perpetual pilots. In their sample of 52 organizations, researchers found patterns suggesting failure rates as high as 95% (pg.3). Whether the true failure rate is 50% or 95%, the pattern is clear: Organizations lack clear starting points, initiatives stall after pilot phases, and most custom enterprise tools fail to reach production. 6 critical failures killing your AI agent projects The gap between agentic AI's promise and its reality is stark. Understanding these failure patterns is the first step toward building systems that actually work. 1. The technology-first trap MIT's research found that while 60% of organizations evaluated enterprise AI tools, only 5% reached production (pg.6)—a clear sign that businesses struggle to move from exploration to execution. Teams rush to implement frameworks before defining business problems. While most organizations have moved beyond ad hoc approaches ( down from 19% to 6% , according to IBM), they've replaced chaos with structured complexity that still misses the mark. Meanwhile, one in four companies taking a true "AI-first" approach—starting with business problems rather than technical capabilities—report transformative results. The difference has less to do with technical sophistication and more about strategic clarity. 2. The capability reality gap Carnegie Mellon's TheAgentCompany benchmark exposed the uncomfortable truth: Even our best AI agents would make terrible employees . The best AI model (Claude 3.5 Sonnet) completes only 24% of office tasks , with 34.4% success when given partial credit . Agents struggle with basic obstacles, such as pop-up windows, which humans navigate instinctively. More concerning, when faced with challenges, some agents resort to deception , like renaming existing users instead of admitting they can't find the right person. These issues demonstrate fundamental reasoning gaps that make autonomous deployment dangerous in real business environments, rather than just technical limitations. 3. Leadership vacuum The disconnect is glaring: Fewer than 30% of companies report CEO sponsorship of the AI agenda despite 70% of executives saying agentic AI is important to their future . This leadership vacuum creates cascading failures—AI initiatives fragment into departmental experiments, lack authority to drive organizational change, and can't break through silos to access necessary resources. Contrast this with Moderna, where CEO buy-in drove the deployment of 750+ AI agents and radical restructuring of HR and IT departments. As with the early waves of Big Data, data science, then machine learning adoption, leadership buy-in is the deciding factor for the survival of generative AI initiatives. 4. 
Security and governance barriers Organizations are paralyzed by a governance paradox: 92% believe governance is essential, but only 44% have policies (SailPoint, 2025). The result is predictable—80% experienced AI acting outside intended boundaries, with top concerns including privileged data access (60%), unintended actions (58%), and sharing privileged data (57%). Without clear ethical guidelines, audit trails, and compliance frameworks, even successful pilots can't move to production. 5. Infrastructure chaos The infrastructure gap creates a domino effect of failures. While 82% of organizations already use AI agents, 49% cite data concerns as primary adoption barriers (IBM). Data remains fragmented across systems, making it impossible to provide agents with complete context. Teams end up managing multiple databases—one for operational data, another for vector data and workloads, a third for conversation memory—each with different APIs and scaling characteristics. This complexity kills momentum before agents can actually prove value. 6. The ROI mirage The optimism-reality gap is staggering. Nearly 80% of companies report no material earnings impact from gen AI (McKinsey), while 62% expect 100%+ ROI from deployment (PagerDuty). Companies measure activity (number of agents deployed) rather than outcomes (business value created). Without clear success metrics defined upfront, even successful implementations look like expensive experiments. The AI development paradigm shift: from data-first to product-first There's been a fundamental shift in how successful teams approach agentic AI development, and it mirrors what Shawn Wang (Swyx) observed in his influential " Rise of the AI Engineer " post about the broader generative AI space. The old way: data → model → product In the traditional paradigm practiced during the early years of machine learning, teams would spend months architecting datasets, labeling training data, and preparing for model pre-training. Only after training custom models from scratch could they finally incorporate these into product features. The trade-offs were severe: massive upfront investment, long development cycles, high computational costs, and brittle models with narrow capabilities. This sequential process created high barriers to entry—only organizations with substantial ML expertise and resources could deploy AI features. Figure 1. The Data → Model → Product Lifecycle. Traditional AI development required months of data preparation and model training before shipping products. The new way: product → data → model The emergence of foundation models changed everything. Figure 2. The Product → Data → Model Lifecycle. Foundation model APIs flipped the traditional cycle, enabling rapid experimentation before data and model optimization. Powerful LLMs became commoditized through providers like OpenAI and Anthropic. Now, teams could: Start with the product vision and customer need. Identify what data would enhance it (examples, knowledge bases, RAG content). Select the appropriate model that could process that data effectively. This enabled zero-shot and few-shot capabilities via simple API calls. Teams could build MVPs in days, define their data requirements based on actual use cases, then select and swap models based on performance needs. Developers now ship experiments quickly, gather insights to improve data (for RAG and evaluation), then fine-tune only when necessary. This democratized cutting-edge AI to all developers, not just those with specialized ML backgrounds. 
The agentic evolution: product → agent → data → model But for agentic systems, there's an even more important insight: Agent design sits between product and data. Figure 3. The Product → Agent → Data → Model Lifecycle. Agent design now sits between product and data, determining downstream requirements for knowledge, tools, and model selection. Now, teams follow this progression: Product: Define the user problem and success metrics. Agent: Design agent capabilities, workflows, and behaviors. Data: Determine what knowledge, examples, and context the agent needs. Model: Select external providers and optimize prompts for your data. With external model providers, the "model" phase is really about selection and integration rather than deployment. Teams choose which provider's models best handle their data and use case, then build the orchestration layer to manage API calls, handle failures, and optimize costs. The agent layer shapes everything downstream—determining what data is needed (knowledge bases, examples, feedback loops), what tools are required (search, calculation, code execution), and ultimately, which external models can execute the design effectively. This evolution means teams can start with a clear user problem, design an agent to solve it, identify necessary data, and then select appropriate models—rather than starting with data and hoping to find a use case. This is why the canvas framework follows this exact flow. The canvas framework: A systematic approach to building AI agents Rather than jumping straight into technical implementation, successful teams use structured planning frameworks. Think of them as "business model canvases for AI agents"—tools that help teams think through critical decisions in the right order. Two complementary frameworks directly address the common failure patterns: Figure 4. The Agentic AI Canvas Framework. A structured five-phase approach moving from business problem definition through POC, prototype, production canvas, and production agent deployment. Please see the "Resources" section at the end for links to the corresponding templates, hosted in the gen AI Showcase. Canvas #1 - The POC canvas for validating your agent idea The POC canvas implements the product → agent → data → model flow through eight focused squares designed for rapid validation: Figure 5. The Agent POC Canvas V1. Eight focused squares implementing the product → agent → data → model flow for rapid validation of AI agent concepts. Phase 1: Product validation—who needs this and why? Before building anything, you must validate that a real problem exists and that users actually want an AI agent solution. This phase prevents the common mistake of building impressive technology that nobody needs. If you can't clearly articulate who will use this and why they'll prefer it to current methods, stop here. Squares, purposes, and key questions: Product vision & user problem: Define the business problem and establish why an agent is the right solution. Core problem: What specific workflow frustrates users today? Target users: Who experiences this pain and how often? Success vision: What would success look like for users? Value hypothesis: Why would users prefer an agent to current solutions? User validation & interaction: Map how users will engage with the agent and identify adoption barriers. User journey: What's the complete interaction from start to finish?
Interface preference: How do users want to interact? Feedback mechanisms: How will you know it's working? Adoption barriers: What might prevent users from trying it? Phase 2: Agent design—what will it do and how? With a validated problem, design the agent's capabilities and behavior to solve that specific need. This phase defines the agent's boundaries, decision-making logic, and interaction style before any technical implementation. The agent design directly determines what data and models you'll need, making this the critical bridge between problem and solution. Squares, purposes, and key questions: Agent capabilities & workflow: Design what the agent must do to solve the identified problem. Core tasks: What specific actions must the agent perform? Decision logic: How should complex requests be broken down? Tool requirements: What capabilities does the agent need? Autonomy boundaries: What can it decide versus escalate? Agent interaction & memory: Establish communication style and context management. Conversation flow: How should the agent guide interactions? Personality and tone: What style fits the use case? Memory requirements: What context must persist? Error handling: How should confusion be managed? Phase 3: Data requirements—what knowledge does it need? Agents are only as good as their knowledge base, so identify exactly what information the agent needs to complete its tasks. This phase maps existing data sources and gaps before selecting models, ensuring you don't choose technology that can't handle your data reality. Understanding data requirements upfront prevents the costly mistake of selecting models that can't work with your actual information. Squares, purposes, and key questions: Knowledge requirements & sources: Identify essential information and where to find it. Essential knowledge: What information must the agent have to complete tasks? Data sources: Where does this knowledge currently exist? Update frequency: How often does this information change? Quality requirements: What accuracy level is needed? Data collection & enhancement strategy: Plan data gathering and continuous improvement. Collection strategy: How will initial data be gathered? Enhancement priority: What data has the biggest impact? Feedback loops: How will interactions improve the data? Integration method: How will data be ingested and updated? Phase 4: External model integration—which provider and how? Only after defining data needs should you select external model providers and build the integration layer. This phase tests whether available models can handle your specific data and use case while staying within budget. The focus is on prompt engineering and API orchestration rather than model deployment, reflecting how modern AI agents actually get built. Squares, purposes, and key questions: Provider selection & prompt engineering: Choose external models and optimize for your use case. Provider evaluation: Which models handle your requirements best? Prompt strategy: How should you structure requests for optimal results? Context management: How should you work within token limits? Cost validation: Is this economically viable at scale?
API integration & validation: Build orchestration and validate performance. Integration architecture: How do you connect to providers? Response processing: How do you handle outputs? Performance testing: Does it meet requirements? Production readiness: What needs hardening? Figure 6. The Agent POC Canvas V1 (Detailed). Expanded view with specific guidance for each of the eight squares covering product validation, agent design, data requirements, and external model integration. Unified data architecture: solving the infrastructure chaos Remember the infrastructure problem—teams managing three separate databases with different APIs and scaling characteristics? This is where a unified data platform becomes critical. Agents need three types of data storage: Application database: For business data, user profiles, and transaction history. Vector store: For semantic search, knowledge retrieval, and RAG. Memory store: For agent context, conversation history, and learned behaviors. Instead of juggling multiple systems, teams can use a unified platform like MongoDB Atlas that provides all three capabilities—flexible document storage for application data, native vector search for semantic retrieval, and rich querying for memory management—all in a single platform. This unified approach means teams can focus on prompt engineering and orchestration rather than model infrastructure, while maintaining the flexibility to evolve their data model as requirements become clearer. The data platform handles the complexity while you optimize how external models interact with your knowledge. For embeddings and search relevance, specialized models like Voyage AI can provide domain-specific understanding, particularly for technical documentation where general-purpose embeddings fall short. The combination of unified data architecture with specialized embedding models addresses the infrastructure chaos that kills projects. Canvas #2 - The production canvas for scaling your validated AI agent When a POC succeeds, the production canvas guides the transition from "it works" to "it works at scale" through 11 squares organized along the same product → agent → data → model flow, with additional operational concerns: Figure 7. The Productionize Agent Canvas V1. Eleven squares guiding the transition from validated POC to production-ready systems, addressing scale, architecture, operations, and governance. Phase 1: Product and scale planning Transform POC learnings into concrete business metrics and scale requirements for production deployment. This phase establishes the economic case for investment and defines what success looks like at scale. Without clear KPIs and growth projections, production systems become expensive experiments rather than business assets. Squares, purposes, and key questions: Business case & scale planning: Translate POC validation into production metrics. Proven value: What did the POC validate? Business KPIs: What metrics measure ongoing success? Scale requirements: How many users and interactions? Growth strategy: How will usage expand over time? Production requirements & constraints: Define performance standards and operational boundaries. Performance standards: Response time, availability, throughput?
Reliability requirements: Recovery time and failover? Budget constraints: Cost limits and optimization targets? Security needs: Compliance and data protection requirements? Phase 2: Agent architecture Design robust systems that handle complex workflows, multiple agents, and inevitable failures without disrupting users. This phase addresses the orchestration and fault tolerance that POCs ignore but production demands. The architecture decisions here determine whether your agent can scale from 10 users to 10,000 without breaking. Squares, purposes, and key questions: Robust agent architecture: Design for complex workflows and fault tolerance. Workflow orchestration: How do you manage multi-step processes? Multi-agent coordination: How do specialized agents collaborate? Fault tolerance: How do you handle failures gracefully? Update rollouts: How do you update without disruption? Production memory & context systems: Implement scalable context management. Memory architecture: Session, long-term, and organizational knowledge? Context persistence: Storage and retrieval strategies? Cross-session continuity: How do you maintain user context? Memory lifecycle management: Retention, archival, and cleanup? Phase 3: Data infrastructure Build the data foundation that unifies application data, vector storage, and agent memory in a manageable platform. This phase solves the "three database problem" that kills production deployments through complexity. A unified data architecture reduces operational overhead while enabling the sophisticated retrieval and context management that production agents require. Squares, purposes, and key questions: Data architecture & management: Build a unified platform for all data types. Platform architecture: Application, vector, and memory data? Data pipelines: Ingestion, processing, and updates? Quality assurance: Validation and freshness monitoring? Knowledge governance: Version control and approval workflows? Knowledge base & pipeline operations: Maintain and optimize knowledge systems. Update strategy: How does knowledge evolve? Embedding approach: Which models for which content? Retrieval optimization: Search relevance and reranking? Operational monitoring: Pipeline health and costs? Phase 4: Model operations Implement strategies for managing multiple model providers, fine-tuning, and cost optimization at production scale. This phase covers API management, performance monitoring, and the continuous improvement pipeline for model performance. The focus is on orchestrating external models efficiently rather than deploying your own, including when and how to fine-tune. Squares, purposes, and key questions: Model strategy & optimization: Manage providers and fine-tuning strategies. Provider selection: Which models for which tasks? Fine-tuning approach: When and how to customize? Routing logic: Base versus fine-tuned model decisions? Cost controls: Caching and intelligent routing? API management & monitoring: Handle external APIs and performance tracking. API configuration: Key management and failover? Performance tracking: Accuracy, latency, and costs? Fine-tuning pipeline: Data collection for improvement? Version control: A/B testing and rollback strategies?
Phase 5: Hardening and operations Add the security, compliance, user experience, and governance layers that transform a working system into an enterprise-grade solution. This phase addresses the non-functional requirements that POCs skip but enterprises demand. Without proper hardening, even the best agents remain stuck in pilot purgatory due to security or compliance concerns. Squares, purposes, and key questions: Security & compliance: Implement enterprise security and regulatory controls. Security implementation: Authentication, encryption, and access management? Access control: User and system access management? Compliance framework: Which regulations apply? Audit capabilities: Logging and retention requirements? User experience & adoption: Drive usage and gather feedback. Workflow integration: How do you fit existing processes? Adoption strategy: Rollout and engagement plans? Support systems: Documentation and help channels? Feedback integration: How does user input drive improvement? Continuous improvement & governance: Ensure long-term sustainability. Operational procedures: Maintenance and release cycles? Quality gates: Testing and deployment standards? Cost management: Budget monitoring and optimization? Continuity planning: Documentation and team training? Figure 8. The Productionize Agent Canvas V1 (Detailed). Expanded view with specific guidance for each of the eleven squares covering scale planning, architecture, data infrastructure, model operations, and hardening requirements. Next steps: start building AI agents that deliver ROI MIT's research found that 66% of executives want systems that learn from feedback, while 63% demand context retention (pg.14). The dividing line between AI and human preference is memory, adaptability, and learning capability. The canvas framework directly addresses the failure patterns plaguing most projects by forcing teams to answer critical questions in the right order—following the product → agent → data → model flow that successful teams have discovered. For your next agentic AI initiative: Start with the POC canvas to validate concepts quickly. Focus on user problems before technical solutions. Leverage AI tools to rapidly prototype after completing your canvas. Only scale what users actually want with the production canvas. Choose a unified data architecture to reduce complexity from day one. Remember: The goal isn't to build the most sophisticated agent possible—it's to build agents that solve real problems for real users in production environments. For hands-on guidance on memory management, check out our webinar on YouTube, which covers essential concepts and proven techniques for building memory-augmented agents. Head over to the MongoDB AI Learning Hub to learn how to build and deploy AI applications with MongoDB. Resources Download POC Canvas Template (PDF) Download Production Canvas Template (PDF) Download Combined POC + Production Canvas (Excel) - Get both canvases in a single Excel file, with example prompts and blank templates. Full reference list McKinsey & Company (2025). "Seizing the agentic AI advantage." https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage MIT NANDA (2025). "The GenAI Divide: State of AI in Business 2025." Report. Gartner (2025). "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027."
https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027 IBM (2025). "IBM Study: Businesses View AI Agents as Essential, Not Just Experimental." https://newsroom.ibm.com/2025-06-10-IBM-Study-Businesses-View-AI-Agents-as-Essential,-Not-Just-Experimental Carnegie Mellon University (2025). "TheAgentCompany: Benchmarking LLM Agents." https://www.cs.cmu.edu/news/2025/agent-company Swyx (2023). "The Rise of the AI Engineer." Latent Space. https://www.latent.space/p/ai-engineer SailPoint (2025). "SailPoint research highlights rapid AI agent adoption, driving urgent need for evolved security." https://www.sailpoint.com/press-releases/sailpoint-ai-agent-adoption-report SS&C Blue Prism (2025). "Generative AI Statistics 2025." https://www.blueprism.com/resources/blog/generative-ai-statistics-2025/ PagerDuty (2025). "State of Digital Operations Report." https://www.pagerduty.com/newsroom/2025-state-of-digital-operations-study/ Wall Street Journal (2024). "How Moderna Is Using AI to Reinvent Itself." https://www.wsj.com/articles/at-moderna-openais-gpts-are-changing-almost-everything-6ff4c4a5

September 23, 2025
Developer Blog

Modernizing Core Insurance Systems: Breaking the Batch Bottleneck

Modernizing your legacy database to Java + MongoDB Atlas doesn't have to mean sacrificing batch performance. By leveraging bulk operations, intelligent prefetching, and parallel execution, we built an optimization framework that not only bridges the performance gap but, in many cases, surpasses legacy systems. In workloads where jobs had been running 25–30x slower after migration, the framework brought execution times back on par and, in some cases, delivered 10–15x better performance. For global insurance platforms, this improved batch performance is an added technical benefit that can support newer functionality. The modernization dilemma For organizations modernizing their core platforms, which serve significant user workloads and revenue-generating applications, moving from a legacy RDBMS to a modern application stack with Java + MongoDB unlocks several benefits: Flexible document model: PL/SQL code tightly couples business logic with the database, making even small changes risky and time-consuming. MongoDB Atlas, with its flexible document model and application-driven logic, enables teams to evolve schemas and processes quickly, a huge advantage for industries like insurance, where regulations, products, and customer expectations change rapidly. Scalability and resilience: Legacy RDBMS platforms were never designed for today's scale of digital engagement. MongoDB's distributed architecture supports horizontal scale-out, ensuring that core insurance workloads can handle growing customer bases, high-volume claims, and peak-time spikes without major redesigns. Cloud-native by design: MongoDB is built to thrive in the cloud. Features like global clusters, built-in replication, and high availability reduce infrastructure complexity while enabling deployment flexibility across hybrid and multi-cloud environments. Modern developer ecosystem: Decouples database and business logic dependencies, accelerating feature delivery. Unified operational + analytical workloads: Modern insurance platforms demand more than transactional processing; they require real-time insights. MongoDB's ability to support both operational workloads and analytics on live data reduces the gap between claims processing and decision-making. However, alongside these advantages, one of the first hurdles teams encounter is batch job performance: the jobs that are meant to run daily, weekly, or monthly, like an ETL process. PL/SQL thrives on set-based operations within the database engine. But when the same workloads are reimplemented with a separate application layer and MongoDB, they can suddenly become unpredictable, slow, and even time out. In some cases, processes that ran smoothly for years started running 25–30x slower after a like-for-like migration. The majority of the issues can be factored into the following broad categories: High network round-trips between the application and the database. Inefficient per-record operations replacing set-based logic. Under-utilization of database bulk capabilities. Application-layer computation overhead when transforming large datasets. For teams migrating complex ETL-like processes, this wasn't just a technical nuisance—it became a blocker for modernization at scale. The breakthrough: A batch job optimization framework We designed an extensible, multi-purpose, resilient batch optimization framework purpose-built for high-volume, multi-collection operations in MongoDB.
The framework focuses on minimizing application-database friction while retaining the flexibility of Java services. Key principles include: Bulk operations at scale: Leveraging MongoDB's native ```bulkWrite``` (including multi-collection bulk transactions in MongoDB 8) to process thousands of operations in a single round trip. Intelligent prefetching: Reducing repeated lookups by pre-loading and caching reference data in memory-friendly structures. Parallel processing: Partitioning workloads across threads or event processors (e.g., Disruptor pattern) for CPU-bound and I/O-bound steps. Configurable batch sizes: Dynamically tuning batch chunk sizes to balance memory usage, network payload size, and commit frequency. Pluggable transformation modules: Modularized data transformation logic that can be reused across multiple processes. Technical architecture The framework adopts a layered and orchestrated approach to batch job processing, where each component has a distinct responsibility in the end-to-end workflow. The diagram illustrates the flow of a batch execution: Trigger (user / cron job): The batch process begins when a user action or a scheduled cron job triggers the Spring Boot controller. Spring Boot controller: The controller initiates the process by fetching the relevant records from the database. Once retrieved, it splits the records into batches for parallel execution. Database: Acts as the source of truth for input data and the destination for processed results. It supports both reads (to fetch records) and writes (to persist batch outcomes). Executor framework: This layer is responsible for parallelizing workloads. It distributes batched records, manages concurrency, and invokes ETL tasks efficiently. ETL process: The ETL (Extract, Transform, Load) logic is applied to each batch. Data is pre-fetched, transformed according to business rules, and then loaded back into the database. Completion & write-back: Once ETL operations are complete, the executor framework coordinates database write operations and signals the completion of the batch. Figure 1. The architecture for the layered approach. From bottleneck to advantage The results were striking. Batch jobs that previously timed out now complete predictably within defined SLAs, and workloads that had initially run 25–30x slower after migration were optimized to perform on par with legacy RDBMSs and in several cases even deliver 10–15x better performance. What was once a bottleneck became a competitive advantage, proving that batch processing on MongoDB can significantly outperform legacy PL/SQL when implemented with the right optimization framework. Caveats and tuning tips While the framework is adaptable, its performance depends on workload characteristics and infrastructure limits: Batch size tuning: Too large can cause memory pressure; too small increases round-trips. Transaction boundaries: MongoDB transactions have limits (document size, total operations), so plan batching accordingly. Thread pool sizing: Over-parallelization can overload the database or network. Index strategy: Even with bulk writes, poor indexing can cause slowdowns. Prefetch scope: Balance memory usage against lookup frequency. In short, it's not one size fits all. Every workload is different: the data you process, the rules you apply, and the scale you run at all shape how things perform.
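As a minimal illustration of two of the principles above, unordered bulk writes to cut round trips and a configurable batch size, here is a sketch using the MongoDB Node.js driver. The collection, field names, and transform step are placeholders, not the framework's actual Java implementation.

```javascript
import { MongoClient } from "mongodb";

const BATCH_SIZE = 1000; // tune against memory, payload size, and commit frequency

async function runBatch(records, transform) {
  const client = new MongoClient(process.env.MONGODB_URI);
  const snapshots = client.db("insurance").collection("policy_snapshots");
  try {
    for (let i = 0; i < records.length; i += BATCH_SIZE) {
      const chunk = records.slice(i, i + BATCH_SIZE);
      const ops = chunk.map((record) => ({
        updateOne: {
          filter: { _id: record.policyId },
          update: { $set: transform(record) },
          upsert: true,
        },
      }));
      // One round trip per chunk; ordered: false lets the server process the
      // batch without stopping at the first failing operation.
      await snapshots.bulkWrite(ops, { ordered: false });
    }
  } finally {
    await client.close();
  }
}
```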
What we’ve seen though is that with the right tuning, this framework can handle scale reliably and take batch processing from being a pain point to something that actually gives you an edge. If you’re exploring how to modernize your own workloads, this approach is a solid starting point. You can pick and choose the parts that make sense for your setup, and adapt as you go. Ready to modernize your applications? Visit the modernization page to learn about the MongoDB Application Platform.

September 18, 2025
Developer Blog

2 “Lightbulb Moments” for Better Data Modeling

We all know that feeling: the one where you have been wrestling with a complex problem for hours, maybe even days, and suddenly, it just clicks. It is a rush of clarity, a "lightbulb moment" that washes away the frustration and replaces it with pure, unadulterated excitement. It's the moment you realize a solution is not just possible, it is elegant. This blog post is dedicated to that feeling—the burst of insight when you discover a more intuitive, performant, and powerful way to work. We have spoken with hundreds of developers new to MongoDB to understand where they get stuck, and we have distilled their most common "Lightbulb Moments" to help you get there faster. We found that the key to getting the best performance from MongoDB is to adjust the way you think about data. Once developers recognize that the flexible document model gives them more control, not less, they become unstoppable. In this inaugural post of our new "Lightbulb Moments" blog series, we will walk you through two essential data modeling tips: schema validation and versioning, and the Single Collection Pattern. These concepts will help you structure your data for optimal performance in MongoDB and lead to your own "Lightbulb Moments," showing you how to build fast, efficient, and scalable applications. 1. Schema validation and versioning: Flexibility with control A common misconception about MongoDB is that its underlying document model is "schemaless." With MongoDB, your schema is flexible and dependent on the needs of your application. If your workload demands a more structured schema, you can create validation rules for your fields with schema validation. If your schema requires more flexibility to adapt to changes over time, you can apply the schema versioning pattern. db.createCollection("students", { validator: { $jsonSchema: { bsonType: "object", title: "Student Object Validation", required: [ "address", "major", "name", "year" ], properties: { name: { bsonType: "string", description: "'name' must be a string and is required" }, year: { bsonType: "int", minimum: 2017, maximum: 3017, description: "'year' must be an integer in [ 2017, 3017 ] and is required" }, gpa: { bsonType: [ "double" ], description: "'gpa' must be a double if the field exists" } } } } } ) MongoDB has offered schema validation since 2017, providing as much structure and enforcement as a traditional relational database. This feature allows developers to define validation rules for their collections using the industry-standard JSON Schema. Schema validation gives you the power to: Ensure every new document written to a collection conforms to a defined structure. Specify required fields and their data types, including for nested documents. Choose the level of strictness, from off to strict, and whether to issue a warning or an error when a document fails validation. The most powerful feature, however, is schema versioning. This pattern allows you to: Gradually evolve your data schema over time without downtime or the need for migration scripts. Support both older and newer document versions simultaneously by defining multiple schemas as valid within a single collection using the oneOf operator. db.contacts.insertOne( { _id: 2, schemaVersion: 2, name: "Cameron", contactInfo: { cell: "903-555-1122", work: "670-555-7878", instagram: "@camcam9090", linkedIn: "CameronSmith123" } } ) Performance problems can stem from poor schema design—not the database.
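Because the post mentions the oneOf operator, here is a sketch of what a versioned validator for the contacts collection might look like; the version 1 document shape (a flat phone field) is an assumption for illustration, not taken from the post.

```javascript
db.runCommand({
  collMod: "contacts",
  validator: {
    $jsonSchema: {
      oneOf: [
        {
          // Version 1: original flat shape (assumed for this example).
          required: ["schemaVersion", "name", "phone"],
          properties: { schemaVersion: { enum: [1] } },
        },
        {
          // Version 2: contact details grouped under contactInfo.
          required: ["schemaVersion", "name", "contactInfo"],
          properties: { schemaVersion: { enum: [2] } },
        },
      ],
    },
  },
});
```

Both document versions remain valid in the same collection, so older documents can be migrated gradually, or not at all, while new writes follow the latest shape.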
One example from a financial services company showed that proper schema design improved query speed from 40 seconds to 377 milliseconds. The ability to enforce a predictable structure while maintaining the flexibility of the document model gives developers the best of both worlds. For a deeper dive: See an example of schema optimization in MongoDB Design Reviews: how applying schema design best practices resulted in a 60x performance improvement by Staff Developer Advocate, Graeme Robinson. Learn how to use MongoDB Compass to analyze, export, and generate schema validation rules. 2. The Single Collection Pattern: Storing data together for faster queries Many developers new to MongoDB create a separate collection for each entity. If someone were building a new book review app, they might have collections for books , reviews , and users . While this seems logical at first, it can lead to slow queries that require expensive $lookup operations or multiple queries to gather all the data for a single view, which can slow down your overall app and increase your database costs. A more efficient approach is to use the Single Collection Pattern. This pattern helps model many-to-many relationships when embedding is not an option. It lets you store all related data in a single collection, avoiding the need for data duplication when the costs outweigh the benefits. This approach adheres to three defining characteristics: All related documents that are frequently accessed together are stored in the same collection. Relationships between the documents are stored as pointers or other structures within the document. An index is built on a field or array that maps the relationships between documents. Such an index supports retrieving all related documents in a single query without database join operations. When using this pattern, you can add a docType field (e.g., book, review, user) and use a relatedTo array to link all associated documents to a single ID. This approach offers significant advantages: Faster queries: A single query can retrieve a book, all of its reviews, and user data. No joins: This eliminates the need for expensive joins or multiple database trips. Improved performance: Queries are fast because all the necessary data lives in the same collection. For developers struggling with slow MongoDB queries, understanding this pattern is a crucial step towards better performance. For more details and examples, see: Building with Patterns: The Single Collection Pattern by Staff Senior Developer Advocate, Daniel Coupal. Take a free, one-hour MongoDB Skills course on Schema Design Optimization . Get started with MongoDB Atlas for free today. Start building your MongoDB skills through the MongoDB Atlas Learning Hub .
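To make the Single Collection Pattern concrete, here is a small mongosh sketch of the book review example; the collection and field names are illustrative rather than a prescribed schema.

```javascript
// Books, reviews, and users share one collection, linked by relatedTo.
db.library.insertMany([
  { _id: "book_1", docType: "book", title: "Designing Data Models", relatedTo: ["book_1"] },
  { _id: "review_42", docType: "review", rating: 5, text: "Great read", relatedTo: ["book_1", "user_7"] },
  { _id: "user_7", docType: "user", name: "Sam", relatedTo: ["user_7"] },
]);

// A multikey index on the relationship array supports retrieving a book
// and everything related to it in a single query, with no $lookup.
db.library.createIndex({ relatedTo: 1, docType: 1 });
db.library.find({ relatedTo: "book_1" }); // the book plus all of its reviews
```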

September 15, 2025
Developer Blog

Scalable Automation Starts Here: Meet Stagehand and MongoDB Atlas

While APIs deliver clean, structured data on a silver platter, the most valuable insights for AI applications often hide in the messy, unstructured corners of the web. AI's potential is immense, and many organizations face a significant challenge: their existing data infrastructure isn't ready for the scale needed by AI. The web is a vast ocean of data, and ~80% of it is unstructured. But what if you could reliably automate web interactions, extract complex data, and seamlessly integrate it into a database that offers a variety of query and search methods? This is where the powerful combination of Stagehand (by Browserbase) and MongoDB Atlas redefines what's possible for building AI applications.

Figure 1. Stagehand and MongoDB Atlas: Seamlessly automating web data collection and AI-ready integration.

Stagehand: An SDK for developers to write automations with natural language

The browser is a powerful tool for collecting data, but it's hard to control and scale. Traditional browser automation tools like Playwright, Puppeteer, and Selenium often force developers to write fragile code that breaks with even slight UI changes on a website. This makes maintaining scripts on live websites a significant pain point. Stagehand, however, is designed specifically for the AI era.

Stagehand allows you to automate browsers using a combination of natural language and code. It's built to be more reliable and adaptable than legacy frameworks, enhancing Playwright's determinism with large language models (LLMs) to account for page changes and volatility. This means you can write code once, and it adapts as the website changes. Key capabilities of Stagehand include:

Reading and interacting with page content: Stagehand reads page content by parsing the DOM using accessibility trees, interacts with it, and continues to work even when the page changes.
Natural language operations: You can use natural language to extract data or instruct the browser to take actions. For instance, page.extract("the price of the first cookie") or page.act("add the first cookie to cart").
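As a rough end-to-end sketch of that combination, the snippet below pairs the Stagehand SDK with the MongoDB Node.js driver. The original post is an excerpt, so the package name, constructor options, and the shape of the extract() result here are assumptions to check against the current Stagehand documentation.

import { Stagehand } from "@browserbasehq/stagehand"; // assumed package name
import { MongoClient } from "mongodb";

// Option values are assumptions; "LOCAL" runs a local browser instead.
const stagehand = new Stagehand({ env: "BROWSERBASE" });
await stagehand.init();
const page = stagehand.page;

await page.goto("https://example.com/cookies");

// Natural language operations, as quoted in the post above.
const price = await page.extract("the price of the first cookie");
await page.act("add the first cookie to cart");

// Persist the extracted web data in Atlas for later querying and search.
const client = new MongoClient(process.env.MONGODB_URI);
await client.db("webdata").collection("extractions").insertOne({
   source: "example.com",
   price,                // shape depends on what extract() returns
   scrapedAt: new Date()
});

await client.close();
await stagehand.close();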

September 12, 2025
Developer Blog

Why Multi-Agent Systems Need Memory Engineering

Most multi-agent AI systems fail not because agents can't communicate, but because they can't remember. Production deployments have shown agents tend to duplicate work, operate on inconsistent states , and burn through token budgets, re-explaining context to each other—problems that scale exponentially as you add more agents. A breakthrough came from context engineering for individual agents: " The right information at the right time " transforms agent effectiveness. But this principle breaks down catastrophically when multiple agents must coordinate without shared memory infrastructure. To understand multi-agent coordination, we must first establish our foundation: Agent memory (and memory management) is a computational exocortex for AI agents—a dynamic, systematic process that integrates an agent’s LLM memory (context window and parametric weights) with a persistent memory management system to encode, store, retrieve, and synthesize experiences. Within this system, information is stored as memory units (also called memory blocks)—the smallest discrete, actionable pieces of memory that pair content with rich metadata such as timestamps, strength/confidence, associative links, semantic context, and retrieval hints. Agent memory is the key term, but the actual discipline that is the focal point of this piece is memory engineering. Memory engineering is the missing architectural foundation for multi-agent systems. Just as databases transformed software from single-user programs to multi-user applications, shared persistent memory systems enable AI to evolve from single-agent tools to coordinated teams capable of tackling enterprise-scale problems. The path from individual intelligence to collective intelligence runs through memory. Figure 1. The relationship between memory engineering and context engineering. The memory crisis in multi-agent systems Enterprise AI agents face a fundamental architectural mismatch. Without the proper data-to-memory transformation pipeline (aggregation → encoding → storage → organization → retrieval), they're built on stateless large language models but must operate in stateful environments that demand continuity, learning, and coordination. This creates cascading failure modes that become exponentially worse when agents work together. Individual agent memory failures Every production agent battles four core memory problems . Context poisoning occurs when hallucinations contaminate future reasoning, creating a feedback loop of increasingly inaccurate responses. Context distraction happens when too much information overwhelms the agent's decision-making process, leading to suboptimal choices. Context confusion emerges when irrelevant information influences responses, while context clash creates inconsistencies when conflicting information exists within the same context window. Figure 2. How context degrades over time. Recent research from Chroma reveals an additional critical issue: context rot —the systematic degradation of LLM performance as input length increases, even on trivially simple tasks. Their evaluation of 18 leading models, including GPT-4.1, Claude 4, and Gemini 2.5, demonstrates that performance degrades non-uniformly across context lengths, with models showing decreased accuracy on tasks as basic as text replication when processing longer inputs. 
This phenomenon is particularly pronounced when needle-question similarity decreases—as needle-question similarity decreases, model performance degrades more significantly with increasing input length—and when distractors are present, creating cascading failures in multi-agent environments where context pollution spreads between agents. These problems are expensive. According to Manus AI's production data, agents solving complex tasks average 50 tool calls per task with 100:1 input-to-output token ratios , far exceeding simple chatbot interactions. With context tokens costing $0.30-$3.00 per million tokens across major LLM providers, inefficient memory management becomes prohibitively expensive at scale. Multi-agent coordination failures When multiple agents operate without proper memory coordination, individual failures become systemic problems. Research by Cemri et al. analyzing over 200 execution traces from popular multi-agent frameworks found failure rates ranging from 40% to over 80%, with 36.9% of failures attributed to inter-agent misalignment issues ( Cemri et al., "Why Do Multi-Agent LLM Systems Fail?" ). Figure 3. Challenges encountered in multi-agent systems, categorized by type. Work duplication occurs constantly—agents repeat tasks without knowing others have already completed them. An inconsistent state means different agents operate on different versions of reality, leading to conflicting decisions and recommendations. Communication overhead skyrockets as agents must constantly re-explain context and previous decisions to each other. Most critically, cascade failures spread one agent's context pollution to others through shared interactions. The coordination chaos this creates is exactly what Anthropic's team encountered in their early development: "Early agents made errors like spawning 50 subagents for simple queries, scouring the web endlessly for nonexistent sources, and distracting each other with excessive updates" (Anthropic, Hadfield et al., 2025). These coordination failures aren't just inefficient—they're architecturally inevitable without proper memory engineering. This is particularly evident in Deep Research Mode (one of three core application modes alongside Assistant and Workflow modes), where Anthropic found that multi-agent systems excel when properly architected with shared memory infrastructure. Memory as the foundation for multi-agent coordination Understanding multi-agent coordination requires understanding how individual agents manage memory. Like humans, every agent operates within a memory hierarchy—from immediate working memory to long-term stored knowledge to shared cultural understanding: Figure 4. Memory types specific to multi-agent systems. The context window challenge A context window represents an agent's active working memory—everything it can "see" and reason about simultaneously. This includes system prompts, tool schemas, memory units, recent conversations, files, and tool responses. Even large models with 128K token limits can be exceeded by complex agent tasks, creating performance bottlenecks and cost explosions. Context engineering emerged as the solution for individual agents: managing what information enters the context window and how it's organized. The goal is getting "the right information at the right time" to maximize agent effectiveness while minimizing costs. Figure 5. Conversations aren’t free—the context window can become a junkyard of prompts, outputs, tool calls, and metadata, failed attempts, and irrelevant information. 
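Before going further, it helps to ground the memory unit concept defined earlier. The hedged sketch below shows how one such unit might be stored as a MongoDB document; the collection name, field names, and values are illustrative assumptions rather than a standard schema.

db.memory_units.insertOne({
   agentId: "research-agent-3",
   sessionId: "proj-atlas-q3",
   type: "episodic",                  // e.g., episodic, semantic, procedural
   content: "Customer confirmed the outage began after the 14:05 deploy.",
   embedding: [ /* vector from an embedding model, used for retrieval */ ],
   strength: 0.92,                    // confidence / decay score
   links: [ "memory:incident-4812", "agent:support-agent-1" ],  // associative links
   retrievalHints: [ "outage", "deploy", "incident-4812" ],
   createdAt: new Date(),
   lastAccessedAt: new Date()
});

The point of the structure is that the content travels with the metadata (timestamps, strength, links, retrieval hints) that later retrieval and decay logic will need.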
This represents the natural evolution from prompt engineering to context engineering , and now to memory engineering and memory management —the operational practice of orchestrating, optimizing, and governing an agent's memory architecture. While single-agent memory management remains an active area of research, techniques like retrieval-augmented generation (RAG) for semantic knowledge access, hierarchical summarization for compressing conversation history, dynamic context pruning for removing irrelevant information, and external memory systems like MemGPT have dramatically improved individual agent reliability and effectiveness over the past few years. The multi-agent memory challenge Research insight: Studies show that memory management in multi-agent systems must handle "complex context data and sophisticated interaction and history information," requiring advanced memory design. These systems need both individual agent memory capabilities and sophisticated mechanisms for "sharing, integrating, and managing information across the different agents" ( LLM Multi-Agent Systems: Challenges and Open Problems, Section 4 ). Most memory engineering techniques developed to date focus on optimizing individual agents, not multi-agent systems. But coordination requires fundamentally new memory structures and patterns that single-agent systems never needed. Multi-agent systems demand innovations like consensus memory (a specialized form of procedural memory) for verified team procedures, persona libraries (extensions of semantic memory's persona memory) for role-based coordination, and whiteboard methods (implementations of shared memory configured for short-term collaboration)—structures that emerge only when agents work together. Figure 6. How persona, consensus, and whiteboard memory work together. More importantly, multi-agent systems create opportunities to invest in collective memory that improves both current performance and future agent capabilities. Shared external memory enables several critical capabilities: Persistent state across agent interactions ensures continuity when agents hand off tasks or collaborate on long-running projects. Atomic operations provide consistent updates when multiple agents need to modify a shared state simultaneously. Conflict resolution handles situations where agents attempt contradictory updates to the same information. Performance optimization through caching and indexing reduces redundant operations across agent teams. Context windows become shared resources requiring careful management. Core memory alignment ensures agents share essential state and objectives. Selective context sharing propagates relevant information between agents without overwhelming their individual context windows. Memory block coordination provides synchronized access to shared memory blocks. Cost optimization maximizes KV-cache hit rates across agent interactions, reducing the exponential cost growth that kills multi-agent deployments. The economic impact of smart context management in multi-agent systems isn’t trivial, as Anthropic found in their multi-agent deep research deployment: " In our data, agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats " (Anthropic, Hadfield et al., 2025). Yet when coordination and memory work together, the results are remarkable. 
Anthropic's research system provides compelling evidence: " We found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on our internal research eval " (Anthropic, Hadfield et al., 2025). This dramatic improvement shows the multiplicative potential of well-coordinated agent teams. The 5 pillars of multi-agent memory engineering Successful multi-agent memory engineering requires architectural foundations that extend beyond single-agent patterns. These five pillars provide the complete framework for scalable multi-agent systems. Figure 7. Five pillars of engineering memory for multi-agent systems. 1. Persistence architecture (storage and state management) Multi-agent systems need sophisticated storage patterns that enable coordinated state management across agent teams. Memory units structured as YAML or JSON documents in systems like MongoDB provide the foundation for complex multi-agent state management. These memory units—structured containers with metadata and relationships—can be configured as either short-term or long-term shared memory depending on use case requirements. Shared Todo.md patterns extend the proven individual agent pattern of constantly updated objectives to team-level coordination. A shared objective tracking system ensures all agents work toward aligned goals while maintaining visibility into team progress. Cross-agent episodic memory captures interaction history and decision patterns between agents. This enables agents to learn from past coordination successes and failures, improving future collaboration effectiveness. Procedural memory evolution stores workflows and coordination protocols that improve over time. As agent teams encounter new scenarios, they can update shared procedures, creating institutional memory that benefits the entire system. 2. Retrieval intelligence (selection and querying) Retrieving the right information at the right time becomes exponentially more complex with multiple agents accessing shared memory concurrently. Embedding-based retrieval uses vector similarity to find relevant cross-agent memory, but must account for agent-specific contexts and capabilities. A customer service agent and a technical support agent need different information about the same customer issue. Agent-aware querying tailors memory selection based on individual agent capabilities and roles. The system understands which agents can act on specific types of information and prioritizes accordingly. Temporal coordination manages time-sensitive information sharing. When one agent discovers urgent information, the memory system must propagate this to relevant agents quickly while avoiding information overload for agents working on unrelated tasks. Resource orchestration coordinates access across multiple knowledge bases, APIs, and external systems. Rather than each agent independently querying resources, the memory system can optimize queries and cache results for team benefit. 3. Performance optimization (compression and caching) Optimization becomes critical when context costs multiply across agent teams. Hierarchical summarization compresses inter-agent communication efficiently. Rather than storing complete conversation transcripts between agents, the system can create layered summaries that preserve essential information while reducing storage and retrieval costs. Selective preservation maintains restoration paths for complex coordination scenarios. 
Even when compressing information, the system preserves references to original sources, enabling agents to access full context when needed. Intelligent eviction implements memory lifecycle management through forgetting (gradual strength degradation) rather than deleting, removing redundant information while preserving the coordination state. The system reduces memory strength attributes of outdated memory units while maintaining their structure for potential reactivation. Cross-agent cache optimization implements shared KV-cache strategies that benefit the entire agent team. When one agent processes information, the results can be cached for other agents with similar contexts, dramatically reducing costs. 4. Coordination boundaries (isolation and access control) Effective boundaries prevent context pollution while enabling necessary coordination. Agent specialization creates domain-specific memory isolation. A financial analysis agent and a marketing agent can share high-level project information while maintaining separate specialized knowledge bases. Memory management agents handle cross-team memory operations as a dedicated responsibility. Rather than every agent managing memory independently, specialized agents can optimize memory operations for the entire team. Workflow orchestration coordinates context across specialized agent teams. The system understands how information flows between different agent roles and can manage context propagation accordingly. Session boundaries isolate memory by project, user, or task domain. This prevents information leakage between unrelated workstreams while enabling rich context within specific collaboration boundaries. 5. Conflict resolution (handling simultaneous updates) Multi-agent systems must gracefully handle situations where agents attempt contradictory or simultaneous updates to shared memory. Atomic operations ensure that critical memory operations to update memory units happen entirely or not at all. When multiple agents need to perform memory operations on shared memory units simultaneously, atomic operations prevent partial updates that could leave the system in an inconsistent state. Version control patterns track changes to shared memory over time, enabling agents to understand how information evolved and resolve conflicts based on temporal precedence or agent authority levels. Consensus mechanisms handle situations where agents have conflicting information about the same topic. The system must determine which information is authoritative and how to propagate corrections to agents operating on outdated knowledge. Priority-based resolution resolves conflicts based on agent roles, information recency, or confidence levels. A specialized technical agent's assessment might override a general-purpose agent's conclusion about a technical issue. Rollback and recovery enable the system to revert problematic changes when conflicts create an inconsistent state. If a memory update causes downstream coordination failures, the system can roll back to a known-good state and retry with better conflict resolution. Figure 8. Analogies for memory engineering. Measuring multi-agent memory success Successful multi-agent memory engineering requires understanding what good coordination looks like and how it relates to the five architectural pillars. Rather than focusing on individual agent performance, we must measure emergent behaviors that only appear when agents work together effectively. 
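Before turning to measurement, here is a hedged sketch of the atomic, version-checked update described under the conflict-resolution pillar, so that two agents cannot silently overwrite each other's change to a shared memory unit. Field names are assumptions, and the null-return behavior assumes a recent MongoDB Node.js driver.

// Optimistic concurrency: the update only applies if the version the agent
// read is still current; otherwise the caller re-reads, merges, and retries.
async function updateSharedMemory(db, { memoryId, expectedVersion, content, agentId }) {
   const updated = await db.collection("memory_units").findOneAndUpdate(
      { _id: memoryId, version: expectedVersion },
      {
         $set: { content, lastWriterId: agentId, updatedAt: new Date() },
         $inc: { version: 1 }
      },
      { returnDocument: "after" }
   );
   if (updated === null) {
      // Another agent won the race: reload the unit, then merge or defer to
      // priority rules (e.g., a specialist agent outranks a generalist) and retry.
      return { ok: false };
   }
   return { ok: true, doc: updated };
}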
What success looks like Seamless coordination makes multi-agent systems reliable (through consistent access to accurate historical context), believable (through trustworthy inter-agent interactions), and capable (through leveraging accumulated collective knowledge)—the RBC framework that defines successful agent memory implementation. Collective intelligence emerges when agent teams consistently outperform individual agents on complex tasks. The shared memory system enables capabilities that no single agent could achieve alone. Cost-effective scaling occurs when adding agents to a team reduces per-task costs rather than multiplying them. Effective memory sharing prevents the exponential cost growth that kills most multi-agent deployments. Resilient operations maintain continuity when individual agents fail or new agents join the team. The shared memory system preserves institutional knowledge and enables smooth transitions. Adaptive learning allows agent teams to improve over time through accumulated experience. Shared memory becomes an investment that benefits current and future agent deployments. Figure 9. High-level signals for measuring memory engineering quality. Persistence architecture success shows up as a consistent state across all agents and zero data loss during coordination. You measure this through state synchronization rates and the absence of coordination conflicts caused by inconsistent storage. Retrieval intelligence success manifests as agents quickly finding exactly the information they need without information overload. Effective retrieval means agents spend time acting rather than searching. Performance optimization success appears as sub-linear cost scaling as you add agents and tasks. Well-optimized systems show decreasing per-agent costs through effective caching and compression. Coordination boundaries success prevents context pollution while enabling necessary information sharing. You see this in specialized agents maintaining their expertise while contributing to team objectives. Conflict resolution success handles simultaneous updates gracefully and maintains system consistency even when agents discover contradictory information. The system should rarely require manual intervention to resolve conflicts. The ultimate measure The true test of multi-agent memory engineering is whether agent teams tackle problems that individual agents cannot solve, at costs that make deployment viable. Success means transitioning from "agents helping humans" to "agent teams solving problems independently"—the difference between tools and teammates. Organizations implementing sophisticated memory engineering will achieve 3× decision speed improvement and 30% operational cost reduction by 2029 (Gartner prediction), demonstrating the strategic value of proper memory architecture. The path forward Memory engineering represents the missing infrastructure layer for production multi-agent systems. Just as relational databases enabled the transition from single-user desktop applications to multi-user web applications, shared memory systems enable the transition from single-agent tools to multi-agent intelligence. The companies succeeding with AI agents today have figured out memory architecture, not just prompt engineering. They understand that agent coordination requires the same foundational infrastructure thinking that built the modern web: persistent state, atomic operations, conflict resolution, and performance optimization. 
Research validates this approach: Enterprises implementing proper memory engineering achieve 18% ROI above cost-of-capital thresholds (IBM Institute for Business Value) , while systems without memory architecture continue to struggle with the coordination failures that plague 40-80% of multi-agent deployments. Key takeaways Multi-agent systems fail because of memory problems, not communication problems . The real issue isn't that agents can't talk to each other—it's that they can't remember and coordinate state effectively. Memory engineering is the key differentiator for production multi-agent systems . Companies succeeding with AI agents have figured out memory architecture, not just prompt engineering. Individual agent memory patterns extend naturally to multi-agent coordination . The same principles (write, select, compress, isolate) that solve single-agent context problems solve multi-agent coordination problems. Shared external memory is essential infrastructure . Just like databases enabled web applications to scale, shared memory systems enable AI agents to coordinate at enterprise scale. To explore memory engineering further, start experimenting with memory architectures using MongoDB Atlas or review our detailed tutorials available at AI Learning Hub .
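As a final illustration of the "forgetting rather than deleting" idea from the performance-optimization pillar, here is a hedged sketch of a periodic decay pass; the thresholds, field names, and collection name are assumptions.

// Run periodically, e.g., from a scheduled job.
async function decayMemories(db) {
   // Gradually weaken units that have not been accessed recently.
   await db.collection("memory_units").updateMany(
      { lastAccessedAt: { $lt: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) } },
      { $mul: { strength: 0.9 } }
   );
   // Mark very weak units inactive instead of deleting them, so they can be
   // reactivated if an agent later needs that context again.
   await db.collection("memory_units").updateMany(
      { strength: { $lt: 0.2 }, active: { $ne: false } },
      { $set: { active: false } }
   );
}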

September 11, 2025
Developer Blog

Building a Scalable Document Processing Pipeline With LlamaParse, Confluent Cloud, and MongoDB

As organisations generate increasing volumes of data, extracting meaningful insights from unstructured documents has become a significant challenge. This blog presents an advanced architecture that leverages cloud storage, streaming technology, machine learning, and a database to deliver a robust and efficient document processing pipeline.

Introduction: Modern document processing challenges

Organisations today are drowning in documents. PDFs, reports, contracts, and other text-heavy files contain valuable information, but extracting that knowledge efficiently presents significant challenges. Traditional document processing approaches often suffer from several limitations:

Scalability issues when processing large volumes of documents.
Limited intelligence in parsing complex document structures.
Batch processing delays that hinder real-time insights.
Difficulty integrating processed data into downstream applications.

Modern businesses require solutions capable of processing documents at scale, extracting their semantic meaning, and making this information immediately available for applications like search, recommendation systems, or business intelligence. Our architecture meets these challenges through a streaming-based approach that leverages cutting-edge technologies.

Architecture overview: From raw files to structured data

Figure 1. Reference architecture.

Our solution features a sophisticated real-time data processing pipeline that combines cloud storage, stream processing, machine learning, and a persistent database to create a comprehensive system for document enrichment and analysis. At its core, the architecture follows a streaming data pattern where information flows continuously from source to destination, being transformed and enriched along the way. Let's walk through the key components:

AWS S3 bucket: Serves as the primary data lake, storing raw PDF documents.
Python ingestion script: Reads files from S3 and coordinates document processing.
LlamaParse: Provides intelligent document parsing and chunking.
Confluent: Acts as the central nervous system with two main topics:
  "raw": Contains parsed document chunks.
  "summary_embedding": Stores processed chunks with embeddings.
Apache Flink: Processes streaming data and generates embeddings using ML.
Confluent Schema Registry: Handles data contracts, ensuring consistent data formats.
MongoDB: Stores the final processed documents with their embeddings.

This architecture excels in scenarios requiring real-time document processing with machine learning enrichment, such as semantic search applications, content classification, or knowledge management systems.

Data ingestion: Efficient document chunking with LlamaParse

The journey begins with PDF documents stored in an AWS S3 bucket. The ingestion layer, built with Python, handles the following tasks:

Document retrieval: The Python script connects to AWS S3 using configured credentials to access stored PDF documents.
Intelligent parsing with LlamaParse: The system fundamentally transforms how PDFs are processed. Instead of a simple extraction that treats these complex documents as mere sequences of text, it harnesses the power of LlamaParse. This sophisticated document parsing tool goes beyond simple character recognition, offering an intelligent understanding of document structure and layout. LlamaParse meticulously identifies and interprets critical formatting elements such as:
  Tables: Accurately distinguishing rows, columns, and cell content, preserving tabular data integrity.
  Images: Identifying images in the text, including additional context based on where the image sits in the overall layout.
  Headers: Recognising hierarchical headings and subheadings, which is crucial for organising a document effectively.
  Other formatting elements: Including lists, bolded text, italics, and various layout components, ensuring that the semantic meaning and visual presentation are preserved during parsing.

By leveraging LlamaParse, the system ensures that we don't lose context across the document, employing a parsing strategy that can make use of classic OCR as well as LLMs and LVMs. The following Python code demonstrates how to initialise your parser and extract relevant information.

# Initialize LlamaParse with your API key
parser = LlamaParse(
    api_key=os.getenv("LLAMA_PARSE_API_KEY"),
    result_type="json"  # Get JSON output
)

# Parse the PDF with file name in extra_info
parsed_doc = parser.parse(file_bytes, extra_info={"file_name": file_name})

Document chunking: LlamaParse breaks down documents into manageable chunks, typically at the page level, while preserving the context and metadata of each chunk. This chunking approach provides several benefits:

Better processing efficiency for large documents.
More precise context for embedding generation.
Improved search granularity in the final application.

The processed chunks are then ready for the next stage of the pipeline. The Python script handles any parsing errors gracefully, ensuring the pipeline remains robust even when encountering malformed documents.

Streaming infrastructure: Leveraging Confluent Cloud

Confluent Cloud, a fully managed Apache Kafka service, serves as the backbone of our architecture. This streaming platform offers several advantages:

Decoupled components: Kafka separates data producers (document parsers) from consumers (processing engines), allowing each to operate at its own pace. Thanks to this decoupling, LlamaParse, Flink, and MongoDB can each process and ingest data at their own throughput.
Scalability: The platform handles high throughput with configurable partitioning (6 partitions per topic in our implementation).
Data resilience: Kafka's replication ensures no document chunks are lost during processing.
Schema management: Confluent Schema Registry provides strong guarantees for schema evolution (forward and backward compatibility).

Our implementation uses two main Kafka topics:

raw: Contains the parsed document chunks from LlamaParse.
summary_embedding: Stores the processed chunks with their generated embeddings.

The Avro schema for the embedding messages ensures consistency:

{
  "type": "record",
  "name": "summary_embedding_value",
  "namespace": "org.apache.flink.avro.generated.record",
  "fields": [
    { "name": "content", "type": ["null", "string"], "default": null },
    { "name": "embeddings", "type": ["null", {"type": "array", "items": ["null", "float"]}], "default": null }
  ]
}

This schema defines the structure for each message, containing the original text content and its corresponding vector embeddings. Once document chunks are flowing through Apache Kafka, the real magic happens in the processing layer. Apache Flink, a powerful stream processing framework, consumes data from the raw topic and applies transformations to generate embeddings. Flink continuously processes the stream of document chunks from Kafka and produces the enriched summary_embedding stream back to Kafka. Embeddings are numerical vector representations that capture the semantic meaning of text.
They enable powerful capabilities like:

Semantic search (finding documents by meaning, not just keywords).
Document clustering and classification.
Similarity detection between documents.
Foundation for sophisticated AI applications.

Our implementation uses AWS Bedrock for embedding generation through Flink SQL:

-- Create the embedding model
CREATE MODEL AWSBedrockEmbedding
INPUT (text STRING)
OUTPUT (embeddings ARRAY<FLOAT>)
WITH (
  'bedrock.connection' = 'bedrock-connection',
  'task' = 'embedding',
  'provider' = 'BEDROCK'
);

-- Create the destination table
CREATE TABLE summary_embedding (
  content STRING,
  embeddings ARRAY<FLOAT>
);

-- Insert transformed data
INSERT INTO summary_embedding
SELECT CAST(val as STRING), embeddings
FROM raw,
LATERAL TABLE (ML_PREDICT('AWSBedrockEmbedding', CAST(val as STRING)));

This SQL defines how Flink should:

Connect to AWS Bedrock for ML capabilities.
Define the destination structure for embeddings.
Transform incoming text by generating embeddings through the ML_PREDICT function.

The result is a continuous stream of document chunks paired with their semantic vector representations.

Data consumption: Avro deserialization for schema evolution

On the consumption side, a dedicated consumer application reads from the summary_embedding topic. This application handles several important tasks:

Message consumption: Efficiently reads messages from Kafka with proper offset management.
Avro deserialization: Converts the binary Avro format back to usable objects using the Schema Registry.
Error handling and retries: Manages potential failures in consumption or processing.

A sketch of such a consumer appears at the end of this article. Avro deserialization is particularly important for maintaining compatibility as the pipeline evolves. The Schema Registry ensures that even if the schema changes over time (adding new fields, for example), the consumer can still correctly interpret messages produced with older schemas. The consumer application is implemented with multi-threading to maximise throughput, allowing parallel processing of messages from different partitions.

Storage strategy: MongoDB for flexible document storage

The final destination for our processed document chunks is MongoDB, a document-oriented database well-suited for storing complex, nested data structures, including vector embeddings. MongoDB offers several advantages for this architecture:

Flexible schema: Accommodates varying document structures and metadata.
Vector storage: Efficiently stores and indexes high-dimensional embedding vectors.
Query capabilities: Supports semantic search through vector similarity queries.
Scalability: Handles large document collections through sharding.
Integration options: Easily connects with downstream applications and visualisation tools.

The consumer application inserts each processed document chunk into MongoDB, preserving both the original text content and the generated embeddings. This makes the data immediately available for applications that need to search or analyse the document collection.

How MongoDB differentiates from other vector databases

MongoDB stands out as a versatile choice for vector storage, especially when compared to specialized vector databases. Here's why:

Native integration: MongoDB's core strength lies in its ability to store and manage both structured and unstructured data, including vector embeddings, within a single platform.
Unlike standalone vector databases that often require separate data synchronization and management, MongoDB Atlas Vector Search allows you to store your original data and its corresponding embeddings together in the same document. This eliminates data silos and simplifies your architecture. Flexible data model: MongoDB's document model provides unparalleled flexibility. You can store your raw text, metadata, and vector embeddings in a single JSON-like document. This means you don't need to normalize your data across multiple tables or systems, making schema evolution easier and reducing development complexity. Comprehensive query capabilities: Beyond simple vector similarity searches, MongoDB allows you to combine vector search with other powerful query operators, such as filtering by metadata, geospatial queries, or full-text search. This enables more nuanced and precise retrieval of information, which is crucial for advanced AI applications. Operational maturity and ecosystem: MongoDB is a mature, battle-tested database with a robust ecosystem of tools, drivers, and integrations. It offers enterprise-grade features like scalability, high availability, security, and a rich set of developer tools. Specialized vector databases, while effective for their niche, may lack the broader operational capabilities and community support of a general-purpose database like MongoDB. Cost-effectiveness and simplification: By consolidating your data storage and vector search capabilities into a single database, you can reduce operational overhead and cost. You avoid the need to manage and scale separate database systems, simplifying your infrastructure and streamlining your development workflow. In essence, while dedicated vector databases excel at one specific task, MongoDB provides a holistic solution that handles your entire data lifecycle, from ingestion and storage to advanced querying and analytics, all within a single, integrated platform. By implementing this architecture, organisations can transform their document processing capabilities from static, batch-oriented systems to dynamic, real-time pipelines that extract meaningful insights from unstructured content. The combination of cloud storage, streaming processing, machine learning, and flexible storage creates a powerful foundation for document-centric applications that drive business value. Get started today by exploring the complete implementation in the GitHub repo . New to MongoDB? Deploy a free instance of MongoDB Atlas to see how it can effortlessly power your AI-driven applications. Not yet a Confluent customer? Start your free trial of Confluent Cloud today and receive $400 to spend during your first 30 days! Sign up to LlamaCloud to get started with LlamaParse, LlamaExtract and more! Want to build your own custom agentic workflow? Check out LlamaIndex .
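For readers who want to see the consumer side in code, here is a hedged sketch in Node.js (the reference implementation described above is Python-based); it assumes kafkajs, @kafkajs/confluent-schema-registry, and the MongoDB Node.js driver, with placeholder connection details.

import { Kafka } from "kafkajs";
import { SchemaRegistry } from "@kafkajs/confluent-schema-registry";
import { MongoClient } from "mongodb";

const kafka = new Kafka({ clientId: "embedding-sink", brokers: [process.env.KAFKA_BROKER] });
const registry = new SchemaRegistry({ host: process.env.SCHEMA_REGISTRY_URL });
const mongo = new MongoClient(process.env.MONGODB_URI);

const consumer = kafka.consumer({ groupId: "embedding-sink-group" });
await consumer.connect();
await mongo.connect();
await consumer.subscribe({ topics: ["summary_embedding"], fromBeginning: true });

await consumer.run({
   eachMessage: async ({ message }) => {
      // The Schema Registry resolves the writer's schema, so both older and
      // newer message versions can be decoded as the schema evolves.
      const { content, embeddings } = await registry.decode(message.value);
      await mongo.db("docs").collection("chunks").insertOne({
         content,
         embeddings,
         ingestedAt: new Date()
      });
   }
});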

September 10, 2025
Developer Blog

Real-Time Materialized Views With MongoDB Atlas Stream Processing

For developers coming from a relational database background, the concept of a "join" is a fundamental part of the data model. But in a document database like MongoDB, this approach is often an anti-pattern. While MongoDB offers the $lookup aggregation stage, relying on it for every read operation can lead to performance bottlenecks and create fragile architectures. From relational thinking to document thinking In relational databases, query optimization often revolves around minimizing or optimizing joins. In a normalized relational schema, data is split into multiple related tables, and the optimizer’s job is to figure out how to pull that data together efficiently at query time. Because joins are built into the relational model and cost is reduced through indexing strategies, database architects spend a lot of effort making joins fast. In MongoDB—and most NoSQL document databases—the philosophy is different: it is perfectly acceptable, and often recommended, to duplicate data when it serves an application's query patterns. Instead of joining related entities on every read, MongoDB encourages data models where data that is accessed together is stored together in the same document, pre-aggregated or denormalized to match access needs. Avoiding excessive joins (or $lookup stages) isn’t a limitation—it’s a deliberate design choice that trades some extra storage for dramatically reduced query latency, simpler query logic, and predictable performance at scale. This is why patterns like materializing “query‑optimized” collections, whether batch‑computed or stream‑updated, are so powerful in the MongoDB world: instead of computing the relationship at read time, you store the relationship in exactly the form your application needs. In the past, a common solution for generating data summaries was to precompute and store them (often called “materializing” the data) through a scheduled batch job. This pre-aggregated data was useful for reports but suffered from data staleness, making it unsuitable for real-time applications. So, how do you handle complex data relationships and serve real-time insights without a heavy reliance on expensive joins or stale, batch-processed data? The answer lies in a modern approach that leverages event-driven architecture and a powerful pattern: continuously updated, query‑optimized collections (essentially materializing enriched data in real time) with MongoDB Atlas Stream Processing . The core problem: Why we avoid joins in MongoDB MongoDB is built on the principle that "data that is accessed together should be stored together." This means that denormalization is not only acceptable but is often the recommended practice for achieving high performance. When data for a single logical entity is scattered across multiple documents, developers often face two key problems: 1. The "fragmented data" problem As applications evolve, their data models can become more complex. What starts as a simple, one-document-per-entity model can turn into a web of interconnected data. To get a complete picture of an entity, your application must perform multiple queries or a $lookup to join these pieces of information. If you need to update this complete entity, you're faced with the complexity of a multi-document transaction to ensure data consistency. This overhead adds latency to both read and write operations, slowing down your application and complicating your code. The diagram below illustrates this problematic, join-heavy architecture, which we consider an anti-pattern in MongoDB. 
Figure 1. Data fragmentation across multiple collections. 2. The "microservice" problem In a microservice architecture, each service often owns its own database, oftentimes running in its own MongoDB cluster, promoting autonomy and decoupling. But this creates a new challenge for data sharing. One service might need data owned by another, and the classic join approach doesn't work. While federated databases might seem like a solution for joining data across different clusters, they are typically designed for ad-hoc queries and analytics rather than high-performance, real-time application queries. MongoDB's $lookup stage has a crucial constraint: it cannot work across different databases, which is a common scenario in this architecture. This forces developers into inefficient synchronous API calls or complex, manual data synchronization pipelines. The solution: Enabling CQRS with event processing The answer to these problems—the reliance on joins, fragmented data, and microservice coupling—lies in an event-driven architecture paired with the power of materialized views. This architectural pattern is a well-established concept known as Command Query Responsibility Segregation (CQRS). At its core, CQRS separates a system's "Command" side (the write model) from its "Query" side (the read model). Commands are actions that change the state of the system, such as CreateOrder or UpdateProduct . Queries are requests that retrieve data, like GetOrderDetails . By embracing this pattern, your core application can handle commands with a transactional, write-focused data model. For all your read-heavy use cases, you can build separate, highly optimized read models. This is where event processing becomes the crucial link. When a command is executed, your write model publishes an event to a stream (e.g., OrderCreated ). This event is an immutable record of what happened. MongoDB Atlas's native change streams are the perfect tool for generating this event stream directly from your primary data collection. A separate process—our stream processor—continuously listens to this event stream. It consumes the OrderCreated event, along with any other related events such as changes in Product, Customers and Payments, and uses this information to build a denormalized, query-optimized collection — effectively materializing enriched data for fast queries. This view becomes the query side of your CQRS pattern. As you can see in the diagram below, this modern CQRS architecture directly solves the problems of fragmentation and joins by separating the read and write concerns entirely. Figure 2. Data access with the CQRS pattern. For a microservice architecture, here is what the CQRS pattern would look like: Figure 3. CQRS pattern applied across multiple services. The tangible benefits: Why this approach matters Embracing this event-driven, real-time data materialization pattern is not just about elegant architecture—it provides real, measurable benefits for your applications and your bottom line. Blazing-fast queries: Since the data is already pre-joined and shaped to your query requirements, there is no need for expensive, real-time $lookup or aggregation operations. Queries against this continuously updated, query‑optimized collection become simple, fast reads, resulting in significantly lower latency and a more responsive user experience. 
Reduced resource consumption: By offloading the computational work of joins and aggregations to the continuous stream processor, you dramatically reduce the workload on your primary operational database. The highly efficient queries against the materialized view consume far less CPU and RAM. More economical deployments: The reduced resource consumption translates directly into more economical deployments. Since your primary database is no longer burdened with complex read queries, you can often run on smaller, less expensive instances. You are trading more costly CPU and RAM for cheaper storage, which is a highly favorable economic trade-off in modern cloud environments. Improved performance predictability: With a consistent and low-resource query model, you eliminate the performance spikes and variability that often come with complex, on-demand queries. This leads to a more stable and predictable application load, making it easier to manage and scale your services. How MongoDB Atlas Stream Processing can help Building a custom stream processing pipeline from scratch can be complex and expensive, requiring you to manage multiple systems like Apache Flink or Kafka Streams. MongoDB Atlas Stream Processing is a fully managed service that brings real-time stream processing directly into the MongoDB ecosystem. It takes on the responsibility of the "Query Side" of your CQRS architecture, allowing you to implement this powerful pattern without the operational overhead. With MongoDB Atlas Stream Processing, you can use a continuous aggregation pipeline to: Ingest events from a MongoDB Change Stream or an external source like Apache Kafka. Transform and enrich the data on the fly. Continuously materialize the data into a target collection using the $merge stage . Crucially, $merge can either create a new collection or continuously update an existing one. It intelligently inserts new documents including starting with an initial sync , replaces or updates existing ones, and even deletes documents from the target collection when a corresponding deletion event is detected in the source stream. The diagram below provides a detailed look at the stream processing pipeline, showing how data flows through various stages to produce a final, query-optimized target collection. Figure 4. Atlas stream processing pipeline. This gives you an elegant way to implement a CQRS pattern, ensuring your read and write models are decoupled and optimized for their specific tasks, all within the integrated MongoDB Atlas platform. It allows you to get the best of both worlds: a clean, normalized model for your writes and a high-performance, denormalized model for all your reads, without the manual complexity of a separate ETL process. It’s a modern way to apply the core principles of MongoDB data modeling at scale, even across disparate databases. By moving away from join-heavy query patterns and embracing real-time, query‑optimized collections with MongoDB Atlas Stream Processing, you can dramatically cut query latency, reduce resource usage, and build applications that deliver consistent, lightning‑fast experiences at scale. Whether you’re modernizing a monolith, scaling microservices, or rethinking your data model for performance, Atlas Stream Processing makes it easy to adopt a CQRS architecture without the overhead of managing complex stream processing infrastructure yourself. Start building your continuously updated, high-performance read models today with MongoDB Atlas Stream Processing .
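As a hedged sketch of what such a continuously updated read model can look like in Atlas Stream Processing: the processor below consumes change events from a write-side orders collection and upserts a simplified summary into a query-optimized collection with $merge. Connection names, namespaces, and the projected fields are assumptions; check the Atlas Stream Processing documentation for the exact source and merge options.

sp.createStreamProcessor("materializeOrderSummaries", [
   // Source: the change stream of the write-side orders collection.
   { $source: {
       connectionName: "ordersCluster",
       db: "shop",
       coll: "orders",
       config: { fullDocument: "updateLookup" }   // include full documents on updates
   } },

   // Shape each event into the read model the application actually queries.
   { $replaceRoot: { newRoot: "$fullDocument" } },
   { $project: { _id: 1, customerId: 1, status: 1, total: 1 } },

   // Continuously upsert into the query-optimized collection.
   { $merge: {
       into: { connectionName: "readModelCluster", db: "shop", coll: "order_summaries" },
       on: "_id",
       whenMatched: "replace",
       whenNotMatched: "insert"
   } }
]);

sp.materializeOrderSummaries.start();

Reads then go straight to order_summaries: a simple find on an already-shaped document, with no $lookup at query time.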

September 9, 2025
Developer Blog

Multi-Agentic Ticket-Based Complaint Resolution System

In the AI-first era of customer experiences, financial institutions face increasing pressure to deliver faster, smarter, and more cost-effective service, all while maintaining transparency and trust. This blog introduces a multi-agentic architecture, co-developed with MongoDB and Confluent Data Streaming Platform , designed to automate ticket-based complaint resolution. It leverages the AI-native capabilities of both platforms within a real-time event streaming environment. With continuously active agents orchestrated across the entire system, financial institutions can automatically resolve common customer issues in seconds, significantly improving resolution times and boosting customer satisfaction. Specifically, this blog demonstrates the automation of customer complaint handling for a retail bank using a sequence of specialized AI agents. Customers interact through a simple ticket, while behind the scenes, Confluent Cloud for Apache Flink ® powers intent classification, MongoDB Voyage AI -backed semantic search, and contextual reasoning against live customer data to propose resolutions in real time. Structured events flow through a Confluent Cloud for Apache Kafka ® backbone to automate downstream ticketing and update audit logs, with Atlas Stream Processing (ASP) ensuring sub-second updates to the system-of-record database. Business context & challenge Retail banks face an immense operational challenge in managing the sheer volume of customer inquiries they receive on a monthly basis, often numbering in the millions. A substantial portion of these interactions, frequently ranging from 20% to 30%, is dedicated to addressing relatively routine and common issues that could potentially be automated or handled more efficiently. These routine inquiries typically revolve around a few key areas: Card declines or transaction failures: Customers frequently contact their banks when a credit or debit card transaction fails, or when a card is declined. This can be due to various reasons such as insufficient funds, security holds, incorrect PINs, or technical glitches. OTP / authentication problems: Issues related to One-Time Passwords (OTPs) or other authentication methods are a common source of customer frustration and inquiries. This includes not receiving an OTP, OTPs expiring, or difficulties with multi-factor authentication processes when trying to access accounts or authorize transactions. Billing or statement discrepancies: Customers often reach out to clarify charges on their statements, dispute transactions, or report discrepancies in their billing summaries. These inquiries can range from unrecognized transactions to incorrect fees or interest calculations. Addressing these high-volume, routine inquiries efficiently is crucial for retail banks. Streamlining these processes, perhaps through self-service options, AI-powered chatbots, or improved internal tools for customer service representatives, could significantly reduce operational costs, improve customer satisfaction, and free up resources to handle more complex or sensitive customer issues. Current pain points: Long holds or call-backs frustrate customers and drive support costs. The support team must manually sift through past cases and customer records to recommend next steps. Inconsistent audit trails hamper compliance and SLA reporting. Solution overview Our multi-agentic ticket-based complaint resolution system automates first‑line support by orchestrating a series of AI agents, built on MongoDB and Confluent technologies. 
Why a multi-agentic system? Traditional AI systems often employ a singular, all-encompassing model to handle a broad spectrum of tasks. However, this architecture proposes a more sophisticated and efficient approach: a multi-agentic system. Instead of a monolithic AI, this design breaks down complex responsibilities into smaller, manageable units. Each of these specialized AI agents is dedicated to a distinct, yet crucial, function. This division of labor allows for greater precision, scalability, and resilience within the overall system, as each agent can be optimized for its specific role without impacting the performance of others. This modularity also simplifies debugging and updates, as changes to one agent do not necessitate a re-evaluation of the entire system. Figure 1. Architecture layers & data flow. Architecture overview Ingestion & classification: The initial triage The initial phase of this customer service system is dedicated to efficiently receiving and categorizing incoming customer complaints. This is where raw, unstructured customer input is transformed into actionable intelligence. Ticket (web/mobile) - The customer's voice: This is the primary entry point for customer grievances. Whether submitted via a web portal, a dedicated mobile application, or other digital channels, customers provide their feedback as free-text complaints. This flexibility allows customers to articulate their issues naturally, without being confined to pre-defined categories. The system is designed to handle the nuances of human language, capturing the essence of the problem as expressed directly by the customer. Intent classification agent - Understanding the "why": Technology stack: At the heart of this classification lies a powerful combination of Confluent Flink for real-time stream processing and an LLM ( large language model ). Confluent Flink provides the robust framework necessary to process incoming tickets with low latency, ensuring that customer complaints are analyzed almost instantaneously. The integrated LLM is the intelligence engine, trained on vast datasets of customer interactions to accurately discern the underlying intent behind the free-text complaint. Core functionality: The primary role of this agent is to classify the complaint into specific, pre-defined intents. For example, a customer typing "My credit card isn't working at the ATM" would be classified under an intent like "card not working." Beyond just intent, the LLM is also adept at extracting explicit parameters embedded within the complaint. In the "card not working" example, it would extract the "card ID" if provided, or other relevant identifiers that streamline the resolution process. This automated extraction of key information significantly reduces the need for manual data entry and speeds up the subsequent steps. Semantic retrieval agent - Learning from the past: Technology stack: This agent leverages the advanced capabilities of Voyage AI and Atlas Vector Search . Voyage AI provides sophisticated embedding generation and re-ranking algorithms that are crucial for understanding the semantic meaning of the classified intent. This semantic understanding is then fed into MongoDB Atlas Vector Search, which is purpose-built for efficient similarity searches across vast datasets. Core functionality: Once the intent is classified and parameters extracted, the semantic retrieval agent leverages MongoDB’s Voyage AI and then re-ranks the embeddings to ensure the most relevant matches are prioritized. 
This process enables the system to efficiently retrieve the ‘N’ most similar past cases or knowledge base entries. This retrieval is critical for two reasons. Historical context: It provides agents (human or automated) with insights into how similar issues were resolved previously, offering potential solutions or precedents. Knowledge base integration: It pulls relevant articles, FAQs, or troubleshooting guides from the knowledge base, equipping the system with immediate information to address the customer's query. This proactive retrieval of information is a cornerstone of efficient problem resolution. Contextual reasoning: Assembling the puzzle With the initial classification and historical context in hand, the system moves to contextual reasoning, where all available data is brought together to formulate the most appropriate resolution. Contextual reasoning agent - The decision maker: Technology stack: MongoDB Atlas serves as the central data platform for this crucial stage. Its flexible document model and real-time capabilities are ideal for aggregating and querying diverse datasets quickly. Core functionality: This agent performs several critical functions: Live data fetching: It connects to and fetches real-time customer profiles and transaction data from various systems. This includes information such as account status, recent transactions, past interactions, and communication preferences. Access to live data ensures that the suggested resolution is based on the most current customer situation. Data merging: The fetched live data is then seamlessly merged with the semantic results obtained from the previous stage (i.e., the relevant past cases and knowledge base entries). This creates a holistic view of the customer's problem, enriched with both their current context and historical solutions. Resolution suggestion: Armed with this comprehensive dataset, the agent applies pre-defined business rules or leverages lightweight LLM prompts to select the best resolution suggestion. These business rules could involve decision trees, conditional logic, or specific protocols for certain complaint types. For more complex or nuanced situations, lightweight LLM prompts can be used to analyze the combined data and suggest a human-like, contextually appropriate resolution. This stage aims to automate as much of the resolution process as possible, only escalating to human intervention when necessary. Execution & eventing: Actioning the resolution The final phase focuses on executing the determined resolution and ensuring that the necessary downstream systems are updated. Resolution execution agent - Turning suggestions into action: Technology stack: This stage uses a scalable and event-driven infrastructure to process the resolution suggestion and convert it into a structured event. Core functionality: This agent takes the suggested action from the Contextual Reasoning Agent (e.g., "issue new card") and converts it into a structured resolution event. This event is a standardized, machine-readable message that contains all the necessary details to trigger the actual resolution in downstream systems. The structured format ensures consistency and interoperability across the ecosystem. Confluent Cloud: Technology stack: Confluent Cloud Core forms the backbone of the eventing architecture. It acts as a high-throughput, low-latency event bus, facilitating the seamless flow of resolution events across the enterprise. 
Core functionality: Confluent Cloud carries these structured resolution events to various downstream systems. A critical component here is the Confluent Schema Registry, which ensures that all events adhere to governed schemas. This schema enforcement is vital for data integrity and compatibility, preventing errors and ensuring that all consuming systems correctly interpret the event data. The event bus acts as the central nervous system, broadcasting resolution information to all interested parties.

Downstream consumer - Closing the loop:

This section describes the various systems that consume the resolution events from the Confluent Cloud event bus, ensuring a fully integrated and closed-loop system.

Confluent MongoDB Sink Connector: This connector plays a crucial role in maintaining an audit trail and storing detailed resolution information. It efficiently logs all audit and resolution details into MongoDB, providing a historical record of all customer interactions and the actions taken to resolve their issues. This data is invaluable for analytics, reporting, and future system improvements.

Confluent MongoDB Source Connector: This connector facilitates closed-loop integrations by streaming any external database changes back into Kafka. This means that if an action taken in a separate, external system (e.g., a manual card activation by a human agent) results in a database change, this change is streamed back into the event bus. This allows other systems within the ecosystem to react to these external changes, ensuring complete synchronization and a truly integrated customer service experience.

MongoDB Atlas Stream Processing (ASP): ASP is designed for real-time data processing and updates. It actively listens on Kafka (the underlying technology of Confluent Cloud) for specific status-change events. When such an event occurs (e.g., a card replacement is initiated), ASP processes it and writes low-latency updates back to the Customer Profile collection in MongoDB Atlas. This ensures that the customer's profile is always up-to-date with the latest status of their issue, providing a consistent view for both customers and internal teams.
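As a purely illustrative sketch, an Atlas Stream Processing processor for this pattern could be defined in mongosh roughly as follows. The connection registry names (confluentCloud, atlasCluster), the topic, the target namespace, and the event fields are assumptions carried over from the earlier producer sketch, not prescribed values.

// Hedged sketch of an Atlas Stream Processing processor (mongosh syntax).
// Connection names, topic, namespace, and event fields are assumptions.
sp.createStreamProcessor("cardStatusUpdates", [
  // Read resolution events from the Confluent Cloud topic.
  { $source: { connectionName: "confluentCloud", topic: "resolution.events" } },

  // Keep only the status-change events this processor cares about.
  { $match: { action: "issue_new_card" } },

  // Shape a low-latency update keyed by the customer identifier.
  {
    $project: {
      _id: "$customerId",
      cardStatus: { $literal: "replacement_initiated" },
      lastResolutionAt: "$issuedAt",
    },
  },

  // Write the update back to the Customer Profile collection in Atlas.
  {
    $merge: {
      into: { connectionName: "atlasCluster", db: "support", coll: "customer_profiles" },
      on: "_id",
      whenMatched: "merge",
      whenNotMatched: "discard",
    },
  },
]);

In a real deployment, both connections would first be added to the stream processing instance's connection registry.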
Future-proof & extensible design: The modular design allows seamless onboarding of new AI agents for emerging use cases like loan servicing, KYC checks, or investment guidance without re-architecting. This adaptability ensures long-term value as business requirements evolve. Competitive advantage through personalization: Delivering intelligent, real-time resolutions is no longer optional, it’s a differentiator. The system enhances customer loyalty and satisfaction by resolving issues instantly, accurately, and in context. Closing thoughts Agentic architectures are going to be a large part of the future of customer service automation, and they require infrastructure that can reason in real time. MongoDB and Confluent provide a proven, production-ready foundation for deploying multi-agent systems that are cost-effective, explainable, and deeply personalized using the latest AI techniques. Whether you're in financial services or another vertical, this reference architecture offers a start for strategic thinking or even a tangible leap forward. Not yet a MongoDB user? Deploy a free instance of MongoDB Atlas and explore how Atlas can power your AI-driven applications with ease. Not yet a Confluent customer? Start your free trial of Confluent Cloud today ! New users receive $400 to spend during their first 30 days. Kickstart your journey with MongoDB Atlas and Confluent by trying out Confluent Cloud + MongoDB Atlas in real-world gen AI use cases with these Quick Starts: Small Business Loan Agent Chatbot on AWS Medical Knowledge Database on Google Cloud

September 4, 2025
