The Economics of Database Operations

MongoDB
January 8, 2024

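Faced with rising costs and an uncertain economy, many organizations are looking for ways to do more with less, and databases are no exception. Fortunately, moving to a document database and applying appropriate data modeling techniques offers an opportunity to improve efficiency and save money. Document databases save companies money in two ways: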

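Object-centric, cross-language SDKs and schema flexibility let developers create and iterate on production code faster, which lowers development costs.
Less hardware is needed to deliver a given transaction throughput, which significantly reduces operational costs.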

Developer efficiency

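All modern development uses the concept of objects. An object defines a set of related values together with methods to read them, modify them, and derive results from them. Customers, invoices, and train timetables are all examples of objects. Like all program variables, objects are transient, so they must be saved to disk storage to become persistent.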

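We no longer do what Windows desktop developers did in the 1990s and manually serialize objects into local files. Today, data is stored not on the computer running the application but in a central location that multiple applications, or multiple instances of one application, can access. Shared access means we need to read and write data efficiently over a network, and we also need mechanisms that prevent concurrent changes to that data from causing one process to overwrite another's changes.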

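Relational databases predate the widespread use and implementation of object-oriented programming. In a relational database, the data structures are mathematical tables of values. Interaction with the data happens through a specialized language, SQL, which has evolved over the past 40 years to cover every type of interaction with stored data: filtering and reshaping it, and converting it from the flat, deduplicated, interrelated model into the tabular, re-duplicated, joined results presented to the application. The data must then be painstakingly converted from those rows of redundant values back into the objects the program needs.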

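Doing this takes a great deal of developer time, skill, and domain knowledge. Developers have to understand the relationships between the tables. They need to know how to retrieve disparate sets of information and then rebuild their data objects from those rows. There is an assumption that developers learn all of this before entering the workforce and can simply call on that knowledge at work. This assumption is unfounded. Even when a developer has had formal SQL training, they are unlikely to know how to write nontrivial examples efficiently. Document databases grew out of the idea of persisting objects. With a document database, a strongly typed object can be saved to the database with very little code or transformation, and results can be filtered, reshaped, and aggregated using example objects rather than by struggling to express a request in the broken English that is SQL.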

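Imagine we want to store a customer object for a customer who has a repeating set of attributes, in this case addresses. Here, an address is a weak entity that is not shared between customers. In C#/Java-like pseudocode: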

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class Address {
   int number;
   String street, town, type;

   Address(int number, String street, String town, String type) {
       this.number = number;
       this.street = street;
       this.town = town;
       this.type = type;
   }

   // Getters and setters or properties as required
}

class Customer {
   UUID customerId;
   String name, email;
   List<Address> addresses;

   Customer(UUID id, String name, String email) {
       this.name = name;
       this.email = email;
       this.customerId = id;
       this.addresses = new ArrayList<>();
   }
   // Getters and setters or properties as required
}

// Create a customer and attach one address to it
Customer newCustomer = new Customer(UUID.randomUUID(),
   "Sally Smith", "sallyport@piratesrule.com");

Address home = new Address(62, "Swallows Lane", "Freeport", "home");
newCustomer.addresses.add(home);

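To store this customer object in a relational database management system (RDBMS) and then retrieve all the customers in a given location, we need the following code or something similar: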

// Connect (plain JDBC; needs java.sql.* and java.util.UUID)
Connection rdbms = DriverManager.getConnection(CONNECTION_STRING);
rdbms.setAutoCommit(false);

// Add a customer: insert the parent row first, then one row per address
String insertCustomerSQL = "INSERT INTO Customer (name,email,customerId) values(?,?,?)";
PreparedStatement preparedSQL = rdbms.prepareStatement(insertCustomerSQL);
preparedSQL.setString(1, newCustomer.name);
preparedSQL.setString(2, newCustomer.email);
preparedSQL.setObject(3, newCustomer.customerId);
preparedSQL.executeUpdate();

String insertAddressSQL = "INSERT INTO Address (number,street,town,type,customerId) values(?,?,?,?,?)";
preparedSQL = rdbms.prepareStatement(insertAddressSQL);
for (Address address : newCustomer.addresses) {
   preparedSQL.setInt(1, address.number);
   preparedSQL.setString(2, address.street);
   preparedSQL.setString(3, address.town);
   preparedSQL.setString(4, address.type);
   preparedSQL.setObject(5, newCustomer.customerId);
   preparedSQL.executeUpdate();
}
rdbms.commit();

// Find all the customers with an address in Freeport.
// "ad" selects the matching customers, "ads" returns all of their addresses.
// DISTINCT drops the repeats produced when a customer has several Freeport
// addresses; ORDER BY keeps each customer's rows together for the loop below.
String freeportQuery = "SELECT DISTINCT ct.customerId, ct.name, ct.email, "
   + "ads.number, ads.street, ads.town, ads.type "
   + "FROM Address ad "
   + "INNER JOIN Address ads ON ad.customerId = ads.customerId AND ad.town = ? "
   + "INNER JOIN Customer ct ON ct.customerId = ad.customerId "
   + "ORDER BY ct.customerId";

preparedSQL = rdbms.prepareStatement(freeportQuery);
preparedSQL.setString(1, "Freeport");
ResultSet rs = preparedSQL.executeQuery();
String customerId = "";
Customer customer = null;

// Convert rows back to objects
while (rs.next()) {
   String rowCustomerId = rs.getObject("customerId").toString();
   // New customerId value: finish the previous customer and start the next one
   if (!rowCustomerId.equals(customerId)) {
       if (customer != null) { System.out.println(customer.email); }
       customer = new Customer(UUID.fromString(rowCustomerId),
                               rs.getString("name"),
                               rs.getString("email"));
       customerId = rowCustomerId;
   }
   customer.addresses.add(new Address(rs.getInt("number"),
                           rs.getString("street"),
                           rs.getString("town"),
                           rs.getString("type")));
}
// Emit the last customer, if any rows were returned
if (customer != null) { System.out.println(customer.email); }

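This code is verbose and grows more complex as the depth or number of fields in the object increases, and adding a new field requires a slew of related changes.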

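By contrast, with a document database the code looks like the following, and the database interaction needs no changes when new fields or depth are added to the object: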

// Connect (MongoDB Java sync driver; assumes the client is configured
// with a POJO codec so that Customer maps directly to a document)
MongoClient mongodb = MongoClients.create(CONNECTION_STRING);
MongoCollection<Customer> customers =
   mongodb.getDatabase("shop").getCollection("customers", Customer.class);

// Add Sally with her addresses
customers.insertOne(newCustomer);

// Find all the customers with an address in Freeport,
// using an example document as the query
Document freeportQuery = new Document("addresses.town", "Freeport");

FindIterable<Customer> freeportCustomers = customers.find(freeportQuery);
for (Customer customer : freeportCustomers) {
   System.out.println(customer.email); // These have the addresses populated too
}

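When developers run into the disconnect between the programming model (objects) and the storage model (rows), they are quick to create an abstraction layer that hides it from their colleagues and their future selves. Code that automatically converts objects to tables and back again is called an object-relational mapper (ORM). Unfortunately, ORMs tend to be language-specific, which locks development teams into that language and makes it harder to use other tools and technologies with the data.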

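Using an ORM does not free you from the burden of SQL when you need to perform a more complex operation. In addition, because the underlying database is unaware of objects, an ORM usually cannot make database storage and processing very efficient.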

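Document databases like MongoDB persist the objects developers are already familiar with, so no abstraction layer such as an ORM is needed. Moreover, once you know how to use MongoDB in one language, you know how to use it in all of them. You never have to step back from objects to query in pseudo-English SQL.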

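It is true that PostgreSQL and Oracle support a JSON data type, but relying on JSON to get away from SQL does not work. JSON in an RDBMS is intended for unmanaged, unstructured data; it is a glorified string type with a horrible bolted-on query syntax. JSON is not meant for database structure. For that, you need an actual document database.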

Less hardware for a given workload

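A modern document database is very similar to an RDBMS internally. But unlike the normalized relational model, where the schema dictates that all requests be treated equally, a document database optimizes the schema for a given workload at the expense of other workloads. The document model takes the idea of index-organized tables and clustered indexes to the next level by co-locating not only related rows, as in the relational model, but all the data likely to be needed for a given task. The idea is that if you have a first-class array type, a repeating child attribute of a relation does not need to live in a separate table (and therefore in separate storage). Or, put another way, you can have a column whose type is "embedded table".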

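This co-location, in effect an implicit join to the weak entity tables, reduces the cost of retrieving data from storage, because often only a single cache or disk location needs to be read to return an object to the client or to apply a filter to it.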
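To make the co-location argument concrete, here is a minimal Java sketch. It is a toy in-memory model, not a database: the class `CoLocationSketch`, its maps, and the sample data are all hypothetical, and it assumes that every row or document costs exactly one read. It only illustrates why rebuilding an object from rows touches more locations than fetching one document with its addresses embedded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the two layouts. It only counts reads
// and says nothing about how a real storage engine behaves.
class CoLocationSketch {
    // Normalized layout: one "row" per customer, one "row" per address.
    static final Map<Integer, String> customerRows = new HashMap<>();
    static final Map<Integer, List<String>> addressRowsByCustomer = new HashMap<>();

    // Document layout: the customer and its addresses share a single entry.
    static final Map<Integer, List<String>> customerDocuments = new HashMap<>();

    // Stores the same customer in both layouts.
    static void store(int id, String name, List<String> towns) {
        customerRows.put(id, name);
        addressRowsByCustomer.put(id, new ArrayList<>(towns));
        List<String> doc = new ArrayList<>();
        doc.add(name);
        doc.addAll(towns);
        customerDocuments.put(id, doc);
    }

    // Reads needed to rebuild the object from rows: 1 customer row + N address rows.
    static int readsFromRows(int id) {
        if (!customerRows.containsKey(id)) {
            return 0;
        }
        return 1 + addressRowsByCustomer.getOrDefault(id, List.of()).size();
    }

    // Reads needed with the embedded layout: the single document.
    static int readsFromDocument(int id) {
        return customerDocuments.containsKey(id) ? 1 : 0;
    }

    public static void main(String[] args) {
        store(1, "Sally Smith", List.of("Freeport", "Port Royal"));
        System.out.println("rows read: " + readsFromRows(1));
        System.out.println("documents read: " + readsFromDocument(1));
    }
}
```

The gap between the two counts grows with the number of child rows, which is the same effect the text attributes to more complex relational structures.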

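Compare this with having to identify, locate, and read many rows to return the same data, plus the client-side hardware needed to reconstruct an object from those rows. That cost is so great that developers will routinely put a secondary, simpler key-value store in front of the primary database to act as a cache.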

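These developers know the primary database cannot reasonably meet the workload requirements on its own. A document database needs no external cache in front of it to meet performance targets, yet it can still perform all the tasks of an RDBMS, and do so more efficiently.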

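How much more efficient? I built a test harness to determine how much efficiency and cost saving a document database can deliver compared with a standard relational database. In these tests, I set out to quantify the transaction throughput per dollar for a best-in-class cloud-hosted RDBMS versus a cloud-hosted document database, specifically MongoDB Atlas.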

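The use case I chose represents a common, real-world application in which a data set is updated periodically and read far more often: an implementation of the UK vehicle inspection (MOT) system, with its public and private interfaces, based on its own published data.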

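The tests showed that create, update, and read operations were considerably faster in MongoDB Atlas. Overall, MongoDB Atlas handled roughly 50% more transactions per second on similarly specified server instances with a similar instance cost. The gap widens as the relational structure grows more complex, because the joins become more expensive. On top of the basic instance cost, the hourly running cost of the relational database in these tests ranged from 200% to 500% of the Atlas cost because of additional charges for disk utilization. Overall, the Atlas-hosted system was three to five times less expensive to run for a given performance target. Simply put, Atlas could push considerably more transactions per dollar.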
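The "transactions per dollar" comparison is simple arithmetic, sketched below in Java. The throughput and price inputs are hypothetical placeholders chosen only to mirror two of the ratios quoted above (50% more throughput, and an RDBMS running cost at 200% of the Atlas cost); they are not the measured figures, and `TransactionsPerDollar` is an illustrative name.

```java
// Illustrative arithmetic only; the inputs in main() are placeholders,
// not measurements from the tests described in the article.
class TransactionsPerDollar {

    // Transactions completed for each dollar spent, given sustained
    // throughput (transactions per second) and the all-in hourly cost.
    static double perDollar(double tps, double hourlyCostUsd) {
        if (hourlyCostUsd <= 0) {
            throw new IllegalArgumentException("hourly cost must be positive");
        }
        return tps * 3600.0 / hourlyCostUsd;
    }

    public static void main(String[] args) {
        // Hypothetical baseline: an RDBMS at 1000 TPS costing $2.00 per hour.
        double rdbms = perDollar(1000.0, 2.0);
        // Hypothetical Atlas figures: 50% more throughput at half the cost.
        double atlas = perDollar(1500.0, 1.0);
        System.out.println("RDBMS transactions per dollar: " + rdbms);
        System.out.println("Atlas transactions per dollar: " + atlas);
        System.out.println("ratio: " + (atlas / rdbms));
    }
}
```

With these placeholder inputs the ratio comes out at 3. Substituting your own measured throughput and billing figures gives the number that matters for your workload.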

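Independent tests also confirm the efficiency of the document model. Temenos, a Swiss software company whose products are used by the world's largest banks and financial institutions, has been running benchmark tests for more than 15 years. In its most recent test, the company reached 74,000 transactions per second (TPS) with MongoDB Atlas.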

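That test delivered four times the throughput per core of a comparable test three years earlier while using 20% less infrastructure. It was run on a production-grade benchmark architecture with a configuration that reflects production systems, including nonfunctional requirements such as high availability, security, and private links.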

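During the test, MongoDB served 74,000 read TPS with a response time of 1 millisecond while also ingesting a further 24,000 TPS. And because Temenos was using a document database, there was no caching involved. All queries ran directly against the database.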

Summary

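Unless you intend to have a single database in your organization that every application uses, I recommend moving workloads from the relational model to the document model. Doing so lets your organization build more in less time with the same number of developers, and it greatly reduces the cost of running databases in production. It is unlikely that your organization has not already adopted object-oriented programming. So why have you not yet tried an object-oriented document database? Sign up for Alibaba Cloud MongoDB at https://free.aliyun.com/pipCode=mongodb&utm_content=m_1000371601 to try it out.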

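John Page, a document database specialist, had 18 years of experience in full-stack document database technology before joining MongoDB. These days he also builds robots, though purely as a hobby. He tests databases and writes about them, and what he earns funds the robot projects. Read more articles by John Page →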

← Previous

Is a Document Database Faster Than an RDBMS? A Hands-On Test

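I work for MongoDB, the world's leading document database company. Document databases and relational databases have a lot in common: strongly typed data, ACID transactions, rich query, update, and aggregation capabilities, as well as indexes and B-trees. Where a document model database really differs from a relational one is that, at the storage and data modeling layers, some tables can be embedded as fragments inside others. This is a little like index-organized tables in an RDBMS, but it offers far greater scope for optimizing for a specific workload. It also makes it very easy to persist objects from Java, C#, and other modern languages.

I believe MongoDB is a great fit for high-throughput online transaction processing (OLTP) workloads such as e-commerce or logistics, where data is both read and written and the database is the source of truth. I also believe (and explain in detail below) that, for a given read/write/update workload, MongoDB is far more efficient than an RDBMS, both in transactions executed per unit of cost and in features built per developer hour.

What I was not sure of was exactly how much more efficient MongoDB is than an RDBMS (and, to be fair, I had no reproducible evidence to back up my view). So I took some time to measure it. This post briefly describes the tests I ran, the results, and where to get the code so you can test it yourself.

In making the comparison, I tried hard to avoid expertly over-tuning one database against the other, which is how most database performance tests are done. I know MongoDB very well, but I tried to restrain myself from being too clever with it. On the other hand, I did the best I could for both PostgreSQL and MySQL; of the two, I am more familiar with MySQL. I also used hosted versions of all three databases, because I wanted versions installed and configured by the experts who run those services. My goal was a fair test on a level playing field. It is, of course, up to you to judge whether I managed that, based on all the data, tests, and tuning options I document below.

What workload did I test?

I chose to simulate the UK government's vehicle inspection system. Members of the public can look up a car's most recent inspections, and garages enter data when a vehicle passes or fails its annual test. I chose this because the UK government publishes the actual data, the relational schema and queries for it, and a guide to the data format. The code tests retrieving the latest test details for a car, adding a new car, recording a failed test, and modifying an existing result to correct the mileage. Different tests used these operations in different ratios. I loaded all the data for 2021, just over 40 million test results.

Database and client hosting choices

All my tests were run in a major cloud provider's environment. In each case I ran the test application on a hosted Unix server in the same region as the database. I used MongoDB Atlas to run the MongoDB replica set and the same cloud provider's managed MySQL and PostgreSQL offerings. I kept the instance specifications as close as I could and, for transparency, show both the specifications and the hourly pricing.

The code

The code is multithreaded Java. I used the MongoDB Java driver for Atlas and JDBC for MySQL and Postgres. It is worth noting that MongoDB needed only 20% as many lines of code to read from and write to the database, because persisting and retrieving an object requires neither multiple inserts within a transaction nor rebuilding the object from multiple rows. The code to retrieve an object from MongoDB and convert it to JSON is as follows.

public String getMOTResultInJSON(String identifier) {
    long identifierLong;
    try {
        identifierLong = Long.valueOf(identifier);
        Bson byIdQuery = Filters.eq("vehicleid", identifierLong);
        testObj = testresults.find(byIdQuery).limit(1).first();
        if (testObj != null) {
            return testObj.toJson();
        }
    } catch (Exception e) {
        logger.error(e.getLocalizedMessage());
    }
    return "{ }"; // Not found
}

Compare that with the code for the RDBMS, for an object with just one level of nesting.

//Query From https://data.dft.gov.uk/anonymised-mot-test/MOT_user_guide_v4.docx
private final String getlatestByVehicleSQL = "select "
        + "tr.*, "
        + "ft.FUEL_TYPE, "
        + "tt.TESTTYPE AS TYPENAME, "
        + "to2.RESULT, "
        + "ti.*, "
        + "fl.*, "
        + "tid.MINORITEM,tid.RFRDESC,tid.RFRLOCMARKER,tid.RFRINSPMANDESC,tid.RFRADVISORYTEXT,tid.TSTITMSETSECID, "
        + "b.ITEMNAME AS LEVEL1, "
        + "c.ITEMNAME AS LEVEL2, "
        + "d.ITEMNAME AS LEVEL3, "
        + "e.ITEMNAME AS LEVEL4, "
        + "f.ITEMNAME AS LEVEL5 "
        + "from TESTRESULT tr "
        + "LEFT JOIN TESTITEM ti on ti.TESTID = tr.TESTID "
        + "LEFT JOIN FUEL_TYPES ft on ft.TYPECODE = tr.FUELTYPE "
        + "LEFT JOIN TEST_TYPES tt on tt.TYPECODE = tr.TESTTYPE "
        + "LEFT JOIN TEST_OUTCOME to2 on to2.RESULTCODE = tr.TESTRESULT "
        + "LEFT JOIN FAILURE_LOCATION fl on ti.LOCATIONID = fl.FAILURELOCATIONID "
        + "LEFT JOIN TESTITEM_DETAIL AS tid ON ti.RFRID = tid.RFRID AND tid.TESTCLASSID = tr.TESTCLASSID "
        + "LEFT JOIN TESTITEM_GROUP AS b ON tid.TSTITMID = b.TSTITMID AND tid.TESTCLASSID = b.TESTCLASSID "
        + "LEFT JOIN TESTITEM_GROUP AS c ON b.PARENTID = c.TSTITMID AND b.TESTCLASSID = c.TESTCLASSID "
        + "LEFT JOIN TESTITEM_GROUP AS d ON c.PARENTID = d.TSTITMID AND c.TESTCLASSID = d.TESTCLASSID "
        + "LEFT JOIN TESTITEM_GROUP AS e ON d.PARENTID = e.TSTITMID AND d.TESTCLASSID = e.TESTCLASSID "
        + "LEFT JOIN TESTITEM_GROUP AS f ON e.PARENTID = f.TSTITMID AND e.TESTCLASSID = f.TESTCLASSID "
        + "WHERE tr.TESTID = (SELECT TESTID FROM TESTRESULT WHERE VEHICLEID=? LIMIT 1)";

public String getMOTResultInJSON(String identifier) {
    long identifierLong;
    jsonObj = new JSONObject();
    // Check we aren't a new thread - if we are we need a new connection.
    try {
        identifierLong = Long.valueOf(identifier);
        // Pick a prepared statement from our list of readers randomly
        PreparedStatement getTestStmt = readConnections.get(
                ThreadLocalRandom.current().nextInt(0, readConnections.size()));
        getTestStmt.setLong(1, identifierLong);
        ResultSet testResult = getTestStmt.executeQuery();
        ResultSetMetaData metaData = testResult.getMetaData();
        // Create JSON from a set of Rows
        String[] topFieldNames = { "TESTID", "VEHICLEID", "TESTTYPE", "TESTRESULT", "TESTDATE",
                "TESTCLASSID", "TYPENAME", "TESTMILEAGE", "POSTCODEREGION", "MAKE", "MODEL", "COLOUR",
                "FUELTYPE", "FUEL_TYPE", "CYLCPCTY", "FIRSTUSEDATE", "RESULT" };
        String[] itemFieldNames = { "RFRID", "RFRTYPE", "DMARK", "LOCATIONID", "LAT", "LONGITUDINAL",
                "VERTICAL", "MINORITEM", "RFRDESC", "RFRLOCMARKER", "RFRINSPMANDESC", "RFRADVISORYTEXT",
                "LEVEL1", "LEVEL2", "LEVEL3", "LEVEL4", "LEVEL5" };
        boolean firstRow = true;
        JSONArray itemsJSON = new JSONArray();
        while (testResult.next()) {
            JSONObject itemJSON = new JSONObject();
            for (int col = 1; col <= metaData.getColumnCount(); col++) {
                String label = metaData.getColumnLabel(col);
                // Top-level fields are taken from the first row only
                if (firstRow && Arrays.asList(topFieldNames).contains(label.toUpperCase())) {
                    Object val = testResult.getObject(col);
                    jsonObj.put(label.toLowerCase(), val);
                }
                // All Rows add to the Items array - this is a simple JSON structure
                // With just one top level array of objects
                if (Arrays.asList(itemFieldNames).contains(label.toUpperCase())) {
                    Object val = testResult.getObject(col);
                    itemJSON.put(label.toLowerCase(), val);
                }
            }
            /* If our item isn't blank add it to the items JSONArray */
            if (itemJSON.optInt("rfrid", -1) != -1) {
                itemsJSON.put(itemJSON);
            }
            firstRow = false;
        }
        jsonObj.put("testitems", itemsJSON);
        testResult.close();
        return jsonObj.toString();
    } catch (Exception e) {
        e.printStackTrace();
        logger.error(e.toString());
    }
    return jsonObj.toString();
}

The code is available in my GitHub repository. The repository has full instructions for downloading and cleaning the data (it contains a few errors, such as the odd duplicate key, and it lacks explicit NULL values, so PostgreSQL reports missing values in the CSV as strings), loading it into PostgreSQL or MySQL, converting it to objects, and finally loading it from PostgreSQL or MySQL into MongoDB. Because the test harness already has code to create objects from an RDBMS, the simplest way to load the data into MongoDB is to read it from MySQL as objects and write those objects to Atlas.

Results

The first test used the smallest recommended "production" setup for both Atlas and the cloud provider. Production here means the minimum tier that includes replicas for disaster recovery (DR) and read scaling and that relies on dedicated rather than burstable compute, so that performance is predictable. MongoDB Atlas had 3 database instances with 2 vCPU cores and 8GB RAM, while MySQL and Postgres had 2 instances with 2 vCPU cores and 16GB RAM. The data was the same. For the read/insert/update workload, the ratio of threads performing each operation was 85:15:5; for the read-only workload, it was 100:0:0.

In total, I used 100 threads against the small servers and 300 when testing against the larger database servers. I tested two scenarios: reading only from the primary/writer instance (single server), with the other instances providing DR/HA failover; and distributing reads to the secondary/reader instances as well (including read replicas). In the small setup, MongoDB had two secondaries (the minimum allowed), while MySQL and Postgres had just one (the minimum allowed). The total hourly price of the three-node MongoDB Atlas configuration was still slightly lower than the cloud provider's two-node solution.

[Chart: Mixed read/write/update workload - single server (small)]

Important note on IOPS pricing: for the RDBMS, disk reads and writes are charged at $0.22 per million I/O operations. Across these tests, that added between 50% and 250% to the hourly cost of the compute instances. MongoDB Atlas pricing includes all disk costs.

[Chart: Read-only workload - single server (small)]
[Chart: Mixed read/write/update workload - including read replicas (small)]
[Chart: Read-only workload - including read replicas (small)]

The second test used a larger "production" setup for both Atlas and the cloud provider: 3 replicas for DR and read scaling, and enough RAM for the RDBMS to keep its working set in cache. Both the RDBMS and MongoDB Atlas had 3 database instances, each with 4 CPUs and 32GB RAM. The data was the same as in the previous test. The total number of threads rose from 100 to 300, in the same ratios.

[Chart: Mixed read/write/update workload - single server (large)]
[Chart: Read-only workload - single server (large)]
[Chart: Mixed read/write/update workload - including replicas (large)]
[Chart: Read-only workload - including replicas (large)]

Caveats and conclusions

The differences in the results were startlingly large. MySQL in particular seemed to cope poorly with a large number of parallel threads, most likely because of a server tuning issue. Once writes and updates were added, per-thread reads for both MySQL and MongoDB dropped relative to the read-only case. For reads, MongoDB Atlas was 50–100% faster than PostgreSQL and much faster than MySQL, which confirmed my expectations. It was also far cheaper per transaction, given that all I/O costs are included with Atlas, whereas the cloud provider's RDBMS incurred additional and very substantial IOPS charges.

MongoDB compresses data on disk but not in cache. That means that, unlike the RDBMS, Atlas could never hold the 45 GB of data in RAM, and its speed was tied to I/O.

I did not use $lookup (MongoDB's equivalent of a LEFT JOIN), to keep things simple. The documents contain 200+ string descriptions taken from small domain tables, mostly describing failure items. In this case it would be better to $lookup those descriptions at query time than to store them in the data. This is a situation where embedding everything indiscriminately does not help and where knowing how to make good use of the document model pays off. I may extend the tests to demonstrate this, since fitting more of the data in cache should improve speed further.

PostgreSQL and MongoDB were comparable on insert rate for new data, with MongoDB 5–10% faster, apart from one exception that probably warrants a retest or further investigation. Updates in MongoDB were two to four times slower than in PostgreSQL, though still faster than in MySQL. This was expected: we are changing a single value in a much larger document, so MongoDB does more I/O than PostgreSQL to write out the change. The $lookup suggestion above would reduce the average document size from 1260 bytes to 895 bytes, significantly speeding up reads and writes.

Still not convinced? Fork my code and run a few tests of your own. Real OLTP workloads involve multiple tables, complex updates, and concurrency, not the key/value lookups of random data that are so often put forward to compare database models. Sign up for Alibaba Cloud MongoDB at https://free.aliyun.com/pipCode=mongodb&utm_content=m_1000371601 to try it out.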
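The IOPS pricing note in the results can be turned into a rough hourly figure. The Java sketch below uses the $0.22 per million I/O operations quoted in that note; the sustained I/O rate passed in is a hypothetical input, and `IoSurcharge` is an illustrative name rather than anything from the test code.

```java
// Rough estimate of the metered I/O charge; only the unit price comes
// from the article, the I/O rate in main() is a made-up example.
class IoSurcharge {
    // Price per million disk I/O operations, as quoted in the results.
    static final double USD_PER_MILLION_IO = 0.22;

    // Hourly I/O charge for a sustained rate of I/O operations per second.
    static double hourlyIoCostUsd(double ioPerSecond) {
        double operationsPerHour = ioPerSecond * 3600.0;
        return operationsPerHour / 1_000_000.0 * USD_PER_MILLION_IO;
    }

    public static void main(String[] args) {
        double assumedIoPerSecond = 2000.0; // hypothetical sustained rate
        System.out.printf("I/O charge per hour: $%.3f%n",
                hourlyIoCostUsd(assumedIoPerSecond));
    }
}
```

Comparing this figure with the hourly instance price shows how a metered I/O charge can come to dominate the bill for an I/O-heavy workload.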

January 8, 2024

Next →

Agentic Supplier Management with MongoDB Atlas, Voyage AI, and Multi-Modal Search

Retail supply chains are not a back-office logistics function; they are a high-stakes, board-level concern. Imagine learning suddenly that shipment rerouting surcharges have doubled due to new regional escalations; the impact on competitive differentiation and consumer trust is immediate. As a result, a long-standing focus on linear efficiency and lean inventory is being disrupted by a mandate for resilience and AI-driven responsiveness. To survive, retailers must move beyond the rigidity of legacy systems and embrace an AI-ready data platform that can pivot as fast as headlines change.

Indeed, a 2026 study by KPMG reported that businesses are establishing new performance metrics, centered around post-disruption recovery time, supplier diversification, sourcing agility, revenue growth from improved experiences, cost savings, and employee engagement. Now, retailers are modernizing their supplier management capabilities. An effective supplier management application that boosts visibility, builds resilience, and delivers material business benefits must be underpinned by unified supplier data and AI copilots. To unlock these next-generation capabilities, retail leaders use MongoDB as a unified data foundation, enabling the high-velocity intelligence and material results required in today’s volatile landscape.

However, the business agility of many organizations remains restricted by their enterprise resource planning (ERP) systems, which were designed for an era when stability was assumed, laborious data access was the norm, and delays due to batch processing were acceptable. These legacy foundations have become an operational bottleneck and a strategic threat that prevents real-time responsiveness to external shocks. The speed of supply chain decision-making is hard-capped by the difficulty of getting fast, accurate answers from supplier information buried in legacy systems, PDFs, spreadsheets, and email chains. These systems fail because they are not able to force incompatible data profiles into a one-size-fits-all table structure. Any multi-modal data, such as images and PDFs, is not queryable. By the time a supplier manager has gathered the data required to make a decision, hours, if not days, have passed.

Benefits of supplier management modernization

The opportunity for retailers that move decisively to modernize is measured in both profitability and market share. IDC predicts that 70% of large retailers will invest in data modernization to unlock better insights and resilience by 2027. To achieve true resilience, retailers must decouple supplier management from the ERP core and deliver a high-impact capability for the business. MongoDB facilitates low-latency data access, geospatial data, and multi-modal AI-assisted discovery that can deliver a world-class supplier management capability. By creating a dedicated application with MongoDB as its consolidated operational data layer, retailers gain the flexibility to handle modern complexities without the legacy overhead.

Imagine a geopolitical escalation has triggered a 50% tariff on aluminium imports from South Korea from midnight tonight. The external event propagates its way into your modernized system, triggering a real-time identification of your impacted suppliers. The business assesses this impact and decides whether to seek alternatives. Instead of typing in a specific supplier attribute, they describe the need: "Alternative dairy partner in a tariff-neutral zone." The system scans thousands of supplier profiles and digitized contracts stored as high-dimensional vectors. Within seconds, it identifies a mid-sized supplier that hasn't been used in two years. The business delves deeper into the supplier details and decides they are a suitable alternative. The risk has been mitigated; the disruption avoided. Breaking free from the pitfalls inherent within legacy systems has ensured the business remains operationally agile in the face of external change.

Figure 1. An Agentic Supplier Management solution, with multi-modal search, powered by MongoDB.

Operational flexibility for supplier attributes

Suppliers are complex entities with varied and evolving attributes. A textile supplier in Vietnam will have very specific data requirements when compared with a packaging partner in Poland. New requirements will emerge over time, like the need to track a custom "Tariff Exposure Rating" or "Sustainability Score" for 500 suppliers in a specific region. Business users will expect a modern application to add those fields instantly to the relevant supplier profiles without taking the system offline or rewriting the schema. MongoDB’s flexible data model allows different supplier data attributes to be stored inside a single collection of suppliers. This polymorphic capability allows data to evolve at the same pace as global trade policy, without impacting core operations.

Sourcing agility with semantic discovery

When a primary supplier is sidelined by a localized lockdown or a shipping bottleneck, the clock starts ticking. Traditionally, finding an alternative meant a manual, frantic search through spreadsheets. In a modern system, business users will expect semantic search capabilities, low-latency experiences, and intelligent, AI-powered assistance.

MongoDB provides multi-modal intelligence with Voyage AI, a specialized retrieval layer for AI applications that provides API-based embedding models and re-rankers. It enables unstructured data like documents and images to be defined as high-dimensional vectors, all stored right beside standard operational data in the same MongoDB platform. When a supplier in a disrupted region fails, MongoDB Vector Search can instantly identify alternative suppliers across your global network who have the most similar attributes. Think product attributes, lead times, and sustainability credentials. Because semantic search is based on mathematical "closeness" rather than exact keyword matches, it can surface a high-potential partner in a different region that your team might have otherwise overlooked. This transforms searching from a reactive, manual scramble into a proactive, intelligent capability.

Real-time, low-latency visibility

In 2026, visibility is no longer a luxury; it is the heartbeat of operational survival. Most retailers are paralyzed by disconnected systems that trap critical data points in isolated silos, leaving decision-makers to act on data that is difficult to access or out-of-date. In a disruption scenario, this disconnect is fatal. Unifying supply chain data into a single, coherent layer is the only way to ensure that customer promises are grounded in current reality.

Through MongoDB Change Streams, the data platform acts as a high-speed nervous system, propagating updates from legacy cores to a modernized supplier application with near-zero latency. Because MongoDB does not require a rigid, pre-defined structure for every incoming piece of data, you can instantly ingest a flow of data directly into your supplier profiles. This immediacy fundamentally changes the dynamic of an impending crisis: instead of managing the aftermath of an external issue over an extended period, the business can address the impact in minutes. Decision-making shifts from reactive guesswork to high-confidence execution, allowing businesses to reroute shipments or trigger alternative sourcing before the disruption reaches the bottom line.

The foundation of resilience

By leveraging MongoDB’s AI-ready data platform to modernize supplier management, retailers will achieve business outcomes that were previously impossible. When supply chain disruption inevitably occurs, the business can be empowered with AI-driven impact assessment, semantic discovery of alternative supplier options, and multi-modal data access, combining to mitigate risk and maintain consumer confidence.

Figure 2. An AI-driven Supplier Management workflow with MongoDB.

Market data from Congruence shows that 72% of leading retailers are investing in AI-integrated platforms, including supply chain. While the 2026 macroenvironment generates supply chain issues that result in manual struggles and customer frustration, competitors will use MongoDB to treat their supplier management agility as a dynamic engine for resilience and value. Our recommendation is simple: start your migration to a flexible, AI-ready data platform now, or prepare to be outmaneuvered by competitors that are already moving on.

References

KPMG (2026), Key trends impacting supply chains in 2026
IDC (2025), IDC FutureScape: Worldwide Retail 2026 Predictions
Congruence Market Insights (2025), Next-Gen Retail Technology Market Report: Growth Drivers, Market Dynamics & Future Potential (2026–2033)

June 3, 2026