Why do these perform so differently?

Will_Calderwood · March 26, 2024, 5:03pm

I have tested two methods of fetching the same data from a MongoDB Atlas hosted database. embeddingOpenAi3Large is an array of 3072 numbers. Method 1 fetches all 50 items individually. If I include embeddingOpenAi3Large in the projection it takes 1400ms; if I don’t, it takes 380ms.

    // Method 1: one query per item, all issued concurrently.
    let startTime = Date.now();
    const promises: Promise<INewsItemWithEmbeddedContent[]>[] = [];
    for (const result of results) {
      promises.push(
        NewsItem.loadItems(
          { _id: result._id },
          { _id: 1, embeddingOpenAi3Large: 1 },
        ),
      );
    }
    await Promise.all(promises);
    let endTime = Date.now();
    console.log(endTime - startTime);

Method 2 adds the 50 ids to an array and fetches all the data in a single query. If I include embeddingOpenAi3Large it takes over 21 seconds; if I don’t, it takes 80ms.

    // Method 2: collect the ids, then fetch everything with a single $in query.
    startTime = Date.now();
    const resultIds = results.map((result) => result._id);
    const embeddingResult: INewsItemWithEmbeddedContent[] =
      await NewsItem.loadItems(
        { _id: { $in: resultIds } },
        { _id: 1, embeddingOpenAi3Large: 1 },
      );
    endTime = Date.now();
    console.log(endTime - startTime);
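
Both timings above come from a single run each, so one-off effects such as connection setup or a cold cache could skew them. A minimal sketch of a helper that repeats a measurement is below. timeRuns is a hypothetical name, not part of my code, and it uses Date.now() in the same way as the snippets above.

```typescript
// Hypothetical helper (not in the original code): runs an async task
// several times in sequence and records each wall-clock duration in ms.
async function timeRuns<T>(
  label: string,
  task: () => Promise<T>,
  runs = 3,
): Promise<number[]> {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await task(); // the result is discarded; only the elapsed time matters
    durations.push(Date.now() - start);
  }
  console.log(label, durations.map((d) => `${d}ms`).join(", "));
  return durations;
}
```

Either method could then be wrapped as a task, for example passing () => NewsItem.loadItems({ _id: { $in: resultIds } }, { _id: 1, embeddingOpenAi3Large: 1 }) for method 2, so the first run can be compared against the later ones.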

I don’t understand why downloading all the embeddings in a single query should be so much slower than fetching them individually. Some insights would be appreciated.
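
For scale, here is a rough estimate of how much data the embeddings add. It assumes each embedding is stored as a plain BSON array of 64-bit doubles, which I have not verified for this collection. In BSON an array is encoded as a document whose keys are the decimal indexes, so every element costs one type byte, the index string plus a null terminator, and eight bytes for the value. estimateBsonDoubleArrayBytes is a hypothetical helper written for this estimate.

```typescript
// Estimates the encoded size of a BSON array holding `length` doubles.
// Assumption: every element is stored as a 64-bit double.
function estimateBsonDoubleArrayBytes(length: number): number {
  let bytes = 4 + 1; // int32 document length prefix + trailing null byte
  for (let i = 0; i < length; i++) {
    // type byte + index key as a C string (digits + null) + 8-byte double
    bytes += 1 + String(i).length + 1 + 8;
  }
  return bytes;
}

const bytesPerEmbedding = estimateBsonDoubleArrayBytes(3072);
const bytesForFiftyItems = bytesPerEmbedding * 50;
console.log(bytesPerEmbedding, bytesForFiftyItems);
```

Under that assumption each embedding is about 41 KB and the 50 items together are about 2 MB, which on its own does not look large enough to account for 21 seconds.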

loadItems does a lean find with the passed filter and projection.
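
To make that concrete, this is roughly the shape of loadItems, written as a standalone sketch rather than the actual implementation. The interfaces are minimal stand-ins for a Mongoose model (find, then lean, then exec), and loadItemsSketch and FindableModel are hypothetical names used only for illustration.

```typescript
// Minimal structural stand-ins for the parts of a Mongoose model used here.
interface LeanQuery<T> {
  exec(): Promise<T[]>;
}
interface FindQuery<T> {
  lean(): LeanQuery<T>; // lean: return plain objects, not hydrated documents
}
interface FindableModel<T> {
  find(
    filter: Record<string, unknown>,
    projection: Record<string, 0 | 1>,
  ): FindQuery<T>;
}

// Hypothetical sketch of loadItems: a lean find with the given filter
// and projection, with no other processing.
function loadItemsSketch<T>(
  model: FindableModel<T>,
  filter: Record<string, unknown>,
  projection: Record<string, 0 | 1>,
): Promise<T[]> {
  return model.find(filter, projection).lean().exec();
}
```

Nothing else happens between the query and the returned array, so the difference between the two methods should come from the queries themselves and not from any post-processing.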