MongoDB Meta Documentation

Writing Meaningful Tests

This guide describes the considerations for writing tests that prove your code example does what the docs claim. For a scannable summary of the Expect API, wildcard patterns, cleanup hooks, and common mistakes, see the Grove Testing Cheat Sheet.

Note

Most code examples in this guide use JavaScript syntax. The concepts apply to all languages.

A test can pass and still prove nothing. The goal is not "the code ran without errors." The goal is "the code did what the docs say it does."

Before you write a test, identify what the example teaches the reader. Your test should verify that specific teaching point.

Every code example exists to demonstrate a behavior:

  • A find() example with a filter demonstrates filtering.

  • An updateOne() example demonstrates modifying a document.

  • An aggregate() example demonstrates a pipeline stage.

Your test proves that the behavior works as shown.

  • A weak test proves the command ran:

    The example runs db.movies.updateOne(), and the test checks that the output is { acknowledged: true, matchedCount: 1, ... }.

    This test proves only that MongoDB accepted the command. If the $set syntax were wrong in a way that silently failed, this test would still pass.

  • A strong test proves the teaching point:

    The example runs updateOne followed by findOne to verify the change. The test checks that findOne returns { title: "The Godfather", rated: "PG" }.

    This test proves the update took effect, which is what the reader learns from the example.
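The difference can be sketched with a hypothetical in-memory stand-in for a collection. The updateOne and findOne functions below are illustrative plain JavaScript, not driver code, but they show why the acknowledgment alone proves too little:

```javascript
// Hypothetical in-memory stand-in for a collection, used only to
// illustrate why acknowledgment checks are weaker than read-back checks.
const docs = [{ title: "The Godfather", rated: "R" }];

function updateOne(filter, update) {
  const doc = docs.find(d =>
    Object.entries(filter).every(([k, v]) => d[k] === v)
  );
  if (doc) Object.assign(doc, update.$set);
  return { acknowledged: true, matchedCount: doc ? 1 : 0 };
}

function findOne(filter) {
  return docs.find(d =>
    Object.entries(filter).every(([k, v]) => d[k] === v)
  ) ?? null;
}

const result = updateOne(
  { title: "The Godfather" },
  { $set: { rated: "PG" } }
);

// Weak: this passes even if $set had written the wrong field.
console.assert(result.acknowledged === true && result.matchedCount === 1);

// Strong: this fails unless the field actually changed as the docs claim.
console.assert(findOne({ title: "The Godfather" }).rated === "PG");
```

The same shape applies against a real database: the strong assertion reads the document back instead of trusting the write result.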

Levels of test strength exist on a spectrum:

  • Acknowledgment test: Proves the server accepted the command. Example: checking acknowledged: true or a returned index name.

  • Output test: Proves the operation produced the expected results. Example: checking that a query returns specific documents.

  • Behavior test: Proves the operation had the intended effect. Example: running a follow-up query to verify a write, or checking that an index enforces a constraint.

Aim for output tests at minimum. Use behavior tests when the example teaches a write operation or a side effect such as index creation or schema validation.

Acknowledgment-level tests are acceptable when the acknowledgment itself is the teaching point, for instance when the example covers write concerns and the docs specifically discuss the acknowledgment format.

Over-permissive schema validation. A schema that asserts only a document count (and no required fields or field values) allows almost any output to pass validation, including results that have nothing to do with what the example teaches:

// Weak - proves 5 documents came back,
// but nothing about content
Expect.that(result)
  .shouldResemble("output.json")
  .withSchema({ count: 5 });

// Better - proves documents contain the fields
// the example queries
Expect.that(result)
  .shouldResemble("output.json")
  .withSchema({
    count: 5,
    requiredFields: ["title", "year", "genres"],
    fieldValues: { year: 2012 }
  });

Ignoring all the relevant fields. If you ignore every dynamic field, the assertion stops verifying anything meaningful:

// Weak - ignores the fields the example is about
Expect.that(result)
  .withIgnoredFields(
    "_id", "title", "rated", "year"
  )
  .shouldMatch("output.json");

// Better - only ignore truly dynamic fields,
// verify the rest
Expect.that(result)
  .withIgnoredFields("_id")
  .shouldMatch("output.json");

Testing the wrong output. If your example inserts data and then queries it, test the query result, not the insert acknowledgment:

// Weak - tests the insert, not the query
// that is the point of the example
const insertResult = await loadData();
Expect.that(insertResult)
  .shouldMatch("insert-output.json");

// Better - tests the query the example is
// actually teaching
const queryResult = await runPipeline();
Expect.that(queryResult)
  .shouldMatch("pipeline-output.json");

Before you call your test done, verify the following:

  • The test catches a silent failure in the core operation. For example, if your update set the wrong field, the test fails.

  • The assertion matches what the docs claim. For example, if the docs say "returns documents sorted by year," the test verifies the sort order using withOrderedSort().

  • The test is specific enough that a completely different query would not pass. For example, if the example teaches how to filter by a specific field value, the test checks that the result contains the correct documents.
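The sort-order check in particular is easy to sketch in plain JavaScript. The helper name isSortedByYear below is hypothetical, not a Grove API; it only illustrates asserting on order rather than on count alone:

```javascript
// If the docs claim "returns documents sorted by year", assert the
// order, not just that the right number of documents came back.
const results = [
  { title: "Casablanca", year: 1942 },
  { title: "The Godfather", year: 1972 },
  { title: "Argo", year: 2012 }
];

// Hypothetical helper: true when each document's year is >= the previous one.
const isSortedByYear = docs =>
  docs.every((d, i) => i === 0 || docs[i - 1].year <= d.year);

console.assert(isSortedByYear(results));
console.assert(!isSortedByYear([...results].reverse()));
```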

The most common mistake is jumping straight to the test file. The example file is what readers see. Write it first, get the code correct, then build everything else around the code.

Your workflow:

  1. Write the example file (the working code that readers see).

  2. Run it to see what it produces.

  3. Capture that output as your expected output file.

  4. Write the test to automate what you did manually.

If you skip the manual test, you spend time debugging test failures that are actually wrong expected output.

You do not always need to run the example manually. The test infrastructure can capture output for you.

The test harness runs your example as a subprocess and captures everything mongosh prints to stdout. Use the following steps to generate your expected output file:

  1. Write the example file and a minimal test. Point shouldMatch at an output file that does not exist yet, or create a placeholder with incorrect content.

  2. Run the test. It fails, and the error message shows the actual output.

  3. Copy the actual value into your expected output file.

  4. Replace dynamic values such as _id and timestamps with "...".

  5. Rerun the test. It passes.

This approach is often faster than connecting to mongosh separately, especially when you need to set up a specific database state first.

For the language-based driver suites (JavaScript, Python, Go, Java, and C#), your example is a function that returns a value. The test calls the function and passes the return value to Expect. When the comparison fails, the error shows the actual value the same way. You can use the same "write a failing test, copy the actual output" workflow as the preceding mongosh steps.

Running manually is useful when you want to explore or understand what an operation produces before you commit to a test structure. How you run the example depends on the language:

  • mongosh: Paste into a mongosh shell session.

  • JavaScript: Run the function in a Node script or REPL.

  • Python: Run the function in a Python script or REPL.

  • Go: Call the function from a main() function and run go run.

  • Java: Call the method from a main() method and run the program in your preferred IDE.

  • C#: Call the method from Program.cs and run dotnet run.

Every test falls into one of the following scenarios, depending on whether you're using sample data or creating custom data.

Use custom data when:

  • The docs page includes custom sample data as part of the example.

  • The example needs a specific schema or dataset.

Use sample data when:

  • You document a query pattern against a standard Atlas sample dataset (such as sample_mflix or sample_restaurants).

  • The docs page already references a sample dataset.

  • You do not need exact output, only the right shape.

Identifying which scenario applies before you write the test saves time.

Your example queries sample data and does not change anything. This is the simplest case: there is nothing to reverse.

// No need for any cleanup
it("finds movies by year", async () => {
  const result = await findMoviesByYear();
  Expect.that(result)
    .shouldResemble(
      "movies-by-year-output.json"
    )
    .withSchema({
      count: 8,
      requiredFields: [
        "title", "genres", "year"
      ],
    });
});

Considerations:

  • Use your language's sample data utility so the test skips gracefully if the dataset is not loaded. See Add Tests for Examples for details.

  • Use shouldMatch if the output is stable and predictable.

  • Use shouldResemble with withSchema if you care only about shape. See the Choose Between shouldMatch and shouldResemble section for details.

Your example modifies documents in a sample database. You must undo exactly what the example did. Think of it as a "reverse transaction."

// After each test, restore the original value
afterEach(async () => {
  const db = client.db("sample_mflix");
  await db.collection("movies").updateOne(
    { title: "The Godfather" },
    { $set: { rated: "R" } }
  );
});

it("updates a movie rating", async () => {
  const result = await updateMovieRating();
  Expect.that(result)
    .withIgnoredFields("upsertedId")
    .shouldMatch("update-rating-output.json");
});

Considerations:

You need to check the sample data before writing your cleanup to save the original values. Query the document, write down the fields your example changes, and restore those exact values in teardown.
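One way to make this systematic is a small helper that builds the restoring update from the captured document. The helper name buildRestoreUpdate is hypothetical, not part of Grove; it only sketches the "reverse transaction" idea:

```javascript
// Hypothetical helper: given the original document and the fields an
// example modifies, build the $set update that restores them in teardown.
function buildRestoreUpdate(originalDoc, changedFields) {
  const restore = {};
  for (const field of changedFields) {
    restore[field] = originalDoc[field];
  }
  return { $set: restore };
}

// Captured before the test runs (e.g., with findOne in beforeEach):
const original = { title: "The Godfather", rated: "R", year: 1972 };

// The example changes "rated", so teardown restores exactly that field:
const undo = buildRestoreUpdate(original, ["rated"]);
console.assert(JSON.stringify(undo) === JSON.stringify({ $set: { rated: "R" } }));

// In afterEach:
//   await db.collection("movies").updateOne({ title: "The Godfather" }, undo);
```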

Your example does not use sample data. It creates its own database with its own collections and data. This is the cleanest pattern. You control all the data, so you can drop the whole database after the test.

// After each test, drop the custom database
afterEach(async () => {
  await client.db("iot_db").dropDatabase();
});

it("computes average temperature", async () => {
  const result = await computeAvgTemperature();
  Expect.that(result)
    .shouldMatch("avg-temperature-output.json");
});

Use the following descriptions to guide your choice between shouldMatch and shouldResemble.

  • shouldMatch = "The output must look like this."

  • shouldResemble = "The output must have this shape."

The following decision tree can help you choose between the two methods:

Decision tree for choosing between shouldMatch and shouldResemble

Use shouldMatch when:

  • The output is predictable and stable across runs.

  • You want readers to see realistic output in the docs.

  • The exact values matter and are guaranteed to be consistent, such as when demonstrating an update result.

Expect.that(result)
  .shouldMatch("update-output.json");

Use shouldResemble when:

  • The output order or exact values can vary across environments, such as with vector search.

  • You care only that the right number of documents came back containing the right fields.

Expect.that(result)
  .shouldResemble("find-output.json")
  .withSchema({
    count: 8,
    requiredFields: [
      "title", "genres", "year"
    ],
  });

Start with shouldMatch. If the test is flaky (passes sometimes, fails sometimes), switch to shouldResemble.

Note

Queries With Limit and No Sort

A query that uses limit without an explicit sort can return different documents each run because MongoDB may sample a different subset. This commonly causes shouldMatch tests to pass locally but fail during the PR check. Either add an explicit sort to stabilize the results, or switch to shouldResemble.
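The effect is easy to reproduce in plain JavaScript. The arrays below are hypothetical stand-ins for the same documents returned in two different storage orders:

```javascript
// Hypothetical: the same three documents, returned in two different
// storage orders on two different runs.
const runA = [
  { title: "A", year: 2001 },
  { title: "B", year: 1999 },
  { title: "C", year: 2005 }
];
const runB = [runA[2], runA[0], runA[1]];

// "limit 2" without a sort: take the first two in whatever order they arrive.
const firstTwo = docs => docs.slice(0, 2).map(d => d.title);

// Without an explicit sort, the limited result depends on scan order.
console.assert(
  JSON.stringify(firstTwo(runA)) !== JSON.stringify(firstTwo(runB))
);

// With an explicit sort, the limited result is stable across runs.
const sortedByYear = docs => [...docs].sort((a, b) => a.year - b.year);
console.assert(
  JSON.stringify(firstTwo(sortedByYear(runA))) ===
  JSON.stringify(firstTwo(sortedByYear(runB)))
);
```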

The expected output file is where most debugging time goes. The following sections describe principles you can apply to your expected output files.

Never hand-write expected output from scratch. Always run the example first and copy the real output. Then, edit it to replace dynamic values. You can get the real output either by running the example by hand or by running a deliberately-failing test and copying the actual value from the error message. For details, see Capturing Output.

Both withIgnoredFields() and ellipsis patterns handle dynamic values, but they work differently.

  • withIgnoredFields("_id"): The field must exist in both actual and expected output, but the value comparison is skipped. Use this approach when the field is part of your example's story and you want readers to see it in the expected output:

    // Test ignores the value comparison:
    Expect.that(result)
      .withIgnoredFields("_id")
      .shouldMatch("output.json");
  • Ellipsis in the output file: You can also use ellipsis patterns directly in the output file instead of withIgnoredFields(). The comparison engine supports three forms:

    • "..." as a field value means "this field exists but skip the value comparison." Use this for fields with dynamic values such as _id or timestamps:

      { "_id": "...", "title": "The Godfather" }
    • Standalone ... as its own line in the output file enables global ellipsis mode. This allows extra fields in the actual output that are not listed in the expected file. Use this when you want to verify specific fields but do not care what else the document contains:

      {
        "title": "The Godfather",
        ...
      }
    • "...": "..." as a key-value pair is an object wildcard that matches any object. Use this when a nested object can have any structure:

      {
        "title": "The Godfather",
        "metadata": { "...": "..." }
      }

Either approach (withIgnoredFields or ellipsis patterns) works. Use whichever reads more clearly in context.
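To build intuition for how the three ellipsis forms behave, here is a deliberately simplified sketch of an ellipsis-tolerant comparison. This is not the Grove comparison engine, just the core idea in a few lines:

```javascript
// Simplified sketch of ellipsis-style matching (NOT the real Grove engine):
// - "..." as a value means "field must exist, skip the value comparison"
// - a "..." key means "extra fields in the actual output are allowed"
// - { "...": "..." } therefore matches any object
function resembles(expected, actual) {
  if (expected === "...") return actual !== undefined;
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object") return false;
    const keys = Object.keys(expected).filter(k => k !== "...");
    const allowExtra = "..." in expected;
    if (!allowExtra && Object.keys(actual).length !== keys.length) return false;
    return keys.every(k => resembles(expected[k], actual[k]));
  }
  return expected === actual;
}

// Value wildcard: _id exists but its value is not compared.
console.assert(resembles(
  { _id: "...", title: "The Godfather" },
  { _id: "65f1a2", title: "The Godfather" }
));

// Global ellipsis: extra fields such as year are allowed.
console.assert(resembles(
  { title: "The Godfather", "...": "..." },
  { title: "The Godfather", year: 1972 }
));

// Object wildcard: metadata may have any structure.
console.assert(resembles(
  { title: "The Godfather", metadata: { "...": "..." } },
  { title: "The Godfather", metadata: { source: "imdb", v: 2 } }
));

// A genuinely different document still fails.
console.assert(!resembles({ title: "The Godfather" }, { title: "Argo" }));
```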

Strategies for common value types:

  • _id (ObjectId): Use "..." or withIgnoredFields("_id").

  • Timestamps and dates: Use "..." or withIgnoredFields("updatedAt").

  • Counts, booleans, and enums: Hard-code them. These values are stable.

  • String fields from sample data: Hard-code them if they are core to the example.

  • Long text fields: Truncate by using an ellipsis, for example: "A young man...".

  • Fields you do not care about: Use standalone ... to allow extras.

For a quick reference of every supported wildcard pattern, see Grove Testing Cheat Sheet.

  • Format must match your language's output. mongosh uses single quotes and unquoted keys ({ title: 'Argo' }). Driver suites typically produce standard JSON ({ "title": "Argo" }). Match what your example returns.

    Tip

    MongoDB Shell Consideration

    mongosh output uses single quotes, no trailing commas, and unquoted object keys. Your expected output file must match this style exactly.

  • Array wrappers. Multi-document results from find() are wrapped in [...]. Single-document results from findOne() are not. Match what the driver or shell returns.

A test is idempotent if it produces the same result every time you run it: first run, tenth run, after another test, in CI on a machine you have never touched. If your test passes only under specific conditions, it is not a real test; it is a coincidence.

Almost every non-obvious test failure traces back to broken idempotency.

  • Passes the first time, fails the second time: The example changed data and cleanup did not fully revert it.

  • Fails in CI but passes locally: Your local database has state left over from a previous manual run.

  • Passes alone, fails when run with other tests: Another test left behind data or indexes that changed your query results.

  • Returns a different document order each run: The query plan or result set is non-deterministic. Common causes include an index created by a prior test changing the plan, a query with limit and no sort sampling a different subset, or approximate queries such as vector search returning different results each run.

  • Fails locally with topology or "not supported" errors: Your local MongoDB is a standalone instance. Some features (such as transactions and change streams) require a replica set.

Run your test twice in a row:

npm test -- -t "my test name" && \
npm test -- -t "my test name"

If the test does not pass both times, something in your test (usually cleanup) leaves the database in a different state than it started in.

Anything your example does that persists after the test ends causes non-idempotency. For example:

  • Modified documents: A field value differs from before.

  • Inserted documents: New documents appear in the collection.

  • Deleted documents: Expected documents are missing.

  • Created indexes: Query plans change, output order shifts.

  • Created collections or views: A createCollection call fails with "already exists" on the second run.

  • Changed configuration: Validation rules, collation settings.

If your example does any of these and you do not undo them, the next run starts from a different state and produces different results.

Every persistent side effect needs a matching undo. The goal is not "tidy up after yourself" in the abstract. The goal is to ensure that the next run sees the same database state as the first run.

This section describes how to write cleanup that runs reliably and leaves the database in a consistent state for subsequent tests.

If your test throws an error, the teardown hook still runs, but only if you set up your cleanup before the failure.

In driver suites, this is usually natural because cleanup lives in a framework hook (afterEach, tearDown, defer, @AfterEach, or [TearDown] depending on the framework) that registers before the test body runs. For the exact syntax per framework, see Grove Testing Cheat Sheet.

In mongosh, where cleanup is sometimes a per-test function, register it before calling Expect:

mongosh Example

// mongosh pattern: register cleanup first
test("my test", async () => {
  // 1. Register cleanup
  currentCleanup = myCleanupFunction;
  // 2. Run the test
  await Expect.outputFromExampleFiles(...);
});

Your cleanup might run when the database is in various states: the example succeeded, failed partway through, or did not run at all. Make each cleanup operation safe to run regardless of the database state:

Node.js Driver Example

// Node.js driver: each operation
// tolerates "not found"
afterEach(async () => {
  const db = client.db("sample_mflix");
  try {
    await db.collection("users")
      .dropIndex("email_1");
  } catch (e) {
    // Index may not exist if test failed
    // before creating it
  }
  await db.collection("users")
    .deleteMany({ testDoc: true });
});

PyMongo Driver Example

# PyMongo: same idea
def tearDown(self):
    db = self.client["sample_mflix"]
    try:
        db.users.drop_index("email_1")
    except Exception:
        pass
    db.users.delete_many({"testDoc": True})

Your cleanup does not serve only your test. It protects every test that runs after yours. Consider the following:

  • If another test queries sample_mflix.movies, does it find unexpected documents that you inserted?

  • If another test creates an index on users.email, does it fail because you left a duplicate index behind?

  • If another test reads a specific document, does it see the original values or your modified ones?

  • Updates a document: Restore the original field values.

  • Inserts documents: Delete the inserted documents.

  • Deletes documents: Re-insert them (capture them in setup first).

  • Creates an index: Drop the index (wrap the call in try/catch).

  • Creates a collection or view: Drop it.

  • Creates a custom database: Call dropDatabase() (only for custom databases).
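The "deletes documents" case is the least obvious, because the data is gone by the time teardown runs. The capture-then-restore pattern can be sketched with an in-memory stand-in (plain JavaScript, no driver calls):

```javascript
// In-memory stand-in for a collection, illustrating capture-then-restore
// for an example that deletes documents.
let collection = [
  { title: "The Godfather", year: 1972 },
  { title: "Argo", year: 2012 }
];

// Setup: capture the documents the example is about to delete.
const captured = collection.filter(d => d.year === 1972);

// Example under test: the equivalent of deleteMany({ year: 1972 }).
collection = collection.filter(d => d.year !== 1972);
console.assert(collection.length === 1);

// Teardown: re-insert the captured documents so the next run
// starts from the same state as the first.
collection = collection.concat(captured);
console.assert(collection.length === 2);
```

Against a real database, the capture step is a find().toArray() in setup and the restore step is an insertMany() in teardown, guarded so it is safe when nothing was captured.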

Sometimes your example needs to set up data before querying it. How you handle this depends on the suite.

Driver examples are functions, so you can include setup and query logic in the same function. You can use Bluehawk's :remove: tags to exclude setup from the extracted snippet if it does not need to be included in the documentation:

Node.js Driver Example

// :snippet-start: avg-temperature
export async function avgTemperature() {
  const client = new MongoClient(
    process.env.CONNECTION_STRING
  );
  try {
    const db = client.db("iot_db");
    // :remove-start:
    await db.collection("sensors").insertMany([
      { sensorId: "s1", temperature: 22.5 },
      { sensorId: "s2", temperature: 25.3 },
      { sensorId: "s3", temperature: 18.7 }
    ]);
    // :remove-end:
    const result = await db
      .collection("sensors")
      .aggregate([
        {
          $group: {
            _id: null,
            avgTemp: { $avg: "$temperature" }
          }
        }
      ]).toArray();
    return result;
  } finally {
    await client.close();
  }
}
// :snippet-end:
  • Option A: Separate load and query files. The test runs them in sequence. Use this approach when both steps need to appear in the documentation:

    await Expect
      .outputFromExampleFiles([
        "topic/load-data.js",
        "topic/run-pipeline.js"
      ])
      .withDbName("iot_db")
      .shouldMatch("topic/output.sh");
  • Option B: Comma operator in a single file. The harness captures only the last expression's result:

    // :snippet-start: update-and-verify
    (
      db.movies.updateOne(
        { title: "The Godfather" },
        { $set: { rated: "PG-13" } }
      ),
      db.movies.findOne(
        { title: "The Godfather" },
        { title: 1, rated: 1 }
      )
    )
    // :snippet-end:

    Note

    Comma Operator is MongoDB Shell-Specific

    The comma operator pattern applies only to mongosh. Driver suites do not need it because they run multiple operations in a single function.

Index tests are uniquely tricky because indexes persist across test runs and change query behavior.

Your test creates an index. The test passes. You run it again, and now the index already exists. Either:

  • The createIndex call returns a different result ("index already exists" instead of "created").

  • A subsequent query uses the index and returns results in a different order.

Clean up indexes in both setup and teardown:

Node.js Driver Example

// Node.js driver: beforeEach + afterEach
const cleanupIndexes = async () => {
  try {
    await collection.dropIndex("email_1");
  } catch (e) { /* may not exist */ }
};

beforeEach(cleanupIndexes);
afterEach(cleanupIndexes);

PyMongo Driver Example

# PyMongo: setUp + tearDown
def _cleanup_indexes(self):
    try:
        self.collection.drop_index("email_1")
    except Exception:
        pass

def setUp(self):
    self._cleanup_indexes()

def tearDown(self):
    self._cleanup_indexes()

Setup handles the case where a previous test run crashed before teardown ran. Teardown handles the normal case so the next test starts clean.

When a test fails, work through this checklist.

The comparison engine produces detailed diffs. Look for the following:

  • "Expected N documents but got M": Your query returned a different number of results. Run the example by hand to see why.

  • "Field X: expected Y but got Z": A specific value did not match. Check if your cleanup restores the original value.

  • "Missing field X": The document structure does not match your expected output. Check the query projection.

Run the example outside the test framework. Compare the actual output to your expected output file. This usually reveals the mismatch immediately.

Run your test twice in a row. If the test passes once and fails the second time, your test is not idempotent. Cleanup is incomplete. Something your example changed is not reverted. See Idempotency for a list of things that might persist between runs.

If your output order changed, an index might affect the query plan. Check what indexes exist on the collection.

When you get a syntax error in mongosh, the error message shows the generated temp file content. This reveals how the harness wrapped your code. Look for mismatched parentheses or unexpected printjson() wrapping.

The Grove testing infrastructure does not support all MongoDB features and environments. Consider the following limitations when you write tests:

  • Upcoming server versions: You cannot test examples that require features available only in unreleased MongoDB server versions. Tests run against the server versions available in CI.

  • Sharding: Tests run against a replica set, not a sharded cluster. Examples that demonstrate sharding features such as shard keys, chunk migration, or sh helper commands cannot be verified.

  • Stream processing: Examples that demonstrate Atlas Stream Processing cannot be tested in the current infrastructure.

For examples that fall into one of these categories, skip the automated test and note in your PR that the example is not covered by Grove. Include a manual verification step in the PR description so reviewers know the example was exercised before the docs change merges.

Does it prove something:

  • Test verifies the teaching point of the example, not that the code ran.

  • If the core operation silently failed, the test catches it.

  • shouldResemble schemas include requiredFields or fieldValues, not only count.

Does it work:

  • Example code runs successfully outside the test framework.

  • Expected output was captured from a real run, not hand-written.

  • Dynamic values use "..." or withIgnoredFields().

  • Test passes on first run.

  • Test passes on second consecutive run (idempotency check).

Is it clean:

  • Cleanup reverts all changes to sample data.

  • Bluehawk snippet tags produce valid standalone code.

  • node snip.js runs without errors.

For a list of common mistakes to avoid and a glossary of Grove terms, see Grove Testing Cheat Sheet.
