Generate Synthetic Data

You can generate synthetic data that aligns with your real data's schema. Synthetic data is useful for testing and development environments.

This tutorial connects to an Atlas cluster and generates synthetic data using Node.js and faker.js. To learn more, see the Faker JS docs.

To create synthetic data by using the steps in this tutorial, you must:

  • Create an Atlas cluster to load data into. To learn more, see Create a Database Deployment.

  • Install Node.js, npm, and the MongoDB Node.js Driver.

  • Install faker.js:

    npm install --save-dev @faker-js/faker

Follow these steps to generate synthetic data in your cluster:

1
  1. Run the following commands to create the directory for the app and navigate to it:

    mkdir syntheticdata
    cd syntheticdata
  2. Run the following command to initialize your project and link it to npm:

    npm init

    Press Enter to accept all default values except for entry point: (index.js). When the terminal prompts for entry point: (index.js), enter this text and press Enter:

    myapp.js

    Continue to accept all default values and type Yes when prompted.

  3. Run the following command to install express, a web application framework:

    npm install express --save
  4. In the directory that you created, create a file named myapp.js.

2

In the myapp.js file, add the following code. Replace the following placeholder values with your values and save the contents of the file:

  • <YOUR-ATLAS-URI>: The connection string for your Atlas cluster. To learn how to find your connection string, see Find Your MongoDB Atlas Connection String.

  • <DATABASE-NAME>: Name of the database to create in Atlas.

  • <COLLECTION-NAME>: Name of the collection to create in Atlas.

// Require the necessary libraries
const { faker } = require("@faker-js/faker");
const { MongoClient } = require("mongodb");

// Return a random integer between min and max, both included
function randomIntFromInterval(min, max) {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

async function seedDB() {
  // Connection URL
  const uri = "<YOUR-ATLAS-URI>";
  const client = new MongoClient(uri);

  try {
    await client.connect();
    console.log("Connected correctly to server");

    const collection = client.db("<DATABASE-NAME>").collection("<COLLECTION-NAME>");

    // Build 5,000 documents of time series data
    const timeSeriesData = [];
    for (let i = 0; i < 5000; i++) {
      const firstName = faker.person.firstName();
      const lastName = faker.person.lastName();
      const newDay = {
        timestamp_day: faker.date.past(),
        cat: faker.lorem.word(),
        owner: {
          email: faker.internet.email({ firstName, lastName }),
          firstName,
          lastName,
        },
        events: [],
      };

      // Pick the event count once so the loop bound stays fixed between iterations
      const eventCount = randomIntFromInterval(1, 6);
      for (let j = 0; j < eventCount; j++) {
        newDay.events.push({
          timestamp_event: faker.date.past(),
          weight: randomIntFromInterval(14, 16),
        });
      }
      timeSeriesData.push(newDay);
    }

    await collection.insertMany(timeSeriesData);
    console.log("Database seeded with synthetic data! :)");
  } catch (err) {
    console.log(err.stack);
  } finally {
    // Close the connection so the Node.js process can exit
    await client.close();
  }
}

seedDB();

For example, your code may include the following line, which specifies a database named synthetic-data-db and a collection named synthetic-data-collection:

const collection = client.db("synthetic-data-db").collection("synthetic-data-collection");
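If you prefer not to hardcode the placeholder values, one option is to read them from environment variables. The following is a minimal sketch, not part of the original tutorial: the function name loadSeedConfig and the variable names ATLAS_URI, SEED_DB_NAME, and SEED_COLLECTION_NAME are hypothetical, and the defaults reuse the example names above.

```javascript
// Sketch only: the function and environment variable names are hypothetical.
// Neither Atlas nor the MongoDB Node.js Driver defines them.
function loadSeedConfig(env = process.env) {
  const uri = env.ATLAS_URI;
  if (!uri) {
    // Fail early with a clear message instead of a connection error later
    throw new Error("Set ATLAS_URI to your Atlas connection string");
  }
  return {
    uri,
    dbName: env.SEED_DB_NAME || "synthetic-data-db",
    collectionName: env.SEED_COLLECTION_NAME || "synthetic-data-collection",
  };
}
```

In myapp.js, you could then call loadSeedConfig() and pass uri to new MongoClient(uri), and dbName and collectionName to client.db(dbName).collection(collectionName).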

This code creates a collection of time series data about cats, adds the following fields to each document, and populates the fields with synthetic data from faker.js:

  • timestamp_day

  • cat

  • owner.email

  • owner.firstName

  • owner.lastName

  • events

You can replace the fields and values in the code with fields and values that align with your data. To learn more about available fields in faker.js, see the Faker API Reference.
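As an illustration of swapping in your own fields, the following sketch builds an order-style document instead of a cat document. The function name buildOrderDocument and every field name in it are hypothetical. It assumes Faker version 8 or later, the same API generation as the faker.person calls in myapp.js, and takes the faker instance as a parameter so that you can pass in the one you already required.

```javascript
// Sketch only: the document shape and field names are hypothetical.
// Pass in the faker instance, for example:
//   const { faker } = require("@faker-js/faker");
function buildOrderDocument(faker) {
  return {
    ordered_at: faker.date.recent(),
    product: faker.commerce.productName(),
    quantity: faker.number.int({ min: 1, max: 5 }),
    customer: {
      firstName: faker.person.firstName(),
      lastName: faker.person.lastName(),
    },
  };
}
```

To use it in myapp.js, you could replace the body of the outer for loop with timeSeriesData.push(buildOrderDocument(faker)).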

3

Run the following command in the terminal to run your app:

node myapp.js

The app generates 5,000 documents that reflect the data pattern in myapp.js.

If the app does not exit on its own after it prints the success message, press CTRL + C to stop it.

4

To find the documents that the app creates, navigate to the new collection in the Atlas UI:

  1. Click Database in the left navigation.

  2. Click the name of the cluster to which you connected the app.

  3. Click Collections.

  4. Expand the name of the database you created and click the name of the collection you created. Your synthetic data displays.
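As an alternative to the Atlas UI, you could check the document count from Node.js. The following is a minimal sketch: countSeededDocuments is a hypothetical helper name, and it relies on the countDocuments() method of the MongoDB Node.js Driver's Collection object.

```javascript
// Sketch only: countSeededDocuments is a hypothetical helper name.
// `collection` is a MongoDB Node.js Driver Collection, such as the one
// returned by client.db(...).collection(...) in myapp.js.
async function countSeededDocuments(collection, expected = 5000) {
  const count = await collection.countDocuments();
  if (count < expected) {
    console.warn(`Found ${count} documents, expected at least ${expected}`);
  }
  return count;
}
```

One place to call it is inside the try block of seedDB(), right after insertMany() resolves, for example console.log(await countSeededDocuments(collection)).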
