Generate Synthetic Data
You can generate synthetic data that aligns with your real data's schema. Synthetic data is useful for testing and development environments.
This tutorial connects to an Atlas cluster and generates synthetic data using Node.js and faker.js. To learn more, see the Faker JS docs.
Prerequisites
To create synthetic data by using the steps in this tutorial, you must:
Create an Atlas cluster to load data into. To learn more, see Create a Cluster.
Install faker.js:
npm install --save-dev @faker-js/faker
Procedure
Follow these steps to generate synthetic data in your cluster:
Create your Node.js app.
Run the following commands to create and navigate to the directory for the app:
mkdir syntheticdata
cd syntheticdata
Run the following command to initialize your project with npm, which creates a package.json file:
npm init
Press Enter to accept all default values except for entry point: (index.js). When the terminal returns entry point: (index.js), enter this text and press Enter:
myapp.js
Continue to accept all default values and type Yes when prompted.
Run the following command to install express, a web application framework:
npm install express --save
Run the following command to install the MongoDB Node.js driver, which myapp.js loads with require("mongodb"). If you installed faker.js in a different directory, also run npm install --save-dev @faker-js/faker here.
npm install mongodb
In the directory that you created, create a file named myapp.js.
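Before you add the seeding code, you can optionally confirm that Node.js can resolve the packages that myapp.js loads. The following is a minimal sketch, not part of the official procedure: the file name check-deps.js and the function findMissingPackages are hypothetical. It relies only on Node's built-in require.resolve, which throws when a module can't be resolved, and assumes the CommonJS module style that myapp.js uses.

```javascript
// check-deps.js (hypothetical helper): reports which packages Node.js cannot
// resolve from the current directory. Uses only the built-in require.resolve.
function findMissingPackages(names) {
  const missing = [];
  for (const name of names) {
    try {
      // Throws (for example with code "MODULE_NOT_FOUND") if the package
      // can't be resolved, which means require() in myapp.js would fail too.
      require.resolve(name);
    } catch (err) {
      missing.push(name);
    }
  }
  return missing;
}

// The packages that myapp.js requires.
const missing = findMissingPackages(["@faker-js/faker", "mongodb"]);
if (missing.length > 0) {
  console.log(`Missing packages: ${missing.join(", ")}. Install them with npm install.`);
} else {
  console.log("All required packages are installed.");
}
```

Run it from the syntheticdata directory with node check-deps.js. Built-in modules such as fs always resolve, so only third-party packages can appear in the missing list.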
Add the code to generate synthetic data.
In the myapp.js file, add the following code. Replace the following placeholder values with your values, and save the contents of the file:
<YOUR-ATLAS-URI>: The connection string for your Atlas cluster. To learn how to find your connection string, see Find Your MongoDB Atlas Connection String.
<DATABASE-NAME>: The name of the database to create in Atlas.
<COLLECTION-NAME>: The name of the collection to create in Atlas.
// Require the necessary libraries
const { faker } = require("@faker-js/faker");
const { MongoClient } = require("mongodb");

// Returns a random integer between min and max, inclusive
function randomIntFromInterval(min, max) {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

async function seedDB() {
  // Connection URL
  const uri = "<YOUR-ATLAS-URI>";
  const client = new MongoClient(uri);

  try {
    await client.connect();
    console.log("Connected correctly to server");

    const collection = client.db("<DATABASE-NAME>").collection("<COLLECTION-NAME>");

    // Make a bunch of time series data
    const timeSeriesData = [];

    for (let i = 0; i < 5000; i++) {
      const firstName = faker.person.firstName();
      const lastName = faker.person.lastName();
      const newDay = {
        timestamp_day: faker.date.past(),
        cat: faker.lorem.word(),
        owner: {
          email: faker.internet.email({ firstName, lastName }),
          firstName,
          lastName,
        },
        events: [],
      };

      // Draw the event count once so each document gets 1 to 6 events
      const eventCount = randomIntFromInterval(1, 6);
      for (let j = 0; j < eventCount; j++) {
        const newEvent = {
          timestamp_event: faker.date.past(),
          weight: randomIntFromInterval(14, 16),
        };
        newDay.events.push(newEvent);
      }
      timeSeriesData.push(newDay);
    }

    await collection.insertMany(timeSeriesData);
    console.log("Database seeded with synthetic data! :)");
  } catch (err) {
    console.error(err.stack);
  } finally {
    // Close the connection so the Node.js process can exit
    await client.close();
  }
}

seedDB();
For example, your code might include the following line, which specifies a database named synthetic-data-db and a collection named synthetic-data-collection:
const collection = client.db("synthetic-data-db").collection("synthetic-data-collection");
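This code inserts 5,000 time-series-style documents about cats. If the collection doesn't exist, insertMany creates it as a standard collection, not a MongoDB time series collection. The code adds the following fields to each document and populates them with synthetic data from faker.js: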
timestamp_day
cat
owner.email
owner.firstName
owner.lastName
events (an array of 1 to 6 objects, each with timestamp_event and weight fields)
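If you change the generator, a quick shape check can catch mistakes before insertMany writes anything to Atlas. The following minimal sketch checks one document against the fields listed above, assuming events holds objects with timestamp_event and weight as in the sample code. The function name isCatDay is hypothetical, and the checks are deliberately loose (non-empty strings, valid Date objects, integer weights).

```javascript
// Hypothetical shape check for one generated document.
function isCatDay(doc) {
  const isDate = (v) => v instanceof Date && !Number.isNaN(v.getTime());
  const isString = (v) => typeof v === "string" && v.length > 0;

  if (!doc || !isDate(doc.timestamp_day) || !isString(doc.cat)) return false;

  const owner = doc.owner;
  if (!owner || !isString(owner.email) || !isString(owner.firstName) || !isString(owner.lastName)) {
    return false;
  }

  // The sample code pushes at least one event per document.
  if (!Array.isArray(doc.events) || doc.events.length === 0) return false;
  return doc.events.every(
    (e) => e !== null && typeof e === "object" && isDate(e.timestamp_event) && Number.isInteger(e.weight)
  );
}

// Example: validate every document before calling insertMany.
// const invalid = timeSeriesData.filter((d) => !isCatDay(d));
// if (invalid.length > 0) throw new Error(`${invalid.length} invalid documents`);
```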
You can replace the fields and values in myapp.js with fields and values that align with your data. To learn more about the data that faker.js can generate, see the Faker API Reference.
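One way to adapt the generator to your own schema is to keep the document structure in one place and pass in a function that produces each field. The following minimal sketch is dependency-free so that it runs on its own: the function generateDocuments, the field names device_id and readings, and the value generators are hypothetical stand-ins for your fields and for faker.js calls such as faker.lorem.word() or faker.date.past(). It draws the length of the nested array once per document, because calling randomIntFromInterval inside a loop condition would draw a new bound on every iteration.

```javascript
// Returns an integer between min and max, inclusive (same helper as in myapp.js).
function randomIntFromInterval(min, max) {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

// Builds `count` documents. `fields` maps each top-level field name to a
// zero-argument function; `nested` describes one array field whose length
// is drawn once per document.
function generateDocuments(fields, nested, count) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    const doc = {};
    for (const [name, makeValue] of Object.entries(fields)) {
      doc[name] = makeValue();
    }
    // Draw the array length once, not in a loop condition.
    const length = randomIntFromInterval(nested.min, nested.max);
    doc[nested.name] = Array.from({ length }, () => nested.makeItem());
    docs.push(doc);
  }
  return docs;
}

// Hypothetical schema: replace these stand-ins with faker.js calls in myapp.js.
const sensorDocs = generateDocuments(
  {
    device_id: () => `device-${randomIntFromInterval(1, 50)}`,
    timestamp_day: () => new Date(Date.now() - randomIntFromInterval(0, 365) * 86400000),
  },
  {
    name: "readings",
    min: 1,
    max: 6,
    makeItem: () => ({ value: randomIntFromInterval(14, 16) }),
  },
  100
);
console.log(sensorDocs.length, sensorDocs[0]);
```

In myapp.js, you would pass the returned array to collection.insertMany in place of timeSeriesData.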
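Run the app.
From the syntheticdata directory, run the following command:
node myapp.js
When the script finishes, it prints Database seeded with synthetic data! :)
In Atlas, go to the Clusters page for your project.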
If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.
The Clusters page displays.
Go to the Collections page.
Click the Browse Collections button for your cluster.
The Data Explorer displays.