How to Use Python with MongoDB

Python, the top programming language for data science, and MongoDB, with its flexible and dynamic schema, are a fantastic match for building modern web applications, JSON APIs, and data processors, just to name a few. MongoDB has a native Python driver and a team of engineers dedicated to making sure MongoDB and Python work together flawlessly.

What is Python?

Python, the Swiss Army knife of today’s dynamically typed languages, has comprehensive support for common data manipulation and processing tasks, which makes it one of the best programming languages for data science and web development. Python’s native dictionary and list data types make it second only to JavaScript for manipulating JSON documents — and well-suited to working with BSON. PyMongo, the standard MongoDB driver library for Python, is easy to use and offers an intuitive API for accessing databases, collections, and documents.

Objects retrieved from MongoDB through PyMongo are compatible with dictionaries and lists, so we can easily manipulate, iterate, and print them.
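
As a small illustration of that compatibility, the sketch below works on a plain Python dictionary shaped like a document PyMongo might return. The document contents, including the `tags` field, are made up for this example, and no database connection is involved.

```python
# A hand-written stand-in for a document returned by PyMongo (hypothetical data).
document = {
    "_id": "BF00001CFOOD",
    "item_name": "Bread",
    "quantity": 2,
    "tags": ["bakery", "fresh"],  # hypothetical list field
}

# Read and update fields with ordinary dictionary syntax.
document["quantity"] += 1

# Nested lists behave like any other Python list.
document["tags"].append("sliced")

# Iterate over field names and values.
for field, value in document.items():
    print(f"{field}: {value}")
```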


How MongoDB stores data

MongoDB stores data in JSON-like documents:

# Mongodb document (JSON-style)
document_1 = {
  "_id" : "BF00001CFOOD",
  "item_name" : "Bread",
  "quantity" : 2,
  "ingredients" : "all-purpose flour"
}

Python dictionaries look like:

# python dictionary
dict_1 = {
  "item_name" : "blender",
  "max_discount" : "10%",
  "batch_number" : "RR450020FRG",
  "price" : 340
}
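
The similarity can be checked with the standard library alone. This sketch does not touch MongoDB: it round-trips the dictionary above through JSON text, then shows one reason BSON is useful on top of plain JSON, namely that JSON has no date type while BSON does. The `added_on` field is a hypothetical addition for illustration.

```python
import json
from datetime import datetime, timezone

dict_1 = {
    "item_name": "blender",
    "max_discount": "10%",
    "batch_number": "RR450020FRG",
    "price": 340,
}

# A dictionary of strings and numbers survives a JSON round trip unchanged.
as_json = json.dumps(dict_1)
round_tripped = json.loads(as_json)

# Plain JSON cannot represent a datetime, so json.dumps raises TypeError.
with_date = {**dict_1, "added_on": datetime(2021, 7, 13, tzinfo=timezone.utc)}
try:
    json.dumps(with_date)
    json_handles_dates = True
except TypeError:
    json_handles_dates = False

print(round_tripped == dict_1, json_handles_dates)
```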

Read on for an overview of how to get started and deliver on the potential of this powerful combination.

Prerequisites

Download and install Python on your machine. To confirm that the installation succeeded, run python --version in your terminal. You should get something similar to:

Python 3.9.12

You can follow the Python MongoDB examples in this tutorial even if you are new to Python.

We recommend that you set up a MongoDB Atlas free tier cluster for this tutorial.

Connecting Python and MongoDB Atlas

PyMongo has a set of packages for Python MongoDB interaction. For the following tutorial, start by creating a virtual environment, and activate it.

python -m venv env
source env/bin/activate

Now that you are in your virtual environment, you can install PyMongo. In your terminal, type:

python -m pip install "pymongo[srv]"

Now, we can use PyMongo as a Python MongoDB library in our code with an import statement.


Creating a MongoDB database in Python

The first step to connect Python to Atlas is to create a cluster. You can follow the instructions from the documentation to learn how to create and set up your cluster.

Next, create a file named pymongo_get_database.py in any folder to write PyMongo code. You can use any simple text editor, like Visual Studio Code.

Create the MongoDB client by adding the following:

from pymongo import MongoClient


def get_database():
    # Provide the MongoDB Atlas URL to connect Python to MongoDB using PyMongo
    CONNECTION_STRING = "mongodb+srv://user:pass@cluster.mongodb.net/myFirstDatabase"

    # Create a connection using MongoClient. You can import MongoClient or use pymongo.MongoClient
    client = MongoClient(CONNECTION_STRING)

    # Return the database for our example (we will use the same database throughout the tutorial)
    return client['user_shopping_list']


# This guard is added so that many files can import and reuse get_database()
if __name__ == "__main__":
    # Get the database
    dbname = get_database()

To create a MongoClient, you will need a connection string to your database. If you are using Atlas, you can follow the steps from the documentation to get that connection string. Pass the connection string to MongoClient to create the client and get the MongoDB database connection. Replace the username, password, and cluster name in the example with your own.
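
Usernames and passwords that contain characters such as @, : or / need to be percent-escaped before they go into a connection string. The sketch below shows one way to do that with the standard library, reading the credentials from environment variables instead of hard-coding them. The helper name `build_connection_string` and the environment variable names are assumptions made for this example, and the fallback values are placeholders.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(username, password, host):
    """Build a mongodb+srv URI with the credentials percent-escaped."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


# MONGODB_USER, MONGODB_PASSWORD and MONGODB_HOST are hypothetical variable names.
uri = build_connection_string(
    os.environ.get("MONGODB_USER", "user"),
    os.environ.get("MONGODB_PASSWORD", "p@ss/word"),
    os.environ.get("MONGODB_HOST", "cluster.mongodb.net"),
)
print(uri)
```

The resulting string can be passed to MongoClient in place of the hard-coded CONNECTION_STRING.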

In this python mongodb tutorial, we will create a shopping list and add a few items. For this, we created a database user_shopping_list.

MongoDB doesn’t create a database until you have collections and documents in it. So, let’s create a collection next.


Creating a collection in Python

To create a collection, pass the collection name to the database. In a new file called pymongo_test_insert.py, add the following code.

# Get the database using the method we defined in the pymongo_get_database file
from pymongo_get_database import get_database
dbname = get_database()
collection_name = dbname["user_1_items"]

This gives us a collection named user_1_items in the user_shopping_list database. As with the database, MongoDB creates the collection once the first document is inserted into it.


Inserting documents in Python

To insert many documents at once, use PyMongo’s insert_many() method.

item_1 = {
  "_id" : "U1IT00001",
  "item_name" : "Blender",
  "max_discount" : "10%",
  "batch_number" : "RR450020FRG",
  "price" : 340,
  "category" : "kitchen appliance"
}

item_2 = {
  "_id" : "U1IT00002",
  "item_name" : "Egg",
  "category" : "food",
  "quantity" : 12,
  "price" : 36,
  "item_description" : "brown country eggs"
}
collection_name.insert_many([item_1,item_2])

Let’s insert a third document without specifying the _id field. This time, we add a field of data type ‘date’. PyMongo stores Python datetime objects as dates, so we use the Python dateutil package to parse one from a string.

Start by installing the package using the following command:


python -m pip install python-dateutil

Add the following to pymongo_test_insert.py:

from dateutil import parser
expiry_date = '2021-07-13T00:00:00.000Z'
expiry = parser.parse(expiry_date)
item_3 = {
  "item_name" : "Bread",
  "quantity" : 2,
  "ingredients" : "all-purpose flour",
  "expiry_date" : expiry
}
collection_name.insert_one(item_3)

We use the insert_one() method to insert a single document.
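
When the timestamp format is fixed and known in advance, the standard library can parse it as well, which avoids the extra dependency. This is a sketch of an alternative, not a replacement for the dateutil approach above; it assumes the string always has exactly this shape.

```python
from datetime import datetime, timezone

expiry_date = '2021-07-13T00:00:00.000Z'

# %f matches the fractional seconds and %z accepts the trailing "Z" as UTC.
expiry = datetime.strptime(expiry_date, "%Y-%m-%dT%H:%M:%S.%f%z")

print(expiry.isoformat())
```

Either way, the value stored in the document is a Python datetime object.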

Open the command line and navigate to the folder where you have saved pymongo_test_insert.py.

Execute the file using the following command:

python pymongo_test_insert.py

Let’s connect to MongoDB Atlas UI and check what we have so far.

Log in to your Atlas cluster and click on the collections button.

On the left side, you can see the database and collection name that we created. If you click on the collection name, you can view the data as well:

view of the database and collection name

view of data on click

The _id field is of ObjectId type by default. If we don’t specify the _id field, one is generated automatically. Not all fields present in one document are present in the others, but MongoDB doesn’t stop you from entering data: this is the essence of a schemaless database.

If we insert item_3 again, MongoDB will insert a new document with a new _id value. However, inserting item_1 and item_2 again will raise an error, because the _id field is the unique identifier and those two values already exist in the collection.


Querying in Python

Let’s view all the documents together using find(). For that, we will create a separate file pymongo_test_query.py:

# Get the database using the method we defined in the pymongo_get_database file
from pymongo_get_database import get_database
dbname = get_database()
 
# Get the collection we created earlier
collection_name = dbname["user_1_items"]
 
item_details = collection_name.find()
for item in item_details:
   # This does not give a very readable output
   print(item)

Open the command line and navigate to the folder where you have saved pymongo_test_query.py. Execute the file using the python pymongo_test_query.py command.

We get a list of dictionary objects as the output:

dictionary list

We can view the data but the format is not all that great. So, let’s print the item names and their category by replacing the print line with the following:

print(item['item_name'], item['category'])

Although MongoDB returns all the documents, we get a Python ‘KeyError’ on the third document, because it has no category field.

Python KeyError

To handle missing data in Python, you can use pandas DataFrames. DataFrames are 2D data structures used for data processing tasks. PyMongo’s find() method returns a cursor of dictionary objects, which can be converted into a DataFrame in a single line of code.

Install the pandas library:

python -m pip install pandas

Now import the pandas library by adding the following line at the top of the file:

from pandas import DataFrame

Then replace the for loop with the following to handle the KeyError in one step:

# convert the dictionary objects to dataframe
items_df = DataFrame(item_details)

# see the magic
print(items_df)

Instead of raising errors, the missing values are shown as NaN (or NaT for dates).

NaN and NaT for the missing values.
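
When a full DataFrame is more than the task needs, plain Python can also sidestep the KeyError. The sketch below uses dict.get with a fallback value; the three dictionaries are abridged, hand-written stand-ins for the documents inserted earlier, and the `describe` helper and fallback labels are assumptions for illustration.

```python
# Hand-written stand-ins for the three inserted documents (fields abridged).
items = [
    {"item_name": "Blender", "category": "kitchen appliance"},
    {"item_name": "Egg", "category": "food"},
    {"item_name": "Bread"},  # no "category" field, like item_3
]


def describe(item):
    # dict.get returns the fallback instead of raising KeyError.
    name = item.get("item_name", "unknown")
    category = item.get("category", "uncategorized")
    return f"{name} ({category})"


lines = [describe(item) for item in items]
for line in lines:
    print(line)
```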

Indexing in Python MongoDB

The number of documents and collections in a real-world database always keeps increasing. It can take a very long time to search for specific documents — for example, documents that have “all-purpose flour” among their ingredients — in a very large collection. Indexes make database search faster and more efficient, and reduce the cost of querying on operations such as sort, count, and match.

MongoDB defines indexes at the collection level.
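
To build some intuition for why an index reduces the work a query does, here is a conceptual analogy in plain Python. It is not how MongoDB implements indexes (those are B-tree structures maintained by the server); it only contrasts checking every document with looking values up in a prebuilt mapping. The documents and function names are hypothetical.

```python
from collections import defaultdict

# Fourteen hypothetical documents, five of them in the "food" category.
documents = (
    [{"item_name": f"food_{i}", "category": "food"} for i in range(5)]
    + [{"item_name": f"other_{i}", "category": "kitchen appliance"} for i in range(9)]
)


def full_scan(docs, category):
    """Check every document, as a query without an index must."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get("category") == category:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each value of `field` to the documents that hold it."""
    index = defaultdict(list)
    for doc in docs:
        index[doc.get(field)].append(doc)
    return index


def indexed_lookup(index, value):
    """Jump straight to the matching documents; only those are examined."""
    matches = index.get(value, [])
    return matches, len(matches)


_, scanned_without_index = full_scan(documents, "food")
_, scanned_with_index = indexed_lookup(build_index(documents, "category"), "food")
print(scanned_without_index, scanned_with_index)
```

Building the mapping has its own cost, which mirrors the trade-off of a real index: it takes extra storage and must be kept up to date on every write.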

For the index to make more sense, add more documents to our collection. Insert many documents at once using the insert_many() method. For sample documents, copy the code from GitHub and execute python pymongo_test_insert_more_items.py in your terminal.

Let’s say we want the items that belong to the category ‘food’:

item_details = collection_name.find({"category" : "food"})

To execute the above query, MongoDB has to scan all the documents. To verify this, download Compass. Connect to your cluster using the connection string. Open the collection and go to the Explain Plan tab. In ‘filter’, give the above criteria and view the results:

Query results without index

Note that the query scans 14 documents to get five results.
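
The same figures can also be read from Python, since a PyMongo cursor has an explain() method. The helper below is a sketch that pulls the headline numbers out of such a result. It assumes the output contains an executionStats section, which depends on the explain verbosity and server version, and the sample dictionary is a trimmed, hand-written stand-in rather than real server output. The name `summarize_explain` is hypothetical.

```python
def summarize_explain(explain_output):
    """Pull the headline numbers out of an explain() result."""
    stats = explain_output["executionStats"]
    return {
        "returned": stats["nReturned"],
        "docs_examined": stats["totalDocsExamined"],
        "keys_examined": stats["totalKeysExamined"],
    }


# With a live connection this dictionary would come from:
#     collection_name.find({"category": "food"}).explain()
# Here a hand-written stand-in mirrors the numbers quoted above.
sample_explain = {
    "executionStats": {
        "nReturned": 5,
        "totalDocsExamined": 14,
        "totalKeysExamined": 0,
    }
}

summary = summarize_explain(sample_explain)
print(summary)
```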

Let's create a single index on the ‘category’ field. In a new file named pymongo_index.py, add the following code.

# Get the database using the method we defined in the pymongo_get_database file
from pymongo_get_database import get_database
dbname = get_database()
 
# Get the collection we created earlier
collection_name = dbname["user_1_items"]
 
# Create an index on the collection
category_index = collection_name.create_index("category")

Run Explain on the same filter again in the Compass UI:

Query results with index

This time, only five documents are scanned because of the category index. We don’t see a significant difference in execution time because of the small number of documents. But we see a huge reduction in the number of documents scanned for the query. Indexes help in performance optimization for aggregations, as well. Aggregations are out of scope for this tutorial, but here’s an overview.

Conclusion

In this Python MongoDB tutorial, we learned the basics of PyMongo and performed simple database operations. As a next step, explore using PyMongo to perform CRUD operations with business data. If you did not work along with this tutorial, start now by signing up for MongoDB Atlas for free. There is also a course available on that specific topic at MongoDB University.

Ready to get started?

Launch a new cluster or migrate to MongoDB Atlas with zero downtime and connect to Python today.

FAQ

How do you connect MongoDB to Python?

There are three ways to connect MongoDB to Python:

  • PyMongo
    • The native driver for connecting MongoDB and Python. PyMongo has all the libraries to perform database operations from Python code. Since PyMongo is a low-level driver, it is fast and intuitive and provides more control.
  • MongoEngine
    • MongoEngine is a Document Object Mapper. We can define a schema that maps application objects and document data.
  • Djongo
    • We use Djongo for Python web applications built with the Django framework. It converts existing SQL queries to MongoDB query documents.

Learn more about using MongoEngine and Djongo.

Which database is best for Python?

Python works well with different databases. The choice depends on your project requirements. MongoDB, because of its flexible schema and how closely it maps to Python’s native objects, is a great choice for Python applications, including web development work.

For more information, read NoSQL vs. SQL Databases. There are some native Python databases as well, but they are less widely used and have more limited capabilities.

Is MongoDB good for Python?

MongoDB stores data in flexible and schema-less JSON-like documents. Python has rich libraries that directly process JSON and BSON data formats. Python integrates well with MongoDB through drivers like PyMongo, MongoEngine etc.

This makes MongoDB good for Python by eliminating rigidity in the database schema.

How does Python store data in MongoDB?

Python stores data in MongoDB through libraries like PyMongo and MongoEngine. For web applications using the Django framework, we can use Djongo.

  • PyMongo: PyMongo is the native Python driver for MongoDB. Since it’s a low-level driver, it’s faster and is also the preferred way of connecting Python and MongoDB.
  • MongoEngine: With MongoEngine, we can create a schema (yes, for a schema-less database). MongoEngine follows the ODM approach to map application classes and database documents.
  • Djongo: Djongo is a SQL transpiler. You can migrate existing SQL projects to MongoDB without many changes to the code.

Learn more about using MongoEngine and Djongo.

How do you use MongoDB with Python?

We can connect MongoDB with Python using PyMongo, the native Python driver for MongoDB. Its syntax is similar to that of the MongoDB Shell, so it is easy to find the right method. For example, insertMany() in the MongoDB Shell corresponds to insert_many() in PyMongo. We can also connect Python and MongoDB using MongoEngine and Djongo, but the preferred approach is PyMongo because it’s a low-level driver that is faster and provides more control. To learn more about PyMongo, check our documentation on PyMongo.

How do you get data from MongoDB using Python?

The most efficient and easy method to connect to MongoDB in Python is to use PyMongo. PyMongo is the native Python driver for MongoDB. To connect, we use the command pymongo.MongoClient() with the connection_string as argument. Then, we can use the find() method to get the required documents. Example:

import pymongo

# connection string for your cluster (replace the username, password, and cluster name)
CONNECTION_STRING = "mongodb+srv://user:pass@cluster.mongodb.net/myFirstDatabase"

# connect to mongodb from python using pymongo
client = pymongo.MongoClient(CONNECTION_STRING)
# open the database
dbname = client['user_shopping_list']
# get the collection
collection_name = dbname["item_details"]
# get the data from the collection
item_details = collection_name.find()

How do you insert data into MongoDB using Python?

To insert data, connect MongoDB and Python using PyMongo. PyMongo is the native Python driver for MongoDB. Once we connect, we can use PyMongo’s methods like insert_one() and insert_many(). Example:

import pymongo

# Connection string for your cluster (replace the username, password, and cluster name)
CONNECTION_STRING = "mongodb+srv://user:pass@cluster.mongodb.net/myFirstDatabase"

# Get the mongoclient
client = pymongo.MongoClient(CONNECTION_STRING)

# Get/Create database
dbname = client['user_shopping_list']

# Get/create collection
collection_name = dbname["item_details"]

# Create the document
item_1 = {
    "item_name": "Bread",
    # ... other fields ...
    "category": "food",
    "quantity": 2,
}

# Insert one document
collection_name.insert_one(item_1)

How do you create a database in MongoDB using Python?

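We use the PyMongo driver to create a MongoDB database from Python code. Note that MongoDB doesn’t actually create the database until it contains a collection with at least one document. Example: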

import pymongo

# Connection string for your cluster (replace the username, password, and cluster name)
CONNECTION_STRING = "mongodb+srv://user:pass@cluster.mongodb.net/myFirstDatabase"

# Get the mongoclient
client = pymongo.MongoClient(CONNECTION_STRING)

# Get/Create database
dbname = client['user_shopping_list']

Difference between SQL databases and NoSQL databases

SQL databases are also called relational databases, and NoSQL (“non SQL” or “not only SQL”) databases are also called non-relational databases. Relational databases are so named because they are based on the relational data model from mathematics.

SQL databases store data in tables with fixed rows and columns. NoSQL databases come in many types, for example:

  1. Document type: JSON documents

  2. Key-value: Key-value pairs

  3. Wide-column: Wide-column data store has tables with rows and dynamic columns
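
The contrast between fixed columns and per-document fields can be sketched with the standard library alone. The example uses an in-memory SQLite table for the relational side and a list of plain dictionaries as a stand-in for documents; the table name, column names, and sample values are assumptions for illustration, and no MongoDB connection is made.

```python
import sqlite3

# Relational side: every row has the same columns, declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("Blender", "kitchen appliance", 340), ("Egg", "food", 36)],
)
rows = conn.execute("SELECT item_name, category, price FROM items").fetchall()
conn.close()

# Document side: each document carries its own set of fields.
documents = [
    {"item_name": "Blender", "category": "kitchen appliance", "price": 340},
    {"item_name": "Bread", "quantity": 2, "ingredients": "all-purpose flour"},
]

row_widths = {len(row) for row in rows}
field_sets = {frozenset(doc) for doc in documents}
print(row_widths, len(field_sets))
```

Every row has the same width, while the two documents have different sets of fields.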

Examples of SQL-based databases are MySQL, Microsoft SQL Server, PostgreSQL, and SQLite. Examples of NoSQL databases are MongoDB, CouchDB, Redis, and DynamoDB.

For a more detailed comparison, refer to SQL vs NoSQL.