Basic MongoDB Operations in Python
Like Python? Want to get started with MongoDB? Welcome to this quick start guide! I'll show you how to set up an Atlas database with some sample data to explore. Then you'll create some data and learn how to read, update and delete it.
You'll need the following installed on your computer to follow along with this tutorial:
- An up-to-date version of Python 3. I wrote the code in this tutorial in Python 3.8, but it should run fine in version 3.6+.
- A code editor of your choice. I recommend either PyCharm or the free VS Code with the official Python extension.
Now that you've got your local environment set up, it's time to create a MongoDB database to work with, and to load in some sample data you can explore and modify.
You could create a database on your development machine, but it's easier to get started on the Atlas hosted service without having to learn how to configure a MongoDB cluster.
Get started with an M0 cluster on Atlas today. It's free forever, and it's the easiest way to try out the steps in this blog series.
You'll need to create a new cluster and load it with sample data. My awesome colleague Maxime Beugnet has created a video tutorial to help you out.
If you don't want to watch the video, the steps are:
- Click "Get started free".
- Enter your details and accept the Terms of Service.
- Create a Starter cluster.
- Select the same cloud provider you're used to, or just leave it as-is. Pick a region that makes sense for you.
- You can change the name of the cluster if you like. I've called mine "PythonQuickstart".
It will take a couple of minutes for your cluster to be provisioned, so while you're waiting you can move on to the next step.
You should set up a Python virtualenv which will contain the libraries you install during this quick start. There are several different ways to set up virtualenvs, but to simplify things we'll use the one included with Python. First, create a directory to hold your code and your virtualenv. Open your terminal, `cd` to that directory and then run the following command:

```shell
# Note:
# On Debian & Ubuntu systems you'll first need to install the venv module with:
# sudo apt install python3-venv
python3 -m venv venv
```
The command above will create a virtualenv in a directory called `venv`. To activate the new virtualenv, run one of the following commands, according to your system:

```shell
# Run the following on OSX & Linux:
source venv/bin/activate

# Run the following on Windows:
.\venv\Scripts\activate
```
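If you're not sure whether activation worked, one quick check (a small sketch, not part of the original steps) is to compare `sys.prefix` with `sys.base_prefix`: inside a virtualenv created by `venv`, `sys.prefix` points at the virtualenv directory while `sys.base_prefix` still points at the Python installation it was created from. The function name `in_virtualenv` is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter is running inside a venv-style virtualenv.

    venv sets sys.prefix to the virtualenv directory, while sys.base_prefix
    keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtualenv:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```

Run it with `python` after activating the virtualenv; it should report `True`.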
To write Python programs that connect to your MongoDB database (don't worry - you'll set that up in a moment!) you'll need to install a Python driver - a library which knows how to talk to MongoDB. In Python, you have two choices! The recommended driver is PyMongo - that's what I'll cover in this quick start. If you want to write asyncio programs with MongoDB, however, you'll need to use a library called Motor, which is also fully supported by MongoDB.
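To install PyMongo, run the following command (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):

```shell
python -m pip install "pymongo[srv]==3.10.1"
```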
For this tutorial we'll also make use of a library called `python-dotenv` to load configuration, so run the command below as well to install that:

```shell
python -m pip install python-dotenv==0.13.0
```
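To confirm what pip installed, you can run `python -m pip show pymongo`, or check from Python itself. The sketch below uses `importlib.metadata`, which needs Python 3.8 or later (the version this tutorial was written with); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution names used in the pip commands above.
    for name in ("pymongo", "python-dotenv"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```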
By now, your MongoDB cluster should have finished starting up, and has probably been running for a few minutes.
The following instructions were correct at the time of writing, but may change, as we're always improving the Atlas user interface:
In the Atlas web interface, you should see a green button at the bottom-left of the screen, saying "Get Started". If you click on it, it'll bring up a checklist of steps for getting your database set up. Click on each of the items in the list (including the optional "Load Sample Data" item), and it'll help you through the steps to get set up.
Following the "Get Started" steps, create a user with "Read and write access to any database". You can give it a username and password of your choice; make a note of them, because you'll need them in a minute. Use the "autogenerate secure password" button to ensure you have a long random password which is also safe to paste into your connection string later.
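If you choose your own password instead, and it contains characters such as `@`, `:` or `/`, those characters would be misread as part of the URI's structure. PyMongo's documentation recommends percent-escaping the username and password with `urllib.parse.quote_plus` in that case. A minimal sketch follows; `build_srv_uri` is a hypothetical helper, and the host is the placeholder used later in this tutorial, not your real cluster address.

```python
from urllib.parse import quote_plus


def build_srv_uri(username: str, password: str, host: str, db: str = "test") -> str:
    """Assemble a mongodb+srv:// connection string with escaped credentials.

    quote_plus() percent-encodes characters such as '@', ':' and '/' that would
    otherwise be read as URI delimiters.
    """
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/{db}"
        "?retryWrites=true&w=majority"
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_srv_uri("pyuser", "p@ss:w/rd", "pythonquickstart-123ab.mongodb.net"))
```

With these sample values it prints `mongodb+srv://pyuser:p%40ss%3Aw%2Frd@pythonquickstart-123ab.mongodb.net/test?retryWrites=true&w=majority`.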
When deploying an app with sensitive data, you should only allow the IP address of the servers which need to connect to your database. To allow the IP address of your development machine, select "Network Access", click the "Add IP Address" button and then click "Add Current IP Address" and hit "Confirm".
The last step of the "Get Started" checklist is "Connect to your Cluster". Select "Connect your application" and select "Python" with a version of "3.6 or later".
Ensure Step 2 has "Connection String only" highlighted, and press the "Copy" button to copy the URL to your pasteboard. Save it to the same place you stored your username and password. Note that the URL has `<password>` as a placeholder for your password. You should paste your password in here, replacing the whole placeholder including the '<' and '>' characters.

Now it's time to actually write some Python code to connect to your MongoDB database!
In your code editor, create a Python file in your project directory called `basic_operations.py`. Enter the following code:

```python
import datetime  # This will be needed later
import os

from dotenv import load_dotenv
from pymongo import MongoClient

# Load config from a .env file:
load_dotenv()
MONGODB_URI = os.environ['MONGODB_URI']

# Connect to your MongoDB cluster:
client = MongoClient(MONGODB_URI)

# List all the databases in the cluster:
for db_name in client.list_database_names():
    print(db_name)
```
In order to run this, you'll need to set the `MONGODB_URI` environment variable to the connection string you obtained above. You can do this in two ways:

- Run an `export` (or `set` on Windows) command to set the environment variable each time you set up your session.
- Save the URI in a configuration file which should never be added to revision control.
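Whichever approach you choose, the script reads the value with `os.environ['MONGODB_URI']`, which raises a bare `KeyError` if the variable isn't set. If you'd prefer a clearer message, a small helper like the hypothetical `require_env` below is one option. It treats an empty value the same as a missing one, which is a design choice of this sketch.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a helpful message.

    An empty string is treated the same as an unset variable.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or add it to your .env file."
        )
    return value


if __name__ == "__main__":
    try:
        # Print only whether it's set, so the credentials never reach the terminal.
        print("MONGODB_URI is set:", bool(require_env("MONGODB_URI")))
    except RuntimeError as err:
        print(err)
```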
I'm going to show you how to take the second approach. Remember it's very important not to accidentally publish your credentials to git or anywhere else, so add `.env` to your `.gitignore` file if you're using git. The `python-dotenv` library loads configuration from a file in the current directory called `.env`. Create a `.env` file in the same directory as your code and paste in the configuration below, replacing the placeholder URI with your own MongoDB URI.

```shell
# Unix:
export MONGODB_URI='mongodb+srv://yourusername:yourpasswordgoeshere@pythonquickstart-123ab.mongodb.net/test?retryWrites=true&w=majority'
```
The URI contains your username and password (so keep it safe!) and a hostname which PyMongo uses to look up DNS (SRV and TXT) records describing your cluster. Once PyMongo has retrieved the details of your cluster, it will connect to the primary MongoDB server and start making queries.
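Because the URI embeds your password, it's worth masking it before the string ends up in logs or error messages. The sketch below does that with the standard library's `urllib.parse.urlsplit`, which also exposes the pieces described here: the `mongodb+srv` scheme, the credentials, the hostname used for the DNS lookup, and the options in the query string. `redact_uri` is a hypothetical helper, and this is not how PyMongo parses URIs internally.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Return the URI with its password replaced by '****'."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # Nothing to hide.
    # netloc is "user:password@host"; rebuild it without the real password.
    host = parts.netloc.rpartition("@")[2]
    netloc = f"{parts.username}:****@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    uri = ("mongodb+srv://yourusername:yourpasswordgoeshere@"
           "pythonquickstart-123ab.mongodb.net/test?retryWrites=true&w=majority")
    parts = urlsplit(uri)
    print(parts.scheme)           # mongodb+srv
    print(parts.hostname)         # pythonquickstart-123ab.mongodb.net
    print(parse_qs(parts.query))  # {'retryWrites': ['true'], 'w': ['majority']}
    print(redact_uri(uri))
```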
Now if you run the Python script you should see output similar to the following:

```
$ python basic_operations.py
sample_airbnb
sample_analytics
sample_geospatial
sample_mflix
sample_supplies
sample_training
sample_weatherdata
twitter_analytics
admin
local
```
You just connected your Python program to MongoDB and listed the databases in your cluster! If you don't see this list, you may not have successfully loaded sample data into your cluster; you may want to go back a couple of steps until running this command shows the list above.
In the code above, you used the `list_database_names` method to list the database names in the cluster. The `MongoClient` instance can also be used as a mapping (like a `dict`) to get a reference to a specific database. Here's some code to have a look at the collections inside the `sample_mflix` database. Paste it at the end of your Python file:

```python
# Get a reference to the 'sample_mflix' database:
db = client['sample_mflix']

# List all the collections in 'sample_mflix':
collections = db.list_collection_names()
for collection in collections:
    print(collection)
```
Running this piece of code should output the following:

```
$ python basic_operations.py
movies
sessions
comments
users
theaters
```
A database also behaves as a mapping of collections inside that database. A collection is a bucket of documents, in the same way as a table contains rows in a traditional relational database. The following code looks up a single document in the `movies` collection:

```python
# Import the `pprint` function to print nested data:
from pprint import pprint

# Get a reference to the 'movies' collection:
movies = db['movies']

# Get the document with the title 'Blacksmith Scene':
pprint(movies.find_one({'title': 'Blacksmith Scene'}))
```
When you run the code above, it will look up the document with the title "Blacksmith Scene" in the 'movies' collection. It looks a bit like this:
```
{'_id': ObjectId('573a1390f29313caabcd4135'),
 'awards': {'nominations': 0, 'text': '1 win.', 'wins': 1},
 'cast': ['Charles Kayser', 'John Ott'],
 'countries': ['USA'],
 'directors': ['William K.L. Dickson'],
 'fullplot': 'A stationary camera looks at a large anvil with a blacksmith '
             'behind it and one on either side. The smith in the middle draws '
             'a heated metal rod from the fire, places it on the anvil, and '
             'all three begin a rhythmic hammering. After several blows, the '
             'metal goes back in the fire. One smith pulls out a bottle of '
             'beer, and they each take a swig. Then, out comes the glowing '
             'metal and the hammering resumes.',
 'genres': ['Short'],
 'imdb': {'id': 5, 'rating': 6.2, 'votes': 1189},
 'lastupdated': '2015-08-26 00:03:50.133000000',
 'num_mflix_comments': 1,
 'plot': 'Three men hammer on an anvil and pass a bottle of beer around.',
 'rated': 'UNRATED',
 'released': datetime.datetime(1893, 5, 9, 0, 0),
 'runtime': 1,
 'title': 'Blacksmith Scene',
 'tomatoes': {'lastUpdated': datetime.datetime(2015, 6, 28, 18, 34, 9),
              'viewer': {'meter': 32, 'numReviews': 184, 'rating': 3.0}},
 'type': 'movie',
 'year': 1893}
```
It's a one-minute movie filmed in 1893 - it's like a YouTube video from nearly 130 years ago! The data above is a single document. It stores data in fields that can be accessed by name, and you should be able to see that the `title` field contains the same value as we looked up in our call to `find_one` in the code above. Documents in a collection don't all have to share the same structure, but it's usually recommended to follow the same or a similar structure for all the documents in a single collection.

MongoDB is often described as a JSON database, but there's evidence in the document above that it doesn't store JSON. A MongoDB document can hold all the types that JSON can store, including booleans, integers, floats, strings, arrays, and objects (we call them subdocuments). However, if you look at the `_id` and `released` fields, these are types that JSON cannot store. In fact, MongoDB stores data in a binary format called BSON, which also includes the `ObjectId` type as well as native types for decimal numbers, binary data, and dates (which PyMongo converts to Python's native `datetime` type).

The `movies` collection contains a lot of data (23539 documents), but it only contains movies up until 2015. One of my favourite movies, the Oscar-winning "Parasite", was released in 2019, so it's not in the database! You can fix this glaring omission with the code below:

```python
# Insert a document for the movie 'Parasite':
insert_result = movies.insert_one({
    "title": "Parasite",
    "year": 2020,
    "plot": "A poor family, the Kims, con their way into becoming the servants of a rich family, the Parks. "
            "But their easy life gets complicated when their deception is threatened with exposure.",
    # `datetime` was imported as a module at the top of the file:
    "released": datetime.datetime(2020, 2, 7, 0, 0, 0),
})

# Save the inserted_id of the document you just created:
parasite_id = insert_result.inserted_id
print("_id of inserted document: {parasite_id}".format(parasite_id=parasite_id))
```
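The `released` value in that document is a Python `datetime`, which PyMongo stores as a BSON date. It's one of the types plain JSON can't represent, and you can see this with the standard library's `json` module, which refuses to serialize a `datetime` unless you tell it how. The sketch below converts such values to ISO 8601 strings; that's one possible convention chosen for this example, not what MongoDB does. If you want output that round-trips back into BSON types, PyMongo also ships `bson.json_util.dumps`, which writes MongoDB Extended JSON.

```python
import datetime
import json


def to_jsonable(value):
    """json.dumps() 'default' hook: convert dates and datetimes to ISO 8601 strings."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


doc = {
    "title": "Parasite",
    "released": datetime.datetime(2020, 2, 7, 0, 0, 0),
}

try:
    json.dumps(doc)
except TypeError as err:
    print("Plain json.dumps fails:", err)

print(json.dumps(doc, default=to_jsonable))
```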
If you're inserting more than one document in one go, it can be much more efficient to use the `insert_many` method, which takes a list of documents to be inserted. (If you're just loading documents into your database from stored JSON files, then you should take a look at mongoimport.)

Running the `insert_one` code above will insert the document into the collection and print out its ID, which is useful, but not much to look at. You can retrieve the document to prove that it was inserted, with the following code:
```python
import bson  # <- Put this line near the start of the file if you prefer.

# Look up the document you just created in the collection:
print(movies.find_one({'_id': bson.ObjectId(parasite_id)}))
```
The code above will look up a single document that matches the query (in this case it's looking up a specific `_id`). If you want to look up all the documents that match a query, you should use the `find` method, which returns a `Cursor`. A Cursor loads data in batches, so if you query all the data in your collection, it will start to yield documents immediately - it doesn't load the whole collection into memory on your computer! You can loop through the documents returned by a Cursor with a `for` loop. The following query should print one or more documents: if you've run your script a few times, you will have inserted one document for this movie each time you ran it! (Don't worry about cleaning them up - I'll show you how to do that in a moment.)

```python
# Look up the documents you've created in the collection:
for doc in movies.find({"title": "Parasite"}):
    pprint(doc)
```
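To picture why a Cursor can start yielding documents before the whole result set has arrived, here's a toy generator that fetches results in fixed-size batches, only when the loop needs them. It illustrates the idea and nothing more: PyMongo's real `Cursor` fetches further batches from the server with `getMore` requests and manages batch sizes itself. The `fetch_batch` function, the fake data, and the batch size of 3 are all made up for the example.

```python
from typing import Callable, Iterator, List


def batched_cursor(fetch_batch: Callable[[int, int], List[dict]], batch_size: int) -> Iterator[dict]:
    """Yield documents one at a time, requesting a new batch only when needed."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return
        for doc in batch:
            yield doc
        offset += len(batch)


# A stand-in for the server: ten fake documents and a log of batch requests.
FAKE_COLLECTION = [{"_id": i, "title": "Parasite"} for i in range(10)]
batch_requests = []


def fetch_batch(offset: int, size: int) -> List[dict]:
    batch_requests.append((offset, size))
    return FAKE_COLLECTION[offset:offset + size]


cursor = batched_cursor(fetch_batch, batch_size=3)
first = next(cursor)
print(first, "after", len(batch_requests), "request(s)")  # only one batch fetched so far
remaining = list(cursor)
print(len(remaining) + 1, "documents in", len(batch_requests), "requests")
```

Nothing is fetched until the first `next()` call, and each later batch is requested only once the previous one has been used up.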
Many methods in PyMongo, including the find methods, expect a MongoDB query as input. MongoDB queries, unlike SQL, are provided as data structures, not as strings. The simplest kind of match looks like the ones used so far, `{ 'key': 'value' }`, where a document is returned if it contains the field named by `key` and that field's value is the same as the provided `value`. MongoDB's query language is rich and powerful, providing the ability to match on different criteria across multiple fields. The query below matches all movies produced before 1920 with 'Romance' as one of the genre values:

```python
{
    'year': {
        '$lt': 1920
    },
    'genres': 'Romance'
}
```
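To make the semantics of that query concrete, here's a toy, in-memory matcher that supports only equality and `$lt`, applied to a few hand-written documents. It mirrors two rules MongoDB applies: every top-level condition must match (an implicit AND), and an equality condition on an array field such as `genres` matches if any element equals the value. It's a teaching sketch with made-up sample data, not how the server evaluates queries, and it ignores most of the query language.

```python
from typing import Any, Dict, List


def _candidates(value: Any) -> List[Any]:
    """An array matches through any of its elements; a scalar is a one-item list."""
    return value if isinstance(value, list) else [value]


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Toy matcher: every top-level condition must hold (an implicit AND).

    Supports only plain equality and {'$lt': limit}.
    """
    for field, condition in query.items():
        if field not in doc:
            return False  # Simplification: missing fields never match here.
        value = doc[field]
        if isinstance(condition, dict) and "$lt" in condition:
            limit = condition["$lt"]
            try:
                if not any(v < limit for v in _candidates(value)):
                    return False
            except TypeError:
                return False  # Simplification: incomparable types count as no match.
        elif value != condition and condition not in _candidates(value):
            return False
    return True


# Made-up sample documents, not taken from sample_mflix.
sample_docs = [
    {"title": "Movie A", "year": 1915, "genres": ["Drama", "Romance"]},
    {"title": "Movie B", "year": 1915, "genres": ["Comedy"]},
    {"title": "Movie C", "year": 1925, "genres": ["Romance"]},
]

query = {"year": {"$lt": 1920}, "genres": "Romance"}
print([d["title"] for d in sample_docs if matches(d, query)])  # ['Movie A']
```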
Even more complex queries and aggregations are possible with MongoDB's aggregation pipeline, accessed with PyMongo's `aggregate` method - but that's a topic for a later quick start post.

I made a terrible mistake! The document you've been inserting for Parasite has an error. Although Parasite was released in 2020, it's actually a 2019 movie. Fortunately for us, MongoDB allows you to update documents in the collection. In fact, the ability to atomically update parts of a document without having to replace the whole document is a key feature of MongoDB!
Here's some code which will look up the document you've inserted and update the `year` field to 2019:

```python
# Update the document with the correct year:
update_result = movies.update_one({'_id': parasite_id}, {
    '$set': {"year": 2019}
})

# Print out the updated record to make sure it's correct.
# parasite_id is already an ObjectId, so it can be used in the query as-is:
pprint(movies.find_one({'_id': parasite_id}))
```
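Conceptually, `$set` changes only the fields you name and leaves the rest of the document alone. The sketch below imitates that on a plain dict, including MongoDB's dotted-path notation for nested fields. `apply_set` is a hypothetical helper for illustration, not PyMongo code: with PyMongo, the server applies the update atomically and you only pass the `{'$set': ...}` document. The `notes` field is invented for the example.

```python
import copy
from typing import Any, Dict


def apply_set(doc: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with a $set-style update applied (toy illustration).

    Keys may use dotted paths such as "notes.watched" to reach nested fields.
    Fields not named in `changes` are left exactly as they were.
    """
    updated = copy.deepcopy(doc)
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        target = updated
        for key in parents:
            # Like $set, create missing embedded documents along the path.
            target = target.setdefault(key, {})
        target[leaf] = value
    return updated


# A hypothetical document; "notes" is an invented field for the example.
parasite = {"title": "Parasite", "year": 2020, "notes": {"watched": False}}
fixed = apply_set(parasite, {"year": 2019, "notes.watched": True})
print(fixed)             # {'title': 'Parasite', 'year': 2019, 'notes': {'watched': True}}
print(parasite["year"])  # 2020: the original dict is untouched
```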
As mentioned above, you've probably inserted many documents for this movie now, so it may be more appropriate to look them all up and change their `year` value in one go. The code for that looks like this:

```python
# Update *all* the Parasite movie docs to the correct year:
update_result = movies.update_many({"title": "Parasite"}, {"$set": {"year": 2019}})
```
Now it's time to clean up after yourself! The following code will delete all the matching documents from the collection - using the same broad query as before - all documents with a `title` of "Parasite":

```python
movies.delete_many(
    {"title": "Parasite"}
)
```
Once again, PyMongo has an equivalent `delete_one` method which will only delete the first matching document the database finds, instead of deleting all matching documents.

Did you enjoy this quick start guide? Want to learn more? We have a great MongoDB University course I think you'll love!
If that's not for you, we have lots of other courses covering all aspects of hosting and developing with MongoDB.
This quick start has only covered a small part of PyMongo and MongoDB's functionality, although I'll be covering more in later Python quick starts! Fortunately, in the meantime the documentation for MongoDB and using Python with MongoDB is really good. I recommend bookmarking the following for your reading pleasure:
- PyMongo Documentation provides thorough documentation describing how to use PyMongo with your MongoDB cluster, including comprehensive reference documentation on the `Collection` class that has been used extensively in this quick start.
- MongoDB Query Document documentation details the full power available for querying MongoDB collections.