Build a RESTful API with Flask, MongoDB, and Python
Mark Smith · Published Jan 14, 2022 • Updated May 12, 2022
This is the first part of a short series of blog posts called "Rewrite it in Rust (RiiR)." It's a tongue-in-cheek title for some posts that will investigate the similarities and differences between the same service written in Python with Flask, and Rust with Actix-Web.
This post will show how I built a RESTful API for a collection of cocktail recipes I just happen to have lying around. The aim is to show an API server with some complexity, so although it's a small example, it will cover important factors such as:
- Data transformation between the database and a JSON representation.
- Data validation.

Prerequisites

- Python 3.8 or above.
- A MongoDB Atlas cluster. Follow the "Get Started with Atlas" guide to create your account and MongoDB cluster. Keep a note of your database username, password, and connection string, as you will need those later.
This is an advanced guide, so it'll cover a whole bunch of different libraries which can be brought together to build a declarative RESTful API server on top of MongoDB. I won't cover repeating patterns in the codebase, so if you want to build the whole thing, I recommend checking out the source code, which is all on GitHub.
It won't cover the basics of Python, Flask, or MongoDB, so if that's what you're looking for, I recommend checking out the following resources before tackling this post:
Begin by cloning the sample code source from GitHub. There are four top-level directories:
- actix-cocktail-api: You can ignore this for now.
- data: This contains an export of my cocktail data. You'll import this into your cluster in a moment.
- flask-cocktail-api: The code for this blog post.
- test_scripts: A few shell scripts that use curl to test the HTTP interface of the API server.
There are more details in the GitHub repo, but the basics are to install the project with your virtualenv active:
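The exact invocation is in the repo's README, but it is likely something along these lines (the requirements file name is an assumption):

```shell
cd flask-cocktail-api
# Hypothetical file name — check the repo for the actual install instructions.
python -m pip install -r requirements.txt
```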
Next, you should import the data into your cluster. Set the environment variable `$MONGO_URI` to your cluster URI. This environment variable will be used in a moment to import your data, and also by the Flask app. I use `direnv` to configure this, and put the following line in my `.envrc` file in my project's directory:
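The line is just an export of the connection string (the credentials and host below are placeholders — substitute your own):

```shell
# .envrc — placeholder values; use your own Atlas connection string.
export MONGO_URI="mongodb+srv://USERNAME:PASSWORD@yourcluster.mongodb.net/cocktails?retryWrites=true&w=majority"
```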
Note that your database must be called "cocktails," and the import will create a collection called "recipes." After checking that `$MONGO_URI` is set correctly, run the following command:
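The import command will look something like this — the export file name is an assumption, so check the `data` directory for the actual name:

```shell
# Imports into the "recipes" collection of the database named in $MONGO_URI.
mongoimport --uri "$MONGO_URI" --collection recipes --file data/cocktails.json
```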
Now you should be able to run the Flask app from the `flask-cocktail-api` directory. (You can run `make run` if you prefer.)
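Running it by hand looks roughly like this — the module name is a guess, so check the repo's Makefile for the canonical invocation:

```shell
cd flask-cocktail-api
# Module name is an assumption; see the Makefile for the real target.
FLASK_APP=cocktailapi flask run
```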
Check the output to ensure it is happy with the configuration, and then, in a different terminal window, run the `list_cocktails.sh` script in the `test_scripts` directory. It should print something like this:
The code is divided into three submodules.
- `__init__.py` contains all the Flask setup code, and defines all the HTTP routes.
- `model.py` contains all the Pydantic model definitions.
- `objectid.py` contains a Pydantic field definition that I stole from the Beanie object-document mapper for MongoDB.
I mentioned earlier that this code makes use of several libraries:
- PyMongo and Flask-PyMongo handle the connection to the database. Flask-PyMongo specifically wraps the database collection object to provide a convenient `find_one_or_404` method.
- Pydantic manages data validation, and some aspects of data transformation between the database and a JSON representation, along with a single function imported from FastAPI, `jsonable_encoder`.
When building a robust API, it's important to validate all the data passing into the system. It would be possible to do this using a stack of `if/else` statements, but it's much more effective to define a schema declaratively, and to allow that to programmatically validate the data being input.
I used a technique that I learned from Beanie, a new and neat ODM that I unfortunately couldn't practically use on this project, because Beanie is async, and Flask is a blocking framework.
Beanie uses Pydantic to define a schema, and adds a custom Field type for ObjectId.
The `Cocktail` schema defines the structure of a `Cocktail` instance, which will be validated by Pydantic when instances are created. It includes another embedded schema for `Ingredient`, which is defined in a similar way.
I added convenience functions to export the data in a `Cocktail` instance to either a JSON-compatible `dict` or a BSON-compatible `dict`. The differences are subtle. BSON supports native `datetime` types, for example, whereas when encoding as JSON, it's necessary to encode ObjectId instances in some other way (I prefer a string containing the hex value of the id), and `datetime` objects are encoded as ISO8601 strings.
The `to_json` method makes use of a function imported from FastAPI, `jsonable_encoder`, which recurses through the instance data, encoding all values in a JSON-compatible form. It already handles `datetime` instances correctly, but to get it to handle ObjectId values, I extracted some custom field code from Beanie, which can be found in `objectid.py`.

The `to_bson` method doesn't need to pass the data through `jsonable_encoder`. All the types used in the schema can be directly saved with PyMongo. It's important to set `by_alias` to `True`, so that the key for `_id` is just that, `_id`, and not the schema's `id` without an underscore.
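Putting those pieces together, the model looks roughly like this. It is a simplified sketch: the real code uses a custom ObjectId field and FastAPI's `jsonable_encoder`, which are replaced here by a plain string id and manual `datetime` handling so the example stays self-contained:

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, Field


class Ingredient(BaseModel):
    name: str
    quantity: Optional[str] = None


class Cocktail(BaseModel):
    # The real code uses a custom ObjectId field here; a plain string
    # keeps this sketch free of the bson dependency.
    id: Optional[str] = Field(None, alias="_id")
    slug: str
    name: str
    ingredients: List[Ingredient]
    date_added: Optional[datetime] = None

    def to_json(self) -> dict:
        # JSON has no native datetime (or ObjectId) type, so encode
        # datetimes as ISO8601 strings.
        data = self.dict(by_alias=True, exclude_none=True)
        if isinstance(data.get("date_added"), datetime):
            data["date_added"] = data["date_added"].isoformat()
        return data

    def to_bson(self) -> dict:
        # BSON stores datetimes natively. Drop an unset _id so MongoDB
        # assigns one on insert; by_alias=True keeps the key as "_id".
        data = self.dict(by_alias=True, exclude_none=True)
        if data.get("_id") is None:
            data.pop("_id", None)
        return data
```

Note that `by_alias=True` is what maps the model's `id` field to the `_id` key that MongoDB expects.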
This approach is neat for this particular use-case, but I can't help feeling that it would be limiting in a more complex system. There are many patterns for storing data in MongoDB. These often result in storing data in a form that is optimal for writes or reads, but not necessarily the representation you would wish to export in an API.
What is a Slug?
Looking at the schema above, you may have wondered what a "slug" is ... well, apart from a slimy garden pest.
A slug is a unique, URL-safe, mnemonic used for identifying a document. I picked up the terminology as a Django developer, where this term is part of the framework. A slug is usually derived from another field. In this case, the slug is derived from the name of the cocktail, so if a cocktail was called "Rye Whiskey Old-Fashioned," the slug would be "rye-whiskey-old-fashioned."
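A minimal sketch of that derivation (a hypothetical helper — the real project may use a library for this):

```python
import re


def slugify(name: str) -> str:
    """Derive a URL-safe slug from a cocktail name."""
    slug = name.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of other chars
    return slug.strip("-")  # trim leading/trailing hyphens
```

For example, `slugify("Rye Whiskey Old-Fashioned")` gives `"rye-whiskey-old-fashioned"`.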
In this API, that cocktail could be accessed by sending a `GET` request to the endpoint for that slug.
I've kept the unique `slug` field separate from the auto-assigned `_id` field, but I've provided both because the slug could change if the name of the cocktail was tweaked, in which case the `_id` value would provide a constant identifier to look up an exact document.
In the Rust version of this code, I was nudged to use a different approach. It's a bit more verbose, but in the end I was convinced that it would be more powerful and flexible as the system grew.
Now I'll show you what a single endpoint looks like, first focusing on the "Create" endpoint, which handles a POST request to `/cocktails` and creates a new document in the "recipes" collection. It then returns the document that was stored, including the new unique ID that MongoDB assigned as `_id`, because this is a RESTful API, and that's what RESTful APIs do.
This endpoint modifies the incoming JSON directly, to add a `date_added` item with the current time. It then passes it to the constructor for our Pydantic schema. At this point, if the schema failed to validate the data, an exception would be raised and displayed to the user.
After validating the data, `to_bson()` is called on the `Cocktail` to convert it to a BSON-compatible dict, and this is directly passed to PyMongo's `insert_one` method. There's no way to get PyMongo to return the document that was just inserted in a single operation (although an upsert using `find_one_and_update` is similar to just that).

After inserting the data, the code then updates the local object with the newly-assigned `id` and returns it to the client.
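As a sketch of that flow — assuming a reduced Pydantic model with plain-string ids, and an in-memory stand-in for the Flask-PyMongo `recipes` collection so the example is self-contained:

```python
import uuid
from datetime import datetime
from typing import List, Optional

from flask import Flask, jsonify, request
from pydantic import BaseModel, Field


class Ingredient(BaseModel):
    name: str
    quantity: Optional[str] = None


class Cocktail(BaseModel):
    id: Optional[str] = Field(None, alias="_id")  # simplified: str, not ObjectId
    slug: str
    name: str
    ingredients: List[Ingredient]
    date_added: Optional[datetime] = None

    def to_json(self) -> dict:
        data = self.dict(by_alias=True, exclude_none=True)
        if isinstance(data.get("date_added"), datetime):
            data["date_added"] = data["date_added"].isoformat()
        return data

    def to_bson(self) -> dict:
        data = self.dict(by_alias=True, exclude_none=True)
        data.pop("_id", None)  # let the "database" assign the id
        return data


class FakeRecipes:
    """In-memory stand-in for the Flask-PyMongo 'recipes' collection."""

    def __init__(self):
        self.docs = {}

    def insert_one(self, doc):
        doc = dict(doc, _id=uuid.uuid4().hex)
        self.docs[doc["_id"]] = doc
        return type("InsertOneResult", (), {"inserted_id": doc["_id"]})()


app = Flask(__name__)
recipes = FakeRecipes()


@app.route("/cocktails/", methods=["POST"])
def new_cocktail():
    raw_cocktail = request.get_json()
    raw_cocktail["date_added"] = datetime.utcnow()
    cocktail = Cocktail(**raw_cocktail)  # Pydantic validates here
    insert_result = recipes.insert_one(cocktail.to_bson())
    cocktail.id = insert_result.inserted_id  # reflect the assigned id locally
    return jsonify(cocktail.to_json())
```

With a real cluster, `recipes` would be the collection object provided by Flask-PyMongo, and the assigned id would be an ObjectId rather than a hex string.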
Thanks to Flask-PyMongo's `find_one_or_404` helper, the endpoint for looking up a single cocktail is even more straightforward. It will abort with a 404 if the slug can't be found in the collection. Otherwise, it simply instantiates a `Cocktail` with the document from the database, and calls `to_json` to convert it to a dict that Flask will automatically encode correctly as JSON.
This endpoint is a monster, and it's because of pagination, and the links for pagination. In the sample data above, you probably noticed the `_links` section, which is specified as part of the HAL (Hypertext Application Language) specification. It's a good idea to follow a standard for pagination data, and I didn't feel like inventing something myself!
And here's the code to generate all this. Don't freak out.
Although there's a lot of code there, it's not as complex as it may first appear. Two requests are made to MongoDB: one for a page-worth of cocktail recipes, and the other for the total number of cocktails in the collection. Various calculations are done to work out how many documents to skip, and how many pages of cocktails there are. Finally, some links are added for "prev" and "next" pages, if appropriate (i.e.: the current page isn't the first or last.) Serialization of the cocktail documents is done in the same way as the previous endpoint, but in a loop this time.
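The arithmetic at the heart of it can be sketched on its own (a hypothetical helper — the real endpoint inlines this logic alongside its MongoDB queries):

```python
import math


def pagination_window(page: int, per_page: int, total: int):
    """Compute the skip offset and prev/next flags for one page of results."""
    total_pages = max(math.ceil(total / per_page), 1)
    page = max(1, min(page, total_pages))  # clamp to a valid page number
    skip = (page - 1) * per_page  # documents to skip in the query
    has_prev = page > 1  # emit a "prev" link?
    has_next = page < total_pages  # emit a "next" link?
    return skip, total_pages, has_prev, has_next
```

The endpoint would then fetch the page with `recipes.find().skip(skip).limit(per_page)` and get the total with `count_documents({})`.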
The update and delete endpoints are mainly repetitions of the code I've already included, so I'm not going to include them here. Check them out in the GitHub repo if you want to see how they work.
Nothing irritates me more than using a JSON API which returns HTML when an error occurs, so I was keen to put in some reasonable error handling to avoid this happening.
After the Flask set-up code, and before the endpoint definitions, the code registers two error-handlers:
The first error-handler intercepts any endpoint that fails with a 404 status code and ensures that the error is returned as a JSON dict.
The second error-handler intercepts a `DuplicateKeyError` raised by any endpoint, and does the same thing as the first error-handler, but sets the HTTP status code to "400 Bad Request."
As I was writing this post, I realised that I've missed an error-handler to deal with invalid Cocktail data. I'll leave implementing that as an exercise for the reader! Indeed, this is one of the difficulties with writing robust Python applications: Because exceptions can be raised from deep in your stack of dependencies, it's very difficult to comprehensively predict what exceptions your application may raise in different circumstances.
This is something that's very different in Rust, and even though, as you'll see, error-handling in Rust can be verbose and tricky, I've started to love the language for its insistence on correctness.
When I started writing this post, I thought it would end up being relatively straightforward. As I added the requirement that the code should not just be a toy example, some of the inherent difficulties with building a robust API on top of any database became apparent.
In this case, Flask may not have been the right tool for the job. I recently wrote a blog post about building an API with Beanie. Beanie and FastAPI are a match made in heaven for this kind of application and will handle validation, transformation, and pagination with much less code. On top of that, they're self-documenting and can provide the data's schema in open formats, including OpenAPI Spec and JSON Schema!
If you're about to build an API from scratch, I strongly recommend you check them out, and you may enjoy reading Aaron Bassett's posts on the FARM (FastAPI, React, MongoDB) Stack.
I will shortly publish the second post in this series, Build a Cocktail API with Actix-Web, MongoDB, and Rust, and then I'll conclude with a third post, I Rewrote it in Rust—How Did it Go?, where I'll evaluate the strengths and weaknesses of the two experiments.
Thank you for reading. Keep a look out for the upcoming posts!
If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.