Build a RESTful API with Flask, MongoDB, and Python
This is the first part of a short series of blog posts called "Rewrite it in Rust (RiiR)." It's a tongue-in-cheek title for some posts that will investigate the similarities and differences between the same service written in Python with Flask, and Rust with Actix-Web.
This post will show how I built a RESTful API for a collection of cocktail recipes I just happen to have lying around. The aim is to show an API server with some complexity, so although it's a small example, it will cover important factors such as:
- Data transformation between the database and a JSON representation.
- Data validation.
- Pagination.
- Error-handling.
To follow along, you'll need:
- Python 3.8 or above
- A MongoDB Atlas cluster. Follow the "Get Started with Atlas" guide to create your account and MongoDB cluster. Keep a note of your database username, password, and connection string as you will need those later.
This is an advanced guide, so it'll cover a whole bunch of different libraries which can be brought together to build a declarative RESTful API server on top of MongoDB. I won't cover repeating patterns in the codebase, so if you want to build the whole thing, I recommend checking out the source code, which is all on GitHub.
It won't cover the basics of Python, Flask, or MongoDB, so if that's what you're looking for, I recommend checking out the following resources before tackling this post:
The GitHub repository contains the following directories:
- actix-cocktail-api: You can ignore this for now.
- data: This contains an export of my cocktail data. You'll import this into your cluster in a moment.
- flask-cocktail-api: The code for this blog post.
- test_scripts: A few shell scripts that use curl to test the HTTP interface of the API server.
There are more details in the GitHub repo, but the basics are: Install the project with your virtualenv active:
Next, you should import the data into your cluster. Set the environment variable $MONGO_URI to your cluster URI. This environment variable will be used in a moment to import your data, and also by the Flask app. I use direnv to configure this, and put the following line in my .envrc file in my project's directory:
Note that your database must be called "cocktails," and the import will create a collection called "recipes." After checking that $MONGO_URI is set correctly, run the following command:
Now you should be able to run the Flask app from the flask-cocktail-api directory:
(You can run make run if you prefer.)
Check the output to ensure it is happy with the configuration, and then in a different terminal window, run the list_cocktails.sh script in the test_scripts directory. It should print something like this:
The code is divided into three submodules.
- __init__.py contains all the Flask setup code, and defines all the HTTP routes.
- model.py contains all the Pydantic model definitions.
- objectid.py contains a Pydantic field definition that I stole from the Beanie object-data mapper for MongoDB.
I mentioned earlier that this code makes use of several libraries:
- PyMongo and Flask-PyMongo handle the connection to the database. Flask-PyMongo specifically wraps the database collection object to provide a convenient find_one_or_404 method.
- Pydantic manages data validation, and some aspects of data transformation between the database and a JSON representation.
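To give a feel for what a find_one_or_404 helper does, here is a rough sketch of the idea rather than Flask-PyMongo's actual source. The standalone function and the InMemoryCollection class are hypothetical stand-ins so that the example runs without a database; the only real library call is Werkzeug's NotFound exception, which Flask turns into a 404 response.

```python
from werkzeug.exceptions import NotFound


def find_one_or_404(collection, query):
    """Return the first matching document, or raise a 404 if there is none."""
    document = collection.find_one(query)
    if document is None:
        # Inside a Flask request, this exception becomes a 404 response.
        raise NotFound()
    return document


class InMemoryCollection:
    """Hypothetical stand-in that exposes only find_one, for illustration."""

    def __init__(self, documents):
        self.documents = documents

    def find_one(self, query):
        # Return the first document whose fields all match the query.
        for document in self.documents:
            if all(document.get(key) == value for key, value in query.items()):
                return document
        return None
```

With a helper like this, an endpoint doesn't need its own "if the document is missing, return a 404" branch.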
When building a robust API, it's important to validate all the data passing into the system. It would be possible to do this using a stack of if/else statements, but it's much more effective to define a schema declaratively, and to allow that to programmatically validate the data being input.
I used a technique that I learned from Beanie, a new and neat ODM that I unfortunately couldn't practically use on this project, because Beanie is async, and Flask is a blocking framework.
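As an illustration of declarative validation, here is a minimal sketch of a Pydantic schema. It is not the schema from the repository: the slug, name, and date_added fields are mentioned in this post, but the ingredients, instructions, and quantity fields, and the validate_cocktail helper, are assumptions made for the example.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Ingredient(BaseModel):
    # Field names here are assumed for illustration.
    name: str
    quantity: Optional[str] = None


class Cocktail(BaseModel):
    slug: str
    name: str
    ingredients: List[Ingredient]  # assumed field
    instructions: List[str]  # assumed field
    date_added: Optional[datetime] = None


def validate_cocktail(raw):
    """Return (cocktail, None) on success or (None, errors) on failure."""
    try:
        return Cocktail(**raw), None
    except ValidationError as err:
        # err.errors() is a list of dicts describing each failed field.
        return None, err.errors()
```

The schema states what valid data looks like, and Pydantic does the checking. No if/else statements are needed, and nested documents such as Ingredient are validated in the same pass.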
This Cocktail schema defines the structure of a Cocktail instance, which will be validated by Pydantic when instances are created. It includes another embedded schema for Ingredient, which is defined in a similar way.
I added convenience functions to export the data in the Cocktail instance to either a JSON-compatible dict or a BSON-compatible dict. The differences are subtle, but BSON supports native ObjectId and datetime types, for example, whereas when encoding as JSON, it's necessary to encode ObjectId instances in some other way (I prefer a string containing the hex value of the id), and datetime objects are encoded as ISO8601 strings.
The to_json method makes use of a function imported from FastAPI, which recurses through the instance data, encoding all values in a JSON-compatible form. It already handles datetime instances correctly, but to get it to handle ObjectId values, I extracted some custom field code from Beanie, which can be found in objectid.py.
The to_bson method doesn't need to pass the dict data through jsonable_encoder. All the types used in the schema can be directly saved with PyMongo. It's important to set by_alias to True, so that the key for _id is just that, _id, and not the schema's id without an underscore.
This approach is neat for this particular use-case, but I can't help feeling that it would be limiting in a more complex system. There are many patterns for storing data in MongoDB. These often result in storing data in a form that is optimal for writes or reads, but not necessarily the representation you would wish to export in an API.
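To make the difference between the BSON-compatible and JSON-compatible dicts concrete, here is a dependency-free sketch of the kind of conversion involved. It is not FastAPI's jsonable_encoder, and the to_json_compatible function and StandInId class are hypothetical names for this example. StandInId plays the role of bson.ObjectId, whose str() form is its hex value.

```python
from datetime import datetime, timezone


def to_json_compatible(value):
    """Recursively convert a BSON-style value into JSON-friendly types."""
    if isinstance(value, dict):
        return {key: to_json_compatible(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_compatible(item) for item in value]
    if isinstance(value, datetime):
        # datetime values become ISO 8601 strings.
        return value.isoformat()
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    # Anything else (for example an ObjectId) is rendered with str().
    return str(value)


class StandInId:
    """Hypothetical stand-in for bson.ObjectId, so no driver is needed."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


bson_style = {
    "_id": StandInId("5f7daa158ec9dfb536781afa"),
    "name": "Hunter's Moon",
    "date_added": datetime(2021, 1, 1, tzinfo=timezone.utc),
}
json_style = to_json_compatible(bson_style)
```

The BSON-style dict can keep its native types because the database driver understands them. The JSON-style dict contains only strings, numbers, booleans, lists, and dicts, so it can be handed straight to a JSON encoder.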
What is a Slug?
Looking at the schema above, you may have wondered what a "slug" is ... well, apart from a slimy garden pest.
A slug is a unique, URL-safe mnemonic used for identifying a document. I picked up the terminology as a Django developer, where this term is part of the framework. A slug is usually derived from another field. In this case, the slug is derived from the name of the cocktail, so if a cocktail was called "Rye Whiskey Old-Fashioned," the slug would be "rye-whiskey-old-fashioned."
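This post doesn't show how the slugs in the sample data were generated, so the following is only one plausible way to derive a slug from a name, using the standard library. The slugify function name is an assumption for this example.

```python
import re


def slugify(name):
    """Lower-case the name and join its alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # Trim any hyphens left over from leading or trailing punctuation.
    return slug.strip("-")


print(slugify("Rye Whiskey Old-Fashioned"))  # rye-whiskey-old-fashioned
```

This simple version discards any character outside a-z and 0-9, so names containing accented letters would need extra handling.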
In this API, that cocktail could be accessed by sending a GET request to the /cocktails/rye-whiskey-old-fashioned endpoint.
I've kept the unique slug field separate from the auto-assigned _id field, but I've provided both because the slug could change if the name of the cocktail was tweaked, in which case the _id value would provide a constant identifier to look up an exact document.
In the Rust version of this code, I was nudged to use a different approach. It's a bit more verbose, but in the end I was convinced that it would be more powerful and flexible as the system grew.
Now I'll show you what a single endpoint looks like, first focusing on the "Create" endpoint, which handles a POST request to /cocktails and creates a new document in the "recipes" collection. It then returns the document that was stored, including the new unique ID that MongoDB assigned as _id, because this is a RESTful API, and that's what RESTful APIs do.
This endpoint modifies the incoming JSON directly, to add a date_added item with the current time. It then passes it to the constructor for our Pydantic schema. At this point, if the schema failed to validate the data, an exception would be raised and displayed to the user.
After validating the data, to_bson() is called on the Cocktail to convert it to a BSON-compatible dict, and this is directly passed to PyMongo's insert_one method. There's no way to get PyMongo to return the document that was just inserted in a single operation (although an upsert using find_one_and_update is similar to just that).
After inserting the data, the code then updates the local object with the newly-assigned id and returns it to the client.
Thanks to Flask-PyMongo, the endpoint for looking up a single cocktail is even more straightforward:
This endpoint will abort with a 404 if the slug can't be found in the collection. Otherwise, it simply instantiates a Cocktail with the document from the database, and calls to_json to convert it to a dict that Flask will automatically encode correctly as JSON.
The endpoint that lists cocktails is a monster, and it's because of pagination, and the links for pagination. In the sample data above, you probably noticed the _links section:
This _links section is specified as part of the HAL (Hypertext Application Language) specification. It's a good idea to follow a standard for pagination data, and I didn't feel like inventing something myself!
And here's the code to generate all this. Don't freak out.
Although there's a lot of code there, it's not as complex as it may first appear. Two requests are made to MongoDB: one for a page-worth of cocktail recipes, and the other for the total number of cocktails in the collection. Various calculations are done to work out how many documents to skip, and how many pages of cocktails there are. Finally, some links are added for "prev" and "next" pages, if appropriate (i.e., the current page isn't the first or last). Serialization of the cocktail documents is done in the same way as the previous endpoint, but in a loop this time.
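The arithmetic described above can be sketched without Flask or MongoDB. This is an illustrative helper, not the repository's code: the paginate name, the "self" link, and the /cocktails/?page=N URL shape are all assumptions.

```python
import math


def paginate(page, per_page, total):
    """Work out skip/limit values and HAL-style links for one page."""
    # There is always at least one page, even when the collection is empty.
    page_count = max(1, math.ceil(total / per_page))
    links = {"self": {"href": f"/cocktails/?page={page}"}}
    if page > 1:
        # Not the first page, so a previous page exists.
        links["prev"] = {"href": f"/cocktails/?page={page - 1}"}
    if page < page_count:
        # Not the last page, so a next page exists.
        links["next"] = {"href": f"/cocktails/?page={page + 1}"}
    return {
        "skip": (page - 1) * per_page,
        "limit": per_page,
        "page_count": page_count,
        "_links": links,
    }
```

In a real endpoint, total would come from PyMongo's count_documents({}) and the skip and limit values would be applied to the cursor returned by find(). Those are the two database requests mentioned above.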
The update and delete endpoints are mainly repetitions of the code I've already included, so I'm not going to include them here. Check them out in the GitHub repo if you want to see how they work.
Nothing irritates me more than using a JSON API which returns HTML when an error occurs, so I was keen to put in some reasonable error handling to avoid this happening.
After the Flask set-up code, and before the endpoint definitions, the code registers two error-handlers:
The first error-handler intercepts any endpoint that fails with a 404 status code and ensures that the error is returned as a JSON dict.
The second error-handler intercepts a DuplicateKeyError raised by any endpoint, and does the same thing as the first error-handler, but sets the HTTP status code to "400 Bad Request."
As I was writing this post, I realised that I've missed an error-handler to deal with invalid Cocktail data. I'll leave implementing that as an exercise for the reader! Indeed, this is one of the difficulties with writing robust Python applications: Because exceptions can be raised from deep in your stack of dependencies, it's very difficult to comprehensively predict what exceptions your application may raise in different circumstances.
This is something that's very different in Rust, and even though, as you'll see, error-handling in Rust can be verbose and tricky, I've started to love the language for its insistence on correctness.
When I started writing this post, I thought it would end up being relatively straightforward. As I added the requirement that the code should not just be a toy example, some of the inherent difficulties with building a robust API on top of any database became apparent.
In this case, Flask may not have been the right tool for the job. I recently wrote a blog post about building an API with Beanie. Beanie and FastAPI are a match made in heaven for this kind of application and will handle validation, transformation, and pagination with much less code. On top of that, they're self-documenting and can provide the data's schema in open formats, including OpenAPI Spec and JSON Schema!
If you're about to build an API from scratch, I strongly recommend you check them out, and you may enjoy reading Aaron Bassett's posts on the FARM (FastAPI, React, MongoDB) Stack.
I will shortly publish the second post in this series, Build a Cocktail API with Actix-Web, MongoDB, and Rust, and then I'll conclude with a third post, I Rewrote it in Rust—How Did it Go?, where I'll evaluate the strengths and weaknesses of the two experiments.
Thank you for reading. Keep a look out for the upcoming posts!
If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.