Build a RESTful API with Flask, MongoDB, and Python

Mark Smith • 10 min read • Published Jan 14, 2022 • Updated May 12, 2022



This is the first part of a short series of blog posts called "Rewrite it in Rust (RiiR)." It's a tongue-in-cheek title for some posts that will investigate the similarities and differences between the same service written in Python with Flask, and Rust with Actix-Web.

This post will show how I built a RESTful API for a collection of cocktail recipes I just happen to have lying around. The aim is to show an API server with some complexity, so although it's a small example, it will cover important factors such as:

Data transformation between the database and a JSON representation.
Data validation.
Pagination.
Error-handling.

Prerequisites

Python 3.8 or above
A MongoDB Atlas cluster. Follow the "Get Started with Atlas" guide to create your account and MongoDB cluster. Keep a note of your database username, password, and connection string as you will need those later.

This is an advanced guide, so it'll cover a whole bunch of different libraries which can be brought together to build a declarative RESTful API server on top of MongoDB. I won't cover repeating patterns in the codebase, so if you want to build the whole thing, I recommend checking out the source code, which is all on GitHub.

It won't cover the basics of Python, Flask, or MongoDB, so if that's what you're looking for, I recommend checking out the following resources before tackling this post:

Getting Started

Begin by cloning the sample code source from GitHub. There are four top-level directories:

actix-cocktail-api: You can ignore this for now.
data: This contains an export of my cocktail data. You'll import this into your cluster in a moment.
flask-cocktail-api: The code for this blog post.
test_scripts: A few shell scripts that use curl to test the HTTP interface of the API server.

There are more details in the GitHub repo, but the basics are: Install the project with your virtualenv active:

Code Snippet

Next, you should import the data into your cluster. Set the environment variable $MONGO_URI to your cluster URI. This environment variable will be used in a moment to import your data, and also by the Flask app. I use direnv to configure this, and put the following line in my .envrc file in my project's directory:

Code Snippet

Note that your database must be called "cocktails," and the import will create a collection called "recipes." After checking that $MONGO_URI is set correctly, run the following command:

Code Snippet

Now you should be able to run the Flask app from the flask-cocktail-api directory:

Code Snippet

(You can run make run if you prefer.)

Check the output to ensure it is happy with the configuration, and then in a different terminal window, run the list_cocktails.sh script in the test_scripts directory. It should print something like this:


{
    "_links": {
        "last": {
            "href": "http://localhost:5000/cocktails/?page=5"
        }, 
        "next": {
            "href": "http://localhost:5000/cocktails/?page=5"
        }, 
        "prev": {
            "href": "http://localhost:5000/cocktails/?page=3"
        }, 
        "self": {
            "href": "http://localhost:5000/cocktails/?page=4"
        }
    }, 
    "recipes": [
        {
            "_id": "5f7daa198ec9dfb536781b0d", 
            "date_added": null, 
            "date_updated": null, 
            "ingredients": [
                {
                    "name": "Light rum",
                    "quantity": {
                        "unit": "oz",
                    }
                },
                {
                    "name": "Grapefruit juice",
                    "quantity": {
                        "unit": "oz",
                    }
                },
                {
                    "name": "Bitters",
                    "quantity": {
                        "unit": "dash",
                    }
                }
            ],
            "instructions": [
                "Pour all of the ingredients into an old-fashioned glass almost filled with ice cubes",
                "Stir well."
            ],
            "name": "Monkey Wrench", 
            "slug": "monkey-wrench"
        },
    ]
    ...

Breaking it All Down

The code is divided into three submodules.

__init__.py contains all the Flask setup code, and defines all the HTTP routes.
model.py contains all the Pydantic model definitions.
objectid.py contains a Pydantic field definition that I stole from the Beanie object-data mapper for MongoDB.

I mentioned earlier that this code makes use of several libraries:

PyMongo and Flask-PyMongo handle the connection to the database. Flask-PyMongo specifically wraps the database collection object to provide a convenient find_one_or_404 method.
Pydantic manages data validation, and some aspects of data transformation between the database and a JSON representation, along with a single function from FastAPI (jsonable_encoder).

Data Validation and Transformation

When building a robust API, it's important to validate all the data passing into the system. It would be possible to do this using a stack of if/else statements, but it's much more effective to define a schema declaratively, and to allow that to programmatically validate the data being input.
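To make the contrast with a stack of if/else statements concrete, here is a deliberately tiny, standard-library-only sketch of the declarative idea: the schema is plain data, and one generic function checks any document against it. This is not how Pydantic works internally, and COCKTAIL_SCHEMA and validate are hypothetical names invented for this illustration; the field names are taken from the sample output shown earlier.

```python
from datetime import datetime

# Hypothetical schema for illustration: field name -> (expected type, required?)
COCKTAIL_SCHEMA = {
    "slug": (str, True),
    "name": (str, True),
    "instructions": (list, True),
    "date_added": (datetime, False),
}


def validate(data, schema):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if data.get(field) is None:
            # Missing (or null) values are only a problem for required fields.
            if required:
                errors.append(f"{field}: field required")
        elif not isinstance(data[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors


print(validate({"name": 3}, COCKTAIL_SCHEMA))
```

Adding a field means adding one line to the schema rather than another branch of checks, which is the property a real schema library gives you along with much richer type handling.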

I used a technique that I learned from Beanie, a new and neat ODM that I unfortunately couldn't practically use on this project, because Beanie is async, and Flask is a blocking framework.

Beanie uses Pydantic to define a schema, and adds a custom Field type for ObjectId.

Code Snippet

This Cocktail schema defines the structure of a Cocktail instance, which will be validated by Pydantic when instances are created. It includes another embedded schema for Ingredient, which is defined in a similar way.

I added convenience functions to export the data in the Cocktail instance to either a JSON-compatible dict or a BSON-compatible dict. The differences are subtle, but BSON supports native ObjectId and datetime types, for example, whereas when encoding as JSON, it's necessary to encode ObjectId instances in some other way (I prefer a string containing the hex value of the id), and datetime objects are encoded as ISO8601 strings.

The to_json method makes use of a function imported from FastAPI, which recurses through the instance data, encoding all values in a JSON-compatible form. It already handles datetime instances correctly, but to get it to handle ObjectId values, I extracted some custom field code from Beanie, which can be found in objectid.py.
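As a rough picture of what such a recursive encoder does, here is a minimal standard-library sketch. It is not FastAPI's implementation: to_jsonable is a hypothetical helper, FakeObjectId is a stand-in for bson.ObjectId so the example runs without PyMongo installed, and the date value is made up for illustration.

```python
import json
from datetime import datetime, timezone


class FakeObjectId:
    """Stand-in for bson.ObjectId (an assumption, so this runs without PyMongo)."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


def to_jsonable(value):
    """Recursively convert a document into JSON-compatible types."""
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    if isinstance(value, datetime):
        # Datetimes become ISO 8601 strings.
        return value.isoformat()
    if isinstance(value, FakeObjectId):
        # Ids become a string containing their hex value.
        return str(value)
    return value


doc = {
    "_id": FakeObjectId("5f7daa198ec9dfb536781b0d"),
    "name": "Monkey Wrench",
    "date_added": datetime(2022, 1, 14, 12, 0, tzinfo=timezone.utc),  # illustrative value
}
encoded = to_jsonable(doc)
print(json.dumps(encoded, indent=2))
```

Going the other way, to BSON, needs no such pass, because the driver can store the native id and datetime values directly.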

The to_bson method doesn't need to pass the dict data through jsonable_encoder. All the types used in the schema can be directly saved with PyMongo. It's important to set by_alias to True, so that the key for _id is just that, _id, and not the schema's id without an underscore.

Code Snippet

This approach is neat for this particular use-case, but I can't help feeling that it would be limiting in a more complex system. There are many patterns for storing data in MongoDB. These often result in storing data in a form that is optimal for writes or reads, but not necessarily the representation you would wish to export in an API.

What is a Slug?

Looking at the schema above, you may have wondered what a "slug" is ... well, apart from a slimy garden pest.

A slug is a unique, URL-safe, mnemonic used for identifying a document. I picked up the terminology as a Django developer, where this term is part of the framework. A slug is usually derived from another field. In this case, the slug is derived from the name of the cocktail, so if a cocktail was called "Rye Whiskey Old-Fashioned," the slug would be "rye-whiskey-old-fashioned."

In this API, that cocktail could be accessed by sending a GET request to the /cocktails/rye-whiskey-old-fashioned endpoint.
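The post doesn't show how the slugs in the sample data were generated, so the following is only a sketch of one common way to derive a slug from a name using the standard library. The slugify function here is a hypothetical helper, not part of the project's code.

```python
import re
import unicodedata


def slugify(name):
    """Derive a lowercase, hyphen-separated, URL-safe slug from a name."""
    # Decompose accented characters and drop anything that isn't ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Replace each run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    # Remove any hyphens left at the start or end.
    return slug.strip("-")


print(slugify("Rye Whiskey Old-Fashioned"))  # prints "rye-whiskey-old-fashioned"
```

A function like this does not guarantee uniqueness by itself; two cocktails with the same name would produce the same slug, which is why a unique index on the field is still needed.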

I've kept the unique slug field separate from the auto-assigned _id field, but I've provided both because the slug could change if the name of the cocktail was tweaked, in which case the _id value would provide a constant identifier to look up an exact document.

In the Rust version of this code, I was nudged to use a different approach. It's a bit more verbose, but in the end I was convinced that it would be more powerful and flexible as the system grew.

Creating a New Document

Now I'll show you what a single endpoint looks like, starting with the "Create" endpoint, which handles a POST request to /cocktails and creates a new document in the "recipes" collection. It then returns the document that was stored, including the new unique ID that MongoDB assigned as _id, because this is a RESTful API, and that's what RESTful APIs do.

Code Snippet

This endpoint modifies the incoming JSON directly, to add a date_added item with the current time. It then passes it to the constructor for our Pydantic schema. At this point, if the schema failed to validate the data, an exception would be raised and displayed to the user.

After validating the data, to_bson() is called on the Cocktail to convert it to a BSON-compatible dict, and this is directly passed to PyMongo's insert_one method. There's no way to get PyMongo to return the document that was just inserted in a single operation (although an upsert using find_one_and_update is similar to just that).

After inserting the data, the code then updates the local object with the newly-assigned id and returns it to the client.

Reading a Single Cocktail

Thanks to Flask-PyMongo, the endpoint for looking up a single cocktail is even more straightforward:

Code Snippet

This endpoint will abort with a 404 if the slug can't be found in the collection. Otherwise, it simply instantiates a Cocktail with the document from the database, and calls to_json to convert it to a dict that Flask will automatically encode correctly as JSON.

Listing All the Cocktails

This endpoint is a monster, and it's because of pagination, and the links for pagination. In the sample data above, you probably noticed the _links section:

Code Snippet

This _links section is specified as part of the HAL (Hypertext Application Language) specification. It's a good idea to follow a standard for pagination data, and I didn't feel like inventing something myself!

And here's the code to generate all this. Don't freak out.


# Assumes `request` and `url_for` are imported from flask, and that `app`,
# `recipes`, and `Cocktail` are defined or imported earlier in the module.
@app.route("/cocktails/")
def list_cocktails():
    """
    GET a list of cocktail recipes.

    The results are paginated using the `page` parameter.
    """

    # `type=int` falls back to the default if `page` isn't a valid integer,
    # and `max` guards against zero or negative page numbers.
    page = max(1, request.args.get("page", 1, type=int))
    per_page = 10  # A const value.

    # For pagination, it's necessary to sort by name,
    # then skip the number of docs that earlier pages would have displayed,
    # and then to limit to the fixed page size, ``per_page``.
    cursor = recipes.find().sort("name").skip(per_page * (page - 1)).limit(per_page)

    cocktail_count = recipes.count_documents({})
    # Round up, so an exact multiple of ``per_page`` doesn't add an empty page.
    last_page = max(1, (cocktail_count + per_page - 1) // per_page)

    links = {
        "self": {"href": url_for(".list_cocktails", page=page, _external=True)},
        "last": {
            "href": url_for(".list_cocktails", page=last_page, _external=True)
        },
    }
    # Add a 'prev' link if it's not on the first page:
    if page > 1:
        links["prev"] = {
            "href": url_for(".list_cocktails", page=page - 1, _external=True)
        }
    # Add a 'next' link if it's not on the last page:
    if page < last_page:
        links["next"] = {
            "href": url_for(".list_cocktails", page=page + 1, _external=True)
        }

    return {
        "recipes": [Cocktail(**doc).to_json() for doc in cursor],
        "_links": links,
    }

Although there's a lot of code there, it's not as complex as it may first appear. Two requests are made to MongoDB: one for a page-worth of cocktail recipes, and the other for the total number of cocktails in the collection. Various calculations are done to work out how many documents to skip, and how many pages of cocktails there are. Finally, some links are added for "prev" and "next" pages, if appropriate (i.e., the current page isn't the first or last). Serialization of the cocktail documents is done in the same way as the previous endpoint, but in a loop this time.
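The page arithmetic is easier to reason about, and to test without a database, when it is pulled out into a pure function. The sketch below is one possible way to do that. The pagination_links name and the base_url default are assumptions made for this example, and the total of 45 recipes is a made-up figure chosen to reproduce the five pages seen in the sample output.

```python
def pagination_links(page, per_page, total, base_url="http://localhost:5000/cocktails/"):
    """Build HAL-style pagination links for a 1-based page number."""
    # Ceiling division, with a minimum of one page for an empty collection.
    last_page = max(1, (total + per_page - 1) // per_page)

    def href(number):
        return {"href": f"{base_url}?page={number}"}

    links = {"self": href(page), "last": href(last_page)}
    # Add a 'prev' link if it's not on the first page.
    if page > 1:
        links["prev"] = href(page - 1)
    # Add a 'next' link if it's not on the last page.
    if page < last_page:
        links["next"] = href(page + 1)
    return links


# Hypothetical numbers: 45 recipes at 10 per page gives 5 pages.
print(pagination_links(page=4, per_page=10, total=45))
```

Ceiling division matters when the total is an exact multiple of the page size: 20 recipes at 10 per page is 2 pages, whereas `20 // 10 + 1` would give 3 and point the "last" link at an empty page.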

The update and delete endpoints are mainly repetitions of the code I've already included, so I'm not going to include them here. Check them out in the GitHub repo if you want to see how they work.

Error Handling

Nothing irritates me more than using a JSON API which returns HTML when an error occurs, so I was keen to put in some reasonable error handling to avoid this happening.

After the Flask set-up code, and before the endpoint definitions, the code registers two error-handlers:

Code Snippet

The first error-handler intercepts any endpoint that fails with a 404 status code and ensures that the error is returned as a JSON dict.

The second error-handler intercepts a DuplicateKeyError raised by any endpoint, and does the same thing as the first error-handler, but sets the HTTP status code to "400 Bad Request."

As I was writing this post, I realised that I've missed an error-handler to deal with invalid Cocktail data. I'll leave implementing that as an exercise for the reader! Indeed, this is one of the difficulties with writing robust Python applications: Because exceptions can be raised from deep in your stack of dependencies, it's very difficult to comprehensively predict what exceptions your application may raise in different circumstances.

This is something that's very different in Rust, and even though, as you'll see, error-handling in Rust can be verbose and tricky, I've started to love the language for its insistence on correctness.

Wrapping Up

When I started writing this post, I thought it would end up being relatively straightforward. As I added the requirement that the code should not just be a toy example, some of the inherent difficulties with building a robust API on top of any database became apparent.

In this case, Flask may not have been the right tool for the job. I recently wrote a blog post about building an API with Beanie. Beanie and FastAPI are a match made in heaven for this kind of application and will handle validation, transformation, and pagination with much less code. On top of that, they're self-documenting and can provide the data's schema in open formats, including OpenAPI Spec and JSON Schema!

If you're about to build an API from scratch, I strongly recommend you check them out, and you may enjoy reading Aaron Bassett's posts on the FARM (FastAPI, React, MongoDB) Stack.

I will shortly publish the second post in this series, Build a Cocktail API with Actix-Web, MongoDB, and Rust, and then I'll conclude with a third post, I Rewrote it in Rust—How Did it Go?, where I'll evaluate the strengths and weaknesses of the two experiments.

Thank you for reading. Keep a look out for the upcoming posts!

If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.
