Getting Started with Python, PyMODM, and MongoDB Atlas

Jason Ma

#Technical#Cloud

What is PyMODM

PyMODM is an object modeling package for Python that works like an Object Relational Mapping (ORM) and provides a validation and modeling layer on top of PyMongo (MongoDB’s Python driver). Developers can use PyMODM as a template to model their data, validate schemas, and easily delete referenced objects. PyMODM can be used with any Python framework, is compatible with Python 3, and is supported by MongoDB.

Benefits of PyMODM

PyMODM allows developers to focus more on developing application logic instead of creating validation logic to ensure data integrity.

Some key benefits of PyMODM are:

Field Validation. MongoDB has a dynamic schema, but there are very few production use cases where data is entirely unstructured. Most applications expect some level of data validation either through the application or database tier. MongoDB provides Document Validation within the database. Users can enforce checks on document structure, data types, data ranges, and the presence of mandatory fields. Document validation is useful for centralizing rules across projects and APIs, as well as minimizing redundant code for multiple applications. In certain cases, application side validation makes sense as well, especially when you would like to obviate the need for a round trip between the application and database. PyMODM provides users the ability to define models and validate their data before storing it in MongoDB, thus eliminating the amount of data validation logic developers need to write in the application tier.

Built In Reference Handling. PyMODM has built in reference handling to make development simpler. Developers don’t have to plan on normalizing data as much as they would with an RDBMS. PyMODM can automatically populate fields that reference documents in other collections, in a similar way to foreign keys in a RDBMS.

For example, you might have a model for a blog post that contains an author. Let’s say we want to keep track of these entities in separate collections. The way we store this in MongoDB is to have the _id from the author document be stored as an author field in the post document:

{
"title": "Working with PyMODM",
"author": ObjectId('57dad74a6e32ab4894ea6898')
}

If we were using the low-level driver, we would just get an ObjectId when we accessed post['author'], whereas PyMODM will lazily dereference this field for you:

>>> post.author
Author(name='Jason Ma')

In other words, PyMODM handles all the necessary queries to resolve referenced objects, instead of having to pull out the ids yourself and perform the extra queries manually.

PyMODM also provides several strategies for managing how objects get deleted when they are involved in a relationship with other objects. For example, if you have a Book and Publisher class, where each Book document references a Publisher document, you have the option of deleting all Book objects associated with that Publisher.

Familiar PyMongo Syntax. PyMODM uses PyMongo-style syntax for queries and updates, which makes it familiar and easy to get started with for those already familiar with PyMongo.

Installing PyMODM

Getting started with PyMODM is simple. You can install PyMODM with pip.

pip install pymodm

Connecting to MongoDB Atlas

For developers that are interested in minimizing operational database tasks, MongoDB Atlas is an ideal option.

MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

Setting up Atlas is simple.

Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”.

Connecting PyMODM to MongoDB Atlas is straightforward and easy. Just find the connection string and plug it into the ‘connect’ method. To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled. IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group. For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers.

from pymodm import connect

#Establish a connection to the database and call the connection my-atlas-app
connect(
'mongodb://jma:PASSWORD@mongo-shard-00-00-efory.mongodb.net:27017,mongo-shard-00-01-efory.mongodb.net:27017,mongo-shard-00-02-efory.mongodb.net:27017/admin?ssl=true&replicaSet=mongo-shard-0&authSource=admin', 
	alias='my-atlas-app'
)

In this example, we have set alias=’my-atlas-app’. An alias in the connect method is optional, but comes in handy if we ever need to refer to the connection by name. Remember to replace “PASSWORD” with your own generated password.

Defining Models

One of the big benefits of PyMODM is the ability to define your own models and apply schema validation to those models. The below examples highlight how to use PyMODM to get started with a blog application.

Once a connection to MongoDB Atlas is established, we can define our model class. MongoModel is the base class for all top-level models, which represents data stored in MongoDB in a convenient object-oriented format. A MongoModel definition typically includes a number of field instances and possibly a Meta class that provides settings specific to the model:

from pymodm import MongoModel, fields
from pymongo.write_concern import WriteConcern

class User(MongoModel):
	email = fields.EmailField(primary_key=True)
	first_name = fields.CharField()
	last_name = fields.CharField()
 
	class Meta:
    	connection_alias = 'my-atlas-app'
    	write_concern = WriteConcern(j=True)

In this example, the User model inherits from MongoModel, which means that the User model will create a new collection in the database (myDatabase.User). Any class that inherits directly from MongoModel will always get it’s own collection.

The character fields (first_name, last_name) and email field (email) will always store their values as unicode strings. If a user stores some other type in first_name or last_name (e.g. Python ‘bytes’) then PyMODM will automatically convert the field to a unicode string, providing consistent and uniform access to that field. A validator is readily available on CharField, which will validate the maximum string length. For example, if we wanted to limit the length of a last name to 30 characters, we could do:

last_name = fields.CharField(max_length=30)

For the email field, we set primary_key=True. This means that this field will be used as the id for documents of this MongoModel class. Note, this field will actually be called _id in the database. PyMODM will validate that the email field contents contain a single ‘@’ character.

New validators can also be easily be created. For example, the email validator below ensures that the email entry is a Gmail address:

def is_gmail_address(string):
	if not string.endswith(‘@gmail.com’):
		raise ValidationError(‘Email address must be valid gmail account.’)

class User(MongoModel):
email = fields.EmailField(validators=[is_gmail_address])

Here, PyMODM will validate that the email field contains a valid Gmail address or throw an error. PyMODM handles field validation automatically whenever a user retrieves or saves documents, or on-demand. By rolling validation into the Model definition, we reduce the likelihood of storing invalid data in MongoDB. PyMODM fields also provide a uniform way of viewing data in that field. If we use a FloatField, for example, we will always receive a float, regardless of whether the data stored in that field is a float, an integer, or a quoted number. This mitigates the amount of logic that developers need to create in their applications.

Finally, the last part of our example is the Meta class, which contains two pieces of information. The connection_alias tells the model which connection to use. In a previous code example, we defined the connection alias as my-atlas-app. The write_concern attribute tells the model which write concern to use by default. You can define other Meta attributes such as read concern, read preference, etc. See the PyMODM API documentation for more information on defining the Meta class.

Reference Other Models

Another powerful feature of PyMODM is the ability to reference other models.

Let’s take a look at an example.

from pymodm import EmbeddedMongoModel, MongoModel, fields
 
class Comment(EmbeddedMongoModel):
    author = fields.ReferenceField(User)
    content = fields.CharField()
 
class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedDocumentListField(Comment)

In this example, we have defined two additional model types: Comment and Post. Both these models contain an author, which is an instance of the User model. The User that represents the author in each case is stored among all other Users in the myDatabase.User collection. In the Comment and Post models, we’re just storing the _id of the User in the author field. This is actually the same as the User’s email field, since we set primary_key=True for the field earlier.

The Post class gets a little bit more interesting. In order to support commenting on a Post, we’ve added a comments field, which is an EmbeddedDocumentListField. The EmbeddedDocumentListField embeds Comment objects directly into the Post object. The advantage of doing this is that you don’t need multiple queries to retrieve all comments associated with a given Post.

Now that we have created models that reference each other, what happens if an author deletes his/her account. PyMODM provides a few options in this scenario:

  • Do nothing (default behaviour).
  • Change the fields that reference the deleted objects to None.
  • Recursively delete all objects that were referencing the object (i.e. delete any comments and posts associated with a User).
  • Don’t allow deleting objects that have references to them.
  • If the deleted object was just one among potentially many other references stored in a list, remove the references from the list. For example, if the application allows for Post to have multiple authors we could remove from the list just the author who deleted their account.

For our previous example, let’s delete any comments and posts associated with a User that has deleted his/her account:

author = fields.ReferenceField(User, on_delete=ReferenceField.CASCADE)

This will delete all documents associated with the reference.

In this blog, we have highlighted just a few of the benefits that PyMODM provides. For more information on how to leverage the powerful features of PyMODM, check out this github example of developing a blog with the flask framework.

Summary

PyMODM is a powerful Python ORM for MongoDB that provides an object-oriented interface to MongoDB documents to make it simple to enforce data validation and referencing in your application. MongoDB Atlas helps developers free themselves from the operational tasks of scaling and managing their database. Together, PyMODM and MongoDB Atlas provide developers a compelling solution to enable fast, iterative development, while reducing costs and operational tasks.

Get Started with PyMODM