How to work with Johns Hopkins University COVID-19 Data in MongoDB Atlas
Our MongoDB Cluster is running in version 7.0.3.
You can connect to it using MongoDB Compass, the Mongo Shell, SQL or any MongoDB driver supporting at least MongoDB 7.0 with the following URI:
readonly is both the username and the password; they are not meant to be replaced.
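As a rough illustration of connecting from a driver, here is a minimal Python sketch using pymongo. The environment variable name COVID19_MONGODB_URI and both helper functions are assumptions made for this example: the sketch expects you to store the connection URI from this article in that variable rather than hard-coding it.

```python
import os


def get_uri(env_var="COVID19_MONGODB_URI"):
    """Return the connection URI stored in an environment variable.

    The variable name is hypothetical; put the URI from this article in it.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Set {env_var} to the connection URI first")
    return uri


def connect(uri):
    """Open a client, check that the cluster answers, return the covid19 database."""
    # Imported lazily so the helper above works without pymongo installed
    # (install it with: pip install pymongo).
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster is unreachable
    return client["covid19"]
```

A typical call would be `db = connect(get_uri())`, after which `db.list_collection_names()` lists the available collections.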
- First data entry is 2020-01-22, last one is 2023-03-09.
- Cluster now running on 7.0.3
- Removed the database covid19jhu with the raw data. Use the much better database covid19.
- BI Tools access is now disabled.
- Upgraded the cluster to 4.4.
- Renamed the field "city" to "county" and "cities" to "counties" where appropriate. They contain the data from the column "Admin2" in JHU CSVs.
- The covid19.statistics collection is renamed covid19.global_and_us for more clarity.
- The dataset is updated hourly so any commit done by JHU will be reflected at most one hour later in our cluster.
- the World Health Organization,
- the National Health Commission of the People's Republic of China,
- the United States Centre for Disease Control,
- the Australia Government Department of Health,
- the European Centre for Disease Prevention and Control,
- and many others.
Using the CSV files they provide, we are producing two different databases in our cluster.
covid19 contains the same dataset, but with a clean MongoDB schema design that follows the good practices we recommend.
Here is an example of a document in the
- global (the data from the time series global files)
- us_only (the data from the time series US files)
- global_and_us (the most complete one)
- countries_summary (same as global but countries are grouped in a single doc for each date)
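To make the relationship between global and countries_summary more concrete, here is a hedged sketch of an aggregation pipeline that groups documents per country and per date. It is not necessarily how the countries_summary collection is actually built, and the field names country, date, confirmed and deaths are assumptions about the schema.

```python
def countries_summary_pipeline():
    """Build a pipeline producing one document per (country, date) pair.

    Field names are assumed, not taken from the published schema.
    """
    return [
        # Collapse every state or province of a country into one bucket per date.
        {"$group": {
            "_id": {"country": "$country", "date": "$date"},
            "confirmed": {"$sum": "$confirmed"},
            "deaths": {"$sum": "$deaths"},
        }},
        # Flatten the compound _id back into top-level fields.
        {"$project": {
            "_id": 0,
            "country": "$_id.country",
            "date": "$_id.date",
            "confirmed": 1,
            "deaths": 1,
        }},
        {"$sort": {"country": 1, "date": 1}},
    ]
```

With a pymongo database handle named `db`, this could be run as `db["global"].aggregate(countries_summary_pipeline())`.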
In the following sections, we will also show you how to consume this dataset using the Java, Node.js and Python drivers.
We will show you how to perform the following queries in each language:
- Retrieve the last 5 days of data for a given place,
- Retrieve all the data for the last day,
- Make a geospatial query to retrieve data within a certain distance of a given place.
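As a preview, the three queries could look roughly like this in Python with pymongo. This is a sketch under assumptions: the field names country, date and loc (a GeoJSON point) are guesses about the schema, dates are assumed to be stored as BSON dates, and the geospatial filter only works if the collection has a 2dsphere index on the location field. The first helper is meant for countries_summary, where each country has a single document per date.

```python
from datetime import datetime

# Last data entry in the dataset, as stated in the News section.
LAST_DAY = datetime(2023, 3, 9)


def last_days_for_country(coll, country, days=5):
    """Return the `days` most recent documents for a country, newest first."""
    cursor = coll.find({"country": country}).sort("date", -1).limit(days)
    return list(cursor)


def all_for_day(coll, day=LAST_DAY):
    """Return every document recorded for one given day."""
    return list(coll.find({"date": day}))


def near_filter(lon, lat, max_km):
    """Build a filter matching documents within `max_km` of a point.

    GeoJSON coordinates are ordered longitude first, then latitude,
    and $maxDistance is expressed in meters.
    """
    return {
        "loc": {
            "$nearSphere": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_km * 1000,
            }
        }
    }
```

For example, `last_days_for_country(db["countries_summary"], "France")` and `list(db["global_and_us"].find(near_filter(2.35, 48.85, 500)))` would run the first and third queries against a pymongo database handle `db`.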
For MongoDB Compass or your driver, you can use this connection string.
The sample code shows how to install pymongo and use it to connect to the MongoDB COVID-19 dataset. There are some example queries which show how to query the data and display it in the notebook, and the last example demonstrates how to display a chart using Pandas & Matplotlib!
If you want to modify the notebook, you can take a copy by selecting "Save a copy in Drive ..." from the "File" menu, and then you'll be free to edit the copy.
You can get lots of value from the dataset without any programming at all. We had enabled the BI Connector (not anymore, see News section), which exposes an SQL interface to MongoDB's document structure. This means you can use data analysis and dashboarding tools such as Tableau to analyze, visualise and extract understanding from the data.
Here's an example of a visualisation produced in a few clicks with Tableau:
- Server: covid-19-biconnector.hip2i.mongodb.net,
- Port: 27015,
- Database: covid19,
- Username: readonly or readonly?source=admin,
- Password: readonly.
Accessing our copy of this data in a read-only database is useful, but it won't be enough if you want to integrate it with other data within a single MongoDB cluster. You can obtain a copy of the database, either to use offline using a different tool outside of MongoDB, or to load into your own MongoDB instance.
mongoexport is a command-line tool that produces a JSON or CSV export of data stored in a MongoDB instance. First, install it by following the MongoDB Database Tools installation instructions.
Now you can run the following in your console to download the metadata and global_and_us collections as jsonl files in your current directory:
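If you prefer to script the export, here is a small Python sketch that builds and runs a mongoexport command for each collection. The helper names are made up for this example, and the uri argument is assumed to be the connection string from this article with the covid19 database name included in it. By default mongoexport writes one JSON document per line, which matches the jsonl naming.

```python
import subprocess


def mongoexport_cmd(uri, collection, out_path=None):
    """Build the argument list exporting one collection to a jsonl file."""
    out_path = out_path or f"{collection}.jsonl"
    return [
        "mongoexport",
        f"--uri={uri}",  # assumed to include the database name
        f"--collection={collection}",
        f"--out={out_path}",
    ]


def export_collections(uri, collections=("metadata", "global_and_us")):
    """Run mongoexport once per collection, writing to the current directory."""
    for name in collections:
        # check=True raises CalledProcessError if mongoexport fails.
        subprocess.run(mongoexport_cmd(uri, name), check=True)
```

Calling `export_collections(uri)` requires mongoexport to be installed and available on your PATH.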
Another smart way to duplicate the dataset in your own cluster would be to use mongodump together with mongorestore. Apart from being more efficient, it will also grab the index definitions along with the data.
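One possible way to do this, sketched in Python, is to stream a compressed mongodump archive straight into mongorestore so that nothing is written to disk. This assumes both tools are installed, that source_uri is the read-only connection string from this article (with the database name included) and that target_uri points at a cluster you can write to. The function names are hypothetical.

```python
import subprocess


def dump_cmd(source_uri):
    """mongodump writing a gzip-compressed archive to standard output."""
    return ["mongodump", f"--uri={source_uri}", "--archive", "--gzip"]


def restore_cmd(target_uri):
    """mongorestore reading a gzip-compressed archive from standard input."""
    return ["mongorestore", f"--uri={target_uri}", "--archive", "--gzip"]


def copy_database(source_uri, target_uri):
    """Pipe mongodump into mongorestore and fail loudly if either step fails."""
    dump = subprocess.Popen(dump_cmd(source_uri), stdout=subprocess.PIPE)
    restore = subprocess.Popen(restore_cmd(target_uri), stdin=dump.stdout)
    # Close our copy of the pipe so mongodump is notified if mongorestore exits early.
    dump.stdout.close()
    restore_rc = restore.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or restore_rc != 0:
        raise RuntimeError(f"copy failed (mongodump={dump_rc}, mongorestore={restore_rc})")
```

Because the archive contains the collection metadata, the indexes are recreated on the target along with the documents.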
We see the value and importance of making this data as readily available to everyone as possible, so we're not stopping here. Over the coming days, we'll be adding a GraphQL and REST API, as well as making the data available within Excel and Google Sheets.