The 5-Minute Guide to Working with ESG Data on MongoDB
MongoDB makes it incredibly easy to work with environmental, social, and corporate governance (ESG) data from multiple providers, analyze that data, and then visualize it.
In this quick guide, we will show you how MongoDB can:
- Move ESG data from different data sources to the document model.
- Easily incorporate new ESG source feeds to the document data model.
- Run advanced, aggregated queries on ESG data.
- Visualize ESG data.
- Manage different data types in a single document.
- Integrate geospatial data.
NOTE: An MSCI account and login is required to download the datasets linked to in this article. Dataset availability is dependent on MSCI product availability.
Our examples are drawn from real-life work with MongoDB clients in the financial services industry. Screenshots (apart from code snippets) are taken from MongoDB Compass, MongoDB’s GUI for querying, optimizing, and analyzing data.
The first step is to download the MSCI dataset, and import the MSCI .csv file (Figure 1) into MongoDB.
Even though MSCI’s data is in tabular format, MongoDB’s document data model allows you to import the data directly into a database collection and apply the data types as needed.
Figure 1. Importing the data using MongoDB’s Compass GUI
With the MSCI data imported into MongoDB, we can start discovering, querying, and visualizing it.
Source Data Set: MSCI ESG Accounting Governance Risk (AGR)
Collection: accounting_governance_risk_agr_ratings
From MSCI - “ESG AGR uses a quantitative approach to identify risks in the financial reporting practices and accounting governance of publicly listed companies. Metrics contributing to the score include traditional fundamental ratios used to evaluate corporate strength and profitability, as well as forensic ratios.”
Fields/Data Info:
- The AGR (Accounting & Governance Risk) Rating consists of four groupings based on the AGR Percentile: Very Aggressive (1-10), Aggressive (11-35), Average (36-85), Conservative (86-100).
- The AGR (Accounting & Governance Risk) Percentile ranges from 1-100, with lower values representing greater risks.
In this example, we will count the number of AGR rated companies in Japan belonging to each AGR rating group (i.e., Very Aggressive, Aggressive, Average, and Conservative). To do this, we will use MongoDB’s aggregation pipeline to process multiple documents and return the results we’re after.
The aggregation pipeline presents a powerful abstraction for working with and analyzing data stored in the MongoDB database. The composability of the aggregation pipeline is one of the keys to its power. The design was actually modeled on the Unix pipeline, which allows developers to string together a series of processes that work together. This helps to simplify their application code by reducing logic, and when applied appropriately, a single aggregation pipeline can replace many queries and their associated network round trip times.
What aggregation stages will we use?
- The $match operator in MongoDB works as a filter. It filters the documents to pass only the documents that match the specified condition(s).
- The $group stage separates documents into groups according to a "group key," which, in this case, is the value of Agr_Rating.
- Additionally, at this stage, we can summarize the total count of those entities.
Combining the first two aggregation stages, we can filter the Issuer_Cntry_Domicile field to be equal to Japan (i.e., “JP”) and group the AGR ratings.
As a final step, we will also sort the output of the total_count in descending order (hence the -1) and merge the results into another collection in the database of our choice, with the $merge operator.
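The stages described above can be sketched as a single pipeline. This is a minimal sketch, not the article's original snippet: the field names and the output collection come from the text, while counting with `$sum: 1` and the variable name `jpAgrPipeline` are assumptions. The pipeline is a plain JavaScript array, so it can be inspected in Node.js and passed to `aggregate()` in mongosh.

```javascript
// Sketch: count AGR-rated companies domiciled in Japan per rating group.
const jpAgrPipeline = [
  // Keep only issuers domiciled in Japan.
  { $match: { Issuer_Cntry_Domicile: "JP" } },
  // One output document per rating group, with a running count.
  { $group: { _id: "$Agr_Rating", total_count: { $sum: 1 } } },
  // Largest groups first (-1 means descending).
  { $sort: { total_count: -1 } },
  // Write the result into another collection in the same database.
  { $merge: { into: "jp_agr_risk_ratings" } },
];

// In mongosh, assuming the collection name used in this article:
// db.accounting_governance_risk_agr_ratings.aggregate(jpAgrPipeline);
console.log(JSON.stringify(jpAgrPipeline, null, 2));
```

Because `$merge` must be the last stage, the sort is placed before it. The stored order in the output collection is not guaranteed, so a consumer that needs ordering would normally sort again when reading.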
The result and output collection, 'jp_agr_risk_ratings', can be seen below.
Next, let’s visualize the results of Step 1 with MongoDB Charts, which is integrated into MongoDB. With Charts, there’s no need for developers to worry about finding a compatible data visualization tool, dealing with data movement, or data duplication when creating or sharing data visualizations.
Figure 2. Distribution of AGR rating in Japan
Let’s go a step further and group the results for multiple countries. We can add more countries — for instance, Japan and Hong Kong — and then $group and $count the results for them in Figure 3.
Figure 3. $match stage run in MongoDB Compass
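A multi-country version of the same idea might look like the sketch below. It assumes that Hong Kong is stored as "HK" in Issuer_Cntry_Domicile and that a compound group key (country plus rating) is the desired shape. Neither detail is visible in the text, so both should be checked against the actual data.

```javascript
// Sketch: count rating groups for several countries at once.
const countries = ["JP", "HK"]; // "HK" is an assumed country code

const multiCountryPipeline = [
  // $in matches any document whose domicile is in the list.
  { $match: { Issuer_Cntry_Domicile: { $in: countries } } },
  // Group by country AND rating so each pair gets its own count.
  {
    $group: {
      _id: { country: "$Issuer_Cntry_Domicile", rating: "$Agr_Rating" },
      total_count: { $sum: 1 },
    },
  },
  { $sort: { "_id.country": 1, total_count: -1 } },
];

// In mongosh:
// db.accounting_governance_risk_agr_ratings.aggregate(multiCountryPipeline);
console.log(JSON.stringify(multiCountryPipeline, null, 2));
```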
Moving back to Charts, we can easily display the results comparing governance risks for Hong Kong and Japan, as shown in Figure 4.
Figure 4. Compared distribution of AGR ratings - Japan vs Hong Kong
From MSCI - “GeoQuant's Country Fundamental Risk Indicators fuses political and computer science to measure and predict political risk. GeoQuant's machine-learning software scrapes the web for large volumes of reputable data, news, and social media content.”
Fields/Data Info:
- Health (Health Risk) - Quality of/access to health care, resilience to disease
- IR (International Relations Risk) - Prevalence/likelihood of diplomatic, military, and economic conflict with other countries
- PolViol (Political Violence Risk) - Prevalence/likelihood of civil war, insurgency, terrorism
With the basics of MongoDB’s query framework understood, let’s move on to more complex queries, again using MongoDB’s aggregation pipeline capabilities.
With MongoDB’s document data model, we can nest documents within a parent document. In addition, we are able to perform query operations over those nested fields.
Imagine a scenario where we have two separate collections of ESG data, and we want to combine information from one collection into another, fetch that data into the result array, and further filter and transform the data.
We can do this using an aggregation pipeline.
Let’s say we want more detailed results for companies located in a particular country, for instance, by combining data from focus_risk_scores with our primary collection, accounting_governance_risk_agr_ratings.
Figure 5. accounting_governance_risk_agr_ratings collection in MongoDB Compass
Figure 6. focus_risk_scores collection in MongoDB Compass
In order to do that, we use the $lookup stage, which adds a new array field to each input document. It contains the matching documents from the "joined" collection. This is similar to the joins used in relational databases. You may ask, "What is $lookup syntax?"
To perform an equality match between a field from the input documents with a field from the documents of the "joined" collection, the $lookup stage has this syntax:
In our case, we want to join and match the value of Issuer_Cntry_Domicile from the collection accounting_governance_risk_agr_ratings with the value of Country field from the collection focus_risk_scores, as shown in Figure 7.
Figure 7. $lookup stage run in MongoDB Compass
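For reference, the equality-match form of $lookup takes four fields: `from`, `localField`, `foreignField`, and `as`. The sketch below fills them in with the collection and field names mentioned in the text. The constant name `lookupStage` is only illustrative.

```javascript
// General shape (placeholders, not runnable as-is):
// { $lookup: { from: <joined collection>,
//              localField: <field in the input documents>,
//              foreignField: <field in the joined collection>,
//              as: <name of the output array field> } }

// Sketch for this scenario, using the names given in the text.
const lookupStage = {
  $lookup: {
    from: "focus_risk_scores",           // the "joined" collection
    localField: "Issuer_Cntry_Domicile", // field in the AGR ratings documents
    foreignField: "Country",             // field in focus_risk_scores
    as: "result",                        // matches land in this array
  },
};

// In mongosh:
// db.accounting_governance_risk_agr_ratings.aggregate([lookupStage]);
console.log(JSON.stringify(lookupStage, null, 2));
```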
After performing the $lookup operation, we receive the data into the ‘result’ array field.
Imagine that at this point, we decide to display only Issuer_Name and Issuer_Cntry_Domicile from the first collection. We can do so with the $project stage by listing the fields we want to keep visible, as shown in Figure 8.
Figure 8. $project stage run in MongoDB Compass
Additionally, we remove the result._id field, which comes from the documents of the joined collection, as we do not need it at this stage. This is where the $unset stage comes in handy.
Figure 9. $unset stage run in MongoDB Compass
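The two clean-up stages could be sketched as follows. Keeping `result` in the projection is an assumption, made because later steps still read from that array. The constant names are illustrative.

```javascript
// Sketch: keep only the fields of interest, then drop the joined _id.
const projectStage = {
  $project: {
    Issuer_Name: 1,
    Issuer_Cntry_Domicile: 1,
    result: 1, // assumed: the joined array is kept for later stages
  },
};

// A dotted path removes _id from every element of the result array.
const unsetStage = { $unset: "result._id" };

const cleanupStages = [projectStage, unsetStage];
console.log(JSON.stringify(cleanupStages, null, 2));
```

With an inclusion-style $project, any field not listed is dropped, except the top-level `_id`, which stays unless it is explicitly set to 0.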
With our data now cleaned up and viewable in one collection, we can go further and edit the data set with new custom fields and categories.
Updating fields
Let’s say we would like to set up new fields that categorize Health, IR, and PolViol lists separately.
To do so, we can use the $set stage. We use it to create three new fields (health_risk, politcial_violance_risk, international_relations_risk), each holding an array with only those elements that match the condition specified in the $filter operator.
$filter has the following syntax:
input — An expression that resolves to an array.
as — A name for the variable that represents each individual element of the input array.
cond — An expression that resolves to a boolean value used to determine if an element should be included in the output array. The expression references each element of the input array individually with the variable name specified in as.
In our case, we apply the $filter operator with the “$result” array as its input.
Why dollar sign and field name?
A field name prefixed with a dollar sign ($) is a field path: aggregation expressions use it to access fields in the input documents (here, the documents coming from the previous stage and their result field).
Further, we name each individual element of that result array “metric”.
For the condition, we run an equality match on "$$metric.Risk": the "$$" prefix refers to the metric variable, and the dot accesses the Risk field of that element.
We then keep only the elements whose Risk equals the appropriate value (“Health”, “PolViol”, or “IR”).
The full query can be seen below in Figure 10.
Figure 10. $set stage and $filter operator run in MongoDB Compass
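As a rough sketch of the $set and $filter combination described above, the helper below builds one $filter expression per risk label. The helper name `filterByRisk` and the sample document are hypothetical. Only two of the three fields are shown, and the third follows the same pattern with the “PolViol” label. The plain-JavaScript `filter` call at the end only illustrates what $filter does to one array and is not part of the pipeline.

```javascript
// Hypothetical helper: a $filter expression that keeps the elements of
// the "result" array whose Risk field equals the given label.
const filterByRisk = (label) => ({
  $filter: {
    input: "$result",                        // array to filter
    as: "metric",                            // variable name per element
    cond: { $eq: ["$$metric.Risk", label] }, // keep when this is true
  },
});

const setRiskFieldsStage = {
  $set: {
    health_risk: filterByRisk("Health"),
    international_relations_risk: filterByRisk("IR"),
  },
};
console.log(JSON.stringify(setRiskFieldsStage, null, 2));

// Illustration only: the same selection on an invented in-memory document.
const sampleDoc = { result: [{ Risk: "Health" }, { Risk: "IR" }, { Risk: "Health" }] };
const healthOnly = sampleDoc.result.filter((metric) => metric.Risk === "Health");
console.log(healthOnly.length);
```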
After we consolidate the fields that interest us, the result array is redundant, so we use the $unset stage once again to remove it.
Figure 11. $unset stage run in MongoDB Compass
The next step is to calculate the average risk for every category (Health, International Relations, Political Violence) across the country where the company resides (“Country” field) and the other countries (“Primary_Countries” field), using the $avg operator within a $set stage (as seen in Figure 12).
Figure 12. $set stage run in MongoDB Compass
We then display only the companies whose average values are greater than 0, with a simple $match operation (Figure 13).
Figure 13. $match stage run in MongoDB Compass
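The averaging and filtering steps can be sketched as below. The text does not show which numeric fields hold the scores, so `Score` and the output name `health_risk_avg` are hypothetical stand-ins that would need to be replaced with the real field names. The point of the sketch is the pattern: inside $set, $avg accepts an expression that resolves to an array and averages its numeric elements.

```javascript
// Sketch: average one risk category, then keep only positive averages.
// ASSUMPTION: each element of health_risk has a numeric "Score" field.
const avgStage = {
  $set: {
    // "$health_risk.Score" resolves to an array of Score values,
    // which $avg reduces to a single number (null if none are numeric).
    health_risk_avg: { $avg: "$health_risk.Score" },
  },
};

// Keep only the documents whose average is greater than 0.
const positiveOnlyStage = { $match: { health_risk_avg: { $gt: 0 } } };

const averagingStages = [avgStage, positiveOnlyStage];
console.log(JSON.stringify(averagingStages, null, 2));
```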
Save the data (merge into) and display the results in the chart.
Once again, we can use the $merge operator to save the result of the aggregation and then visualize it using MongoDB Charts (Figure 14).
Figure 14. $merge stage run in MongoDB Compass
Let’s take our data set and create a chart of the Average Political Risk for each company, as displayed in Figure 15.
Figure 15. Average Political Risk per Company in MongoDB Atlas Charts
We can also create Risk Charts per category of risk, as seen in Figure 16.
Figure 16. Average International Risk per Company in MongoDB Atlas Charts
Figure 17. Average Health Risk per Company in MongoDB Atlas Charts
Below is a snippet with all the aggregation operators mentioned in Scenario 2:
From MSCI - “Elevate’s Supply Chain ESG Risk Ratings aggregates data from its verified audit database to the country level. The country risk assessment includes an overall score as well as 38 sub-scores organized under labor, health and safety, environment, business ethics, and management systems.”
ESG data processing requires the handling of a variety of structured and unstructured data consisting of financial, non-financial, and even climate-related geographical data. In this final scenario, we will combine data related to environmental scoring (especially wastewater, air, environmental indexes, and geolocation data) and present it in a geospatial format to help business users quickly identify the risks.
MongoDB provides a flexible and powerful multimodel data management approach and supports storing and querying geospatial data as GeoJSON objects or as legacy coordinate pairs. We shall see in this example how this can be leveraged for handling the often complex ESG data.
Firstly, let’s filter and group the data. Using the $match and $group stages, we can filter the data by country and group it by country and province, as shown in Figure 18 and Figure 19.
Figure 18. $match stage run in MongoDB Compass
Figure 19. $group stage run in MongoDB Compass
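A filter-and-group step of this kind might be sketched as follows. The dataset's column names are not given in the text, so every field name here (`Country`, `Province`, `Air_Emissions_Index`, `Environment_Index`, `Waste_Water_Index`) and the "Vietnam" value are hypothetical placeholders. Using $max is an assumption too, based on the text's later mention of displaying maximum values.

```javascript
// Sketch: restrict to one country, then take the maximum of each
// environmental index per province. All field names are HYPOTHETICAL.
const vietnamPipeline = [
  { $match: { Country: "Vietnam" } },
  {
    $group: {
      _id: { country: "$Country", province: "$Province" },
      max_air_emission: { $max: "$Air_Emissions_Index" },
      max_environment: { $max: "$Environment_Index" },
      max_waste_water: { $max: "$Waste_Water_Index" },
    },
  },
];

console.log(JSON.stringify(vietnamPipeline, null, 2));
```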
Now that we have the data broken out by region and country, in this case Vietnam, let’s display the information on a map.
It doesn’t matter that the original ESG data did not include comprehensive geospatial data or data in GeoJSON format, as we can simply augment our data set with the latitude and longitude for each region.
Using the $set operator, we can apply the logic for all regions of the data, as shown in Figure 20.
Leveraging the $switch operator, we evaluate a series of case expressions and set the coordinates of longitude and latitude for the particular province in Vietnam.
Figure 20. $set stage and $switch operator run in MongoDB Compass
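The $set and $switch combination can be sketched by generating one branch per province from a lookup table. The sketch assumes the grouped documents carry the province name in `_id.province` and writes a GeoJSON Point into a hypothetical `location` field. The three entries and their approximate city-center coordinates are illustrative, and the names would need to match the spelling used in the real data.

```javascript
// Illustrative table: province name -> [longitude, latitude].
// GeoJSON expects longitude first. Coordinates are approximate.
const provinceCoordinates = {
  "Ha Noi": [105.8342, 21.0278],
  "Da Nang": [108.2022, 16.0544],
  "Ho Chi Minh": [106.6297, 10.8231],
};

// One $switch branch per province: when the name matches,
// emit a GeoJSON Point for it.
const branches = Object.entries(provinceCoordinates).map(
  ([name, coordinates]) => ({
    case: { $eq: ["$_id.province", name] }, // assumed group-key shape
    then: { type: "Point", coordinates },
  })
);

const setLocationStage = {
  $set: {
    // default avoids an error when no branch matches.
    location: { $switch: { branches, default: null } },
  },
};

console.log(JSON.stringify(setLocationStage, null, 2));
```

Generating the branches from a table keeps the stage short as more provinces are added. Writing the branches out by hand produces the same stage.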
Using MongoDB Charts’ built-in heatmap feature, we can now display the maximum air emission, environment management, and water waste metrics data for Vietnamese regions as a color-coded heat map.
Figure 21. Heatmaps of Environment, Air Emission, Water Waste Indexes in Vietnam in MongoDB Atlas Charts
Below is a snippet with all the aggregation operators mentioned in Scenario 3:
As we can see from the scenarios above, MongoDB’s out-of-the-box tools and capabilities (including a powerful aggregation pipeline framework for simple or complex data processing, Charts for data visualization, geospatial data management, and native drivers) can easily and quickly combine different ESG-related resources and produce actionable insights.
MongoDB has a distinct advantage over relational databases when it comes to handling ESG data, negating the need to produce the ORM mapping for each data set.
Import any type of ESG data, model the data to fit your specific use case, and perform tests and analytics on that data with only a few commands.
To learn more about how MongoDB can help with your ESG needs, please visit our dedicated solution page.