Mongo as API aggregator

Jonny · February 29, 2020, 8:27pm

Hi
I’m an experienced software dev and have worked with every major RDBMS. I’ve always been interested in Mongo but never really felt like I had a project that fit it. I think I do now, so I wanted to double-check. Regardless of the answer, I’m enrolled in the Mongo courses, really enjoying them, and will continue with them.

So the next task I am seriously considering using Mongo for is building a reporting system. I’ve built a few before and really enjoy them, but this one is a little different: we don’t directly own any of the data. I’m basically going to have to aggregate/combine results from Jira and TestRail. Because of the scale of this data, I can’t just make fresh API requests for every report and then aggregate/combine result sets numbering in the 1000s; the performance would be poor. So I’m looking at persisting the data in Mongo.

Reasons being:

a) The responses from both APIs can be custom, or somewhat flexible
b) They’re JSON to start with, and because they are potentially custom, mapping them into DB tables would be tedious and probably error-prone due to the various types that can be returned.
c) The potential scale, so I’ll elaborate. In Jira we’ll have many projects; a project is a system in our case. Each of those will have 100s/1000s of stories. Then in TestRail each story will have multiple test cases, and each test case will have a test result for each test run. These are all automated and may run nightly, so a test will soon have dozens of results and you quickly end up with a load of data.

So although I’m not through the data modelling course yet, I am thinking there is a lot of potential for scale, especially once you start doing work around trends over time. As far as I can see at the moment, there are going to be lots of 1:N relations:
Projects 1:N Stories
Stories 1:N Test Cases
Test Cases 1:N Test Results

So at this early stage I’m thinking of a collection for each major item (Stories, Test Cases, Test Results). I just wondered what people’s thoughts were on this approach and whether Mongo is a suitable choice here. As mentioned, either way I’m going to continue learning Mongo as it’s cool. I certainly see that this could be modelled in an RDBMS, but the flexibility, and my lack of control over what custom things people put into TR and Jira, make modelling that in something like Postgres a bit harder. PG was my original idea, just using json columns to store the responses, but I’m not convinced reporting through the json columns would be that easy/performant.
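
To make the one-collection-per-item idea concrete, here is a minimal sketch of what the documents might look like if each child simply carries a reference to its parent. Every field name here (project, story_id, test_case_id, custom and so on) is an assumption made up for illustration, not the real Jira or TestRail schema, and the helper is plain Python standing in for a query on the parent field.

```python
# Hypothetical document shapes, one list per collection. All field names are
# illustrative assumptions, not the real Jira or TestRail schemas.
stories = [
    {"_id": "PROJ-101", "project": "PROJ", "summary": "Reset password",
     "custom": {"team": "payments"}},  # custom fields stored as received
    {"_id": "PROJ-102", "project": "PROJ", "summary": "Change email"},
]
test_cases = [
    {"_id": 5001, "story_id": "PROJ-101", "title": "Reset via email link"},
    {"_id": 5002, "story_id": "PROJ-101", "title": "Reset with expired link"},
]
test_results = [
    {"_id": 900001, "test_case_id": 5001, "status": "passed", "run_at": "2020-02-27"},
    {"_id": 900002, "test_case_id": 5001, "status": "failed", "run_at": "2020-02-28"},
]


def children(docs, parent_field, parent_id):
    """Return the documents whose parent reference equals parent_id.

    This mimics in plain Python what an equality query on the parent
    field of the child collection would return.
    """
    return [d for d in docs if d.get(parent_field) == parent_id]


# Walk Stories -> Test Cases -> Test Results through the parent references.
print([t["_id"] for t in children(test_cases, "story_id", "PROJ-101")])
print([r["status"] for r in children(test_results, "test_case_id", 5001)])
```

With this shape the unbounded side of each 1:N relation (results per test case) lives in its own collection rather than in an ever-growing embedded array, which is the usual reason to reference rather than embed.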

Anyway, I’d be interested to hear some thoughts.

Prasad_Saya · March 2, 2020, 2:30am

The main entities are Projects, Stories, Test Cases and Test Results, each with a one-to-many relationship to the entity that follows it. I am assuming you already have some idea about how the data is collected into the MongoDB database.

There are a couple of questions that come to mind to start with. How much data is there? And how is this data going to be used (what are the main queries or reports)?

These are some relevant references:

Data Model Design discusses embedding vs referencing.
6 Rules of Thumb for MongoDB Schema Design discusses one-to-N relationships.

Jonny · March 2, 2020, 2:43pm

Thanks for the links, I’ve just read the 6 rules. In terms of data, we’ll have dozens of systems, each with 1000s of stories, and each story with 1-15 associated tests. The test results for these test cases will be automated and will probably run weekly, so about 50 test results per test case in a given year.
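
As a rough back-of-envelope check on those numbers, the sketch below multiplies them out. The specific values (24 systems, 2,000 stories per system, 8 tests per story) are assumptions picked from within the stated ranges purely for illustration; only the 50 results per test case per year comes from the figures above.

```python
# Assumed values chosen from within the ranges described above.
systems = 24                    # "dozens of systems" (assumption)
stories_per_system = 2_000      # "1000s of stories" (assumption)
tests_per_story = 8             # midpoint of 1-15 (assumption)
results_per_test_per_year = 50  # roughly weekly automated runs

stories = systems * stories_per_system
test_cases = stories * tests_per_story
results_per_year = test_cases * results_per_test_per_year

print(f"stories:          {stories:,}")
print(f"test cases:       {test_cases:,}")
print(f"results per year: {results_per_year:,}")
```

Under these assumptions the Test Results collection dominates, growing by tens of millions of documents a year, while Stories and Test Cases stay comparatively small.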

Query-wise, the report is basically the state of testing for applications. I am considering making an aggregated document per application with the number of requirements, number of tests, requirement coverage, test coverage etc.
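
A minimal sketch of what building such an aggregated document could look like, in plain Python over already-fetched data. The definitions used here are assumptions, not anything fixed above: requirement coverage is taken as the share of stories with at least one test case, and test coverage as the share of test cases that have at least one result. The field names and the summarise helper are hypothetical as well.

```python
from collections import defaultdict


def summarise(project, stories, test_cases, latest_status):
    """Build one summary document for a project.

    stories:       dicts with "_id" and "project"
    test_cases:    dicts with "_id" and "story_id"
    latest_status: maps test case _id -> status of its most recent result
    """
    story_ids = {s["_id"] for s in stories if s["project"] == project}

    # Group this project's test case ids under the story they belong to.
    tests_by_story = defaultdict(list)
    for tc in test_cases:
        if tc["story_id"] in story_ids:
            tests_by_story[tc["story_id"]].append(tc["_id"])

    all_tests = [t for ids in tests_by_story.values() for t in ids]
    covered = sum(1 for sid in story_ids if tests_by_story.get(sid))
    executed = sum(1 for t in all_tests if t in latest_status)
    passed = sum(1 for t in all_tests if latest_status.get(t) == "passed")

    return {
        "_id": project,
        "requirements": len(story_ids),
        "tests": len(all_tests),
        # Assumed definition: stories with at least one test / all stories.
        "requirement_coverage": covered / len(story_ids) if story_ids else 0.0,
        # Assumed definition: tests with at least one result / all tests.
        "test_coverage": executed / len(all_tests) if all_tests else 0.0,
        "passed": passed,
    }


sample_stories = [
    {"_id": "A-1", "project": "A"},
    {"_id": "A-2", "project": "A"},
    {"_id": "B-1", "project": "B"},
]
sample_tests = [
    {"_id": 1, "story_id": "A-1"},
    {"_id": 2, "story_id": "A-1"},
    {"_id": 3, "story_id": "B-1"},
]
summary = summarise("A", sample_stories, sample_tests, {1: "passed"})
print(summary)
```

A document like this could be recomputed after each sync from Jira and TestRail and stored in its own collection, so the report reads one small document per application instead of scanning the result history.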