Check data consistency between two collections

Hi there,

I transferred data from one collection to another with a different structure.
I need to verify the data consistency between these two collections.
Is it possible to reformat a collection from a template to avoid doing the work manually?

Do you think the $objectToArray aggregation operator could be a good idea?

Thanks

That’s a pretty loose requirement, so it’s hard to give much feedback. I run reconciliations between environments daily to check that data migrations have gone well, but that relies on knowing the data format of the source and target data.
Typically we run a top-level check on document counts and totals using group operations, then delve further into the data by period and type.
What do your source and target documents look like?
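
The top-level check described above can be sketched in plain Python on in-memory documents. This is only an illustration: against a live database the same counts would more likely come from a group stage in an aggregation pipeline. The helper names `count_by` and `reconcile_counts` are hypothetical, the field names `seasonId` and `universeId` are borrowed from the sample documents in this thread, and the `E2021` season is made-up sample data.

```python
from collections import Counter


def count_by(docs, *keys):
    """Count documents per combination of the given top-level keys."""
    return Counter(tuple(doc.get(k) for k in keys) for doc in docs)


def reconcile_counts(source_docs, target_docs, *keys):
    """Return {group: (source_count, target_count)} for groups whose counts differ."""
    src = count_by(source_docs, *keys)
    tgt = count_by(target_docs, *keys)
    return {
        group: (src.get(group, 0), tgt.get(group, 0))
        for group in set(src) | set(tgt)
        if src.get(group, 0) != tgt.get(group, 0)
    }


# Made-up sample data: one H2021 document went missing during the transfer.
source_docs = [
    {"seasonId": "H2021", "universeId": "1"},
    {"seasonId": "H2021", "universeId": "1"},
    {"seasonId": "E2021", "universeId": "2"},
]
target_docs = [
    {"seasonId": "H2021", "universeId": "1"},
    {"seasonId": "E2021", "universeId": "2"},
]

mismatches = reconcile_counts(source_docs, target_docs, "seasonId", "universeId")
print(mismatches)
```

Only groups whose counts disagree are returned, so an empty result means the top-level counts match and the finer-grained checks can start.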

Hi John,

This is what the documents look like; I’ve only included part of each.

Source:

{
  "seasonId": "H2021",
  "seasonRef": "7",
  "universeId": "1",
  "name": "SCLARK",
  "madeIn": [
    {
      "countryIso": "TN",
      "countryLabel": "TUNISIE"
    }
  ],
  "laundryCareSymbols": {
    "washing": "30D",
    "bleaching": "NO",
    "drying": "NO_TUMBLE",
    "ironing": "MEDIUM",
    "professionalCleaning": "NO_DRY_CLEAN"
  },
  "segmentations": [
    {
      "prefix": "segmentation",
      "segmentationId": "a26630",
      "name": "Top",
      "segmentationType": "PURCHASE_FAMILY",
      "segmentationCode": "a26630_PURCHASE_FAMILY",
      "parent": {
        "prefix": "segmentation",
        "segmentationId": "a26",
        "name": "Chemisier"
      }
    },
    {
      "prefix": "segmentation",
      "segmentationId": "7002",
      "name": "Hiver",
      "segmentationType": "SEASONALITY",
      "segmentationCode": "7002_SEASONALITY"
    },
    {
      "prefix": "segmentation",
      "segmentationId": "w32115",
      "name": "Chemises / Chemisiers",
      "segmentationType": "WEB_FAMILY",
      "segmentationCode": "w32115_WEB_FAMILY",
      "parent": {
        "prefix": "segmentation",
        "segmentationId": "w32",
        "name": "Chemisiers / Tuniques"
      }
    },
    {
      "prefix": "segmentation",
      "segmentationId": "fr_OMNICANAL_FAMILY_32_OMNICANAL_FAMILY_32115",
      "name": "Chemises / Chemisiers",
      "segmentationType": "OMNICANAL_FAMILY",
      "segmentationCode": "32115",
      "parent": {
        "prefix": "segmentation",
        "segmentationId": "fr_OMNICANAL_FAMILY_32",
        "name": "Chemisiers / Tuniques"
      }
    }
  ]
}

Target:

{
  "universeId": "1",
  "seasonId": "H2021",
  "seasonRef": "7",
  "model": {
    "modelId": "17260627",
    "modelName": "SCLARK"
  },
  "madeIn": [
    {
      "alpha2Code": "TN",
      "alpha3Code": "TUN",
      "name": "Tunisie"
    }
  ],
  "laundryCareSymbol": {
    "washing": "30D",
    "bleaching": "NO",
    "drying": "NO_TUMBLE",
    "ironing": "MEDIUM",
    "professionalCleaning": "NO_DRY_CLEAN"
  },
  "productSegmentations": [
    {
      "segmentationId": "fr_TYPE_MATIERE_TMAT_TYPE_MATIERE_C",
      "segmentationType": "TYPE_MATIERE",
      "segmentationCode": "C",
      "segmentationName": "Chaine et Trame",
      "parent": {
        "segmentationId": "fr_TYPE_MATIERE_TMAT",
        "segmentationType": "TYPE_MATIERE",
        "segmentationCode": "TMAT",
        "segmentationName": "TMAT"
      }
    },
    {
      "segmentationId": "fr_PURCHASE_FAMILY_26_PURCHASE_FAMILY_630",
      "segmentationType": "PURCHASE_FAMILY",
      "segmentationCode": "630",
      "segmentationName": "Top",
      "parent": {
        "segmentationId": "fr_PURCHASE_FAMILY_26",
        "segmentationType": "PURCHASE_FAMILY",
        "segmentationCode": "26",
        "segmentationName": "Chemisier"
      }
    },
    {
      "segmentationId": "fr_OMNICANAL_FAMILY_29_OMNICANAL_FAMILY_29517",
      "segmentationType": "OMNICANAL_FAMILY",
      "segmentationCode": "29517",
      "segmentationName": "Débardeurs",
      "parent": {
        "segmentationId": "fr_OMNICANAL_FAMILY_29",
        "segmentationType": "OMNICANAL_FAMILY",
        "segmentationCode": "29",
        "segmentationName": "Tops / T-shirts"
      }
    }
  ]
}

A few fields are similar (same name and same type).
For others, I need to rename, reorder or remove them from an embedded document or an array.
I can use $set to reorder and rename fields; however, for embedded and array fields it is more complex!

Finally, I want to export the data to two CSV files with the same format so that I can compare them.
I’m attempting to create a JSON template to facilitate the transformation and make the process faster than doing it manually.
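
One way such a template could work is sketched below in Python, under several assumptions. The template format, including the reserved keys `from_array` and `each`, is invented for illustration. The field pairings (`name` to `model.modelName`, `countryIso` to `alpha2Code`, `laundryCareSymbols` to `laundryCareSymbol`) are guesses based on the matching values in the two sample documents. Fields with no source equivalent, such as `modelId` and `alpha3Code`, are simply left out.

```python
def reshape(doc, template):
    """Build a new document in the template's key order.

    Each template value is one of:
      - a string: copy that field from `doc` (rename or reorder)
      - a dict with "from_array" and "each": reshape every item of an array field
      - any other dict: build an embedded document from fields of `doc`
    """
    out = {}
    for target_key, rule in template.items():
        if isinstance(rule, str):
            if rule in doc:
                out[target_key] = doc[rule]
        elif "from_array" in rule:
            items = doc.get(rule["from_array"], [])
            out[target_key] = [reshape(item, rule["each"]) for item in items]
        else:
            out[target_key] = reshape(doc, rule)
    return out


# Hypothetical template: target field -> where the value comes from in the source.
TEMPLATE = {
    "universeId": "universeId",
    "seasonId": "seasonId",
    "seasonRef": "seasonRef",
    "model": {"modelName": "name"},
    "madeIn": {"from_array": "madeIn", "each": {"alpha2Code": "countryIso"}},
    "laundryCareSymbol": "laundryCareSymbols",
}

# Trimmed copy of the source sample above.
source = {
    "seasonId": "H2021",
    "seasonRef": "7",
    "universeId": "1",
    "name": "SCLARK",
    "madeIn": [{"countryIso": "TN", "countryLabel": "TUNISIE"}],
    "laundryCareSymbols": {"washing": "30D", "bleaching": "NO"},
}

reshaped = reshape(source, TEMPLATE)
print(reshaped)
```

Because Python dictionaries keep insertion order, the template also controls field order in the output. Source fields that the template does not mention (here `countryLabel`) are dropped.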

Regards

Sorry Emmanuel, I’ve had to travel over the last few days and so haven’t had time to reply. I’ll take a look at your reply later in the week when I’m back.

I’m getting back into things Emmanuel, sorry for disappearing but I had some family matters to attend to.

With the two sets of data I’d look at the following approaches:

  • Top-level reconciliation: how much data is in the two outputs? If you have an unwind, take this into account.
  • If you have unwinds or pushes into an array, check a key dimension within it and compare the amount of data in the source and target.
  • Get a full list of fields and generate a mapping document to verify each field: where it goes and where it came from. You could then build a secondary projection for each side that converts to a single-row CSV output, and compare the source and target to make sure every field remains the same.
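
The mapping-document idea in the last point could look roughly like the Python sketch below. The `MAPPING` table, the column names, and the helper functions are all hypothetical. The paths assume that source and target array items line up by position, which may not hold for the segmentations in the real data. The sample values are trimmed from the two documents posted earlier in the thread.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {dotted.path: scalar}; list items use their index."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}.{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


# Hypothetical mapping document: CSV column -> (path in source, path in target).
MAPPING = {
    "seasonId": ("seasonId", "seasonId"),
    "modelName": ("name", "model.modelName"),
    "countryCode": ("madeIn.0.countryIso", "madeIn.0.alpha2Code"),
    "countryName": ("madeIn.0.countryLabel", "madeIn.0.name"),
    "washing": ("laundryCareSymbols.washing", "laundryCareSymbol.washing"),
}
SOURCE, TARGET = 0, 1


def to_row(doc, side):
    """Project one document onto the shared columns; missing paths become None."""
    flat = flatten(doc)
    return {column: flat.get(paths[side]) for column, paths in MAPPING.items()}


def to_csv(rows):
    """Render rows as CSV text with the shared column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(MAPPING), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def diff_rows(source_row, target_row):
    """Return {column: (source_value, target_value)} for columns that differ."""
    return {
        column: (source_row[column], target_row[column])
        for column in MAPPING
        if source_row[column] != target_row[column]
    }


source_doc = {
    "seasonId": "H2021",
    "name": "SCLARK",
    "madeIn": [{"countryIso": "TN", "countryLabel": "TUNISIE"}],
    "laundryCareSymbols": {"washing": "30D"},
}
target_doc = {
    "seasonId": "H2021",
    "model": {"modelId": "17260627", "modelName": "SCLARK"},
    "madeIn": [{"alpha2Code": "TN", "alpha3Code": "TUN", "name": "Tunisie"}],
    "laundryCareSymbol": {"washing": "30D"},
}

source_row = to_row(source_doc, SOURCE)
target_row = to_row(target_doc, TARGET)
source_csv = to_csv([source_row])
differences = diff_rows(source_row, target_row)
print(source_csv)
print(differences)
```

On these samples the only difference reported is the country label, which changed case between the two collections. Whether that counts as a mismatch is a decision for the mapping document, for example by normalizing case before comparing.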

For the third point, I find Notepad++ and Excel to be superb for creating these kinds of things. I also make a lot of use of pivot tables in Excel when reconciling data between systems, so when exporting data you can pull it into Excel as a CSV and then pivot on key dimensions to check counts.

I can’t think of a magic bullet for this. If you have two projections to compare, it can be a manual process, but you can start with smaller datasets, which can make life easier.

Hi John,

I hope your family is doing well.

Thank you very much for your answers.
Starting with smaller datasets is a good approach.
