Array of Objects becomes Array of Strings during upload to Kafka

Hello *,

Mongo Version => 4.0-xenial (Docker)
Confluent Kafka Version => 6.0.0 (Docker)

MongoConnector => 1.3.0

I am encountering a problem where, while uploading data to Kafka, an Array of Objects suddenly becomes an Array of Strings. For example, the source document is:

{
  "id": 12345,
  "array_name": [
     {"id": 12345, "something": {"id": 12345, "another": 5.5} ...},
     . . .
  ]
}

and in Kafka it becomes:

{
  "id": 12345,
  "array_name": [
     "{\"id\": 12345, \"something\": {\"id\": 12345, \"another\": 5.5} ...}",
     . . .
  ]
}

Connector options are:

{
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "false",
  "name": "Mongo-Connect",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "tasks.max": "1",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "connection.uri": "mongodb://mongo1:27017",
  "database": "testdb",
  "collection": "testcol",
  "topic.prefix": "test-prefix",
  "output.format.key": "json",
  "output.format.value": "schema",
  "output.schema.infer.value": "true",
  "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
  "copy.existing": "true"
}

Hi,
I could not reproduce this problem using your exact configuration. I used the mongosh shell to insert:

db.testcol.insert({"array_name":[{"something":{"another":55}}]})

I used Kafkacat to print the message:

kafkacat -b localhost:9092 -t test-prefix.testdb.testcol -C -f '\nKey :\t %k\t\nValue :\t %s\nPartition: %p\tOffset: %o\n--\n'

Key :	 {"_id": {"_data": "825FB57CB1000000022B022C0100296E5A1004E00B897973A14FEDB1A43528B132F50846645F696400645FB57CB12CDCC09225054EBE0004"}}	
Value :	 {"_id":{"_data":"825FB57CB1000000022B022C0100296E5A1004E00B897973A14FEDB1A43528B132F50846645F696400645FB57CB12CDCC09225054EBE0004"},"clusterTime":-588311704,"documentKey":{"_id":"5fb57cb12cdcc09225054ebe"},"fullDocument":{"_id":"5fb57cb12cdcc09225054ebe","array_name":[{"something":{"another":55}}]},"ns":{"coll":"testcol","db":"testdb"},"operationType":"insert"}
Partition: 0	Offset: 0

How are you reading the Kafka message? My guess is that whatever you are using to read the message is taking that array of objects and converting it to strings. Can you try with Kafkacat?
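One way to check what a record value really contains, independent of the consuming application, is to parse the raw bytes and look for array elements that are strings holding embedded JSON. The following is a minimal sketch using only the Python standard library; the function name `stringified_array_paths` and the sample payload are illustrative assumptions, not part of the connector or of Kafkacat.

```python
import json

def stringified_array_paths(node, path="$"):
    """Yield paths of array elements that are strings holding JSON objects or arrays."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from stringified_array_paths(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            item_path = f"{path}[{index}]"
            if isinstance(item, str):
                try:
                    decoded = json.loads(item)
                except ValueError:
                    continue  # an ordinary string, not embedded JSON
                if isinstance(decoded, (dict, list)):
                    yield item_path
            else:
                yield from stringified_array_paths(item, item_path)

# Hypothetical payload modeled on the value shown in the question.
raw_value = b'{"id": 12345, "array_name": ["{\\"id\\": 12345, \\"another\\": 5.5}"]}'
print(list(stringified_array_paths(json.loads(raw_value))))
```

If the list is empty for bytes taken straight from the topic, the conversion is happening in the reader; if it is not, the strings were already present when the record was produced.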

Hey @Robert_Walters, yes indeed, reproduction is not trivial :slight_smile: Here is a JSON document that could help to reproduce it.

{
    "-1236052134575208584": 1603802006849,
    "-3078921119283744887": {
      "6022414958441676900": {
        "4344647195749500666": "4344647195749500666",
        "6440041613324510652": "6440041613324510652",
        "8265573421575953197": "8265573421575953197"
      },
      "2919502189498201214": {
        "-4263262838430303025": {
          "-1013215829982866721": 6,
          "1209830369222860200": 12124
        },
        "8815982370389166190": false,
        "6135136337149060510": {
          "-5671795673574091253": 4.226828822362668,
          "-1545304719862780077": 0,
          "6472265773460362391": "6472265773460362391",
          "1664694551407375950": {
            "-8201417328859773479": 643,
            "1895862094853980300": 400
          },
          "3582048408897295416": 3,
          "4676544319411765991": {
            "-5606549786281302960": "-5606549786281302960"
          },
          "3702657416359142280": null,
          "-2764320561543193457": null,
          "7342153465701196142": {
            "4273523332437800642": 12124,
            "2877948439254078926": true
          }
        },
        "-7535015702201045030": null,
        "6968704615890252414": "6968704615890252414",
        "9143290242159779235": 1
      },
      "-1631773694551398667": "-1631773694551398667",
      "1654016379873652592": 1
    },
    "-8730419922316402040": {
      "-701297860731463640": {
        "8850753136911882592": {
          "-8954055381677511378": 6,
          "-3600730930140391064": 12124
        },
        "34763064450124911": false,
        "-7207119165969726438": {
          "-7658080728774377404": 1.6432313657184152,
          "-5661544056901251958": 5079,
          "-6626094279135658848": "-6626094279135658848",
          "2276608449961011082": {
            "641677532283622210": 609,
            "-3232278498323453766": 400
          },
          "3356311166389917555": 29,
          "7445038319791579437": [
            "5374410395986528777",
            "-5970305719089840402"
          ],
          "5453741773890927602": 1,
          "-8201960194732107681": null,
          "7128737013364301064": {
            "27640474992925850": 12124,
            "-6617784163376580317": true
          }
        },
        "-8750040877746075818": [
          {"-7675143677926194569": 41, "-76751436779261945619": {"-7675143677926194569": 12124, "-76751436779226194569": true}, "-76751433677926194569": ["-7675143677926194569", "-7675143677926194569"], "-76751443677926194569": {"-76751436775926194569": 941, "-7675143677926194569": 660}, "-76751436779261994569": 50, "-76751431677926194569": 1.044447267034656, "-7675143677926194569a": "-7675143677926194569", "-76751h43677926194569": 2, "-76751436747926194569": 0},
          {"-25342639a09144125530": 29, "-25342463909144125530": {"-25344263909144125530": 12124, "-2534263909144125530": true}, "-2534263909a144125530": ["-2534263909144125530", "-2534263909144125530"], "-253426390914a4125530": {"-25342639091441a25530": 609, "-2534a263909144125530": 400}, "-253426390aa9144125530": 5079, "-253a4263909144125530": 1.6432313657184152, "-25a34263909144125530": "-2534263909144125530", "-2534263909144125530": null, "-25342639091d44125530": 1},
          {"-351845619a2799115644": 42, "-35184561492799115644": {"-35184561942799115644": 12124, "-3518456192799115644": true}, "-35a18456192799115644": ["-3518456192799115644", "-3518456192799115644"], "-35184561927a99115644": {"-3518456192799a115644": 684, "-3518456192799115644": 400}, "-3518456192799aa115644": 593, "-351845a6192799115644": 2.479277203030802, "-3518a456192799115644": "-3518456192799115644", "-a": null, "-3518456a192799115644": 2},
          {"-7973291a657838087544": 26, "-79732491657838087544": {"-79732916578380875444": 12124, "-7973291657838087544": true}, "-797329165a7838087544": ["-7973291657838087544", "-7973291657838087544", "-7973291657838087544", "-7973291657838087544", "-7973291657838087544", "-7973291657838087544"], "-7973a291657838087544": {"-797a3291657838087544": 847, "-7973291657838087544": 650}, "-797329165783808a7544": 1820, "-797329165a7838a087544": 2.5081437308920598, "-79732a91657838087544": "-7973291657838087544", "-7973291657aaa838087544": 2, "-79732916578a38087544": 3},
          {"723957439050a4453704": 30, "72395743940504453704": {"72395743905044543704": 12124, "7239574390504453704": true}, "7239574390504a453704": ["7239574390504453704", "7239574390504453704", "7239574390504453704", "7239574390504453704"], "723957a4390504453704": {"7239574390504453704": 759, "723957439aa0504453704": 1100}, "72a39574390504453704": 1732, "7239574390a504453704": 2.5294378400108055, "72395743905044aa53704": "7239574390504453704", "7239574390504aa453704": 2, "723957439a0504453704": 4},
          {"-23912009330439173695": 43, "-23910093340439173695": {"-23910093304394173695": 12124, "-2391009330439173695": true}, "-23910a09330439173695": ["-2391009330439173695", "-2391009330439173695"], "-2391009330a4391736a95": {"-23910093304391736a95": 616, "-2391009330439173695": 480}, "-239100933043917a3695": 347, "-239100aa9330a439173695": 2.9827470360102772, "-2391a009330439173695": "-2391009330439173695", "-2391009330a439173695": null, "-2391009330439173695": 5},
          {"-745169876a8078486853": 39, "-74516984768078486853": {"-74516948768078486853": 12124, "-7451698768078486853": true}, "-7451698768a078486853": ["-7451698768078486853", "-7451698768078486853"], "-74516987a6a8078486853": {"-7451698a768078486853": 690, "-7451698768078486853": 480}, "-745169876807848a6853": 1411, "-745169876aa8078486853": 3.7791890450274153, "-745a1698768078486853": "-7451698768078486853", "-745169876a80aa78486853": null, "-7451698768078486853": 6},
          {"-12230520276098273608": 6, "-12230502760948273608": {"-12230540276098273608": 12124, "-1223050276098273608": true}, "-122305a0276098273608": ["-1223050276098273608", "-1223050276098273608"], "-1223050276a098273608": {"-122305027a6098273608": 1098, "-1223050276098273608": 1170}, "-122305027609827a3608": 319, "-12230502760a98273608": 3.790121254273812, "-12230a50276098273608": "-1223050276098273608", "-122305027609a8273608": 2, "-1223050276098273608": 7},
          {"-83081032622357989357": 34, "-83081026422357989357": {"-83084102622357989357": 12124, "-8308102622357989357": true}, "-830810262235a7989357": ["-8308102622357989357", "-8308102622357989357"], "-83081026a22357989357": {"-830810262a2357989357": 1009, "-8308102622357989357": 1500}, "-83081026223579aa89357": 425, "-8308102622a357989357": 4.128897616678362, "-830a8102622357989357": "-8308102622357989357", "-83081026223a57989357": 2, "-8308102622357989357": 8},
          {"-75169676a67566740423": 44, "-75169676467566740423": {"-75146967667566740423": 12124, "-7516967667566740423": true}, "-75169a67667566740423": ["-7516967667566740423", "-7516967667566740423"], "-75169676a67a566740423": {"-7516a967667566740423": 776, "-7516967667566740423": 440}, "-a7516967667566740423": 379, "-7516967667566a740423": 5.887623744018165, "-75169a67667566a740423": "-7516967667566740423", "-751696766756a6740423": null, "-7516967667566740423": 9},
          {"72411138a38436319010": 52, "72411138384436319010": {"72411134838436319010": 12124, "7241113838436319010": false}, "72411138384363190a10": ["7241113838436319010", "7241113838436319010"], "7241113838436aa319010": {"72411138384a36319010": 761, "7241113838436319010": 459}, "724111383843631901a0": 1942, "7241113838436319a010": 6.263293924039651, "72411138384a36319010": "7241113838436319010", "724111383a8436319010": 1, "7241113838436319010": 10},
          {"66888828393518514293": 40, "66888828935185142493": {"6688882893518514293": 12124, "66888824893518514293": true}, "668888289351851a4293": ["6688882893518514293", "6688882893518514293"], "66888828935185a14293": {"66888828935a18514293": 668, "6688882893518514293": 500}, "6688882893a51851a4293": 99, "668888289351851429a3": 9.81369212962963, "6688882893518a514293": "6688882893518514293", "66888828a93518514293": null, "6688882893518514293": 11},
          {"682406913a7771811068": 27, "68240691377718110648": {"68244069137771811068": 12124, "6824069137771811068": true}, "6824069137a771811068": ["6824069137771811068"], "68240691377718110a68": {"6824069137771a811068": 977, "6824069137771811068": 850}, "6824069137771811a068": 416, "682a4069137771811068": 19.35300719938331, "6a824069137771811068": "6824069137771811068", "6824069137a77181a1068": 2, "6824069137771811068": 12},
          {"401853999a2064174925": 54, "40185399920641749425": {"40185399920464174925": 12124, "4018539992064174925": true}, "4018539992064a174925": ["4018539992064174925", "4018539992064174925"], "401853999206417a4925": {"401853999206a4174925": 669, "4018539992064174925": 400}, "40185399920641a74925": 48, "40185399920641749a25": null, "40185399920a64174925": "4018539992064174925", "401853a9992064174925": null, "4018539992064174925": 13},
          {"90179546838230936796": 45, "90179546882309367496": {"90179546882430936796": 12124, "9017954688230936796": true}, "901795468823a0936796": ["9017954688230936796", "9017954688230936796"], "901795468823093a6796": {"901795468823a0936796": 637, "9017954688230936796": 500}, "90179546882309a3a6796": 0, "90179546882309a36796": 3.661615067005595, "901a7954688230936796": "9017954688230936796", "901795a4688230936796": null, "9017954688230936796": 14},
          {"5443577614393337551": 16, "5443577613933375451": {"5443574761393337551": 12124, "544357761393337551": true}, "54435776139333a7551": ["544357761393337551", "544357761393337551"], "54435776a1393337551": {"5443577613a93337551": 862, "544357761393337551": 400}, "5443577a61393337551": 0, "5443577613933375a51": 4.044922689075912, "544357761393337a551": "544357761393337551", "54435776139a3337551": 3, "544357761393337551": 15},
          {"-26743996049095110585": 3, "-26743996409095110585": {"-26743996094095110585": 12124, "-2674399609095110585": true}, "-26743996090495110585": ["-2674399609095110585", "-2674399609095110585"], "-26a74399609095110585": {"-267439a9609095110585": 643, "-2674399609095110585": 400}, "-26743a99609095110585": 0, "a-2674399609095110585": 4.226828822362668, "-267439960a9095110585": "-2674399609095110585", "-26743996a09095110585": null, "-2674399609095110585": 16}
        ],
        "6215212195274199410": "6215212195274199410",
        "-5550529520968987173": 1
      }
    }
  }

I trimmed this down into the following to repro:

{
    "L1": {
      "L2": {
        "L3": [
          {"V2": {"K1": 0},
           "K1": 0},
          {"V5": ["A1", "A2"],
          "V11": 1}
        ]
      }
    }
  }

I filed a Jira ticket to track this and investigate it more.
https://jira.mongodb.org/browse/KAFKA-175

Hi @lyubick,

I can see why this is confusing. Unfortunately, Arrays in Kafka can only have a single value type (a single schema for all values). Because the elements of these arrays have differing schemas, the schema inference logic falls back to the base String schema type and uses the Json formatter to format that data.
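The fallback described above can be illustrated with a small sketch. This is not the connector's code: `infer_schema`, `convert`, and the tuple-based schema descriptors are hypothetical stand-ins, written in plain Python, that only mimic the rule "one schema per array, otherwise fall back to strings".

```python
import json

FALLBACK = "string"  # stand-in for the base String schema type

def infer_schema(value):
    """Return a simplified, hashable schema descriptor for a JSON-like value."""
    if isinstance(value, dict):
        fields = sorted((key, infer_schema(child)) for key, child in value.items())
        return ("struct", tuple(fields))
    if isinstance(value, list):
        item_schemas = {infer_schema(item) for item in value}
        if len(item_schemas) == 1:
            return ("array", item_schemas.pop())
        return ("array", FALLBACK)  # empty array or elements with differing schemas
    return type(value).__name__

def convert(value):
    """Apply the inferred schema, turning elements of mixed arrays into JSON strings."""
    if isinstance(value, dict):
        return {key: convert(child) for key, child in value.items()}
    if isinstance(value, list):
        if infer_schema(value) == ("array", FALLBACK):
            return [item if isinstance(item, str) else json.dumps(item) for item in value]
        return [convert(item) for item in value]
    return value

# The trimmed-down repro document from this thread.
doc = {"L1": {"L2": {"L3": [{"V2": {"K1": 0}, "K1": 0},
                            {"V5": ["A1", "A2"], "V11": 1}]}}}
print(convert(doc)["L1"]["L2"]["L3"])
```

In this sketch the two elements of `L3` have different field sets, so the array falls back to strings, whereas an array such as `[{"b": 1}, {"b": 2}]` keeps its objects. That would also be consistent with the single-element insert used in the earlier reproduction attempt not showing the problem.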

I hope that helps clarify the situation.

Ross

Hi @Robert_Walters,

Thanks, I understand the RCA. I am bypassing it by using StringConverter: it produces a 100% valid JSON string, which is later read and transformed back into a JSON object.
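The consumer side of this workaround can be sketched as follows, assuming the record value arrives as UTF-8 encoded JSON text. The helper name `decode_record_value` and the sample bytes are hypothetical, and the exact connector settings that produce such a value are not shown in this thread.

```python
import json

def decode_record_value(raw):
    """Parse a record value that was written as a plain JSON string."""
    text = raw.decode("utf-8") if isinstance(raw, (bytes, bytearray)) else raw
    return json.loads(text)

# Hypothetical value bytes, modeled on the fullDocument shown earlier in the thread.
raw = b'{"fullDocument": {"array_name": [{"something": {"another": 55}}]}}'
document = decode_record_value(raw)
print(type(document["fullDocument"]["array_name"][0]).__name__)
```

Because the whole value is parsed in a single step, nested arrays keep their objects and no per-element schema is involved.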

However, I do not agree that the problem is with Kafka itself. Everything in Kafka is stored as Array[Byte], which is why we use converters, so no types are applied there.

Another point: I could agree that there is some logic in requiring all elements of an array to be the same. But again, at which scope? Basically this array is an Array[Object], and each document has its own internal structure.

And finally, if MongoDB stores and operates on such structures, and provides the functionality to create a MongoDB -> Kafka -> MongoDB connection (as mentioned in the docs), then obviously something is wrong with the JsonConverter, because in this data flow Source != Sink.

Hi @lyubick,

Currently the connector only supports list validation for Json Arrays. See Json Array compatibility. The connector internally uses the Kafka SchemaBuilder API which only allows a single type for the Array value.

I’ve re-opened https://jira.mongodb.org/browse/KAFKA-175 to investigate improving support for Json with Schema in the future.

Ross
