ERROR Unable to process record SinkRecord

Hi, I’m investigating approaches to store my IoT data from multiple ‘things’. The measurements from the things are stored in Kafka, one topic per thing; within each topic the key is the measurement name (a string) and the value is the measurement value, either an INT or a FLOAT.

If I use a file sink I get a file of values with no problem, but I get issues with the MongoDB sink:

ERROR Unable to process record SinkRecord{kafkaOffset=14217, timestampType=CreateTime} ConnectRecord{topic='MySensors-81', kafkaPartition=0, key=81-0-17, keySchema=Schema{STRING}, value=1579, valueSchema=Schema{STRING}, timestamp=1614000439773, headers=ConnectHeaders(headers=)} (com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData:110)
org.apache.kafka.connect.errors.DataException: Could not convert value 1579 into a BsonDocument.

How do I tell the connector that the value schema is INT32 or FLOAT (some readings have decimals)?

Can you send your configuration for the sink, minus any authentication credentials? Instead of writing just the integer/float value, wrap the value in a JSON document and set the value.converter in the sink to something like:

"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",

Alternatively, to get it working you could set the converter to string:

"value.converter": org.apache.kafka.connect.storage.StringConverter"

By default the sink connector expects a BSON document, so you need to tell it how to take whatever is on the Kafka topic and interpret it as a string, a JSON document, etc.
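If you are configuring the sink with a .properties file rather than JSON, the equivalent lines would be (a sketch):

# JSON converter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

# or, to treat the payload as a plain string
value.converter=org.apache.kafka.connect.storage.StringConverter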

Note: You can write the message with an Avro or JSON schema as well, then use

"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",

OK, I had another thought on this one. You could use a single message transform on the sink that takes that integer value and hoists it into a document with a key/value pair, as follows:

"transforms": "HoistField,Cast",
"transforms.HoistField.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.HoistField.field": "temp",
"transforms.Cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.Cast.spec": "temp:float64",

This should take that integer and create something like
{ "temp": 1712 }
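If you are running the sink from a .properties file, the same transform configuration would look something like this (the field name temp is only a placeholder):

transforms=HoistField,Cast
transforms.HoistField.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.HoistField.field=temp
transforms.Cast.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.Cast.spec=temp:float64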

The example data below is a reading from a power meter sensor (thing) calculating watts via an LED pulse count. I have a topic per “thing” and about 50 things at the moment, and I want to grow.

The data is being created in the Kafka topic through a webthings.io adapter, so I don’t have too much control, but I was contemplating asking the developer if he could toggle JSON in the value.

I have tried every combination of converters - even figuring out how to populate a Kafka schema to use the Avro converter, which resulted in an NPE :slight_smile:

Your second thought is interesting - I just have no clue how to implement it - or are these parameters for the properties file?

Given programming isn’t anywhere near the top of my skills list, it would be good to guide the form of the document somehow. Although it is IoT, I was expecting a pattern to be available; perhaps that pattern is JSON in the value field showing the data type of the reading. If so, I’ll put a request into the Git repo for the webthings Kafka adapter - I have been assisting with testing the code.

I’m running Kafka Connect in standalone mode using

bin/connect-standalone.sh config/connect-standalone.properties config/MongoSinkConnector.properties

The data in the topic looks like this:
~kafka/kafka/bin/kafka-console-consumer.sh --topic MySensors-81 --property print.key=true --property key.separator=: --from-beginning --bootstrap-server xxxxx.local:9092
81-0-17:1579
81-1-18:628.08
81-0-17:1650
81-1-18:628.09
81-0-17:1579

The connect-standalone.properties settings are:

bootstrap.servers=xxxxx.local:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.storage.StringConverter
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/share/kafka/plugins

The MongoSinkConnector.properties file is:

name=mongo-sink
topics=MySensors-81
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
debug=true
key.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.storage.StringConverter
value.converter.schemas.enable=false
connection.uri=mongodb://localhost/
database=Sensors
collection=sink
max.num.retries=1
retries.defer.timeout=5000
confluent.topic.security.protocol="PLAINTEXT"
key.projection.type=none
key.projection.list=
value.projection.type=none
value.projection.list=
field.renamer.mapping=
field.renamer.regex=
document.id.strategy=com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy
post.processor.chain=com.mongodb.kafka.connect.sink.processor.DocumentIdAdder

# Write configuration

delete.on.null.values=false
writemodel.strategy=com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneDefaultStrategy
max.batch.size=0
rate.limiting.timeout=0
rate.limiting.every.n=0

What would you like the data to look like when it is in MongoDB? To confirm, the data in your topic are strings and look like this, correct?

81-0-17:1579
81-1-18:628.08
81-0-17:1650
81-1-18:628.09
81-0-17:1579

Yes, that is the data, although the developer of the producer has agreed to write it as JSON - I’ll see what it looks like in the next couple of days: Errors on adapter startup · Issue #6 · tim-hellhake/kafka-bridge · GitHub

The current structure is:
topic: device id
message key: property name
message value: {[property name]: [property value]}.

The plan is to use MongoDB as the persistence layer for all telemetry - I’ll then want to visualise the data and perform analytics, though I haven’t chosen any toolsets for this yet.

The change has been made to the producer - the data is now formatted as below, viewed with this consumer command. Does this help? Would I then use the JSON converter?

~kafka/kafka/bin/kafka-console-consumer.sh --topic MySensors-81 --property print.key=true --property key.separator=: --from-beginning --bootstrap-server xxxxx.local:9092

81-0-17:{"81-0-17":1304}
81-1-18:{"81-1-18":0}
81-0-17:{"81-0-17":5016}
81-1-18:{"81-1-18":958.96}
81-0-17:{"81-0-17":5039}
81-1-18:{"81-1-18":959.01}
81-0-17:{"81-0-17":4920}
81-1-18:{"81-1-18":959.05}
81-0-17:{"81-0-17":2595}
81-1-18:{"81-1-18":959.1}
81-0-17:{"81-0-17":5173}
81-1-18:{"81-1-18":959.15}
81-0-17:{"81-0-17":5358}
81-1-18:{"81-1-18":959.19}
81-0-17:{"81-0-17":4989}
81-1-18:{"81-1-18":959.24}
81-0-17:{"81-0-17":4941}
81-1-18:{"81-1-18":959.28}

Perhaps it might be easiest if the developer could represent the document like

{ "name": "81-0-17", "value": 1304 },
{ "name": "81-1-18", "value": 0 },
etc

If this can’t be done you have many other options. You could use transforms to parse the message and put it into a JSON format.

You can also write your own transform, or use kSQL to create a persistent query and transform the data into a new topic that contains a JSON document structure. You could also just take the existing message, insert it into MongoDB as a string, and have your application do the manipulation of the data.


Success using value.converter=org.apache.kafka.connect.json.JsonConverter
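The converter lines in MongoSinkConnector.properties now read as follows (value.converter.schemas.enable stays false, since the producer writes plain JSON with no schema envelope):

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false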

use Sensors
switched to db Sensors
show collections
sink
db.sink.find();
{ "_id" : ObjectId("604c54c6868c0042fcc9da14"), "81-0-17" : NumberLong(808) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da15"), "81-1-18" : NumberLong(0) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da16"), "81-0-17" : NumberLong(1043) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da17"), "81-1-18" : 1056.89 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da18"), "81-0-17" : NumberLong(1048) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da19"), "81-1-18" : 1056.9 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1a"), "81-0-17" : NumberLong(1066) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1b"), "81-1-18" : 1056.91 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1c"), "81-0-17" : NumberLong(1024) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1d"), "81-1-18" : 1056.93 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1e"), "81-0-17" : NumberLong(1003) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da1f"), "81-1-18" : 1056.94 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da20"), "81-0-17" : NumberLong(1020) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da21"), "81-1-18" : 1056.95 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da22"), "81-0-17" : NumberLong(936) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da23"), "81-1-18" : 1056.96 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da24"), "81-0-17" : NumberLong(914) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da25"), "81-1-18" : 1056.97 }
{ "_id" : ObjectId("604c54c6868c0042fcc9da26"), "81-0-17" : NumberLong(1065) }
{ "_id" : ObjectId("604c54c6868c0042fcc9da27"), "81-1-18" : 1056.98 }

Now I have this working I can make schema design decisions about how I store the data. The key is inherently storing a lot of information: <device id>-<metric number>-<metric reading type, e.g. watts, temp in Celsius, humidity in %, etc.>. The metric reading type is held in a schema store defined by webthings.io, so I could keep a copy as a lookup, but I would have to do that lookup every time - not so easy if trying to use a simple visualisation app. This is the whole database debate, I guess. I think I’m leaning toward being readable - it’s not as if it’s someone’s name, which can change; duplicates seem to be OK and I’m not worried about saving bytes on disk. A temperature reading for that date and time will always be that temperature reading for that moment in time.

Anyway I like your idea of transformations - it makes sense, so I’ll be researching that more…

Speaking of time - how do I get the reading date/time stamp into the document? It just occurred to me that it’s not in the values - the next problem to solve. Anyway, I’ll close this - thank you for your help, Robert.
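(For what it’s worth, one possible approach - a sketch only, not something tested in this thread - is the InsertField single message transform, which can copy the Kafka record timestamp into the value document; the alias AddTimestamp and field name ts below are just placeholders:)

# copy the Kafka record timestamp into a field on the value document
transforms=AddTimestamp
transforms.AddTimestamp.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.AddTimestamp.timestamp.field=ts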
