Mongo Spark Connector Schema Inference

I have a collection in MongoDB with around 2000 documents. Each document has a field called purchaseDate; most of these are BSON Date fields, but in 10 documents the field is a String representation of the date. When I read from this collection I get this error: com.mongodb.spark.sql.connector.exceptions.DataException: Invalid field: 'purchase_date'. The dataType 'timestamp' is invalid for 'BsonString{value='2023-04-20T00:00:00.000Z'}'. Can someone guide me on what needs to be done to mitigate this issue? I have tried setting a very high sampleSize for schema inference, but that causes performance problems.

Hello @Guntaka_Jeevan_Paul ,

I'm not sure whether this will work, but before reading data from your collection you could convert purchaseDate to one specific type.

You can do so with the $addFields and $toDate aggregation operators, something like this:

{
  $addFields: {
    purchaseDate: {
      $toDate: "$purchaseDate"
    }
  }
}
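If you'd rather not rewrite the stored documents, the same stage can be applied at read time. Below is a minimal PySpark-oriented sketch, assuming Spark Connector v10.x (which matches the DataException package in your error) and its aggregation.pipeline read option; the database and collection names are placeholders, not from your setup:

```python
import json

# The $addFields/$toDate stage above, expressed as a Python list of
# dicts so it can be serialized into the JSON the connector expects.
pipeline = [
    {"$addFields": {"purchaseDate": {"$toDate": "$purchaseDate"}}}
]
pipeline_json = json.dumps(pipeline)
print(pipeline_json)

# Hypothetical read, assuming the v10.x "aggregation.pipeline" option,
# which runs the pipeline server-side before schema inference, so every
# sampled document already has purchaseDate as a Date:
#
# df = (spark.read.format("mongodb")
#       .option("database", "mydb")          # placeholder
#       .option("collection", "purchases")   # placeholder
#       .option("aggregation.pipeline", pipeline_json)
#       .load())
```

Because the conversion happens in the aggregation pipeline on the server, the 10 string-valued documents are normalized before Spark ever samples them, so sampleSize can stay at its default.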