Hi @Basant_Gurung,
So sampleSize is important in that it directly determines how many documents are used to infer the schema. If you choose 1, then only a single document’s schema is used. If your data is mixed and many documents have different shapes, then a larger sample is required. Because the sample is randomly selected from the collection, you need a relatively large size to ensure it is representative.
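To make the sampling point concrete, here is a minimal plain-Python sketch. It is only a simplified model of sample-based inference, not the connector's actual code: `infer_schema` and the example collection are hypothetical and made up for illustration.

```python
import random

def infer_schema(docs, sample_size, seed=0):
    """Return {field: set of type names} seen in a random sample of docs.

    Hypothetical helper: a simplified stand-in for sample-based inference.
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    schema = {}
    for doc in sample:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

# Made-up collection: 990 documents of one shape, 10 with an extra "age" field.
docs = [{"_id": i, "name": f"user{i}"} for i in range(990)]
docs += [{"_id": i, "name": f"user{i}", "age": 30} for i in range(990, 1000)]

full = infer_schema(docs, len(docs))   # every document is sampled
small = infer_schema(docs, 1)          # only one document is sampled

print(sorted(full))    # includes "age"
print(sorted(small))   # can only contain the fields of the one sampled document
```

With a sample of 1, the inferred schema can never be richer than the single document that happened to be picked, so a rare field such as `age` is very likely to be absent.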
When reading from the collection, the documents are then shaped into that schema. So a schema that is not representative of the data is problematic: fields can be missed, or type errors can occur when values are converted into the corresponding Spark type.
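As a rough illustration of both failure modes, the sketch below shapes documents into a fixed schema in plain Python. It is an assumption-laden model rather than the connector's real conversion logic: `shape`, the schema dictionary, and the sample documents are all hypothetical.

```python
def shape(doc, schema):
    """Project doc onto schema ({field: expected Python type}).

    Hypothetical helper: fields not in the schema are dropped, fields
    missing from the document become None, and a value of the wrong
    type raises TypeError.
    """
    row = {}
    for field, expected in schema.items():
        value = doc.get(field)
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"field {field!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        row[field] = value
    return row

# Schema inferred from a sample that never saw the "age" field.
schema = {"_id": int, "name": str}

# Data is missed: "age" is silently dropped because the schema lacks it.
shaped = shape({"_id": 1, "name": "a", "age": 30}, schema)
print(shaped)

# Type error: "_id" was inferred as int, but this document stores a string.
try:
    shape({"_id": "abc", "name": "b"}, schema)
    failed = False
except TypeError as err:
    failed = True
    print(err)
```

If the shapes in the collection are known up front, supplying an explicit schema instead of relying on inference avoids depending on what the sample happened to contain.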
Ross