I have a collection of 1,000 documents with an average document size of 1 MB, and I want to fetch 200 random documents. I am setting the `sampleSize` read option as shown below, but the job still fetches the entire collection. Why is the `sampleSize` configuration not working? Is there an issue with the code?
val spark = SparkSession.builder()
.appName("Spark-MongoDB-Connector-Tests-001")
.config("spark.mongodb.read.connection.uri", "mongodb://x:x@localhost:27017/")
.config("spark.mongodb.read.database", "mydb")
.config("spark.mongodb.read.collection", "data_1000_docs_1mb_each")
.config("spark.mongodb.read.sampleSize", "200")
.getOrCreate()
// Read the collection and count the documents returned
spark.read.format("mongodb")
  .load()
  .toJSON
  .count()
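For context, here is the alternative I am considering. This is only a sketch: it assumes the v10 connector's `aggregation.pipeline` read option pushes the pipeline down to the server, and uses MongoDB's `$sample` aggregation stage for server-side random sampling. Is this the right approach instead of `sampleSize`?

```scala
// Sketch (assumption): pass a $sample pipeline via the connector's
// `aggregation.pipeline` read option so MongoDB itself picks the
// 200 random documents, instead of Spark pulling the whole collection.
val randomDocs = spark.read
  .format("mongodb")
  .option("aggregation.pipeline", """[{ "$sample": { "size": 200 } }]""")
  .load()

// If the pipeline is pushed down, this should count 200 docs, not 1,000.
println(randomDocs.count())
```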