With the MongoDB Spark connector for Spark 2.x or 3.x, the same volume of data worked well with both the sample partitioner and the splitVector partitioner.
However, after upgrading to MongoDB Spark connector v10.x, we ran into two strange issues:
1. The log shows "Partitioner Getting collection stats …", and this step takes a very long time on large collections.
2. We read 2 collections, but only the first one is actually read; the second is never read.
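For context, a minimal sketch of how the two reads are issued with the v10.x connector (the URI, database, and collection names below are placeholders, not our real setup; the `partitioner` class path follows the v10.x option naming, with `SamplePartitioner` as the documented default):

```python
# Options for the first read; in v10.x the partitioner is chosen
# via the "partitioner" option (placeholder names throughout).
read_options_a = {
    "connection.uri": "mongodb://host:27017",
    "database": "mydb",
    "collection": "collection_a",
    "partitioner": "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner",
}

# Second read reuses the same options but targets the other collection.
read_options_b = {**read_options_a, "collection": "collection_b"}

# The actual reads (commented out; require a live SparkSession and MongoDB):
# df_a = spark.read.format("mongodb").options(**read_options_a).load()
# df_b = spark.read.format("mongodb").options(**read_options_b).load()
```

Both reads hit the "Getting collection stats" step, yet only `collection_a` comes back with data.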
Is there a known limitation in v10.x, or is there a way to skip the "Partitioner Getting collection stats" operation?
Also, there is no splitVector partitioner in v10.x; was there a reason for removing it?