I have some questions about sampling an entire MongoDB database for machine learning training, and then creating a test set from a reproducible random sample.
My first question is about the case where we would like an evenly distributed sample from our database. From this SO post, it seems that the distribution is not quite even, or that $sample is not able to reach every document in the database.
My second question is whether it is possible to sample with a random seed, so that the same sample can be drawn again.
An answer in this SO post says no, but someone in the comments mentions that it is possible.
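
To make the goal concrete, here is a minimal sketch of one possible workaround that sidesteps the question of whether $sample accepts a seed: pull only the `_id` values, then draw the sample client-side with a seeded generator. The function name `seeded_sample` and the stand-in id list are hypothetical, and this assumes the full list of ids fits in memory, which may not hold for a very large database.

```python
import random

def seeded_sample(ids, k, seed):
    """Return k ids chosen uniformly at random, reproducibly for a given seed."""
    # Sort first so the result does not depend on the order the ids arrived in.
    ordered = sorted(ids, key=str)
    # Private generator, so the global random state is left untouched.
    rng = random.Random(seed)
    return rng.sample(ordered, k)

# Stand-in for the ids of a collection. With pymongo they might come from
# something like: [d["_id"] for d in collection.find({}, {"_id": 1})]
ids = [f"doc{i}" for i in range(1000)]

test_ids = seeded_sample(ids, 100, seed=42)
test_set = set(test_ids)
train_ids = [i for i in ids if i not in test_set]
```

The selected documents could then be fetched with a query such as `collection.find({"_id": {"$in": test_ids}})`. Because every id is a candidate and `random.Random.sample` draws without replacement, each document has the same chance of being selected, and the same seed gives the same split as long as the set of ids does not change.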