I am working on a use case where I need to stream a collection from MongoDB into PySpark and then write the stream back to the same collection. I could not find any documentation about streaming reads in PySpark.
Can anyone clarify this for me? Maybe it will be a new feature?
Structured Streaming isn't supported today, but you can connect to MongoDB via PyMongo and create a change_stream cursor to watch for data changes.
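For example, here is a minimal sketch of that PyMongo approach. Note that change streams require a replica set or sharded cluster, and the connection string and the `mydb.mycollection` namespace below are just placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

# watch() returns a change_stream cursor; iterating it blocks
# until new change events arrive on the collection.
with collection.watch() as stream:
    for change in stream:
        # Each event describes one insert/update/delete;
        # fullDocument is present for inserts.
        print(change["operationType"], change.get("fullDocument"))
```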
Is supporting Structured Streaming something that would benefit you? Can you describe your use case a bit more?
Thanks for your clarification. Connecting via PyMongo is not streaming, and I don't think it is distributed: it means you run a single-threaded Python instance to read all the data. With the help of Spark Streaming (and Structured Streaming), each worker can read a piece of the stream. Am I right?
My use case is to analyse (by running A.I. algorithms) a collection in MongoDB that is updated every 5 seconds, and to write the results back. In the meantime I am considering a polling workaround like the sketch below (just an idea, assuming the MongoDB Spark Connector 3.x is on the classpath; the URIs and the `mydb.events` / `mydb.results` namespaces are placeholders):
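```python
import time
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-polling")
         .config("spark.mongodb.input.uri", "mongodb://localhost:27017/mydb.events")
         .config("spark.mongodb.output.uri", "mongodb://localhost:27017/mydb.results")
         .getOrCreate())

while True:
    # Batch-read the collection; the connector distributes the read
    # across the workers instead of a single Python thread.
    df = spark.read.format("mongo").load()
    results = df  # placeholder for the actual analysis step
    (results.write.format("mongo")
            .mode("append")
            .save())
    time.sleep(5)  # the collection is updated roughly every 5 seconds
```

This re-reads the whole collection on every pass, so it is not a real streaming read, which is why native Structured Streaming support would help.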