Stream from MongoDB to MongoDB in PySpark

Thanks for yor clarification. Connecting via PyMongo is not streaming and It is not distributed I think, It means that you run a single thread python instance to read all the data. With the help of Spark Streaming(and Structured Streaming) all the workers can read stream a piece of data. Am I right?

My use case is to analyse(running A.I. algorithms) a collection of data in MongoDB which is being updated every 5 seconds and write the results.

Thanks.

1 Like