Cannot sink Windowed queried streaming data w/ spark to MongoDB


Using Spark Structured Streaming I am trying to sink streaming data to a MongoDB collection. The issue is that I am querying my data using a window as following:

def basicAverage(df): 
  return df.groupby(window(col('timestamp'), "1 hour", "5 minutes"), col('stationcode')) \
  .agg(avg('mechanical').alias('avg_mechanical'), avg('ebike').alias('avg_ebike'),  avg('numdocksavailable').alias('avg_numdocksavailable'))

And it seems that mongodb spark connector cannot support a writeStream containing windowed data because when I run the script, my collection remains empty and no error shows up. I tried to delete the window option on my query and the sink worked like a charm.

Here is my sink method:

queryBasicAvg.writeStream.format('mongodb').queryName("basicAvg") \
  .option("checkpointLocation", "./tmp/pyspark7/").option("forceDeleteTempCheckpointLocation", "true") \
  .option('spark.mongodb.connection.uri', 'mongodb://') \
  .option("spark.mongodb.database", 'velibprj').option("spark.mongodb.collection", 'stationsBasicAvg') \

Any thought on how to solve this issue ?

Thanks in advance

I solved the issue by using foreach sink method.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.