Docs Menu

Docs HomeMongoDB Spark Connector

Filters

When using filters with DataFrames or Datasets, the underlying MongoDB Connector code constructs an aggregation pipeline to filter the data in MongoDB before sending it to Spark. This improves Spark performance by retrieving and processing only the data you need.

MongoDB Spark Connector turns the following filters into aggregation pipeline stages:

  • And

  • EqualNullSafe

  • EqualTo

  • GreaterThan

  • GreaterThanOrEqual

  • In

  • IsNull

  • LessThan

  • LessThanOrEqual

  • Not

  • Or

  • StringContains

  • StringEndsWith

  • StringStartsWith

The following example filters and output the characters with ages under 100:

df.filter(df("age") < 100).show()

The operation outputs the following:

+--------------------+---+-------------+
| _id|age| name|
+--------------------+---+-------------+
|[5755d7b4566878c9...| 50|Bilbo Baggins|
|[5755d7b4566878c9...| 82| Fíli|
|[5755d7b4566878c9...| 77| Kíli|
+--------------------+---+-------------+
MongoDB Connector for Spark →
Share Feedback
© 2023 MongoDB, Inc.

About

  • Careers
  • Investor Relations
  • Legal Notices
  • Privacy Notices
  • Security Information
  • Trust Center
© 2023 MongoDB, Inc.