PySpark get list of collections

Hi all,

Is there a way to retrieve a complete list of collections (similar to ‘show collections’) using PySpark? I would like to run a query across multiple collections without creating a new Spark read session each time.

Cheers,

Ben.

import pyspark
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("Get MongoDB Collections").getOrCreate()

# Configure the SparkSession with the MongoDB connection information
spark.conf.set("spark.mongodb.input.uri", "mongodb://localhost:27017/mydb")

# Get a list of collections
collections = spark.read.format("mongo").listCollectionNames()

# Print the list of collections
for collection in collections:
    print(collection)

Hi @sagar_sadhu,

Thanks for your reply. I got the following error when I tried your code:
AttributeError: 'DataFrameReader' object has no attribute 'listCollectionNames'

I can see that listCollectionNames is part of the standard MongoDB driver libraries, but not of PySpark. Does this sound correct to you?

Kind regards,

Ben.

Are you only trying to get a list of collections? As you pointed out, that can be done with the standard MongoDB drivers.
For example, in Python:
https://pymongo.readthedocs.io/en/stable/api/pymongo/database.html#pymongo.database.Database.list_collection_names

import pymongo

# Connect to MongoDB
client = pymongo.MongoClient()

# Get the database
db = client.my_database

# List the collections
collections = db.list_collection_names()

The MongoDB Spark Connector interacts with only one MongoDB collection per read or write operation. As a result, it does not natively support reading from or writing to multiple databases, collections, or schemas in a single operation.

You can create a loop that iterates over the list of collections you want to read from, and for each collection, use the MongoDB Spark Connector to read the data into Spark.

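# Keep one DataFrame per collection instead of overwriting a single variable
dataframes = {}
for collection in collections:
    dataframes[collection] = spark.read.format("mongo").option("collection", collection).load()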
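
If it helps, the loop can be wrapped in a small helper that takes the existing SparkSession as an argument, so the same session is reused for every collection. This is only a rough sketch: read_collections and its source parameter are names made up for illustration, not part of the connector, and "mongo" is simply the format name used earlier in this thread (newer connector versions may expect a different name, which is why it is a parameter).

```python
from typing import Any, Dict, Iterable


def read_collections(spark: Any, names: Iterable[str], source: str = "mongo") -> Dict[str, Any]:
    """Return one DataFrame per collection name, keyed by that name.

    `spark` is an already-created SparkSession. It is reused for every
    read, so no new session is built inside the loop.
    """
    frames: Dict[str, Any] = {}
    for name in names:
        # Each read still targets exactly one collection; only the
        # "collection" option changes between iterations.
        frames[name] = (
            spark.read.format(source)
            .option("collection", name)
            .load()
        )
    return frames
```

The names would typically come from db.list_collection_names() as in the PyMongo example above, e.g. dataframes = read_collections(spark, collections). Each DataFrame can then be queried on its own, or combined with the others if their schemas are compatible.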