Hello.
I want my Python script to list every field (and sub-field) in every collection (I need to validate that no fields are added or removed during a deployment).
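For the validation step itself, once the key sets have been collected (however that is done), comparing two deployments is straightforward. A minimal sketch in plain Python — the helper name diff_keys is hypothetical, not from any library:

```python
def diff_keys(before, after):
    """Compare two sets of field names and report what changed.

    `before` and `after` are sets of dotted field paths, e.g.
    {"_id", "EchoTimeStamp", "payload.status"}.
    """
    return {
        "added": sorted(after - before),    # fields only in the new deployment
        "removed": sorted(before - after),  # fields that disappeared
    }

# Example: one field renamed between deployments.
delta = diff_keys({"_id", "name", "status"}, {"_id", "name", "state"})
# delta == {"added": ["state"], "removed": ["status"]}
```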
I tried:
collection_conn.aggregate([
    {"$project": {"arrayofkeyvalue": {"$objectToArray": "$$ROOT"}}},
    {"$unwind": "$arrayofkeyvalue"},
    {"$group": {"_id": "None", "allkeys": {"$addToSet": "$arrayofkeyvalue.k"}}},
])
It works great for small collections, but once a collection passes about 1.5 million records it stalls forever. So, for those big collections, I tried this:
collection_conn.aggregate([
    {"$project": {"arrayofkeyvalue": {"$objectToArray": "$$ROOT"}}},
    {"$unwind": "$arrayofkeyvalue"},
    {"$sort": {"EchoTimeStamp": -1}},
    {"$limit": 1000000},
    {"$group": {"_id": "None", "allkeys": {"$addToSet": "$arrayofkeyvalue.k"}}},
], allowDiskUse=True)
It kind of works, but it takes over 6 minutes to go through the entire cursor and collect the keys. I also tried a find().sort().limit() – same result. I have about 20 collections like that, so this approach is not an option.
The timestamp field is indexed (it is the only index on the collection).
So I am wondering if there is an easier way to do this. My IDE shows all the databases, collections, fields and sub-fields in its left pane, and when I refresh it takes maybe a minute to refresh the entire collection…
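For what it's worth, GUI tools typically don't scan every document; they infer the schema from a random sample. A sketch of that idea — `$sample` picks documents server-side, and the recursive walk below collects dotted key paths (including sub-fields inside arrays) client-side. The function name field_paths and the sample size are assumptions, not from any library:

```python
def field_paths(doc, prefix=""):
    """Recursively collect dotted field paths from one document."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        paths.add(path)
        if isinstance(value, dict):
            # Embedded document: descend with a dotted prefix.
            paths |= field_paths(value, prefix=path + ".")
        elif isinstance(value, list):
            # Sub-fields: descend into array elements that are documents.
            for item in value:
                if isinstance(item, dict):
                    paths |= field_paths(item, prefix=path + ".")
    return paths

# Server-side sampling keeps the scan cheap even on huge collections
# (pymongo call shown for illustration; needs a live connection):
# keys = set()
# for doc in collection_conn.aggregate([{"$sample": {"size": 1000}}]):
#     keys |= field_paths(doc)
```

Caveat: sampling can miss fields that appear only in rare documents, so for a strict added/removed check you may still need one full scan per collection, run once and cached for later comparisons.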
Apologies if I am using incorrect terminology; I started working with Mongo only two weeks ago. By "sub-field" I mean the fields nested inside a field of array type.
Thank you in advance for your help and advice.
Juan Luna