I have ~10,000 .csv files in folders on Windows [I can migrate them to Linux].
[1] They are all categorically arranged in a folder structure [folder within folder within folder… about 5 levels deep].
[2] There are EVER-RUNNING Python scripts which update these .csv files and may add new .csv files.
[3] I don’t want to import .csv files into MongoDB.
[4] Is there any way MongoDB can source data from these .csv files [given the folder structure]?
[5] The last requirement: can it AUTO-sync to the latest .csv files, given that the number of .csv files may grow in each folder and each file may grow in size [from the Python updates]?
What do you want to get out of this? If you’re not loading them into Mongo, why does Mongo come into this at all? Do you want to be able to search and index them, and are just thinking of using the Mongo query language as the tool to accomplish that?
[1] I re-read my question; what I meant on point 3 was: “I don’t want to import .csv files into MongoDB MANUALLY.”
[2] I should be able to use any MongoDB tool, and it should create collections within collections based on the nesting of folders and import the .csv files AUTOMATICALLY.
I’m not aware of anything within Mongo or the db tools that could facilitate this automatically. However, it should be trivial to create something that monitors a folder and, when a file is dropped in or updated, runs a mongoimport to pull the data into a collection. You could then form the collection name from the file path, or set a field on each imported document based on the file path. A sketch of the monitoring approach follows below.
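A minimal sketch of that approach, assuming the third-party watchdog package for filesystem monitoring and mongoimport available on the PATH. The root folder, the database name appdb, and the underscore-joined path-to-collection mapping are illustrative choices, not anything built into Mongo; since MongoDB has no nested collections, the folder tree has to be flattened into the collection name one way or another:

```python
import subprocess
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

ROOT = Path("/data/csv_root")   # top of the 5-level folder tree (assumption)
DB_NAME = "appdb"               # illustrative database name

def collection_for(csv_path: Path) -> str:
    # Flatten the nested folder path into a collection name,
    # e.g. level1/level2/level3/file.csv -> level1_level2_level3_file
    rel = csv_path.relative_to(ROOT)
    return "_".join(rel.with_suffix("").parts)

def import_csv(csv_path: Path) -> None:
    # --drop replaces the existing collection so re-imports stay in sync;
    # --headerline takes field names from the first row of the CSV
    subprocess.run(
        [
            "mongoimport",
            "--db", DB_NAME,
            "--collection", collection_for(csv_path),
            "--type", "csv",
            "--headerline",
            "--drop",
            "--file", str(csv_path),
        ],
        check=True,
    )

class CsvHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory and event.src_path.endswith(".csv"):
            import_csv(Path(event.src_path))

    def on_modified(self, event):
        if not event.is_directory and event.src_path.endswith(".csv"):
            import_csv(Path(event.src_path))

observer = Observer()
observer.schedule(CsvHandler(), str(ROOT), recursive=True)
observer.start()
observer.join()
```

Note that watchdog can fire on_modified repeatedly while a file is still being written, which leads straight into the edge case below.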
You’d need to account for modified files by dropping the existing collection with mongoimport’s --drop flag, but watch out for edge cases where a file is modified, and then modified again while it’s still being processed. One way to handle that race is sketched below.
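One hedged way to handle that race: serialize imports through a queue, record the file’s mtime before importing, and re-queue the file if it changed while the import was running. The queue-and-worker layout here is my assumption about how you’d structure it, not a watchdog feature; import_csv is the function from the sketch above, and the event handler would call pending.put(event.src_path) instead of importing inline:

```python
import os
import queue
import threading
from pathlib import Path

pending: "queue.Queue[str]" = queue.Queue()

def worker():
    while True:
        path = pending.get()
        try:
            before = os.path.getmtime(path)
            import_csv(Path(path))          # import_csv from the sketch above
            if os.path.getmtime(path) != before:
                pending.put(path)           # changed mid-import: process it again
        except FileNotFoundError:
            pass                            # file vanished between events
        finally:
            pending.task_done()

threading.Thread(target=worker, daemon=True).start()
```

This also debounces the bursts of modify events you get while one of the Python scripts is still appending to a file, since duplicate imports of an unchanged file are merely wasteful, not incorrect.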
When I say trivial… it’ll take some work, but the mechanisms for monitoring should be fairly easy.