Import a CSV, TSV, etc. beginning at a particular row number or at a row containing a known header?

jamie_humphries · May 18, 2022, 5:06pm

I have data in TSV format. The data has multiple rows preceding the header row.
How can I start my import at a particular row number or, preferably, at the row that contains the header info, since the number of preceding rows varies?

Ramachandra_Tummala · May 19, 2022, 11:08am

Have you tried --headerline and --fields?
I don’t think you can import partial data while skipping some rows.
Check the MongoDB documentation for the exact syntax.

Stennie_X · May 19, 2022, 11:54am

Welcome to the MongoDB Community @jamie_humphries !

mongoimport’s --headerline option uses the first line of the input source as the field list.

If you have lines preceding the header to skip, you could edit your tsv file to remove them, or use another command-line utility to filter as required.

For example, on macOS or Linux the tail utility should be available by default. The portable syntax for tail to skip lines is tail -n +<lines to skip + 1> ... , so to skip the first 3 lines and pass the filtered TSV to mongoimport I would use a command line like:

tail -n +4 sample-data.tsv | mongoimport -d sample -c data --type=tsv --headerline
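
If the number of lines before the header varies from file to file, a fixed offset will not work. Here is a minimal sketch of one alternative, assuming the header row can be recognized by the name of its first column. The skip_to_header helper and the column name id are made up for illustration, so adjust them to your data:

```shell
# Print a file starting at the first line whose first tab-separated field
# equals the given header name; every line before it is dropped.
# Usage: skip_to_header <first-header-field> <file>
skip_to_header() {
  awk -F '\t' -v first="$1" 'found || $1 == first { found = 1; print }' "$2"
}

# Hypothetical usage, assuming the header row starts with a column named "id":
# skip_to_header id sample-data.tsv | mongoimport -d sample -c data --type=tsv --headerline
```

Because the header row is the first line that reaches mongoimport, --headerline still picks up the field names as before.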

Regards,
Stennie

jamie_humphries · May 19, 2022, 1:09pm

Thanks.
I appreciate it. In SQL you can ignore x number of rows, but you have to specify the number. When you need to import several hundred text files a day, editing becomes a huge task, and I can’t change the output of the machines I’m getting the data from.
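
For a daily batch like that, the tail approach above could be wrapped in a loop so that no file needs manual editing. This is only a sketch: the import_all function name is hypothetical, it assumes every file in the directory has the same number of lines before the header, and it reuses the sample database and data collection names from the earlier command:

```shell
# Import every .tsv file in a directory, skipping a fixed number of
# leading lines in each file before the header row.
# Usage: import_all <dir> <lines-to-skip>
import_all() {
  dir=$1
  start=$(( $2 + 1 ))            # tail -n +K starts output at line K
  for f in "$dir"/*.tsv; do
    [ -e "$f" ] || continue      # nothing matched: the glob stays literal
    tail -n +"$start" "$f" |
      mongoimport -d sample -c data --type=tsv --headerline
  done
}

# Hypothetical usage: skip 3 lines in every file under ./incoming
# import_all ./incoming 3
```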