Supported Data Formats
Data Lake can read the following data formats:
Comma-Separated and Tab-Separated Value Data Files
Your CSV or TSV file must start with a header row. Atlas Data Lake uses the values in the header row as field names. Dot-delimited field names in the header row become nested fields or objects in JSON format: for each dot in a field name, Data Lake creates another level of nesting.
Suppose your Data Lake is reading a CSV file with content similar to the following:
company,location.state,location.city.name,location.city.street
"MongoDB", "California", "Palo Alto", "Forest Ave"
For the data fields in the above example CSV file, Data Lake creates a JSON document similar to the following:
{
  "company": "MongoDB",
  "location": {
    "state": "California",
    "city": {
      "name": "Palo Alto",
      "street": "Forest Ave"
    }
  }
}
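The mapping from dot-delimited header names to nested fields can be illustrated with a minimal Python sketch. This is not Data Lake's implementation; the function name `row_to_document` is hypothetical, and the sketch assumes every value is kept as a string.

```python
import csv
import io
import json


def row_to_document(header, row):
    """Expand dot-delimited field names into nested dictionaries.

    Hypothetical helper for illustration only; it is not part of Data Lake.
    """
    doc = {}
    for name, value in zip(header, row):
        parts = name.split(".")
        node = doc
        # Each dot in the field name adds one level of nesting.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return doc


csv_text = (
    "company,location.state,location.city.name,location.city.street\n"
    '"MongoDB", "California", "Palo Alto", "Forest Ave"\n'
)

# skipinitialspace=True lets the reader accept the space after each comma.
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
header = next(reader)
documents = [row_to_document(header, row) for row in reader]

print(json.dumps(documents[0], indent=2))
```

The sketch assumes the header is valid. If a name is both a stand-alone field and a prefix of a dot-delimited field, the inner loop would try to nest under a string value and fail.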
Data Lake requires all field names at the same level of nesting to be unique. The following are examples of invalid field names in the header row:
One field duplicates another field at the same level of nesting.
Example
Consider the following:
company,location,company
In the header, company appears twice at the same level of nesting.
One dot-delimited field duplicates another field at the same level of nesting.
Example
Consider the following:
company,location,location.city
In the header, location is both a stand-alone field and a dot-delimited field at the same level of nesting.
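Both kinds of invalid header can be detected before uploading a file. The following Python sketch shows one possible check; the function name `find_header_conflicts` and the reason strings are hypothetical, and the logic is an assumption based on the uniqueness rule described above, not Data Lake's own validation.

```python
def find_header_conflicts(header):
    """Return (field_name, reason) pairs for header fields that break
    the uniqueness rule. Hypothetical helper for illustration only."""
    leaves = set()   # full field names that hold a value
    parents = set()  # prefixes that hold nested fields
    conflicts = []
    for name in header:
        parts = name.split(".")
        # "a.b.c" has the parent prefixes "a" and "a.b".
        prefixes = [".".join(parts[:i]) for i in range(1, len(parts))]
        if name in leaves:
            conflicts.append((name, "duplicate field"))
        elif name in parents or any(p in leaves for p in prefixes):
            conflicts.append((name, "both stand-alone and dot-delimited"))
        leaves.add(name)
        parents.update(prefixes)
    return conflicts


print(find_header_conflicts(["company", "location", "company"]))
print(find_header_conflicts(["company", "location", "location.city"]))
```

An empty list means the sketch found no conflicts in the header row.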