How to create a PDF from a huge MongoDB dataset (about 10 million rows)

Dark_Wick · April 1, 2022, 4:57pm

I want to create a PDF from a huge MongoDB dataset (about 10 million rows). The data has no specific format; you can assume an employee database. I am using the MEAN stack.

Approaches tried:

Use a Node.js library such as pdfkit to convert the MongoDB result (an array of objects) to a PDF by looping over the result. (This causes a heap out-of-memory error and is very slow.)
Create a temporary collection → run mongoexport to CSV → convert the CSV to HTML using awk → convert the HTML to PDF using the wkhtmltopdf tool. (This is still very slow.)
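For the CSV → HTML step of the second approach, here is a minimal sketch that swaps awk for Python's standard `csv` and `html` modules. It reads and writes one row at a time, so memory use stays flat regardless of file size. The function name `csv_to_html` and the sample data are made up for illustration, and it assumes the first CSV row is a header. wkhtmltopdf (or another converter) would still handle the final HTML → PDF step.

```python
import csv
import html
import io


def csv_to_html(src, dst):
    """Stream CSV rows from `src` into an HTML table written to `dst`.

    `src` and `dst` are text file objects. Only one row is held in
    memory at a time. Returns the number of rows written, header included.
    """
    reader = csv.reader(src)
    dst.write("<table>\n")
    count = 0
    for i, row in enumerate(reader):
        # Assumption: the first row of the export is the header row.
        tag = "th" if i == 0 else "td"
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        dst.write(f"<tr>{cells}</tr>\n")
        count += 1
    dst.write("</table>\n")
    return count


# Small in-memory demo; real use would pass open file handles instead.
sample = io.StringIO("name,dept\nAda,R&D\nLin,Sales\n")
out = io.StringIO()
rows_written = csv_to_html(sample, out)
print(rows_written)
print(out.getvalue())
```

With real files, open the source with `open(path, newline="")` as the `csv` module recommends, and pass an output file opened for writing.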

After I run the Mongo query, I cannot store the result in a variable because that causes the heap out-of-memory error, so I cannot do any further processing on the data. I can query with limit and skip to get the data in chunks, build HTML from each chunk, and then create a PDF from it, but that seems a very slow process.
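The memory problem comes from materializing the whole result at once. A database cursor can usually be consumed lazily instead, which also avoids repeated skip/limit queries (skip tends to get slower as the offset grows). Below is a minimal sketch of grouping any lazy iterable into fixed-size batches. The helper `in_batches` and the stand-in `fake_cursor` are hypothetical names for illustration. With PyMongo, the cursor returned by `collection.find()` is iterable and could be passed in place of `fake_cursor`, though that has not been tested here.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling lazily from `rows`.

    Only one batch is held in memory at a time.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_cursor(n: int) -> Iterator[Dict[str, object]]:
    """Stand-in for a database cursor: produces rows on demand."""
    for i in range(n):
        yield {"_id": i, "name": f"employee-{i}"}


# Each batch could be rendered to one HTML or PDF chunk and then discarded.
for batch in in_batches(fake_cursor(10), 4):
    print(len(batch), batch[0]["_id"])
```

The batch size trades memory for per-chunk overhead, so it is worth tuning against the actual document size.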

A possible approach, I think, would be to create small PDFs and then merge them together, or to use streams.

What is the most efficient way to create a PDF from huge datasets in MongoDB? For CSV, mongoexport works great; is there some standard way for PDFs?

steevej · April 1, 2022, 5:09pm

I would try to export into .csv.

Then load .csv into LibreOffice or Excel.

Print as .pdf

Dark_Wick · April 2, 2022, 4:27am

Hi @steevej, thanks for answering. Can we also do this programmatically via node/python/bash?

steevej · April 2, 2022, 12:21pm

The only way I am aware of is via

https://poi.apache.org/

and

Dark_Wick · April 16, 2022, 1:04pm

Hi all, I tried the LibreOffice soffice tool and it worked great: I was able to create a PDF for 5 million data rows. I haven't tested with more than that, but I think it will work.

Dark_Wick · April 16, 2022, 1:05pm

We can install a headless version of LibreOffice and drive it from a programming language.
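As a rough illustration of driving headless LibreOffice from a script, the sketch below builds and runs an `soffice --headless --convert-to pdf` command. It assumes `soffice` is on the PATH. The helper names `build_command` and `convert_to_pdf` and the file name `employees.csv` are made up for this example. It is not necessarily the exact invocation used above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(csv_path: str, out_dir: str, soffice: str = "soffice") -> List[str]:
    """Return the argument list for a headless CSV-to-PDF conversion."""
    return [
        soffice,
        "--headless",          # run without a GUI
        "--convert-to", "pdf",  # target format
        "--outdir", out_dir,    # directory for the generated file
        csv_path,
    ]


def convert_to_pdf(csv_path: str, out_dir: str) -> Path:
    """Run the conversion and return the expected output path.

    Raises subprocess.CalledProcessError if soffice exits with an error.
    """
    subprocess.run(build_command(csv_path, out_dir), check=True)
    # LibreOffice names the output after the input file, with a .pdf suffix.
    return Path(out_dir) / (Path(csv_path).stem + ".pdf")


# Show the command that would be run; nothing is executed here.
print(" ".join(build_command("employees.csv", "out")))
```

Calling `convert_to_pdf("employees.csv", "out")` would then perform the conversion. Since spreadsheet applications cap the number of rows per sheet, it is worth checking that the resulting PDF contains every row, for example by splitting the export into several CSV files first.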