Reading PDF documents in Python using PyPDF2

jitrocs · October 6, 2021, 8:41am

Can someone help me read PDF documents from GridFS using the Python library PyPDF2?

Shane · October 6, 2021, 3:20pm

GridFS stores a file's name, contents, and optional metadata, and is agnostic to the file type. Storing and reading a PDF works the same way as for any other file. To upload and read a file:

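from gridfs import GridFSBucket
from pymongo import MongoClient

# Connect to the default local MongoDB server and use the "test" database:
my_db = MongoClient().test
fs = GridFSBucket(my_db)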

# Upload a file:
with open('my.pdf', 'rb') as file:
    file_id = fs.upload_from_stream('my.pdf', file)

# Read file by _id:
with open('my-copy.pdf', 'wb+') as file:
    fs.download_to_stream(file_id, file)

# Read file by name:
with open('my-copy2.pdf', 'wb+') as file:
    fs.download_to_stream_by_name('my.pdf', file)
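
For the PyPDF2 part of the question, one option is to skip the copy on disk and download into an in-memory buffer, then hand that buffer to PyPDF2. The following is only a sketch: the helper names gridfs_pdf_to_buffer and extract_pdf_text are made up for illustration, and it assumes a recent PyPDF2 release that provides PdfReader (older releases expose PdfFileReader instead).

```python
import io


def gridfs_pdf_to_buffer(bucket, file_id):
    """Download a GridFS file into an in-memory, seekable buffer.

    `bucket` is expected to be a GridFSBucket (hypothetical helper).
    """
    buffer = io.BytesIO()
    bucket.download_to_stream(file_id, buffer)
    buffer.seek(0)  # rewind so the PDF parser starts at the beginning
    return buffer


def extract_pdf_text(bucket, file_id):
    """Return a list with the extracted text of each page (hypothetical helper)."""
    # Imported here so the helper above does not depend on PyPDF2.
    # Assumption: a PyPDF2 release that provides PdfReader is installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(gridfs_pdf_to_buffer(bucket, file_id))
    return [page.extract_text() for page in reader.pages]
```

With fs and file_id from the snippet above, extract_pdf_text(fs, file_id) would return one string per page. Text extraction quality depends on how the PDF was produced, so scanned documents may come back empty.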

You can also add tags via the “metadata” argument to the various GridFSBucket upload methods.
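
As a rough illustration of the metadata argument, the sketch below attaches a list of tags at upload time and later looks files up by tag. The helper names and the "tags" key are assumptions chosen for this example; GridFS does not prescribe any particular metadata layout.

```python
def upload_with_tags(bucket, filename, source, tags):
    """Upload `source` to GridFS and store `tags` in the file's metadata.

    `bucket` is expected to be a GridFSBucket; the "tags" key is an
    illustrative choice, not something GridFS requires.
    """
    return bucket.upload_from_stream(
        filename, source, metadata={'tags': list(tags)}
    )


def find_filenames_by_tag(bucket, tag):
    """Return the filenames of stored files whose metadata tags include `tag`."""
    # Matching a scalar against an array field matches any element of the array.
    cursor = bucket.find({'metadata.tags': tag})
    return [grid_out.filename for grid_out in cursor]
```

For example, upload_with_tags(fs, 'my.pdf', file, ['invoice', '2021']) inside the with block above, followed by find_filenames_by_tag(fs, 'invoice').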

jitrocs · October 7, 2021, 5:28am

Thank you @Shane for the examples.
