Reading PDF documents in Python using PyPDF2

How can I read PDF documents stored in GridFS using the Python library PyPDF2?

GridFS stores the name, contents, and optional metadata for a file and is agnostic to the type of file, so storing and reading a PDF works the same way as for any other file. To upload and read a file with PyMongo's GridFSBucket:

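from gridfs import GridFSBucket
from pymongo import MongoClient

# Connect to the default local server and use the "test" database:
my_db = MongoClient().test
fs = GridFSBucket(my_db)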

# Upload a file:
with open('my.pdf', 'rb') as file:
    file_id = fs.upload_from_stream('my.pdf', file)

# Read file by _id:
with open('my-copy.pdf', 'wb+') as file:
    fs.download_to_stream(file_id, file)

# Read file by name:
with open('my-copy2.pdf', 'wb+') as file:
    fs.download_to_stream_by_name('my.pdf', file)
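
For the PyPDF2 part of the question, one option is to download into an in-memory buffer and pass that buffer to PyPDF2, rather than writing a copy to disk first. The following is only a sketch. It assumes PyPDF2 2.0 or later, where the reader class is `PdfReader` (older releases call it `PdfFileReader`). The helper names `fetch_pdf` and `read_pdf_text` are made up for illustration and are not part of either library.

```python
import io


def fetch_pdf(bucket, file_id):
    """Download a GridFS file into an in-memory buffer (hypothetical helper)."""
    buffer = io.BytesIO()
    # GridFSBucket.download_to_stream writes the stored bytes to any writable stream.
    bucket.download_to_stream(file_id, buffer)
    # Rewind so the next reader starts at the beginning of the file.
    buffer.seek(0)
    return buffer


def read_pdf_text(bucket, file_id):
    """Return a list with the extracted text of each page (hypothetical helper)."""
    # Imported lazily so fetch_pdf can be used without PyPDF2 installed.
    # Assumption: PyPDF2 >= 2.0; older releases expose PdfFileReader instead.
    from PyPDF2 import PdfReader

    reader = PdfReader(fetch_pdf(bucket, file_id))
    return [page.extract_text() for page in reader.pages]
```

With the `fs` bucket and `file_id` from the upload example, `read_pdf_text(fs, file_id)` should give one string per page. Text extraction quality depends on how the PDF was produced, so scanned documents may come back empty.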

You can also add tags via the “metadata” argument to the various GridFSBucket upload methods.
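
A rough sketch of the metadata idea is below. It assumes the tags are stored as a list under a `tags` key, which is just one possible layout because GridFS does not prescribe the shape of the metadata document. The helper names `upload_pdf_with_tags` and `find_ids_by_tag` are hypothetical.

```python
import os


def upload_pdf_with_tags(bucket, path, tags):
    """Upload a file and attach tags as GridFS metadata (hypothetical helper)."""
    # Assumption: tags live in a list under the "tags" key of the metadata document.
    metadata = {"tags": list(tags)}
    with open(path, "rb") as source:
        # The metadata keyword is accepted by GridFSBucket.upload_from_stream.
        return bucket.upload_from_stream(
            os.path.basename(path), source, metadata=metadata
        )


def find_ids_by_tag(bucket, tag):
    """Return the _id of every stored file whose tag list contains `tag`."""
    # GridFSBucket.find takes a filter on the files collection documents.
    return [grid_out._id for grid_out in bucket.find({"metadata.tags": tag})]
```

For example, `upload_pdf_with_tags(fs, 'my.pdf', ['invoice'])` followed by `find_ids_by_tag(fs, 'invoice')` should return a list containing the new file's `_id`.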

Thank you @Shane for the examples.
