Dear all,
I am doing web scraping for my application. For each page I store the date, title, and text, along with the page's URL. However, some pages have very large text, which runs into MongoDB's 16 MB document size limit. During my search I found GridFS, and I plan to use it as follows:
store the text inline in the MongoDB document if its size is below some threshold, otherwise store it via GridFS. Something like the code below:
import gridfs
from pymongo import MongoClient

client = MongoClient()
db = client["mydb"]
fs = gridfs.GridFS(db)

MAX_INLINE = 1_000_000  # bytes; threshold kept well under the 16 MB limit

if len(text.encode("utf-8")) > MAX_INLINE:
    # large document: offload the text to GridFS, keyed by the URL
    fs.put(text.encode("utf-8"), filename=url)
    large_url = 1
    text = ''
else:
    # small document: keep the text inline, flag it as not large
    large_url = 0

data = {
    'url': url,
    'date': 'some date',
    'text': text,
    'title': 'title',
    'large_url': large_url
}
Then, during access, if large_url is 1, I would use GridFS's get/find operation to fill in the text field from GridFS.
How workable is this approach?
Is there a better solution?
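The read path I have in mind would look roughly like this (a sketch; `doc` is the metadata document from the main collection, and `fs` is assumed to be a gridfs.GridFS instance such as `gridfs.GridFS(client["mydb"])`):

```python
def load_text(doc, fs):
    """Return the page text, reading it back from GridFS when large_url == 1."""
    if doc["large_url"] == 1:
        # text was offloaded to GridFS under the page URL as filename
        grid_out = fs.find_one({"filename": doc["url"]})
        return grid_out.read().decode("utf-8")
    # small page: text is stored inline in the document itself
    return doc["text"]
```

With pymongo this would be called as `load_text(collection.find_one({"url": url}), fs)`.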
Thanks