Dear all,
I am doing web scraping for my application. For each page I store the date, title, and text, along with the page's URL. However, some pages have very large text, which runs into MongoDB's 16 MB document size limit. During my search I found GridFS, and I plan to use it as follows:
store the text inline in the MongoDB document if its size is below some threshold, otherwise store it via GridFS. Something like the code below:
import gridfs
from pymongo import MongoClient

client = MongoClient()
db = client["mydb"]
fs = gridfs.GridFS(db)

MAX_INLINE = 1_000_000  # bytes; threshold kept well under the 16 MB limit

if len(text.encode("utf-8")) > MAX_INLINE:
    # large document: offload the text to GridFS, keyed by the URL
    fs.put(text.encode("utf-8"), filename=url)
    large_url = 1
    text = ''
else:
    # small document: keep the text inline, flag it as not large
    large_url = 0

data = {
    'url': url,
    'date': 'some date',
    'text': text,
    'title': 'title',
    'large_url': large_url
}
Then, during access, if large_url is 1, I would use GridFS's get/find operation to fill in the text field from GridFS.
How workable is this approach?
Is there a better solution?
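The read path I have in mind would look roughly like this (a sketch; `doc` is the metadata document from the main collection, and `fs` is assumed to be a gridfs.GridFS instance such as `gridfs.GridFS(client["mydb"])`):

```python
def load_text(doc, fs):
    """Return the page text, reading it back from GridFS when large_url == 1."""
    if doc["large_url"] == 1:
        # text was offloaded to GridFS under the page URL as filename
        grid_out = fs.find_one({"filename": doc["url"]})
        return grid_out.read().decode("utf-8")
    # small page: text is stored inline in the document itself
    return doc["text"]
```

With pymongo this would be called as `load_text(collection.find_one({"url": url}), fs)`.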
Thanks