I have a fairly big dataset (about 10 million records, 4 GB) and a list of keywords. I need to search the collection for these keywords and remove every document that contains any of them. The problem is that my keyword list is not small either: it has about 20 thousand entries.
Is there a faster way than making 20 thousand queries?
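With a sophisticated enough regular expression, yes. Combine the keywords into a single pattern (an alternation of all of them), so each document is scanned once instead of once per keyword.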
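A rough sketch of that idea in Python, in case it helps. The thread does not say which database or language is in use, so this works on a plain list of strings. The function names `build_keyword_pattern` and `remove_matching` are made up for illustration, and the case-insensitive substring matching is an assumption about what "containing a keyword" means here.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one alternation pattern from all keywords (hypothetical helper)."""
    # Escape each keyword so characters like "." or "+" are matched literally.
    # Longest-first ordering lets longer keywords be tried before their prefixes.
    escaped = sorted((re.escape(k) for k in keywords if k), key=len, reverse=True)
    if not escaped:
        return None  # nothing to match; an empty pattern would match every document
    # Assumption: matching is case-insensitive and on substrings, not whole words.
    return re.compile("|".join(escaped), re.IGNORECASE)

def remove_matching(documents, keywords):
    """Return only the documents that contain none of the keywords."""
    pattern = build_keyword_pattern(keywords)
    if pattern is None:
        return list(documents)
    # One pass per document instead of one query per keyword.
    return [doc for doc in documents if not pattern.search(doc)]

docs = ["cheap watches for sale", "meeting notes from Monday", "FREE casino bonus"]
keywords = ["casino", "watches"]
kept = remove_matching(docs, keywords)
```

With 20 thousand alternatives the pattern gets very large, so it is worth timing it on a sample of the data before running it over all 10 million records.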
Oh god, right, why didn't I think of this before? I'll do some speed testing with this. Thanks.