Facing Java Heap Space OOM issue when large data is read on limited memory

Basant_Gurung · October 19, 2023, 1:27pm

Hi @Ross_Lawley, I can see that lowering sampleSize in the Spark conf to a smaller value, such as 10, does the trick. How important is schema inference, given that in my case the documents may not always share the same structure? And what impact would setting sampleSize to 1 have?
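
To make the question concrete, here is a rough toy model of sample-based schema inference in plain Python. It is not the connector's code: `infer_schema` and the sample documents are hypothetical, and it simply takes the first N documents, whereas the connector's own sampling strategy may differ. It only illustrates the concern that fields absent from the sampled documents would not appear in the inferred schema.

```python
# Toy model of sample-based schema inference; NOT the connector's actual code.
from itertools import islice


def infer_schema(docs, sample_size):
    """Union the field names and value types seen in the first `sample_size` docs."""
    schema = {}
    for doc in islice(docs, sample_size):
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical documents whose structure varies from one to the next.
docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "tags": ["x"]},
    {"_id": 3, "price": 9.5},
]

narrow = infer_schema(docs, 1)  # only fields of the single sampled document
wide = infer_schema(docs, 3)    # union of fields across all three documents

print(sorted(narrow))
print(sorted(wide))
```

In this toy model a sample of 1 only ever sees the fields of one document, so `tags` and `price` are missing from the result. If the real inference behaves similarly, supplying an explicit schema might be safer than relying on a very small sample.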