Hi @Ross_Lawley, I can see that setting sampleSize in the Spark conf to a smaller value such as 10 does the trick. How important is schema inference, given that in my case the documents do not always share the same structure? And what impact would it have if I set sampleSize to 1?
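
To illustrate my concern, here is a toy sketch in plain Python. It is not the connector's actual inference logic; `infer_schema` and the sample documents are hypothetical, and I am only assuming that the schema is built from the fields seen in the sampled documents.

```python
import random


def infer_schema(docs, sample_size, seed=0):
    """Toy stand-in for sampling-based inference (hypothetical helper).

    Returns a mapping of field name -> set of Python type names
    observed in a random sample of the documents.
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    schema = {}
    for doc in sample:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Made-up documents whose structure differs from one to the next.
docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "tags": ["x"]},
    {"_id": 3, "price": 9.5},
]

full = infer_schema(docs, sample_size=len(docs))  # every document is sampled
one = infer_schema(docs, sample_size=1)           # a single document is sampled

print(sorted(full))
print(sorted(one))
```

In this toy version, sampling one document always misses at least one field that exists elsewhere in the collection. Is that also what I should expect from the connector with sampleSize set to 1?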