So I have converted a PDF to an HTML file, and I’m then using the HTML Parser to read the file.
But unfortunately the result is not good.
I used a Python library to convert the PDF file to the HTML file. You can find the files in the attachments.
Nestlé.txt (36.6 KB) Nestlé.xml (36.9 KB)
PS: I can’t upload PDF and HTML files in this post, so I changed the extensions to txt and xml.