Hi,
I am getting the below message when I am trying transformation of Table Reader to Strings to Document,
“Execute failed: Cell at index 0 is null!”
Hi,
I am getting the below message when I am trying transformation of Table Reader to Strings to Document,
“Execute failed: Cell at index 0 is null!”
Hi there @AhmetYavuz,
welcome to KNIME Community Forum!
What is content of Table Reader and how does Strings to Document configuration looks like?
Br,
Ivan
Hi @ipazin,
Thank you so much for your reply,
Actually I try to use NGram node, before using it I used PDF Parser for extract 2000 pdf documents containing sciencific articles, then I used Document Extractor to extract tables from PDF Parser,
I wrote this as table then read with Table Reader, so
Table Reader contains 2000 pdf article,
I want to use NGram after preprocessing, do you have any comment for big volume documents like thousands of artcile
Kind Regards,
Ahmet
Hi @AhmetYavuz -
The String to Document node only works on data of type string, whereas you created a table using the PDF Parser, which produces documents already. (That is, if I’m understanding you correctly - let me know if I’m not.)
We actually just published a blog post a couple of weeks back that deals with pre-processing of OCR data, along with creating bigrams for additional analysis. I realize you are using PDFs as input as opposed to images, but a lot of the concepts are similar. As with most of our blog posts, it comes with a workflow too - check it out here:
https://www.knime.com/blog/an-experiment-in-ocr-error-correction-sharing-treasure-on-the-knime-hub
Encountering such problem. Any solution
I answered here:
Hi There,
This after having read PDF file via a Tika Parser node where I am trying to read the contents of the PDF file.
The text I am trying to extract is from the pdf of the link below:
https://onlinelibrary.wiley.com/doi/abs/10.1002/bse.2703
Due to copyright issue I wont be able to share the pdf directly.
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.