I’m trying to use the Strings To Document node on a CSV file and I get an error:
ERROR Strings To Document 0:24 Execute failed: String index out of range: 7
The text file here contains a sample of the data.
I use these settings for the Strings To Document node:
Title: contributions_id
Text: contributions_bodyText
Sources: contributions_url
Categories: contributions_section_title
Author: contributions_author_id
The text file looks a bit challenging: long lines with various special characters, text containing commas and quotation marks, and then a comma as the column separator … what could possibly go wrong … but as usual readr saves the day … (hopefully)
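To illustrate why such a file is tricky, here is a minimal sketch in Python (not the readr workaround itself) of how a CSV parser that honours quoting handles commas, doubled quotation marks and line breaks inside a text field. The sample rows and the `read_rows` helper are made up for illustration; only the column names are taken from the post above.

```python
import csv
import io

# Hypothetical sample mimicking the structure described above; not the real data.
SAMPLE = (
    'contributions_id,contributions_bodyText,contributions_author_id\n'
    '1,"Text with a comma, and ""quoted"" words",42\n'
    '2,"A body that spans\ntwo lines",43\n'
)


def read_rows(handle):
    """Parse CSV text, keeping quoted fields with commas or newlines intact."""
    return list(csv.DictReader(handle))


# newline="" leaves line endings untouched so the csv module can handle them.
rows = read_rows(io.StringIO(SAMPLE, newline=""))
for row in rows:
    print(row["contributions_id"], repr(row["contributions_bodyText"]))
```

For a real file, the same helper could be fed `open(path, newline="", encoding="utf-8")`, where `path` is whatever location the CSV is stored at.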
This should definitely not happen. Thanks for providing the file and reporting the issue.
I will have a look at the problem and create a ticket, if necessary.
In the meantime, I hope @mlauber71’s workaround works for you.
I tried to convert it into a Document following your instructions. It seems to work, although I am not a specialist in text analysis. I compiled a few links on text and sentiment analysis here.
Maybe have a look. I did it on a Mac, so there might be a text encoding issue going on. Everything should be UTF-8.
The problem seems to be with the tokenizer. I was using the StanfordNLP PTBTokenizer; after switching to the OpenNLP SimpleTokenizer, it worked.