There is a problem with how the title is injected into the full text when creating a document from strings.
The preprocessing and transformation nodes often act on the full text.
Examples:
- Number Filter will remove the title if the title consists only of numbers.
- Bag of Words Creator will include the title as a term. If you choose not to have a title, Strings to Document will insert the Row ID (Row0 and so on) as the title, and you will get Row as a term.
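To make the two examples concrete, here is a minimal Python sketch of what I think is happening. This is not KNIME code: the functions `build_full_text`, `tokenize`, `number_filter` and `bag_of_words` are hypothetical stand-ins, and I am assuming that the title is simply prepended to the body and that the tokenizer separates letters from digits.

```python
import re

def build_full_text(title, body):
    # Assumption: the title is simply prepended to the body text.
    return f"{title} {body}".strip()

def tokenize(text):
    # Hypothetical tokenizer: splits runs of letters from runs of digits.
    return re.findall(r"[A-Za-z]+|\d+", text)

def number_filter(tokens):
    # Drops every token that consists only of digits.
    return [t for t in tokens if not t.isdigit()]

def bag_of_words(tokens):
    # Unique terms of one document.
    return set(tokens)

# Case 1: a purely numeric title disappears after number filtering.
numeric_title_terms = number_filter(tokenize(build_full_text("2017", "A great movie")))

# Case 2: a Row ID used as title leaves "Row" behind as a term.
row_id_terms = bag_of_words(number_filter(tokenize(build_full_text("Row0", "A great movie"))))

print(numeric_title_terms)
print(sorted(row_id_terms))
```

In both cases the title takes part in preprocessing only because it is part of the full text.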
If you look at the Sentiment Classification example (08_Other_Analytics_Types/01_Text_Processing/03_Sentiment_Classification) and run it as is, you will get no titles from the Strings to Document node (even though the option is activated), and you will get a nice ROC curve with a score of 0.94.
If you then open the configuration of the Strings to Document node, deactivate "Use title from column", and immediately activate it again, you will get a ROC curve with a score of 1.0 after running the workflow. This is because the model is now trained on the document classification itself, so it will always be correct.
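The effect can be reproduced with a toy example. The following Python sketch is not the KNIME workflow: the term-counting classifier and the four rows are made up, and I am assuming that the title column carries the class label. The body text alone is deliberately uninformative, so any gain comes from the injected title.

```python
from collections import Counter, defaultdict

LABELS = ("neg", "pos")

def train(docs):
    # docs: list of (text, label); count in how many docs of each label a term occurs.
    counts = defaultdict(Counter)
    for text, label in docs:
        for term in set(text.lower().split()):
            counts[term][label] += 1
    return counts

def predict(counts, text):
    # Score each label by summing, per term, the share of training docs with that label.
    scores = {label: 0.0 for label in LABELS}
    for term in set(text.lower().split()):
        total = sum(counts[term].values())
        if total:
            for label in LABELS:
                scores[label] += counts[term][label] / total
    return max(LABELS, key=lambda label: scores[label])

def accuracy(counts, docs):
    return sum(predict(counts, text) == label for text, label in docs) / len(docs)

def with_title(rows):
    # Assumption: the title (here equal to the label) is prepended to the body.
    return [(f"{label} {body}", label) for body, label in rows]

# Made-up data: every body word occurs equally often in both classes.
train_rows = [("the plot was fine", "pos"), ("the plot was fine", "neg")]
test_rows = [("the plot", "pos"), ("was fine", "neg")]

acc_without_title = accuracy(train(train_rows), test_rows)
acc_with_title = accuracy(train(with_title(train_rows)), with_title(test_rows))
print(acc_without_title, acc_with_title)
```

With the title in the full text, the label itself becomes a feature, which would explain a perfect score.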
Is injecting the title into the full text intended behavior, or is it a bug (or a not-so-good feature)?
By the way, I'm running KNIME v3.4.1.
Best regards,
Max