I would like to do sentence extraction without losing the original comment during text preprocessing. I saw a feature that uses a Java snippet, but I could not make it work. I would appreciate your help with how to do this, or whether there is another solution. An example would be helpful as well, as I am new to KNIME!
Could you please explain in a little more detail what you mean by "losing the original comment through the text preprocessing"? I guess you want to end up with a data table consisting of one column containing the extracted sentences and another column containing the original text the sentences were extracted from. Is that correct?
Yes, this is correct. I have a file that contains review comments. I used the Sentence Extractor to extract sentences, then Extended NER preprocessing, then the BoW Creator. The output table gives me the sentences but has lost the original review comment. Can you please help me generate an output table that has the review comment, the extracted sentences, the terms, etc.?
I assume you are using the Strings to Document node to create the documents. After that node, use the Sentence Extractor to extract the sentences. To get the original text the document was created from, use the Document Data Extractor node and specify "Title" and "Document Body Text" as the fields to extract (only available in KNIME Textprocessing 2.8).
Attached you will find an example workflow.
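To illustrate the table shape described above (one row per extracted sentence, with the original text carried along), here is a minimal plain-Java sketch. It is not KNIME code and does not use any KNIME API: the class name `SentenceTable`, the method `extract`, and the use of `java.text.BreakIterator` as a stand-in for the Sentence Extractor are all assumptions made for illustration only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of KNIME.
class SentenceTable {

    /**
     * Returns one row per sentence. Each row is a two-element array:
     * [0] = the original comment, [1] = one sentence extracted from it.
     */
    static List<String[]> extract(List<String> comments) {
        List<String[]> rows = new ArrayList<>();
        for (String comment : comments) {
            // BreakIterator stands in for the sentence extraction step.
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
            it.setText(comment);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String sentence = comment.substring(start, end).trim();
                if (!sentence.isEmpty()) {
                    // Keep the original comment next to every sentence.
                    rows.add(new String[] {comment, sentence});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : extract(List.of("Great phone. Battery is weak."))) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

The point of the sketch is only that the original comment is repeated on every sentence row, which is the role the extracted "Document Body Text" column plays in the workflow.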
Thanks Kilian, but the only problem here is that the first row of the extraction in the sentence column appears as the full comment. How can I exclude this from the result? See the attachment.
The title is extracted as a sentence as well. I guess that you are using the full comment column as both the title and the text column? One approach would be to create a generic title for each comment, like "Comment 1", "Comment 2". To do so, create another column next to the full comment text column, e.g. with the Java Snippet node, and use this column as the title column in the Strings to Document node. Later on you can easily filter out these rows with a Row Filter.
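The idea behind this approach can be sketched in plain Java: once every document has a generic title, any extracted "sentence" that equals the title is the title row and can be dropped. This is not KNIME code; the class `TitleRowFilter` and its methods are hypothetical names, and a zero-based row index is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of KNIME.
class TitleRowFilter {

    /** Builds a generic title such as "Comment 1" from a zero-based row index. */
    static String genericTitle(int rowIndex) {
        return "Comment " + (rowIndex + 1);
    }

    /** Keeps every extracted sentence except the one that is the title itself. */
    static List<String> dropTitleRow(String title, List<String> sentences) {
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            if (!sentence.equals(title)) {
                kept.add(sentence);
            }
        }
        return kept;
    }
}
```

In the workflow itself the Row Filter does this job; a generic title is simply easier to match reliably than the full comment text.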
Thanks Kilian, can you please give me some code, as I am still new to this, for the Java Snippet to create a new column that does the following:
if value of column 1 = value of column 2 then T else F
To create a comment title like "Comment 1", "Comment 2" etc. the code for the Java Snippet node would look like:
return "Comment " + ($$ROWINDEX$$ + 1);
To create rules like the one you mentioned in your last post, use the Rule Engine node. You do not have to write Java code when using this node.
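For reference, the logic of that rule ("T" if both column values are equal, otherwise "F") looks like this in plain Java. This is only a sketch of the comparison, not Rule Engine syntax and not Java Snippet column syntax; the class `ColumnCompare` and the method `flag` are hypothetical, and missing values are assumed to arrive as `null`.

```java
import java.util.Objects;

// Hypothetical illustration only; not part of KNIME.
class ColumnCompare {

    /** Returns "T" when both values are equal (or both are null), otherwise "F". */
    static String flag(String column1, String column2) {
        // Objects.equals is null-safe, so a missing value does not throw.
        return Objects.equals(column1, column2) ? "T" : "F";
    }
}
```

Whether two missing values should count as equal is a design choice; the sketch treats them as equal.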