Can sentences be preserved with text preprocessing?

teemuh · May 25, 2009, 9:04am

Hi,

I need to do text preprocessing (such as stop word filtering and stemming) but also retain the sentence “structure”, so that I can see which (stemmed, non-stop) words occur in the same sentence. Is there some way to do this?

Thank you!
-Teemu

kilian.thiel · May 26, 2009, 4:15pm

All preprocessing nodes can attach the original documents. To do so, you only need to specify the column containing them in the dialog. These documents contain the original sentences as they were before stemming or filtering was applied.

To compare the original and the preprocessed sentences, so that the preprocessed terms are detected and returned in an output data table, you need to implement your own node. It is possible to access all parts of a document, such as paragraphs, sentences, and terms.
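As a rough illustration of what such a node would compute, here is a minimal standalone Python sketch. It is not KNIME code and does not use the KNIME API. The stop word list, the `naive_stem` helper, and the `preprocess_by_sentence` function are all made up for this example, and the stemmer is a crude suffix stripper rather than a real stemming algorithm. The point is only that each original sentence is kept next to its filtered and stemmed terms.

```python
import re

# Tiny illustrative stop word list (an assumption, not a KNIME resource).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to",
              "is", "are", "was", "were"}


def naive_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_by_sentence(text):
    """Return (original sentence, preprocessed terms) pairs, one per sentence."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    rows = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        terms = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
        rows.append((sentence, terms))
    return rows


rows = preprocess_by_sentence("The cats were sleeping. Dogs barked loudly in the yard!")
for sentence, terms in rows:
    print(sentence, "->", terms)
```

Each row pairs one original sentence with the terms that survive preprocessing, which is the kind of table a custom node could return.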

Hope I could help you.
Kilian