Cleaning data

I need to clean my data. However, it involves more than just detecting some words and deleting them.

I need the software to take a list of words and find every sentence and paragraph in which any of those words is mentioned, then delete those sentences/paragraphs.

Is it possible to do that in KNIME, and if so, how?

 

Many thanks

Hi,

your request is a bit vague, but I will attempt some suggestions.

You could turn your data into documents with the Strings To Document node, then use the Sentence Extractor node to split them into sentences. The same can be achieved using regular expressions and loops, but it will be easier with the Text Processing nodes.

Next, you can search the sentences for your keywords. You may have to use a loop, or turn the sentences back into documents, then tag and filter them.

Either way, filter out the sentences that contain the keywords, then re-assemble the remaining ones. The result is your clean data set.
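To make the split / filter / re-assemble idea concrete, here is a rough sketch in plain Python. It is not a KNIME workflow and does not use any KNIME API. The keyword list, the function names and the naive sentence-splitting regex are all assumptions for illustration; real text (abbreviations, decimals, quotes) will need a better sentence splitter.

```python
import re

# Hypothetical keyword list; replace with your own terms.
KEYWORDS = ["confidential", "draft"]

# Whole-word, case-insensitive match for any keyword.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

# Naive sentence boundary (assumption): whitespace that follows ., ! or ?
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def remove_sentences(text, pattern=KEYWORD_RE):
    """Drop every sentence that mentions a keyword, then re-join the rest."""
    sentences = SENTENCE_SPLIT_RE.split(text.strip())
    kept = [s for s in sentences if s and not pattern.search(s)]
    return " ".join(kept)


def remove_paragraphs(text, pattern=KEYWORD_RE):
    """Drop every blank-line-separated paragraph that mentions a keyword."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    kept = [p for p in paragraphs if p and not pattern.search(p)]
    return "\n\n".join(kept)


if __name__ == "__main__":
    sample = "This is fine. This is a draft version. Keep me too!"
    print(remove_sentences(sample))
```

Use `remove_sentences` when only the offending sentences should go, and `remove_paragraphs` when a single hit should discard the whole paragraph. Matching on word boundaries means "draft" does not also remove sentences containing "drafting"; drop the `\b` anchors if you want substring matches instead.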

Hope this helps.

Cheers,
Marco.

 
