Word co-occurrence

ben_james · November 23, 2015, 1:00pm

Hi KNIME team,

Can the word co-occurrence counter in KNIME identify full stops (e.g. for computing sentence-level co-occurrence) even when the parsed documents have inconsistent formatting (e.g. when sentences are sometimes split across lines)? I notice that some other packages require each sentence or paragraph in the input documents to sit on a single line (e.g. in a .txt file).

Thanks very much,

Ben.

kilian.thiel · November 27, 2015, 8:59am

Hi Ben,

Yes, that is possible. Sentence tokenization is applied during parsing, when the document cells are created. All Parser nodes, as well as the Strings to Document node, apply sentence tokenization. Once the document cells are created, all other text processing nodes can access them at the sentence, term, and word level.
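
To illustrate the general idea outside of KNIME, here is a minimal Python sketch of sentence-level co-occurrence counting that does not depend on how lines are broken. It is not KNIME's implementation: the function names `split_sentences` and `sentence_cooccurrence` are made up for this example, and the sentence splitter is a deliberately naive regex that assumes a sentence ends at `.`, `!` or `?` followed by whitespace (so abbreviations such as "e.g." would be split incorrectly).

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive sentence splitter (an assumption, not KNIME's tokenizer)."""
    # Collapse line breaks and runs of whitespace, so a sentence that was
    # split across several lines becomes one continuous string again.
    flat = re.sub(r"\s+", " ", text).strip()
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def sentence_cooccurrence(text):
    """Count unordered term pairs that appear in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        # Lowercased unique terms of the sentence, sorted so that each
        # pair is always stored in the same (alphabetical) order.
        terms = sorted(set(re.findall(r"[a-z0-9]+", sentence.lower())))
        counts.update(combinations(terms, 2))
    return counts


# Two sentences, each broken across lines in the raw text.
raw = "The cat sat on\nthe mat. The dog\nchased the cat."
pairs = sentence_cooccurrence(raw)
print(split_sentences(raw))
print(pairs[("cat", "the")], pairs[("cat", "dog")], pairs[("dog", "mat")])
```

Because whitespace is normalized before splitting, the line breaks in `raw` have no effect on which terms are counted as co-occurring: "dog" and "mat" never share a sentence, so that pair is not counted.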

Cheers, Kilian
