Dear KNIME Labbers,
Unfortunately, I couldn’t find a hyphenation node, and I’m not Java-savvy enough to create one myself. But I can use Google and do some research, so please find my results below. Is anyone up to the challenge of creating a hyphenation node?
UPDATE: Actually, this iText thingie already built into KNIME seems to support hyphenation as well:
Any chance for a front-end?
OK, I’ll implement it ;-).
There will be a hyphenation node in the next textprocessing plugin release.
What kind of output do you expect from this hyphenation node?
I suggest an output table containing the terms (of the bag of words) with a separator ("-" by default) inserted at all possible split positions.
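To make the proposed output concrete, here is a minimal Python sketch of what one output value could look like. It is not the node's implementation: the function name `mark_split_positions` and the hard-coded split positions are assumptions for illustration only. A real hyphenator would derive the positions from language-specific rules, which is not shown here.

```python
def mark_split_positions(term, positions, separator="-"):
    """Insert `separator` into `term` at every offset in `positions`.

    `positions` are character offsets (0 < p < len(term)) at which a split
    is allowed. They are supplied by the caller; computing them is the job
    of an actual hyphenation algorithm and is outside this sketch.
    """
    # Ignore duplicates and offsets that fall outside the term.
    valid = sorted({p for p in positions if 0 < p < len(term)})
    pieces = []
    start = 0
    for p in valid:
        pieces.append(term[start:p])
        start = p
    pieces.append(term[start:])
    return separator.join(pieces)


# Hypothetical split positions, chosen by hand for this example.
print(mark_split_positions("hyphenation", [2, 6, 7]))  # hy-phen-a-tion
```

Passing a different `separator` would correspond to changing the node's default "-" setting.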
Wow, I somehow never saw your answer, sorry about that! Happy New Year! Your proposal sounds like the most sensible solution, so feel free to go ahead with it.
The hyphenation node is available in the new version of the textprocessing plugin.
If you have any questions about it or problems with it, just let me know.
I hope the new node is useful.
Pretty impressive work, many thanks for that! However, it refuses to accept extracted sentences as a hyphenation target, even though the help claims that it works on both terms and strings, which would be useful. It also “swallows” my term frequency column, which I guess it shouldn’t.
I’m sorry, the description stating that the Hyphenation node accepts strings and terms is wrong. It is a regular preprocessing node and only accepts a bag of words (term-document tuples) as input. As with the other preprocessing nodes, the output is again a bag of words consisting of a term column containing the preprocessed terms and a document column containing the preprocessed documents. All other columns are “swallowed”. You should first use the preprocessing nodes, including the hyphenation, and then use the frequency nodes afterwards.
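The ordering can be illustrated with a small Python sketch. This is only a toy model of the behaviour described above, not KNIME code: the helper names `hyphenate_bag` and `term_frequencies`, the lookup table standing in for a hyphenator, and the one-row-per-occurrence bag are all assumptions made for the example.

```python
from collections import Counter


def hyphenate_bag(bag, hyphenate):
    """Apply `hyphenate` to each term and keep only (term, document).

    Any extra columns in the input rows are dropped, mirroring how the
    preprocessing step "swallows" everything but term and document.
    """
    return [(hyphenate(row[0]), row[1]) for row in bag]


def term_frequencies(bag):
    """Relative frequency of each term within its document."""
    counts = Counter(bag)
    totals = Counter(doc for _, doc in bag)
    return {(term, doc): n / totals[doc] for (term, doc), n in counts.items()}


# Toy lookup standing in for a real hyphenator (assumption).
toy = {"table": "ta-ble", "node": "node"}

# Third column: a frequency computed too early; it will not survive.
bag = [("table", "doc1", 0.67), ("node", "doc1", 0.33), ("table", "doc1", 0.67)]

processed = hyphenate_bag(bag, lambda t: toy.get(t, t))  # preprocess first
freqs = term_frequencies(processed)                       # frequencies afterwards
```

Because the frequency is computed on the already hyphenated bag, nothing is lost when the preprocessing step drops the extra columns.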
What exactly are you trying to do? Perhaps I can give you a few tips on which nodes to use in which order.