Syllables / hyphenation

Ergonomist · October 29, 2009, 4:39pm

Dear KNIME Labbers,

Unfortunately, I couldn’t find a hyphenation node, and I’m not Java-savvy enough to create one myself. But I can use Google and do some research, so please find my results below. Is anyone up to the challenge of creating a hyphenation node?

UPDATE: Actually, this iText thingie already built into KNIME seems to support hyphenation as well:

iText Java code sample (Extra jars needed: itext-hyph-xml.jar)

Any chance for a front-end?

Cheers,
E.

kilian.thiel · November 6, 2009, 7:58pm

Hi,

OK, I’ll implement it ;-).
There will be a hyphenation node in the next textprocessing plugin release.

Cheers,
Kilian

kilian.thiel · November 13, 2009, 1:32pm

Hi Ergonomist,

What kind of output do you expect from this hyphenation node?
I suggest an output table containing the terms (of the bag of words) with a separator ("-" by default)
at all possible split positions.

Cheers,
Kilian
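
To illustrate the output format proposed above, here is a minimal Python sketch, not the node's actual implementation. It assumes the split positions for each term are already known (a real node would get them from a hyphenation library); the function name `mark_splits` and the sample positions are hypothetical.

```python
def mark_splits(term, split_positions, separator="-"):
    """Return `term` with `separator` inserted at every split position.

    `split_positions` are character offsets into `term`; they are assumed
    to come from some hyphenation algorithm and are not computed here.
    """
    pieces = []
    start = 0
    for pos in sorted(set(split_positions)):
        # Ignore positions that would produce an empty leading/trailing piece.
        if 0 < pos < len(term):
            pieces.append(term[start:pos])
            start = pos
    pieces.append(term[start:])
    return separator.join(pieces)


# Hypothetical input: terms of a bag of words with assumed split positions.
terms_with_positions = [("hyphenation", [2, 6]), ("node", [])]

# Output table: original term next to the term with separators inserted.
output_table = [(term, mark_splits(term, positions))
                for term, positions in terms_with_positions]
```

A term without any split position is passed through unchanged, and the separator can be swapped out via the `separator` argument.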

Ergonomist · January 15, 2010, 3:23pm

Hi Kilian,

Wow, I somehow never saw your answer, sorry about that! Happy New Year! Your proposal sounds like the most sensible solution, so feel free to go ahead with it.

Many thanks,
E.

kilian.thiel · February 13, 2010, 5:11pm

The hyphenation node is available in the new version of the textprocessing plugin.
If you have any questions or problems about it just let me know.
I hope the new node is useful.
Cheers,
Kilian

Ergonomist · March 3, 2010, 1:02pm

Dear Kilian,

Pretty impressive work, many thanks for that! However, it refuses to accept extracted sentences as a hyphenation target, even though the help claims that it works on both terms and strings, which would be useful. It also “swallows” my term frequency column, which I guess it shouldn’t.

Thanks again,
E.

kilian.thiel · March 16, 2010, 2:30pm

I’m sorry, the description saying that the Hyphenation node accepts strings and terms is wrong. It is a regular preprocessing node and only accepts a bag of words (term-document tuples) as input. As with the other preprocessing nodes, the output is again a bag of words, consisting of a term column containing the preprocessed terms and a document column containing the preprocessed documents. All other columns are “swallowed”. You should run the preprocessing nodes first, including the hyphenation, and then use the frequency nodes afterwards.
What exactly are you trying to do? Perhaps I can give you a few tips on which nodes to use in which order.
Cheers,
Kilian
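
The node ordering described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not KNIME code: a bag of words is modelled as a list of dicts, a document as a list of terms, and `toy_hyphenate`, `preprocess`, and `add_term_frequency` are hypothetical names.

```python
def toy_hyphenate(term):
    # Hypothetical stand-in for a real hyphenator: a tiny lookup table.
    return {"hyphenation": "hy-phen-ation"}.get(term, term)


def preprocess(bag_of_words, transform):
    """Apply `transform` to each term and document.

    Like the preprocessing step described above, the result contains only
    the term and document columns; any other column is dropped.
    """
    return [
        {
            "term": transform(row["term"]),
            "document": [transform(t) for t in row["document"]],
        }
        for row in bag_of_words
    ]


def add_term_frequency(bag_of_words):
    """Append a relative term frequency column computed per row."""
    result = []
    for row in bag_of_words:
        doc = row["document"]
        tf = doc.count(row["term"]) / len(doc) if doc else 0.0
        result.append({**row, "tf": tf})
    return result


doc = ["hyphenation", "node", "hyphenation"]
bow = [
    {"term": "hyphenation", "document": doc, "tf": 2 / 3},
    {"term": "node", "document": doc, "tf": 1 / 3},
]

preprocessed = preprocess(bow, toy_hyphenate)   # the "tf" column is gone here
with_tf = add_term_frequency(preprocessed)      # so compute it afterwards
```

Computing the frequency after preprocessing also means it is based on the preprocessed terms and documents, which is why the frequency step comes last.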
