KeyGraph Node delivers questionable results

If I understood the KeyGraph paper correctly, the algorithm requires stop word removal and stemming to work correctly. However, punctuation needs to remain intact.

However, the KeyGraph node assigns dots a very high keyword score.

For example I have this text:

irland nordirland smart bord. funktioni angesicht mentalitaet gruen insel herrsch. killea/donegal irland. steh strass inmitt gruen huegel landschaft. nieselt. weid beobacht kueh gras wiederkaeu. zweispur landstrass rauscht verkehr lastwag autos britisch irisch kennzeich. grenzpost. fehlanzeig. zollhaeusch. schild anzeig staatlich grenz uebertritt. infrastruktur schrank bod gemalt zeich. nada. nient. steh jahr jahrzehntelang blutig umkaempft grenz protestant nord kathol sued hinweis gemetzel. deutet daraufhin stueck wald wiesenidyll brexit verhandlungsrund grossbritanni eu monat heftig gerung. kueh schaf eu aussengrenz errichtet.

And the KeyGraph Node gives me this Keyword:

Keyw

What am I doing wrong here?

Can somebody help?

Hi @gnime,

I think you should not keep punctuation. Otherwise KNIME will treat punctuation marks as terms, and because they are omnipresent in the corpus, the KeyGraph node will identify them as keywords. Try removing them with the Punctuation Erasure node.
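
To illustrate the effect outside of KNIME, here is a minimal Python sketch. It is not the KeyGraph algorithm and not KNIME's tokenizer; the `tokenize` helper and its regular expression are assumptions made for illustration. It only counts term frequencies on an excerpt of the text above, to show how a dot that is kept as a term outnumbers every real word.

```python
import re
from collections import Counter

# Excerpt of the stemmed sample text from the question.
text = (
    "irland nordirland smart bord. funktioni angesicht mentalitaet gruen insel herrsch. "
    "killea/donegal irland. steh strass inmitt gruen huegel landschaft. nieselt."
)

def tokenize(s, keep_punctuation):
    """Hypothetical tokenizer: words plus, optionally, single punctuation marks as terms."""
    tokens = re.findall(r"\w+|[^\w\s]", s)
    if keep_punctuation:
        return tokens
    # Keep only word tokens, roughly what a punctuation-removal step would leave.
    return [t for t in tokens if re.fullmatch(r"\w+", t)]

with_punct = Counter(tokenize(text, keep_punctuation=True))
without_punct = Counter(tokenize(text, keep_punctuation=False))

print(with_punct.most_common(3))     # the dot is the most frequent "term"
print(without_punct.most_common(3))  # only real words remain
```

Any method that starts from the most frequent terms will be dominated by the dot in the first case.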

Best,
Marten


Hey @Marten_Pfannenschmidt,

Thanks for your answer, I will try it.
After reading the paper by Ohsawa, I was under the impression that KeyGraph works on sentences and therefore needs punctuation.

The Document datatype keeps information about a document’s structure even if you delete punctuation. I did not read the paper in detail, but if it uses structural information like sentences, deleting punctuation should not do any harm.
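
The idea can be sketched in plain Python. This is not KNIME's Document API; the nested-list structure and variable names are assumptions for illustration. The sentence boundaries are stored in the structure itself, so dropping the punctuation characters afterwards still allows sentence-level co-occurrence counting.

```python
import re
from collections import Counter
from itertools import combinations

# Short excerpt of the stemmed sample text from the question.
text = "irland nordirland smart bord. killea/donegal irland. kueh schaf eu aussengrenz errichtet."

# Step 1: split into sentences first, so the boundaries live in the structure.
sentences = [s.strip() for s in text.split(".") if s.strip()]

# Step 2: remove punctuation inside each sentence; the list nesting keeps the boundaries.
structured = [re.findall(r"\w+", s) for s in sentences]

# Sentence-level co-occurrence counts still work without any punctuation terms.
pairs = Counter()
for terms in structured:
    for a, b in combinations(sorted(set(terms)), 2):
        pairs[(a, b)] += 1

print(structured)
print(pairs[("irland", "nordirland")])
```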
