Replacing text from dictionary with wildcards

Is it possible to replace text using a dictionary that supports wildcards? How can I count "stop", "stops", and "stopper" as a single word in the TF node? A wildcard entry such as "stop*" in the Dictionary Replacer node does not work.

Hi Atabek,

You can easily do this using the Snowball Stemmer node.



Thanks a lot for your suggestion, Roland. But what I need is not the Snowball stemmer, which produces many incorrect stems for Turkish. I need to stem Turkish words using a dictionary that accepts wildcards (especially *). That way, I can stem a Turkish word like this:

kalem* -> kalem (for kalemler, kalemin, kalemde, kalemim, kalemlerim, etc.)

This is actually what a Turkish stemmer should do, but the current Snowball stemmer for Turkish gives many faulty results. Therefore, I need to do it with a dictionary replacer that accepts the * wildcard.
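For illustration, the behavior asked for here (an entry with a trailing * matches any term that starts with the given prefix and is replaced by the stem) might be sketched in plain Java as below. This is not a KNIME node or API; the class and method names are hypothetical, and the sketch assumes the input is already split into single terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a dictionary replacer whose keys may end in a '*'
 * wildcard, e.g. "kalem*" -> "kalem". Plain Java, not a KNIME API.
 */
class WildcardDictionaryReplacer {

    // Entries without '*': the term must match exactly.
    private final Map<String, String> exact = new LinkedHashMap<>();
    // Entries with a trailing '*': stored without the '*', matched as prefixes.
    private final Map<String, String> prefixes = new LinkedHashMap<>();

    /** Adds an entry; a trailing '*' means "any term starting with this prefix". */
    void put(String key, String replacement) {
        if (key.endsWith("*")) {
            prefixes.put(key.substring(0, key.length() - 1), replacement);
        } else {
            exact.put(key, replacement);
        }
    }

    /** Returns the replacement for a term, or the term unchanged if nothing matches. */
    String replace(String term) {
        String hit = exact.get(term);
        if (hit != null) {
            return hit;
        }
        // Prefer the longest matching prefix, so "kalemler*" would win over "kalem*".
        String bestPrefix = null;
        for (String prefix : prefixes.keySet()) {
            if (term.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? term : prefixes.get(bestPrefix);
    }

    public static void main(String[] args) {
        WildcardDictionaryReplacer dict = new WildcardDictionaryReplacer();
        dict.put("kalem*", "kalem");
        dict.put("stop*", "stop");
        for (String t : new String[] {"kalemler", "kalemde", "stopper", "stops", "defter"}) {
            // Prints e.g. "kalemler -> kalem"; "defter" has no entry and stays unchanged.
            System.out.println(t + " -> " + dict.replace(t));
        }
    }
}
```

Matching here is case-sensitive, so terms would need lowercasing first; for Turkish, String.toLowerCase(Locale.forLanguageTag("tr")) maps "I" to "ı" rather than "i". Note also that a prefix entry will catch any longer word that merely starts with the same letters, so short prefixes should be chosen carefully.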

Thanks for your help.

Hi Atabek,

the Dictionary Replacer node replaces terms in documents based on a dictionary. However, the dictionary cannot contain regular expressions. There is also the Replacer node, which can process regular expressions, but it takes only a single regex as a dialog setting, not a dictionary. How many words do you have in your dictionary?
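To illustrate why dictionary size matters for the regex route: a wildcard dictionary can, in principle, be compiled into one regular expression with one alternative per entry. The sketch below uses plain java.util.regex with hypothetical names; it does not show how the Replacer node works internally, and it maps each match back to its own stem in Java code, which a single replacement string in a node dialog may not be able to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: compile "prefix*" -> stem entries into one regex, apply in one pass. */
class WildcardRegexReplacer {

    private final Pattern pattern;
    // replacements.get(i) is the stem for capturing group i + 1.
    private final List<String> replacements = new ArrayList<>();

    WildcardRegexReplacer(Map<String, String> dictionary) {
        List<String> keys = new ArrayList<>(dictionary.keySet());
        // Longer entries first, so a more specific entry wins over a shorter prefix.
        keys.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder regex = new StringBuilder();
        for (String key : keys) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            if (key.endsWith("*")) {
                // "kalem*" becomes \b(\Qkalem\E\w*): the literal prefix plus any word characters.
                String prefix = key.substring(0, key.length() - 1);
                regex.append("\\b(").append(Pattern.quote(prefix)).append("\\w*)");
            } else {
                // Entries without '*' must match a whole word exactly.
                regex.append("\\b(").append(Pattern.quote(key)).append(")\\b");
            }
            replacements.add(dictionary.get(key));
        }
        // UNICODE_CHARACTER_CLASS makes \w and \b treat letters such as ç, ğ, ı, ö, ş, ü as word characters.
        pattern = Pattern.compile(regex.toString(), Pattern.UNICODE_CHARACTER_CLASS);
    }

    String replace(String text) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Exactly one alternative matched; find its group and substitute the mapped stem.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    m.appendReplacement(out, Matcher.quoteReplacement(replacements.get(g - 1)));
                    break;
                }
            }
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("kalem*", "kalem");
        dictionary.put("stop*", "stop");
        WildcardRegexReplacer replacer = new WildcardRegexReplacer(dictionary);
        System.out.println(replacer.replace("Bu kalemler ve stopper kalemde"));
        // prints "Bu kalem ve stop kalem"
    }
}
```

With a long dictionary the combined pattern grows accordingly, which is one reason a per-term lookup can be simpler to maintain.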

Cheers, Kilian

Thank you very much for your suggestions, Kilian. As you probably know, Turkish is an agglutinative language that makes heavy use of suffixes. Therefore, I need to remove certain types of suffixes from the words in a dictionary (usually a long list) in order to get their stems. I think I can do this with the Java Snippet node. I will try this option soon.
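As a rough illustration of the suffix-removal idea, a naive stripper might repeatedly cut the longest matching suffix from a list while keeping a minimum stem length. The suffix list, the minimum length of 3, and all names below are hypothetical choices for the example, and the logic of stem() is only presumed to be adaptable to a Java Snippet node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical naive suffix stripper; not a real Turkish stemmer. */
class NaiveSuffixStripper {

    // Turkish locale, so that "I" lowercases to "ı" and "İ" to "i".
    private static final Locale TURKISH = Locale.forLanguageTag("tr");
    // Hypothetical lower bound that keeps short words from being stripped to almost nothing.
    private static final int MIN_STEM = 3;

    private final List<String> suffixes;

    NaiveSuffixStripper(String... suffixes) {
        this.suffixes = new ArrayList<>(Arrays.asList(suffixes));
        // Try longer suffixes first, so "lerim" is removed before "im".
        this.suffixes.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    /** Strips known suffixes from the end of the word until none applies. */
    String stem(String word) {
        String stem = word.toLowerCase(TURKISH);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String suffix : suffixes) {
                if (!suffix.isEmpty()
                        && stem.endsWith(suffix)
                        && stem.length() - suffix.length() >= MIN_STEM) {
                    stem = stem.substring(0, stem.length() - suffix.length());
                    changed = true;
                    break; // restart with the longest suffixes on the shortened word
                }
            }
        }
        return stem;
    }

    public static void main(String[] args) {
        NaiveSuffixStripper stripper =
                new NaiveSuffixStripper("lerim", "ler", "lar", "im", "in", "de", "da");
        for (String w : new String[] {"kalemler", "kalemin", "kalemde", "kalemim", "kalemlerim"}) {
            System.out.println(w + " -> " + stripper.stem(w)); // each word reduces to "kalem"
        }
    }
}
```

A plain suffix list ignores vowel harmony and consonant alternation (e.g. kitap becomes kitabı), so it will both over- and under-strip on real text; the list would need to be checked against actual data.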

Thank you again for all your excellent work.