Indonesian stop word

Hello,
How can I build an Indonesian stop word list and an Indonesian stemmer in KNIME?
What are the steps to build them in KNIME?
Thanks

Hi Missisutami,

The Stemmer node uses the Snowball stemming library. This library does not provide stemming for the Indonesian language.

For stop word filtering you need to bring your own stop word file with one word per line, like

this

is

a

stop

word

list

You can specify the path to that file in the node dialog.
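To make the file format concrete, here is a minimal Java sketch of what a one-word-per-line stop word file amounts to. It is not the code of the KNIME node. The class name, the method names, the temporary file, and the sample Indonesian words are all assumptions made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of KNIME.
final class StopWordFileSketch {

    // Reads a stop word file with one word per line; blank lines are skipped.
    static Set<String> loadStopWords(Path file) throws IOException {
        Set<String> stopWords = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                stopWords.add(word);
            }
        }
        return stopWords;
    }

    // Keeps only the tokens that are not in the stop word set (case-insensitive).
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample content: a handful of common Indonesian function words.
        Path file = Files.createTempFile("stopwords-id", ".txt");
        Files.write(file, List.of("yang", "dan", "di", "ke", "dari"), StandardCharsets.UTF_8);

        Set<String> stopWords = loadStopWords(file);
        List<String> tokens = List.of("saya", "pergi", "ke", "pasar", "dan", "toko");
        System.out.println(removeStopWords(tokens, stopWords)); // prints [saya, pergi, pasar, toko]

        Files.delete(file);
    }
}
```

In KNIME itself you only need the text file; the node does the filtering once you point it at the path.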

Cheers, Kilian

hi,

So how do I use an Indonesian stemmer in KNIME?

Hi April,

For that you first need a Java stemming library that can stem Indonesian (or your own implementation of one). You can integrate external stemmers and other preprocessing nodes quite easily if you know how to implement a KNIME node.

Do you have a library or Java classes that can stem Indonesian? If so, and it is open source, please point me to it. I would be interested in integrating it into the text processing extension.

Cheers, Kilian

So can I use Java code with an existing node in KNIME?


To integrate your own stemming algorithm you would need to implement your own KNIME node. This is easily possible if you are familiar with Java and Eclipse.

Do you have a stemming library in Java that can stem Indonesian texts?

A good starting point for implementing your own node is, for example: https://tech.knime.org/developer/example/extension-wizard
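As a rough idea of the Java side, the stemming logic can be kept behind a small interface so that a real library can be swapped in later. The sketch below is plain Java and does not use any KNIME API. `TermStemmer`, `NaiveIndonesianSuffixStripper`, and `StemmerSketch` are hypothetical names. The suffix stripper is only a stand-in: it removes a few particles and possessive endings and will over-stem some words (for example "sekolah"), which a proper Indonesian stemmer with a root word dictionary would avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: not part of KNIME or of any stemming library.
interface TermStemmer {
    String stem(String term);
}

// Naive stand-in for a real Indonesian stemmer (assumption, for illustration only).
final class NaiveIndonesianSuffixStripper implements TermStemmer {
    private static final String[] PARTICLES = {"lah", "kah", "pun"};
    private static final String[] POSSESSIVES = {"nya", "ku", "mu"};
    private static final int MIN_STEM_LENGTH = 3;

    @Override
    public String stem(String term) {
        String word = term.toLowerCase(Locale.ROOT);
        word = stripFirstMatch(word, PARTICLES);   // e.g. "pergilah" -> "pergi"
        word = stripFirstMatch(word, POSSESSIVES); // e.g. "bukunya" -> "buku"
        return word;
    }

    // Removes the first matching suffix, but only if at least MIN_STEM_LENGTH characters remain.
    private static String stripFirstMatch(String word, String[] suffixes) {
        for (String suffix : suffixes) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }
}

final class StemmerSketch {
    // Applies the given stemmer to every term; a custom node would do this per document term.
    static List<String> stemAll(List<String> terms, TermStemmer stemmer) {
        List<String> stems = new ArrayList<>();
        for (String term : terms) {
            stems.add(stemmer.stem(term));
        }
        return stems;
    }

    public static void main(String[] args) {
        TermStemmer stemmer = new NaiveIndonesianSuffixStripper();
        List<String> terms = List.of("bukunya", "pergilah", "rumahku", "makan");
        System.out.println(stemAll(terms, stemmer)); // prints [buku, pergi, rumah, makan]
    }
}
```

If you have a real Indonesian stemming library, you would implement the interface by delegating to it, and the node implementation would only ever call `stem`.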

Cheers, Kilian

I have an Indonesian stemming library in Java: link

Please tell me how to use it once it has been successfully integrated with KNIME.

Thank you very much


Hello awianggara,

Besides the link Kilian posted, you might want to look at the source code of the KNIME Text Mining nodes. With both, it should be easy to implement this node.
If you are not familiar with Java and the Eclipse framework, please point us to the official website of the stemming library. Then we might be able to include it in a future version of the KNIME Analytics Platform.

Best,
Ferry
