KNIME is a very useful, powerful, and convenient tool for data processing.
But I ran into some problems when processing Chinese text with the Text Processing extension. If I develop a new node that handles Chinese text well, how can I get the source code of the existing nodes (e.g. Bow Creator and Dictionary Tagger), so that I can make sure my new node works well with the existing nodes in Text Processing?
Hi nuaaer, I ran into the same problem: KNIME can't deal with Chinese well. I found that KNIME uses OpenNLP to process text, which doesn't support Chinese, and it's hard to design a new node for processing Chinese text within the Text Processing plugin. So have you found a better way to deal with Chinese?
OpenNLP is used to tokenize the text and for part-of-speech tagging, as well as for named entity recognition. I have never processed Chinese texts, but I guess that as long as you don't want to part-of-speech tag your text, recognize named entities, or stem the text, it should work. Is the Chinese text encoded in UTF-8? What problems do you have when processing Chinese text?
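To help narrow down the two questions above, here is a minimal Python sketch that runs outside KNIME. It does not use KNIME or OpenNLP at all. The helper name `is_valid_utf8` and the sample sentence are made up for illustration. It checks whether some bytes decode as UTF-8, and it shows that splitting on whitespace, used here only as a stand-in for a tokenizer that expects spaces between words, leaves a Chinese sentence as one token.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample sentence ("I like data mining"), not taken from the thread.
sample = "我喜欢数据挖掘"

utf8_bytes = sample.encode("utf-8")  # the encoding asked about above
gbk_bytes = sample.encode("gbk")     # a common legacy encoding for Chinese text

print(is_valid_utf8(utf8_bytes))  # True
print(is_valid_utf8(gbk_bytes))   # False

# Whitespace splitting is only an illustration of a space-based tokenizer;
# it is an assumption, not a description of what OpenNLP does internally.
tokens = sample.split()
print(len(tokens))  # 1: the whole sentence stays a single token
```

If the check fails for your input file, converting it to UTF-8 before reading it into KNIME would be the first thing to try. If the encoding is fine but whole sentences end up as single terms, the problem is more likely word segmentation than encoding.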