Hi everyone,
Although KNIME supports a limited set of features from the Zemberek NLP tool, I need to use much more of that library. I have tried several approaches but am still struggling.
I would like to apply TurkishSentenceNormalizer to each of my table rows:
First, I downloaded the Zemberek source jar file.
I added the files under the workflow where I want to use it. Then I added a Java Snippet node, added the jar file in the Additional Libraries section, and wrote the code below:
// system imports
import org.knime.base.node.jsnippet.expression.AbstractJSnippet;
import org.knime.base.node.jsnippet.expression.Abort;
import org.knime.base.node.jsnippet.expression.Cell;
import org.knime.base.node.jsnippet.expression.ColumnException;
import org.knime.base.node.jsnippet.expression.TypeException;
import static org.knime.base.node.jsnippet.expression.Type.*;
import java.util.Date;
import java.util.Calendar;
import org.w3c.dom.Document;
// Your custom imports:
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;
import zemberek.core.collections.Histogram;
import zemberek.core.collections.UIntSet;
import zemberek.core.turkish.Turkish;
import zemberek.morphology.TurkishMorphology;
import zemberek.morphology.analysis.SentenceAnalysis;
import zemberek.morphology.analysis.SentenceWordAnalysis;
import zemberek.morphology.analysis.SingleAnalysis;
import zemberek.morphology.analysis.WordAnalysis;
import zemberek.morphology.lexicon.DictionaryItem;
import zemberek.morphology.lexicon.RootLexicon;
import zemberek.morphology.lexicon.tr.TurkishDictionaryLoader;
import zemberek.morphology.morphotactics.Morpheme;
import zemberek.morphology.morphotactics.TurkishMorphotactics;
import zemberek.normalization.TextCleaner;
import zemberek.normalization.TurkishSpellChecker;
import zemberek.normalization.TurkishSentenceNormalizer;
// system variables
public class JSnippet extends AbstractJSnippet {
// Fields for input columns
/** Input column: "column1" */
public String c_column1;
// Fields for output columns
/** Output column: "Checked" */
public String out_Checked;
// Your custom variables:
// expression start
public void snippet() throws TypeException, ColumnException, Abort {
// Enter your code here:
try {
// Paths to the Zemberek normalization data and the language model file
Path lookupRoot = Paths.get("/home/aaa/zemberek-data/normalization");
Path lmFile = Paths.get("/home/aaa/zemberek-data/lm/lm.2gram.slm");
TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
// The paths and morphology must be declared before the normalizer uses them
TurkishSentenceNormalizer normalizer =
new TurkishSentenceNormalizer(morphology, lookupRoot, lmFile);
//String[] words= s
//for (String word : words) {
// System.out.println(word + " = " + spellChecker.suggestForWord(word));
//}
out_Checked = normalizer.normalize(c_column1);
} catch (IOException e) {
// The normalizer constructor reads data files and can throw IOException
throw new Abort("Could not load Zemberek normalization data: " + e.getMessage());
}
// expression end
}
}
I am having trouble with the instantiation step. How should I proceed?
The .knwf workflow file is available from the cloud: TurkishNLP
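To help narrow this down, here is a minimal sketch of a check for whether a class from the added jar is visible at runtime. It uses only standard Java; `ClasspathCheck` and `isLoadable` are hypothetical names made up for this illustration and are not part of KNIME or Zemberek. It assumes the fully qualified class name is `zemberek.normalization.TurkishSentenceNormalizer`.

```java
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class name; adjust if the library version places it elsewhere
        String name = "zemberek.normalization.TurkishSentenceNormalizer";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Inside a Java Snippet node the same `Class.forName` call could presumably be made with `getClass().getClassLoader()` so that the node's own class loader is queried. If the result is false there, the jar listed under Additional Libraries may not contain compiled classes, which might be the case for a sources-only jar.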
Thank you for your help!
Edo