Zemberek TurkishSentenceNormalizer


Hi everyone,

Although KNIME supports a limited set of features from the Zemberek NLP tool, I need to use more of that library. I have tried several approaches but am still struggling.

I would like to apply TurkishSentenceNormalizer to each of my table rows:

First, I downloaded the Zemberek source jar file.

I added the files under the workflow where I would like to use it. Then I added a Java Snippet node, added the jar file in the Additional Libraries section, and wrote the code below:

// system imports
import org.knime.base.node.jsnippet.expression.AbstractJSnippet;
import org.knime.base.node.jsnippet.expression.Abort;
import org.knime.base.node.jsnippet.expression.Cell;
import org.knime.base.node.jsnippet.expression.ColumnException;
import org.knime.base.node.jsnippet.expression.TypeException;
import static org.knime.base.node.jsnippet.expression.Type.*;
import java.util.Date;
import java.util.Calendar;
import org.w3c.dom.Document;


// Your custom imports:
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;
import zemberek.core.collections.Histogram;
import zemberek.core.collections.UIntSet;
import zemberek.core.turkish.Turkish;
import zemberek.morphology.TurkishMorphology;
import zemberek.morphology.analysis.SentenceAnalysis;
import zemberek.morphology.analysis.SentenceWordAnalysis;
import zemberek.morphology.analysis.SingleAnalysis;
import zemberek.morphology.analysis.WordAnalysis;
import zemberek.morphology.lexicon.DictionaryItem;
import zemberek.morphology.lexicon.RootLexicon;
import zemberek.morphology.lexicon.tr.TurkishDictionaryLoader;
import zemberek.morphology.morphotactics.Morpheme;
import zemberek.morphology.morphotactics.TurkishMorphotactics;
import zemberek.normalization.TextCleaner;
import zemberek.normalization.TurkishSentenceNormalizer;
import zemberek.normalization.TurkishSpellChecker;
// system variables
public class JSnippet extends AbstractJSnippet {
  // Fields for input columns
  /** Input column: "column1" */
  public String c_column1;

  // Fields for output columns
  /** Output column: "Checked" */
  public String out_Checked;

// Your custom variables:
    // Built once on the first row and reused, since loading the models is slow.
    private TurkishSentenceNormalizer normalizer;

// expression start
    public void snippet() throws TypeException, ColumnException, Abort {
// Enter your code here:

    if (normalizer == null) {
        // The paths must be declared before the constructor call that uses them.
        Path lookupRoot = Paths.get("/home/aaa/zemberek-data/normalization");
        Path lmFile = Paths.get("/home/aaa/zemberek-data/lm/lm.2gram.slm");
        try {
            TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
            normalizer = new TurkishSentenceNormalizer(morphology, lookupRoot, lmFile);
        } catch (IOException e) {
            // The constructor declares IOException, which snippet() does not,
            // so it is converted to Abort to stop the node with a message.
            throw new Abort("Could not load Zemberek normalization data: " + e.getMessage());
        }
    }

//String[] words= s
//for (String word : words) {
//    System.out.println(word + " = " + spellChecker.suggestForWord(word));
//}

    // Normalize the text of the current row.
    out_Checked = normalizer.normalize(c_column1);


    // expression end
    }
    }
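
Since the TurkishSentenceNormalizer constructor takes the two paths above, one thing I can check separately is whether they actually exist on the machine running KNIME. Below is a minimal sketch using only the Java standard library. The class name ZemberekDataCheck and the method missingResources are hypothetical names of my own, not part of Zemberek or KNIME, and the paths are the same ones used in the snippet.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks the data locations before Zemberek is asked to load them. */
final class ZemberekDataCheck {

    /** Returns one message per missing resource; an empty list means both paths exist. */
    static List<String> missingResources(Path lookupRoot, Path lmFile) {
        List<String> problems = new ArrayList<>();
        // The normalization data is expected to be a directory.
        if (!Files.isDirectory(lookupRoot)) {
            problems.add("Normalization data directory not found: " + lookupRoot);
        }
        // The language model is expected to be a single file.
        if (!Files.isRegularFile(lmFile)) {
            problems.add("Language model file not found: " + lmFile);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Same paths as in the Java Snippet above.
        Path lookupRoot = Paths.get("/home/aaa/zemberek-data/normalization");
        Path lmFile = Paths.get("/home/aaa/zemberek-data/lm/lm.2gram.slm");

        List<String> problems = missingResources(lookupRoot, lmFile);
        if (problems.isEmpty()) {
            System.out.println("Both paths exist.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

This only confirms that the files are present. It says nothing about whether their contents are valid for Zemberek.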

I am having trouble with the instantiation step. How can I proceed?

You can get the knwf file from the cloud: TurkishNLP

Thank you for your help!

Edo
