Phonetic processing


  Is there any way to do phonetic processing of text? I want to read in a document, generate a bag of words (BoW) from it, and then compute a Soundex value for each term. Ideally, I would like to generate values with several different phonetic algorithms, such as Phonix, Metaphone, etc.

  Any ideas?

Thanks in advance



There is no dedicated node for this in the Textprocessing extension. If there is a Java library that computes Soundex values from strings, you can call it from the Java Snippet node. First create a bag of words, preprocess and filter it, then use the Term To String node to convert the terms to strings. The external library can then be applied to those strings.
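
As a rough illustration of the string-to-code step, here is a minimal, self-contained sketch of American Soundex in plain Java. The class name `SoundexSketch` and its `encode` method are made up for this example and are not part of KNIME or any library. In practice an existing library is probably the better choice: Apache Commons Codec, for instance, ships `Soundex`, `Metaphone` and `DoubleMetaphone` encoders in `org.apache.commons.codec.language`.

```java
// Hypothetical helper for illustration only; not a KNIME or library class.
final class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' marks letters without a code
    // (vowels, H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    /** Returns the 4-character American Soundex code, or "" if term has no ASCII letters. */
    static String encode(String term) {
        StringBuilder out = new StringBuilder(4);
        char lastCode = '0';
        for (int i = 0; i < term.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(term.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore digits, punctuation and non-ASCII letters
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                // The first letter is kept as-is, but its code still
                // suppresses an immediately following duplicate.
                out.append(c);
                lastCode = code;
            } else if (c == 'H' || c == 'W') {
                // H and W are skipped and do not separate equal codes.
                continue;
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                // Vowels reset lastCode, so equal codes around a vowel are both kept.
                lastCode = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

For example, `SoundexSketch.encode("Robert")` and `SoundexSketch.encode("Rupert")` both return `R163`. Inside a Java Snippet node, the same call would be applied to the string column produced by Term To String.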

Cheers, Kilian