Fuzzy String Match

Has anyone experimented with or developed a node for fuzzy string matching? For example, how well does the string “Albert” match the string “Albirt”?

You could try a Java library such as SimMetrics and call it from the Java Snippet node (there is a tab where you can register JAR libraries). It supports a number of different string distance functions.

There is a node named “String Matcher” in the Textprocessing plugin (http://labs.knime.org/textprocessing). It takes two lists of strings, one with the strings to compare and one with reference strings, computes the Levenshtein distance between them, and outputs, for each string to compare, the k most similar strings from the reference list.
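For anyone who wants to see the idea outside of KNIME, here is a minimal Python sketch of Levenshtein-based matching. It is not the String Matcher node's implementation; the function names `levenshtein` and `k_most_similar` are made up for illustration, and ties are simply kept in input order.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def k_most_similar(query: str, references: list[str], k: int = 3) -> list[str]:
    """Hypothetical helper: the k reference strings closest to query."""
    return sorted(references, key=lambda ref: levenshtein(query, ref))[:k]


if __name__ == "__main__":
    print(levenshtein("Albert", "Albirt"))
    print(k_most_similar("Albirt", ["Alberta", "Albert", "Robert"], k=2))
```

With the strings from the original question, “Albert” and “Albirt” differ by a single substitution, so the distance is 1.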

I’ll give that a shot. Thanks.

Could the KNIME developers please also add an n-gram fuzzy matching algorithm to the String Matcher node?
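To make the request concrete, here is a rough Python sketch of one common n-gram approach: compare the sets of character bigrams with the Jaccard coefficient. This is only one of several n-gram variants (others use Dice, padding, or counts instead of sets), and the names `ngrams` and `ngram_similarity` are invented for this example.

```python
def ngrams(text: str, n: int = 2) -> set[str]:
    """Set of character n-grams of text, case-insensitive."""
    lowered = text.lower()
    return {lowered[i:i + n] for i in range(len(lowered) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the n-gram sets: 1.0 = identical sets, 0.0 = disjoint."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        # Strings shorter than n have no n-grams; fall back to an exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Shared bigrams: "al", "lb", "rt"; 7 distinct bigrams overall.
    print(ngram_similarity("Albert", "Albirt"))
```

For “Albert” and “Albirt”, three of the seven distinct bigrams are shared, so the score is 3/7.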

 

The Indexing & Searching plugin from the KNIME Labs provides fuzzy searching via the ~ operator in the Index Query node. You can also perform multi-term searches by concatenating several fuzzy single-term queries with the AND operator, such as "name:thomas~ AND name:smith~". We will soon provide some examples that demonstrate the usage of the new Indexing & Searching plugin.
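If you need to generate such query strings from data, a small helper along these lines could work. This is only a sketch: `build_fuzzy_query` is a hypothetical function, it only assembles the text of the query in the form shown above, and it assumes the terms contain no spaces or special query characters that would need escaping.

```python
def build_fuzzy_query(field: str, terms: list[str]) -> str:
    """Hypothetical helper: join fuzzy single-term queries with AND,
    e.g. field "name" and terms ["thomas", "smith"]."""
    # Each term gets the ~ operator appended to mark it as a fuzzy term.
    clauses = [f"{field}:{term}~" for term in terms if term]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_fuzzy_query("name", ["thomas", "smith"]))
```

The resulting string would then be passed to the Index Query node as the query.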

Hello everyone,

we have just uploaded three examples that demonstrate the new Indexing & Searching plugin, including an example of typo detection in address databases that shows how to use multi-term fuzzy queries.

Have fun,

Tobias

Hi Tobias, thank you for the example workflows. I am looking at fuzzyAddressMaching, but I am unable to download or find the node called Double Input.

Best Regards 

It is located in the "Search Duplicates" meta node.

Hi,

the Double Input node is a Quick Form node that is used to configure the "Search Duplicates" meta node. The node is part of the "KNIME Nodes to create KNIME Quick Forms" extension, which you can install via the "Install New Software..." entry in the Help menu.

Bye,

Tobias