Optimizing string matching within lists

tbtt · March 26, 2020, 9:48am

I have another chunk loop that I want to improve or replace with something that performs better. In this case I have a table with two columns: one is a string, the other is a list of strings. I want to use the String Matcher to find the closest string in the list. The same list can appear in multiple rows, which is why I think there is room for optimization. Using a chunk loop is pretty straightforward imo and looks in my example like this:

Example of input data:

string:       lists:
ab            ["abd", "adb", "cde"]
cd            ["abd", "adb", "cde"]
fg            ["xy", "bc", "gdfdcde"]
xd            ["xy", "bc", "gdfdcde"]

I thought about grouping the lists (to only have them once), but how can I then make sure that only the corresponding list gets matched with the string? Can the group loop be used for that? I have no experience with it.

Thanks!

izaychik63 · March 26, 2020, 6:16pm

It looks pretty reasonable. The Column Filter is not necessary here. To make sure the list field is selected, you can choose the enforce option in the Ungroup node. Also, you can try

with n-gram.
Or you can use just Ungroup and String Similarity without a loop, followed by GroupBy with the Max function to keep the best-scoring match per row.
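
If it helps to prototype the loop-free idea outside KNIME, here is a minimal Python sketch of the same three steps (ungroup, score every pair, keep the maximum per row). It is only an illustration under assumptions: `difflib.SequenceMatcher` is a stand-in for whatever distance the String Similarity node is configured with, and the names `similarity` and `best_matches` are made up for this example.

```python
from difflib import SequenceMatcher

# Input rows from the question: (string, list of candidate strings).
rows = [
    ("ab", ["abd", "adb", "cde"]),
    ("cd", ["abd", "adb", "cde"]),
    ("fg", ["xy", "bc", "gdfdcde"]),
    ("xd", ["xy", "bc", "gdfdcde"]),
]


def similarity(a, b):
    """Score in [0, 1]. Assumption: a stand-in for the String Similarity node."""
    return SequenceMatcher(None, a, b).ratio()


def best_matches(rows):
    """Return one (string, best candidate, score) tuple per input row."""
    # "Ungroup": one (row id, string, candidate) triple per list element.
    # The row id keeps each string tied to its own list only.
    ungrouped = [
        (row_id, s, cand)
        for row_id, (s, cands) in enumerate(rows)
        for cand in cands
    ]

    # Score every pair, then "GroupBy with Max": keep the best candidate
    # per row id. On a tie, the candidate that appears first is kept.
    best = {}
    for row_id, s, cand in ungrouped:
        score = similarity(s, cand)
        if row_id not in best or score > best[row_id][2]:
            best[row_id] = (s, cand, score)

    # Rows with an empty candidate list produce no entry.
    return [best[row_id] for row_id in sorted(best)]


if __name__ == "__main__":
    for s, cand, score in best_matches(rows):
        print(s, cand, round(score, 3))
```

The row id carried through the ungroup step is what guarantees that a string is only ever compared against its own list, which is the same role the row key plays when grouping back in the workflow.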
