I calculated the Levenshtein distance with “String Distance” and added an extra column with “Distance Matrix Calculate”.
This column has the type “Distance vector”.
Now I would like to create an Excel file with the Excel Writer. Unfortunately it does not work. What is the problem?
Can I change the column format “Distance vector” to “String”, or is there another way to create an Excel file?
Many thanks and BR
Michael
Hello @Michael_Schm91 and thank you for your message.
May I ask what the goal of writing the distance vector to an (Excel) file is? What do you want to achieve? There is also the <distance matrix writer> node, which lets you write the distance vectors to a file, e.g. a CSV file.
Does this help your use case?
Hello @kevin_sturm ,
thanks for your feedback.
The export as CSV worked.
However, I have 5000 customer records that I would like to check for duplicates using the Levenshtein distance.
I have already removed special characters and spaces with various string manipulations, and I would like to look at the customers whose distance is below 6, for example, to check them more closely for duplicates.
Unfortunately, the CSV export does not help because the remaining columns from the original file are missing. Do you have another solution for this?
Many thanks and BR
Michael
Did you try to use the <similarity search> node?
“This node takes each row in the query table (Port 0) and searches the reference table (Port 1) for a number of rows matching the specified similarity/distance criteria. If multiple results are requested, the query result row is duplicated for each subsequent match.”
With this node, you can compute the distance between the values in your input (query) table and the values in the reference table, and keep only the matches that satisfy your distance criterion.
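To make the idea concrete outside of KNIME, here is a minimal Python sketch of a threshold-based duplicate search on customer names. It is not how the node is implemented. The function names (`normalize`, `levenshtein`, `find_candidates`), the sample customers, and the column names are all assumptions for illustration only. The point is that each match carries the complete original rows, so nothing from the source table is lost.

```python
def normalize(text: str) -> str:
    """Lowercase and drop everything that is not a letter or digit."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def find_candidates(rows, key, max_distance):
    """Return (row_a, row_b, distance) for each pair with distance < max_distance.

    Every unordered pair is compared once; the full original rows are kept.
    """
    keys = [normalize(row[key]) for row in rows]
    matches = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            distance = levenshtein(keys[i], keys[j])
            if distance < max_distance:
                matches.append((rows[i], rows[j], distance))
    return matches


# Hypothetical sample data; real input would be the 5000 customer rows.
customers = [
    {"id": 1, "name": "Miller & Sons Ltd.", "city": "Leeds"},
    {"id": 2, "name": "Miller and Sons Ltd", "city": "Leeds"},
    {"id": 3, "name": "Schmidt AG", "city": "Hamburg"},
]

for row_a, row_b, distance in find_candidates(customers, "name", max_distance=6):
    print(row_a["id"], row_b["id"], distance, row_a["name"], "|", row_b["name"])
```

Comparing every pair is quadratic: 5000 rows mean roughly 12.5 million comparisons, which is still feasible but slow in pure Python, so a dedicated node or library is the more practical choice for larger tables.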
Here are two example workflows that give you an idea of how to apply the node:
Best regards
Kevin