Hi, I am fairly new to KNIME, so pardon me if this question has been asked already.
I am working on a deduplication exercise in which a similarity score is calculated between every pair of records containing addresses in the table (I am not limiting the search to the single nearest neighbour, because I want the score for every possible pair). However, I have close to 8 lakh (800,000) records in my dataset, and KNIME is taking far too long to run this similarity search.
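For a sense of scale, here is a quick Python check of how many pairs that is. It assumes 8 lakh means exactly 800,000 records and that each unordered pair is scored once; the variable names are just for illustration.

```python
import math

# Assumption: 8 lakh records = 800,000 rows, each unordered pair scored once.
n_records = 800_000

# Number of unordered pairs: n * (n - 1) / 2
n_pairs = math.comb(n_records, 2)

print(f"{n_pairs:,} pairs")  # 319,999,600,000 pairs
```

That is roughly 3.2 x 10^11 comparisons, which is probably why the all-pairs search does not finish in a reasonable time.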
As an alternative, I thought of first clustering the records by address and then running the similarity search only within each cluster, so that scores are calculated between the records of the same cluster. When I tried running K-Means, though, it did not pick up the address column, only some other numeric field.
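To make the idea concrete, here is a rough sketch in plain Python (not KNIME nodes) of what I mean by "group first, then compare only within each group". Everything in it is an assumption for illustration: the grouping key (first three alphanumeric characters of the address), the use of `difflib.SequenceMatcher` as the similarity measure, and the sample addresses, which are made up.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(address: str) -> str:
    """Hypothetical grouping key: first 3 alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in address.lower() if ch.isalnum())
    return cleaned[:3]


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pairwise_scores_within_blocks(addresses):
    """Score every pair of records that share a grouping key, and no others."""
    blocks = defaultdict(list)
    for idx, addr in enumerate(addresses):
        blocks[block_key(addr)].append(idx)

    scores = {}
    for members in blocks.values():
        for i, j in combinations(members, 2):
            scores[(i, j)] = similarity(addresses[i], addresses[j])
    return scores


# Made-up sample addresses, only to show the shape of the result.
sample = [
    "12 MG Road, Bengaluru",
    "12 M.G. Road, Bangalore",
    "45 Park Street, Kolkata",
    "45 Park St, Kolkata",
]
scores = pairwise_scores_within_blocks(sample)
for (i, j), score in sorted(scores.items()):
    print(i, j, round(score, 2))
```

With this sample, only the pairs (0, 1) and (2, 3) are scored instead of all six. I realise a simple prefix key like this would miss duplicates whose addresses differ in the first few characters, so the grouping step would need something more robust.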
Could anyone please suggest how I can cluster the addresses in KNIME?
Thanks in advance
Vihar