Similarity Search

izaychik63 · August 18, 2020, 7:59pm

I was confused by Similarity Search because it returned only exact matches. After I increased the Neighbor Count parameter, some relevant results appeared. Unfortunately, this parameter is not documented. Could you please add a description of it?
Thank you

izaychik63 · August 18, 2020, 10:20pm

After experimenting, I see that I need advice on a solution.
My task is to make pairs of records with similar content.
Say I have 7K records, each with a text field of about 800 characters.
I need to compare every record with all the others and return pairs that match by 80%.
The straightforward solution, a cross join followed by String Similarity, does not work because
it requires too much memory/time.
I built a solution based on Similarity Search, but it returns only exact matches with Neighbor Count = 1.

Is there a more effective solution?

Kathrin · August 19, 2020, 7:07am

Hi @izaychik63,

you could try using the distance calculate node.

This workflow shows you a little example of how the node can be used to calculate the distance between different sentences.

Best
Kathrin

izaychik63 · August 19, 2020, 11:34am

Thank you, @Kathrin. I’m looking for the distance between documents rather than between sentences. It could even be close to a search for documents that have a similar structure with small changes to it. Maybe 85% matching and 15% not matching.
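
To make the document-level comparison concrete, here is a minimal sketch in plain Python of one possible approach. It is not what Similarity Search or any KNIME node does internally. It assumes that "match by 80%" can be approximated by Jaccard similarity over character 5-grams (shingles), which is a different measure from an edit-distance-based string similarity, so the threshold would likely need tuning. The function names and the default `k=5` are illustrative choices, not part of any library.

```python
from itertools import combinations


def shingles(text, k=5):
    """Return the set of overlapping k-character substrings of text."""
    # Lowercase and collapse whitespace so trivial formatting changes do not count.
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(texts, threshold=0.8, k=5):
    """Return (i, j, score) for every pair of texts scoring at least threshold."""
    # Build each shingle set once, then compare sets pair by pair.
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i, j in combinations(range(len(sets)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

Calling `similar_pairs(list_of_texts, threshold=0.8)` returns index pairs with their scores. This still compares every pair (about 24.5 million comparisons for 7K records), but it never materializes the cross join, so only the matching pairs are kept in memory.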