Record Linkage

Has anyone done more work with record linkage beyond the data duplication example on the Example Server? I would love to see folks' workflows.



I've also been looking at Java libraries that may help with this problem. In particular, has anyone looked at libraries like Oyster?



Hi Tom,

Duke is a nice record linkage/deduplication engine written in Java on top of Lucene - see




This is a very interesting topic! Any changes since 2016? Does anyone have workflows for this?

I've been working with Python recently, using the pandas-dedupe package along with some other libraries for address cleanup, name normalization, etc.
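
To give a rough idea of what the name normalization and matching steps look like, here is a minimal sketch using only the Python standard library. It is not the pandas-dedupe API and not how Duke works; the function names (`normalize_name`, `similarity`, `find_candidate_pairs`) and the 0.85 threshold are made up for illustration, and the all-pairs comparison only makes sense for small tables.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def find_candidate_pairs(records, threshold=0.85):
    """Compare every pair of records and keep those at or above the threshold.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    Returns a list of (index_a, index_b, score) tuples.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


if __name__ == "__main__":
    names = ["Jon Smith", "jon smith.", "Alice Wong"]
    for i, j, score in find_candidate_pairs(names):
        print(names[i], "<->", names[j], score)
```

In practice you would add a blocking step (for example, only comparing records that share a postcode or first letter) so you don't have to compare every pair.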

Have you gotten Duke working from within KNIME?