Record Linkage

Has anyone done more work with record linkage beyond the data duplication example on the Example Server? I would love to see folks' workflows.

I've also been looking at Java libraries that may help with this problem. In particular, has anyone looked at libraries like Oyster?

https://sourceforge.net/projects/oysterer/?source=navbar

--Tom

Hi Tom,

Duke is a nice record linkage/deduplication engine written in Java on top of Lucene; see https://github.com/larsga/Duke.

best

Erich

This is a very interesting topic! Any changes since 2016? Does anyone have workflows for this?

I've been working with Python recently and using the pandas-dedupe package. I'm also using some other libraries for address cleanup, name normalization, etc.
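To make the idea concrete, here is a rough sketch of the normalize-then-compare approach using only the Python standard library. It is not the pandas-dedupe API and not how Duke works internally; the function names (`normalize`, `find_duplicates`), the sample records, and the 0.85 threshold are all made up for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, fields, threshold=0.85):
    """Return (id_a, id_b, score) for pairs whose mean field similarity
    reaches the threshold. The threshold is an arbitrary illustrative value."""
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2):
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 3)))
    return matches


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "José García", "address": "12 Main St."},
    2: {"name": "Jose Garcia", "address": "12 Main St"},
    3: {"name": "Anna Schmidt", "address": "4 Hauptstrasse"},
}
duplicates = find_duplicates(records, ["name", "address"])
print(duplicates)
```

This compares every pair of records, so it only suits small tables. For larger data you would probably want some form of blocking (for example, only comparing records that share a postal code) before scoring.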

@Erich_Gstrein
Have you gotten Duke working from within KNIME?