tgb417
1
Has anyone done more work with record linkage beyond the data duplication example on the Example Server? Would love to see folks' workflows.
I've also been looking at Java libraries that may help with this problem. In particular, has anyone looked at libraries like Oyster?
https://sourceforge.net/projects/oysterer/?source=navbar
--Tom
Hi Tom,
Duke is a nice record linkage/deduplication engine written in Java on top of Lucene - see https://github.com/larsga/Duke.
best
Erich
This is a very interesting topic! Any changes since 2016? Does anyone have workflows for this?
tgb417
4
I've been working with Python recently and using the pandas-dedupe package, along with some other libraries for address cleanup, name normalization, etc.
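For anyone who wants a feel for the normalize-then-compare idea without installing anything, here is a minimal sketch using only the Python standard library. To be clear, this is not the pandas-dedupe API and not how that package works internally; the function names (`normalize`, `find_duplicates`), the `name`/`address` fields, the sample records, and the 0.85 threshold are all made up for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def find_duplicates(records, threshold=0.85):
    """Return (i, j, score) for record pairs whose normalized keys are similar.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    keys = [normalize(f"{r['name']} {r['address']}") for r in records]
    pairs = []
    # Compare every pair; fine for small data, too slow for large tables.
    for i, j in combinations(range(len(keys)), 2):
        score = SequenceMatcher(None, keys[i], keys[j]).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical sample records, invented for this example.
records = [
    {"name": "Jörg Müller", "address": "12 Main St."},
    {"name": "Jorg Muller", "address": "12 Main St"},
    {"name": "Anna Smith", "address": "99 Oak Avenue"},
]

print(find_duplicates(records))
```

Comparing every pair is quadratic, so on real data you would normally add a blocking step first (for example, only comparing records that share a postal code), which is the kind of thing dedicated tools handle for you.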
tgb417
5
@Erich_Gstrein
Have you gotten Duke working from within KNIME?