Fuzzy joiner


Beginner need some help.

I have two dataset from two different systems.
Systems are not fully syncronized and records have small differences and I need way to join to these records.

Differences between start time, end time and duration.
Mostly differences are in 10 second tolerance.

How I can combine/join records together?

Best regards

Hello @KristianW1977 and welcome to the KNIME community

You did mention about the 10 sec tolerance but you didn’t mention the the average duration of these periods. Or said it in question format: what percentage represents ± 10 sec over these periods?

I can assume as well that que are talking about an isolated tracking time system and not a multi parallel one. So the scenario is simple or multi systems are identified and can be looped through.

You can perform a Cross Joiner comparing pairs of possible vectors; and then calculate the overlap time length (or the average percentage of overlapped ) and the average percentage of unoverlapped tails. Many of these pairs won’t overlap, and assuming overlapping rates above (0.9?) will give you pairing match criteria.

I answered a topic some time ago related to systems overlapped time, it did identify starting-end timestamps of overlapping periods ; you may get some ideas for your workflow:

Some more literature about :

Let us know about your thoughts.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.