I frequently compare tables in KNIME that come from different sources, and I want to map elements between them. The tables usually have “time” (in seconds) and “mass” columns, and I want to find elements that match across the tables within a given tolerance for each column. My usual method has been:
“Cross Joiner” -> “Math Formula” -> “Rule-based Row Filter” -> “Math Formula” -> “Rule-based Row Filter”
Here the first Math Formula node calculates the difference in time, and the first Rule-based Row Filter then keeps only the rows where that difference is less than or equal to a given tolerance. The second Math Formula and Rule-based Row Filter do the same for mass.
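To make the steps concrete, here is a rough pandas sketch of the same logic outside KNIME. The column names `time` and `mass`, the `_a`/`_b` suffixes, the function name `cross_join_match`, and the use of absolute differences are assumptions for illustration only; it is not what the KNIME nodes literally execute.

```python
import pandas as pd


def cross_join_match(left, right, time_tol, mass_tol):
    """Pair every row of `left` with every row of `right`, then filter.

    Assumes both tables have numeric `time` and `mass` columns
    (hypothetical names) and that "difference" means absolute difference.
    """
    # "Cross Joiner": every combination of rows (needs pandas >= 1.2)
    pairs = left.merge(right, how="cross", suffixes=("_a", "_b"))

    # First "Math Formula" + "Rule-based Row Filter": time difference
    pairs = pairs.assign(dt=(pairs["time_a"] - pairs["time_b"]).abs())
    pairs = pairs[pairs["dt"] <= time_tol]

    # Second "Math Formula" + "Rule-based Row Filter": mass difference
    pairs = pairs.assign(dm=(pairs["mass_a"] - pairs["mass_b"]).abs())
    pairs = pairs[pairs["dm"] <= mass_tol]

    return pairs.reset_index(drop=True)
```

The intermediate `pairs` table has len(left) * len(right) rows, which is where the cost comes from.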
This usually works quite well, but with large input tables the output of the Cross Joiner can be very large (hundreds of millions of rows), which also makes the first Math Formula and Rule-based Row Filter nodes slow.
Is there a quicker way to do this? I’ve considered splitting the input tables into smaller sections so the work could be parallelized, but that is difficult because I don’t know how large each input table will be until the data sets are loaded.
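For reference, the splitting idea might look roughly like the pandas sketch below, using a fixed chunk size so that the number of chunks follows from the table length at run time. The function name `chunked_match`, the `chunk_size` parameter, and the `time`/`mass` column names are hypothetical, and the sketch assumes absolute differences and a non-empty first table. It only bounds the size of each intermediate cross join; the chunks are independent, so they could in principle be processed in parallel, which is not shown here.

```python
import pandas as pd


def chunked_match(left, right, time_tol, mass_tol, chunk_size=1000):
    """Cross join `left` against `right` one chunk of `left` at a time.

    Assumes numeric `time` and `mass` columns (hypothetical names) in
    both tables and that `left` has at least one row.
    """
    matches = []
    # The number of chunks is derived from len(left) at run time,
    # so the table size does not need to be known in advance.
    for start in range(0, len(left), chunk_size):
        chunk = left.iloc[start:start + chunk_size]
        # At most chunk_size * len(right) rows exist at once
        pairs = chunk.merge(right, how="cross", suffixes=("_a", "_b"))
        keep = (
            ((pairs["time_a"] - pairs["time_b"]).abs() <= time_tol)
            & ((pairs["mass_a"] - pairs["mass_b"]).abs() <= mass_tol)
        )
        matches.append(pairs[keep])
    # Combine the surviving rows from every chunk into one table
    return pd.concat(matches, ignore_index=True)
```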