Joining two data sets but taking into account some value deviations...

Hi guys,

I'm currently trying to evaluate some of my data. I found KNIME and it looks like this is the best way to reach my goal :-) After trying some things I'm able to get a suitable result, however, this is not as perfect as I thought. I know nothing about coding at all, so that's the reason why I decided to ask you.

Problem:  

I need something like the Joiner node but with a little extra. My values may deviate a litte as they are derived from measurements. Anyway, I need to compare them :-/

Columns RT1/mz1 and Columns RT2/mz2 should be joined if the corresponding pairs are similar. Similar means the values of RT2 have to be in a given range, e.g. RT1 +/- 0.2 min.

For mz1 and mz2 it is no problem to round the values to one digit as it works out perfectly. But RT (time range) should allow more deviation.

RT1 [min] mz1 RT2 [min] mz2
2.34 342.123 2.34 342.100
3.41 562.362 3.38 562.378
       

 

Is there anything "easy" that can solve this problem? Or do you think it's easily managed by some java?

Thank you guys! Thomas

Hi Thomas,

if your input table(s) are not too large, you should be able to achieve this using existing nodes. I would create the cross-product of your tables by adding an artificial column that contains only a constant value (e.g. "1") to both input tables. Then, you perform the join using the artificial column as join condition which will product an output table that contains all possible combinations of rows from table 1 with all rows from table 2. (Of course, if the input tables are large, this table can become huge.)

Then, you can use a Java Snippet Row Filter with a snippet like

if(Math.abs($RT1$ - $RT2$) < 0.2)

   return true;

else

  return false;

to identify the relevant rows.

thank you for the reply. I'm going to test this today.

Table size is up to 1000 rows. Wonder if this will work :-)

Update, Problem solved:

Hi guys,

so I finally managed to get the results I want. Actually the best way to solve this problem is using MySQL. I wrote a KNIME workflow to import certain tables into a SQL-DB.

After this I used an straight forward SQL query to compare RT and mz columns between two tables, all done in KNIME. The parameter is under control by a row-to-variable loop in order to run through a list of csv files that are compared to the same reference table (here: SPLref006)

SELECT
    SPLref006.RT_SPL,
    SPLref006.MZ_SPL,
    $$ { SfileID } $$.RT_exp,
    $$ { SfileID } $$.MZ_exp,
    $$ { SfileID } $$.Intensity
FROM
    SPLref006,
    $$ { SfileID } $$
WHERE
    (
        $$ { SfileID } $$.RT_exp BETWEEN (SPLref006.RT_SPL - 0.15)
        AND (SPLref006.RT_SPL + 0.15)
    )
AND (
    $$ { SfileID } $$.MZ_exp BETWEEN (SPLref006.MZ_SPL - 0.01)
    AND (SPLref006.MZ_SPL + 0.01)
)
ORDER BY
    $$ { SfileID } $$.RT_exp ASC

 

Combining a MySQL database with KNIME is really nice :-) 

Cheers, Thomas