Comparing data from 2 different files

Hi,

I have file 1 with 3 columns and file 2 with 1 column. The task is to check whether the values in column_1 of file 1 exist in file 2.
I have implemented this with the Python node (2=>1), which lets me read the data from both input files and perform the operation.

I would like to know if there is any Java node that will allow me to read 2 separate files as input. Please assist.

Regards,
Merlyne

Hi @merl_knime and welcome to the KNIME forum,

Have you tried the new Table Difference Finder node in KNIME 4.2?

:blush:

Hi @merl_knime,

Not that I know of. You need one reader node for each file, and then either join the tables on the column of interest with the Joiner node, or use the Reference Row Filter or Reference Row Splitter node to see which values are present or missing.
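For anyone who wants to see the idea outside KNIME, here is a rough Python sketch of that split. It only illustrates the logic and is not how the Reference Row Splitter is implemented; the function name `split_by_reference` and the sample values are made up, and `column_1` is taken from the question.

```python
def split_by_reference(rows, reference_values, key="column_1"):
    """Split rows into (matched, unmatched) by whether row[key] is in reference_values."""
    reference = set(reference_values)  # build the lookup once
    matched, unmatched = [], []
    for row in rows:
        if row.get(key) in reference:
            matched.append(row)
        else:
            unmatched.append(row)
    return matched, unmatched


# Hypothetical sample data standing in for file 1 (3 columns) and file 2 (1 column).
file_1 = [
    {"column_1": "A", "column_2": 1, "column_3": "x"},
    {"column_1": "B", "column_2": 2, "column_3": "y"},
    {"column_1": "C", "column_2": 3, "column_3": "z"},
]
file_2 = ["A", "C", "D"]

matched, unmatched = split_by_reference(file_1, file_2)
```

Here `matched` holds the rows whose `column_1` value appears in file 2, and `unmatched` holds the rest, with the other columns left untouched.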

Welcome to Community!

Br,
Ivan

Thanks @armingrudd.
I’m currently using KNIME 4.1.3. I will download the new version and try this node.

Thanks, @ipazin.

I have never tried the Reference Row Filter or the Reference Row Splitter. I’ll try them out.
For now, I convert one file’s column data into flow variables to use as lookup values and compare them against the column from the second file. Processing times are very high because of that implementation. Hopefully, these 2 nodes will reduce them.

Hi @merl_knime,

If I got it right, you are currently using a loop, which is not efficient because each value is one iteration. The nodes I mentioned will definitely reduce the processing time. If you have trouble implementing them, you can share dummy input data and the desired output, and I can create an example workflow :wink:
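To show why one iteration per lookup value gets slow, here is a rough Python sketch (plain Python, not KNIME code; the function names and sample values are made up). The first version rescans the whole table for every lookup value, while the second builds a set once and makes a single pass over the table.

```python
def flags_with_loop(values, lookup_values):
    """Rescan the whole table once per lookup value."""
    flags = [False] * len(values)
    for lookup in lookup_values:            # one "iteration" per lookup value
        for i, value in enumerate(values):  # full pass over the table each time
            if value == lookup:
                flags[i] = True
    return flags


def flags_with_set(values, lookup_values):
    """Build the lookup once, then make a single pass over the table."""
    lookup = set(lookup_values)
    return [value in lookup for value in values]


# Hypothetical sample column with a duplicate and a missing value (None).
values = ["A", "B", "A", None, "C"]
lookup_values = ["A", "C"]

slow = flags_with_loop(values, lookup_values)
fast = flags_with_set(values, lookup_values)
```

Both functions give the same flags, but the loop version does roughly (rows x lookup values) comparisons, whereas the set version does about one membership check per row.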

Br,
Ivan

Hi @ipazin, thank you for your help.

I have attached the workflow with sample data.
File_1 has over 100,000 records and contains null values and duplicates.
File_2:LOOKUP can have more than 3,000 records of unique values.
I do not want to remove the duplicates from File_1 as they are connected to other columns.

I hope this test data helps and look forward to your solution :slight_smile:

Regards,
Merl

TEST_LOOKUP_DATA FROM FILE.knwf (6.5 KB)

Hi @merl_knime,

You can go with the Cell Replacer, Joiner, or Reference Row Splitter node, depending on what kind of output you want. See the attached example, and if you have any questions, feel free to ask.

TEST_LOOKUP_DATA FROM FILE_ipazin.knwf (12.1 KB)

Br,
Ivan

Hi @ipazin,

The Cell Replacer node is what I needed for my task. Thank you so much for your help.

Regards,
Merl
