Joiner is distorting the Values

Hi I am brand new to Knime so I am doing a simple concatenate and join to create a master table and I cannot get the values to come out correctly. I have read that a join will take each row and match it to EACH row on the other table, so if there are duplicates it will distort the results (output rows is greater than the input rows). I went back into my “key” tables and removed duplicates from my “key” column, but the problem still persists. The only thing I can think of now is that while my “key” columns are unique, the other columns in the table I am trying to join has duplicates. But I feel like that shouldnt matter. Why is this simple task so difficult to execute?

Example of Objective:

Concat
File 1: Jan 2019 Customer Product Purchases (no unique Invoice #)
File 2: Feb 2019 Customer Product Purchases (no unique Invoice #)
File 3: Mar2019 Customer Product Purchases (no unique Invoice #)

Join
File 4: Customer Account Information (using unique Cust ID to join to Purchase Data)
File 5: Vendor Account Information (using unique Cust ID to join to Purchase Data)

Specific Example:
File 1: John Smith ID: 123 and Jane’s Bakery ID: 456
File 2: Jon E Smith ID: 123 and Jane’s Bake Shop ID: 456

File 4: Cust ID: 123 = John E Smith, Address, Cust Type, Create Date, Etc
File 5: Vend ID: 456 = Janes Bakery, Address, Cust Type, Create Date, Etc

Hey,

I not fully understand what your desired output is, so its hard to understand your problem.
If you are using a Joiner Node the rows only get matched if your keys are equal.
If you are looking to match all rows of the respective tables, you need to use the Cross Joiner Node.

Thanks for the reply. I am looking to just do a simple concat of the purchase data in several monthly files (add rows together) and join the Customer Table and Vendor Table to the purchase data table (add columns without adding rows that would otherwise duplicate the values in the purchase data). What is a cross joiner do thats different than the regular joiner? Sorry, I am new to all of this so my jargon isnt really there yet.

A cross joiner performs a cartesian product, this is in my opinion not what you want, in my opinion your workflow should look similar to this one:

The Empty table nodes are only dummies (to be replaced with your corresponding file inputs.
In the joiner you want to add one expression where you choose the column names of the columns that contains the ids which should match (e.g. file 1 last row and file 5 first row).

If you choose the inner join (inside the joiner node configuration at the top) the result will only be the rows which ids are in both table, play around with the join modes to see the differences or do a quick online research, to get the result you want.

You shouldn’t have any problems with duplicates here, you will only get multiple rows for one id if one of the incoming tables has multiple rows with the same id you are joining on

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.