Joiner shows column results multiple times

Hello everyone, I am just getting into KNIME and have run into a Joiner issue that I can’t seem to solve with Google. The results duplicate the last column, Total Receipts Dollars, even though the rows are unique by SKU in my original data. For the expected results screenshot I used the GroupBy node, but when I join it with the desired table it duplicates again. I am not sure if that makes sense, but thanks for your time. I have attached screenshots of the results received, the workflow, the Joiner setup, and the expected results.


Expected Results

Hi @gittymoe ,
Welcome to the KNIME forum.

Duplication of rows from a Joiner always means that the join condition allows more than one row in one table to match a single row in the other table. This can be caused by (unexpected?) duplicate rows in the source data, or by a missing join condition.
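KNIME's Joiner behaves like any relational join, so the effect is easy to reproduce outside KNIME. Here is a minimal sketch in pandas with made-up data (the column names `SKU` and `Mid Code` are just placeholders, not your actual tables): when the right table holds two rows for the same SKU, every matching left row is repeated.

```python
import pandas as pd

# Hypothetical data: the right table has two rows for SKU "A",
# so the join repeats the matching left row.
left = pd.DataFrame({"SKU": ["A", "B"],
                     "Total Receipts Dollars": [100, 200]})
right = pd.DataFrame({"SKU": ["A", "A", "B"],
                      "Mid Code": ["X", "Y", "Z"]})

joined = left.merge(right, on="SKU")
print(len(joined))  # 3 rows, not 2: SKU "A" now appears twice
```

This is exactly the pattern you are seeing: the join multiplies rows, it never invents them, so one of the inputs must contain the duplicate keys.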

I assume that the lower output table is the “matching” rows from the second joiner.

Without seeing the input data to the joiners, and the join conditions (joiner settings tab on the joiners) it is difficult to state exactly what your problem is, but the generated RowIDs give a strong clue about what is happening.

Take a look at the first 5 rows of the output:

If I have got this round the right way in my head, and assuming the default settings on the Joiner node…

RowIDs “Row0” through “Row4”, in red, come from Excel Reader “Node 1”.

RowID “Row6” in green comes from Excel Reader “First Sale Mid Codes”.

RowID “Row448303” in blue comes from Excel Reader “PO Data”.

So this implies that the duplication is occurring on the first Joiner, where there are multiple rows in “Node 1” that match “First Sale Mid Codes”. I would say that either there is duplication in the Node 1 Excel sheet, or you are missing a join condition on the first Joiner node.
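To illustrate the "missing join condition" case, here is a hedged sketch in pandas (the `Store` column is a made-up example of a second key; your data may have a different one): if the rows are actually unique by a *combination* of columns, joining on SKU alone multiplies them, while joining on the full key does not.

```python
import pandas as pd

# Hypothetical: each table is unique by (SKU, Store), not by SKU alone.
left = pd.DataFrame({"SKU": ["A", "A"], "Store": [1, 2],
                     "Dollars": [100, 150]})
right = pd.DataFrame({"SKU": ["A", "A"], "Store": [1, 2],
                      "Mid Code": ["X", "Y"]})

print(len(left.merge(right, on="SKU")))             # 4 rows: duplicated
print(len(left.merge(right, on=["SKU", "Store"])))  # 2 rows: as expected
```

In KNIME terms, the second merge corresponds to adding a second matching-columns pair in the Joiner's settings tab.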

If you still cannot find the cause, then to narrow down the problem you may find it useful to place some Row Filter nodes before the joiners and limit the rows to just those for a single SKU. That way you can investigate a manageable number of rows, and the problem may show itself.