Help w/ Joiner

Hi there,

I am trying to perform essentailly what is like a Excel look up function in KNIME. But the output has duplicate values.

I have some values for eg.

Sales
Month Receipt Date Sales Quantity SKU
12 123 21.03.2022 90000 73589 63333
12 123 22.03.2022 95000 88118 63333
12 123 23.03.2022 100000 88720 64579
12 246 24.03.2022 105000 75580 65825
12 369 25.03.2022 110000 87493 67071
12 369 25.03.2022 110000 87219 67071
12 615 27.03.2022 120000 89573 69563
COGS
Month Receipt Date Quantity SKU COGS
12 123 21.03.2022 73589 38500,3
12 123 22.03.2022 88118 63333 46101,58
12 123 23.03.2022 88720 64579 46416,53
12 246 24.03.2022 75580 65825 39541,95
12 369 25.03.2022 87493 45774,59
12 369 25.03.2022 87219 45774,59
12 615 27.03.2022 89573 46862,81

I would like to join these two tables together. What is important to have is everything. when i use Joiner it duplicates the values.

I noticed the duplicate values are the ones with the samr month, recipt, date, sales, Quantity, SKU & COGS info. I would like to have these info. but not duplicated. For e.g. it can be that on the same day we sell the same product with the same quantity and the same costs. That is correct. But somehow KNIME Duplicates these values even further.

Thanks in advance!

Hello @maya29
When configuring the joiner node you need to address an unique criteria that makes different every single row. In your use case it is a combination of 2-4 columns (Receipt (needed), AND Date (needed), AND/OR Month, AND/OR Sales).

Therefore you have to use at least two columns in your ‘Joining Coumns criteria’ and being selective in the columns and source of them to keep in the output:

BR

3 Likes

Thank you so much for getting back to me. I have spotted the same issue I am having in your output, unfortunately. The sales (110000) is showing up 4 times, and its corresponding COGS, whereas in my input file it is there only twice. :confused: Is there a way around this?

Hello
This is not totally right:

as $Sales$ is not a column in 2nd table, quantity is key.

3-4 columns (Quantity (need), Receipt (needed), AND Date (needed), AND/OR Month).
in fact quantity can work as key, but may not scale to real data.

I would use the 4 columns in your use case:

BR

Thanks. Last questions. how can i get the COGS alos in the output that don´t have a SKU? I included one in the inout file but in the output it is assigned a SKU.

Hello @maya29
Within the Column Selection tab, you can select manually those columns you want to include or exclude from Top (left) or Bottom (right) tables.

The ones already included in the Join Columns we selected the option to be merged, so we don’t care about the 4 columns.

In the following configuration I am including the SKU column from Table1. Aiming to remove it from the output, you will have to move the column into the Exclude box (blue arrow) in both tables.

BR

In the end this is what i am looking for. There are instances where there are COGS but no SKU and other instances where there are Sales and no COGS and I would like that to be in the output. Essentially I would like all info from Sales Input and only the COGS matching with its corresponding sales and also COGS without sales.

Expected OUTPUT
Month Receipt Date Sales Quantity SKU COGS
12 123 21.03.2022 90000 73589 63333 38500,3
12 123 22.03.2022 95000 88118 63333 46101,58
12 123 23.03.2022 100000 88720 64579 46416,53
12 246 24.03.2022 105000 75580 65825 39541,95
12 369 25.03.2022 110000 87493 67071 45774,59
12 369 25.03.2022 110000 87219 67071 45774,59
12 615 27.03.2022 120000 89573 69563
12 615 27.03.2022 46862,81

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.