Joiner on same file

Hello dear community,
I am trying to add the year ago values by using the joiner node (left outer join) and afterwards using concatenate to add the non match values from last year so I wont miss data from it.
However, as you can see in the image, I have a duplication problem.

Do you have any solutions to this ?

is ?

Hi there,

not really sure what are you doing. To avoid guessing can you provide some input file and expected output with logic behind it?

Br,
Ivan

1 Like

Hello Ivan,
I have a database including different years and months of transaction. I am trying to add to this file the data from the same month and year before.
Example:

My database :
Sender | Receiver | Year | Month | Quantity | Product |
X | Y | 2019 | March | 23 | Soap
X | Y | 2018 | March | 10 | Soap
Z | A | 2018 | March | 12 | Shampoo

  1. I use joiner (left outer joiner) to add the year ago quantities:
    Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
    X | Y | 2019 | March | 23 | Soap | 10
    X | Y | 2018 | March | 10 | Soap | ?
    Z | A | 2018 | March | 12 | Shampoo | ?

  2. I split the rows by using Quantity Year Ago : missing value :
    Split 1 :
    Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
    X | Y | 2019 | March | 23 | Soap | 10

Split 2:
Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
X | Y | 2018 | March | 10 | Soap | ?
Z | A | 2018 | March | 12 | Shampoo | ?

3.I filter the quantity quantity year ago column and rename the quantity colum as “Quantity Year Ago”
Sender | Receiver | Year | Month | Quantity Year Ago | Product
X | Y | 2018 | March | 10 | Soap
Z | A | 2018 | March | 12 | Shampoo

  1. I concatenate the output from (3) to Split 1:
    Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
    X | Y | 2019 | March | 23 | Soap | 10
    X | Y | 2018 | March | ? | Soap | 10
    Z | A | 2018 | March | ? | Shampoo | 12

  2. I use missing value node to replace the missing Quantity by 0
    Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
    X | Y | 2019 | March | 23 | Soap | 10
    X | Y | 2018 | March | 0 | Soap | 10
    Z | A | 2018 | March | 0 | Shampoo | 12

However, when I do this process I will have the secong line duplicating as I have already matched it with my first line.

My optimal output should be:
Sender | Receiver | Year | Month | Quantity | Product | Quantity Year Ago
X | Y | 2019 | March | 23 | Soap | 10
Z | A | 2018 | March | 0 | Shampoo | 12

Do you have any solutions to this ?
Thank you

Hi,

now you should be able to use Duplicate Row Filter node to get your output. I guess Sender, Receiver, Month and Product columns should be included :wink:

Tell me if this works for you. If not maybe some optimization in steps before can do the trick.

Br,
Ivan

2 Likes

Hello again,
Thank you for your proposition.
However, I cant upgrade my Knime 4.0 nor I cant find this node in my repository.
Do you have a solution in Java Snippet Row Filter or any other way ?

Hi there,

which version are you using then? Just out of the curiosity why can’t you upgrade KNIME? Version 4 is really nice…

Anyways I will take look at your flow and try to create an example workflow.

Br,
Ivan

1 Like

I am using Knime 3.7.2 and I cant update it yet as I am using SAP connector nodes. I

Thanks for your time :slight_smile:
BR,
Emine

Hi,
to get a clear picture of your data i have first two questions.

  1. Are Sender and Receiver for a Product and Month in 2018 always available or does the 2019 have new combinations?
  2. Does the file only contain data of the years 2018/2019?
1 Like

Hello,
to answer your
1.question : No we can have different combinations. That’s why I tried to create templates with all the possible combinations, however my data is big and there are 2M different combinations possible so it slowed down my workflow somehow.
2. question : It is a workflow that includes for now just 2018 and 2019 data.

Hi,
did you thougt about the use of the pivot node?

  • Group criteria: Sender, Receiver, Month, Product
  • Pivot Column: Year
  • Aggregation Column: Quantity
  • Aggregation mode: Sum and Count

The count give you the information of multiple entries for the same record.

2 Likes

Thank you or your answer Morpheus.
However, another obstacle is the fact that I have different years for the duplicating values as I join them I see the quantity of year 2018 on the 2019 line. So, unfortunately this wont bring me the solution :frowning:

BR,
Emine

Hello again, I found the solution.
A. First step is to split the data by year.
B. Afterwards by using 2 joiners (left outer join):

  • First joiner : Upper input for year 2019, lower input for 2018. Matching Sender Receiver and Product
  • Second joiner : Upper input year 2018, lower input for 2019. Matching Sender, Receiver and Product

C. Replacing the first joiner missing values with 0
D. Second joiner output should be filtered just non none matching rows with ? and then by using row filter, column rename (Changing the quantity column to quantity year ago)

E. Concatenate the result of D and C and you will have the data without duplicates.

BR,
Emine

1 Like

Hi @celik_e_4,

glad it got it working!

Regarding not updating there are SAP connector nodes available in 3.7.2 and not in 4.0.0.? :confused:

Br,
Ivan

1 Like

Hi Emine,
maybe my statement was not clear enough.
Using the pivot node you don’t need any joiner. Connect it directly to your source table.
After execution of the pivot node you should receive columns for each year. The uniques is guaranted by your key columns (grouping).

Refering to your last post the information should be not unique because your joins ignores the months.
The results should only be correct if you want the yearly information for receiver, sender and product. To get this data you have to aggregate your source table.

Br,
Hermann

3 Likes

Hello,
I don’t know a lot about the 4.0 version. However, IT team warned me not to do it yet because of this reason.

BR,
Emine

Hello Morpheus,
Sorry I forgot but I am joining by month as well the I checked ew points it seems correct.

thank you for your explanation. It works perfectly !!

BR,
Emine

1 Like

Hi there @celik_e_4,

just one more question about these SAP nodes. Do you maybe know if they are from EPAM?

Br,
Ivan

Hello Ivan,

I am sorry but I have no clue. It’s just because I had this warning from IT team that I should update it to 4.0 version.

BR,
Emine

Hi Emine,

no problem. Tnx anyways!

Br,
Ivan

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.