Joiner Node Issue

Hi all, I have a trouble with Joiner Node, after use it I have to clean rows with “duplicate node” because it make a lot of duplicated data.

In a TYPE TABLE I have several rows that sometimes could be duplicate, but i cannot fix this
In a TRANSACTION TABLE I have a list of transaction

If I join that tables i found this error:
Transaction 01 need to be just 1, but in a output table i have 2 rows because in TYPE TABLE there are 2 rows with the same code A

Wrong output:

  1. 01 A
  2. 01 A

Desidered output:

  1. 01 A

Can you help me guys:
I’d like to set in Joiner Node this rules: match just 1 time.

You cannot fix it or you don’t want to fix it because of your use case/business rules? If you GroupBy the type table and use for example Concatenate, First or Last the description that would create one entry and the joiner will generate the desired output.

If you just want to check if the code is available then there is little point in maintaining the duplicate entries in your type table.

3 Likes

these are business rules and I cannot change: I link to a company folder/files and in that tables rows are duplicate but i cannot modify that table

Then the duplicate row filter is the way to go. The Joiner node is not capable of arranging this.

1 Like

YES, but this take sometimes 20 minuted before to “clean the duplicate”.

Probably i set in a wrong way the node.

How can I order to node:
“do the control/check only in the column CODE but give me as output also DESC 1 and DESC 2”?

I think my node check all the columns instead only the column 1_OMM_codice omologazione

I would:

  • Import the type table.

  • GroupBy the code and make a List of the description column.

  • Use a Split Collection Column node to split the descriptions.

  • Rename the description with a Column Rename Regex node. Look for (Split Value ) , to be replaced by description_.

  • Do a Left outer Join on the code columns.

Input:

image

Output:

image

Thansk! I think this should be the right way: GroupBy Node.

By the way I have to study more the Column Rename Regex node because I don’t understand how it can help me.

It’s to prevent hardcoding the renaming of columns. Let’s assume you now have two descriptions per code. Renaming them in a Column Rename works fine until a third description is introduced on the source side. That means the Column Rename has to be amended to also include the additional entry.

With the Regex equivelant of the node, this is done automatically because it looks for Split Value and replaces that with description_. As such, the number of split values can be unlimited.

I think the issue is how i save all data.

Everyday a receive a xls with codes: some of them are new, other are know.

I think i should create a “smart csv” (code archive) that allaw to me to create a file without duplicats codes: in this way the next join wont be affect from duplicate rows

@gcas

sorry read only briefly through your problem statement, but can you not simply use the ‘Pivoting’ node to create columns out of the duplicates ?
– it will be less error prone than using regular expressions, as Pivoting node is data driven.
– although, you will have to create an additional column prior to the Pivoting node, to create your column labels to route the values to the correct target columns

I use this approach all the time to create columns out of ‘recurring’ values and to bring the table back to the desired granularity level.

cheers,

Herman