Merging two datasets

Hello everybody, I’m new to ML and I’m trying to solve this issue.
I have two datasets with the same attributes. I need to add two boolean columns (x and y) to both datasets: in the first dataset all the x values must be “true” and in the second dataset they must be “false”; vice versa for the y attribute, whose values must be “false” in the first dataset and “true” in the second. Then I need to join both datasets, and if some rows have the same values in every column except x, y, G1, G2 and G3, those rows have to be put in another dataset, where their x and y values will both be “true” (obviously).
Thank you very much for your attention.

You can use a pair of Constant Value Column nodes (with Value setting type set to “Boolean”). And I think you then need a Concatenate node and not the Joiner node (even though you call it “join”), since it looks like you want to combine the datasets top-to-bottom (similar columns, different rows) and not side-by-side (different columns, similar rows). I don’t understand your last sentence, but perhaps you can solve that with a Row Splitter somehow.
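
For anyone who wants to see the same logic outside KNIME, here is a minimal plain-Python sketch of "add two constant boolean columns, then stack the tables top-to-bottom". It is not KNIME code; the sample rows, the column names `school` and `age`, and the helper `with_flags` are made up for illustration.

```python
# Hypothetical sample data: both tables share the same columns.
first = [
    {"school": "GP", "age": 18, "G1": 5},
    {"school": "MS", "age": 17, "G1": 9},
]
second = [
    {"school": "GP", "age": 16, "G1": 7},
]


def with_flags(rows, x, y):
    """Return copies of the rows with constant boolean columns x and y added."""
    return [{**row, "x": x, "y": y} for row in rows]


# Equivalent of two constant-column steps followed by a top-to-bottom
# concatenation: same columns, more rows.
combined = with_flags(first, True, False) + with_flags(second, False, True)

for row in combined:
    print(row)
```

The original tables are left unchanged because each row is copied before the flags are added.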

Thank you very much for the answer, and forgive me for the delay. With your suggestion I was able to add the constant value columns very easily. As for the second problem: I have two different datasets, and I have to combine (thank you for the hint) these two tables while excluding the rows that are identical, where the comparison is based on certain attributes and not others. I’m sorry for my confusing sentence.

Maybe the Reference Row Filter node can help you here.

I don’t have to filter one dataset by a single attribute of the other one… let me explain better:
There are two datasets with the same n attributes. When they are combined, some rows will have identical values for some attributes, but those rows are not identical if we consider all the attributes. So I need to base the exclusion on certain attributes and not on others. The excluded rows have to be put in a third dataset, so in the end I should have two datasets: the merged dataset, and a second one made of the excluded rows, i.e. the rows that have identical values in certain attributes across the two original datasets.
Am I clear enough?

Thank you very much for your help Aswin

I have created an example that follows your description. It includes a Joiner that joins all identical rows by three reference columns, and then a Reference Row Filter to exclude the non-matching rows from each dataset.

Maybe you can have a look and see if these options can help you achieve your goal. Another possibility would be to concatenate the reference columns in question and use them to join or reference the data sets.
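
The idea of combining the reference columns into one key can be sketched in plain Python as below. This is only an illustration of the logic, not what the attached workflow does internally. Assumptions: every column except x, y, G1, G2 and G3 is a reference column (as in the original question), matched rows from both tables are kept and get x and y set to true, and the sample rows and the helper names `match_key` and `split_matches` are hypothetical.

```python
# Columns ignored when deciding whether two rows "match".
EXCLUDED = {"x", "y", "G1", "G2", "G3"}


def match_key(row):
    """Combine all reference columns of a row into one hashable key."""
    return tuple((col, row[col]) for col in sorted(row) if col not in EXCLUDED)


def split_matches(first, second):
    """Return (merged, matched): rows without and with a partner in the other table."""
    shared = {match_key(r) for r in first} & {match_key(r) for r in second}
    merged, matched = [], []
    for row in first + second:
        if match_key(row) in shared:
            # Assumption: a row found in both tables gets x and y set to true.
            matched.append({**row, "x": True, "y": True})
        else:
            merged.append(row)
    return merged, matched


# Hypothetical sample data with the flag columns already added.
first = [
    {"school": "GP", "age": 18, "G1": 5, "x": True, "y": False},
    {"school": "MS", "age": 17, "G1": 9, "x": True, "y": False},
]
second = [
    {"school": "GP", "age": 18, "G1": 12, "x": False, "y": True},
    {"school": "GP", "age": 16, "G1": 7, "x": False, "y": True},
]

merged, matched = split_matches(first, second)
print("merged:", merged)
print("matched:", matched)
```

Here the two rows with school "GP" and age 18 match even though their G1 values differ, so both land in the matched table; the other two rows stay in the merged table.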

kn_example_row_match_exclude.knwf (55.5 KB)

Do you want to compare attributes within rows or between rows? In case you want to compare values within rows, you can use a Rule-based Row Splitter.
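
As a rough plain-Python analogue of splitting rows by a within-row rule, consider the sketch below. The rule itself (G2 greater than G1) is purely hypothetical and only there to show the shape of the idea; it is not KNIME rule syntax, and `split_by_rule` is a made-up helper name.

```python
def split_by_rule(rows, rule):
    """Return (rows where the rule holds, rows where it does not)."""
    passed, failed = [], []
    for row in rows:
        (passed if rule(row) else failed).append(row)
    return passed, failed


# Hypothetical sample rows.
rows = [
    {"G1": 10, "G2": 12},
    {"G1": 14, "G2": 9},
    {"G1": 8, "G2": 8},
]

# Hypothetical rule comparing two values inside the same row.
improved, not_improved = split_by_rule(rows, lambda r: r["G2"] > r["G1"])
print(improved)
print(not_improved)
```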

