Advance Row Filter with unique concatenation help required

beloshi · May 30, 2018, 3:15am

Dear Fellow KNIME Community Members,

Thanks for generating my interest in such a wonderful tool I am searching since last 2 hours to see if there is a solution of my problem went through many posts in order to see if the solution I require exists or not. My case is somewhat different, below is an explanation of my case and the required output.

I am working with 10 to 12 different data files somewhat the data look like in the example below;

|123 |ssmm | | |
|456 |werh |wer |lsfj |
|789 |shytj |sdfe | |
|222 |lou | |df |
|333 |dty |fed | |
|444 |ivh |dfef | |
|555 |kfl |sdfe | |
|666 |dfhu | |dfe |
|777 |dfgfd | |dfe3434 |
|142 |hyi |f333f | |
|123 | |sfere | |
|456 |sfesfe | | |
|123 | | |erybr |
|1234 | | | |

There are four columns above I am removing duplicates by the unique ID as my primary key its not an issue but these Ids are repeating, I can remove the Ids which has no data by Missing Values in all the 3 columns.

Scenario 1: What if I have String Data in 1 columns and remaining two are BLANK as I have mentioned above in my dataset. I want the result like this ;

Scenario 2: Is there an operator where we can define a RULE to check the Specified columns for the data and if there is that It should consider the WHOLE ROW in the OUTPUT with Blank columns remaining blank and data in the column as it is.

So can I achieve it by using any RULE where it should say if the row has all 3 columns blank remove that whole row and if it has value in one row keep it and same and concatenate if the same id is repeated with the unique text. Minimalistic solution will be highly appreciated.

Your help in this regard will be highly appreciated.

Please find the example in the excel sheet to better understand it as well. Excel SHeet Attached

example data set and output.xlsx (9.8 KB)

Kind Regards

Bilal

izaychik63 · May 30, 2018, 3:55pm

It is not clear what exactly needs to be done. For beginning look at Excel reader and Rule-based Row Filter.

beloshi · May 30, 2018, 6:13pm

Hello izaychik63,

Here is the dataSet for example I am using

I am using the Unique ID as the primary key to remove the duplication and unique concatenation to get the unique values by Groupby Node and remove the Unique ID row if it doesn’t have data or empty.

I want the result something

So my question is is there a simple way to use the RULE ENGINE or some rule node to achieve this result where;

1- Unique ID 123 is Unique concatenated.
2- Unique ID 1234 is removed as it has no data .
3- Rest of the Unique IDs remain same as they have data in all three columns or one or two column.
PS: I want the Blank columns as I have tried in KNIME , KNIME interprets blanks as “?” question Mark.

I hope I am clear with my example this time.

Best Regards

Bilal

Martin_K · May 31, 2018, 6:07am

Hi,

Here is a workflow which should help, however not very simple - too many nodes

Martin K.

Advanced_Row_Filter.knwf (19.6 KB)

beloshi · May 31, 2018, 6:51am

HI Martin,

Thank you very much for the solution, this is exactly what I am looking for but yeah there are several nodes so it will help me understand it more. Thanks for the efforts.

Cheers

Bilal

beginner · May 31, 2018, 7:25am

Maybe I’m missing something but the group by node should be able to do that.

Group by the Unique ID and set all the other columns in the aggregation tab to Concatenate (or list) and uncheck the “missing” tick box.

basically the second node in the workflow above. I don’t get what all the other stuff should achieve as the output after the group by is identical as in the end.

GroupBy%20only

Martin_K · May 31, 2018, 10:08am

Hi beginner,

Thanks for your note
I forgot on “Concatenate” option, that’s the right choice. It outputs column of String type (as was required),
so it really simplifies the workflow.
Best regards !

Martin K.