How to check if a column in grouped data (multiple rows for a group key) has all values in a list?

vkgautham · October 15, 2023, 1:08pm

I have some data like this:

Group Key|Col1

100|C

101|R

101|C

101|…

102|R

102|S

How to filter all those rows, where there is atleast one record each with “R” and “C” in Col1 for a given group key?

What I have done,

Group Loop Start → Removed all duplicates for Col1 …

From here I am able to get if the count is more than 1 using a GroupBy (meaning two unique values in Col1 for two rows).

How do I compare that those values are “R” and “C” one in each row.

I am thinking of using Java Snippet and iterate through the group, and use temp variable to find if both “R” and “C” are found, and filter them. But then it is like a programmer. Any other way in Knime?

Thanks

iCFO · October 15, 2023, 1:35pm

vkgautham · October 15, 2023, 1:40pm

I am not sure how the Node suggested will help me,

but I want to filter out all those rows (atleast 2 rows) in a group and one row having Col1 as “R” and second row having Col1 as “C”.

iCFO · October 15, 2023, 2:11pm

I am not sure if I fully understand the output, but I tend to lean on the Rank node to do counts per group. Then you could easily filter down to just the groups containing a requisite number of rows per group (or target individual row numbers per group if you use ordinal)

You can also use your Rank output to drive a group by loop for additional options

marzukim · October 16, 2023, 2:33am

hi @vkgautham
is this what you expected …

if not, you can disregard the attachment.
KNIME_2rows_MM.knwf (2.2 KB)

rgds.

vkgautham · October 16, 2023, 8:06am

@marzukim
Unfortunately I could not see any nodes in the workflow, but do let me know what is the approach.

Thanks

marzukim · October 16, 2023, 8:58am

hi @vkgautham
please find the new attachment. i apologize for the inconvenience.
KNIME_2rows_MM.knwf (103.4 KB)

rgds

vkgautham · October 16, 2023, 9:16am

Hi @marzukim Thanks for the workflow, but it is picking up those groups where, either “R” or “C” is present. However, the use case requires to filter out those groupKeys where both “R” and “C” are present in the dataset.

I have achieved the same using Java Snippet this way:

The custom variable in Java Snippet is available at Group scope, I iterate all the rows and set the flags if “R” and “C” are found. Then if both the flags are true, I output some status (ex: “P”). Then filter out all those group keys, based on the status.

Thanks

marzukim · October 16, 2023, 9:56am

if i understand correctly, you want to filter out all the R and C keys. you can do either by using column expression node, the rule-based row splitter node to split the unwanted keys or by reversing the logic condition in the row filter node in my workflow …

rgds
p.s i’m not an expert in java

vkgautham · October 16, 2023, 11:24am

@marzukim
Its not regular filter. What I am trying is, the filtered group should contain both the records: one with “R” and another with “C”. If a group is just having “R”, it should not be selected, and likewise, if a group is having only “C” it should also be not selected.

Thanks

iCFO · October 16, 2023, 11:49am

Are “R” and “C” actual static values, or are you just using those as an example of values that could vary by group?

You could just concatenate all of the unique values into 1 column and filter via a Regex match on both of those values to see which groups contain both. Then you could filter out the groups that don’t meet your criteria.

KNIME_2rows_MM.knwf (54.6 KB)

marzukim · October 16, 2023, 12:43pm

revised approach, filter out only the groups that contain both r and c keys based on my dummy data.
KNIME_2rows_MM-R2.knwf (113.6 KB)
rgds.

system · January 14, 2024, 12:44pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.