How to eliminate rows if they have the same values in some columns

Hi,
I have a table in which I need to eliminate all rows with the same: DAY, HOUR, MINUTE, SECONDS:

image

So, In this table I should eliminate all yellow rows.
How can I do?

Many thanks
Giad

You could apply a Duplicate Row Filter node, with DAY HOUR MINUTE and SECOND in the “Include” selection, and in the advanced tab tell it to keep duplicate rows with “Add column showing duplicates” checked.

Then apply a Row Filter node selecting “Include rows by attribute value”, “duplicate-type-classifier” as column to test and under matching criteria choose “use pattern matching” select “unique” in the dropdown.

2021-03-26 10_02_44-KNIME Analytics Platform
(I think!)

3 Likes

Hi thanks!
But I think there is something wrong:
If I include only UNIQUE, I lose other CHOSEN that are not duplicate:

Capture

I see “chosen” that are unique, but why I see CHOSEN? I should see UNIQUE for all white rows
Thanks Giad

Hi @giad ,
unique = there is no duplicate for this row
chosen = there are duplicates for this row, but this row is chosen among the duplicates
duplicate = there is duplicates for this row, and it is marked as duplicate of a chosen row

ok thanks, but I need to eliminate both rows with the same values, not only “duplicate”.
So, I need to eliminate all black rows in my image, not only the second row which is the duplicate.

Giad

Yes, sorry, I did not read properly what you needed, and once I did, I corrected my post, but you’re too fast and replied already :slight_smile:

So, in your case, you want to keep the unique only. You should not see chosen for those that do not have duplicates

Sorry, I don’t understand…
for example: in my image above, Why the row with values: 20-16-1-51 is CHOSEN?
It should be UNIQUE, It is not duplicated. So, I see UNIQUE, I can keep only UNIQUE but it is not like that…

Thansk
Giad

Can you share your workflow with the sample data? I want to check why you are seeing chosen for those who are already unique.

I also have an alternative way to achieve what you need, but I need some data to work with

example data set.xlsx (9.9 KB)

here the data set.
Thanks Giad

hi @giad , I processed your sample data, and I don’t see any issue on my part, see below:

Left grid is the duplicate row filter result, and right grid is the groupby result.

I don’t know why on your side you see chosen for the 20-16-1-51. As you can see on my side, it comes out as unique.

Here is the result after filtering on unique:

Is that the expected result?

Here’s the workflow: How to eliminate rows if they have the same values in some columns.knwf (12.3 KB)

1 Like

It is correct what you did, but I replicate each step as your workflow but I continue to see:
Capture

Why? I cannot understand

Can you share the workflow you are using so I can see what you are doing?

the configuration of the duplicate row filter does not seem to be the same as @bruno29a .
You could check that

Hi @giad

I cannot understand why in your example some of those unique rows appear to be marked as “chosen”, and I have been unable to replicate your result by simply messing with configuration options.

This leads me to believe it is one of a few things: either there is something else in your workflow that we don’t know about which is filtering out what I see as “missing duplicate” rows that should be there with the “chosen” rows, or something is wrong with your knime installation, or you have stumbled on a bug.

Which version of knime are you using? As @bruno29a has already said, it would be useful to see you example end-to-end workflow including your initial data file (I think you have only posted the resultant output file, which means we cannot easily backtrack to see what has happened).

If you have somehow found a bug or really cannot get this mechanism to work, and time is of the essence that I can suggest trying a slightly different approach using an idea from @bruno29a 's workflow - where he tests with a groupby, you could take that idea forward and do the following which should also yield the desired result:

image

Here you use GroupBy on DAY HOUR MIN SEC and have it record the count as a new column. You then filter out all rows with a count > 1. Finally you join back on the original dataset using DAY HOUR MIN SEC. The resultant set will be only those with a DAY HOUR MIN SEC combination that appeared only once.

1 Like

The solution that @takbb gave was going to be my alternative solution, but I wanted to see @giad 's workflow first to understand why the duplicate row node was not giving the expected results - I would suspect that some manipulation is happening somewhere, and thus the Groupby could also yield the same unwanted result if the same manipulation is happening.

2 Likes

I had a feeling you may have been thinking along those lines @bruno29a, and credit to you as I was simply following your lead. I too would like to see the failing workflow so we aren’t left hanging wondering if there is a bug or a better explanation for this behaviour. And it would help others in future, especially if it turns out that there is some “trap” to be avoided.

2 Likes

Ok great I solved it!
Many thanks to everybody

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.