How to keep the first row among repeated rows in a table - data manipulation

busraa · July 15, 2019, 2:31pm

Hi everyone,
I am fairly new to KNIME, and I am trying to do some data manipulation for my thesis.
I have the input table below. I need to keep only the first row for each unique article_ID. For example, for article 1922, I only need the first row, with author_ID=1105826 and score=0.50.

I tried it with the GroupBy node but it didn’t help. I would appreciate your suggestions.

INPUT
article_ID author_ID score
1350 891898 0.06
1922 1105826 0.50
1922 1036060 0.10
2661 243949 0.02
2985 857965 0.02
5182 306771 0.50
5182 255639 0.50
5182 335639 0.06

OUTPUT
article_ID author_ID score
1350 891898 0.06
1922 1105826 0.50
2661 243949 0.02
2985 857965 0.02
5182 306771 0.50

Thanks,
Busra

Corey · July 15, 2019, 3:13pm

Hi busraa, welcome to KNIME and the forum!

The GroupBy node should be the right way to go, though.
When you configure that node, set article_ID as the group column, and in the Manual Aggregation tab set the aggregation method for all of the other columns (author_ID and score) to First.
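
For anyone who wants to see the logic spelled out, here is a minimal sketch in plain Python of what grouping by article_ID and aggregating with First does. It is not KNIME code; the function name `first_row_per_group` is made up for illustration, and the rows are copied from the INPUT table in the question.

```python
# Rows from the INPUT table in the question: (article_ID, author_ID, score)
rows = [
    (1350, 891898, 0.06),
    (1922, 1105826, 0.50),
    (1922, 1036060, 0.10),
    (2661, 243949, 0.02),
    (2985, 857965, 0.02),
    (5182, 306771, 0.50),
    (5182, 255639, 0.50),
    (5182, 335639, 0.06),
]


def first_row_per_group(rows, key_index=0):
    """Group rows by the key column, then keep the first row of each group.

    Hypothetical helper for illustration only.
    """
    groups = {}
    for row in rows:
        # Collect every row under its article_ID, in input order.
        groups.setdefault(row[key_index], []).append(row)
    # dicts keep insertion order (Python 3.7+), so groups appear
    # in the order their key was first seen.
    return [members[0] for members in groups.values()]


result = first_row_per_group(rows)
for article_id, author_id, score in result:
    print(article_id, author_id, f"{score:.2f}")
```

With the sample rows this prints the five rows of the OUTPUT table in the question.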

If this isn’t working for you maybe you can summarize what’s going wrong.

Configuration for the GroupBy node to try: group column article_ID, with author_ID and score aggregated using First.

busraa · July 15, 2019, 6:36pm

Hi Corey, I tried and it really helped. Thank you so much !

Corey · July 15, 2019, 8:45pm

Great news!
Feel free to ask more questions if you bump into issues, and best of luck with your project.

ana_ved · July 16, 2019, 9:23am

Hi everyone!

Corey’s answer is great!! I just wanted to highlight also that if you are using KNIME 4.0, you can also play around with the Duplicate Row Filter. It does exactly that: find duplicates and keep the rows you would like to based on a certain rule - for instance, keeping the first row.
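
The keep-the-first-row rule can be sketched in plain Python as well. This is only an illustration of the idea, not how the KNIME node is implemented; the function name `drop_duplicates_keep_first` is made up, and the rows are copied from the INPUT table in the question.

```python
# Rows from the INPUT table in the question: (article_ID, author_ID, score)
rows = [
    (1350, 891898, 0.06),
    (1922, 1105826, 0.50),
    (1922, 1036060, 0.10),
    (2661, 243949, 0.02),
    (2985, 857965, 0.02),
    (5182, 306771, 0.50),
    (5182, 255639, 0.50),
    (5182, 335639, 0.06),
]


def drop_duplicates_keep_first(rows, key_index=0):
    """Keep a row only if its key has not been seen before.

    Hypothetical helper for illustration only.
    """
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            # First occurrence of this article_ID: keep the row.
            seen.add(key)
            kept.append(row)
        # Later rows with the same article_ID are skipped.
    return kept


unique_rows = drop_duplicates_keep_first(rows)
for article_id, author_id, score in unique_rows:
    print(article_id, author_id, f"{score:.2f}")
```

Unlike a group-and-aggregate approach, this makes a single pass and never stores the duplicate rows, only the keys already seen.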

busraa · July 17, 2019, 6:17pm

Thanks!
