@Cairo first you might want to investigate why you have so many (unwanted?) duplicate rows. More often than not the question is not one of technology but of concept:
Dealing with duplicates is a constant theme for data scientists, and a lot of things can go wrong. The easiest way to deal with them is SQL’s GROUP BY or DISTINCT: just get rid of them and be done. But as this example might demonstrate, that is not always the best option. Even if your data provider swears your combined IDs are unique, there might still be some muddy duplicates lurking, especially in Big Data scenarios, and you should still be able to deal with them.
https://hub.knime.com/-/spaces/-/latest/~kyA_KJ2QUUgI7g61/
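To make that concrete, here is a minimal SQL sketch (the table "orders" and its columns are made up for illustration): first check whether your supposedly unique key really is unique, then decide how to collapse the duplicates instead of dropping rows blindly.

```sql
-- Hypothetical table "orders" with a supposedly unique combined key (customer_id, order_id)

-- 1) Check: which key combinations occur more than once?
SELECT customer_id, order_id, COUNT(*) AS n_rows
FROM   orders
GROUP  BY customer_id, order_id
HAVING COUNT(*) > 1;

-- 2a) Quick fix: keep one row per fully identical record
SELECT DISTINCT *
FROM   orders;

-- 2b) Controlled deduplication: decide per column how to aggregate
--     (e.g. latest date, summed amount) instead of silently losing information
SELECT customer_id,
       order_id,
       MAX(order_date) AS order_date,
       SUM(amount)     AS amount
FROM   orders
GROUP  BY customer_id, order_id;
```

The KNIME GroupBy node does essentially the same thing as 2b: you pick the key columns in the Groups tab and choose an aggregation method (First, Maximum, Sum, …) for every remaining column in the aggregation settings.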
I know it has been asked before and the reply is the GroupBy node, but I have no idea how to do it. Could anyone please share a workflow or at least tell me the settings in the configuration?