Filtering Unique rows - need help


I'm trying to filter out unique rows. My input dataset has 4 dupes where ID is 271 and 2 dupes where ID is 251. There's not much I can do about the input data, it's an export from an enterprise management system. The whole project is reconciling data among the enterprise system, quickbooks, the physical folders and my client's head. The flow I attached is only the manipulation on the enterprise system (HomeTrak = ht), not the others.

Node 20 is my filtering out where ID = 271, which I only know is a dupe because of Node 35, which shows me the count of IDs. I didn't discover 251 was a dupe until after I'd run through the data and found some issues.

The only way I could figure out how to do it was to isolate 271 then pull one of the rows through (based on row ID) and concatenate it to the original file that doesn't have any of the 271s. (Nodes 20, 23, 22). This doesn't work when the records are in different rows (which I don't expect it to).

In SQL I'd do a SELECT DISTINCT because I know that the records are exactly the same (through eyeing it, not through any node formula). 

Any help is appreciated, thanks.

GroupBy node could be your best friend for this task.

Another solution is to identify duplicates with RowID node, then filter them out using String Manipulation and Rule engine filter.


just to add some more information to the previous post.

a SELECT DISTINCT can be achieved by adding all the columns you are interested in to the group columns of the GroupBy node dialog.