I have an issue. When I try to remove duplicate string data using the Duplicate Row Filter node, it does not delete all of the duplicates. After transferring the data to an Excel file (using the Excel Sheet Appender node), I can remove the remaining duplicates with Excel's built-in “Remove Duplicates” function. How can I resolve this issue?
Hi @emshihab, I have used this node many times and have never encountered such an issue.
Can you please share your data or sample data and how you configured the node? Or perhaps share your workflow?
And could you also show what the expected results for the data are?
Due to data security issues, I cannot share the data right now, but I will create some sample data and share it with you.
I think I found the issue. Excel 365's “Remove Duplicates” is not case sensitive, so it treats values that differ only in letter case as duplicates. KNIME, however, considers both rows unique. Can you suggest how to resolve this issue?
I would convert them all to the same case using a String Manipulation node and then do the duplicate row removal.
Hi @emshihab, convert them to lower case, and then remove the duplicates.
If you want to keep the original data as is, you can write the lowercase version into a new column, apply the Duplicate Row Filter on that new column, and then remove the new column after the operation.
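To make the helper-column idea concrete outside KNIME, here is a minimal Python sketch (not the KNIME node itself; the function name and sample part numbers are hypothetical). Each value gets a temporary lowercase key, duplicates are filtered on that key, and the original values are returned with their casing untouched. This sketch keeps the first occurrence of each key.

```python
def drop_duplicates_ignore_case(values):
    """Keep the first occurrence of each value, comparing case-insensitively.

    The lowercase key plays the role of the temporary helper column;
    the returned list contains the original, unmodified values.
    """
    seen = set()
    kept = []
    for value in values:
        key = value.lower()  # temporary "helper column" value
        if key not in seen:
            seen.add(key)
            kept.append(value)  # original casing is preserved
    return kept


rows = ["ABC-100", "abc-100", "Abc-200", "ABC-200", "XYZ-300"]
print(drop_duplicates_ignore_case(rows))
# ['ABC-100', 'Abc-200', 'XYZ-300']
```

Because only the key is lowercased, the output still shows the data exactly as it was entered, which is the point of using a separate column instead of overwriting the original one.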
Also keep in mind that the “famous” whitespace characters can be a pain as well if you forget to remove them.
That’s a very good point @Daniel_Weikert .
@emshihab, you can use strip() to get rid of leading and trailing whitespace.
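Combining both suggestions, the comparison key should be stripped and lowercased. The Python sketch below is only an illustration: Python's `str.strip()` stands in for the String Manipulation `strip()` function, and the two may not treat every whitespace character identically. The `normalize_key` name and sample values are hypothetical.

```python
def normalize_key(value):
    # Strip leading/trailing whitespace first, then lowercase,
    # so "abc-100 " and " Abc-100" produce the same key as "ABC-100".
    return value.strip().lower()


rows = ["ABC-100", "abc-100 ", " Abc-100", "XYZ-300"]
keys = [normalize_key(v) for v in rows]
print(keys)                          # ['abc-100', 'abc-100', 'abc-100', 'xyz-300']
print(len(set(rows)), len(set(keys)))  # 4 2
```

Compared on the raw values, all four rows look unique; compared on the normalized key, only two distinct part numbers remain.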
In parallel, you can also check for whitespace by adding some marker text before and after your values:
join("XXX", $Part number$, "XXX")
As you can see, the last two records have a trailing space.
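The same marker trick can be sketched in Python for anyone checking data outside KNIME. `show_with_markers` mimics the `join("XXX", $Part number$, "XXX")` expression, and `has_edge_whitespace` (a hypothetical helper, not part of KNIME) flags values that would change if stripped.

```python
def show_with_markers(value, marker="XXX"):
    # Same idea as join("XXX", $Part number$, "XXX"): wrap the value so any
    # leading or trailing whitespace becomes visible between the markers.
    return marker + value + marker


def has_edge_whitespace(value):
    # True when stripping would change the value, i.e. it has
    # leading or trailing whitespace.
    return value != value.strip()


for v in ["ABC-100", "ABC-100 ", "abc-100\t"]:
    print(show_with_markers(v), has_edge_whitespace(v))
```

The marker output is handy for eyeballing a table, while the boolean check is easier to filter on when there are many rows.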
After removing duplicates:
I put something together for you. Here’s what the workflow looks like:
Here’s the workflow: Remove duplicate different case.knwf (9.7 KB)
Thank you very much for the solution.