Duplicates - how to deal with them

Good day KNIME Expert,
I have a list of materials from a quote request and need the client's history for the items, as well as any discounts applicable to the specific items.
All data is in one file, as per the attached example. However, I want the required outcome to display the Discount number and % on the same line as the newly requested item.
In the example, the lines in red are my problem.
I have used a flow with Column Combiner, Duplicate Row Filter, Rule-based Row Filter and GroupBy.
The results are also in the attachment.

The history for each item is required separately and is not affected in the combination.

Test Data.xlsx (50.8 KB)

Hi @Hannesd

Could you perhaps clarify or draft your expected output for those lines in red? You mention wanting to display the discount number and the %, but the discount seems to always be zero, and by % I assume you mean the Matrix % (right)?

In the Excel you mention “Need to write info from Matrix code and % into the Quote line” but it’s not very obvious what that quote line refers to, I only see a column date quote.

I need the info in Matrix code and Matrix % to be written on the line above, as per the sheet that was attached. Below is the required output. This data refers to the line just above it in the current example sheet.
Required outcome for the line with Matrix code and %:
ABC 15765 25HDB BUCKET 25l 5 Each 7813 2/17/2023 DME9922 2.5 15 15765,25HDB chosen

The discount mentioned is confusing and can be ignored for this enquiry; it is only an internal reference.
I trust this makes it clearer.

Attached is the file with the required line in yellow and the line that should not display in red.
The other data is still required but is not relevant to the removal of duplicates.
Test Data.xlsx (50.9 KB)

You can do this with a combination of First and Last in the GroupBy node.


ID 15 is retained, with the values of Matrix code and % taken from ID 16.

Grouping: [screenshot of the GroupBy configuration]
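For readers who cannot see the screenshot, the First/Last idea can be sketched outside KNIME in plain Python. This is only an illustration of the aggregation logic, not the node's implementation. The column names (`ID`, `Key`, `Item`, `Matrix code`, `Matrix %`) and the sample values are assumptions made up for the sketch, not taken from the attached workbook.

```python
# Hypothetical sample rows: column names and values are illustrative only.
rows = [
    {"ID": 15, "Key": "15765,25HDB", "Item": "BUCKET 25l", "Matrix code": None, "Matrix %": None},
    {"ID": 16, "Key": "15765,25HDB", "Item": "BUCKET 25l", "Matrix code": "MX-1", "Matrix %": 10.0},
    {"ID": 17, "Key": "20001,10ABC", "Item": "LID 25l", "Matrix code": None, "Matrix %": None},
]


def group_first_last(rows, key, last_cols):
    """Collapse rows that share `key` into one row per group.

    Every column keeps the value from the first row of the group ("First"),
    except the columns in `last_cols`, which take the value from the last
    row of the group ("Last").
    """
    groups = {}
    for row in rows:
        group_key = row[key]
        if group_key not in groups:
            # First: copy the first row seen for this key.
            groups[group_key] = dict(row)
        else:
            # Last: later rows overwrite only the selected columns.
            for col in last_cols:
                groups[group_key][col] = row[col]
    return list(groups.values())


result = group_first_last(rows, "Key", ["Matrix code", "Matrix %"])
for row in result:
    print(row)
```

In this sketch the row with ID 15 survives and picks up the Matrix code and Matrix % from ID 16, which mirrors the result described above.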

Hello ArjenEX,

Will try it and see if I can get the same result. Thanks for your response.

Afternoon ArjenEX,

I used your proposal on the real data and got a single line per item.
Thanks for your reply and assistance.
Hannesd

Glad to hear! 🙂 If it helped you, please mark the post as the solution so that other KNIME’ers can also benefit from it more easily in the future.

I now have another scenario of duplicates to be removed. This time the item may be on the original list any number of times (in this case 1 to 4 times).
Various Attributes are checked, and each set of Attributes is validated. A column indicates whether the set is not the same by displaying False in the column next to the set.
Example: C of O (Table 1) vs C of O (right) (Table 2): if the value in Table 1 is different from the value in Table 2, the next column, CoO(False), indicates False.
If the next Attribute to check is also different between the tables, a second line is created for the same item, but for the next Attribute.
I require the final output to display the detail from the second line on line 1.

Workflow used:
String Manipulation (creating an ID number)
Column Combiner (Concatenate Number and Code) for a unique reference
Column Resorter
Column Rename
GroupBy - loses the column that indicates False for Attributes other than the first set.
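As a rough picture of what the Column Combiner step in the list above produces, here is a small Python sketch. The column names `Number` and `Code` follow the description in the post; the helper name `combine_key` and the comma delimiter are assumptions for illustration.

```python
def combine_key(row, columns, delimiter=","):
    """Join the values of `columns` into one string to use as a unique reference."""
    return delimiter.join(str(row[col]) for col in columns)


# Hypothetical row: values chosen only to illustrate the combined reference.
item = {"Number": 15765, "Code": "25HDB"}
reference = combine_key(item, ["Number", "Code"])
print(reference)
```

The combined reference is what the later grouping step uses to decide which rows belong to the same item.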

Data sample attached.
Test Data 2.xlsx (10.8 KB)

A new day and a fresh start in the morning helped.
Problem solved by changing the Manual Aggregation settings in the GroupBy node:
select the columns that need to be combined into the single item line.
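The post does not say which aggregation method was chosen for each column, so the following Python sketch is only one possible reading. It assumes each duplicate line carries "False" in exactly one flag column and leaves the other flag columns empty, and that the first non-empty value per column should end up on the single item line. The column names `Ref` and `Attr2(False)` and the sample values are hypothetical; `CoO(False)` is taken from the description above.

```python
# Hypothetical sample: one item appears twice, each line flagging a different Attribute.
items = [
    {"Ref": "1001,AAA", "CoO(False)": "False", "Attr2(False)": None},
    {"Ref": "1001,AAA", "CoO(False)": None, "Attr2(False)": "False"},
    {"Ref": "1002,BBB", "CoO(False)": None, "Attr2(False)": None},
]


def merge_group_rows(rows, key, merge_cols):
    """Collapse rows sharing `key`, filling each column in `merge_cols`
    with the first non-empty value found in the group (assumed rule)."""
    merged = {}
    for row in rows:
        # The first row of a group becomes the target line for that item.
        target = merged.setdefault(row[key], dict(row))
        for col in merge_cols:
            if target[col] is None and row[col] is not None:
                target[col] = row[col]
    return list(merged.values())


merged_items = merge_group_rows(items, "Ref", ["CoO(False)", "Attr2(False)"])
for row in merged_items:
    print(row)
```

With this rule, the flags from the second line move up onto the first line of the same item, which matches the required output described earlier in the thread.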

