If there is multiple records sent for record type 02D or 61, we should remove the complete record set for that plan, ssn from the file
In this example since we have 2 lines for record 02D
35844 001-11-1111 02D 0108202000000000 04042002
35844 001-11-1111 02D 0108202000000000 0404200201012023
so in this case all the lines should be deleted and put to a diff file for reporting purposes
How can we do it in knime.
The way i tried was split the record as below using cell splitter by position
and fetched
Plan SSN Record identifier Record Type
35844 001-11-1111 02 D
Then i used rule based row splitter and fetched records
Record Identifier = 02 and Record type =“D” => TRUE
Record Identifier = 61 => TRUE
it got me all record 02D and i used duplicate row filter to Identify duplicates
but by doing this, i am only bale to delete the 02D but not the complete record set that matches this plan and ssn.
Any help is grreatly appreciated
First of all, you can use the file or csv reader to create a table to split the columns about the dataset. If you have a separator, better because you can limit the possibilities from this transformation.
First column = 35844
Second column = 001-11-1111
Third column = 02D
Fourth column = 0108202000000000
Fifth column = 04042002
Made it, you can just filter row if you have some conditions using rule engine to set values as “this matches” and then just make the process as you need.
If you can’t do it, you can use rule engine node to sign a flag if the line have the “flag” condition as:
I tried it, but with rule engine i can set a flag and Identify 03D and 61 and remove them. The challenge is if there is a multiple 02D and 61 in the dataset, we have to remove all the records for this dataset
This is one dataset, which contains 14 rows. One dataset is the data for one plan and ssn.
here for plan 35844 and ssn 001-11-1111, we have 14 records
since this dataset contains 2 records for 02D, all 14 rows should be removed
OK, so why don’t you use the row filter or row split with wildcards?
You can do the same process, and the, with one of these node, you can set to use wildcards link * 02D * to me excluded. With split one, you can see who can go one and who will be removed…
Hi @chaithuj , another possibility I think is that having got your list of duplicated 02D or 61 records, you use a Joiner to join back to the original dataset, joining by plan number. The joiner will do your filtering for you…
Steps:
First of all perform a further duplicate row filter on your “duplicates” list, detecting duplicate plans in the list and having it remove the duplicate rows (keeping just the first in each case). This then gives you a dataset of plans to be removed with one row per plan.
Attach your original dataset to the upper port of the joiner and the new filtered duplicates list to the lower port.
Configure the joiner to return both matching and left unmatched (left outer) rows and return them to different ports. Only return columns from the top input table.
Then the top output port will be the records to be deleted as these matched the duplicates list.
The middle output put is the remaining rows that you want to keep, as these didn’t match the duplicates list.
Sitting on a train at the moment without KNIME otherwise I’d upload a simple example. Hope that helps.