Request help for Cell Splitter Node

Madhu_Sudhan · November 11, 2024, 9:33am

I have 100 line items in two different files, and I need to validate that the column data in both files match, based on the Employee ID. The catch is that File A has multiple columns (Employee ID, Country, Zip code, etc.), whereas File B has only one column, which contains all of that data in a single cell. So we need to split the cell in File B and compare all the columns of File A with File B. Please suggest a node and the steps to resolve this. The format of the File B cells for each employee is as below:
"XXXXXXXXXXXX (XXX-XXX)

XXXXXXXXXXXX (XXX-XXX)

XXXX XXXX XXXX (XXX-XXX)

XXXXXXXXXX (XXX-XXX)

XX/XXX/XXXX/XXXXXX (XXX-XXXX)"

takbb · November 11, 2024, 11:12am

Hi @Madhu_Sudhan , welcome to the KNIME community.

In your “example” it would probably be clearer and easier for people to help if you could provide readable examples (they can be fictitious) rather than just XXXXX as being able to see if there is an order to the fields, or any delimiters will likely make a difference to the approach taken.

Also, is there a particular pattern to Employee ID? People may suggest regex but they’d need to know what patterns are present, if any.

Madhu_Sudhan · November 12, 2024, 5:00am

Hello Sir… thanks for your swift response.
OK, I have the GID in column A of File A, and the same in File B. In File A, I have Government IDs such as Gove-1, Gove-2, Gove-3, Gove-4 and Gove-5 in 5 different columns (each ID has a different number format) for each employee. However, in File B all of these Government IDs are in a single column, combined within one cell per employee row. For example, the cell for employee XYZ contains (xxxxx Gove-1, xxxxxxxxGove-2, xxxxxxxx Gove-3, xxxxxxxxxGove-4 and xxxxxxxGove-5). We now have to compare all 5 IDs from File A against the cell for each employee in File B, and return the result as True or False.

Madhu_Sudhan · November 12, 2024, 7:08am

Also, the sequence of IDs is dynamic; it may be different in each row.

takbb · November 12, 2024, 9:37am

So a single row could look like this, with the commas and the “government ID” after each field?

xxxxx Gove-1, xxxxxxxxGove-2, xxxxxxxx Gove-3, xxxxxxxxxGove-4, xxxxxxxGove-5

and the xxxxxxx could be Employee ID, Country, Zip Code, etc is that what you are saying or am I misreading what you said?

Madhu_Sudhan · November 12, 2024, 10:52am

Yes, each cell has the format below; however, the sequence may vary. For example, the first cell may follow the order below, the second cell 3 4 5 1, the fourth cell 2 1 5 3 4, and likewise the sequence differs in every cell of the column.
xxxxx Gove-1, xxxxxxxxGove-2, xxxxxxxx Gove-3, xxxxxxxxxGove-4, xxxxxxxGove-5
Example format below

Column A is as per the below:
National Identifier
XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XX/XX/XXXX/XXXXXX (IND-EPFO)
XXXXXXXXXXXX (IND-UAN) XXXXXXXXXXXX (IND-PRAN) XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XX/XX/XXXX/XXXXXX (IND-EPFO)
XXXXXXXXXXXX (IND-UAN) XXXXXXXXXXXX (IND-PRAN) XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XXXXXXXX (IND-PAS) XX/XX/XXXX/XXXXXX (IND-EPFO)
The remaining columns are as per the below:
|IND-UAN|IND-PRAN|IND-AAD|IND-PAN|IND-PAS|IND-EPFO|
So the flow should extract the characters before each bracket and paste them under the correct header, which is the code within the brackets:
|IND-UAN|IND-PRAN|IND-AAD|IND-PAN|IND-PAS|IND-EPFO|

takbb · November 12, 2024, 11:11am

Thanks for the additional info @Madhu_Sudhan. Are the government identifiers shown above the complete list (i.e. there are just those 6 identifiers), or might there be others?

Madhu_Sudhan · November 12, 2024, 11:57am

There are a total of 7 identifiers within the cell, but it is not mandatory for every cell to have all 7. One cell may contain 4, another 7 or 5, and they are not in a fixed sequence, so the flow should split the cell, match each value to its header, and move it under that header.
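
To make the requirement above concrete, here is a minimal Python sketch of the splitting logic, written outside of any KNIME node. It assumes every value is followed by its code in parentheses, as in the examples in this thread. The sample values are made up, the names `HEADERS`, `PAIR` and `split_cell` are only illustrative, and the header list contains just the six codes quoted so far because the seventh identifier is not named.

```python
import re

# Headers quoted in the thread (assumption: the unnamed seventh code
# would simply be appended to this list).
HEADERS = ["IND-UAN", "IND-PRAN", "IND-AAD", "IND-PAN", "IND-PAS", "IND-EPFO"]

# One value (any text without parentheses) followed by its code in parentheses.
PAIR = re.compile(r"\s*([^()]+?)\s*\(([A-Z]+-[A-Z]+)\)")


def split_cell(cell):
    """Map each header to its value, or None when the cell lacks that identifier.

    The order of the identifiers inside the cell does not matter.
    """
    found = {code: value.strip() for value, code in PAIR.findall(cell)}
    return {header: found.get(header) for header in HEADERS}


# Fictitious cell with three of the identifiers, in arbitrary order.
row = split_cell(
    "1234 567 8901 (IND-AAD) ABCDE1234F (IND-PAN) MH/12/3456/789012 (IND-EPFO)"
)
print(row)
```

Identifiers that are missing from a cell come back as `None`, so every row ends up with the same set of columns regardless of how many identifiers the cell held.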

Madhu_Sudhan · November 13, 2024, 4:40am

Madhu_Sudhan · November 21, 2024, 1:32pm

Dear Team,

I am still awaiting a response or solution for the above topic and need your kind help.

Regards
Madhusudhan J

takbb · November 22, 2024, 10:05pm

Hi @Madhu_Sudhan

please try this:

cell splitting with identifiers.knwf (83.4 KB)

You will need to install the “Palladian” extension if it is not already installed; it contains the Regex Extractor node. To me this is likely the best regex extraction node for the situation where there are potentially multiple (but an unknown number of) items to be matched per row.
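
For readers who cannot open the workflow, the general idea of extracting every match per row and then comparing against File A can be sketched in plain Python. This is not the content of the attached workflow, only an illustration under assumptions: both files are keyed by the same employee ID, File B cells use the `value (CODE)` layout shown earlier in the thread, and all data and names here (`extract_ids`, `compare`, `file_a`, `file_b`) are hypothetical.

```python
import re

# One value (any text without parentheses) followed by its code in parentheses.
PAIR = re.compile(r"\s*([^()]+?)\s*\(([A-Z]+-[A-Z]+)\)")


def extract_ids(cell):
    """Return {code: value} for every 'value (CODE)' pair found in the cell."""
    return {code: value.strip() for value, code in PAIR.findall(cell)}


def compare(file_a_rows, file_b_cells):
    """For each employee, flag every File A identifier as True or False.

    file_a_rows:  {employee_id: {code: value}}   (one column per code in File A)
    file_b_cells: {employee_id: combined cell}   (single column in File B)
    """
    results = {}
    for employee_id, a_ids in file_a_rows.items():
        b_ids = extract_ids(file_b_cells.get(employee_id, ""))
        results[employee_id] = {
            code: b_ids.get(code) == value for code, value in a_ids.items()
        }
    return results


# Fictitious data: E001 matches fully, E002 has a different IND-AAD in File B.
file_a = {
    "E001": {"IND-AAD": "1234 567 8901", "IND-PAN": "ABCDE1234F"},
    "E002": {"IND-AAD": "1111 222 3333", "IND-PAN": "ZZZZZ9999Z"},
}
file_b = {
    "E001": "ABCDE1234F (IND-PAN) 1234 567 8901 (IND-AAD)",
    "E002": "ZZZZZ9999Z (IND-PAN) 9999 222 3333 (IND-AAD)",
}
result = compare(file_a, file_b)
print(result)
```

Because the lookup is by code rather than by position, the varying order of identifiers within a cell does not affect the True/False result.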

Madhu_Sudhan · November 25, 2024, 12:20pm

Dear Sir, I am not able to open the above flow after downloading it; I cannot see any nodes on my screen. Could you please send me a new download link?

takbb · November 25, 2024, 1:15pm

Hi @Madhu_Sudhan , I just downloaded the workflow using the above link and it is valid, so re-uploading it isn’t likely to change anything. What version of KNIME are you using? Are you able to download other workflows ok?

edit: as an alternative location, in case there is something odd with the above link, I have saved it to the hub here.

system · February 23, 2025, 1:16pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.