I have 100 line items in two different files, and I have to validate that the column data in both files matches, based on the Employee ID. The catch is that File A has multiple columns (Employee ID, Country, Zip code, etc.), whereas File B has only one column, which holds all of that data in a single cell. So we have to split the cell in File B and compare all the columns of File A with File B. Please suggest the nodes and steps to resolve this. The format of the File B cell for each employee is like below:
XXXXXXXXXXXX (XXX-XXX)
In your example, it would probably be clearer, and easier for people to help, if you could provide readable examples (they can be fictitious) rather than just XXXXX. Being able to see whether there is an order to the fields, or any delimiters, will likely make a difference to the approach taken.
Also, is there a particular pattern to Employee ID? People may suggest regex but they’d need to know what patterns are present, if any.
Hello Sir…thanks for your swift response
OK, I have the GID in column A of File A, and the same in File B. In File A, I have government IDs such as Gove-1, Gove-2, Gove-3, Gove-4 and Gove-5 in 5 different columns (each ID has a different number format) for each employee. However, in File B all of these government IDs are in one column, combined within a single cell for each employee row. For example, for employee XYZ the cell consists of (xxxxx Gove-1, xxxxxxxxGove-2, xxxxxxxx Gove-3, xxxxxxxxxGove-4 and xxxxxxxGove-5). Now we have to compare all 5 IDs from File A against the cell for each employee in File B and give the result as True or False.
Yes, each cell has the format below; however, the sequence may vary. For example, the first cell may follow the order below, the second cell may be 3 4 5 1, the fourth cell 2 1 5 3 4, and so on. The sequence is different in every cell of the column.
xxxxx Gove-1, xxxxxxxxGove-2, xxxxxxxx Gove-3, xxxxxxxxxGove-4, xxxxxxxGove-5
Example format below.
One column is as per the below:
National Identifier
XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XX/XX/XXXX/XXXXXX (IND-EPFO)
XXXXXXXXXXXX (IND-UAN) XXXXXXXXXXXX (IND-PRAN) XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XX/XX/XXXX/XXXXXX (IND-EPFO)
XXXXXXXXXXXX (IND-UAN) XXXXXXXXXXXX (IND-PRAN) XXXX XXX XXXX (IND-AAD) XXXXXXXXXX (IND-PAN) XXXXXXXX (IND-PAS) XX/XX/XXXX/XXXXXX (IND-EPFO)
The rest of the columns are as per the below:
|IND-UAN|IND-PRAN|IND-AAD|IND-PAN|IND-PAS|IND-EPFO|
So the flow should extract the characters before each pair of brackets and paste them under the correct header, which is the code within the brackets:
|IND-UAN|IND-PRAN|IND-AAD|IND-PAN|IND-PAS|IND-EPFO|
Thanks for the additional info @Madhu_Sudhan. Are the government identifiers shown above the complete list (i.e. there are just those 6 identifiers), or might there be others?
There are a total of 7 identifiers within the cell, but it is not mandatory for every cell to have all 7. One cell may contain 4, another 7 or 5, and they are not in a fixed sequence, so the flow should split them, match them to the headers and move each value under its header.
Yes, these are the identifiers within the cell: |IND-UAN|IND-PRAN|IND-AAD|IND-PAN|IND-PAS|IND-EPFO|. Not every cell has all 7, and the order is not fixed.
You will need to install the “Palladian” extension, if not already installed, which contains the Regex Extractor node. To me this is likely the best regex extraction node for the situation where there are potentially multiple (but an unknown number of) items to be matched per row.
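To illustrate the kind of pattern involved, here is a minimal Python sketch of the matching logic, not the workflow itself. It assumes each value is followed by its code in brackets, as in the examples earlier in the thread, and that every code starts with `IND-`. The sample values and the `split_identifiers` name are made up for illustration.

```python
import re

# Assumed pattern: a run of non-bracket characters (the value),
# followed by a bracketed code such as "(IND-PAN)".
PAIR = re.compile(r"([^()]+?)\s*\((IND-[A-Z]+)\)")

def split_identifiers(cell: str) -> dict[str, str]:
    """Map each bracketed code to the value written before it."""
    return {code: value.strip() for value, code in PAIR.findall(cell)}

# Fictitious cell; codes may be missing or appear in any order.
cell = (
    "100200300400 (IND-UAN) 1234 567 8901 (IND-AAD) "
    "ABCDE1234F (IND-PAN) MH/BAN/0012345/000123 (IND-EPFO)"
)
ids = split_identifiers(cell)
print(ids)
```

Because the result is keyed by the code rather than by position, the order of the identifiers inside the cell does not matter, and a code that is absent from the cell is simply absent from the result.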
Dear Sir, I am not able to open the above flow after downloading it; I cannot see any nodes on my screen. Could you please send me a new download link?
Hi @Madhu_Sudhan , I just downloaded the workflow using the above link and it is valid, so re-uploading it isn’t likely to change anything. What version of KNIME are you using? Are you able to download other workflows ok?
edit: as an alternative location, in case there is something odd with the above link, I have saved it to the hub here.
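For anyone who cannot open the workflow, the final True/False check asked for earlier in the thread can be sketched in Python roughly as follows. This is only an illustration under assumptions: the header list is the six codes quoted in the thread, values are compared after removing whitespace and ignoring case, and an identifier missing from both files counts as a match. The `normalise` and `compare_row` names are hypothetical.

```python
# Header codes quoted earlier in the thread (assumed to be the File A columns).
HEADERS = ["IND-UAN", "IND-PRAN", "IND-AAD", "IND-PAN", "IND-PAS", "IND-EPFO"]

def normalise(value) -> str:
    """Assumption: whitespace and letter case are not significant."""
    return "".join(str(value).split()).upper() if value else ""

def compare_row(file_a_row: dict, file_b_ids: dict) -> dict:
    """Return True/False per header for one employee."""
    return {
        h: normalise(file_a_row.get(h)) == normalise(file_b_ids.get(h))
        for h in HEADERS
    }

# Fictitious values for one employee.
file_a_row = {"IND-UAN": "100200300400", "IND-AAD": "1234 567 8901", "IND-PAN": "ABCDE1234F"}
file_b_ids = {"IND-UAN": "100200300400", "IND-AAD": "12345678901", "IND-PAN": "ABCDE9999F"}

result = compare_row(file_a_row, file_b_ids)
print(result)
```

If a missing identifier should instead be flagged as a mismatch, the comparison would need an extra check that the value is present in both files.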