Hello all!
I’m currently trying to extract a certain number and date out of a list of PDFs and then rename the PDF with that number and date. To be specific, I’m extracting the Employee Number from a payslip as well as the month and year.
Here lies the issue: the Employee Number sits below the address on the payslip, and some addresses have more lines than others. The number also isn’t in a structured field, so I’m finding it tricky to extract the four digits when their position changes from payslip to payslip.
I’ve started with the Tika Parser and tried different combinations of the Sentence Extractor node and String Manipulation node, but no luck. Any help would be much appreciated!
I’ve attached two dummy payslips with multiple address lines.
The Employee Number of interest is the four-digit number preceding the name at the top of the “table” after the address.
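To make the target concrete, here is a minimal Python sketch of the pattern I'm after, rather than a working solution. The sample text is made up and only stands in for whatever the parser returns, so the real layout may differ; `find_employee_number` is a hypothetical helper name, and the regular expression assumes the Employee Number is the first line that starts with exactly four digits followed by a name.

```python
import re

# Made-up text standing in for the parsed payslip (assumption: the real
# parser output may be laid out differently).
SAMPLE = """\
Mr A Example
1 Sample Street
Sampletown
AB1 2CD

1234 A Example Feb 2024
"""

# A line that begins with exactly four digits, then whitespace, then a letter.
# MULTILINE lets ^ match at the start of every line, so the number of address
# lines above the match does not matter.
EMPLOYEE_LINE = re.compile(r"^\s*(\d{4})\s+[A-Za-z]", re.MULTILINE)


def find_employee_number(text):
    """Return the first four-digit number that starts a line, or None."""
    match = EMPLOYEE_LINE.search(text)
    return match.group(1) if match else None


print(find_employee_number(SAMPLE))
```

One caveat with this idea: an address line that itself begins with a four-digit house number followed by a street name would match first, so the pattern would need something extra to tell the two apart.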
Commanders Payslips Feb 24_Part8.pdf (121.7 KB)
LA Chargers Payslips Feb 24_Part8 2.pdf (116.5 KB)