Regex split by pattern and capture text in between too

Hi,

I am trying to extract the full transaction data from an OCRed bank account document.

The information is laid out as follows (easy to see in the first screenshot):
1st line: the operation details (number, date, value date, transaction type and amount).
2nd to xth line: an optional transaction description (may be empty).

So far, using the regex extractor node, I can always detect every occurrence of the first line, which holds the bank operation on a single line. Unfortunately I can’t figure out which operator to use to also gather all the text between the occurrences. In my case this would be the transaction description.

Here is the code I am using so far:

(?<operation>\d{0,3}) (?<date>\d{2}\-\d{2}) (?<dateval>\d{2}\-\d{2}) (?<ops_type>.*?)[ ]{0,4}(?<amount>[-]{0,1}[ ]{0,4}\d{0,3}[\.]{0,1}\d{0,3}\,\d{2})[ ]{0,4}EUR(?<Description>)

For the description part I tried adding (?.) or (?.?), but then I only get the first occurrence and all other transactions are lost.

Does anyone have an idea which operator I should try in order to capture the transaction description that sometimes appears between the occurrences?

Here is the sample file if needed:
transaction test.csv (615 Bytes)

Many Thanks,
Laurent.


You need a positive lookahead. The regex

EUR(?<description>.*?)(?=\d{1,3})

will match EUR followed by as few characters as possible (captured in the description group), up to the point where a 1 to 3 digit number follows. The lookahead only checks that the number is there, so the result will NOT include it.

The full string looks something like this:

(?<operation>\d{0,3}) (?<date>\d{2}\-\d{2}) (?<dateval>\d{2}\-\d{2}) (?<ops_type>.*?)[ ]{0,4}(?<amount>[-]{0,1}[ ]{0,4}\d{0,3}[\.]{0,1}\d{0,3}\,\d{2})[ ]{0,4}EUR(?<description>.*?)(?=\d{1,3})
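
For anyone who wants to try the lookahead idea outside the node, here is a minimal Python sketch. It is only an illustration under assumptions: the sample lines are made up (not taken from the attached file), Python needs the (?P<name>...) spelling for named groups, the description is allowed to cross line breaks via (?s:...), and the "|\Z" branch is an addition so that the last transaction, which has no number after it, still matches.

```python
import re

# Hypothetical sample lines; the real OCR output may be laid out differently.
SAMPLE = "\n".join([
    "1 02-01 03-01 CARD PAYMENT -12,50 EUR",
    "SHOP XXX PARIS",
    "2 05-01 05-01 TRANSFER 1.200,00 EUR",
    "3 07-01 08-01 DIRECT DEBIT -45,00 EUR",
    "INSURANCE XXX",
])

# Same structure as the pattern in the thread, in Python's group syntax.
# (?s:...) lets the dot match line breaks inside the description only.
# The lookahead stops the lazy description before the next 1-3 digit number,
# or at the end of the text (assumed addition for the last transaction).
PATTERN = re.compile(
    r"(?P<operation>\d{0,3}) (?P<date>\d{2}-\d{2}) (?P<dateval>\d{2}-\d{2}) "
    r"(?P<ops_type>.*?)[ ]{0,4}"
    r"(?P<amount>-?[ ]{0,4}\d{0,3}\.?\d{0,3},\d{2})[ ]{0,4}EUR"
    r"(?P<description>(?s:.*?))(?=\d{1,3}|\Z)"
)


def extract_transactions(text):
    """Return one dict per transaction, with the description trimmed."""
    rows = []
    for match in PATTERN.finditer(text):
        row = match.groupdict()
        row["description"] = row["description"].strip()
        rows.append(row)
    return rows


rows = extract_transactions(SAMPLE)
for row in rows:
    print(row)
```

Because the lookahead consumes nothing, the number that ends one description is still available as the operation number of the next match. The weak spot is that any digit inside a description also ends it early.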


Hi Elsamuel,

Thank you for this idea; it works with the scrambled test document I provided. Unfortunately, in the real file the beneficiary is a bank account, so the XXX are actually numbers. I am providing a new example here.

I will test your solution to see if it can be applied twice (once before and once after a bank account, if one is found).

Many Thanks,
Laurent.
transaction test_elsmauel.csv (620 Bytes)
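
One possible way around digits in the description is to make the lookahead stricter, so that it only fires on something shaped like the start of the next operation line (number, date, value date at the beginning of a line) rather than on any number. The Python sketch below shows that idea under assumptions: the sample lines and account numbers are invented, the header shape is taken from the pattern earlier in the thread, and whether the node exposes the same flags is not confirmed here.

```python
import re

# Hypothetical sample where the description itself contains numbers.
SAMPLE = "\n".join([
    "1 02-01 03-01 TRANSFER -250,00 EUR",
    "TO 123 4567 8901",
    "REF 2024 INVOICE 17",
    "2 05-01 05-01 CARD PAYMENT -12,50 EUR",
    "3 07-01 08-01 SALARY 1.800,00 EUR",
    "FROM 987 6543 2109",
])

# Shape of the start of an operation line: number, date, value date.
HEADER_START = r"^\d{0,3} \d{2}-\d{2} \d{2}-\d{2} "

# The description runs lazily until the next line that starts like an
# operation line, or until the end of the text. re.MULTILINE makes "^"
# match at the start of every line; (?s:...) lets the description span lines.
PATTERN = re.compile(
    r"^(?P<operation>\d{0,3}) (?P<date>\d{2}-\d{2}) (?P<dateval>\d{2}-\d{2}) "
    r"(?P<ops_type>.*?)[ ]{0,4}"
    r"(?P<amount>-?[ ]{0,4}\d{0,3}\.?\d{0,3},\d{2})[ ]{0,4}EUR"
    r"(?P<description>(?s:.*?))(?=" + HEADER_START + r"|\Z)",
    re.MULTILINE,
)


def extract_transactions(text):
    """Return one dict per transaction, with the description trimmed."""
    rows = []
    for match in PATTERN.finditer(text):
        row = match.groupdict()
        row["description"] = row["description"].strip()
        rows.append(row)
    return rows


rows = extract_transactions(SAMPLE)
for row in rows:
    print(row)
```

This still assumes that no description line happens to begin with a number followed by two dd-dd dates; if that can occur in the real file, the header shape would need to be tightened further.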