Hi, new to the forum so please let me know if it’s ok to post here.
I need help extracting information from a string that comes from a bill in PDF format.
I’ve already managed to separate some parts of the text so I’m left with a huge string that follows this pattern:

(Quantity) “X” (Price) (SKU Number) (Description) (Total Price)

and it repeats any number of times. Sometimes, though, when the Quantity is 1, the (Quantity) “X” (Price) part is omitted. Also, sometimes there are extra lines detailing a discount.

I know I have a lot to do here, but right now I’m having trouble envisioning how to approach this task, especially how to build a loop that looks for the next SKU until there are no more in the string.

Any help is appreciated.

You might want to upload sample data to get help.
I would give the Regex Extractor node a try.
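
To illustrate the regex idea outside of any particular node, here is a minimal Python sketch. It is based only on the pattern described in the question, and several things in it are assumptions rather than facts about the real bill: that prices look like `$10.00`, that a SKU is a run of 8 to 14 digits, that a discount follows its item as `TMP OFERTA -$2.00`, and the sample string itself, which is made up. The name `parse_items` is hypothetical. `re.finditer` plays the role of the loop: it keeps returning the next item until no more SKUs are found in the string.

```python
import re
from decimal import Decimal

# Assumed layout of one item (not confirmed against the real bill):
#   [qty X $unit_price] sku description $total [TMP OFERTA -$discount]
ITEM_RE = re.compile(
    r"(?:(?P<qty>\d+)\s*X\s*\$(?P<unit>\d+\.\d{2})\s+)?"  # optional "qty X $price"
    r"(?P<sku>\d{8,14})\s+"                               # SKU, assumed 8-14 digits
    r"(?P<desc>.+?)\s+"                                   # description (lazy match)
    r"\$(?P<total>\d+\.\d{2})"                            # line total
    r"(?:\s+TMP OFERTA\s+-?\$(?P<disc>\d+\.\d{2}))?"      # optional discount line
)


def parse_items(text):
    """Return one dict per item found in text, in order of appearance."""
    items = []
    for m in ITEM_RE.finditer(text):  # advances to the next SKU until none remain
        total = Decimal(m["total"])
        items.append({
            # When "qty X $price" is omitted, the quantity is taken to be 1
            # and the unit price equals the line total.
            "qty": int(m["qty"]) if m["qty"] else 1,
            "unit_price": Decimal(m["unit"]) if m["unit"] else total,
            "sku": m["sku"],
            "description": m["desc"],
            "total": total,
            "discount": Decimal(m["disc"]) if m["disc"] else Decimal("0"),
        })
    return items


# Made-up sample data that follows the assumed layout.
sample = (
    "2 X $10.00 7501234567890 LECHE ENTERA 1L $20.00 "
    "7509876543210 PAN BLANCO $15.50 TMP OFERTA -$2.00"
)

for item in parse_items(sample):
    print(item)
```

If the real string uses a different price format, a different SKU length, or a description that spans line breaks, the pattern would need adjusting (for example by passing `re.DOTALL`).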

String Sample.txt (2.0 KB)

Sorry, this is a sample of a string that contains the Quantity, Price, SKU and Product Description.
It’s difficult to see at first sight, but you’ll notice that the pattern I mentioned is there. The only thing I didn’t go into detail about is that “TMP OFERTA” is a discount line that applies to the previous line.

As an example, this is how the first few lines should look. Sorry that it’s in Spanish; I can change it if needed.


There is certainly some cleaning to do, but if you use the CSV Reader with a column delimiter of something like “X $” and then transpose your table, you might have a starting point.
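
To show what splitting on that delimiter gives you, here is a small Python sketch of the same idea. The sample string is made up and the function name `split_on_delimiter` is hypothetical; the real data may well look different.

```python
def split_on_delimiter(text, delimiter="X $"):
    """Split text on the delimiter and trim whitespace from each chunk."""
    return [chunk.strip() for chunk in text.split(delimiter)]


# Made-up sample in the assumed "qty X $price sku description $total" layout.
sample = (
    "2 X $10.00 7501234567890 LECHE $20.00 "
    "3 X $5.00 7509876543210 PAN $15.00"
)

for chunk in split_on_delimiter(sample):
    print(chunk)
```

On this sample the quantity of each item ends up at the tail of the previous chunk, which is part of the cleaning mentioned above. Items where the quantity is 1 and the “X $” part is omitted would not be split off at all, so they would need separate handling.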

