Regex to insert values into a column

I have a column of data that looks something like this:

OR:185520, TR: 3956, CO:8, PR:BC6-1-A2, Gross QTY:5003.
OR:185583, TR: 29771, CO:1, PR:BC1-1-A1, Gross QTY:5001.
OR:186772, TR: 32438, CO:4, PR:BC6-1-A1, Gross QTY:6703.
OR:186728, TR: 32359, CO:2, PR:BC2-1-A1, Gross QTY:3900.
OR:187641, TR: 32438, CO:1, PR:BC6-1-A3, Gross QTY:6302.
OR:922609, TR: 170412, CO:5, PR:P-MN52, Gross QTY:5797.
OR:19956, TR: 1032, CO:1, PR:P-MN23, Gross QTY:4000.
OR:14496, TR: 1024, CO:7, PR:P-MN32, Gross QTY:5000.

The value that follows PR: is in either a BC#-1-A# or a P-MN## pattern. In my current workflow I have one regex module for each pattern, which puts the values into separate columns. The 2 patterns configured are


The columns are merged into a single column later, which in a roundabout way meets my requirement.

The problem is that this generates a lot of errors, simply because for every row that is processed one of the regex modules will fail. What I would like to do is pick up both patterns with a single Regex module. That would remove 6 modules from my workflow.

I hope someone may have some suggestions.




I have split the data using two nodes:

1) I used the "String Replacer" node to replace all occurrences of "," with ":"

2) I used the "Cell Splitter" node and used ":" as the delimiter

The Arr[7] column now contains the PR values you want. Of course, this will only work if your data is always in the same format.

Or you could do this with one node:

Use the "Cell Splitter" node with "," as the delimiter. Arr[3] will contain your PR information.
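To make the two splitting options concrete outside the workflow tool, here is a minimal Python sketch that mimics them on one of the sample rows. It is only an illustration: the variable names are made up, and it assumes the Arr[...] columns are counted from zero, which is what the Arr[7] and Arr[3] positions above imply.

```python
row = "OR:185520, TR: 3956, CO:8, PR:BC6-1-A2, Gross QTY:5003."

# Two-step option: replace every "," with ":" and then split on ":".
parts = [p.strip() for p in row.replace(",", ":").split(":")]
pr_two_step = parts[7]   # zero-based position 7 holds the PR value

# One-step option: split on "," only; position 3 holds the whole "PR:..." field.
fields = [f.strip() for f in row.split(",")]
pr_field = fields[3]

print(pr_two_step)
print(pr_field)
```

Both options depend on PR: always sitting in the same position within the row.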


Hi Macca,

The problem is that the order of the data in the single column I am looking at is not always identical, but the format of the data components is. So in the above example the data components are OR: TR: CO: PR: Gross QTY, but there may occasionally be a slight variation, for example Gross QTY: may be QTY:, which I can easily manage. The tricky part is that the order may not always be the same, which is why I am using regex to extract the data I need.

My workflow, as is, is producing the results, but I would like to streamline the flow a bit by keeping modules to a minimum and getting rid of the unnecessary errors in the log. This flow will need to work through about 2-3 million rows of data a day across about 4 different source sets, and this will easily grow exponentially, so optimisation where possible is important.

Thanks for your feedback, but for this flow I am going to have to stick with the regex. Hopefully I will be able to optimise it a bit more.
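One way to pick up both PR formats with a single expression is regex alternation, sketched below in Python. The pattern is an assumption based only on the two formats described in the question (BC#-1-A# and P-MN##); the names `PR_PATTERN` and `extract_pr` are hypothetical, and the pattern has not been tried in the workflow tool itself. The `|` alternation, `\d` and `\s` also exist in Java regular expressions, so the same idea should carry over if the Regex module uses that syntax, though the capture-group settings may need adjusting.

```python
import re

# Assumed pattern: "PR:" followed by either BC<digits>-1-A<digits> or P-MN<digits>.
# The alternation inside one capture group covers both formats.
PR_PATTERN = re.compile(r"PR:\s*(BC\d+-1-A\d+|P-MN\d+)")

def extract_pr(row):
    """Return the PR value from a row, or None if neither format is present."""
    match = PR_PATTERN.search(row)  # search() finds PR: wherever it sits in the row
    return match.group(1) if match else None

rows = [
    "OR:185520, TR: 3956, CO:8, PR:BC6-1-A2, Gross QTY:5003.",
    "OR:922609, TR: 170412, CO:5, PR:P-MN52, Gross QTY:5797.",
    "PR:P-MN23, OR:19956, QTY:4000, TR: 1032, CO:1.",  # made-up reordered row
]
for row in rows:
    print(extract_pr(row))
```

Because the expression is anchored on the PR: label rather than on a field position, it does not care about the order of the components, and every row is handled by one pattern instead of one of two patterns failing each time.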