Extracting Text with Regex

Hi guys,

I have a text column with a blob of text in each cell. Each blob begins with an alphanumeric code (for example, 4f5ghjhjkll77xk) followed by a line break, and then the rest of the text. So, it looks like this:

some text blah blah blah
some more text blah blah blah

So, I am trying to extract the alphanumeric code at the top, and my regex works when I test it in a regex tester. Here is my regex:


I tried using the Regex Split node and it is not giving me what I need. I’ve also been trying Java and Python snippets, but I’m having problems with those too.

Could somebody please help me out with this?



IMHO, it can be done without regex. Just attach a “String Manipulation” node and enter an expression like this:


You will get the substring that begins with the first character and ends at the first occurrence of a line break.
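
For anyone going the Python snippet route instead, here is a minimal sketch of the same idea: take everything before the first line break. The function names and the regex variant are my own assumptions for illustration (the original poster's regex is not shown in this thread), and the regex assumes the code consists only of ASCII letters and digits.

```python
import re


def leading_code(blob: str) -> str:
    """Return the text before the first line break (the whole string if there is none)."""
    first_line = blob.partition("\n")[0]
    # Drop a trailing carriage return in case the text uses Windows line endings.
    return first_line.rstrip("\r")


# Hypothetical regex equivalent: a run of letters/digits at the very start of the blob.
CODE_PATTERN = re.compile(r"\A[A-Za-z0-9]+")


def leading_code_regex(blob: str):
    """Return the leading alphanumeric run, or None if the blob does not start with one."""
    match = CODE_PATTERN.match(blob)
    return match.group(0) if match else None


sample = "4f5ghjhjkll77xk\nsome text blah blah blah\nsome more text blah blah blah"
print(leading_code(sample))        # 4f5ghjhjkll77xk
print(leading_code_regex(sample))  # 4f5ghjhjkll77xk
```

The plain substring version makes no assumption about which characters the code contains, so it is probably the safer choice if the codes ever include dashes or underscores.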

Martin K.


Martin, first of all, thank you! This, however, only partially helped me. I was hoping to learn a method that would also let me extract other text from that same column, and I’m not sure how to do that with this approach. Or do you think it can be adapted to extract other text? Every cell has some structured text in it, and I want to use that structure to get the information out. For example, the blob in each cell has a portion that looks like this:

Name: Some Name

Phone: 555-555-5555

Address: 123 Street St

Etc: etc

How do I get these out as well?

Hi cageybee,

It’s a complex question. For this particular sample, a regular expression like the following might work:

Name: (.*)\n\nPhone: (\d+-\d+-\d+)\n\nAddress: (.*)\n\nEtc: (.*)

You will get four new columns containing Name, Phone, Address and Etc. Try to analyse your data and separate the static parts of your samples from the dynamic ones. I also recommend visiting a site like regex101.com, where you will be able to find and test a suitable expression that covers your sample set as well as possible.
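
If a single long pattern turns out to be too brittle (for example, when a field is missing or the number of blank lines varies), a per-line match may be more forgiving. Below is a rough Python sketch under the assumption that each field sits on its own line as "Label: value"; the function name, the pattern, and the list of labels are hypothetical and taken only from the sample above.

```python
import re

# Labels are assumed from the sample in this thread; extend the alternation as needed.
FIELD_PATTERN = re.compile(r"^(Name|Phone|Address|Etc):[ \t]*(.*)$", re.MULTILINE)


def extract_fields(blob: str) -> dict:
    """Map each recognised label to the text that follows it on the same line."""
    # findall returns (label, value) tuples; strip() removes stray spaces or "\r".
    return {label: value.strip() for label, value in FIELD_PATTERN.findall(blob)}


sample_blob = (
    "4f5ghjhjkll77xk\n"
    "Name: Some Name\n\n"
    "Phone: 555-555-5555\n\n"
    "Address: 123 Street St\n\n"
    "Etc: etc\n"
)

fields = extract_fields(sample_blob)
print(fields["Name"])   # Some Name
print(fields["Phone"])  # 555-555-5555
```

Because each label is matched independently, a blob that lacks one of the fields still yields the others, whereas the single combined expression would fail to match at all.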

Regards !

Martin K.

