Regex split?

Hi everybody,

I have hundred of thousand of rows with free text. I would like to extract all words 'the' and the next and previous words. I don't kwow how many time the word the is present in the text and I would not like to extract words that contain 'the' (e.g. 'There')

E.g.

1) I like the Afro school. I want to buy the book related to the CC. There is something interesting inside

2) The pen is on the table. THE, is the best film.

3) I like the sun today

I would like to get:

 

I like the Afro school. I want to buy the book related to the CC. There is something interesting inside like the Afro buy the book to the CC
The pen is on the table. THE, is the best film. The pen . THE, is the best
I like the sun today like the sun    

Please could you help me?

Thanks in advance

How about String Manipulation with regexReplace ? The idea would be to search for ( )+(the|The|THE)+( )+, i.e. "the" preceded and followed by a space. Replace the space this by "_the_". Now you could use Cell Splitter with space as delimiter. Unpivot the newly created columns by using wildcard for selecting the variables to unpivot, after that filter all the rows that do not contain "the".

Many Thanks!