repeated words

Hello,

I wanted to ask how I could identify a comment in which a word is repeated two or more times. For example, "Today there is a really really important event" or "hurry hurry, its the last day".

I found this regex, (\b\w+\b)\W+\1, but it doesn't really work: it only lets me get comments that consist of just the 2 repeated words, like "hurry hurry".

I would like to filter out every entire sentence/comment that contains a repeated word. Probably using a Java Snippet Row Filter? But I am not sure what code I should use.

Any help is welcome!

Thanks,

 

Francisco

Depending on what you are using to check the match, you might need to make the regex match the entire string, for example by adding .* to the beginning and end (or .*? to make it lazy):

.*(\b\w+\b)\W+\1.*
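
A minimal Java sketch of that difference, assuming the check is done with java.util.regex. The class and method names are hypothetical and not part of any KNIME API. String.matches has to consume the whole string, while Matcher.find looks for the pattern anywhere inside it:

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not a KNIME API.
class RepeatedWordCheck {
    // The regex from the question, unchanged.
    static final String REGEX = "(\\b\\w+\\b)\\W+\\1";
    static final Pattern REPEATED = Pattern.compile(REGEX);

    // Whole-string match: true only if the pattern consumes the entire comment.
    static boolean matchesWhole(String comment) {
        return comment.matches(REGEX);
    }

    // Whole-string match with .* added on both sides of the regex.
    static boolean matchesPadded(String comment) {
        return comment.matches(".*" + REGEX + ".*");
    }

    // Substring search: true if the pattern occurs anywhere in the comment.
    static boolean findsAnywhere(String comment) {
        return REPEATED.matcher(comment).find();
    }

    public static void main(String[] args) {
        String comment = "hurry hurry, its the last day";
        System.out.println(matchesWhole(comment));  // false
        System.out.println(matchesPadded(comment)); // true
        System.out.println(findsAnywhere(comment)); // true
    }
}
```

One thing to keep in mind: by default . does not match line breaks, so for comments that span several lines the padded version would need Pattern.DOTALL (or a leading (?s)), whereas find() is not affected.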

 

Steve

How about: (\b\w+\b)(\s\1)+

 

(\b\w+\b) is your word. Putting it in brackets makes it a capturing group.

\s is your whitespace, such as a space or line break.

\1 is a backreference to the first set of brackets, i.e. your word.

And having (\s\1) in brackets as another capturing group allows us to follow it with a +, so that it matches 1 or more times.

I've not tried it, but hopefully it's right.

PS. Just tried this out now and it works as expected. Simply use the String Replacer node and choose Regular Expression. For the replacement text, enter $1, which refers to the first capture group (i.e. the contents of the first brackets). And choose to replace all occurrences.
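
For anyone who prefers to do the same replacement in code, here is a rough Java sketch. It assumes plain java.util.regex, and the class and method names are made up for illustration. It also uses a slightly changed pattern: the \s+ and the trailing \b are additions that are not in the regex above. Without the trailing \b, a string like "the theory" would be read as "the the" followed by "ory".

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not a KNIME API.
class RepeatedWordCollapser {
    // Variant of the regex above. Assumption: \s+ and the trailing \b are
    // added here so that a repeat must be a whole word, not a word prefix.
    static final Pattern REPEATED = Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+");

    // Replaces each run of a repeated word with a single copy of that word.
    static String collapse(String comment) {
        return REPEATED.matcher(comment).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "Today there is a really important event"
        System.out.println(collapse("Today there is a really really important event"));
        // prints "hurry, its the last day"
        System.out.println(collapse("hurry hurry, its the last day"));
        // prints "the theory" (left unchanged)
        System.out.println(collapse("the theory"));
    }
}
```

The match is case sensitive, so "Hurry hurry" would not be collapsed unless the pattern is compiled with Pattern.CASE_INSENSITIVE.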

Simon.

Thank you very much, I will try it now and let you know if it works!
