Help with a simple regex problem...


I'm trying to extract an email address from a string and I'm having a case of brain freeze. The approach I would use in Perl/Java isn't working, and I'm not sure whether I'm missing something.

Here are sample strings (each block is one line, disregard the wrapping):

[This post has been edited by KNIME since it contained confidential customer data.]

My regex that kind of works (a published email validation regex doesn't work, so I simplified for now):


The one that I would love to put in place is closer to:

.*\s([_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,}))\.? (but again, it doesn't work)


The main problem is that nothing I do to remove the trailing period works. I have tried:




Any thoughts?




If you could specify how they do not work, that might help diagnose the problem.

Here, you can see my attempt with the following regex: [\s<](\w+@\w+(?:\.\w+)+)
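In case it helps, here is a minimal Java sketch of how that pattern behaves when it is used to search inside a string with `find()`. The class name, the method name and the two sample strings are made up for illustration (the original samples are no longer in the thread). Note that `(?:\.\w+)+` needs a word character after every dot, so a sentence-ending period is left out of group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFind {
    // Pattern quoted from this post; group 1 is the address itself.
    private static final Pattern EMAIL =
            Pattern.compile("[\\s<](\\w+@\\w+(?:\\.\\w+)+)");

    // Returns the first address found anywhere in the text, or null if none.
    static String firstEmail(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not the poster's data.
        System.out.println(firstEmail("Contact: Jane Doe <jane@example.org>."));
        System.out.println(firstEmail("E-mail: 12345@example.com."));
    }
}
```

This only shows the search-style behaviour; whether KNIME applies the pattern the same way is a separate question.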

Cheers, gabor

Thanks, this helps a bit, but I'm still missing some matches. Here's my modified one (based on yours):


On the list below, this matches the right things on: (pasting the list at the end)

But when I run it within KNIME, none of the numeric emails (except for "") match.

[This post has been edited by KNIME since it contained confidential customer data.]


Hi again,

To clarify, the data I posted was public data from GenBank (I would never post confidential info on a forum like this). I think there's either a bug in how KNIME is parsing the string, or some slight difference in the interpretation of the regex that I'm missing. I can use the regex in Java/Perl without issues, but KNIME misses many of the hits. The message I get is not that useful:

"545 input string(s) did not match the pattern or contained more groups than expected". Is there a way to increase verbosity?

I think it has to do with the end of string handling, but nothing I try works.
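One thing worth ruling out is the difference between searching and whole-string matching. The sketch below is plain Java, not KNIME code, and it rests on an unverified assumption: that the node applies the pattern to the entire input string, the way `Matcher.matches()` does, rather than searching inside it like `Matcher.find()`. The class name, helper names and sample string are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeStringMatch {
    // Address part of the pattern suggested earlier in the thread.
    static final String CORE = "(\\w+@\\w+(?:\\.\\w+)+)";

    // Search anywhere in the string (how a Perl-style match behaves).
    static String byFind(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Require the pattern to cover the whole string.
    static String byMatches(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "Submitted by J. Smith, E-mail: 12345@example.com."; // made-up line
        String bare = "[\\s<]" + CORE;
        String padded = ".*[\\s<]" + CORE + ".*"; // absorbs text before and after

        System.out.println(byFind(bare, s));      // search finds the address
        System.out.println(byMatches(bare, s));   // whole-string match fails
        System.out.println(byMatches(padded, s)); // padded pattern succeeds
    }
}
```

If that assumption holds, a pattern that works with a Perl-style search would fail on every row that has any text before or after the address, trailing period included, unless the pattern itself consumes that text. Since `.` does not match line breaks by default in Java, input containing embedded newlines would also need the `(?s)` flag on the padded pattern.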