I'm trying to extract an email address from a string and I'm having a case of brain freeze. The approach I would use in Perl/Java isn't working and I'm not sure if I'm missing something.
Here are sample strings (each block is one line, disregard the wrapping):
[This post has been edited by KNIME since it contained confidential customer data.]
My regex that kinda works (a published email validation regex doesn't work, so I simplified for now):
.*\s(.*@.*)
The one that I would love to put in place is closer to:
.*\s([_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,}))\.? (but again, it doesn't work)
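One thing worth checking with the longer pattern: the doubled backslashes are what Java source needs inside a string literal. Typed as-is into a regex field, `\\.` means a literal backslash followed by any character, so it would not match the dot in an address. The nested capturing groups also raise the group count from one to four, which may matter if the tool expects a fixed number of groups (an assumption, I have not confirmed how KNIME counts them). Below is a minimal Java sketch of the same idea with single escaping and non-capturing groups. The `*` after the first dotted group, the class name `EmailRegexSketch`, and the sample string are my own assumptions and not taken from the original data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexSketch {
    // Raw regex, as it would be typed into a dialog field (single backslashes):
    //   .*\s([_A-Za-z0-9+-]+(?:\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})\.?
    // In Java source every backslash is doubled because of string-literal escaping.
    static final Pattern EMAIL_AT_END = Pattern.compile(
        ".*\\s("
        + "[_A-Za-z0-9+-]+(?:\\.[_A-Za-z0-9-]+)*"   // local part, optional dotted pieces
        + "@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*"     // domain labels
        + "\\.[A-Za-z]{2,}"                          // top-level domain
        + ")\\.?");                                  // optional trailing period, outside the group

    // Returns the captured address, or null when the whole line does not match.
    static String extract(String line) {
        Matcher m = EMAIL_AT_END.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not real data.
        String sample = "Submitted by J. Doe, Example Institute. E-mail: jdoe@example.org.";
        System.out.println(extract(sample)); // prints jdoe@example.org
    }
}
```

Only group 1 is a capturing group here; the `(?:...)` groups are used purely for repetition, so the trailing period stays outside the captured text.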
The main problem is that nothing I do to remove the trailing period works. I have tried:
To clarify, the data I posted was public data from GenBank (I would never post confidential info on a forum like this). I think there's either a bug in how KNIME parses the string, or some slight difference in how it interprets the regex that I'm missing. I can use the regex in Java/Perl without issues, but KNIME misses many of the hits. The message I get is not that useful:
"545 input string(s) did not match the pattern or contained more groups than expected". Is there a way to increase verbosity?
I think it has to do with the end of string handling, but nothing I try works.
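To narrow down the end-of-string suspicion outside KNIME, one option is to compare whole-string matching against substring matching for each input. This is a rough Java sketch, assuming (unconfirmed) that the node requires the entire string to match, the way `Matcher.matches()` does, whereas a typical Perl or Java test often uses a substring search like `Matcher.find()`. The class name `MatchDiagnostics` and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchDiagnostics {
    // Reports how a pattern behaves on one input under both matching modes.
    static String diagnose(Pattern p, String input) {
        Matcher whole = p.matcher(input);
        if (whole.matches()) {
            // groupCount() is the number of capturing groups in the pattern.
            return "full match, groups=" + whole.groupCount();
        }
        if (p.matcher(input).find()) {
            return "partial match only";
        }
        return "no match";
    }

    public static void main(String[] args) {
        Pattern simple = Pattern.compile(".*\\s(.*@.*)");
        // Hypothetical inputs; the second one ends with a line break.
        System.out.println(diagnose(simple, "Contact: jdoe@example.org."));
        System.out.println(diagnose(simple, "Contact: jdoe@example.org.\n"));
        System.out.println(diagnose(simple, "no address here"));
    }
}
```

The second input is the interesting one: `.` does not match a line terminator by default, so a stray trailing newline is enough to make a whole-string match fail while a substring search still succeeds. If that turns out to be the cause, the `(?s)` flag or stripping whitespace before matching would be things to try.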