Extract Specific Mail from the each cell

Hi Team,

Good Day,

I want to extract the Mail ID in the description column of the attached excel sheet. please kindly provide me with an update.

I have tried using the Excel reader node to read the excel file and then column filter to get description column and cell splitter node to split the cell but the Mail ID which I want to get is in different positions in each cell.

Please suggest me with the nodes and logic to get it.

The Mail ID that I required is in the line starting with Change Owner.
change_request.xlsx (17.4 KB)

Thanks,
Subramanyam Kinthada

Assuming you’re trying to extract an email, I’d just use Regex in a Regex Extractor node.

The expression (?<Extracted Change Owner>Change\sOwners.* \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b) is a good start.

Depending on your goal, you may need to another extractor node on the output of this one.

Example end result:

If you have multiple email addresses that match, you’ll need to do more work to extract the one you want.

4 Likes

Hello @Subramanyam
I’ve tried to approach by coding it out of KNIME, and implement it within a ‘String Manipulation’ node.

regexReplace(
	regexReplace(
		regexReplace(
			$Description$, "((?:).*\\n+)+(.*Change Owners.*:[\\s+]|.*Change Owners[\\s+]\\/[\\s+])(\\n.*|.*)((?:).*\\n|.*)+", "$3"
			)
	 	, "[^0-9a-zA-Z.@\\s]+", " "
	 	)
	 , "(?s)(?:(.*?)(\\S+@\\S+.com)((\\s+\\d+|\\d+)+)?(\\s+\\S+|\\S+)?)", "$2 | "
	 )

As @elsamuel mentioned the casuistic is overwhelming. For example in Row2 ‘Change Owners’ is a whole paragraph; unformatted with plenty of line breaks and emails. The logic rule applied for this case is ‘emails before a line break’, capturing ‘chandra.mr@pg.com’

In Row1, there is the possibility of capturing two emails as they are in the same text line; and so on…

I would be very careful to apply it, QC would be needed.

BR

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.