I have the following text (example):
A0630 PATTERSON - 1JAN12 TO 31DEC12
I need to extract Patterson from it by excluding certain
I used the following regex to find the pieces I want to exclude, but cannot find a way to get the remainder:
[a-zA-Z]\d{4}|[0-9]{1,2}[a-zA-Z]{3}[0-9]{2}
Hi @IrynaK, judging by your regex attempt, the data seems to follow pretty much the same format as the line you presented.
If that's the case, would it make sense to simply take the substring up to the dash (-) and then remove the A0630? Actually, you could also take the substring between the first space and the dash (-).
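To make the substring idea concrete, here is a minimal Python sketch (outside KNIME). The function name `between_first_space_and_dash` is made up for illustration, and it assumes the layout "ID, space, name, dash, dates" seen in the sample line:

```python
def between_first_space_and_dash(text: str) -> str:
    """Return the part of text after the first space and before the first dash."""
    # Keep everything before the first dash (the whole text if there is no dash).
    head, _, _ = text.partition("-")
    # Drop everything up to and including the first space (the leading ID).
    _, _, rest = head.partition(" ")
    return rest.strip()


print(between_first_space_and_dash("A0630 PATTERSON - 1JAN12 TO 31DEC12"))  # PATTERSON
```

This only holds up while the text really follows that layout; it says nothing about rows where the dash or the ID is missing in some other way.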
Thank you!
The text may or may not have dashes; it is all over the place. It is free-form text. There are just some dates in different formats and IDs that I have identified with regex and need to exclude, and I want to see the rest of it.
What is left after the dates and the ID are extracted will be any number of words, or nothing; the text may or may not have dates, and it may or may not have IDs.
(Don't worry about the different date formats; I have already accounted for those.)
J5667 want to cancel the contract
J7695 20211001-20250930 INV12356
Hi @IrynaK, I agree with @bruno29a that it does sound rather open-ended. The joy of trying to discern useful information from free text!
Would it be an option maybe to remove any "word" sequence that also contains at least one digit and see what that leaves behind? Is that what you were originally aiming for?
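As a rough illustration of that idea, the Python sketch below (not a KNIME node configuration) splits on whitespace and drops every token that contains a digit. The helper name `drop_tokens_with_digits` is hypothetical:

```python
import re


def drop_tokens_with_digits(text: str) -> str:
    """Keep only the whitespace-separated tokens that contain no digit."""
    kept = [tok for tok in text.split() if not re.search(r"\d", tok)]
    return " ".join(kept)


print(drop_tokens_with_digits("A0630 PATTERSON - 1JAN12 TO 31DEC12"))  # PATTERSON - TO
print(drop_tokens_with_digits("J5667 want to cancel the contract"))    # want to cancel the contract
print(repr(drop_tokens_with_digits("J7695 20211001-20250930 INV12356")))  # ''
```

Note that connecting words such as "TO" and the dash survive in the first sample, so some further clean-up would still be needed to end up with PATTERSON alone.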
As @bruno29a has mentioned, ideally you need a good amount of representative sample data to be able to make better suggestions or find any patterns.
As it stands, it looks like every row needs a different hand-crafted regex!
I have handled the expected formats I want to exclude. My regex says "A | B | C", which finds those parts of the text, and Regex Extractor extracts them. But what I want is a regex for "EXCLUDE A | B | C". The help I am looking for is how to handle EXCLUDE in the Regex Extractor.
I'm struggling with that in Regex Extractor too, and more generally with getting an "everything except" out of regex, either inside or outside of KNIME.
I was a little confused because I thought you were saying you ONLY wanted "PATTERSON" from your original sample data, but if you are happy that your original regex does what you need, then maybe using it much like @bruno29a did, with String Manipulation instead, to "rub out" the matches using regex is a possibility:
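The "rub out" approach amounts to replacing every match of the exclusion regex with nothing and keeping what remains. Below is a minimal sketch in Python rather than as a String Manipulation expression; it uses the two-alternative regex from the original question, and the names `EXCLUDE` and `remainder` are made up for illustration:

```python
import re

# Alternation from the question: an ID (letter + 4 digits) or a date such as 1JAN12.
EXCLUDE = re.compile(r"[a-zA-Z]\d{4}|[0-9]{1,2}[a-zA-Z]{3}[0-9]{2}")


def remainder(text: str) -> str:
    """Blank out every match of EXCLUDE and collapse the leftover whitespace."""
    stripped = EXCLUDE.sub(" ", text)
    return " ".join(stripped.split())


print(remainder("A0630 PATTERSON - 1JAN12 TO 31DEC12"))  # PATTERSON - TO
print(remainder("J5667 want to cancel the contract"))    # want to cancel the contract
```

Replacing is the inverse of extracting, so the same "A | B | C" pattern can be reused unchanged instead of trying to write a negated regex.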
Thank you all for helping! My full multi-condition regex did not work in the String Manipulation node.
However, I have solved it differently.
I used the Strings to Document node and then Regex Filter, and it worked!
Thank you everyone again!
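For readers outside KNIME, a token-level filter in the same spirit as the solution above might look like the Python sketch below. This is only an analogy: it assumes whitespace tokenisation and whole-token matching, and it is not a description of how the Regex Filter node is implemented. The names `EXCLUDE` and `filter_tokens` are hypothetical.

```python
import re

# Same two alternatives as in the original question.
EXCLUDE = re.compile(r"[a-zA-Z]\d{4}|[0-9]{1,2}[a-zA-Z]{3}[0-9]{2}")


def filter_tokens(text: str) -> str:
    """Drop every whitespace-separated token that matches EXCLUDE in full."""
    return " ".join(tok for tok in text.split() if not EXCLUDE.fullmatch(tok))


print(filter_tokens("A0630 PATTERSON - 1JAN12 TO 31DEC12"))  # PATTERSON - TO
```

Because a token is dropped only when it matches in full, a longer token that merely contains an ID-like fragment is left intact.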