Extracting text between two consecutive date strings

Jhuma · September 23, 2020, 10:12am

Hi
I am working on text classification. I have ticket logs, and for my study I need to extract the text between the first two dates (which can be in different formats). I solved part of it using String Replacer (to remove newlines) and Regex Extractor. The problem is that the same text contains dates in different formats, and I am not sure how to use an if statement here. The logic should be: if the text starts with one of the date formats, discard that date, take the text until the next date in any format, stop there and keep only the text before that date. If the text does not start with any date format, it should still take the text from the beginning and search for the next date to end at. I am attaching a file for better understanding, along with the regex I used. I am not able to put everything together to get the solution in one go. Can anyone help me with this?
Book1.xlsx (11.8 KB)
Doc1.docx (522.1 KB)
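The logic described above (skip a leading date if there is one, then stop at the next date) can be sketched outside KNIME as a minimal Python example. The attachments are not available here, so the three date formats below are hypothetical placeholders, and `extract_first_segment` is an illustrative name, not an existing KNIME or library function. The "if statement" reduces to a single check: does a date match at position 0?

```python
import re

# HYPOTHETICAL date formats: the real ones are in the attachments,
# so replace these patterns with the formats that occur in your logs.
DATE_PATTERNS = [
    r"\d{1,2}/\d{1,2}/\d{4}",  # e.g. 23/09/2020
    r"\d{4}-\d{2}-\d{2}",      # e.g. 2020-09-23
    r"(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December) \d{1,2}, \d{4}",  # e.g. September 23, 2020
]

# One combined pattern that matches any of the formats above.
DATE_RE = re.compile("|".join(f"(?:{p})" for p in DATE_PATTERNS))


def extract_first_segment(text: str) -> str:
    """Return the text between an optional leading date and the next date."""
    # Collapse newlines and repeated whitespace (like the String Replacer step).
    text = " ".join(text.split())
    start = 0
    leading = DATE_RE.match(text)  # does the text start with a date?
    if leading:
        start = leading.end()  # yes: discard that date
    nxt = DATE_RE.search(text, start)  # first date after the start position
    end = nxt.start() if nxt else len(text)  # no further date: keep the rest
    return text[start:end].strip()


print(extract_first_segment("23/09/2020 Printer not working. 24/09/2020 Closed."))
```

If no second date is found, the sketch keeps everything after the optional leading date; whether that is the desired behaviour for your tickets is an assumption you may want to change.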

Alice_Krebs · September 25, 2020, 3:11pm

Hi @Jhuma

Which KNIME version are you working with? What does the output of your Regex Extractor look like? I think something went wrong with the files you wanted to attach here.
Have you tried the Sentence Extractor node? https://kni.me/n/DliTHxP_U0Ji3Wb0

Jhuma · October 4, 2020, 3:24am

Hi Alice,
I am using KNIME 4.1.3. I don't know how the Sentence Extractor would help. I am attaching the files again. In the Excel file, “Text to be extracted” is the text I want to get out of “Text”.
Doc1.docx (524.0 KB)
Book1.xlsx (11.8 KB)
Regards
Jhuma

Alice_Krebs · October 6, 2020, 12:11pm

Hi @Jhuma

I uploaded a workflow that might point in the right direction: https://kni.me/w/BAzZRDqaZSjxS_g_
It contains two proposed solutions, one using regex and one using the text processing nodes. Hope that helps.
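The linked workflow is not reproduced here. As an independent illustration of the regex route, a single pattern can replace the if statement: an optional group consumes a leading date, and a lazy capture stops at the next date or at the end of the text. This is a minimal Python sketch; the date formats are hypothetical examples (the real ones are in the attachments) and `first_segment` is an illustrative name.

```python
import re

# HYPOTHETICAL date formats, combined into one alternation; adjust to the real ones.
DATE = (
    r"(?:\d{1,2}/\d{1,2}/\d{4}"
    r"|\d{4}-\d{2}-\d{2}"
    r"|(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December) \d{1,2}, \d{4})"
)

# ^\s*(?:DATE\s*)?  optionally consume one leading date (replaces the "if")
# (.*?)             lazily capture as little text as possible...
# \s*(?=DATE|$)     ...up to the next date or the end of the text
# re.DOTALL lets "." cross newlines, so no newline removal is needed first.
SEGMENT_RE = re.compile(rf"^\s*(?:{DATE}\s*)?(.*?)\s*(?={DATE}|$)", re.DOTALL)


def first_segment(text: str) -> str:
    """Return the text between an optional leading date and the next date."""
    m = SEGMENT_RE.match(text)
    return m.group(1) if m else ""


print(first_segment("User reports login issue on 2020-09-23 and again later"))
```

Java's regex engine also supports lookaheads, lazy quantifiers and the inline `(?s)` flag, so a similar pattern may carry over to a regex-based node, but that is untested here and depends on how the node applies the pattern.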

Best,
Alice

system · April 7, 2021, 12:11am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.