Sentence Extractor using Regex

After much thought, I don’t think KNIME is suitable for this kind of work. I ran into issues with the Regex Split node when dealing with the PDF data, so I switched to Python’s regex split (using the Python Script node), which was also not great. I then switched to the Word document as the source, and that worked better with Python’s regex split. If you are familiar with JavaScript, you can also use the Column Expressions node to write the code directly.
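To give an idea of what the Python regex split can look like, here is a minimal sketch. It is not the code from the attached workflow. The function name `split_sentences` and the boundary rule (a `.`, `!` or `?` followed by whitespace and then an uppercase letter or digit) are my own assumptions for illustration, and you will probably need to adjust the pattern to your document.

```python
import re
from typing import List

# Assumed boundary rule: sentence-ending punctuation, then whitespace,
# then an uppercase letter or digit that starts the next sentence.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences using a simple regex boundary (hypothetical helper)."""
    # Collapse line breaks and repeated spaces left over from document extraction.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    return SENTENCE_BOUNDARY.split(normalized)


sample = "KNIME reads the file. Python splits it!\nDoes it work? Mostly."
for sentence in split_sentences(sample):
    print(sentence)
```

Inside a Python Script node you would apply a function like this to the text column. Keep in mind that a rule this simple also splits after abbreviations such as "e.g." when the next word is capitalized, which may be part of why the PDF text gave worse results than the Word document.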

Going deeper into each generated section will require more work. Here is what I can provide as a starter in case you want to try:




structured_taxonomy.knar.knwf (115.0 KB)
