I have a situation.
I want to read all the sentences in a PDF and keep only those sentences that contain certain words from a set listed in a separate table. Kindly help.
I have been able to get part of the way so far.
Attaching my workflow for reference:
- Top table - parsed the PDF as a single row, created a bag of words, and am trying to extract selected rows from the column “MOA”.
- Bottom table - parse the PDF, extract sentences, filter the rows of my choice, and then extract only the relevant rows.
I am stuck at both places.
Kindly help, and also suggest a better way to search a single PDF for certain words, if there is one.
KNIME_PDF Parser MOA.knwf (43.7 KB)
Hi @Suhas and welcome to the forum -
I didn’t spend a lot of time thinking about your approach, since I don’t have your original data files to play with, but I think I can at least help with the syntax errors in your rules that are causing the Rule-based Row Filter (Dictionary) nodes in both branches to fail.
Here you need to include some additional escaped quotes, like \", in a few places. Here’s how I modified the expression in the String Manipulation node in your bottom branch, for example:
Does that help?
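To illustrate the quoting issue in general terms, here is a minimal Python sketch of building one rule line per dictionary term. It is not the expression from the workflow: the column name `Sentence`, the sample terms, and the helper `build_like_rule` are all hypothetical, and the rule shape (`$column$ LIKE "*term*" => TRUE`) is assumed from the usual rule syntax. The point is only that the inner double quotes belong to the rule text, so they must be escaped when the rule is assembled inside a quoted string.

```python
def build_like_rule(column: str, term: str) -> str:
    """Build one rule line that matches rows whose column contains the term.

    Hypothetical helper for illustration only.
    """
    # The inner double quotes are part of the rule text itself, so they are
    # escaped (\") because the pieces are written inside double-quoted strings.
    return "$" + column + "$ LIKE \"*" + term + "*\" => TRUE"


# Hypothetical search terms standing in for the rows of the dictionary table.
terms = ["capital", "shares"]

# "Sentence" is an assumed column name, not taken from the workflow.
rules = [build_like_rule("Sentence", term) for term in terms]

for rule in rules:
    print(rule)
```

Without the escaped quotes, the search pattern would not be quoted in the resulting rule text, which is the kind of syntax error a rule-based filter would reject.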
Thanks for your reply.
This solved the error in the Rule-based Row Filter node. However, after executing, it creates an empty table.
I need to find the paragraphs containing the words listed in the Table Creator node in the workflow.
I was unable to attach the source PDF file, so here is the link to it: https://www.gmrgroup.in/pdf/GEPL-MOA-and-AOA-August.pdf
Can you also suggest alternative ways to get paragraphs from a PDF, please?
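One possible alternative, sketched outside KNIME in plain Python, is to split the extracted text into paragraphs and keep those that contain any of the search words. This is only a sketch under assumptions: it presumes the PDF text has already been extracted into a string by some other tool, that paragraphs are separated by blank lines (which depends on the extractor), and that whole-word, case-insensitive matching is wanted. The sample text and keywords below are made up for illustration and are not taken from the linked PDF.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into paragraphs, assuming blank lines separate them."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse line breaks and repeated spaces inside each paragraph.
    return [" ".join(part.split()) for part in parts if part.strip()]


def filter_paragraphs(paragraphs: list[str], keywords: list[str]) -> list[str]:
    """Keep paragraphs containing at least one keyword as a whole word."""
    patterns = [
        re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keywords
    ]
    return [p for p in paragraphs if any(pat.search(p) for pat in patterns)]


# Made-up sample text standing in for text extracted from a PDF.
sample_text = """The authorised share capital of the company
is divided into equity shares.

The registered office will be situated in the state named in the clause.

The liability of the members is limited."""

# Hypothetical search words standing in for the dictionary table.
keywords = ["capital", "liability"]

matches = filter_paragraphs(split_paragraphs(sample_text), keywords)
for paragraph in matches:
    print(paragraph)
```

Matching on whole words avoids hits inside longer words (for example, "capital" inside "capitalised"). If substring matches are preferred, the `\b` anchors can be dropped.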