Filtering specific rows based on Rule Based Filter (Dictionary)

Suhas · February 5, 2021, 10:42am

Dear Knimers,

I have a situation.
I want to read all the sentences in a PDF and keep only those sentences that contain certain words from a set of words in a different table. Kindly help.
I have been able to get part of the way so far.

I am attaching my workflow for reference.

Top table - parsed the PDF as a single row, created a bag of words, and am trying to extract selected rows from column “MOA”.
Bottom table - parse the PDF, extract sentences, filter the rows of my choice, and then extract only the relevant rows.
I am stuck in both places.

Kindly help, and also suggest a better way, if there is one, to search a single PDF for certain words.

KNIME_PDF Parser MOA.knwf (43.7 KB)

Suhas

ScottF · February 5, 2021, 4:34pm

Hi @Suhas and welcome to the forum -

I didn’t spend a lot of time thinking about your approach, since I don’t have your original data files to play with, but I think I can at least help with the syntax errors in your rules that are causing the Rule-based Row Filter (Dictionary) nodes in both branches to fail.

Here you need to include some additional escaped quotes, like \", in a few places. Here’s how I modified the expression in the String Manipulation node in your bottom branch, for example:

Does that help?
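
The escaped-quote idea can also be sketched outside KNIME. The Python below is only an illustration, not the expression from the workflow: the column name `Sentence`, the sample words, and the helper names `build_rule` and `filter_sentences` are all assumptions made up for this example. It builds one rule string per dictionary word, using `\"` to get literal double quotes around the wildcard pattern, and then shows the equivalent "contains" check in plain Python.

```python
def build_rule(column: str, word: str) -> str:
    # The \" sequences place literal double quotes around the
    # wildcard pattern inside the generated rule text.
    # "column" is a hypothetical column name, not one from the workflow.
    return "$" + column + "$ LIKE \"*" + word + "*\" => TRUE"


def filter_sentences(sentences, words):
    # Keep a sentence if it contains any dictionary word.
    # The comparison is case-insensitive here, which is a choice
    # made for this sketch.
    lowered = [w.lower() for w in words]
    return [s for s in sentences if any(w in s.lower() for w in lowered)]


# Hypothetical dictionary words and sentences, for illustration only.
words = ["capital", "shares"]
sentences = [
    "The authorised capital of the company is divided into equity shares.",
    "The registered office is situated in the state.",
    "Shares may be transferred subject to approval.",
]

for rule in (build_rule("Sentence", w) for w in words):
    print(rule)

for sentence in filter_sentences(sentences, words):
    print(sentence)
```

If a rule built this way matches nothing, two things worth checking are whether the wildcards on both sides of the word survived the string building and whether the match is case-sensitive.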

Suhas · February 6, 2021, 5:02am

Hi ScottF

Thanks for your reply.
This solved the error in the Rule-based Row Filter (Dictionary) node. However, after executing, it creates an empty table.
I need to find paragraphs containing the words in the Table Creator node in the workflow.

I was unable to attach the source PDF file, so here is the link to it instead: https://www.gmrgroup.in/pdf/GEPL-MOA-and-AOA-August.pdf

Can you also suggest alternative ways to get paragraphs from a PDF, please?

Thanks
SUHAS
