Filtering specific rows based on Rule Based Filter (Dictionary)

Dear Knimers,

I have a situation.
I want to read all the sentences in a PDF and keep only those sentences that contain certain words from a set of words in a different table. Kindly help.
I have been able to get part of the way so far.

Attaching my workflow for reference.

  1. Top table - Parsed the PDF as a single row, created a bag of words, and am trying to extract selected rows from the column “MOA”.
  2. Bottom table - Parse the PDF, extract sentences, filter the rows of my choice, and then extract only the relevant rows.
    Stuck at both places.

Kindly help, and also suggest a better way to search a single PDF for certain words, if there is one.

KNIME_PDF Parser MOA.knwf (43.7 KB)

Suhas

Hi @Suhas and welcome to the forum -

I didn’t spend a lot of time thinking about your approach, since I don’t have your original data files to play with, but I think I can at least help with the syntax errors in your rules that are causing the Rule-based Row Filter (Dictionary) nodes in both branches to fail.

Here you need to include some additional escaped quotes, like \", in a few places. Here’s how I modified the expression in the String Manipulation node in your bottom branch, for example:

Does that help?
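
To illustrate the quoting issue in general terms: a rule is a string that itself contains a quoted string literal, so when the rule text is assembled inside another quoted expression, the inner quotes have to be escaped. The Python sketch below mimics this outside KNIME. The column name `Sentence`, the wildcard pattern, the example words and the helper `build_rule` are all assumptions made for illustration; they are not taken from the attached workflow.

```python
def build_rule(column: str, word: str) -> str:
    """Build one rule line of the form: $Sentence$ LIKE "*word*" => TRUE"""
    # The inner double quotes are escaped with a backslash (\") so that
    # they end up as literal quote characters in the resulting rule text.
    return "$" + column + "$ LIKE \"*" + word + "*\" => TRUE"


# Hypothetical dictionary words; in the workflow these would come from a table.
words = ["capital", "shares"]
rules = [build_rule("Sentence", w) for w in words]

for rule in rules:
    print(rule)
```

Each generated line carries its own pair of literal quotes around the pattern, which is the part that is easy to lose when the rule is built by string concatenation.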

Hi ScottF

Thanks for your reply.
This solved the error in the Rule-based Row Filter node, but after executing, it produces an empty table.
Attaching the source PDF file. I need to find the paragraphs containing the words listed in the Table Creator node in the workflow.

I was unable to attach the PDF file.
Below is the link to it - https://www.gmrgroup.in/pdf/GEPL-MOA-and-AOA-August.pdf

Can you also suggest alternatives for getting paragraphs from a PDF, please?

Thanks
SUHAS
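
For reference, the goal described above (keep only the paragraphs that mention any word from a list) can be sketched outside KNIME as follows. This is a minimal sketch under stated assumptions: the text has already been extracted from the PDF, paragraphs are separated by blank lines, and matching is case-insensitive on whole words. The sample text, the keywords and the helper names `split_paragraphs` and `filter_paragraphs` are made up for illustration and do not come from the linked PDF or the workflow.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def filter_paragraphs(paragraphs: list[str], words: list[str]) -> list[str]:
    """Keep paragraphs containing at least one of the words (case-insensitive)."""
    if not words:
        return []
    # re.escape guards against words containing regex metacharacters;
    # \b restricts matches to whole words.
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [p for p in paragraphs if pattern.search(p)]


# Made-up sample text standing in for text extracted from a PDF.
sample_text = """The authorised share Capital of the company
is divided into equity shares.

The registered office is situated in the state.

The objects of the company include power generation."""

keywords = ["capital", "objects"]  # hypothetical dictionary words
matches = filter_paragraphs(split_paragraphs(sample_text), keywords)

for paragraph in matches:
    print(paragraph)
```

One thing this makes visible is that the match is case-insensitive here; a case-sensitive comparison would miss "Capital" for the keyword "capital", which is one possible (unconfirmed) reason a filter can return no rows.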
