Automation of information extraction from laboratory reports

Hello Knime Community!
I need help extracting specific information from lab reports. I would like to train a model to identify this information and extract it whenever a lab report is loaded. The information to be extracted from PDF documents is the model number, test standard, range of radio frequency of operation, and the maximum radiated power. A sample document is shown below.
[Report]RED(RF)[300330]_A216A-293_LG Electronics Inc._PM22GN.pdf (2.7 MB)
So far, my workflow uses the PDF Parser, followed by the Document Data Extractor and finally the Document Viewer. However, I don't know how to extract only the information that I want. I have many lab reports, but the information is not at the same position in each of them. So I want to train a model that identifies this information and automates its extraction from the lab reports.
Please kindly help, thank you.
Regards,
Kuddy

If you have reports with different formats it would be helpful if you could upload several examples. Also, can you share your workflow?

Thank you for the response! Please find below the snippet of the workflow that I have built.


Regards

Some actual data and your workflow would be helpful. Can’t do much with a screenshot. Also, I can’t find exact matches for “The information to be extracted from PDF documents is the model number, test standard, range of radio frequency of operation, and the maximum radiated power.” in your sample PDF. You know your data but Knimers don’t.

Thank you once again for your assistance. Please find below the link to the workflow of what I have implemented so far.
knime://My-KNIME-Hub/Users/kuddybest/Private/KNIME_project_tutorial%201

I was not able to highlight the information that I need in the PDF documents. However, for the sample report below I can indicate which information I need and on which page it appears:
1. Model - MGU21 APN (page 1).
2. Standard - EN 300 328 V2.2.2 (page 1).
3. Operating frequency range - 2402MHz - 2480MHz (page 14).
4. Power e.i.r.p - 3.8dBm (page 16, inside the first table).
65408RRF001A1s.pdf (1.7 MB)
Some of the PDF documents are failing to upload here. Is it possible to share them through Google Drive?
Thank you for your help!

@kuddybest you could try these examples of how to extract text from PDF files:
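
Once the text is out of the PDF, a rule-based pass may already find the four fields without training a model. Below is a minimal Python sketch of that idea. It assumes the extracted text contains labels and values shaped like the ones you listed for the sample report. The regular expressions, the function name `extract_fields`, and the `sample_text` string are illustrative assumptions, not taken from the actual report layout, so they would need adjusting per report format.

```python
import re

# Hypothetical patterns: the label wording is an assumption and will
# likely need tuning against the real text of each report format.
PATTERNS = {
    # "Model: MGU21 APN" or "Model No. - XYZ"; captures the rest of the line
    "model": re.compile(r"Model\s*(?:No\.?|Name)?\s*[:\-]\s*(.+)", re.IGNORECASE),
    # ETSI-style standard such as "EN 300 328 V2.2.2"
    "standard": re.compile(r"\b(EN\s?\d{3}\s?\d{3}(?:-\d+)*\s+V\d+\.\d+\.\d+)"),
    # Range such as "2402MHz - 2480MHz"
    "frequency_range": re.compile(
        r"(\d+(?:\.\d+)?\s*MHz\s*[-–~]\s*\d+(?:\.\d+)?\s*MHz)", re.IGNORECASE
    ),
    # First dBm value that follows "e.i.r.p" on the same line
    "eirp": re.compile(
        r"e\.?i\.?r\.?p\.?[^\n\d-]*(-?\d+(?:\.\d+)?\s*dBm)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict with the first match per field, or None if not found."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


# Made-up text that mimics the values from the sample report
sample_text = """
Model: MGU21 APN
Test standard: EN 300 328 V2.2.2
Operating frequency range: 2402MHz - 2480MHz
Maximum e.i.r.p: 3.8dBm
"""

fields = extract_fields(sample_text)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Fields that cannot be found come back as `None`, which makes it easy to filter out the reports that need a closer look or an extra pattern. If values sit inside tables (like the e.i.r.p. on page 16), the extracted text may not keep label and value on one line, in which case this simple approach would not be enough.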

Thank you for your assistance! However, the nodes written in R are failing to run. I configured the first node at the top left to read from a local folder on my laptop; see the image below.

I don't know what else I should do to make those nodes run.
Regards,
