KNIME & Azure Form Recognizer (Cognitive Services)

Hi all,

I have created a custom model on Azure to extract information from multiple contracts, and it works pretty well. As a next step I would like to automatically upload the most recent stack of PDFs to Blob Storage and apply the model built in Azure. The final output should be a structured Excel file written by KNIME.

Does someone have experience with that topic and could maybe share their workflow to help me out here?

It would be pretty awesome!

KR!

As far as moving files from a local resource to a cloud resource, you might try using the Azure Blob Storage Connector together with the Transfer Files node (making sure to enable the optional input ports on that node).

I’m not sure how you might apply your custom model. Possibly with one of the REST nodes, if Azure supports that?
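If you would rather script the upload than use the Transfer Files node, here is a minimal sketch using the azure-storage-blob package inside a Python Script node. The connection string, container name, and folder path are placeholders, not tested against your setup:

from pathlib import Path
from azure.storage.blob import BlobServiceClient

# Placeholders -- replace with your own values.
CONNECTION_STRING = "DefaultEndpointsProtocol=...;AccountKey=...;"
CONTAINER_NAME = "contracts"
PDF_DIR = Path("C:/data/contracts")

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER_NAME)

# Upload every PDF in the folder, overwriting blobs of the same name.
for pdf in PDF_DIR.glob("*.pdf"):
    with pdf.open("rb") as fd:
        container.upload_blob(name=pdf.name, data=fd, overwrite=True)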


Hi @s3marube,

I'm currently using Azure Form Recognizer, but I don't upload the files to Blob Storage; I'm calling the prediction API directly. Is there a reason why you would upload the PDFs beforehand?

Here is my Python code:

import knime_io as knio
import pandas as pd

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Endpoint, key, and ID of the custom Form Recognizer model.
endpoint = "https://germanywestcentral.api.cognitive.microsoft.com"
key = "yyy"
model_id = "xxx"

credential = AzureKeyCredential(key)
document_analysis_client = DocumentAnalysisClient(endpoint, credential)

# Input table with one file path per row in the 'Location' column.
input_table = knio.input_tables[0].to_pandas()

result_list = list()

for index, row in input_table.iterrows():
    file = row['Location']

    with open(file, "rb") as fd:
        document = fd.read()

    # Analyze the document with the custom model and wait for the result.
    poller = document_analysis_client.begin_analyze_document(model_id=model_id, document=document, locale='de')
    result = poller.result()

    # Collect every extracted field; fall back to the raw content
    # if the typed value is empty.
    for analyzed_document in result.documents:
        for name, field in analyzed_document.fields.items():
            result_list.append({'path': file, 'name': name, 'value': field.value if field.value else field.content, 'confidence': field.confidence})

df_new = pd.DataFrame(result_list, columns=["path", "name", "value", "confidence"])

# Cast everything to string so KNIME gets a uniform column type.
df_new = df_new.astype(str)

knio.output_tables[0] = knio.write_table(df_new)

You could of course also use the REST API directly and pass the URL of the file:

curl -v -i -X POST "{endpoint}/formrecognizer/documentModels/{modelID}:analyze?api-version=2022-08-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: {key}" --data-ascii '{"urlSource": "{your-document-url}"}'

and get the results afterwards from the URL returned in the Operation-Location header of the POST response:

curl -v -X GET "{Operation-Location from POST response}" -H "Ocp-Apim-Subscription-Key: {key}"

Also, when you use Blob Storage, you could trigger Form Recognizer directly from the upload event, but then you would have to store the results somewhere and pull them later.

Hope it helps,

Paul


Hi goodvirus,
thanks for the reply. I only kept the Blob part because I'm more used to working in Azure than in KNIME. I will try out your solution! To be able to write code, do you have conda installed on your local machine? Is there a free alternative? We don't have Anaconda in our company.

Many thanks in advance!

Hi @s3marube,

you could use the free Python environment and install the packages with pip; no conda required.
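For example, to get the packages used in the script above (assuming the v3.x SDK; knime_io ships with KNIME's Python integration, so it needs no install):

pip install azure-ai-formrecognizer pandas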
But since you are using Blob Storage anyway, why not go all the way and let Blob trigger Form Recognizer and then write the results into a database? Something like this:
Upload blob → trigger event → catch it with an Azure Function → the Azure Function calls Form Recognizer → store the result in Cosmos DB → get the results with KNIME.
That solution would also scale really well.
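If you want to try that route, here is a rough sketch of what the Azure Function could look like: a blob-triggered function in Python that calls Form Recognizer and writes the fields to Cosmos DB. All names, app settings, and the Cosmos database/container are placeholder assumptions, and the blob trigger binding (function.json) is not shown. A sketch, not a tested implementation:

import os
import uuid

import azure.functions as func
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from azure.cosmos import CosmosClient

# Configuration comes from app settings (placeholder names).
FR_ENDPOINT = os.environ["FORM_RECOGNIZER_ENDPOINT"]
FR_KEY = os.environ["FORM_RECOGNIZER_KEY"]
MODEL_ID = os.environ["FORM_RECOGNIZER_MODEL_ID"]
COSMOS_CONN = os.environ["COSMOS_CONNECTION_STRING"]


def main(myblob: func.InputStream):
    # Analyze the uploaded PDF with the custom model.
    client = DocumentAnalysisClient(FR_ENDPOINT, AzureKeyCredential(FR_KEY))
    poller = client.begin_analyze_document(model_id=MODEL_ID, document=myblob.read())
    result = poller.result()

    # Flatten the extracted fields into one record per blob.
    fields = {}
    for analyzed_document in result.documents:
        for name, field in analyzed_document.fields.items():
            fields[name] = {
                "value": str(field.value if field.value else field.content),
                "confidence": field.confidence,
            }

    # Store the result in Cosmos DB; KNIME can query it later.
    cosmos = CosmosClient.from_connection_string(COSMOS_CONN)
    container = cosmos.get_database_client("contracts").get_container_client("results")
    container.upsert_item({"id": str(uuid.uuid4()), "path": myblob.name, "fields": fields})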

Best regards,

Paul


Thanks! It’s a pretty good solution, and it worked for me.
Appreciate it!
