API Post Request with File or Python Script Node

Hello Knime Community,

I have seen several posts with a similar question, but I was not able to find a solution.
We are using an external service called Mindee, which processes documents and sends back the data extracted from them.

Has anyone been able to get a working solution in the meantime?

Another approach I tried was the new Python nodes, but I received this error message:

Execute failed: TypeError: expected str, bytes or os.PathLike object, not ArrowSourceTable

Source Example:

from mindee import Client, PredictResponse, product

# Init a new client

mindee_client = Client(api_key="my-api-key-here")

custom_endpoint = mindee_client.create_endpoint("my_endpoint", "my_user")

# Load a file from disk

input_doc = mindee_client.source_from_path("/path/to/the/file.ext")

#Load a file from disk and parse it.

# The endpoint name must be specified since it cannot be determined from the class.

result: PredictResponse = mindee_client.parse(product.CustomV1, input_doc, endpoint=custom_endpoint)

# Print a brief summary of the parsed data

print(result.document)

When I use the os module in the node to get the files, the following error occurs:
KnimeUserError: Output table '0' must be of type knime.api.Table or knime.api.BatchOutputTable, but got <class 'mindee.parsing.common.document.Document'>

So I need to get result.document into a KNIME table.

Thanks and have a nice day,
Sven

@sven-abx the code is hard to read; maybe try a different format.

Then you will have to convert the data into a pandas DataFrame and subsequently into an Arrow table to get the content back into KNIME.
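
A minimal sketch of that conversion, assuming the parsed document is reduced to its printable summary via str(), which print(result.document) in the question suggests is possible. The helper name document_to_frame, the column names, and the FakeDocument stand-in are hypothetical. Inside the new Python Script node the last step would presumably be knio.Table.from_pandas(df); it is shown only as a comment because knio is available only inside KNIME.

```python
import pandas as pd


class FakeDocument:
    """Hypothetical stand-in for the object returned as result.document."""

    def __str__(self):
        return "invoice summary text"


def document_to_frame(file_path, document):
    # Flatten one parsed document into a single-row DataFrame.
    # Only str(document) is used here; real field access depends on the API.
    return pd.DataFrame([{"Path": str(file_path), "Summary": str(document)}])


df = document_to_frame("/path/to/the/file.ext", FakeDocument())
print(df.dtypes)

# Inside the KNIME Python Script node
# (assumption: `import knime.scripting.io as knio` at the top of the script):
# knio.output_tables[0] = knio.Table.from_pandas(df)
```

The output port then receives a table instead of a Document object, which is what the KnimeUserError above complains about.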


Thanks for your reply.
I adapted the Python node and now have a pandas DataFrame, but currently I'm not able to pass just one file through: input_doc = mindee_client.source_from_path("/path/to/the/file.ext")

BR,
Sven

@sven-abx maybe you could elaborate on this issue. What is the error message? Have you adapted the path?

The error message is: TypeError: expected str, bytes or os.PathLike object, not Series

I wrote the data into a variable:

Source:

processed_table = knio.BatchOutputTable.create()

i = 0
for batch in knio.input_tables[0].batches():

    input_batch = batch.to_pandas()
    input_batch['index_column'] = input_batch.index

    for index, row in input_batch.iterrows():
    
        file_to_process = input_batch.iloc[[i]]
        print(i)
        input_doc = mindee_client.source_from_path(file_to_process)
        #input_doc = mindee_client.source_from_path(input_batch.iloc[[i]].Path)
        #print(input_batch.iloc[[i]])
        
        i += 1
        #print(i)

I also tried it in Python Legacy Node:
Source:

for every_file in files_to_process.iterrows():

    file_to_check = Path(str(files_to_process.iloc[[i]]))
    print(file_to_check)

    i += 1

then when I execute:
input_doc = mindee_client.source_from_path(file_to_check)

the following error occurs:


[Errno 2] No such file or directory: '                                                   Path\nRow0  G:\\Meine Ablage\\Projekte\\KNIME...'
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "C:\Users\Sven\.conda\envs\Mindee\lib\site-packages\mindee\client.py", line 460, in source_from_path
    input_doc = PathInput(input_path)
  File "C:\Users\Sven\.conda\envs\Mindee\lib\site-packages\mindee\input\sources.py", line 250, in __init__
    self.file_object = open(filepath, "rb")  # pylint: disable=consider-using-with
FileNotFoundError: [Errno 2] No such file or directory: '                                                   Path\nRow0  G:\\Meine Ablage\\Projekte\\KNIME...'

file_to_check:
Row0 G:\Meine Ablage\Projekte\KNIME...

Have you tried printing the intermediate dtypes to locate where the pandas Series error comes from?
br


thanks for your reply @Daniel_Weikert

file_to_check, the variable I would like to use, leads to the following:
FileNotFoundError: [Errno 2] No such file or directory: ' Path\nRow0 G:\\Meine Ablage\\Projekt\\KNIME...'

So the problem is the Path\nRow0 that is included.

It looks like "file_to_check" contains a Row0 and a path in one cell. This is passed to the code and will not work, because "Row0" does not make sense as part of a file path.

Where is this information coming from? Is it possible to provide just the path?
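
The stray "Path\nRow0" prefix can be reproduced with plain pandas. A small sketch, assuming a one-column table with the row ID Row0 like the one in the error message (the file path below is made up): the double brackets in iloc[[i]] keep a DataFrame or Series, and str() of that is its printed form, header and index included, while a scalar lookup returns only the cell value.

```python
import pandas as pd

# Made-up stand-in for the table coming from KNIME (row ID "Row0", column "Path").
files_to_process = pd.DataFrame({"Path": ["C:/data/invoice.pdf"]}, index=["Row0"])

as_frame = files_to_process.iloc[[0]]            # still a DataFrame
as_series = files_to_process["Path"].iloc[[0]]   # still a Series
as_value = files_to_process["Path"].iloc[0]      # the plain string in the cell

print(type(as_frame).__name__, type(as_series).__name__, type(as_value).__name__)

# str() of a DataFrame is its printed table, so the column header and the
# row ID end up inside the "path".
print(repr(str(as_frame)))
print(repr(as_value))
```

Only the last form is something that open() or source_from_path can treat as a file path.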

@mlauber71 the information is coming from a List Files/Folders node.

My goal is:

  1. Send every file to the api endpoint
  2. Retrieve the OCR data of the file
  3. Use the data in the following nodes

It looks like this Row0 is the index. But it is also there when I do:
file_to_check = Path(str(files_to_process.Path.iloc[[i]])) where Path is the Path column of the List Files/Folders node
or
file_to_check = Path(str(files_to_process.Location.iloc[[i]])) where Location is a string column from the Path to String node.

In the Python Node I use: files_to_process = input_table_1.copy()

Currently used Code:

from mindee import Client, PredictResponse, product
from pathlib import Path
import os

# Copy input to output
files_to_process = input_table_1.copy()

# Init a new client
mindee_client = Client(api_key="....")
custom_endpoint = mindee_client.create_endpoint("endpoint", "user")


i=0
for every_file in files_to_process.iterrows():
	
	print(every_file)
	file_to_check = Path(str(files_to_process.iloc[[i]]))
	print(file_to_check)
	
	i += 1

    # Load a file from disk
	input_doc = mindee_client.source_from_path(file_to_check)

What does input_table_1 look like?
iterrows normally creates tuples. I would assume every_file[1] contains the data, but it's probably easier for people in the forum to help debug with a sample.
Without seeing the data I can only guess, but you could try

files_to_process.iloc[i, 0]

Hope someone else can offer better support here
br


It is a pandas DataFrame, and every_file is a tuple.

@sven-abx I assume you need just a string with a path and not a tuple. Maybe you could ask ChatGPT what the right syntax for your case is.

You might have to loop through your dataframe using index and row, and then within the loop the syntax would be like

row["path"]

like in this example

3 Likes
for index, row in files_to_process.iterrows():
	
	file_to_check = Path(row['Path'])
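
Putting the pieces together, a sketch of the full loop under some assumptions: the column is named Path as in the snippet above, and parse_file is a hypothetical placeholder for the Mindee calls (source_from_path followed by parse) so that the example runs without an API key. It creates one temporary file to have something to iterate over.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_file(file_path):
    # Hypothetical placeholder for:
    #   input_doc = mindee_client.source_from_path(file_path)
    #   result = mindee_client.parse(product.CustomV1, input_doc, endpoint=custom_endpoint)
    #   return str(result.document)
    return f"{file_path.name}: {file_path.stat().st_size} bytes"


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.txt"
    sample.write_bytes(b"hello")

    # Made-up stand-in for the table from the List Files/Folders node.
    files_to_process = pd.DataFrame({"Path": [str(sample)]}, index=["Row0"])

    records = []
    for index, row in files_to_process.iterrows():
        file_to_check = Path(row["Path"])  # a plain path, no header or row ID
        records.append({"Path": str(file_to_check), "Result": parse_file(file_to_check)})

# One output row per input file, keeping the original row IDs.
result_df = pd.DataFrame(records, index=files_to_process.index)
print(result_df["Result"].iloc[0])

# Handing the result back to KNIME (assumptions about the node variables):
# legacy node: output_table_1 = result_df
# new node:    knio.output_tables[0] = knio.Table.from_pandas(result_df)
```

Collecting the results in a list and building the DataFrame once at the end avoids growing a DataFrame row by row inside the loop.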

I asked ChatGPT as you suggested, and a working solution is shown above. @mlauber71 I will mark your post as the solution.

Thanks for the support and patience.

