Parallel Execution

Hello Knime community,

I have 1 million rows in a CSV File Reader.
Running a Python script on the entire 1 million rows is taking too much time.
Currently I am splitting the data using Row Filter nodes.
Is there any possibility of parallel execution in KNIME?
If so, can anyone share a workflow showing how it works?


Hi @Ashok121 , to help get a feel for the “scale of the pain”, what chunk size are you using in the loops?

Also, I’m guessing the Python script is performing some specialist or complex operation, or something not easily performed using the core nodes. What kind of operation is it performing, and how long does one iteration of the loop currently take?

Trivial, I know, but is there a reason why the Column Rename has to be inside the loop rather than performed once prior to the loop start? (Every little helps :wink: )


Thank you @takbb for your response, I will move the Column Rename.
The main aim of the Python script is to extract a value.

The Python Script node extracts a value from the text and classifies the text as either a true positive or a false positive.

Total: 1 million rows
Total: 4 chunks
Each chunk size is 0.25 million rows

One iteration takes around 2 hours 45 minutes.

Is there a way to do parallel execution?
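For a rough sense of the potential gain, here is a back-of-envelope sketch based on the figures above, assuming near-ideal 4-way scaling and ignoring any overhead:

```python
# Numbers taken from the thread: 4 chunks, ~2 h 45 min per 0.25M-row chunk.
chunks = 4
hours_per_chunk = 2.75

serial_total = chunks * hours_per_chunk   # running the chunks one after another
ideal_parallel = hours_per_chunk          # all 4 chunks at once, best case

print(serial_total)    # 11.0
print(ideal_parallel)  # 2.75
```

So running the four branches concurrently could, at best, cut the end-to-end time from roughly 11 hours down to under 3.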

Hi Ashok,

The Row Filter component has a lot of variables, which take a good amount of time to sort the large number of rows and select your preferred value (e.g. $${ICounter}$$) from the dimension. Best of luck.

Hi @Ashok121 ,
I don’t quite understand.

Does each of the 4 branches handle 0.25M rows?

By chunk size, I was referring to the setting on the chunk loop. If the “chunk size” is also .25m, then what is the purpose of the loop, as that would mean it’s a single iteration…?

Sorry if I missed or misinterpreted what you are saying.

Total number of rows: 1 million
Split the 1 million into 4 parts (0.25 million each) using Row Filter
Then, using Chunk Loop Start, iterate over 10,000 rows at a time
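Outside of KNIME, the split-then-process pattern described above can be sketched in plain Python. The `classify` function below is a hypothetical stand-in for whatever the real Python Script node does; a thread pool is used for brevity, and for CPU-bound work a process pool (`ProcessPoolExecutor`) would be needed to get around the GIL:

```python
from concurrent.futures import ThreadPoolExecutor

def classify(text):
    # Hypothetical stand-in for the real extraction/classification
    # logic in the Python Script node.
    return "TP" if "match" in text else "FP"

def process_chunk(rows):
    # One "chunk loop iteration": classify every row in the chunk.
    return [classify(r) for r in rows]

def parallel_classify(rows, n_workers=4, chunk_size=10_000):
    # Split the input into fixed-size chunks, mirroring Chunk Loop Start.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Run the chunks concurrently instead of one after another.
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = ex.map(process_chunk, chunks)
    # Flatten back into a single, order-preserving result list.
    return [label for chunk in results for label in chunk]
```

If I remember correctly, KNIME also ships Parallel Chunk Start / Parallel Chunk End nodes that implement this pattern natively inside a workflow, which may be worth checking before hand-rolling anything.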

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.