Dear all,
I want to run a simple piece of code in a Python node on 117 M rows of data, but the process in the Python node fails. As I checked, the node works fine with 400,000 rows, but with more rows it is unclear what happens. Could you please elaborate on the limitations of this node?
I have 32 GB of RAM and my heap size is 16 GB.
Dear @DaveK,
I found the issue: it was related to running out of RAM. I changed my code to process 1,000 rows at a time instead of all 100 M at once.
import pandas as pd

chunks = []
for i in range(0, len(input_table_1), 1000):
    df = input_table_1[i:i+1000].copy()
    df.loc[df['CALLED_CALLING_NUMBER'].isin(input_table_2['Short_code']), 'Fee'] = '0'
    chunks.append(df)
output_table = pd.concat(chunks)
Is there any other solution to run the code on 100 M rows in one stage?
If you are working with a large number of rows, batching is usually a good idea in general. The Python node already does this when the input table gets too big (though with a fairly large default of 0.5 M rows per chunk). This is adjustable in the 'Options' tab. You could try running your original code with a lower chunk size, e.g. 0.1 M.
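With the node's chunking handling memory, the logic itself can stay as one vectorized assignment with no Python-side loop. Here is a minimal sketch of that idea using the column names from the original post (`CALLED_CALLING_NUMBER`, `Short_code`, `Fee`); the sample data is made up for illustration:

```python
import pandas as pd

# Stand-ins for the KNIME node inputs (invented sample data).
input_table_1 = pd.DataFrame({
    'CALLED_CALLING_NUMBER': ['100', '200', '300'],
    'Fee': ['5', '7', '9'],
})
input_table_2 = pd.DataFrame({'Short_code': ['200']})

# One vectorized pass over the whole table: build a boolean mask of
# rows whose number appears in the short-code list, then zero the fee.
mask = input_table_1['CALLED_CALLING_NUMBER'].isin(input_table_2['Short_code'])
input_table_1.loc[mask, 'Fee'] = '0'
output_table = input_table_1
```

Combined with a lowered chunk size in the node's 'Options' tab, this avoids building any intermediate per-batch DataFrames in your own code.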