Dear all,
I want to run a simple piece of code in a Python node on 117 M rows of data, but the process in the Python node fails. As I checked, the node works fine with 400,000 rows, but with more rows it is unclear what happens. Could you please elaborate on the limitations of this node?
I have 32 GB of RAM and my heap size is 16 GB.
Dear @DaveK,
I found the issue: it was related to running out of RAM. I changed my code to process 1,000 rows at a time instead of all 100 M at once.
import pandas as pd

chunks = []
for i in range(0, len(input_table_1), 1000):
    df = input_table_1[i:i+1000].copy()
    df.loc[df['CALLED_CALLING_NUMBER'].isin(input_table_2['Short_code']), 'Fee'] = '0'
    chunks.append(df)
output_table = pd.concat(chunks)
Is there any other solution to run the code on 100 M rows in one stage?
If you are working with a large number of rows, batching is usually a good idea in general. The Python node already does this when the input table gets too big (though with a fairly large default of 0.5 M rows per chunk). This is adjustable in the 'Options' tab. You could try running your original code with a lower chunk size, e.g. 0.1 M.
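With the node's chunking handling memory, the logic itself can stay as one vectorized assignment with no Python-side loop. Here is a minimal sketch of that idea using the column names from the original post (`CALLED_CALLING_NUMBER`, `Short_code`, `Fee`); the sample data is made up for illustration:

```python
import pandas as pd

# Stand-ins for the KNIME node inputs (invented sample data).
input_table_1 = pd.DataFrame({
    'CALLED_CALLING_NUMBER': ['100', '200', '300'],
    'Fee': ['5', '7', '9'],
})
input_table_2 = pd.DataFrame({'Short_code': ['200']})

# One vectorized pass over the whole table: build a boolean mask of
# rows whose number appears in the short-code list, then zero the fee.
mask = input_table_1['CALLED_CALLING_NUMBER'].isin(input_table_2['Short_code'])
input_table_1.loc[mask, 'Fee'] = '0'
output_table = input_table_1
```

Combined with a lowered chunk size in the node's 'Options' tab, this avoids building any intermediate per-batch DataFrames in your own code.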