Python Script: avoid auto-cast to int

kienerj · February 10, 2022, 12:54pm

The Python Script node is casting a string column to integer in the dataframe inside the script. However, it's important to keep it as a string, because the IDs, by some genius design, have leading zeroes that matter.

How can I avoid this automatic cast?

goodvirus · February 10, 2022, 1:23pm

Hi @kienerj,

Which Python Script node are you using? I tested it with the default one (not Labs), and it keeps the string type and the leading zeros.
Could you share a small workflow?

Best Regards,

Paul

kienerj · February 11, 2022, 5:08am

The non-Labs one as well. It's possible that a certain step leads to this issue and not the initial loading into Python. I'll have to check.

EDIT: a simple pass-through triggers the issue:

output_table_1 = input_table_1.copy()

I can also verify that the input table already has it wrong with simple print statements.
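A check along these lines makes the wrong type visible inside the script. This is only a minimal sketch: in the node, `input_table_1` is supplied by KNIME, so a stand-in DataFrame with a hypothetical column name `id` is built here to keep the snippet runnable on its own.

```python
import pandas as pd

# Stand-in for the table KNIME passes in; "id" is a hypothetical column name.
input_table_1 = pd.DataFrame({"id": ["00123", "04567"]})


def describe_column(df, column):
    """Return the pandas dtype name and the first value of a column."""
    return str(df[column].dtype), df[column].iloc[0]


dtype_name, first_value = describe_column(input_table_1, "id")

# An integer dtype here (e.g. int64) would mean the leading zeroes are already gone.
print(dtype_name, repr(first_value))
```

If the printed dtype is an integer type, the cast happened before the script body ran, not in any later step.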

Windows 10, Knime 4.5.1

goodvirus · February 11, 2022, 5:35am

Hi @kienerj,

Strange, I'm running the same setup.
Could you share an example workflow?

Thanks,
Paul

kienerj · February 11, 2022, 5:39am

I can reproduce with the most trivial example.
Leading Zeroes.knwf (7.0 KB)

I’m using the columnar table backend. Or else it could be regional settings…

goodvirus · February 11, 2022, 8:47am

OK, I don't know why this doesn't work with your configuration.
I tried with and without the columnar backend and with all three serialization options.
It still keeps the column type as string in the output…
Which Python version are you using?

Maybe someone else can help?
As a workaround, you could prepend the zeros again after the Python node, provided the IDs have a fixed length (which I assume).
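The same padding could also be done in pandas at the end of the script. This is a sketch under assumptions: the column name `id` and the width of 5 are made up for illustration, and it only restores the IDs correctly if every ID really has the same length.

```python
import pandas as pd

ID_WIDTH = 5  # assumed fixed ID length; adjust to the real one

# Example of what the column looks like after the unwanted cast to integer.
df = pd.DataFrame({"id": [123, 4567]})

# Convert back to string and left-pad with zeros up to the fixed width.
df["id"] = df["id"].astype(str).str.zfill(ID_WIDTH)

print(df["id"].tolist())  # ['00123', '04567']
```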

Best regards,

Paul

kienerj · February 11, 2022, 1:34pm

I’m using python 3.9.7, pyarrow 6.0.1, pandas 1.3.3

EDIT:

And I found the issue:

Serialization was set to: csv (experimental). If I switch to Arrow it works fine.
Not sure why it was on csv. I usually used arrow before.
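A plausible explanation is that CSV carries no type information, so whatever reads it has to guess the column type. How KNIME's CSV serializer works internally is an assumption on my part; the sketch below only shows the same effect with plain pandas, using a hypothetical `id` column.

```python
import io

import pandas as pd

# A tiny CSV with IDs that have leading zeroes.
csv_text = "id\n00123\n04567\n"

# Default behaviour: pandas infers an integer column and the zeroes are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to be read as text keeps the IDs intact.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["id"].tolist())  # [123, 4567]
print(as_text["id"].tolist())   # ['00123', '04567']
```

Arrow, by contrast, transfers the column together with its type, which would fit with the string column surviving after the switch.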
