Unknown data type when converting pandas to KNIME

I have a basic script that generates a column with empty values.

Script:

import knime.scripting.io as knio
import pandas as pd
knio.output_tables[0] = knio.Table.from_pandas(
    pd.DataFrame(
        [[None, None],['1', None]], columns=['First column', 'Second column']
    )
)

When I execute it, I get a ? data type in KNIME. Does anyone know how I can enforce a data type (e.g. string) in this conversion?

Hello @toscano
Believe it or not (calling @takbb), you can handle it with the ‘String Manipulation’ node (Replace column option):

to String:

string($Second column$)

to Integer:

toInt(string($Second column$))

BR

Thanks for your reaction. Indeed, the nodes could convert the type later.

I am, however, interested in whether we could do it directly in the Python script, to avoid using additional nodes. To give more context, the example script is just a simplified version of the problem. I am building a node that will read different sorts of data types, so I cannot always anticipate which columns might present this issue.

I have tried converting the column to string inside the Python node with astype(str), which also does not work.

Hello @toscano
Thank you for extending the topic’s context; that is roughly the situation I expected. I personally don’t think that you can avoid the automatic data frame conversion between technologies or platforms. Let’s see what other users’ experiences can contribute to this subject.

But for the time being, String Manipulation and String Manipulation (Multi Column) can handle the conversion for the whole DF in one single node.

BR

Maybe specify the data type per column by inserting explicitly typed pd.Series objects as columns?
See python pandas create dataframe and force multiple column types - Stack Overflow
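
A minimal sketch of that idea, assuming pandas' nullable "string" dtype is an acceptable target type. The column names and values are taken from the original script:

```python
import pandas as pd

# Build each column as a Series with an explicit dtype, so that even a
# column containing only missing values carries a concrete type.
typed_df = pd.DataFrame(
    {
        "First column": pd.Series([None, "1"], dtype="string"),
        "Second column": pd.Series([None, None], dtype="string"),
    }
)

print(typed_df.dtypes)
```

Both columns then report a string dtype instead of a generic object dtype, and the missing values stay missing.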

Best regards
Steffen

Hi @toscano,

I think that what you have here is a pandas/Python issue rather than a KNIME one, and I don’t think KNIME can change the way Python works.

I don’t even see it as a technology interface/interoperability issue: I think the fundamental problem is that you are putting None into a data frame while also wanting to enforce that the column you have just put None into accepts only strings.

But that is a contradiction.

None in Python has a type of its own (NoneType) and represents the absence of a value.
A string cannot be None, because it is of type str.
None cannot be a str, because it is of type NoneType.
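
To illustrate the point, here is a small sketch using the data from the original script. It only inspects types and makes no claim about how KNIME treats the result:

```python
import pandas as pd

# None is the single value of its own type and is not a str.
none_type_name = type(None).__name__
none_is_str = isinstance(None, str)
print(none_type_name, none_is_str)

df = pd.DataFrame(
    [[None, None], ["1", None]],
    columns=["First column", "Second column"],
)

# A column holding nothing but None gives pandas no values to infer a
# specific type from, so it is stored with the generic object dtype.
second_dtype = df["Second column"].dtype
print(second_dtype)
```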

I think that the solution from @gonhaddock provides the KNIME answer. Happy to be proven wrong, but I don’t think there is a Python answer to this. An alternative solution, though one that still requires a KNIME node, is to use Column Auto Type Cast afterwards instead of String Manipulation.

@toscano
If you adjust it inside the script it should work
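
The adjustment itself is not shown in the thread text. One possible approach, as a sketch only: cast every object column to pandas' nullable "string" dtype before handing the frame over. This assumes that KNIME maps that dtype to a String column, which is not verified here. The helper name cast_object_columns_to_string is hypothetical:

```python
import pandas as pd


def cast_object_columns_to_string(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every object column uses the nullable string dtype."""
    object_columns = df.select_dtypes(include="object").columns
    return df.astype({column: "string" for column in object_columns})


df = pd.DataFrame(
    [[None, None], ["1", None]],
    columns=["First column", "Second column"],
)
converted = cast_object_columns_to_string(df)
print(converted.dtypes)

# Inside a KNIME Python Script node, the frame would then be passed on
# the same way as in the original script:
# knio.output_tables[0] = knio.Table.from_pandas(converted)
```

Note that this casts every object column, so columns holding other Python objects (lists, dicts) would be turned into their string representation as well. Restrict the column list if that matters for your node.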

br

… well tbh, not happy I was wrong, but happy there is a solution :wink:
