Hello all, in our environment we often have the use case, that incoming data is prepossessed in Knime and then later analyzed using python. Now I know that you can run python within knime but that is not exactly what we need. In our case the data needs to be exported from knime and then analyzed on a different machine with python. The question now is how to do that efficiently? Saving the data as CSV is not optimal since it doesn’t store the column type. I know the best way would probably be a dedicated database but if that’s not an option what would you recommend to use? Thanks a lot!

@TheLeo you could use Parquet to keep the column types. Or SQLite database: [image] use the new (KNIME 4.6+) Python Script node and bundled Python version to... use the new (KNIME 4.6+) Python Script node and bundled Python version to read Parquet file into KNIME, export it again, put it into SQLite database and read i… ORC could also be an option but it might be more complicated. Both ORC and Parquet would allow to read and write data in chunks. [image] Read ORC file into KNIME's Python node – mlauber71 Read ORC file into KNIME's Python node If you want to transfer several files from outside sources into the Python environment and not loose column types ORC is… If you absolutely must have a text only file. ARFF is another option: [image] use the new (KNIME 4.6) integrated Python extension to read and write ARFF... use the new (KNIME 4.6) integrated Python extension to read and write ARFF file into KNIME, export it again as Parquet, put it into SQLite database and read it… If you want to explore more about exchange between KNIME and Python you might have a look here: [image] Use functions and Python code from an external .PY file or a Juypter notebook... Use functions and Python code from an external .PY file or a Juypter notebook in KNIME's Python Source node This workflow demonstrates how modules in (external… [image] UnicodeEncodeError: 'charmap' codec can't encode characters KNIME Analytics Platform @JayR without having checked this out I would see these ways forward make sure the file in the jupyter notebook really is a pandas dataframe without any funny names or data types (first) experiment with the formats for communicating between python and knime (In the preferences) store the data from the jupyter notebook as csv or parquet file and load it back (you will have to deal with all the path things). Not very elegant but it might just work I was planning on expanding this example to sho…

Export data from KNIME for processing in Python (externaly)

KNIME Analytics Platform

mlauber71 September 23, 2022, 7:44am 2

@TheLeo you could use Parquet to keep the column types. Or SQLite database:

ORC could also be an option but it might be more complicated. Both ORC and Parquet would allow to read and write data in chunks.

If you absolutely must have a text only file. ARFF is another option:

If you want to explore more about exchange between KNIME and Python you might have a look here:

2 Likes

Column Expression: Dynamically Extract and Define Type

Use "Excel Writer", but got 2 worksheets instead of 1

Performance / speed issue with pandas 1.5.0