KNIME to Python Exporter: Roadmap for Additional Node Support

Dear KNIME users,

I want to thank you for using k2pweb and knime2py, and to share some news about the project.

During the last two months, I collected workflow profiles and web service usage logs. This helped me fix one race-condition issue in the platform and gather statistics about node usage in different projects. Now I have data showing which nodes are used most frequently, so I can focus on supporting them.

I created four tasks in the project’s task tracker. Each task contains five nodes that I need to implement in the release. These are the most frequently used nodes among you that are not yet supported by knime2py.

Here is the list, showing each node’s KNIME factory class and the number of times it was used:

org.knime.base.node.preproc.groupby.GroupByNodeFactory, 171
org.knime.base.node.preproc.joiner3.Joiner3NodeFactory, 97
org.knime.js.base.node.quickform.filter.value.ValueFilterQuickFormNodeFactory, 94
org.knime.base.node.preproc.constantvalue.ConstantValueColumnNodeFactory, 82
org.knime.base.node.rules.engine.RuleEngineFilterNodeFactory, 60
org.knime.base.node.preproc.rename.RenameNodeFactory, 39
org.knime.base.node.preproc.filter.row.RowFilterNodeFactory, 38
org.knime.database.node.io.reader.query.DBQueryReaderNodeFactory, 37
org.knime.database.extension.postgres.node.connector.PostgreSQLDBConnectorNodeFactory, 36
org.knime.base.node.preproc.colcombine2.ColCombine2NodeFactory, 32
org.knime.base.node.preproc.pivot.Pivot2NodeFactory, 29
org.knime.base.node.preproc.columnresorter.ColumnResorterNodeFactory, 26
org.knime.base.node.preproc.unpivot2.Unpivot2NodeFactory, 26
org.knime.expressions.base.node.formulas.FormulasNodeFactory, 12
org.knime.time.node.calculate.datetimedifference.DateTimeDifferenceNodeFactory, 10
org.knime.time.node.manipulate.datetimeshift.DateTimeShiftNodeFactory, 8
org.knime.base.node.preproc.cellsplit2.CellSplitter2NodeFactory, 7
org.knime.python3.scripting.nodes2.script.PythonScriptNodeFactory, 7
org.knime.base.node.preproc.duplicates.DuplicateRowFilterNodeFactory, 6
org.knime.base.node.preproc.sorter.SorterNodeFactory, 5

Thank you for your participation.

P.S. If the node you need is not in the list, please contact me directly here or on Instagram. I will include support for your node in a future release.


Hi @VitaliiKaplan,

thanks for sharing your project. I really like the idea!

Since the other thread, KNIME to Python exporter - #10, has been closed, I’ll continue the discussion about data processing here.

@Add94: Thanks for sharing your insights on the data processing backends. Instead of going for DuckDB, I’d recommend either Ibis or Polars.

DuckDB might provide slightly superior performance, but the reason Python+pandas became so popular also applies here: a clean, easily understandable API for data structures is much more important than squeezing out the last 0.1% of computational performance.

The Polars API is close to the pandas API, so everyone from novice to data processing specialist will feel more confident using Polars than DuckDB. SQL is used by only a few data scientists, especially among the typical KNIME user base.
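To illustrate the API-familiarity point, here is the same aggregation written in dataframe style and in SQL. This is a minimal sketch with made-up data; I use pandas for the dataframe side (the Polars version reads almost identically) and the standard library’s sqlite3 as a stand-in for DuckDB, since it requires no extra installation:

```python
import sqlite3
import pandas as pd

rows = [("EU", 10), ("EU", 20), ("US", 5)]

# Dataframe-API style: method chaining on an in-memory table.
df = pd.DataFrame(rows, columns=["region", "sales"])
df_result = df.groupby("region", as_index=False)["sales"].sum()

# SQL style: the same aggregation as a declarative query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (region TEXT, sales INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_result = con.execute(
    "SELECT region, SUM(sales) FROM t GROUP BY region ORDER BY region"
).fetchall()
```

Both produce the same totals; which one feels more natural is largely a matter of background, which is exactly the preference question discussed here.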

Furthermore, I have the feeling that Polars is evolving faster than DuckDB, especially in its move toward distributed computing.

Best regards,

Johannes

Sorry, on LinkedIn :o)

Good idea.

First, I need to learn what the Polars API is. Technically, since my development process is now strongly supported by an AI code assistant, adding a new API may not be too difficult.

However, I think it is better to implement exporters for more nodes first. As you can see, users of the web service requested Python code for the not-yet-supported “GroupBy” node 171 times during the last two months. So, I am going to focus on those nodes first.
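As a rough illustration of what a GroupBy exporter could emit, here is a minimal sketch, assuming a node configured to group on one column and sum another. The table, column names, and settings are hypothetical, not taken from a real workflow; the output column name follows KNIME’s "Method(column)" naming convention, e.g. `Sum(Sales)`:

```python
import pandas as pd

# Hypothetical input table, standing in for the GroupBy node's input port.
df = pd.DataFrame({
    "Region": ["EU", "EU", "US", "US", "US"],
    "Sales": [10, 20, 5, 15, 25],
})

# Assumed node settings: group column = "Region", aggregation = Sum("Sales").
out = (
    df.groupby("Region", as_index=False)["Sales"]
      .sum()
      .rename(columns={"Sales": "Sum(Sales)"})
)
print(out)
```

A real exporter would read the group columns and aggregation methods from the node’s settings.xml and generate the corresponding `groupby`/`agg` call.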

I have added a task about it to the task-tracker.


Thanks a lot Johannes! Ultimately it’s all down to preference: some would claim (myself included :slight_smile: ) that SQL is the universal data language, while others would say the dataframe-API approach is superior. Ibis is a sort of middle ground, as it tries to please both groups with its swappable execution-backend philosophy.

One thing I would be reluctant to agree with is the development-roadmap claim (“Polars is evolving faster”). DuckDB has a huge ecosystem of extensions and community contributions, and invests time and resources into its own lakehouse format. Polars is developed mostly by its core team and focuses on core API improvements, GPU execution, and its cloud offering: very different philosophies.

I’d argue DuckDB is much more versatile when it comes to use cases. Polars is also less mature in terms of engine robustness (out-of-core processing, disk spilling, etc.). Those are not 0.1% improvements; either you process a query or you hit an OOM error. Polars is nowhere near capable of processing a TB-scale query on a standard laptop. I understand this might be a ‘tail’ use case compared to a typical data-scientist workload, though.

In the end, knime2py is a great project and even being able to export workflows to pandas would be huge! This is a great contribution to the community, thank you for taking on this project @VitaliiKaplan


I agree that SQL is the universal data language and everyone should be able to use it.

But reality hits hard: only a few of those working with data really know how to write SQL, while many understand the “more expressive” pandas/Polars APIs.

Ok, thanks a lot for your insights! That’s quite interesting to read and sounds reasonable. Maybe I’ll have another shot at DuckDB.