I mentioned this in a post in this topic.
A NumPy/pandas-like library in Java that avoids the serialization penalty would be an excellent tool to have. As a first step, a Java Snippet node that can operate on the whole table rather than a single row, with the table exposed as an ND4J array (or better, something like a pandas/NumPy combination, if that exists), would already help.
For more advanced users this could remove the need for complex looping structures, which would speed up such workflows by orders of magnitude.
I’m wondering if anything like this is planned?
Creating this thread to get updates on this.
What kind of operations would you want to do in the Snippet / with the matrix?
Basically anything I currently have to use a Python Snippet for, which means anything that needs programmatic access to the full table.
Yeah, that isn’t very concrete, but since the Python Snippet is useful, this would be even more useful, as it avoids the serialization penalty, which sadly is extremely high.
Some recent simple example:
output_table = input_table.stack().groupby(level=0).nlargest(3).unstack().reset_index(level=1, drop=True).reindex(columns=input_table.columns)
(This keeps the 3 highest values per row and sets all other columns to null/NaN. Maybe there is a node that can do that which I’m not aware of.)
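To make the intent concrete, here is a minimal sketch of the same "keep the top 3 per row" idea on a tiny made-up table. It uses a different formulation than the one-liner above (row-wise `rank` plus `where` instead of `stack`/`groupby`), and the function name `keep_top_n_per_row` and the sample data are assumptions for illustration only. It assumes a purely numeric table.

```python
import pandas as pd


def keep_top_n_per_row(table: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep the n largest values in each row; set every other cell to NaN.

    Hypothetical helper for illustration, assuming all columns are numeric.
    """
    # Rank the values within each row, largest first.
    # method="first" breaks ties by column order, so exactly n cells survive.
    ranks = table.rank(axis=1, method="first", ascending=False)
    # where() keeps cells whose rank is within the top n and blanks the rest.
    return table.where(ranks <= n)


# Made-up sample data standing in for the KNIME input table.
input_table = pd.DataFrame(
    {
        "a": [1.0, 9.0],
        "b": [5.0, 2.0],
        "c": [3.0, 8.0],
        "d": [4.0, 7.0],
    }
)

output_table = keep_top_n_per_row(input_table, n=3)
print(output_table)
```

In the first row the smallest value sits in column `a`, and in the second row in column `b`, so those two cells become NaN while the column order stays unchanged. The point is the same as with the one-liner: the operation needs the whole table at once, not one row at a time in a loop.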
That is a one-liner, and yes, I previously had it done in KNIME using normal nodes, but with looping. It was simply too slow that way.
But examples can be far more complex. It’s mostly either about “column-based ranking”, as in this example, or about needing access to the whole table.
The title of this topic led me to believe this was more related to matrix operations in the classical sense. Thanks for bringing up this example!
I have created a ticket for this and will keep you and this topic posted.