I want to append a number of similarity columns to an input table, where each column holds the similarity to one row of a second input (reference) table. However, I don’t know the number of rows in the reference table when the output table spec is generated, so I am stuck on this approach.
I resorted to creating a DataCell extension containing an array, and that is working, although there is more work to do before this new DataCell is usable. But then I noticed the CellSplitter node, which preprocesses the data table to determine the number of columns to create. So it appears there is a way to hold off on defining the output table spec until you have the actual data table in hand.
Can you provide more information? I had trouble following how the preprocessing happens in CellSplitterCellFactory.
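For anyone following along, the general "scan first, then define the columns" idea can be sketched in plain Java. This is not the actual CellSplitter code and uses no KNIME classes; `TwoPassSplit`, `maxParts`, and `split` are hypothetical names chosen for illustration. The first pass only counts, the second pass fills a table whose width is now known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Plain-Java illustration of a two-pass split; not KNIME API. */
final class TwoPassSplit {

    /** Pass 1: scan every value to find the largest number of parts. */
    static int maxParts(List<String> values, String delimiter) {
        String regex = Pattern.quote(delimiter); // treat the delimiter literally
        int max = 0;
        for (String value : values) {
            max = Math.max(max, value.split(regex, -1).length);
        }
        return max;
    }

    /** Pass 2: split each value into exactly `columns` cells, padding with null. */
    static List<String[]> split(List<String> values, String delimiter, int columns) {
        String regex = Pattern.quote(delimiter);
        List<String[]> rows = new ArrayList<>();
        for (String value : values) {
            String[] parts = value.split(regex, -1);
            // copyOf pads short rows with null (stand-in for a missing cell)
            rows.add(Arrays.copyOf(parts, columns));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a,b,c", "d", "e,f");
        int columns = maxParts(values, ",");
        System.out.println("columns to create: " + columns);
        for (String[] row : split(values, ",", columns)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The cost of this approach is that the data is read twice, and the column count is only available once the data itself is.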
I think the only way to do it is the JPython Script 2:1 node, where you can input two different tables and produce an arbitrary output, dependent only on your Python skills.
The Python nodes are certainly an option from a user perspective, but I think the initial post was about writing new nodes to accomplish the task. Some thoughts on that:
- If the number of newly appended columns is unknown at the time of configuration (during the configure() call), you can simply return null or an empty DataTableSpec array. The node will then be executable, though it will not provide its output spec.
- As for the new DataCell class that holds an array of similarity values … you may want to look at the collection cells that are new in v2.0, specifically ListCell and SetCell (both of which are extensions of DataCell that keep a collection of cells). These types are used by, for instance, the “Create Collection Column” and “Split Collection Column” nodes.
- There is a project on http://labs.knime.org/distance-matrix, which allows the processing of distance (or similarity) values. It also defines a new type “DistanceVector”, which represents distances to other rows in a table. The assumption here, however, is that all records are in the same table, whereas you have similarity values from elements in one table to elements in another table?
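To make the trade-off between the first two suggestions concrete, here is a minimal plain-Java sketch with no KNIME dependency. It is not KNIME API code: `SimilarityLayouts`, `similarity`, `wideHeader`, and `listColumn` are hypothetical names, and the similarity measure (1 / (1 + Euclidean distance)) is only a placeholder. The wide layout needs the reference row count before its column names can be listed, while the list layout always adds exactly one column, whatever the size of the reference table.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; none of these names are KNIME API. */
final class SimilarityLayouts {

    /** Placeholder similarity: 1 / (1 + Euclidean distance). */
    static double similarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    /** Wide layout: one column name per reference row, so the header depends on the data. */
    static List<String> wideHeader(int referenceRowCount) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < referenceRowCount; i++) {
            names.add("Similarity to ref " + i);
        }
        return names;
    }

    /** List layout: one list per input row, ordered like the reference rows. */
    static List<List<Double>> listColumn(double[][] input, double[][] reference) {
        List<List<Double>> column = new ArrayList<>();
        for (double[] row : input) {
            List<Double> sims = new ArrayList<>();
            for (double[] ref : reference) {
                sims.add(similarity(row, ref));
            }
            column.add(sims);
        }
        return column;
    }

    public static void main(String[] args) {
        double[][] input = {{0.0, 0.0}, {3.0, 4.0}};
        double[][] reference = {{0.0, 0.0}, {3.0, 4.0}, {6.0, 8.0}};
        System.out.println(wideHeader(reference.length));
        System.out.println(listColumn(input, reference));
    }
}
```

In a real node the inner list would presumably become a ListCell of double cells, but that mapping is left out here because it depends on the KNIME version in use.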
Hope you find any of these comments useful. Bernd
I think returning null or an empty DataTableSpec is the answer I was originally looking for. But the ListCell may be a better solution to the problem (at least it keeps the table cleaner). I need to look into what nodes are available that work with ListCell columns.
And yes, the references are from a separate table.