Adding multiple columns based on the number of rows in a second table

karlmsmith · March 26, 2009, 1:35am

I want to append a number of similarity columns to an input table, where each column holds the similarity to one row of a second input, the reference table. However, I don't know the number of rows in the reference table when the output table spec is generated, so I am stuck on this approach.

I resorted to creating a DataCell extension that contains an array. That works, although there is more work to do before this new DataCell is usable. But then I noticed the CellSplitter node, which preprocesses the data table to determine the number of columns to create. So it appears there is a way to put off defining the output table spec until you have the actual data table in hand.
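The "preprocess first, then create columns" idea can be sketched as two passes over the data. The sketch below is only an illustration in the spirit of the CellSplitter node, not its actual code. It uses plain JDK classes rather than the KNIME API, and the class and method names (`TwoPassSplitter`, `countColumns`, `split`) are hypothetical. Pass 1 scans every cell to find the largest number of parts. Pass 2 splits each cell into exactly that many columns and pads short rows with `null`, which stands in for a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical two-pass sketch (plain JDK, not the KNIME API):
 * the number of output columns is only known after scanning the data.
 */
public class TwoPassSplitter {

    // Pass 1: scan all cells to find how many output columns are needed.
    static int countColumns(List<String> cells, String delimiter) {
        int max = 0;
        for (String cell : cells) {
            // limit -1 keeps trailing empty parts, so "a," counts as 2 parts
            max = Math.max(max, cell.split(delimiter, -1).length);
        }
        return max;
    }

    // Pass 2: split each cell into exactly nCols parts, padding with null (missing).
    static List<String[]> split(List<String> cells, String delimiter, int nCols) {
        List<String[]> rows = new ArrayList<>();
        for (String cell : cells) {
            String[] parts = cell.split(delimiter, -1);
            rows.add(Arrays.copyOf(parts, nCols)); // copyOf pads with null
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("a,b", "c,d,e", "f");
        int nCols = countColumns(cells, ",");
        System.out.println("columns: " + nCols); // the spec can only be built here
        for (String[] row : split(cells, ",", nCols)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The cost of this design is an extra full pass over the data before any output is produced. In exchange, every output row has the same fixed width.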

Can you provide more information? I had trouble following how the preprocessing happens in CellSplitterCellFactory.

michael.hecht · March 27, 2009, 2:15pm

Hello,

I think the only way to do it is to use the JPython Script 2:1 node, where you can input two different tables and produce an arbitrary output, limited only by your Python skills.

wiswedel · March 27, 2009, 4:30pm

The Python nodes are certainly an option from a user's perspective, but I think the initial post was about writing new nodes to accomplish the task? Some thoughts on that:

If the number of appended columns is unknown at configuration time (during the configure() call), you can simply return null, or a DataTableSpec[] array whose entries are null. The node will then still be executable, though it will not provide its output spec before it is executed.
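The configure/execute split described above can be mimicked in a few lines. This is a hypothetical sketch, not the KNIME `NodeModel` API. A "spec" is modeled as a plain list of column names. `configure` returns `null` because the reference table's row count is not yet known. `execute`, which has the reference rows in hand, builds the real spec. The names `DeferredSpecNode` and `"Similarity to <key>"` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the KNIME API): the output "spec" is just the list of
 * output column names. configure() cannot know how many similarity columns to
 * append, so it returns null; execute() builds the spec from the actual data.
 */
public class DeferredSpecNode {

    // Mimics configure(): only the input table's columns are known here.
    static List<String> configure(List<String> inputColumns) {
        // The reference table's row count is unknown at this point,
        // so signal "output spec not available yet" with null.
        return null;
    }

    // Mimics execute(): the reference rows are available, so the spec can be built.
    static List<String> execute(List<String> inputColumns, List<String> referenceRowKeys) {
        List<String> spec = new ArrayList<>(inputColumns);
        for (String key : referenceRowKeys) {
            spec.add("Similarity to " + key); // one appended column per reference row
        }
        return spec;
    }

    public static void main(String[] args) {
        List<String> inputCols = Arrays.asList("Name", "Vector");
        System.out.println("configure: " + configure(inputCols));
        System.out.println("execute:   "
                + execute(inputCols, Arrays.asList("Ref0", "Ref1", "Ref2")));
    }
}
```

Downstream code has to tolerate the `null` spec until execution has run. This is the trade-off of deferring the column count.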
As for the new DataCell class that holds an array of similarity values … you may want to look at the collection cells that are new in v2.0, specifically ListCell and SetCell (both are extensions of DataCell that hold a collection of cells).
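The list-cell alternative keeps the table at a fixed width. Each input row gets a single cell containing one similarity value per reference row, so the output spec no longer depends on the reference table's size. The sketch below uses a `List<Double>` as a stand-in for a ListCell. It does not use the KNIME collection API. Cosine similarity is an assumed metric (the thread does not name one), and `ListColumnSimilarity` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: append ONE list-valued column (analogous to a ListCell)
 * instead of one column per reference row. Cosine similarity is an assumption.
 */
public class ListColumnSimilarity {

    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0.0; // treat zero vectors as having no similarity
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // One list per input row. Its length equals the number of reference rows,
    // but the output table gains only a single column.
    static List<List<Double>> similarityLists(double[][] input, double[][] reference) {
        List<List<Double>> column = new ArrayList<>();
        for (double[] row : input) {
            List<Double> sims = new ArrayList<>();
            for (double[] ref : reference) {
                sims.add(cosine(row, ref));
            }
            column.add(sims);
        }
        return column;
    }

    public static void main(String[] args) {
        double[][] reference = {{1, 0}, {0, 1}, {1, 1}};
        double[][] input = {{1, 0}, {0, 2}};
        for (List<Double> cell : similarityLists(input, reference)) {
            System.out.println(cell);
        }
    }
}
```

The output spec can be declared during configuration (one list column), which avoids the null-spec workaround. The trade-off is that downstream nodes must be able to handle list-valued columns.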
There is a project at http://labs.knime.org/distance-matrix that supports processing distance (or similarity) values. It also defines a new type, "DistanceVector", which represents the distances to the other rows in a table. The assumption there, however, is that all records are in the same table, whereas you have similarity values between elements of one table and elements of another?

Hope you find some of these comments useful.
Bernd

karlmsmith · March 27, 2009, 7:48pm

Returning a null or empty DataTableSpec is, I think, the answer I was originally looking for. But the ListCell may be a better solution to the problem (at least it keeps the table cleaner). I need to look into which nodes can work with ListCell columns.

And yes, the references are from a separate table.

Thank you,
Karl