How to get the configure() Routine to examine Table Data

How do I examine the data of an input table from within the configure(final DataTableSpec[] inSpecs) routine?

I am trying to verify that the rows of names in the first input table match the column header names in the second input table. But I can think of several other reasons why I might want to check the table data before running the execute() routine:

  1. It may be illegal to run a node against an empty table, so I would want the configure() routine to check how many rows the input table has; or
  2. I might want to add a new column to the output for every row in the input table, so configure() would need to know how many rows the input table has in order to create the output DataTableSpec.

If these scenarios need to be handled in the execute() routine, then how do I correctly create the outputTableSpec in advance?
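Both scenarios come down to the same thing: the output column list is a function of the input rows, which configure() never receives. A minimal framework-free sketch of that dependency follows. It uses no KNIME classes; the "spec" is just a list of column names, and `RowDependentSpec`, `deriveOutputColumns` and the `col_` prefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Framework-free model (no KNIME classes): an output "spec" is a list of
// column names. deriveOutputColumns is a hypothetical helper.
final class RowDependentSpec {

    // Scenario 1: reject an empty table.
    // Scenario 2: create one output column per input row.
    // Both need the row data itself, not just the input spec.
    static List<String> deriveOutputColumns(List<String> inputRowNames) {
        if (inputRowNames.isEmpty()) {
            throw new IllegalArgumentException("Input table has no rows.");
        }
        List<String> columns = new ArrayList<>();
        for (String rowName : inputRowNames) {
            columns.add("col_" + rowName);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(deriveOutputColumns(List.of("a", "b", "c")));
    }
}
```

Running `main` prints `[col_a, col_b, col_c]`. Nothing in the method can be computed from column names and types alone, which is why it cannot live in configure().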

Note that I checked the source code for the Transpose Node, because it must know how many rows are in the input table in advance to define the output DataTableSpec. But the Transpose Node's configure() routine contains only a single line of code:

    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        // Length-1 array whose only element is null: output spec unknown
        return new DataTableSpec[1];
    }


I don't know how the Transpose Node handles this, but in general, you can't do what you're describing. A Node tries to configure itself as soon as it is connected, even if no data is available yet. At that point the only information available is the input specs (column names and types), and sometimes not even those, if an upstream Node cannot determine its own output spec.
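For the original use case (row values of the first table versus column headers of the second), this means the comparison has to happen once the data is available, i.e. in execute(). A minimal sketch of that check in plain Java follows; it is not KNIME API, and `NameMatchCheck`, `mismatches` and `requireMatch` are hypothetical names. In a real node, the row values would come from iterating the first input table inside execute().

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Framework-free model of the name check. configure() would only see the
// column names of table 2; the row values of table 1 need execute().
final class NameMatchCheck {

    // Returns row names of table 1 with no matching column header in
    // table 2, followed by headers of table 2 with no matching row name.
    static List<String> mismatches(List<String> rowNames, List<String> columnNames) {
        Set<String> rows = new LinkedHashSet<>(rowNames);
        Set<String> cols = new LinkedHashSet<>(columnNames);
        List<String> result = new ArrayList<>();
        for (String r : rows) {
            if (!cols.contains(r)) {
                result.add(r);
            }
        }
        for (String c : cols) {
            if (!rows.contains(c)) {
                result.add(c);
            }
        }
        return result;
    }

    // Intended for the execute() step: fail loudly on any mismatch.
    static void requireMatch(List<String> rowNames, List<String> columnNames) {
        List<String> bad = mismatches(rowNames, columnNames);
        if (!bad.isEmpty()) {
            throw new IllegalStateException("Names do not match: " + bad);
        }
    }

    public static void main(String[] args) {
        requireMatch(List.of("x", "y"), List.of("y", "x")); // same names, passes
        System.out.println(mismatches(List.of("x", "z"), List.of("x", "y")));
    }
}
```

The second call prints `[z, y]`. Since KNIME's `NodeModel.execute()` is declared to throw `Exception`, a check like `requireMatch` can simply let its exception propagate and fail the node.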

Instead, you might want to look at the Pivot Node, which faces the same problem. Its solution is simply to provide no output spec until the Node has been executed, and to throw an error if something goes wrong. That is not the greatest solution, I know, and being unable to configure Nodes downstream of Pivot, Loop End, or other Nodes that do this is not ideal either. But the spec and processing systems are currently not designed for these cases.
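The "no spec until execute" pattern can be modeled roughly as follows. This is a simplified, framework-free sketch, not the Pivot Node's actual code: a "spec" is a list of column names, `null` means "unknown", the column name `group` is hypothetical, and `IllegalArgumentException` stands in for the `InvalidSettingsException` a real configure() would throw.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Simplified model of deferring the output spec to execute().
final class DeferredSpecNode {

    // Like configure(): only input column names are known, not values.
    // Spec-level problems can still be reported early; the output spec
    // itself is unknown, so return null.
    List<String> configure(List<String> inputColumnNames) {
        if (!inputColumnNames.contains("group")) {
            throw new IllegalArgumentException("Missing column 'group'");
        }
        return null;
    }

    // Like execute(): the data is available, so derive the output columns
    // (one per distinct value of the pivot column, in first-seen order),
    // or fail if the data makes that impossible.
    List<String> execute(List<String> pivotColumnValues) {
        if (pivotColumnValues.isEmpty()) {
            throw new IllegalStateException("Cannot pivot an empty table");
        }
        LinkedHashSet<String> distinct = new LinkedHashSet<>(pivotColumnValues);
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        DeferredSpecNode node = new DeferredSpecNode();
        System.out.println(node.configure(List.of("group", "value")));
        System.out.println(node.execute(List.of("b", "a", "b")));
    }
}
```

This prints `null` and then `[b, a]`. The split keeps the cheap, spec-only checks in configure() so users still get early feedback, while everything that depends on row values waits for execute().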

That somewhat cryptic Transpose code returns an array whose single element is null as the spec for the output table (elements of a newly created reference array default to null). As a result, downstream Nodes have no information about the output, so this is not recommended unless you really cannot know the output table structure in advance (in which case it is the preferred approach).
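The default-null behavior relied on here is plain Java array semantics. The sketch below uses a hypothetical stand-in class, `TableSpecStandIn`, in place of `DataTableSpec` so it compiles without KNIME on the classpath.

```java
// Stand-in for DataTableSpec; hypothetical, used only so this compiles alone.
final class TableSpecStandIn { }

final class NullSpecArray {

    // Mirrors the Transpose configure(): allocate a length-1 array and return it.
    static TableSpecStandIn[] configure() {
        return new TableSpecStandIn[1];
    }

    public static void main(String[] args) {
        TableSpecStandIn[] specs = configure();
        System.out.println(specs.length);     // 1
        System.out.println(specs[0] == null); // true
    }
}
```

Writing `return new DataTableSpec[]{null};` would express the same intent more explicitly than relying on the default initialization.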