Custom PortObjects

Hi,

 

I have developed a custom PortObject to hold a load of records that is not in table format.

My problem is that when a node with a custom port runs, the execute method of the org.knime.core.node.Node class forces the port object to be copied: it calls copyPortObject, which invokes the port object's serialize/deserialize mechanism. This is in contrast to a BufferedDataTable, where the reference is simply reassigned (snippet below from org.knime.core.node.Node, with my annotations):

        if (data[i] == null) { // optional input
            inData[i] = null;  // (checked above)
        } else if (data[i] instanceof BufferedDataTable) {
            inData[i] = data[i]; // I want to assign a reference to my object...
        } else {
            // ...rather than make a copy of it
            exec.setMessage("Copying input object at port " +  i);
            ExecutionMonitor subExec = exec.createSubProgress(0.0);
            try {
                inData[i] = copyPortObject(data[i], subExec);
            } catch (CanceledExecutionException e) {
                createWarningMessageAndNotify("Execution canceled");
                return false;
            } catch (Throwable e) {
                createErrorMessageAndNotify("Unable to clone input data "
                        + "at port " + i + ": " + e.getMessage(), e);
                return false;
            }
        }

Having to create a copy of the port object has a huge performance impact for me (particularly in loops!) because of the serialize/deserialize round trip. I'd like to find some way around this, but unless I'm missing something, I always get a copy of the data in the custom port object rather than a reference to it.

Any advice on how to get round this, and/or any chance this feature could be looked at in future?

Yes, all port objects except for tables are currently copied. This is because some port object implementations modify their input in place, so it was considered safer to just clone it. Most, if not all, port types in the KNIME core are relatively small (they represent DB query information, a PMML model, an image of a few MB at most, etc.), so it's OK for them to be copied.

I understand that if a custom port type is large it becomes a bottleneck. One open feature request is to adapt the FileStore to be wrapped in a FileStorePortObject (which currently does not exist but is planned), which then would allow efficient handling of larger (non-table) data items. These changes are planned for 2.9 (end of this year).

As an alternative, you could look at GraphPortObject (the port object underlying the network mining extension). It stores only UUIDs and then accesses the actual objects through a static repository (which adds some cleanup overhead).
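
To illustrate the idea of a UUID handle backed by a static repository, here is a minimal sketch in plain Java. It is not the GraphPortObject implementation and uses no KNIME classes; RecordRepository, RecordHandle and all of their methods are hypothetical names invented for this example, and the records are assumed to be a simple List<String>. In a real node the handle would be the custom PortObject, and only its UUID would go through serialize/deserialize.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical static repository: maps a UUID to the heavy in-memory data. */
final class RecordRepository {
    private static final Map<UUID, List<String>> STORE = new ConcurrentHashMap<>();

    private RecordRepository() { }

    /** Stores the records and returns the key under which they can be found. */
    static UUID register(List<String> records) {
        UUID id = UUID.randomUUID();
        STORE.put(id, records);
        return id;
    }

    /** Returns the stored records, or null if the id is unknown or released. */
    static List<String> lookup(UUID id) {
        return STORE.get(id);
    }

    /** Cleanup hook: must be called once the data is no longer needed. */
    static void release(UUID id) {
        STORE.remove(id);
    }
}

/** Hypothetical lightweight handle; copying it copies only the UUID. */
final class RecordHandle {
    private final UUID id;

    RecordHandle(UUID id) {
        this.id = id;
    }

    UUID getId() {
        return id;
    }

    /** Stands in for the framework's copy step: cheap, because no records are duplicated. */
    RecordHandle copy() {
        return new RecordHandle(id);
    }

    /** Resolves the handle to the shared records held by the repository. */
    List<String> records() {
        return RecordRepository.lookup(id);
    }
}
```

With this layout, a copied handle resolves to the same list instance as the original, so the cost of the copy no longer depends on the size of the data. The price is the cleanup overhead mentioned above: something has to call release when the data is no longer needed, otherwise the entries stay in memory. Because the data is shared, any node that modifies it in place also changes what every other holder of the same UUID sees.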

Hope that helps!
  Bernd