Differences using RowIterator in DataTable objects (KNIME 2.7.4 vs KNIME 2.10.1)

Hi,

I am seeing odd behaviour when using a RowIterator to loop over tables that implement the DataTable interface.
In KNIME 2.7.4, I designed and implemented the code in [1], which loops over a table to find and save the RowKey of specific rows. It worked reasonably well.
I have now moved to KNIME 2.10.1, and the same code takes much longer (roughly 10 times as long).

I have been looking for the cause in my own code, but I cannot see what has changed. In addition, the lines:

this.getSelectedIndividuals().add(dataRow.getKey());
this.updateTargetStats(targetValue);

only update the contents of Java 'Set' objects.

I am aware that this is very unlikely, but has anybody experienced poor performance in loop operations (using the same loop structure)? How could I find the root of the problem?

Thanks in advance

Oscar

[1]

for (final DataRow dataRow : this.getDataTable()) {

    final String targetValue = ((StringValue) dataRow
            .getCell(targetColIdx)).getStringValue();

    if (pInclude && (pFold == pFoldDistribution[idx])) {
        this.getSelectedIndividuals().add(dataRow.getKey());
        this.updateTargetStats(targetValue);
    }
    idx++;
}

Hi Oscar:

My best guess is that the memory situation is different. The iterator you are getting now is backed by a file, whereas in 2.7 it was backed by an in-memory list. This is completely transparent to the implementation; it's just a "RowIterator" after all.

In KNIME 2.8 we added a memory observer that runs in the background and monitors heap usage. If memory gets low at any point, it signals all tables to swap out to disk (again, completely transparent to the node implementation and the end user).

Depending on what else is happening in your code (how often do you iterate over the input table?), you may want to add a Cache node in front of the node, or cache the data when you first iterate over it.
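
The second option, caching the data on the first iteration, could look roughly like the sketch below. It is plain Java and not KNIME API: CachedRows is a hypothetical helper name, and it only assumes that the table can be used as an Iterable (which is what the for-each loop in [1] already relies on). It copies every row into an in-memory list the first time it is iterated and serves all later passes from that list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper (not part of KNIME): reads the source once and
 * keeps the rows in memory so later passes do not touch the source again.
 */
final class CachedRows<T> implements Iterable<T> {

    private final Iterable<T> source;
    private List<T> cache; // null until the first iteration

    CachedRows(final Iterable<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        if (cache == null) {
            // First pass: copy every row from the (possibly file-backed) source.
            final List<T> rows = new ArrayList<T>();
            for (final T row : source) {
                rows.add(row);
            }
            cache = rows;
        }
        // Every pass, including the first, iterates over the in-memory copy.
        return cache.iterator();
    }
}
```

In the node this would presumably be created once, e.g. new CachedRows<DataRow>(this.getDataTable()), and reused for every loop. It only makes sense if the whole table fits comfortably in the heap; otherwise it recreates the low-memory situation that triggered the swap to disk in the first place.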

Hope that helps!

Regards,
 Bernd

PS: You may want to check for missing values in this block:

final String targetValue = ((StringValue) dataRow.getCell(targetColIdx)).getStringValue();
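
As far as I know, a missing cell does not implement StringValue, so the cast in that line would fail with a ClassCastException on the first missing value. The usual guard is to ask the cell whether it is missing before casting. The sketch below shows the pattern in self-contained Java: the Sketch* types are stand-ins written for this example, not KNIME classes, and only the method names isMissing() and getStringValue() mirror the real API. What to do with a missing value (skip the row, count it, fail) is left to the caller.

```java
/** Stand-in for a KNIME-style cell; only isMissing() mirrors the real method name. */
interface SketchCell {
    boolean isMissing();
}

/** Stand-in for the StringValue interface used in the original line. */
interface SketchStringValue {
    String getStringValue();
}

/** Stand-in for a cell that holds a string. */
final class SketchStringCell implements SketchCell, SketchStringValue {
    private final String value;

    SketchStringCell(final String value) {
        this.value = value;
    }

    @Override
    public boolean isMissing() {
        return false;
    }

    @Override
    public String getStringValue() {
        return value;
    }
}

/** Stand-in for a missing cell: it deliberately does not implement SketchStringValue. */
final class SketchMissingCell implements SketchCell {
    @Override
    public boolean isMissing() {
        return true;
    }
}

final class MissingValueGuard {
    private MissingValueGuard() {
    }

    /** Returns the string content, or null when the cell is missing. */
    static String stringOrNull(final SketchCell cell) {
        if (cell.isMissing()) {
            return null; // check before the cast, which would otherwise throw
        }
        return ((SketchStringValue) cell).getStringValue();
    }
}
```

In the loop from [1], the same idea would mean fetching the cell with dataRow.getCell(targetColIdx) first, testing isMissing(), and only then casting to StringValue.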

 

Hi Bernd,

Thanks for your message. Your suggestion works nicely: I placed a 'Cache' node just before the problematic node and it now runs as fast as it did in 2.7.4. (This node has to iterate over the input table many times.)

Thanks also for the advice about checking for missing values. I'll add that check.

Oscar