simple end loop problem

tcheeseright · October 21, 2014, 10:26am

Dear KNIMEgurus,

I have what seems like a stupid question. I want to produce a non-symmetrical similarity table of 1300 rows x 160000 columns (or vice versa). Once it is produced, I want the equivalent of a COUNTIF over the rows (or columns) to count how many cells are above a certain threshold. I've got most of this sorted by using a code snippet to update a total for each row as the loop proceeds, but I am struggling to close the loop.

If I use a standard Loop End, it concatenates the rows from each iteration, which is not helpful.

If I use a Loop End (Column Append), it adds extra columns with each iteration, which causes memory issues.

All I want to do is close the loop and take the values from the last iteration: no extra rows, no extra columns. It should be simple, but I just can't seem to make it happen.

Any ideas?

If there is a better way of solving the original problem, rather than this specific one, please feel free to advise.

Tim

Aaron_Hart · October 21, 2014, 12:42pm

Hi Tim,

What about something like: Unpivot, Row Filter (by threshold) & GroupBy (to count)? As for the Unpivot, if you set it up with the "skinny" version of your data, it will be easier to configure.
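For readers who think in code, the three steps can be sketched in plain Java. This is only an illustration of the logic, not KNIME code: the class `UnpivotCountSketch`, its methods, the sample data and the 0.5 threshold are all made up for the example, and `>=` is assumed as the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical names throughout; a sketch of Unpivot -> Row Filter -> GroupBy.
final class UnpivotCountSketch {

    /** One cell of the wide table in long ("unpivoted") form. */
    static final class Cell {
        final String rowId;
        final String column;
        final double value;

        Cell(String rowId, String column, double value) {
            this.rowId = rowId;
            this.column = column;
            this.value = value;
        }

        String rowId() {
            return rowId;
        }
    }

    /** Unpivot: every cell of the wide matrix becomes one (row, column, value) record. */
    static List<Cell> unpivot(String[] rowIds, String[] columns, double[][] wide) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < wide.length; r++) {
            for (int c = 0; c < wide[r].length; c++) {
                cells.add(new Cell(rowIds[r], columns[c], wide[r][c]));
            }
        }
        return cells;
    }

    /** Row Filter (value >= threshold), then GroupBy on the row id with a count. */
    static Map<String, Long> countAtOrAbove(List<Cell> cells, double threshold) {
        return cells.stream()
                .filter(cell -> cell.value >= threshold)
                .collect(Collectors.groupingBy(Cell::rowId, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] rowIds = {"A", "B"};
        String[] columns = {"c1", "c2", "c3"};
        double[][] wide = {
            {0.9, 0.2, 0.7},
            {0.1, 0.6, 0.3}
        };
        // Prints {A=2, B=1}
        System.out.println(countAtOrAbove(unpivot(rowIds, columns, wide), 0.5));
    }
}
```

Two things are worth noticing in the sketch: the unpivoted list holds one entry per cell, so it grows as rows x columns, and a row with no value at or above the threshold drops out of the result entirely rather than appearing with a count of 0.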

Regards,

Aaron

tcheeseright · October 21, 2014, 6:14pm

Hi Aaron,

Thanks for the simple solution. I knew it must be possible. It is working on a small dataset using the column append loop end but I'm a bit concerned that I end up with a table with >160K columns that unpivots to 1350*160,000 rows. I've increased the memory footprint in knime.ini as far as I can. Do you think that KNIME will cope with this amount of data?

Cheers

Tim

tcheeseright · October 22, 2014, 11:39am

Hi Aaron,

As expected the loop causes a Java heap space error before it completes.

I don't really need all the data that I create. I just need to keep a running total (for each row) of the number of times that a calculated value is above a threshold. This would be really easy in a traditional programming language; is there no way to do that in KNIME?

At the moment the only option seems to be to write all the data out to disk and then reprocess it.
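To make the "traditional programming language" version concrete, here is a minimal plain-Java sketch of the running total: each chunk of computed values is folded into one counter per row and then discarded, so nothing wide is ever kept. It is not a KNIME workflow; the class `RunningThresholdCount`, the use of NaN for a missing cell, the `>=` comparison and the sample values are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical names throughout; keeps one counter per row and nothing else.
final class RunningThresholdCount {

    private final long[] counts;      // running total per row
    private final double threshold;

    RunningThresholdCount(int rowCount, double threshold) {
        this.counts = new long[rowCount];
        this.threshold = threshold;
    }

    /**
     * Fold one chunk of computed values into the totals.
     * chunk[r] holds the new values for row r; NaN stands in for a missing cell.
     */
    void accumulate(double[][] chunk) {
        if (chunk.length != counts.length) {
            throw new IllegalArgumentException("chunk must have one entry per row");
        }
        for (int r = 0; r < chunk.length; r++) {
            for (double v : chunk[r]) {
                if (!Double.isNaN(v) && v >= threshold) {
                    counts[r]++;
                }
            }
        }
    }

    /** Copy of the totals accumulated so far. */
    long[] totals() {
        return counts.clone();
    }

    public static void main(String[] args) {
        RunningThresholdCount counter = new RunningThresholdCount(2, 0.5);
        // Each call plays the role of one loop iteration; the chunk is not retained.
        counter.accumulate(new double[][] {{0.9, 0.1}, {0.4, 0.5}});
        counter.accumulate(new double[][] {{0.6}, {Double.NaN}});
        // Prints [2, 1]
        System.out.println(Arrays.toString(counter.totals()));
    }
}
```

Memory use depends only on the number of rows, not on how many columns are processed in total.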

Best

Tim

Aaron_Hart · October 22, 2014, 5:09pm

I don't know why I didn't think of this yesterday, but this is very close to one of the templates in the Java Snippet node. Something like the code pasted below should do the job, I think (and much faster, to boot).

// The field out_lineCount is defined below in the table "Output".
out_lineCount = 0;

// The threshold
double t = 0.5;

// Iterate over the columns of the current row
for (int i = 0; i < getColumnCount(); i++) {
	// Check for missing cells before reading the value
	if (isType(i, tDouble) && !isMissing(i) && getCell(i, tDouble) >= t) {
		out_lineCount += 1;
	}
}

Ellert_van_Koperen · October 23, 2014, 12:13pm

Nice, bookmarked.

tcheeseright · October 24, 2014, 8:57pm

Great, I'll try it.

BTW, I found that I hadn't previously increased my heap space in knime.ini, so I did that (now 3g) and the full loop completes, giving a table of 130,000 rows by 1350 columns, which is really cool.