simple end loop problem

Dear KNIMEgurus,

I have what seems like a stupid question. I want to produce a non-symmetrical similarity table of 1300 rows x 160000 columns (or vice versa). Once it is produced, I want the equivalent of a COUNTIF over the rows (or columns) to count how many cells are above a certain threshold. I've got most of this sorted by using a code snippet to update a total for each row as the loop proceeds, but I am struggling to close the loop.

If I use a standard Loop End then it concatenates the rows with each iteration, which is not helpful.

If I use a Loop End (Column Append) then it adds extra columns with each iteration, which causes memory issues.

All I want to do is close the loop and take the values from the last iteration: no extra rows, no extra columns. It should be simple but I just can't seem to make it happen.

Any ideas?

If there is a better way of solving the original problem, rather than this specific loop issue, then please feel free to advise.

Tim

Hi Tim, 

What about something like: Unpivot, Row Filter (by threshold) & GroupBy (to count)? As for the Unpivot, if you set it up with the "skinny" version of your data, it will be easier to configure.

Regards,

Aaron

Hi Aaron,

Thanks for the simple solution. I knew it must be possible. It is working on a small dataset using the column append loop end but I'm a bit concerned that I end up with a table with >160K columns that unpivots to 1350*160,000 rows. I've increased the memory footprint in knime.ini as far as I can. Do you think that KNIME will cope with this amount of data?

Cheers

Tim

Hi Aaron,

As expected the loop causes a Java heap space error before it completes.

I don't really need all the data that I create - I just need to keep a running total (for each row) of the number of times that a calculated value is above a threshold. This would be really easy in a traditional programming language; is there no way to do that in KNIME?

At the moment the only option seems to be to write all the data out to disk and then reprocess it.

Best

Tim
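
For reference, the "running total per row" idea described above might look roughly like this in plain Java. This is only a sketch under assumptions: the class name RunningThresholdTotals, the addChunk method, and the idea that each loop iteration delivers one block of columns as a double[][] are all made up for illustration and are not part of KNIME.

```java
import java.util.Arrays;

public class RunningThresholdTotals {
    // One counter per row; only these totals are kept between iterations.
    private final int[] totals;
    private final double threshold;

    public RunningThresholdTotals(int rowCount, double threshold) {
        this.totals = new int[rowCount];
        this.threshold = threshold;
    }

    // chunk[r] holds the values computed for row r in the current iteration
    // (hypothetical layout). The chunk can be discarded after this call.
    public void addChunk(double[][] chunk) {
        for (int r = 0; r < chunk.length; r++) {
            for (double v : chunk[r]) {
                if (v >= threshold) {
                    totals[r]++;
                }
            }
        }
    }

    // Returns a copy so callers cannot modify the running totals.
    public int[] totals() {
        return totals.clone();
    }

    public static void main(String[] args) {
        RunningThresholdTotals acc = new RunningThresholdTotals(2, 0.5);
        acc.addChunk(new double[][] {{0.2, 0.7}, {0.9, 0.6}});
        acc.addChunk(new double[][] {{0.5}, {0.1}});
        System.out.println(Arrays.toString(acc.totals()));
    }
}
```

The point of the sketch is that memory use depends on the number of rows only, not on the number of columns processed.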

 

 

I don't know why I didn't think of this yesterday, but this is very close to one of the templates in the Java Snippet node. Something like the code pasted below should do the job, I think (and much faster, to boot).

// The field out_lineCount is defined below in the table "Output".
out_lineCount = 0;

// The threshold
double t = 0.5;

// Iterate over all columns of the current row
for (int i = 0; i < getColumnCount(); i++) {
	// Check type and missing status before reading the cell value
	if (isType(i, tDouble) && !isMissing(i) && getCell(i, tDouble) >= t) {
		out_lineCount += 1;
	}
}
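
If anyone wants to sanity-check the counting logic outside KNIME, here is a minimal plain-Java sketch of the same per-row count. The class and method names are hypothetical, and using NaN to stand in for a missing cell is an assumption made only for this example.

```java
public class RowThresholdCount {
    // Counts the entries of one row that are >= threshold.
    // NaN is treated as a missing cell and skipped (assumption for this sketch).
    public static int countAtOrAbove(double[] row, double threshold) {
        int count = 0;
        for (double v : row) {
            if (!Double.isNaN(v) && v >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] row = {0.1, 0.5, Double.NaN, 0.9};
        System.out.println(countAtOrAbove(row, 0.5));
    }
}
```

Note that, like the snippet above, this uses >= so a value exactly equal to the threshold is counted; switch to > if "above" should be strict.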

Nice, bookmarked.

Great, I'll try it.

BTW I found that I hadn't previously increased my stack space in knime.ini so did that (now 3g) and the full loop completes giving a table of 130,000 rows by 1350 cols. which is really cool.