Belated Happy New Year. :-)
Is there a way to "tweak" the behaviour of the methods implemented in the item set finder node? For example, I've noticed that when crunching my data set on K2.3.0 (Windows), apriori runs out of memory and fails, even though it could get quite a substantial piece of hard disk. By contrast, SaM on the same data set makes intense use of the hard disk, but it neither fails when running out of disk space, nor does it let me point it to a bigger drive.
EDIT: Using multiple cores would be nice, too. :-)
Any thoughts or helpful suggestions would be appreciated.
The temporary input and output of the algorithm are written to the default temp directory of your PC, which is retrieved by System.getProperty("java.io.tmpdir"). You can change the default temp directory by adding the system property -Djava.io.tmpdir=C:\mydir to the knime.ini file in your KNIME installation folder. On Windows you can also change the default temp directory by changing the TMP environment variable.
You might also want to try other algorithms like the FPGrowth algorithm, which uses less memory and is usually much faster than the other integrated item set algorithms.
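To double-check which directory a JVM actually picks up and how much room is left on that drive, a small stand-alone sketch like the one below may help. The class name TmpDirCheck and its helper methods are made up for illustration and are not part of KNIME. Note that it only reflects the JVM it runs in: to see the effect of the knime.ini change you would have to start it with the same -Djava.io.tmpdir=... argument. As far as I know, -D options in knime.ini have to come after the -vmargs line.

```java
import java.io.File;

// Hypothetical helper, not part of KNIME: reports the temp directory the
// current JVM uses and the usable space on the drive that holds it.
public class TmpDirCheck {

    // Usable space in whole GiB; 0 if the path does not name an existing partition.
    static long usableGiB(String path) {
        return new File(path).getUsableSpace() / (1024L * 1024L * 1024L);
    }

    // One-line summary: the path followed by the usable space.
    static String describe(String path) {
        return path + " (usable: " + usableGiB(path) + " GiB)";
    }

    public static void main(String[] args) {
        // Same lookup as mentioned above for the temporary files.
        String tmp = System.getProperty("java.io.tmpdir");
        System.out.println("java.io.tmpdir = " + describe(tmp));
    }
}
```

Running it once plainly and once as java -Djava.io.tmpdir=C:\mydir TmpDirCheck should show whether the override is picked up.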
In addition you can find a nice introduction to frequent pattern mining and a description of most of the integrated algorithms at http://www.borgelt.net/courses.html#fpm.
Thanks Tobias, very useful hints indeed!
Christian's slides had actually stimulated me to embark on this exploration, but I'm afraid I didn't read further concerning practical usage recommendations. That's how I got stuck with close to 3 days of SaM crunching using way beyond 1 TB of disk space, I guess... ;-)