Perfomances : Cache or mysql ?

keru · March 6, 2012, 4:36pm

Friendly greetings !

I run KNime desktop on my quadcore/8GB Desktop PC, i installed a mysql 5.5 on the same computer that run KNime.

I'm working with a dataset of approximatively 50 Millions line * 10 columns. (some kind of highly customised acceslog with a lot of missing data).

I couldn't find any documentation about the cache node.

Should i use the cache node instead of doing a "SELECT * From mytable;" using the database reader ?

Should is use the database reader when i want only some columns ?

I guess using a "where" sql query is faster than reading from cache + KNime filter ?

Well... many questions in one : "when should i use the cache node ?"

EDIT : or a table reader/writer ?

Thank you very much :)

dmt · March 6, 2012, 5:01pm

You should definitely try to do as much preprocessing in the database as possible. You can either accomplish that by performing the selection and filtering directly in your SQL statement or by using a combination of "Database Connector", "Database Column Filter", and "Database Row Filter". They all operate directly on the database. The "Database Connection Reader" then pulls the preprocessed data into KNIME.

The "Cache" node only caches all input data onto disk which might speed up iterating over data transformed in KNIME. You can find details and an example in the node description:
"Caches all input data (rows) onto disk. This is particularly useful when connected to a preceding node, which performs a column transformation. For example, a filter column node, which "hides" most of the columns to the output. This node only caches the data that is actually contained in the input table. Iterating on it may be considerably faster than accessing the table of a wrapper node."

keru · March 6, 2012, 5:09pm

So the database is my friend \o/

Thank you :)