In-Database Analytics

bigdatapg · April 24, 2013, 6:41pm

Is it possible to run in-database analytics, such as simple aggregates or statistics, on a JDBC-connected data source? Typically, I'd like to run the Statistics node on tables of 100M+ to 1B rows managed by appliances such as IBM PureData or Teradata.

If not, is there a way to create templates that generate SQL, for example by customizing the standard nodes to emit database-specific SQL queries?

For example, read the metadata (field names and types) of a table, then generate the SQL needed to compute basic statistics on its numeric fields.
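A minimal sketch of that metadata-driven approach, using Python's built-in sqlite3 module as a stand-in for the appliance (an assumption; this is not a KNIME feature). It reads column names and declared types via SQLite's `PRAGMA table_info`, keeps the numeric ones, and generates a single SELECT computing COUNT, MIN, MAX and AVG per column, so only one summary row leaves the database. Over JDBC the same metadata would come from `DatabaseMetaData.getColumns()`, and the numeric type list and SQL dialect would need adjusting per database. The helper names (`numeric_columns`, `stats_sql`, `run_stats`) are hypothetical.

```python
import sqlite3

# Declared type names treated as numeric. Assumption: a real appliance needs
# its own list, taken from its catalog or from JDBC metadata.
NUMERIC_TYPES = {
    "INT", "INTEGER", "SMALLINT", "BIGINT",
    "REAL", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC",
}
STATS = ("count", "min", "max", "avg")


def quote_ident(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def numeric_columns(conn, table):
    """Return names of columns whose declared type looks numeric."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    cols = []
    for _cid, name, decl_type, *_rest in rows:
        # Strip precision such as DECIMAL(10,2) before comparing.
        base = (decl_type or "").upper().split("(")[0].strip()
        if base in NUMERIC_TYPES:
            cols.append(name)
    return cols


def stats_sql(table, columns):
    """Build one SELECT computing every statistic in STATS for each column."""
    exprs = [f"{fn.upper()}({quote_ident(col)})" for col in columns for fn in STATS]
    return f"SELECT {', '.join(exprs)} FROM {quote_ident(table)}"


def run_stats(conn, table):
    """Run the generated query and map results to {column: {stat: value}}."""
    cols = numeric_columns(conn, table)
    if not cols:
        return {}
    row = conn.execute(stats_sql(table, cols)).fetchone()
    n = len(STATS)
    # The row holds n values per column, in the order stats_sql generated them.
    return {col: dict(zip(STATS, row[i * n:(i + 1) * n])) for i, col in enumerate(cols)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, units INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", 10, 2.5), ("south", 20, 3.5), ("east", 30, None)],
    )
    print(stats_sql("sales", numeric_columns(conn, "sales")))
    print(run_stats(conn, "sales"))
```

Emitting one statement for all columns typically lets the database answer with a single scan of the table, which matters at 100M+ rows. Most analytic databases also offer a standard-deviation aggregate (often `STDDEV_SAMP`); core SQLite does not, so it is left out of this sketch.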

gabriel · April 30, 2013, 9:04am

KNIME comes with a whole set of database nodes (see the Database category in the Node Repository). All of these nodes can connect to a JDBC database once the corresponding driver is registered in the KNIME preferences. For example, the Database Reader connects to a database and reads in the data from the ResultSet returned by the SELECT statement. We also provide so-called Database Connector nodes, which work inside the database: they only compose an SQL statement, which is executed at the end of the workflow within the Database Connection Reader node. For writing SQL code yourself, I would also recommend looking into the Database Query node.
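To illustrate the idea of nodes that only compose SQL and defer execution, here is a small Python sketch. It is not KNIME's implementation; the `LazyQuery` class and its methods are hypothetical, and sqlite3 again stands in for the real database. Each step wraps the previous statement as a sub-select, and the database does work only when `fetch()` is called. Conditions and aggregate expressions are passed as raw SQL strings, so this assumes they come from trusted input.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


class LazyQuery:
    """Hypothetical helper: accumulates SQL as nested sub-selects; nothing runs until fetch()."""

    def __init__(self, sql):
        self.sql = sql

    @classmethod
    def table(cls, name):
        return cls(f"SELECT * FROM {quote_ident(name)}")

    def where(self, condition):
        # Wrap the previous statement rather than executing it.
        return LazyQuery(f"SELECT * FROM ({self.sql}) AS t WHERE {condition}")

    def group_by(self, key, aggregates):
        # key and aggregates are raw SQL fragments (trusted input assumed).
        return LazyQuery(
            f"SELECT {key}, {aggregates} FROM ({self.sql}) AS t GROUP BY {key}"
        )

    def fetch(self, conn):
        # The only point at which the database does any work.
        return conn.execute(self.sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("north", 5), ("south", 20), ("east", 1)],
    )
    query = (
        LazyQuery.table("sales")
        .where("units > 2")
        .group_by("region", "SUM(units) AS total")
    )
    print(query.sql)          # the composed statement, not yet executed
    print(query.fetch(conn))  # single round trip to the database
```

Because every step only produces a larger statement, the filtering and aggregation run inside the database and just the grouped result is transferred, which is the point of composing SQL instead of reading raw rows.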