Database sampling

Hi everybody,
I successfully made a JDBC connection to my database using KNIME. Now I’d like to make a stratified sampling, for extracting a defined rows number. How can I do? I was able to execute sampling only on data read with ARFF reader, I wasn’t able to make it on data read from a db…

I hope you can help me
Cheers
Carmelo

Stratified sampling can be done with the Row Sampling node. Unfortunately, stratified sampling in databases is not trivial and there is no node in KNIME which does the stratified sampling directly within the database. Thus, you have to use the Database Reader node, read in the whole table and connect it to the Row Sampling node, which then applies the stratified sampling. Best,
Fabian

My db has more than 1 million records. Do you think is it possible to use the Database Reader node, read in the whole table and connect it to the Row Sampling node?

1 million records should be no problem since KNIME has a smart data buffering mechanism, i.e. the data doesn’t have to reside in main memory but is written to the hard drive if the table gets too large. This means that the amount of data you can process with KNIME is only limited by your hard drive space.
Give it a try and let us know.

I successfully read the whole database using the Database Reader node and then sampled it using the Row Sampling node :slight_smile:

After installing some plugins using the KNIME update sites downloaded as a zip file, I’m not able to read my database table using the Database Reader node. I’m using KNIME SDK version 2.0.

  • Using the sun.jdbc.odbc.JdbcOdbcDriver driver, I get this error:
    “ERROR Database Reader Execute failed: Operation not yet supported”

  • Using the com.mysql.jdbc.Driver driver, I get this error:
    “ERROR Database Reader Execute failed: java.lang.OutOfMemoryError: Java heap space”

I hope you can help me solving this problem
Cheers
Carmelo

The first problems seems to be related to databases which do not support all functions to retrieve data. Is it possible that you change the log level to DEBUG (go to KNIME preference page) and send me the whole stacktrace from the KNIME Console view. The same holds for the second problem, I am currently trying o reproduce the problem…
Thanks, Thomas

It seems to be a problem with the data type of the last column in my table. It is a varchar(1). Chaning it to int or to char(1) it works. How come? Maybe a bug?

Is the Memory or the UnspportedType Error related to this issue, varchar vs. char? Both types are read in the same way from the ResultSet using getAsciiStream which is very likely not supported by all database, but which - as I understood it - is the most generic one for char-type columns.