[Bug Report] Handling of Numeric Data Type for SQLite Database

Background:

SQLite allows data columns to be declared as 'numeric' type, whose values are stored as either integer or real (i.e. float/double) depending on the content of each value. Under this rule, a read data column SHOULD be defined in KNIME as real (i.e. double) type as long as any cell in it holds a non-integer value.
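To make the affinity rule concrete, here is a minimal sketch, assuming the Xerial sqlite-jdbc driver is on the classpath (the table and values are made up); SQLite's typeof() shows which storage class each value actually got:

    import java.sql.*;

    public class AffinityDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = con.createStatement()) {
                st.execute("CREATE TABLE t (v NUMERIC)");
                st.execute("INSERT INTO t VALUES (5), (5.1)");
                // typeof() reports the storage class chosen per value:
                // 5 -> integer, 5.1 -> real
                try (ResultSet rs = st.executeQuery("SELECT v, typeof(v) FROM t")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                    }
                }
            }
        }
    }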

Error Situation:

Currently KNIME's 'SQLite Connector' combined with 'Database Reader' determines the integer or double data format for a 'numeric' SQLite data column ONLY BASED ON THE DATA IN THE FIRST ROW...
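The effect can be reproduced with plain JDBC outside of KNIME (a sketch reusing the made-up table from above; whether INTEGER is actually reported here depends on the driver version in use):

    import java.sql.*;

    public class FirstRowTypeDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = con.createStatement()) {
                st.execute("CREATE TABLE t (v NUMERIC)");
                st.execute("INSERT INTO t VALUES (5), (5.1)");
                try (ResultSet rs = st.executeQuery("SELECT v FROM t")) {
                    // A driver that infers column types from row 1 alone will
                    // report INTEGER here, although row 2 holds a real value.
                    System.out.println(rs.getMetaData().getColumnTypeName(1));
                }
            }
        }
    }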

Proof Study Case (Attachment File):

First I created a simple SQLite table reading workflow as follows:

'SQLite Connector' -> 'Database Reader'

Here I used the sample database 'db.sqlite' from example 004001. But before importing, I made the following changes to the first row of the data table 'data' using an SQLite database editor:

1. Change the 'sepal_length' value in row 1 from 5.1 to 5

2. Change the 'petal_length' value in row 1 from 1.4 to 1

Then I saved the changes and executed the workflow. Looking at the 'Spec - Columns' tab of the resulting data table, I can see that both 'sepal_length' and 'petal_length' columns are labeled as 'Number (integer)' instead of 'Number (double precision)', despite the fact that all data in these two columns, except the first row, are doubles...

Bug Scope:

All SQLite connections and queries; it may also be relevant for MySQL or PostgreSQL databases if they have similar 'type affinity' rules (no experience with these two, so this is just a wild guess)...

This is a bug in the JDBC driver for SQLite. It returns data types for columns based only on the rows actually returned. For determining the KNIME table structure we don't want to query all rows, because this could take a long time depending on the table size. Therefore we only query the first row and use its data types for the columns.

@thor:

Thanks for the explanation... this pretty much matches my assumption, i.e. a simplification for the sake of reading time... Maybe in the future this bug could be mitigated to some degree by sampling more random cells from a column, but that still wouldn't be a fully reliable solution...

Anyway, I bypassed this bug by force-converting all 'numeric' data columns in my SQLite database into 'real' type, and it works for now (an equivalent query-side workaround is sketched below).
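For anyone who cannot rewrite the database itself, a query-side alternative should work as well: CAST the affected columns in the SQL statement, so even the first row already arrives as real. A plain-JDBC sketch using the file and column names from the example above (in KNIME, only the SELECT statement would go into the 'Database Reader'):

    import java.sql.*;

    public class CastWorkaround {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite:db.sqlite");
                 Statement st = con.createStatement();
                 // CAST forces every value, including row 1, to the real type,
                 // so a first-row-based type sniffer sees a real value up front.
                 ResultSet rs = st.executeQuery(
                         "SELECT CAST(sepal_length AS REAL) AS sepal_length, "
                         + "CAST(petal_length AS REAL) AS petal_length FROM data")) {
                while (rs.next()) {
                    System.out.println(rs.getDouble(1) + ", " + rs.getDouble(2));
                }
            }
        }
    }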

But I do have one new question, just out of curiosity: where can I change the default number of digits shown after the decimal separator in the KNIME table view? Currently I can see at most 3 digits after the decimal separator, but the stored data actually has much higher precision if I convert it to a string... So this has to be just a display setting, but I can't find where to change it directly...

Thanks in advance!

Never mind, I found the solution here:

https://tech.knime.org/forum/knime-general/decimal-point-in-knime

The next KNIME Analytics Platform release in July will have native H2 support. This is a much better replacement for SQLite.

In my case, KNIME totally ignores the table metadata of a SQLite database.

When inserting additional data using the Database Writer, the data type is determined by the content of the first field. If a value > 2^31-1 is inserted into a column whose first value is below that margin, the writer fails due to incompatible types, no matter what field type is set in the settings or contained in the database (long in this case).
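To illustrate where that margin comes from: SQLite's INTEGER storage class holds 8-byte values, so 2^31-1 is the Java int limit the writer apparently settled on, not a database restriction. A plain-JDBC sketch (not the KNIME node itself):

    import java.sql.*;

    public class LongInsertDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = con.createStatement()) {
                st.execute("CREATE TABLE t (v INTEGER)");
                try (PreparedStatement ps = con.prepareStatement("INSERT INTO t VALUES (?)")) {
                    // 2^31 exceeds Java's int range but fits easily into
                    // SQLite's 8-byte INTEGER storage class.
                    ps.setLong(1, 2_147_483_648L);
                    ps.executeUpdate();
                }
            }
        }
    }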

When reading data using the Database Reader, any numeric(30,10) field can be interpreted as Integer (when the first value has no fraction and is < 2^31), Long (no fraction), Double, or even String (when the first value is NULL), depending on the content of the first row. And when the first row happens to contain no fraction, the numbers in all subsequent rows are truncated.
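The truncation itself is easy to reproduce: any consumer that has settled on Integer and reads through getInt() silently drops the fraction (a sketch, not KNIME's actual code path):

    import java.sql.*;

    public class TruncationDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = con.createStatement()) {
                st.execute("CREATE TABLE m (v NUMERIC)");
                st.execute("INSERT INTO m VALUES (1), (1.4)");
                try (ResultSet rs = st.executeQuery("SELECT v FROM m")) {
                    while (rs.next()) {
                        // getInt() drops the fraction that getDouble() preserves:
                        // prints "1 vs 1.0" and then "1 vs 1.4".
                        System.out.println(rs.getInt(1) + " vs " + rs.getDouble(1));
                    }
                }
            }
        }
    }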

It would be great if KNIME took care of the true data type: apply it from the schema, make it selectable, use Double in all cases, or something similar.

The DB Writer cannot change the type of columns in an existing database. Therefore it's expected that you get an error when trying to insert a value that is too big for the current data type.

Concerning reading data, this is a bug in the SQLite JDBC driver, as I have already mentioned earlier. It seems that the latest JDBC driver (which will be part of the upcoming 3.2 release) doesn't have this bug any more.
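If in doubt, the driver version an installation actually loads can be checked via standard JDBC metadata (a quick sketch):

    import java.sql.*;

    public class DriverVersionCheck {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection("jdbc:sqlite::memory:")) {
                DatabaseMetaData md = con.getMetaData();
                System.out.println(md.getDriverName() + " " + md.getDriverVersion());
            }
        }
    }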