File Reader/database reader questions

Hi,
I just had a look at this new version of KNIME, and tried using the file reader but encountered a few problems :

1)When opening a tab delimited file with the default parameters, everything is fine.
But when I choose the column delimiter and scroll down the preview, an erreor message appears in red :
"Too few data elements (line: 65 (Row64)), source ‘file:/C/Doc…’ The number of column was determined by the entries above line no.4"
Actually, the Row ID of line 65 is “ERROR_ROW (Row64)” and I cannot scroll down the preview anymore.

The file used contains tab delimited data, but also some desciption lines before and after the data, as well as some empty lines. Is it the explanation ?

2)I haven’t been able to change the missing value pattern. When I switch it from ‘?’ to ‘’ or anything else, the preview still displays ‘?’. Why ?
And is there a way to define a missing value pattern for the whole file and not only for one column (which can take some time to change when there are lots of columns…) ?

3)Using the database reader for oracle, is there a way to indicate that numeric columns are not floats, for example ID columns ? This can be quite annoying in some cases.

4)In the former version of knime, the database connector did not exist. What is its use and difference with using the database reader ?

Thanks a lot for your answers !

  1. You might have noticed the “Advanced” button in the File Reader’s dialog. Click on it, find the tab “short lines” and be sure to check the “Allow short lines” checkbox. This should help.

What do you mean by description lines? You can enter a “Single line comment” to ignore these lines. If, for example, all description lines start with “#” you can enter here a “#” in order to ignore them.

  1. A missing value is always displayed as a “?”. The missing value pattern option lets you define what is recognized as a missing value. For example in some cases a “-1” is actually a missing value, or “NULL”. In these cases you can enter “NULL” as a missing value pattern and as soon as the File Reader node finds an entry with “NULL” it will represent it as a missing value, i.e. display it with “?”.

So far, there is no option or node to enter a missing value pattern for the whole table - but it is on our feature request list (since it is very helpful for database imports).

  1. The difference between the nodes with database ports (like the Database Connector, Database Query, etc.) is that they do not pull the data into KNIME, but operate directly on the data in the database instead, i.e. they subsequently generate an SQL query (or a view in the database if you checked that option in the “Table options” tab. The Database Connection Reader then imports the prepared data into KNIME

Thanks for these answers.
The 1st explanation worked perfectly. The description lines are just a few comments on the data contained in the file, before and after the data, not preceded by any special character. So I guess lines that do not contain valuable data have to be preceeded by a ‘#’.

I guess you forgot my third question :
3)Using the database reader for oracle, is there a way to indicate that numeric columns are not floats, for example ID columns ? This can be quite annoying in some cases.

Thanks a lot !

Hi, just to give you a quick answer on comment 3: Within the Database Reader node the column types can’t be changed manually, they are always read as returned by the database meta data. The ID column(s) are also not treated separately. But in order to change column types you can append the Number To String, Rename, and/or Row ID node. Hope this helps, otherwise can you please give a short example how the outcome of the Database Reader looks like and what are the column types (of the ID columns) defined in your oracle database. Best, Thomas


Here is an example using the following request :

select * from ORGANISM

With any SQL editor (Toad, sqlplus…), the result is :

| ID | NAME | USER |

| 6167 | Human | PLC |
| 6168 | Mouse | PLC |

But in KNIME, using the node “Database Reader”, the result is :

| ID | NAME | USER |

| 6167.0 | Human | PLC |
| 6168.0 | Mouse | PLC |

Description of the table “ORGANISM” in oracle :

ID : NUMERIC(9)
NAME : VARCHAR2(40)
USER : VARCHAR2(15)

For KNIME,

  • ID is considered as a Double
  • NAME and USER are considered as Doubles

    I think it is not a problem that KNIME considers NUMERIC values as doubles. But it shouldn’t add “.0” when the real value is an Integer. This can be very annoying when trying to join the table using IDs with a column from a text file containing the corresponding IDs, but without the “.0” for example, since you will always have to convert data.

It could be a good solution to let the user change retrieved column types as in the “File Reader” node.

Sorry, just to correct an error :
For KNIME,

  • ID is considered as a Double
  • NAME and USER are considered as Strings

Oh I forgot to tell you that I also found a very small problem in the File Reader :
Chen you try to change a the type of a String column to an Integer and then click on “Ok” to validate the node, you get an error message “The settings were not changed, the node will be reset”. It seems that this functionality does not work either from Double to Integer.

Hi,
I am having a hard time reproducing the problem. Could you send the example file and instructions how to reproduce this? When I change a String column to an integer or double I immediately get an error message and can’t OK the dialog.

  • Peter.

I migrated from version V2.0 to V2.0.1 and it seems to work fine.
Thanks !

Concerning numeric datatypes of Oracle databases (Database Reader), did you have a look at the example I posted on the 12 February, 2009, 13:47 (with a correction at 13:49) ?

Hi, I guess this is only a renderer issue. The SQL type in the database in NUMERIC. KNIME interpret this a as a double and therefore shows the ID as formatted double (one decimal place), that is the default behavior. In your example, the ID is truncated to an integer number. I am not sure if one can ensure that an ID is always an integer!?!
Cheers, Thomas

I am not sure if one can ensure that an ID is always an integer!?!
Probably not !

But since it can be quite annoying, I’m sure it could be a good thing to let the user change the retrieved column types (as in the “File Reader” node : in the configuration dialog, you have the possibility to change the column type by right-clicking on the retrieved column headers).
In the example above, that would mean changing the retrieved DOUBLE type to INTEGER or STRING.

Another good solution would be a node that allows the user to do these column types conversions.

Yes, I see your point. We have a Number To String node, but it converts double columns into String columns using also the default formatting ending up with Strings like “x.0”. For the sake of completeness, we need a node to truncate/round double values into doubles/ints and then convert them into a String (ID). Currently you can only use the Java Snippet node to change the ID column into a String by calling “return “” + ((int) $ID$);”. Hope this helps.
Cheers, Thomas