Set Data Type of Columns in File Reader

Hi,

I am trying to read a CSV file into KNIME. Some of my columns contain cells that are either empty, numbers, or numbers separated by a hash sign (6#6). When I read the file, KNIME tries to automagically guess the data type. When it guesses integer, however, I lose the hash-separated values because they are strings.

Is there any way to tell KNIME to import columns as a certain data type or possibly tell it to just import every column as a string type and then I can change the type as needed?

Thanks,

Troy

I did find this post, but it only works if all of the data is imported in the first place.

http://tech.knime.org/forum/knime-users/tablerow-to-variable-inconsistent-variable-data-type


OK, found a workaround that seems to work.

  1. Import the CSV file using the Line Reader node (single string column)
  2. Filter out some metadata that is above the data table in the file
  3. Split the data table by the comma (since the original column was a string, all resulting columns are typed as string)
  4. Do a little magic to set the data table header (I wish there were an easier way, like a node that says "this row should be my KNIME table header", probably with optional data type forcing)
  5. Remove the original column, which holds each whole line of the original CSV file
  6. Remove the data table header row since it has now been set as the KNIME table header.
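For readers who want to see the same idea outside KNIME, here is a minimal Python sketch of the workaround: read every line as plain text, skip the metadata lines, split on commas, and promote the first remaining row to the header, so every cell stays a string. The function name `read_csv_as_strings`, the `skip_lines` parameter and the sample data are hypothetical, and a plain comma split does not handle quoted fields or line breaks inside values.

```python
from typing import Dict, List


def read_csv_as_strings(lines: List[str], skip_lines: int = 0) -> List[Dict[str, str]]:
    """Read CSV lines with every cell kept as a string (hypothetical helper).

    A plain comma split: quoted commas and embedded line breaks are NOT handled.
    """
    # Drop the metadata lines above the table, then split each remaining line on commas
    rows = [line.rstrip("\n").split(",") for line in lines[skip_lines:] if line.strip()]
    if not rows:
        return []
    # The first row becomes the header and is removed from the data rows
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Hypothetical input: one metadata line above the actual table
text = """exported by some tool
id,value
1,6#6
2,
3,42
"""
records = read_csv_as_strings(text.splitlines(), skip_lines=1)
print(records[0])  # {'id': '1', 'value': '6#6'}
```

Because nothing is ever converted, `6#6`, empty cells and plain numbers all survive as text, and any type conversion can be done afterwards, column by column.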

If anyone knows of a better way to do this, I'm all ears!

Thanks,

Troy

You can right-click on the column header in the File Reader's dialog and change the type in the popup dialog.

The CSV Reader in particular has a habit of looking only at the first rows of the file, and thus of guessing the column type wrong.

The File Reader does it a lot better, though it would be nice if both readers had an option to force every column to string.
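To illustrate why guessing from the first rows goes wrong, here is a simplified Python sketch of sample-based type guessing. It is not KNIME's actual algorithm; `guess_type` and `sample_size` are hypothetical names. If the only non-integer value (`6#6`) appears after the sampled rows, the guess comes out as integer.

```python
from typing import List


def guess_type(values: List[str], sample_size: int) -> str:
    """Guess a column type from only the first `sample_size` non-empty values."""
    # Empty cells are skipped, as they would be treated as missing values
    sample = [v for v in values if v != ""][:sample_size]
    for v in sample:
        try:
            int(v)
        except ValueError:
            return "string"
    # Note: an all-empty column also falls through to "integer" here
    return "integer"


column = ["1", "", "7", "12", "6#6"]
print(guess_type(column, sample_size=3))   # integer: "6#6" was never looked at
print(guess_type(column, sample_size=10))  # string
```

A "force everything to string" option would sidestep this entirely, since no guess would be made.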

Fantastic, thanks Thor!  I'm obviously still learning a few tricks of the KNIME trade.

I agree with van Koperen, though: it would be nice if the CSV Reader had the same preview table that the File Reader has for locking down known data types.

Also, the "Row as Table Header" node idea would be handy.

Thanks,

Troy 

Hi everybody,

I would really, really second van Koperen's point that there should be an option to force everything to string; that would at least give a predictable outcome.

At the moment, I am working on importing data from CSV files dynamically, i.e., the columns change, but the File Reader should be able to handle any incoming table without configuration.

I can change the column names dynamically by passing in a header file and inserting the header; that part works.

What I can't change are columns that look like integers but really are strings (numeric IDs). The problem is that converting a falsely identified integer column back to string fails if the numeric string is very long, e.g., 123456789012345678902828387. Such an ID is converted to scientific notation, and I can't get the original ID back from it (at least as far as I know).
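A quick way to see why the ID cannot be recovered: 123456789012345678902828387 has 27 digits, far more than a 64-bit signed integer can hold (Java's `Long.MAX_VALUE` is 9223372036854775807, 19 digits). Assuming the reader therefore stores it as a 64-bit floating-point double, only about 15 to 17 significant digits survive. The Python sketch below uses Python's `float` as a stand-in for a Java `double` and shows the round trip losing digits.

```python
id_text = "123456789012345678902828387"  # the 27-digit ID from the post

# Python's float is an IEEE 754 double, like Java's double
as_double = float(id_text)

# Converting back yields a different integer: the low-order digits are lost
round_trip = str(int(as_double))
print(round_trip == id_text)  # False

# Keeping the value as text (or an arbitrary-precision integer) preserves it
print(str(int(id_text)) == id_text)  # True
```

Once the value has passed through a double, no string conversion can restore the missing digits, which is why the column has to be read as string in the first place.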

To work around this problem, I am now desperate enough to add an extra line to the data that has a string in every column I want to be a string and a number in the other columns. This really gross workaround would actually work, if KNIME did not then treat the single string value as the missing value identifier, classify this row as "missing", and put the whole column back to Int again! It's like zombies, you just can't kill it :).

I am now considering putting a second string line in the data set, hoping that KNIME can only have one missing value indicator per column and will finally accept my "strings".

Is there any other way to tell the File Reader that I want strings, nothing but strings? Using the Line Reader doesn't work for me because I have to deal with line breaks inside strings, etc.

Thanks a lot in advance and best regards,

Dennis

I am trying to read web pages with the Dataset Reader node from Palladian, but I always get the following error:

DatasetReaderNodeModel             Could not read line 0   "www.infogroup.it"

DatasetReaderNodeModel             Could not read line 1    "www.repubblica.it"

The file contains two lines:

www.infogroup.it

www.repubblica.it

Where is my error?

Thanks a lot in advance and best regards,

Walkirie