Error messaging

Hi,

When working with text files I often get messages like:

1) ERROR File Reader Wrong data format. In line 1380 (Row1379) read 'X' for an integer (in column #8).

2) ERROR File Reader Wrong data format. In line 1402 (Row1401) read '39.5' for an integer (in column #452).

Of course, with a header the column numbers are not displayed, and if one selects no header, all columns become strings. Would it be possible to either have the number in the dialogue box with each column name or get the name and number in the error message?

Does anybody have any tips or tricks for importing files like this with Knime?

Best regards,

Jay

Hi Jay,

that's a good point, the column name should be included in the error message (the rowID is, why should the column name not be shown...).

The displayed error row could give a hint where things went wrong. The preview table in the FileReader dialog shows, as its last row, the row where the error occurred. It contains all the values it could read, and then '?' in the cell for which it read the erroneous data item (and in all following cells).
But I admit, this is not really practical with 1500 rows and 500 columns...
But other than that...

If it is really an error in the file, I use an editor, search for the specified row ID (or go to the line number), then search for the specified error pattern (in your case "X" or "39.5") and fix it. But then you would have to read the file back in with the filereader (which it only does if you change some settings, because otherwise it doesn't realize that the content changed), and wait for the next error to appear.
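For files too big for a comfortable editor session, that lookup can be sketched in a few lines of Python. This is not part of KNIME. The function name `locate_cell`, the delimiter default, and the reading of the error message (line number as a 1-based physical line, column number counted from `col_base`) are assumptions to adjust for the actual file.

```python
import csv


def parse_line(line, delimiter):
    """Split one raw text line into fields, honouring CSV-style quoting."""
    return next(csv.reader([line], delimiter=delimiter))


def locate_cell(path, line_no, col_no, delimiter=",", has_header=True, col_base=1):
    """Return (column_name, value) for the cell named in an error message.

    Assumptions (not confirmed by KNIME): line_no is the 1-based physical
    line in the file, col_no counts from col_base, and each record sits on
    exactly one line. column_name is None when there is no header.
    """
    idx = col_no - col_base
    if idx < 0:
        raise ValueError("col_no is smaller than col_base")
    header = None
    with open(path, newline="") as f:
        # Stream line by line so the whole file never sits in memory.
        for current, line in enumerate(f, start=1):
            if current == 1 and has_header:
                header = parse_line(line, delimiter)
            if current == line_no:
                fields = parse_line(line, delimiter)
                name = header[idx] if header else None
                return name, fields[idx]
    raise ValueError(f"file has fewer than {line_no} lines")
```

For the first error above, a call might look like `locate_cell("data.txt", 1380, 8, delimiter="\t")`, where the file name and the tab delimiter are placeholders. If the returned value is not 'X', try `col_base=0`.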

If the problem is in fact that the filereader chose the wrong settings (which seems to be the case when it reads 39.5 for an integer and could be, as it analyzes only the first 1000 lines of the file), then you could - in the filereader dialog - right-click on the column header of the preview table and adjust the column type (and probably the missing value pattern to "X"). But it is again not practical doing this for 500 columns...
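To find out up front which of the 500 columns would need a different type, one option is to scan the whole file outside KNIME rather than only its first lines. The following is a minimal Python sketch, assuming a delimited file with one record per line. The names `classify` and `infer_column_types` and the int/double/string widening rule are illustrative only and do not describe how the filereader itself decides.

```python
import csv

# Order from narrowest to widest type; a column only ever widens.
RANK = {"int": 0, "double": 1, "string": 2}


def classify(value):
    """Return the narrowest of 'int', 'double', 'string' that Python can parse."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def infer_column_types(path, delimiter=",", has_header=True, missing=("",)):
    """Scan every row and return a list of (column_name, type) pairs.

    Values listed in `missing` (for example "X") are skipped, so they do
    not force a numeric column to become a string column. Columns that
    contain only missing values get the type None.
    """
    names, types = [], []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        if has_header:
            names = next(reader, [])
        for fields in reader:
            # Grow the list if a row is wider than any seen so far.
            while len(types) < len(fields):
                types.append(None)
            for i, value in enumerate(fields):
                value = value.strip()
                if value in missing:
                    continue
                kind = classify(value)
                if types[i] is None or RANK[kind] > RANK[types[i]]:
                    types[i] = kind
    # Hypothetical fallback name "col<i>" when the header is absent or short.
    return [(names[i] if i < len(names) else f"col{i}", kind)
            for i, kind in enumerate(types)]
```

Calling it with `missing=("", "X")` treats 'X' as a missing value pattern. Every column reported as "double" or "string" is a candidate for adjusting in the dialog.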

Let me know if you got any further, otherwise we start editing FileReaderSettings manually! (As a last resort.)

- Peter.

Hi Peter,

Yes, it would be very helpful to have the name show up (and possibly the number as well). Thanks for your help. I do the same, default to a text editor, to make the alterations to the file. My editor lags quite a bit on these files. The file has several hundred thousand rows and ~500 columns. Do you know of any good, fast open-source editors (for Windows) that can handle large files? Do you know of any good open-source file splitters and combiners (for Windows)?

When I went to change the column settings on each error the node wouldn't read the file at all. It went down to one row, showing an error.

Is there any way to change the error field from “?” to something else? When the line has fields with missing values it's hard to distinguish.

Best Regards,

Jay

Okay, the fix is in, next version has it.

That is one huge file.
I use Emacs for Windows. It can handle big files (bigger than Windows' Notepad 8) ) and usually is quite fast. It is under GPL. It is good. But it certainly is not intuitive. Not at all.
The only file splitters/joiners/manipulators I know are the Unix ones - which are available under Windows only if you install the Cygwin stuff.
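If installing Cygwin is too much, splitting and re-joining can also be sketched in Python. This is a rough sketch under stated assumptions: one record per line, an optional single header line that gets repeated in every part, and hypothetical names (`split_file`, `join_files`, the `.partNNNN` suffix). It works on raw bytes, so the file's encoding and line endings pass through unchanged.

```python
import os


def split_file(path, lines_per_chunk, out_dir, has_header=True):
    """Split path into parts of at most lines_per_chunk data lines each.

    The header line (if any) is copied to the top of every part.
    Returns the list of part file names in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    parts, out = [], None
    try:
        with open(path, "rb") as src:
            header = src.readline() if has_header else b""
            for n, line in enumerate(src):
                if n % lines_per_chunk == 0:
                    # Start a new part file.
                    if out:
                        out.close()
                    name = os.path.join(out_dir, f"{base}.part{len(parts):04d}")
                    out = open(name, "wb")
                    out.write(header)
                    parts.append(name)
                out.write(line)
    finally:
        if out:
            out.close()
    return parts


def join_files(parts, dest, has_header=True):
    """Concatenate parts into dest, keeping only the first part's header."""
    with open(dest, "wb") as dst:
        for k, part in enumerate(parts):
            with open(part, "rb") as src:
                if has_header:
                    header = src.readline()
                    if k == 0:
                        dst.write(header)
                for line in src:
                    dst.write(line)
```

Each part keeps the header, so it can be read on its own and fixed in an editor before the parts are joined again.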

Quote:
When I went to change the column settings on each error the node wouldn't read the file at all. It went down to one row, showing an error.

I don't get that, sorry. Did you change the type of that Integer column it read 39.5 for to Double and it didn't read over that column? Sorry, can you explain that a bit more?

I realize that the '?' in an error cell can be confused with a missing cell - but it actually is a missing cell... I can't display ERROR or something like that, because that would require a string cell, which I can't insert into a Double column. That last row, the error row, is still a regular table row, meaning it has to have the correct types and length and everything. But maybe I can mark it somehow. Let me think about that.

The filereader needs some more features for huge files, and also more convenient ways to specify settings for a huge number of columns. I know that - but that doesn't really help you get your data in right now.

- Peter.