How to force column data type in "Excel Reader (XLS)" node?

Hello,

Let's say I have an Excel file with this data:

 

  Column A Column B
Row 1 Sara 99.43
Row 2 Joe 12.88
Row 3 Tim NA
Row 4 Karl No data

 

So, I want to force the "Excel Reader (XLS)" node to read "Column B" as Double... always.

I don't want the node to consider that column as String, and then I have to do the "String to Number" conversion.

The reason I don't want to do the conversion, I because I loose the decimals in the cell. For example, let's say B2 shows 12.88 because of the cell formating, but the actual value of the cell is 12.8799999. Using "String to Number", I loose the decimals.

I attach another example.

Any ideas?

Thanks!

Hi Juan, 

this is not possible.

You can use a String to number afterwards.

You will not loose deciamls, but it will only display some digits in the table view. You can see them if you chose Full Precision Renderer.

Cheers, Iris

There is a reason it is being considered a string instead of double. If Knime can't interpret the number field always as a double, it will conisder it a string. I have noticed commas cause a number to become a string.

One option is to remove the conflicts before loading the file into knime.

Another option is to filter columns twice, first by type (string), then by the column names that you expect to be doubles (as a flow variable), and remove commas or whatever is causing it to be a string, add a "Column auto type cast" node so they are converted to doube, and then reattach the columns to the original dataset.

Iris,

Yes. I loose precision because the numbers are shown as 0.00, so everything "hidden" because of the text reading... is lost in the conversion.

Check the attached example.

Any ideas?

glenerik,

Thank you for your comment.

In this particular case, I don't have any issue with the comma or dot.

The issue is with the number rendering in Excel before it's being read by Knime.

I have the same issue, did you managed to solve it without manually change the number rendering in Excel ?

I don’t have an easy solution.

What I had done, is two reads of the same file using R.
I read the XLS with R readxl.

One read considering the whole sheet is numeric. And a second one, considering the whole file is text. Then I selectively merge the two reads.

The trick, is I’m not using KNIME to read the XLS. I’m using R.

2 Likes

that is actually a good idea :slight_smile:

I built a workflow that uses your idea and does three imports of an Excel file, one all numeric, one all strings and one guess. For the numeric ones when it might actually be a String there will be a huge number of missings. I do a calculation of % missing and I set a variable (75%) if there are 75% or more missing and the guess suggests a String it is a string.

Obviously if you have date variables you would have to do something different.

Here the results are stored in KNIME tables that are named after the Excel file and sheet and also contain columns with that information.

kn_example_r_readxl.knar (179.5 KB)

… hmm did someone say over engineered …

3 Likes

Thanks for the replies, as I’m not super familiar with R, I’ll probably go for a Python script. Thanks a lot !

Hi all,

the new Excel Reader node available in version 4.3 always reads doubles with full precision, even if there are Strings in the column. Thanks for the feedback!

Best,
Simon

6 Likes