How to detect the decimal separator

FVarlet · April 25, 2022, 8:54pm

Hello,

My current workflow load various Excel files. Some columns are mainly numeric values, but the decimal separator might vary: “.” for several files and “,” on other files. The thousand separators can also vary.
For example, I can have 12,365,326.12643 on a file and 12.365.326,12643 on another file. The Excel cells can be formated as a number or as a string.

I would like to convert this column into double type.
Is there an easy way to do it?

rfeigel · April 25, 2022, 11:23pm

Try configuring your Excel input as string, then use the string to number node with the separators configured correctly. It will convert both types of strings to double.

FVarlet · April 26, 2022, 3:03am

Thanks @rfeigel ,

Do you know how to identify the decimal separator (or the thousand separator) automaticly?
Then I could use a variable for the string to number node.

I forget to mention, but this is not very important: these columns can also contains text, like “no sales”. The string to number node might delete these values, which is perfect.

FVarlet · April 26, 2022, 7:47am

I have the feeling that, if I defined “,” as decimal separator and “.” as thousand separator, or “.” as decimal separator and “,” as thousand separator in the string to number node, I will get the same result, and I don’t really need to distinguish them.
Is it true?

If true, then I don’t really need to define it with a variable

rfeigel · April 26, 2022, 5:35pm

There are probably a dozen ways of addressing your issue. Attached is a simple approach using your number formats and a non-numeric string. The missing values from the string to number nodes are either set to 0, or the row is removed.

String to Number Test.knwf (42.8 KB)

FVarlet · April 26, 2022, 9:32pm

Thanks @rfeigel ,

Sorry, that doesn’t help me as I load several files with a single Excel reader node.
I’ve attached an example that simulates the load of 2 fictional files, one using the comma as decimal separator and the second one using the point.
This workflow fails to recognize 123,3264 in the first case or 123.3264 in the second case (only one is properly recognize).
String to Number_Example.knwf (13.6 KB)

duristef · April 27, 2022, 4:54pm

I’m afraid you can’t upload all files with the same node if you want to detect separators, because formats get mixed up and it’s difficult, if not impossible at all, to get them on a single value basis.
I’ve made a workflow that tries to fix the problem.
My assumptions are
a) Only one pair of separators is used in each spreadsheet
b) All the columns of the spreadsheet that contain numbers are formatted in the same way: either all strings or all numeric
The files are examined one by one and a column is taken as an example to try to determine which pair of separators is used in the spreadsheet. These chars are passed as parameters to the “String to Number” node.

This is the result.
Input: three files with 5 rows and 4 columns each. The first two files have the same string columns and different separators, the third has numeric columns and a comma as a separator for decimals. I’ve found that when Knime reads the columns as string, it transforms automatically the original values, e.g. 12365326,12643 becomes 1.236532612643E7, but 1234,326 becomes 1234.326

Here’s the workflow. To try it you just have to change the first node
String to Number_Example.knwf (35.2 KB)
If my assumptions are not correct, an inner loop must be introduced to work on each column of the spreadsheet.
If inside each single column there can be both numbers and strings, I can’t think of a solution
Please let me know if the wf is of any help

FVarlet · April 28, 2022, 9:11am

Thanks @duristef ,

It’s exactly what I tried to avoid. I had something similar in mind, with the use of loops that try to identify the decimal separator for each file.

I’ll have a look at your workflow and will make some tests in the coming days.

Daniel_Weikert · April 28, 2022, 6:58pm

can you provide a better sample for testing purposes?
br

system · July 27, 2022, 6:59pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.