Excel Import: Numbers recognized as string, which leads to an automatic roundup of all numbers

Hey Community!

I am fairly new to KNIME but enjoying the work a lot!

Working with a lot of Excel Imports (xlsx) I approached a problem with the numbers imported.
I have a column with euro amounts (formated as a currency in excel), which is not recognized as double, int, whatever but as a string.
I thought I just help myself with using the String to Number node, but it seems that I am losing information on the way as all the decimals are gone rounded up in the total euro amount.

Is there any way to define the format of a column before its set automatically?
Or is the problem with the String to Number node?

Thank you for helping me!

1 Like

Hi there!

I think there is no way to define format before. String to Number node should work. At least it works for me. Define your type (double) and decimal separator and that should do it. I donā€™t see a reason why would numbers end up rounded :confused:

Br,
Ivan

1 Like

Hello,
I am new to Knime as well and I am sorry to pick this old topic up again.
I came to the same finding!
I want to read in multiple excel sheets with flows and to avoid data issues I would love to go for string (works perfectly this way in Alteryx as V_WString). The excel sheets are formatted, e.g. partly show cells with decimals and partly rounded to full values (partly also pure text). but excel has the info stored that there are more digits in case of the numbers!
In case a column is interpreted as string => the not shown (formatted to 0 digits) excel digits are cut off.
In case a column is interpreted as double=> the not shown (formatted to 0 digits) excel digits are shown and can be further processed.

To me this honestly makes no sense, since I would expect all digits to be part of the stringā€¦ I do not want to have a ā€œpictureā€ of the excel formatting but the actual data without proactive truncationā€¦

Surfing the web the problem might also come up with transforming string to dateā€¦ Is this also known?

How can I (as business user, no coder) avoid this basic problem?

Thanks and best regards,
Christoph

1 Like

In case a column is interpreted as string => the not shown (formatted to 0 digits) excel digits are cut off.
In case a column is interpreted as double=> the not shown (formatted to 0 digits) excel digits are shown

Can you provide us example data/strings/numbers of both cases?

1 Like

I have a collection of Excel import examples and solutions ranging from simple to complex.

In your case a variation of 510 might help. Force the import as string and either let KNIME decided what to do or write your own rule.

Regarding the euros it might be necessary to remove the euro sign.

Maybe you could give us an example.

Hi @CEbersbach,

No problem. As @elsamuel can you share workflow example where this can be seen? Think this way would be easiest to figure it out and find possible workaround. See here how you can create workflow example and share it.

Br,
Ivan

1 Like

Hi Thank you very much.

Please find attached a small Excel sample on how I get the data. Yes, they are dirtyā€¦
But since it was possible to easily read them in Alteryx I have hoped to extract everything as well in Knimeā€¦
There is no chance I can change the input format.
I would love to extract all digits from every cell as string and in a slightly different use case a date-Format also comes into play (DD.MM.YYYY).

I am not a coder as said and I have hoped that there is an out of the box solution, since I have hoped that such dirty data are - unfortunately - daily businessā€¦

Thanks and best regards,
Christoph

Knime Sample.xlsx (11.8 KB)

Thank you very much for the hint.

To be honest, this is outside my understanding, since I have no/very limited coding skills.

Is some use case like this wrapped into a node I could use out of the box?

Thanks and best regards,

Christoph

Hi,

Are you seeing only 3 digits even if in excel there are more? Try to render the column (right click on the column) to full precision and you will see all the digits

image

1 Like

Hi, thank you. unfortunately the renderers are not available since the numbers are - at first sight correctly - interpreted as string, but the strings do not have any decimals in case they are not visible in Excel. But Excel calculates with all digits of course and I need theseā€¦

Thanks and BR
Christoph

Hi,

But your columns have text and numbers ā€¦ First with String to number node transform the column in double and then the renders will be there, but you will loose the text

Hi,
Try to reshape the input table (delete rows 0 and 1 or combine all in one row and use it as header) or import from row 3.

The issue is that your input file is set up the way that I see people tend to use spreadsheets. That is, the layout is designed for humans to read and not for machines. Your initial statement of the problem made it seem as if KNIME was misbehaving when in fact the problem is the layout of your input file.

Youā€™re going to have to do some work to get your input file in the optimal shape for use in KNIME. For KNIME to do what it does best, every row in the table needs to be a unique record, defined by a unique Row ID. Unique column headers define each column, and these are typically extracted from a single row in the source data. In a typical data analytics project, this is usually where most of the work is.

Maybe if you told us what analyses/manipulations you plan on doing and what your desired end result looks like, weā€™d be able to nudge you in the right direction. Otherwise youā€™re just going to keep getting seemingly random suggestions that wonā€™t work because thereā€™s some other complication that you havenā€™t mentioned.

None of this is about coding ability.

3 Likes

What you can do is adjust the settings. You could tell KNIME to start reading data from line 4 and that the header is in line 3.

This would give you this:

If the structure of your tables does change you might have to be creative. You could first import the sheet ā€˜as isā€™ then identify the header line. Turn that information into a Flow Variable and use this line in informing the real import node where to start looking for header and data.

3 Likes

Thank you very much for your reply.
Basically I have to collect approx. 50 sheets a month and would like to clean, prep and consolidate them for a Tableau Dashboard/Analysis. There I am dependent on reading in all decimals even if they are not shown in Excel in order to reflect the numbers correctly.
This is without doubt not a nice format but I have still hoped I could easily do the cleaning. And since there is always the risk of typos I hoped I can simply extract all data (including not shown decimals) stored in the Excel cells as stringā€¦This would also stabilize the loop since as said those templates are subject to manual adaptions. Similarly some automated Business Warehouse Reports (Excel Add In) could have some difficulties in the format and taking every piece of data out of the cells as string would make the process bulletproof.
Moreover I would like to share this approach with colleagues for their data which makes it more demanding to have it stable.Of course downstream there is a lot of cleaning and prepping needed to make sth meaningful out of it.

BR
Christoph

Thank you. This is a great approach. Partly unfortunately the structure/content can change due to manual input.

I think you have these options:

  • identify ā€˜markersā€™ that hint to headers or that would re-appear and use that to guide KNIME to the right points (eg. header would always have a monthā€™s name or something)
  • if you have several such table beneath each other you would have to construct a loop
  • use this approach (510 from my collection) to let an R package decide which is the ā€˜dominantā€™ column type (number/string) and use that. Judging from your example you then might loose header informations:

Maybe you could provide us with a full example that represents your challenge (including most of the quirks). Then it would be easier to find a solution.

Since Alteryx is based on R you could try to check what R procedure is used beneath to achieve this stunning feat - maybe it is possible to recreate that in an R snippet.

Thank you very much for your support.
I will try to have a look if there is something possible with R, with the help of a colleague.

Cutting the data selection a bit further allowed me to read them in for one of my use cases.
Great support in the community. Highly appreciated.
Stay healthy and have a nice weekend
BR Christoph

3 Likes

Hi @CEbersbach,

tnx for example file. I see it and would expect to have decimals as well when file is imported in KNIME. Letā€™s see if this can be improved. (Internal reference: AP-14929)

Br,
Ivan

3 Likes