I am fairly new to KNIME but enjoying the work a lot!
Working with a lot of Excel Imports (xlsx) I approached a problem with the numbers imported.
I have a column with euro amounts (formated as a currency in excel), which is not recognized as double, int, whatever but as a string.
I thought I just help myself with using the String to Number node, but it seems that I am losing information on the way as all the decimals are gone rounded up in the total euro amount.
Is there any way to define the format of a column before its set automatically?
Or is the problem with the String to Number node?
I think there is no way to define format before. String to Number node should work. At least it works for me. Define your type (double) and decimal separator and that should do it. I donāt see a reason why would numbers end up rounded
Hello,
I am new to Knime as well and I am sorry to pick this old topic up again.
I came to the same finding!
I want to read in multiple excel sheets with flows and to avoid data issues I would love to go for string (works perfectly this way in Alteryx as V_WString). The excel sheets are formatted, e.g. partly show cells with decimals and partly rounded to full values (partly also pure text). but excel has the info stored that there are more digits in case of the numbers!
In case a column is interpreted as string => the not shown (formatted to 0 digits) excel digits are cut off.
In case a column is interpreted as double=> the not shown (formatted to 0 digits) excel digits are shown and can be further processed.
To me this honestly makes no sense, since I would expect all digits to be part of the stringā¦ I do not want to have a āpictureā of the excel formatting but the actual data without proactive truncationā¦
Surfing the web the problem might also come up with transforming string to dateā¦ Is this also known?
How can I (as business user, no coder) avoid this basic problem?
In case a column is interpreted as string => the not shown (formatted to 0 digits) excel digits are cut off.
In case a column is interpreted as double=> the not shown (formatted to 0 digits) excel digits are shown
Can you provide us example data/strings/numbers of both cases?
No problem. As @elsamuel can you share workflow example where this can be seen? Think this way would be easiest to figure it out and find possible workaround. See here how you can create workflow example and share it.
Please find attached a small Excel sample on how I get the data. Yes, they are dirtyā¦
But since it was possible to easily read them in Alteryx I have hoped to extract everything as well in Knimeā¦
There is no chance I can change the input format.
I would love to extract all digits from every cell as string and in a slightly different use case a date-Format also comes into play (DD.MM.YYYY).
I am not a coder as said and I have hoped that there is an out of the box solution, since I have hoped that such dirty data are - unfortunately - daily businessā¦
Are you seeing only 3 digits even if in excel there are more? Try to render the column (right click on the column) to full precision and you will see all the digits
Hi, thank you. unfortunately the renderers are not available since the numbers are - at first sight correctly - interpreted as string, but the strings do not have any decimals in case they are not visible in Excel. But Excel calculates with all digits of course and I need theseā¦
But your columns have text and numbers ā¦ First with String to number node transform the column in double and then the renders will be there, but you will loose the text
The issue is that your input file is set up the way that I see people tend to use spreadsheets. That is, the layout is designed for humans to read and not for machines. Your initial statement of the problem made it seem as if KNIME was misbehaving when in fact the problem is the layout of your input file.
Youāre going to have to do some work to get your input file in the optimal shape for use in KNIME. For KNIME to do what it does best, every row in the table needs to be a unique record, defined by a unique Row ID. Unique column headers define each column, and these are typically extracted from a single row in the source data. In a typical data analytics project, this is usually where most of the work is.
Maybe if you told us what analyses/manipulations you plan on doing and what your desired end result looks like, weād be able to nudge you in the right direction. Otherwise youāre just going to keep getting seemingly random suggestions that wonāt work because thereās some other complication that you havenāt mentioned.
If the structure of your tables does change you might have to be creative. You could first import the sheet āas isā then identify the header line. Turn that information into a Flow Variable and use this line in informing the real import node where to start looking for header and data.
Thank you very much for your reply.
Basically I have to collect approx. 50 sheets a month and would like to clean, prep and consolidate them for a Tableau Dashboard/Analysis. There I am dependent on reading in all decimals even if they are not shown in Excel in order to reflect the numbers correctly.
This is without doubt not a nice format but I have still hoped I could easily do the cleaning. And since there is always the risk of typos I hoped I can simply extract all data (including not shown decimals) stored in the Excel cells as stringā¦This would also stabilize the loop since as said those templates are subject to manual adaptions. Similarly some automated Business Warehouse Reports (Excel Add In) could have some difficulties in the format and taking every piece of data out of the cells as string would make the process bulletproof.
Moreover I would like to share this approach with colleagues for their data which makes it more demanding to have it stable.Of course downstream there is a lot of cleaning and prepping needed to make sth meaningful out of it.
identify āmarkersā that hint to headers or that would re-appear and use that to guide KNIME to the right points (eg. header would always have a monthās name or something)
if you have several such table beneath each other you would have to construct a loop
use this approach (510 from my collection) to let an R package decide which is the ādominantā column type (number/string) and use that. Judging from your example you then might loose header informations:
Maybe you could provide us with a full example that represents your challenge (including most of the quirks). Then it would be easier to find a solution.
Since Alteryx is based on R you could try to check what R procedure is used beneath to achieve this stunning feat - maybe it is possible to recreate that in an R snippet.
Thank you very much for your support.
I will try to have a look if there is something possible with R, with the help of a colleague.
Cutting the data selection a bit further allowed me to read them in for one of my use cases.
Great support in the community. Highly appreciated.
Stay healthy and have a nice weekend
BR Christoph
tnx for example file. I see it and would expect to have decimals as well when file is imported in KNIME. Letās see if this can be improved. (Internal reference: AP-14929)