I have a variable which is humidity. It is labelled as an integer. However, I think it should be a double since humidity is a continuous type of data.
Will it make any significant difference to analyses such as finding the mean if I change the variable to a double?
Hi @oksdojvs !!
If Knime is treating the humidity column as an integer type instead of a double, that will certainly affect the statistical calculations.
Take a look at the humidity column in your input data. Does it have double values? When you read the file with Knime, are you scanning only a sample of the top rows or all the available data? Knime’s reader nodes have a specific parameter for inferring the data type from the input.
If you can give us more details, some data, or even a workflow, that would be great.
I already set the data rows scanned to be unlimited and this is the table.
Thank you @oksdojvs
Notice that humidity is coming in as an integer from your input; there are no double values in it. So from a technical Knime point of view, you can work with this integer variable and the statistics will be OK (based on your input).
On the other hand, I’m not sure whether a humidity measurement by nature has to have double values. The answer on this point will depend on your business or study context.
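To illustrate the point about the statistics being OK, here is a small Python sketch outside Knime. The readings are made-up whole numbers, not the data from this thread; the only claim is that the mean and standard deviation come out the same whether the same whole numbers are stored as integers or as doubles.

```python
import statistics

# Hypothetical whole-number humidity readings in % (not the thread's data).
humidity_int = [45, 51, 61, 59, 70]

# The same readings stored as doubles.
humidity_double = [float(v) for v in humidity_int]

# fmean always returns a float, regardless of the input type.
mean_int = statistics.fmean(humidity_int)
mean_double = statistics.fmean(humidity_double)

# Sample standard deviation of both versions.
sd_int = statistics.stdev(humidity_int)
sd_double = statistics.stdev(humidity_double)

print("mean:", mean_int, mean_double)
print("stdev:", sd_int, sd_double)
```

The storage type only starts to matter once the input itself contains decimal values that an integer column would cut off.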
Thank you so much @cristiancandia for your prompt answers. Really appreciate them
Hi @oksdojvs , when Knime reads a file, it tries to guess each column’s type from the data, because no metadata accompanies the file, unlike a db table where the column type is defined. So it can only guess based on the data that it’s reading.
Since your humidity column only has integers, Knime guesses that it’s an integer column. You can override this.
Take a look at the following example. I created a file that contains integers, like yours. By default, Knime sees the column as int:
You can tell Knime to change the column type via the Transformation tab:
After changing to Double, you can see the change in the column:
As @cristiancandia said, though, whether a humidity measurement by nature has to be a double depends on your business or study context. If you really do want the column to be read as Double, that’s how you would force it.
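The idea of guessing a type from the data and then overriding it can be sketched in a few lines of Python. This is only an analogy and not how Knime implements its reader; `guess_column_type` is a hypothetical helper and the file contents are invented for illustration.

```python
import csv
import io

def guess_column_type(values):
    """Guess the narrowest type that fits every value: int, then float, else str."""
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)  # raises ValueError if a value does not fit
            return candidate
        except ValueError:
            continue
    return str

# Hypothetical file contents: the humidity column holds only whole numbers.
raw = "humidity\n45\n51\n61\n"
rows = list(csv.DictReader(io.StringIO(raw)))
values = [row["humidity"] for row in rows]

# Type inferred from the data alone, as a reader without metadata must do.
guessed = guess_column_type(values)
inferred = [guessed(v) for v in values]

# Explicit override: read the same column as double.
forced = [float(v) for v in values]

print(guessed.__name__, inferred)
print("float", forced)
```

With a single decimal value anywhere in the column, the same helper would fall through to `float`, which is why the number of rows scanned matters when a reader infers types.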
In theory there will be a difference.
But the statistics are only an output derived from your measured data.
Let’s have a look at the input (the data). Your humidity sensor has, on the one hand, a (technical) accuracy and, on the other hand, the accuracy of the display.
Most humidity sensors I’m aware of have a technical accuracy of around 5% (absolute).
On a scale of 0-100%, that is an error of approx. 5% of the full scale.
Adding more precision to the measured value by using double instead of int will bring only cosmetic improvements: you just see the measured output with many more digits.
I’m not sure whether this is really helpful at the end of the day.
Hey, a meteorological data question! These are my favorite.
Strictly speaking, humidity is a continuous measurement and therefore should be classified as a double, not an integer. However, since it looks like you have % relative humidity that is only accurate to 1% based on the precision of your instrument… in this case… it doesn’t really matter.
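As a rough check of why it doesn’t really matter here, the following Python sketch compares the mean of some "true" continuous values with the mean of the same values as an instrument with 1% resolution would report them. The numbers are invented for illustration and are not taken from the table in this thread.

```python
import statistics

# Hypothetical "true" relative humidity values in %, invented for illustration.
true_values = [45.2, 50.7, 61.4, 58.9, 70.3]

# What an instrument with 1% resolution would report.
reported = [round(v) for v in true_values]

mean_true = statistics.fmean(true_values)
mean_reported = statistics.fmean(reported)

# Each reading is off by at most 0.5, so the mean is off by at most 0.5 as well.
difference = abs(mean_true - mean_reported)

print(mean_true, mean_reported, difference)
```

Rounding errors in the individual readings tend to partly cancel out in the mean, and in any case the information lost to rounding happened at the instrument, so converting the stored integers to doubles afterwards cannot bring it back.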