Statistics after ETL

Hello, colleagues,

I need your advice again. I analysed the dataset “Space Missions from 1957”, downloaded from Kaggle.
After the ETL steps, I tried to perform an EDA and found values in the Statistics output table that look wrong to me (correct me if I am wrong). For example, the largest values in the “Launch Services Price” column are 450… 5.3 (millions of dollars, as I understand it), while the smallest are 1,160… 5,000 (thousands of dollars). But the statistics show a minimum of 5.3 and a maximum of 5,000. What is wrong in my workflow?




Initially the price column was stored as a string, and I converted it using String to Number with the following settings:

Thank you in advance.
Best regards,
Ekaterina

Hi @Felis90,

It seems there may be a misunderstanding regarding the representation of values in your dataset. If the values are recorded in different units (e.g., millions vs. thousands), this could lead to confusion. For instance, a price of 5.3 million dollars should be represented as 5,300 (thousands).
You need to convert all values to a consistent unit (either all in millions or all in thousands).
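
As a rough illustration of the unit conversion, here is a minimal Python sketch outside KNIME. The unit labels and the `TO_THOUSANDS` table are hypothetical: the dataset may not carry an explicit unit for each price, and the pairing of values with units below simply follows the reading given in the question.

```python
# Hypothetical multipliers that express each unit in thousands of dollars.
TO_THOUSANDS = {"thousand": 1.0, "million": 1000.0}


def to_thousands(value, unit):
    """Convert a price to thousands of dollars, given its (assumed) unit label."""
    return value * TO_THOUSANDS[unit]


# Values from the question, paired with the units the question assumes for them.
records = [
    (5.3, "million"),
    (450.0, "million"),
    (1160.0, "thousand"),
    (5000.0, "thousand"),
]

# Once every value is in the same unit, min and max become comparable.
normalised = [to_thousands(value, unit) for value, unit in records]
print(min(normalised), max(normalised))
```

If the column actually uses one unit throughout, no conversion is needed and the reported minimum and maximum are already comparable.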

Regards,
Yogesh


Have you removed any thousands separators (e.g. “,” or “.”) with String Manipulation before the conversion? If not, you probably should; otherwise those values risk being converted to missing values, which is not what you want.
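
To show what stripping the separator does, here is a small Python sketch outside KNIME. It assumes “,” is the thousands separator and “.” the decimal point, which should be checked against the raw file. The function name `parse_price` is made up for this example, and the sample strings are the four values quoted in the question plus one empty cell.

```python
def parse_price(text):
    """Parse a price string such as "5,000" into a float.

    Assumes "," is a thousands separator and "." is the decimal point.
    Returns None for empty cells so they can be treated as missing.
    """
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        return None
    return float(cleaned)


raw_prices = ["450", "5.3", "1,160", "5,000", ""]

# Keep only the cells that could be parsed.
prices = [p for p in (parse_price(s) for s in raw_prices) if p is not None]

print(min(prices), max(prices))  # prints: 5.3 5000.0
```

With the separators removed and all values read as one unit, the minimum is 5.3 and the maximum is 5,000, the same figures the Statistics table reports.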