Problems using Gaussian Data Assigner

Hello Everyone:

I've just created 160,000 client IDs to run a clustering, and I have decided to use 3 clusters. When I use the Gaussian Data Assigner node it runs fine, but it does not assign the values to the clusters I want. For example, I have 3 clusters: the first has a mean recency of 4, the second 30 and the third 10. The node generates the recency variable correctly, but when I look at it with a Conditional Box Plot I can see that the first cluster has a mean recency of 10, although that mean is supposed to belong to the third cluster. The same happens with the other clusters. I have rebuilt the complete workflow but the problem is still there. In short, the node does a good job of creating the variable per cluster, but its assignments do not match. Can someone help me?
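One way to pin down the mismatch is to group the generated values by cluster label and compare each group's mean with the mean you configured. The sketch below does that outside KNIME, as a minimal illustration only: the cluster labels, the standard deviation of 2.0 and the helper names are hypothetical, and it says nothing about how the KNIME node works internally. Swapping two labels at the end reproduces the symptom described above (cluster 1 showing mean 10 and cluster 3 showing mean 4).

```python
import random
import statistics
from collections import defaultdict

# Hypothetical configuration mirroring the post: cluster label -> mean recency.
# The shared standard deviation of 2.0 is an assumption for illustration only.
EXPECTED_MEAN = {"cluster_1": 4.0, "cluster_2": 30.0, "cluster_3": 10.0}
STD_DEV = 2.0


def generate(n_per_cluster, seed=42):
    """Draw (cluster, recency) rows, one Gaussian per cluster."""
    rng = random.Random(seed)
    rows = []
    for cluster, mean in EXPECTED_MEAN.items():
        for _ in range(n_per_cluster):
            rows.append((cluster, rng.gauss(mean, STD_DEV)))
    return rows


def mean_per_cluster(rows):
    """Group the generated values by cluster label and average each group."""
    groups = defaultdict(list)
    for cluster, value in rows:
        groups[cluster].append(value)
    return {cluster: statistics.mean(values) for cluster, values in groups.items()}


def mismatched_clusters(rows, tolerance=0.5):
    """Return the clusters whose observed mean is far from the configured one."""
    observed = mean_per_cluster(rows)
    return sorted(
        c for c, m in observed.items() if abs(m - EXPECTED_MEAN[c]) > tolerance
    )


rows = generate(n_per_cluster=10_000)
print(mismatched_clusters(rows))  # no mismatches when labels and means line up

# Simulate the reported problem: values for clusters 1 and 3 carry swapped labels.
SWAP = {"cluster_1": "cluster_3", "cluster_3": "cluster_1"}
swapped = [(SWAP.get(c, c), v) for c, v in rows]
print(mismatched_clusters(swapped))  # clusters 1 and 3 are flagged
```

If the check flags clusters in your real data even though each group's values look well formed, the values are being generated correctly but attached to the wrong labels, which matches what you see in the box plot.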

 

Thanks in advance

Gabriel

Hi Gabriel,

this sounds strange. Note that the data generators always have a random component, but I am sure you are aware of this.

Would you mind sending me an example workflow, either here or via email: iris.adae@knime.com

Best, Iris

Dear Iris:

I never sent you an example, but I am now building a new workflow and I have the same problem. The node does not assign what I want when I condition on a nominal feature: in this case, I want to create an income variable (Ingreso in Spanish) based on occupation.
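For comparison, conditioning a Gaussian variable on a nominal column can be sketched by looking up the distribution parameters from each row's own category before drawing the value. This is a minimal sketch, not the KNIME node's implementation: the occupation names, incomes and standard deviations below are invented for illustration, and the function name `assign_income` is hypothetical.

```python
import random
import statistics

# Hypothetical lookup table: occupation -> (mean income, std. deviation).
# The occupations and numbers are invented for illustration only.
INCOME_BY_OCCUPATION = {
    "employee": (1200.0, 150.0),
    "self_employed": (1800.0, 400.0),
    "student": (300.0, 50.0),
}


def assign_income(occupations, seed=7):
    """Draw one income per row from the Gaussian of that row's occupation.

    The parameters are looked up from the row's own label, so a value is
    always generated with the distribution configured for that category.
    """
    rng = random.Random(seed)
    incomes = []
    for occupation in occupations:
        mean, std_dev = INCOME_BY_OCCUPATION[occupation]
        incomes.append(rng.gauss(mean, std_dev))
    return incomes


occupations = ["employee", "student", "self_employed"] * 5_000
incomes = assign_income(occupations)

# Check: the average income per occupation should be close to the table's mean.
for occ in INCOME_BY_OCCUPATION:
    values = [inc for o, inc in zip(occupations, incomes) if o == occ]
    print(occ, round(statistics.mean(values), 1))
```

Comparing these per-category averages against the configured means is the same kind of check as a conditional box plot grouped by occupation.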

Could you give me a hand?

The workflow is Proy_Minimarket_Prep_v2; I have highlighted the affected nodes in orange.

Thanks

Gabriel 

Dear Gabriel,

yes, you are right; there seems to be a problem with the generator. I need to investigate this further.

I made you another workflow using the Random Number Assigner Apache, which uses a library in the background. It will generate the numbers with the correct mean and standard deviation.

 

Sorry, I have to correct myself. I just thought about it again, and the problem is the min and max limitation. If you use it, you will not get values from the full Gaussian distribution, only from the part that lies inside the range. Hence, the mean and standard deviation of your result will shift and no longer match the values you configured. I would adjust the standard deviations; they were pretty high in some cases.
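To see how much the bounds can move the result, here is a minimal sketch that assumes the bounds are enforced by discarding draws outside [min, max] (truncation). That is an assumption based on the description above, not on the node's source code; clipping values to the bounds would give different numbers. The settings (mean 4, standard deviation 10, bounds 0 to 100) are hypothetical, chosen to resemble a recency cluster with a large standard deviation. The analytical mean uses the standard truncated-normal formula $\mu + \sigma\,\frac{\varphi(\alpha)-\varphi(\beta)}{\Phi(\beta)-\Phi(\alpha)}$ with $\alpha=(a-\mu)/\sigma$ and $\beta=(b-\mu)/\sigma$.

```python
import math
import random
import statistics


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def big_phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_mean(mu, sigma, lower, upper):
    """Analytical mean of N(mu, sigma^2) restricted to [lower, upper]."""
    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    return mu + sigma * (phi(alpha) - phi(beta)) / (big_phi(beta) - big_phi(alpha))


def sample_in_range(mu, sigma, lower, upper, n, seed=1):
    """Rejection sampling: redraw until the value falls inside the bounds."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        x = rng.gauss(mu, sigma)
        if lower <= x <= upper:
            values.append(x)
    return values


# Hypothetical settings: target mean recency 4 with a large std. deviation,
# bounded below by 0 because recency cannot be negative.
MU, SIGMA, LOWER, UPPER = 4.0, 10.0, 0.0, 100.0
values = sample_in_range(MU, SIGMA, LOWER, UPPER, n=50_000)
print("configured mean:", MU)
print("sample mean:    ", round(statistics.mean(values), 2))
print("analytical mean:", round(truncated_mean(MU, SIGMA, LOWER, UPPER), 2))
print("configured std: ", SIGMA)
print("sample std dev: ", round(statistics.stdev(values), 2))
```

With these settings the lower bound cuts off roughly a third of the distribution, so the resulting mean lands near 9.6 instead of 4 and the standard deviation shrinks to under 7. A smaller standard deviation keeps more of the distribution inside the bounds, which is why reducing it brings the result closer to the configured values.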

Best regards, Iris

Iris:

I reviewed the workflow with the Apache node and it looks great.

Regarding the Gaussian Data Assigner, I tried changing the min and max bounds but the problem persists: the percentage of each class is correct, but not its mean and standard deviation.