Hey there, undergrad here,
I have a dataset of morphological data acquired through Sholl analysis (and some other measurements) from a total of 150 neuron tracings. I performed a PCA on this data and used K-Means to cluster it. We chose 2 clusters by visual inspection, since we can clearly see two morphologically different types of neurons in our dataset.
My question is: what is the best way to validate this number of clusters? Should I even try, or just go with the two clusters defined visually?
(I don't have much knowledge of statistics...)
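For reference, the PCA step described above might look like this in Python with NumPy. This is a minimal sketch, assuming the features sit in a matrix `X` with one row per neuron tracing and one column per measurement. Standardizing each column first is my assumption (Sholl intersection counts and lengths are on different scales), the helper name `pca_scores` is hypothetical, and the data below is a random placeholder, not real measurements.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project rows of X onto the first principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Assumption: z-score each feature so no single measurement dominates.
    # (A constant column would divide by zero and should be dropped first.)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Fraction of total variance captured by each kept component.
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return scores, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))  # placeholder: 150 tracings x 6 features
scores, explained = pca_scores(X)
print(scores.shape, explained.round(3))
```

The `scores` matrix is what you would then pass to K-Means.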
Hi, I usually use a protocol like the one in the attachment for such problems. If you right-click the R snippet node, choose "View R std output" and scroll down, you'll see a result like this:
groups    SSE          calinski
1         19488        NA
2         14385.210    2303.937
3         10183.107    2966.972
4         7184.574     3706.373
5         6268.360     3422.821
6         5603.687     3216.563
7         5006.852     3128.468
8         4571.933     3024.365
9         4229.411     2925.872
10        3939.997     2844.331
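You'll see that in that example the Calinski criterion is maximized at 4 groups, which is selected automatically. Note that the SSE keeps decreasing as groups are added, so it cannot choose the number of groups on its own; the Calinski criterion weighs the between-group spread against the within-group SSE and the number of groups.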
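To make the selection rule concrete, here is a minimal NumPy sketch of what such a protocol does: run k-means for a range of k and keep the k with the largest Calinski-Harabasz index, CH = [B/(k-1)] / [W/(n-k)], where n is the number of points, W is the within-group SSE (the SSE column above) and B is the SSE at 1 group minus W. This is not the attached R node: the k-means is a plain Lloyd's algorithm with random restarts, and the helper names `kmeans` and `calinski` plus the synthetic two-group data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; returns the best labels found."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign every point to its nearest center (squared Euclidean distance).
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of its points; keep it if the group is empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse


def calinski(X, labels):
    """Calinski-Harabasz index (B / (k - 1)) / (W / (n - k)); needs k >= 2 groups."""
    n = len(X)
    groups = np.unique(labels)
    k = len(groups)
    # W: within-group SSE (the "SSE" value for this number of groups).
    W = sum(((X[labels == g] - X[labels == g].mean(axis=0)) ** 2).sum() for g in groups)
    # Total SSE around the grand mean (the "1 groups" SSE); B is the part the grouping explains.
    total = ((X - X.mean(axis=0)) ** 2).sum()
    B = total - W
    return (B / (k - 1)) / (W / (n - k))


# Synthetic stand-in for PCA scores: two well-separated groups of 75 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(75, 2)),
               rng.normal(6.0, 1.0, size=(75, 2))])

results = {}
for k in range(2, 8):
    labels, sse = kmeans(X, k)
    results[k] = (sse, calinski(X, labels))

for k, (sse, ch) in results.items():
    print(f"{k} groups  SSE={sse:10.3f}  calinski={ch:10.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("best k:", best_k)
```

On the 150 tracings you would replace `X` with your PCA scores; if the index peaks at 2 groups, it agrees with the visual choice.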