Hello,
I would like to apply the χ2 test with a contingency table to a die-rolling game. Rolling a die 600 times in a row gave the following counts: 88, 109, 107, 94, 105 and 97 for faces 1, 2, 3, 4, 5 and 6 respectively. For a fair die, the expected count for each face is 600 / 6 = 100. The number of degrees of freedom is 6 - 1 = 5. We wish to test the hypothesis that the die is not rigged, with a risk α = 0.05. The null hypothesis is therefore: “The die is balanced”.
We can compute the χ2 statistic by hand. Assuming the null hypothesis is true, the statistic is:
χ2 = (88 - 100)²/100 + (109 - 100)²/100 + (107 - 100)²/100 + (94 - 100)²/100 + (105 - 100)²/100 + (97 - 100)²/100 = (144 + 81 + 49 + 36 + 25 + 9)/100 = 3.44
The χ2 distribution with five degrees of freedom gives the critical value below which we consider the results consistent with a fair die at risk α = 0.05: P(χ2 < 11.07) = 0.95. Since 3.44 < 11.07, we cannot reject the null hypothesis: these data do not allow us to conclude that the die is rigged.
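The hand calculation above can be reproduced in a few lines of Python. This is a minimal sketch assuming NumPy and SciPy are installed; `scipy.stats.chi2.ppf` (the inverse CDF of the χ2 distribution) is used here to obtain the critical value 11.07 instead of reading it from a table.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([88, 109, 107, 94, 105, 97])  # counts for faces 1..6
expected = np.full(6, observed.sum() / 6)          # 600 / 6 = 100 per face

# Pearson statistic: sum of (O - E)^2 / E over the six faces
statistic = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1                            # 6 - 1 = 5
alpha = 0.05
critical = chi2.ppf(1 - alpha, dof)                # value with P(chi2 < x) = 0.95

print(f"chi2 = {statistic:.2f}, critical value = {critical:.2f}")
print("reject H0" if statistic > critical else "cannot reject H0")
```

Running it prints `chi2 = 3.44, critical value = 11.07` followed by `cannot reject H0`, matching the hand calculation.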
Below is my code (pandas and SciPy) that tries to reproduce this result with chi2_contingency:
```python
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts for each face of the die (600 rolls in total)
dico = {'face': [1, 2, 3, 4, 5, 6], 'effectifs': [88, 109, 107, 94, 105, 97]}
tab = pd.DataFrame(dico)
print("tab", tab.head(6))

# Cross-tabulation of the face column against the count column
ta = pd.crosstab(tab['face'], tab['effectifs'])
print("ta = ", ta)

test = chi2_contingency(tab)
print("table de chi2_contingency =", test)
```
That produces:
```
tab    face  effectifs
0     1         88
1     2        109
2     3        107
3     4         94
4     5        105
5     6         97
ta =  effectifs  88   94   97   105  107  109
face
1            1    0    0    0    0    0
2            0    0    0    0    0    1
3            0    0    0    0    1    0
4            0    1    0    0    0    0
5            0    0    0    1    0    0
6            0    0    1    0    0    0
table de chi2_contingency = (4.86, 0.432, 5)
```
Here the χ2 statistic is 4.86, the p-value is 0.432, and there are 5 degrees of freedom.
This does not give χ2 = 3.44 as expected, so something is wrong.
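A hedged sketch of where 4.86 most likely comes from: `chi2_contingency` expects a table of observed counts and runs a test of independence between its rows and columns. Given the whole `tab` DataFrame, it sees a 6 × 2 table whose first column holds the face numbers 1 to 6, so it compares each cell with row total × column total / grand total rather than with 100, using (6 - 1) × (2 - 1) = 5 degrees of freedom. The snippet recomputes that statistic with NumPy to check this reading; it assumes a SciPy version where the result unpacks into four values (statistic, p-value, degrees of freedom, expected frequencies).

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

tab = pd.DataFrame({'face': [1, 2, 3, 4, 5, 6],
                    'effectifs': [88, 109, 107, 94, 105, 97]})

# Every column is treated as a category, so the face numbers
# are counted as observations alongside the real counts.
stat, p, dof, expected = chi2_contingency(tab)

# Same statistic by hand: expected = row total * column total / grand total
obs = tab.to_numpy(dtype=float)
exp_manual = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
stat_manual = ((obs - exp_manual) ** 2 / exp_manual).sum()

print(stat, stat_manual, dof)
```

If both statistics agree, the 4.86 measures the association between the face numbers and the counts, not the fit of the counts to 100 per face.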
Perhaps the table should be laid out differently, for example with a column for the expected counts or for the deviations from them.
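Supplying the expected counts explicitly is what a one-way goodness-of-fit test does. As a sketch, this swaps in `scipy.stats.chisquare` (SciPy's goodness-of-fit function) in place of `chi2_contingency`: it takes the observed counts and the expected counts (100 per face, as in the hand calculation) and by default uses k - 1 = 5 degrees of freedom, so it should reproduce the 3.44 computed by hand.

```python
from scipy.stats import chisquare

observed = [88, 109, 107, 94, 105, 97]  # counts for faces 1..6
expected = [100] * 6                    # fair die: 600 / 6 per face

# One-way goodness-of-fit test; degrees of freedom default to k - 1 = 5
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p-value = {p:.3f}")
```

A p-value above α = 0.05 corresponds to the statistic staying below the critical value 11.07.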
I don’t know. Does anyone have an idea?
Regards,
Atapalou