# 蒙地卡羅

Posted on Oct 25, 2008 by Chung-hong Chan

Come on! 你知道為甚麼是寫在這裡嗎？因為現實中根本沒有使用的可能。人們根本不會 care 你有沒有做的！無處宣洩只好在這裡。

 > str(iris) 'data.frame': 150 obs. of 5 variables: $Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... $Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... \$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ... 
iris 數據是三種鳶尾花（setosa, versicolor 及 virginica ）的花萼長闊度（Sepal Length 及 Sepal Width ）及花瓣長闊度（Petal Length 及 Petal Width）。費雪認為花萼及花瓣長闊度可用來為鳶尾花分類。它用這個數據設立了 Linear Discriminant model 。這是這個數據的歷史，現在這個經典數據是會用於一般的統計學書藉甚至人工智能研究。

 > new.iris < - subset(iris, Species !="setosa") 

 > model1 < - glm(Species~Sepal.Length, family=binomial("logit"),data=new.iris) > summary(model1) Call: glm(formula = Species ~ Sepal.Length, family = binomial("logit"), data = new.iris) Deviance Residuals: Min 1Q Median 3Q Max -1.85340 -0.90001 -0.04717 0.96861 2.35458 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -12.5708 2.9068 -4.325 1.53e-05 *** Sepal.Length 2.0129 0.4654 4.325 1.53e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 138.63 on 99 degrees of freedom Residual deviance: 110.55 on 98 degrees of freedom AIC: 114.55 Number of Fisher Scoring iterations: 4 

 > confint(model1) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) -18.825319 -7.340756 Sepal.Length 1.176685 3.015218 

Bootstrap 的例子如下：

27,27,30,42,42 = 33.6
27,30,34,34,34 = 31.8
34,34,34,34,40 = 35.2

 > bsa < - function(formula, data, indices) { d <- data[indices,] fit <- glm(formula, data=d, family=binomial("logit")) return(coef(fit)[2]) } 

 > boot.results < - boot(data=new.iris,statistic=bsa,R=1000,formula = Species~Sepal.Length) > boot.results ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = new.iris, statistic = bsa, R = 1000, formula = Species ~ Sepal.Length) Bootstrap Statistics : original bias std. error t1* 2.012927 0.09547335 0.5014038 

> plot(boot.results)

 > boot.ci(boot.results) BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = boot.results) Intervals : Level Normal Basic 95% ( 0.935, 2.900 ) ( 0.815, 2.826 ) Level Percentile BCa 95% ( 1.200, 3.211 ) ( 1.103, 3.011 ) Calculations and Intervals on Original Scale Some BCa intervals may be unstable Warning message: In boot.ci(boot.results) : bootstrap variances needed for studentized intervals 

Only after that, 在我的標準來說，上面做出來的 Logistic regression model 才是 publishable 。 ((Sorry ，但我自己都不是次次 do ，所以我做出來的 model 都不是 publishable 的。)) 因為有更多的證據證明你的模型是正確的。