# Statistical analysis of commenting behaviour in blog

Posted on Nov 29, 2006 by Chung-hong Chan

> commentdata <- c(3,4,5,2,20,0,2,0,1,0,2,0,0,6,1,6,2,2,3,0,0,0,3,0,0,3,6,1,0,2,4,3,0,1,1,3,0,2,0,3,0,2,1,1,0,0,8,0,1,1,0,0,0,9,1,6,0,5,18,0,0,3,2,2,5,1,2,0,1,0,5,5,7,0,3)

> stem(commentdata)

      The decimal point is at the |

       0 | 000000000000000000000000000000000000000
       2 | 00000000000000000000
       4 | 0000000
       6 | 00000
       8 | 00
      10 | 
      12 | 
      14 | 
      16 | 
      18 | 0
      20 | 0


> ks.test(commentdata,"pnorm",mean=mean(commentdata), sd=sqrt(var(commentdata)))

            One-sample Kolmogorov-Smirnov test

    data:  commentdata
    D = 0.2485, p-value = 0.0001896
    alternative hypothesis: two-sided

    Warning message:
    cannot compute correct p-values with ties in: ks.test(commentdata, "pnorm", mean = mean(commentdata), sd = sqrt(var(commentdata)))

> library(vcd)

> gf <- goodfit(commentdata, type = "nbinomial", method = "MinChi")

> summary(gf)

             Goodness-of-fit test for nbinomial distribution

                 X^2 df  P(> X^2)
    Pearson 7.831838  9 0.5511779

    Warning message:
    Chi-squared approximation may be incorrect in: summary.goodfit(gf)

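The comment counts fit a negative binomial distribution, which suggests that visitors leaving comments can be treated as a Poisson process. A Poisson process is characterised by events that have a low chance of occurring. In fact, though, I could not fit the MacGrass comment counts to a Poisson distribution. That in turn suggests that the probability of a visitor leaving a comment is not a constant rate; instead there may be time dependency or episode dependency. In other words, whether a visitor comments is affected by when the post was published, or by how many comments the post already has.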
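One rough way to see why a plain Poisson distribution struggles with these counts is to compare the sample variance with the sample mean, since a Poisson distribution has the two equal. The sketch below uses base R only and is not part of the original analysis; the names `dispersion`, `p0_obs` and `p0_pois` are made up for illustration, and it assumes the sample mean is a reasonable estimate of the Poisson rate.

```r
# Comment counts per post, copied from the data above.
commentdata <- c(3,4,5,2,20,0,2,0,1,0,2,0,0,6,1,6,2,2,3,0,0,0,3,0,0,
                 3,6,1,0,2,4,3,0,1,1,3,0,2,0,3,0,2,1,1,0,0,8,0,1,1,
                 0,0,0,9,1,6,0,5,18,0,0,3,2,2,5,1,2,0,1,0,5,5,7,0,3)

m <- mean(commentdata)  # sample mean, used here as the Poisson rate estimate
v <- var(commentdata)   # sample variance

# Under a Poisson distribution the variance equals the mean,
# so a ratio well above 1 points to overdispersion.
dispersion <- v / m

# Share of posts with no comments: observed vs. expected under Poisson(m).
p0_obs  <- mean(commentdata == 0)
p0_pois <- dpois(0, lambda = m)

cat("mean:", m, " variance:", v, " variance/mean:", dispersion, "\n")
cat("P(0 comments) observed:", p0_obs, " Poisson expected:", p0_pois, "\n")
```

A variance much larger than the mean, together with more zero-comment posts than a Poisson distribution would predict, is the usual sign that the rate is not constant across posts. A negative binomial distribution can absorb that extra spread, which is consistent with the goodness-of-fit result above.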