Homework 5
13.13
The gamma probability density function is
$$
f(y,r,\lambda) = \frac{\lambda^r}{\Gamma(r)} e^{-\lambda y} y^{r-1}, \quad\text{for } y>0,\ \lambda>0,\ r>0
$$
Show that the gamma is a member of the exponential family.
Proof:
Since
$$
\begin{align}
f(y,r,\lambda) &= \exp\lbrace r\log\lambda - \log\Gamma(r) - \lambda y + (r-1)\log y \rbrace \\
&= \exp\Big\lbrace \underbrace{\Big(\frac{\lambda}{r} y - \log\frac{\lambda}{r}\Big)\Big/\Big(-\frac1r\Big)}_{[\theta y - b(\theta)]/a(\phi)} + \underbrace{(r-1)\log y - \log\Gamma(r) + r\log r}_{h(y,\phi)} \Big\rbrace
\end{align}
$$
Set $\theta = \dfrac{\lambda}{r}$ and $\phi = r$; then $b(\theta) = \log\theta$, $a(\phi) = -\dfrac1\phi$, and
$$
h(y,\phi) = (\phi-1)\log y - \log\Gamma(\phi) + \phi\log\phi
$$
Thus
$$
\begin{align}
&E(y) = \frac{\mathrm{d} b(\theta)}{\mathrm{d}\theta} = \frac{1}{\theta} = \frac{r}{\lambda} \\
&\mathrm{Var}(y) = \frac{\mathrm{d}^2 b(\theta)}{\mathrm{d}\theta^2} \cdot a(\phi) = \frac{r}{\lambda^2}
\end{align}
$$
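As a quick check of these moment formulas (note that both $b''(\theta)$ and $a(\phi)=-\tfrac1r$ are negative here, so the variance comes out positive):
$$
b'(\theta) = \frac{1}{\theta} = \frac{r}{\lambda}, \qquad b''(\theta) = -\frac{1}{\theta^2} = -\frac{r^2}{\lambda^2}, \qquad b''(\theta)\,a(\phi) = \Big(-\frac{r^2}{\lambda^2}\Big)\Big(-\frac{1}{r}\Big) = \frac{r}{\lambda^2}
$$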
13.14
The exponential probability density function is
$$
f(y,\lambda) = \lambda e^{-\lambda y}, \quad\text{for } y\geq 0,\ \lambda>0
$$
Show that the exponential distribution is a member of the exponential family.
Proof:
Notice that the exponential distribution is the special case of the gamma distribution with $r=1$.
$$
f(y, \lambda) = \exp(-\lambda y + \log\lambda)
$$
Set $\theta = -\lambda$, so that $b(\theta) = -\log\lambda = -\log(-\theta)$, with $\phi = 1$, $a(\phi) = 1$ and $h(y,\phi)=0$.
Thus, a similar result can be derived:
$$
\begin{align}
&E(y) = \frac{\mathrm{d}b(\theta)}{\mathrm{d}\theta} = \frac{1}{\lambda} \\
&\mathrm{Var}(y) = \frac{\mathrm{d}^2 b(\theta)}{\mathrm{d}\theta^2}\cdot1 = \frac{1}{\lambda^2}
\end{align}
$$
which shows that the exponential distribution is a member of the natural exponential family.
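The stated moments follow by differentiating $b(\theta) = -\log(-\theta)$ and substituting $\theta = -\lambda$:
$$
b'(\theta) = -\frac{1}{\theta} = \frac{1}{\lambda}, \qquad b''(\theta) = \frac{1}{\theta^2} = \frac{1}{\lambda^2}
$$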
13.15
The negative binomial probability mass function is
$$
f(y,\pi,\alpha) = \binom{y+\alpha-1}{\alpha-1} \pi^{\alpha}(1-\pi)^y, \quad\text{for } y=0,1,\ldots,\ \alpha>0,\ 0\leq\pi\leq 1
$$
Show that the negative binomial is a member of the exponential family.
Proof:
$$
\begin{align}
f(y,\pi,\alpha) &= \binom{y+\alpha-1}{\alpha-1} \pi^{\alpha}(1-\pi)^y \\
&= \frac{\alpha}{y+\alpha} \binom{y+\alpha}{y}\pi^{\alpha}(1-\pi)^y \\
\text{Set }n=y+\alpha\quad &= \Big(1-\frac{y}{n}\Big)\binom{n}{y}\pi^{n-y}(1-\pi)^y \\
&= \exp\Big\lbrace (n-y)\log\pi+y\log(1-\pi)+\log\Big[\Big(1-\frac{y}{n}\Big)\binom{n}{y}\Big] \Big\rbrace \\
&= \exp\Big\lbrace \underbrace{\Big( \frac{y}{n}\log\frac{1-\pi}{\pi} + \log\pi \Big)}_{\frac{y}{n}\theta - b(\theta)}\Big/\frac1n + \underbrace{\log\Big[\Big(1-\frac{y}{n}\Big)\binom{n}{y}\Big]}_{h(y,\phi)} \Big\rbrace
\end{align}
$$
Set
$$
\begin{align}
&\theta=\log\frac{1-\pi}{\pi} \\
&b(\theta)=-\log\pi = \log(1+e^{\theta})
\end{align}
$$
with $\phi=1$, $a(\phi)=\dfrac1n$, and $h(y,\phi)=\log\Big[\Big(1-\dfrac{y}{n}\Big)\dbinom{n}{y}\Big]$.
Thus
$$
\begin{align}
&E(y) = \frac{\mathrm{d}b(\theta)}{\mathrm{d}\theta} = \frac{e^\theta}{1+e^\theta} = 1-\pi \\
&\mathrm{Var}(y) = \frac{\mathrm{d}^2 b(\theta)}{\mathrm{d}\theta^2}\cdot a(\phi) = \frac1n\cdot\frac{e^\theta}{(1+e^\theta)^2} = \frac1n\pi(1-\pi)
\end{align}
$$
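Both results come from differentiating $b(\theta) = \log(1+e^{\theta})$ and substituting $e^{\theta} = \dfrac{1-\pi}{\pi}$, so that $1+e^{\theta} = \dfrac{1}{\pi}$:
$$
b'(\theta) = \frac{e^{\theta}}{1+e^{\theta}} = \frac{(1-\pi)/\pi}{1/\pi} = 1-\pi, \qquad b''(\theta) = \frac{e^{\theta}}{(1+e^{\theta})^2} = \frac{(1-\pi)/\pi}{1/\pi^2} = \pi(1-\pi)
$$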
13.16
The data in the table below are from an experiment designed to study the advance rate $y$ of a drill. The four design factors are $x_1$=load, $x_2$=flow, $x_3$=drilling speed, and $x_4$=type of drilling mud (the original experiment is described by Cuthbert Daniel in his 1976 book on industrial experimentation).
Fit a generalized linear model to the advance rate response. Use a gamma response distribution and a log link, and include all four regressors in the linear predictor.
First, we write the data from the table into a CSV file named drill.csv and import it into an R data frame.
```r
data <- read.table("/path/to/drill.csv", header = TRUE, sep = ",")
```
To study the advance rate of the drill, we recode the $\pm$ design columns into dichotomous factors called `load`, `flow`, `speed` and `type`, for example:

```r
data$load[data$x1 == "+"] <- 1
```
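The single assignment above only sets the rows where `x1` is `+`; a complete, self-contained sketch of the recoding (the data values below are illustrative, since the original drill.csv is not reproduced here) is:

```r
# Illustrative +/- design columns; the real values come from drill.csv
data <- data.frame(x1 = c("+", "-", "+", "-"),
                   x2 = c("-", "-", "+", "+"),
                   x3 = c("+", "+", "-", "-"),
                   x4 = c("-", "+", "+", "-"))

# Recode each +/- column into a 0/1 dichotomous factor
data$load  <- ifelse(data$x1 == "+", 1, 0)
data$flow  <- ifelse(data$x2 == "+", 1, 0)
data$speed <- ifelse(data$x3 == "+", 1, 0)
data$type  <- ifelse(data$x4 == "+", 1, 0)
```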
These dichotomous factors can now be used as regressors in a model with a gamma response distribution and a log link.
```r
fit.full <- glm(y ~ load + flow + speed + type, data, family = Gamma(link = "log"))
summary(fit.full)
```

```
Call:
```
From the p-values for the regression coefficients (last column), we can see that `load`, `flow`, `speed` and `type` each make a significant contribution to the model, so we reject the hypothesis that these parameters are $0$. That is to say, each regression coefficient in the model is statistically significant $(p<0.05)$.
Let’s take a look at the regression coefficients:
```r
coef(fit.full)
```

```
(Intercept) load flow speed type
```
Since the coefficients are on the log scale and difficult to interpret directly, we can exponentiate them to express each effect as a multiplicative factor on the response:
```r
exp(coef(fit.full))
```

```
(Intercept) load flow speed type
```
Now we can see that the advance rate of the drill is increased by a factor of 1.140259 for a one-unit increase in drill load (holding flow, speed and type constant). A similar increasing trend can be seen for the other factors. The exponentiated intercept is the predicted advance rate when all four factors are at their low (0) level.
Also, we can use the `confint()` function to obtain confidence intervals for the coefficients.
```r
exp(confint(fit.full))
```

```
2.5 % 97.5 %
```
The 95 percent confidence intervals (2.5 % to 97.5 %) for each coefficient, exponentiated onto the response scale, are printed above.
Finally, we compare the residual deviance with the residual degrees of freedom to check for overdispersion.
```r
deviance(fit.full) / df.residual(fit.full)
```

```
[1] 0.009570289
```
which is far below 1, suggesting that there is no overdispersion.
The residual deviance itself can be extracted with
```r
fit.full$deviance
```

```
[1] 0.1052732
```
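The two printed values are consistent with each other: the $2^4$ design has $16$ runs and the model estimates $5$ coefficients, so the residual degrees of freedom are $16-5=11$ and
$$
\frac{0.1052732}{11} \approx 0.009570
$$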
Bonus Problems
1.
Prove that the variance-reduction matrix $\mathrm{Var}(\hat\beta^*_{q+1}) - \mathrm{Var}(\hat\beta_{q+1})$ is positive semidefinite.
We know that
$$
\begin{align}
&\mathrm{Var}(\hat\beta^*) = \sigma^2(X'X)^{-1} \\
&\mathrm{Var}(\hat\beta_{q+1}) = \sigma^2(X_{q+1}'X_{q+1})^{-1}
\end{align}
$$
and that $\mathrm{Var}(\hat\beta^*_{q+1})$ is the $(1,1)$ block of $\sigma^2(X'X)^{-1}$,
where $X = \begin{pmatrix} X_{q+1} & X_r \end{pmatrix}$.
Consider the normal matrix
$$
X'X = \begin{pmatrix}
X_{q+1}'X_{q+1} & X_{q+1}'X_r \\
X_r'X_{q+1} & X_r'X_r
\end{pmatrix} = \begin{pmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{pmatrix}
$$
The inverse of this block matrix is already given as
$$
\begin{pmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{pmatrix}^{-1} = \begin{pmatrix}
\Sigma_{11}^{-1}+\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22.1}^{-1}\Sigma_{21}\Sigma_{11}^{-1} & -\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22.1}^{-1} \\
-\Sigma_{22.1}^{-1}\Sigma_{21}\Sigma_{11}^{-1} & \Sigma_{22.1}^{-1}
\end{pmatrix}
$$
where
$$
\Sigma_{22.1}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
$$
Thus the $(1,1)$ block of the inverse, which equals $\mathrm{Var}(\hat\beta^*_{q+1})/\sigma^2$, is
$$
\Sigma_{11}^{-1}+\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22.1}^{-1}\Sigma_{21}\Sigma_{11}^{-1}
$$
Set $A = \Sigma_{11}^{-1}\Sigma_{12} = (X_{q+1}'X_{q+1})^{-1}(X_{q+1}'X_r)$, the alias matrix, and note that $\Sigma_{21}\Sigma_{11}^{-1} = A'$. Hence we have
$$
\mathrm{Var}(\hat\beta^*_{q+1}) - \mathrm{Var}(\hat\beta_{q+1}) = \sigma^2\,\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22.1}^{-1}\Sigma_{21}\Sigma_{11}^{-1} = \sigma^2 A\big(X_r'X_r - X_r'X_{q+1}A\big)^{-1}A'
which is positive semidefinite: $\Sigma_{22.1}$ is the Schur complement of the positive definite matrix $X'X$ and is therefore positive definite, so for any vector $v$, $v'A\Sigma_{22.1}^{-1}A'v = (A'v)'\Sigma_{22.1}^{-1}(A'v) \geq 0$.
2.
Prove that
$$
F = \dfrac{SS_A/(a-1)}{SSE/(n-a)} \sim F_{a-1,\,n-a}
$$
under the hypothesis $H_0$.
Recall the sum-of-squares decomposition
$$
\begin{align}
SST &= \sum_{i=1}^a \sum_{j=1}^{n_i} (y_{ij} - \bar y)^2 \\
&= \sum_{i=1}^a \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\cdot} + \bar y_{i\cdot} - \bar y)^2 \\
&= \underbrace{\sum_{i=1}^a \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\cdot})^2}_{SSE} + \underbrace{\sum_{i=1}^a \sum_{j=1}^{n_i} (\bar y_{i\cdot} - \bar y)^2}_{SS_A}
\end{align}
$$
For clarity, we give the corresponding matrix form:
$$
\begin{align}
SST &= y'(I-\frac1n \boldsymbol{1}\boldsymbol{1}')y \\
&= y'(I - A + A - \frac1n \boldsymbol{1}\boldsymbol{1}')y \\
&= \underbrace{y'(I - A)y}_{SSE} + \underbrace{y'(A - \frac1n \boldsymbol{1}\boldsymbol{1}')y}_{SS_A}
\end{align}
$$
where $A = \dfrac{1}{n_1}\boldsymbol{1}_{n_1}\boldsymbol{1}_{n_1}' \oplus \cdots \oplus \dfrac{1}{n_a}\boldsymbol{1}_{n_a}\boldsymbol{1}_{n_a}'$.
Since $\boldsymbol{1}_{n_i}'\boldsymbol{1}_{n_i} = n_i$, each diagonal block $\dfrac{1}{n_i}\boldsymbol{1}_{n_i}\boldsymbol{1}_{n_i}'$ of $A$ is a symmetric and idempotent projection matrix, and hence so is $A$ itself, i.e.
$$
A'=A \qquad A^2 = A
$$
Next, let $B=I-A$ and $C=A-\dfrac1n\boldsymbol{1}\boldsymbol{1}'$.
We claim that
$$
\begin{align}
&B'=B \qquad B^2 = B \\
&C'=C \qquad C^2 = C
\end{align}
$$
with ranks
$$
\mathrm{rank}(B)=n-a \quad \mathrm{rank}(C)=a-1
$$
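The rank claims follow from the fact that the rank of a symmetric idempotent matrix equals its trace:
$$
\mathrm{rank}(B) = \mathrm{tr}(I-A) = n - \sum_{i=1}^a \mathrm{tr}\Big(\frac{1}{n_i}\boldsymbol{1}_{n_i}\boldsymbol{1}_{n_i}'\Big) = n-a, \qquad \mathrm{rank}(C) = \mathrm{tr}(A) - \mathrm{tr}\Big(\frac1n\boldsymbol{1}\boldsymbol{1}'\Big) = a-1
$$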
Thus, under the null hypothesis (so that $\mu \propto \boldsymbol{1}$, and hence $B\mu = C\mu = 0$ since $A\boldsymbol{1}=\boldsymbol{1}$ as verified below), with $Y\sim N(\mu,\sigma^2I)$ we have
$$
\begin{align}
&\frac{Y'BY}{\sigma^2} \sim \chi^2(n-a) \\
&\frac{Y'CY}{\sigma^2} \sim \chi^2(a-1)
\end{align}
$$
together with
$$
\begin{align}
BC &= (I-A)(A-\frac1n\boldsymbol{1}\boldsymbol{1}') \\
&= A-\frac1n\boldsymbol{1}\boldsymbol{1}'-A^2+\frac1n A\boldsymbol{1}\boldsymbol{1}' \\
&= A-\frac1n\boldsymbol{1}\boldsymbol{1}'-A+\frac1n\boldsymbol{1}\boldsymbol{1}' \\
&= 0
\end{align}
$$
The last two equalities follow from the fact that
$$
\begin{align}
A\boldsymbol{1} &= \Big( \frac{1}{n_1}\boldsymbol{1}_{n_1}\boldsymbol{1}_{n_1}' \oplus \cdots \oplus \frac{1}{n_a}\boldsymbol{1}_{n_a}\boldsymbol{1}_{n_a}' \Big) \begin{pmatrix} \boldsymbol{1}_{n_1} \\ \vdots \\ \boldsymbol{1}_{n_a} \end{pmatrix} \\
&= \begin{pmatrix} \frac{1}{n_1}\boldsymbol{1}_{n_1}(\boldsymbol{1}_{n_1}'\boldsymbol{1}_{n_1}) \\ \vdots \\ \frac{1}{n_a}\boldsymbol{1}_{n_a}(\boldsymbol{1}_{n_a}'\boldsymbol{1}_{n_a}) \end{pmatrix} = \begin{pmatrix} \boldsymbol{1}_{n_1} \\ \vdots \\ \boldsymbol{1}_{n_a} \end{pmatrix} = \boldsymbol{1}
\end{align}
$$
Hence, since $BC=0$, the quadratic forms $Y'(I-A)Y$ and $Y'(A-\dfrac1n\boldsymbol{1}\boldsymbol{1}')Y$ in the normal vector $Y$ are mutually independent, which indicates
$$
F=\frac{Y'CY/(a-1)}{Y'BY/(n-a)} = \frac{SS_A/(a-1)}{SSE/(n-a)} \sim F_{a-1,\,n-a}
$$
Furthermore, we can derive two unbiased estimators of $\sigma^2$.
$$
\begin{align}
E(SSE) &= E\big(Y'(I-A)Y\big) \\
&= \mu'(I-A)\mu + \mathrm{tr}\big((I-A)\sigma^2I\big) \\
&= \mu'\mu - \mu'A\mu + \sigma^2(n-a) \\
&= \sigma^2(n-a)
\end{align}
$$
The last equality follows because $A\mu = \mu$ ($\mu$ is constant within each group), so $\mu'(I-A)\mu = 0$.
Similarly,
$$
\begin{align}
E(SS_A) &= E\big(Y'(A-\frac1n\boldsymbol{1}\boldsymbol{1}')Y\big) \\
&= \mu'(A-\frac1n\boldsymbol{1}\boldsymbol{1}')\mu + \mathrm{tr}\big((A-\frac1n\boldsymbol{1}\boldsymbol{1}')\sigma^2I\big) \\
&= \mu'A\mu - \frac1n\mu'\boldsymbol{1}\boldsymbol{1}'\mu + \sigma^2(a-1) \\
&= \sigma^2(a-1)
\end{align}
$$
where the last equality uses $H_0$: when all group means are equal, $A\mu = \dfrac1n\boldsymbol{1}\boldsymbol{1}'\mu$.
That is to say,
$$
E\big(SSE/(n-a)\big) = E\big(SS_A/(a-1)\big) = \sigma^2
$$
Kai Liu, wrote on 2017/11/28
Student ID: 116071910015
Mail: liu1995@sjtu.edu.cn