# Generalized Linear Models (`glm`)

# Table of Contents

- Fit Logistic Regression Models
- Perform the K-fold cross-validation for glm
- Make predictions with Logistic Regression Models
- Calculate the confidence interval for mean response of glm
- Perform Likelihood Ratio Test
- Deal with `glm.fit: algorithm did not converge`
- Use the `glm` model as a classifier
- Calculate Sensitivity and Specificity
- Include all 2-way interactions in a glm model in R

# Fit Logistic Regression Models

```
n = 100
x = rnorm(n)
eta = x
p = 1 / (1 + exp(-eta))
y = rbinom(n, size = 1, prob = p)
glm(y ~ x, family = binomial)
```

```
Call: glm(formula = y ~ x, family = binomial)
Coefficients:
(Intercept) x
0.2585 0.8823
Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 138
Residual Deviance: 119.8 AIC: 123.8
```

- For `binomial` and `quasibinomial` families the response can also be specified as a factor, **where the first level denotes failure (0) and all others success (1)**.
- Interpret the coefficients as changes on the `link` scale (log-odds for `binomial`), not on the response scale itself.
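As a sketch of both points, assuming simulated data like the fit above: a factor response with `"no"` as the first (failure) level gives the same coefficients as a 0/1 numeric response, and `exp()` turns the log-odds coefficients into odds ratios.

```
set.seed(42)
n = 100
x = rnorm(n)
y = rbinom(n, size = 1, prob = plogis(x))
# first level "no" denotes failure (0), "yes" denotes success (1)
y_factor = factor(ifelse(y == 1, "yes", "no"), levels = c("no", "yes"))
fit = glm(y_factor ~ x, family = binomial)
# coefficients live on the log-odds (link) scale; exp() gives odds ratios
exp(coef(fit))
```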

# Perform the K-fold cross-validation for glm

```
library(boot)
v = cv.glm(data, model, K = 5)
# raw cross-validation estimate of prediction error
v$delta[1]
```

`cv.glm` also has a parameter named `cost`: a function that accepts the `actual` and `predicted` values and returns a non-negative scalar cost.
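A minimal sketch of a custom cost: a hypothetical 0/1 misclassification cost for a binary response (the names `cost_misclass` and `d` are illustrative, not from the source).

```
library(boot)
# actual: observed 0/1 values; pred: predicted probabilities
cost_misclass = function(actual, pred) mean(abs(actual - pred) > 0.5)

set.seed(1)
d = data.frame(x = rnorm(100))
d$y = rbinom(100, size = 1, prob = plogis(d$x))
model = glm(y ~ x, data = d, family = binomial)
cv = cv.glm(d, model, cost = cost_misclass, K = 5)
cv$delta[1]  # cross-validated misclassification rate
```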

When `data` contains a factor variable, a randomly sampled fold may not cover all of its levels, and `cv.glm` raises an error. You can simply retry until it succeeds:

```
crossvalidation_kfold = function(model, data, K = 5) {
  ret = NULL
  attempt = 0
  MAX_ATTEMPT = 100
  while (is.null(ret) && attempt < MAX_ATTEMPT) {
    attempt = attempt + 1
    try({
      # NOTE: cv.glm() can fail because some split might not contain
      # all the levels of factor variables, so just retry until it succeeds.
      # SEE: https://stackoverflow.com/questions/19946930/r-cross-validation-on-a-dataset-with-factors
      ret = cv.glm(data, model, K = K)
    }, silent = TRUE)
  }
  stopifnot(!is.null(ret))
  ret$delta[1]
}
```

# Make predictions with Logistic Regression Models

```
# type = "link" to get the predicted eta value
predict(fit_glm, newdata = somedata, type = "link")
# type = "response" to get the predicted probability of y = 1
predict(fit_glm, newdata = somedata, type = "response")
```
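As a quick check of the two `type` options, assuming a simulated fit (the data here is illustrative): the `"response"` prediction is just the inverse link applied to the `"link"` prediction, which for `binomial` is the built-in `plogis`.

```
set.seed(7)
d = data.frame(x = rnorm(50))
d$y = rbinom(50, size = 1, prob = plogis(d$x))
fit_glm = glm(y ~ x, data = d, family = binomial)
newd = data.frame(x = c(-1, 0, 1))
eta  = predict(fit_glm, newdata = newd, type = "link")
prob = predict(fit_glm, newdata = newd, type = "response")
# response = inverse link of eta: plogis(eta) equals prob
all.equal(unname(prob), unname(plogis(eta)))
```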

# Calculate the confidence interval for mean response of glm

```
# se.fit = TRUE to get the standard error
eta_hat = predict(model, newdata, se.fit = TRUE, type = "link")
# level = 0.95
critical_value = qnorm(0.975)
margin = critical_value * eta_hat$se.fit
# Calculate the link interval
lower_eta = eta_hat$fit - margin
upper_eta = eta_hat$fit + margin
# Calculate the response interval
p = function(eta) 1 / (1 + exp(-eta))
c(p(lower_eta), p(upper_eta))
```
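An end-to-end sketch of the same recipe using the built-in `mtcars` data (`am ~ hp` is just an arbitrary example model); `plogis` is the built-in inverse logit, equivalent to the hand-rolled `p()` above.

```
model = glm(am ~ hp, data = mtcars, family = binomial)
newdata = data.frame(hp = 150)
eta_hat = predict(model, newdata, se.fit = TRUE, type = "link")
margin = qnorm(0.975) * eta_hat$se.fit
# build the interval on the link scale, then map it to probabilities
ci = plogis(c(eta_hat$fit - margin, eta_hat$fit + margin))
ci
```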

# Perform Likelihood Ratio Test
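A minimal sketch, assuming two nested `glm` fits on simulated data: `anova()` with `test = "LRT"` compares them with a likelihood ratio (chi-squared) test, and the statistic can also be computed by hand from the deviance difference.

```
set.seed(3)
d = data.frame(x = rnorm(100))
d$y = rbinom(100, size = 1, prob = plogis(d$x))
fit_null = glm(y ~ 1, data = d, family = binomial)
fit_full = glm(y ~ x, data = d, family = binomial)
# test = "LRT" performs the likelihood ratio test between nested models
anova(fit_null, fit_full, test = "LRT")
# equivalently, compute it by hand from the deviances
lrt_stat = deviance(fit_null) - deviance(fit_full)
p_value = pchisq(lrt_stat, df = 1, lower.tail = FALSE)
```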

# Deal with `glm.fit: algorithm did not converge`

- Increase the `maxit` argument when fitting; the default is `25`.
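A sketch of raising the iteration cap, on illustrative simulated data:

```
set.seed(5)
x = rnorm(100)
y = rbinom(100, size = 1, prob = plogis(x))
# raise the IRLS iteration cap from the default of 25
fit = glm(y ~ x, family = binomial, control = glm.control(maxit = 100))
# maxit can also be passed directly; glm() forwards it to glm.control()
fit = glm(y ~ x, family = binomial, maxit = 100)
fit$converged
```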

# Use the `glm` model as a classifier

```
# levels: yes no
pred = ifelse(predict(model, data, type = "response") > 0.5, "yes", "no")
# Misclassification Rate
mean(pred != y)
```

# Calculate Sensitivity and Specificity

```
calc_sensitivity = function(predicted, actual) {
  tbl = table(predicted, actual)
  TP = tbl["yes", "yes"]
  FN = tbl["no", "yes"]
  TP / (TP + FN)
}
calc_specificity = function(predicted, actual) {
  tbl = table(predicted, actual)
  TN = tbl["no", "no"]
  FP = tbl["yes", "no"]
  TN / (TN + FP)
}
```
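A self-contained sanity check of the same arithmetic on hypothetical labels (the vectors below are made up for illustration):

```
predicted = c("yes", "yes", "no", "no", "yes")
actual    = c("yes", "no",  "no", "yes", "yes")
tbl = table(predicted, actual)
TP = tbl["yes", "yes"]; FN = tbl["no", "yes"]
TN = tbl["no", "no"];   FP = tbl["yes", "no"]
sensitivity = TP / (TP + FN)  # 2 / (2 + 1) = 2/3
specificity = TN / (TN + FP)  # 1 / (1 + 1) = 1/2
```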