# Fit Logistic Regression Models howto

``````n = 100
x = rnorm(n)
eta = x
p = 1 / (1 + exp(-eta))
y = rbinom(n , size = 1, prob = p)
glm(y ~ x, family = binomial)``````
``````
Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
0.2585       0.8823

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      138
Residual Deviance: 119.8    AIC: 123.8
``````
• For `binomial` and `quasibinomial` families the response can also be specified as a factor, where the first level denotes failure(0) and all others success(1).
• Interpret the coefficients as the change in `link` values, not the response itself.

# Perform the K-fold cross-validation for glm howto

``````library(boot)
v = cv.glm(data, model, K = 5)
# raw cross-validation estimate of prediction error
v\$delta[1]``````

`cv.glm` has also a parameter named `cost`, which is a function accepts the `actual` and `predcited` value and returns `cost`, a non-negative scalar value.

When `data` has factor variable, there might be a case that a random sampled group observation doesn't cover all the factor levels. In this case, an error occurs. You can just try it again until it succeeds:

``````crossvalidation_kfold = function(model, data, K = 5) {
ret = NULL
attempt = 0
MAX_ATTEMPT = 100
while (is.null(ret) & attempt < MAX_ATTEMPT) {
attempt = attempt + 1
try({
# NOTE: cv.glm() can fail because some split might not contain
# all the levels of factor variables.
# So, just retry  until it succeeds.
# SEE: https://stackoverflow.com/questions/19946930/r-cross-validation-on-a-dataset-with-factors
ret = cv.glm(data, model, K = K)
}, silent = TRUE)
}
stopifnot(!is.null(ret))
ret\$delta[1]
}``````

# Make predictions with Logistic Regression Models howto

``````# type = "link" to get the predicted eta value
predict(fit_glm, newdata = somedata, type = "link")

# type = "response" to get the predicted probability of y = 1
predict(fit_glm, newdata = somedata, type = "response")``````

# Calculate the confidence interval for mean response of glm howto

``````# se.fit = TRUE to get the standard error
eta_hat = predict(model, newdata, se.fit = TRUE, type = "link")

# level = 0.95
critical_value = qnorm(0.975)
margin = critical_value * eta_hat\$se.fit

lower_eta = eta_hat\$fit - margin
upper_eta = eta_hat\$fit + margin

# Calculate the response interval
p = function(eta) 1 / (1 + exp(-eta))
c(p(lower_eta), p(upper_eta))``````

# Perform Likelihood Ratio Test howto

``anova(small_model, larger_model, test = "LRT")``

# Deal with `glm.fit: algorithm did not converge`howto

• Increase `maxit` argument when fitting, which is `25` by default.
``fit = glm(..., maxit = 50)``

# Use the `glm` model as a classifier howto

``````# levels: yes no
pred = ifelse(predict(model, data, type = "response") > 0.5, "yes", "no")

# Misclassification Rate
mean(pred != y)``````

# Calculate Sensitivity and Specificity howto

``````calc_sensitivity = function(predicted, actual) {
tbl = table(predicted, actual)
TP = tbl["no", "yes"]
FN = tbl["no", "yes"]
TP / (TP + FN)
}

calc_specificity = function(predicted, actual) {
tbl = table(predicted, actual)
TN = tbl["no", "no"]
FP = tbl["yes", "no"]
TN / (TN + FP)
}``````

# Include all 2-way interactions in a glm model in R howto

``glm(y ~ . * ., data)``