# Model Formulae

# Overview

The operator `~` is used to define a model formula in R.

```
response ~ op_1 term_1 op_2 term_2 op_3 term_3 …
```

- `response` is a vector or matrix (or an expression evaluating to a vector or matrix) defining the response variable(s).
- `op_i` is an operator, either `+` or `-`, implying the inclusion or exclusion of a term in the model (the first is optional).
- `term_i` is either
  - a vector or matrix expression, or `1`,
  - a factor, or
  - a formula expression consisting of factors, vectors or matrices connected by formula operators.

In all cases each term defines a collection of columns either to be added to or removed from the model matrix.
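To make the idea of terms adding or removing columns concrete, the sketch below inspects the column names of a few model matrices built with `model.matrix()`. The data frame `d` and its columns `x` and `g` are made up for illustration, and the factor columns assume R's default treatment contrasts.

```r
# Hypothetical toy data: one numeric vector and one three-level factor
d <- data.frame(x = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
                g = factor(c("a", "b", "c", "a", "b", "c")))

# The intercept term `1` is included implicitly
cols_default <- colnames(model.matrix(~ x, d))       # "(Intercept)" "x"

# `- 1` removes the intercept column
cols_no_int  <- colnames(model.matrix(~ x - 1, d))   # "x"

# A factor adds one column per non-reference level
cols_factor  <- colnames(model.matrix(~ x + g, d))   # "(Intercept)" "x" "gb" "gc"
```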

Notations:

- `Y ~ M`: `Y` is modeled as `M`.
- `M_1 + M_2`: Include `M_1` and `M_2`.
- `M_1 - M_2`: Include `M_1` leaving out terms of `M_2`.
- `M_1 : M_2`: The tensor product of `M_1` and `M_2`. If both terms are factors, then the “subclasses” factor.
- `M_1 %in% M_2`: Similar to `M_1:M_2`, but with a different coding.
- `M_1 * M_2`: `M_1 + M_2 + M_1:M_2`.
- `M_1 / M_2`: `M_1 + M_2 %in% M_1`.
- `M^n`: All terms in `M` together with “interactions” up to order `n`.
- `I(M)`: Insulate `M`. Inside `M` all operators have their normal arithmetic meaning, and that term appears in the model matrix.
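One way to check how these operators expand is to ask `terms()` for the term labels of a formula. The sketch below does this for a few of the notations. The helper name `expand` and the variable names `a`, `b`, `c`, `x`, `y` are placeholders chosen for illustration; `terms()` only parses the formula here, so the variables need not exist.

```r
# Hypothetical helper: list the terms a formula expands to
expand <- function(f) attr(terms(f), "term.labels")

expand(y ~ a * b)            # "a" "b" "a:b"
expand(y ~ a * b - a:b)      # "a" "b"
expand(y ~ a / b)            # "a" "a:b"  (the nested term is labelled a:b)
expand(y ~ (a + b + c)^2)    # "a" "b" "c" "a:b" "a:c" "b:c"

# Without I(), `^` is a formula operator, so x^2 collapses to x
expand(y ~ x + x^2)          # "x"
expand(y ~ x + I(x^2))       # "x" "I(x^2)"
```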

`poly(x, ..., degree = 1, raw = FALSE)` builds polynomial terms of `x`. The following compares it with the same terms written out using `I()`:

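```
# Simulate data from a quartic polynomial with Gaussian noise
sim = function(sample_size = 250) {
  x = runif(n = sample_size, min = -1, max = 1) * 2
  y = 3 + -6 * x ^ 2 + 1 * x ^ 4 + rnorm(n = sample_size, mean = 0, sd = 3)
  data.frame(x, y)
}
data = sim()
# Raw polynomial terms written out with I()
unname(coef(lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data)))
# The same fit using poly() with raw = TRUE
unname(coef(lm(y ~ poly(x, degree = 4, raw = TRUE), data)))
# Orthogonal polynomial basis (the default), so the coefficients differ
unname(coef(lm(y ~ poly(x, degree = 4), data)))
```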

```
[1] 2.7807038 -0.3471052 -5.8677261 0.1627440 0.9384000
[1] 2.7807038 -0.3471052 -5.8677261 0.1627440 0.9384000
[1] -2.4457333 0.5099472 -50.5814792 4.0435950 18.0532313
```

`poly()` calculates an orthogonal polynomial by default. `x + I(x^2) + ... + I(x^n)` is equivalent to `poly(x, degree = n, raw = TRUE)`.
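Although the raw and orthogonal bases give different coefficients, they span the same space, so the fitted values should agree. The sketch below checks this on freshly simulated data, and also checks that the columns of the default basis are orthonormal. The seed, sample size, and variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
x <- runif(100, min = -2, max = 2)
y <- 3 - 6 * x^2 + x^4 + rnorm(100, sd = 3)

fit_raw  <- lm(y ~ poly(x, degree = 4, raw = TRUE))
fit_orth <- lm(y ~ poly(x, degree = 4))

# Different coefficients, but the same fitted values (up to rounding)
same_fit <- all.equal(fitted(fit_raw), fitted(fit_orth))

# Columns of the default basis are orthonormal: crossprod is the identity
P <- poly(x, degree = 4)
orthonormal <- all.equal(crossprod(P), diag(4), check.attributes = FALSE)
```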

# Create a formula from a string

Pass the string to `as.formula()`. For example, `as.formula("y ~ x1 + x2")` returns:

```
y ~ x1 + x2
```

Also, you can create a formula from a variable:

```
npred = 10
preds = paste("x", 1:npred, sep="", collapse = " + ")
as.formula(sprintf("y ~ %s", preds))
```

```
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10
```
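As an alternative to pasting the string by hand, base R also provides `reformulate()`, which assembles a formula from a character vector of term labels. The sketch below is a minimal illustration; the predictor names `x1` to `x3` and the response name `y` are placeholders.

```r
# Hypothetical predictor names x1, x2, x3
preds <- paste0("x", 1:3)

# reformulate() builds the formula from a character vector of term labels
f <- reformulate(preds, response = "y")

# Same formula as the one built from a hand-written string
same <- identical(deparse(f), deparse(as.formula("y ~ x1 + x2 + x3")))
```

This avoids manual string formatting, which can be convenient when the predictor names come from `colnames()` of a data frame.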