10.3 Formulas

Building on Wilkinson and Rogers (1973), Chambers and Hastie (1993) present the formula operators available in the R language.

| i | Expression | Meaning |
|---|---|---|
| 1 | \(T \sim F\) | \(T\) is modeled by \(F\) |
| 2 | \(F_a + F_b\) | Includes \(F_a\) and \(F_b\) |
| 3 | \(F_a - F_b\) | Includes all terms of \(F_a\) except those in \(F_b\) |
| 4 | \(F_a * F_b\) | \(F_a + F_b + F_a:F_b\) |
| 5 | \(F_a / F_b\) | \(F_a + F_b\) %in% \(F_a\) |
| 6 | \(F_a : F_b\) or \(F_b\) %in% \(F_a\) | The factor jointly indexed by \(F_a\) and \(F_b\) |
| 7 | \(F^m\) | All terms of \(F\) crossed up to order \(m\) |
| 8 | \(T \sim \; .\) | \(T\) is modeled by all other variables in the data (every variable except \(T\)) |
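
The expansions in the table can be checked without fitting anything, because `terms()` only manipulates the formula symbolically. The following is a minimal sketch; the helper name `term_labels` and the placeholder variables `y`, `a`, `b`, `c` are illustrative assumptions, not part of the original text.

```r
# Helper (hypothetical name): list the terms a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a + b)          # "a" "b"
term_labels(y ~ a * b)          # "a" "b" "a:b"
term_labels(y ~ a * b - a:b)    # "a" "b"
term_labels(y ~ a / b)          # "a" "a:b"
term_labels(y ~ (a + b + c)^2)  # "a" "b" "c" "a:b" "a:c" "b:c"
```

Note that R labels the nested term of `a / b` as `a:b`, which is why rows 5 and 6 of the table are so closely related.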

Example 10.1 Formula \(T \sim F\) with the cars dataset.

fit <- lm(dist ~ speed, data = cars)
summary(fit)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
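
The formula `dist ~ speed` includes an intercept even though none is written. As a small sketch of what the formula turns into, the design matrix can be inspected with `model.matrix()`; the object names `X`, `X0`, and `pred` are illustrative choices.

```r
# dist ~ speed carries an implicit intercept: it is the same model as dist ~ 1 + speed.
X <- model.matrix(dist ~ speed, data = cars)
colnames(X)                      # "(Intercept)" "speed"

# Appending "- 1" (or "+ 0") drops the intercept, forcing the line through the origin.
X0 <- model.matrix(dist ~ speed - 1, data = cars)
colnames(X0)                     # "speed"

fit <- lm(dist ~ speed, data = cars)
# Predicted stopping distance at speeds 10 and 20 (units as documented in ?cars).
pred <- predict(fit, newdata = data.frame(speed = c(10, 20)))
pred
```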

Example 10.2 Formula \(F_a + F_b\) with the airquality dataset.

fit <- lm(Temp ~ Ozone + Solar.R + Wind + as.factor(Month),
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ Ozone + Solar.R + Wind + as.factor(Month), 
##     data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.9220  -3.0386   0.0148   3.0856  12.0292 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       64.247098   2.722613  23.598  < 2e-16 ***
## Ozone              0.121180   0.022020   5.503 2.74e-07 ***
## Solar.R            0.011901   0.006044   1.969   0.0517 .  
## Wind              -0.250226   0.183341  -1.365   0.1753    
## as.factor(Month)6 11.261885   2.069698   5.441 3.59e-07 ***
## as.factor(Month)7 12.031054   1.613653   7.456 2.90e-11 ***
## as.factor(Month)8 12.335145   1.680223   7.341 5.08e-11 ***
## as.factor(Month)9  9.358031   1.473927   6.349 5.93e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.268 on 103 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.7139, Adjusted R-squared:  0.6944 
## F-statistic: 36.71 on 7 and 103 DF,  p-value: < 2.2e-16
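
The four `as.factor(Month)` rows in the output arise because a factor with five levels is coded as four indicator columns, with the first level (month 5) as the baseline. A brief sketch, assuming R's default treatment contrasts are in effect; `month`, `X`, and `fit_num` are illustrative names.

```r
month <- as.factor(airquality$Month)
levels(month)    # "5" "6" "7" "8" "9"; the first level (May) is the baseline

# Default treatment contrasts: one indicator column per non-baseline month.
X <- model.matrix(~ as.factor(Month), data = airquality)
colnames(X)

# Without as.factor(), Month would enter as a single numeric slope.
fit_num <- lm(Temp ~ Ozone + Solar.R + Wind + Month, data = airquality)
length(coef(fit_num))   # 5 coefficients instead of 8
```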

Example 10.3 Formulas \(F_a \pm F_b\) and \(T \sim \; .\) with the airquality dataset.

fit <- lm(Temp ~ . - Month + as.factor(Month) - Day,
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ . - Month + as.factor(Month) - Day, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.9220  -3.0386   0.0148   3.0856  12.0292 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       64.247098   2.722613  23.598  < 2e-16 ***
## Ozone              0.121180   0.022020   5.503 2.74e-07 ***
## Solar.R            0.011901   0.006044   1.969   0.0517 .  
## Wind              -0.250226   0.183341  -1.365   0.1753    
## as.factor(Month)6 11.261885   2.069698   5.441 3.59e-07 ***
## as.factor(Month)7 12.031054   1.613653   7.456 2.90e-11 ***
## as.factor(Month)8 12.335145   1.680223   7.341 5.08e-11 ***
## as.factor(Month)9  9.358031   1.473927   6.349 5.93e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.268 on 103 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.7139, Adjusted R-squared:  0.6944 
## F-statistic: 36.71 on 7 and 103 DF,  p-value: < 2.2e-16
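
The summary above is identical to the one from the formula that lists `Ozone + Solar.R + Wind + as.factor(Month)` explicitly: the dot expands to every column other than `Temp`, and the `-` operators then remove `Month` and `Day`. A short sketch to confirm this; `fit_plus` and `fit_dot` are illustrative names.

```r
fit_plus <- lm(Temp ~ Ozone + Solar.R + Wind + as.factor(Month), data = airquality)
fit_dot  <- lm(Temp ~ . - Month + as.factor(Month) - Day, data = airquality)

# The "." stands for every column of airquality other than Temp;
# the stored terms show what remains after the removals.
attr(terms(fit_dot), "term.labels")

# Both specifications should lead to the same estimates.
all.equal(coef(fit_plus), coef(fit_dot))
```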

Example 10.4 Formula \(F_a * F_b\) with the airquality dataset.

fit <- lm(Temp ~ Ozone * as.factor(Month),
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ Ozone * as.factor(Month), data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8133  -3.1431   0.3708   2.8843  11.4275 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             62.88422    1.43129  43.935  < 2e-16 ***
## Ozone                    0.16288    0.04454   3.657 0.000399 ***
## as.factor(Month)6        6.86610    3.57469   1.921 0.057450 .  
## as.factor(Month)7       15.00550    2.53227   5.926 3.92e-08 ***
## as.factor(Month)8       15.05456    2.28654   6.584 1.80e-09 ***
## as.factor(Month)9        4.83879    2.09236   2.313 0.022676 *  
## Ozone:as.factor(Month)6  0.12484    0.10593   1.179 0.241209    
## Ozone:as.factor(Month)7 -0.06147    0.05443  -1.129 0.261308    
## Ozone:as.factor(Month)8 -0.06244    0.05105  -1.223 0.224012    
## Ozone:as.factor(Month)9  0.12882    0.05903   2.182 0.031309 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.949 on 106 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.7277 
## F-statistic: 35.15 on 9 and 106 DF,  p-value: < 2.2e-16
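
Since `*` is shorthand for main effects plus interaction, the same fit can be written out in full. The sketch below also combines the coefficients into one Ozone slope per month, which is often easier to read than the raw interaction rows; `fit_star`, `fit_long`, and `slopes` are illustrative names.

```r
fit_star <- lm(Temp ~ Ozone * as.factor(Month), data = airquality)
fit_long <- lm(Temp ~ Ozone + as.factor(Month) + Ozone:as.factor(Month),
               data = airquality)
all.equal(coef(fit_star), coef(fit_long))

# Ozone slope within each month: the baseline (May) slope plus that month's interaction.
b <- coef(fit_star)
inter <- grep("^Ozone:", names(b), value = TRUE)
slopes <- c(b[["Ozone"]], b[["Ozone"]] + b[inter])
names(slopes) <- paste0("Month", 5:9)
slopes
```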

Example 10.5 Formula \(F_a/F_b\) with the airquality dataset.

fit <- lm(Temp ~ Ozone / as.factor(Month),
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ Ozone/as.factor(Month), data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6323  -2.9636   0.9709   4.0807  12.2975 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             69.890925   0.988360  70.714  < 2e-16 ***
## Ozone                    0.002643   0.043547   0.061 0.951704    
## Ozone:as.factor(Month)6  0.281518   0.070349   4.002 0.000114 ***
## Ozone:as.factor(Month)7  0.204859   0.042385   4.833 4.38e-06 ***
## Ozone:as.factor(Month)8  0.192246   0.042267   4.548 1.40e-05 ***
## Ozone:as.factor(Month)9  0.245123   0.047102   5.204 9.13e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.099 on 110 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.6046, Adjusted R-squared:  0.5866 
## F-statistic: 33.64 on 5 and 110 DF,  p-value: < 2.2e-16
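
In this output the `Ozone` row is the slope for the baseline month (May), and each `Ozone:as.factor(Month)` row is a difference from that baseline. A minimal sketch that recovers the slope for each month from these coefficients; `fit_nested`, `offsets`, and `slopes` are illustrative names.

```r
fit_nested <- lm(Temp ~ Ozone / as.factor(Month), data = airquality)
b <- coef(fit_nested)

# Month 5 (May) is the baseline: its slope is the plain Ozone coefficient.
# Every other month's slope is that baseline plus its Ozone:month coefficient.
offsets <- b[grep("^Ozone:", names(b))]
slopes <- c(b[["Ozone"]], b[["Ozone"]] + offsets)
names(slopes) <- paste0("Month", 5:9)
round(slopes, 6)
```

Note that all months share a single intercept here; only the Ozone slope varies by month.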

Example 10.6 Formula \(F_a:F_b\) with the airquality dataset.

fit <- lm(Temp ~ Ozone:as.factor(Month),
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ Ozone:as.factor(Month), data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6323  -2.9636   0.9709   4.0807  12.2975 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             69.890925   0.988360  70.714  < 2e-16 ***
## Ozone:as.factor(Month)5  0.002643   0.043547   0.061    0.952    
## Ozone:as.factor(Month)6  0.284161   0.064693   4.392 2.59e-05 ***
## Ozone:as.factor(Month)7  0.207503   0.022200   9.347 1.23e-15 ***
## Ozone:as.factor(Month)8  0.194889   0.020360   9.572 3.74e-16 ***
## Ozone:as.factor(Month)9  0.247766   0.035040   7.071 1.50e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.099 on 110 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.6046, Adjusted R-squared:  0.5866 
## F-statistic: 33.64 on 5 and 110 DF,  p-value: < 2.2e-16
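
The residuals, residual standard error, and R-squared above match those of `Temp ~ Ozone / as.factor(Month)`: with a numeric variable and a factor, the two formulas describe the same model in different parameterizations. A short sketch to check this; `fit_nested` and `fit_inter` are illustrative names.

```r
fit_nested <- lm(Temp ~ Ozone / as.factor(Month), data = airquality)
fit_inter  <- lm(Temp ~ Ozone:as.factor(Month),  data = airquality)

# Same fitted values, hence the same residuals and R-squared.
all.equal(fitted(fit_nested), fitted(fit_inter))

# The ":"-only fit reports each month's Ozone slope directly,
# rather than as a difference from the May slope.
coef(fit_inter)
```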

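Example 10.7 Formula \(F^m\) with the airquality dataset. Because \(F\) here is the single term Ozone, there is nothing to cross, and the fit reduces to Temp ~ Ozone.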

fit <- lm(Temp ~ Ozone^3,
          data = airquality)
summary(fit)
## 
## Call:
## lm(formula = Temp ~ Ozone^3, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.147  -4.858   1.828   4.342  12.328 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 69.41072    1.02971   67.41   <2e-16 ***
## Ozone        0.20081    0.01928   10.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.819 on 114 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.4877, Adjusted R-squared:  0.4832 
## F-statistic: 108.5 on 1 and 114 DF,  p-value: < 2.2e-16
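
To see crossing in action, \(F\) needs more than one term. Inside a formula, `^` is a crossing operator rather than arithmetic, so an actual cubic transformation has to be wrapped in `I()`. The sketch below illustrates both cases; `term_labels`, `fit_cross`, and `fit_cube` are illustrative names.

```r
# Helper (hypothetical name): list the terms a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(Temp ~ Ozone^3)                     # "Ozone"
term_labels(Temp ~ (Ozone + Solar.R + Wind)^2)  # main effects plus all two-way interactions
term_labels(Temp ~ I(Ozone^3))                  # "I(Ozone^3)"

# Crossing three predictors up to order 2: 3 main effects + 3 interactions + intercept.
fit_cross <- lm(Temp ~ (Ozone + Solar.R + Wind)^2, data = airquality)
# Regressing on the cube of Ozone requires I() to protect the arithmetic.
fit_cube <- lm(Temp ~ I(Ozone^3), data = airquality)
```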

References

Chambers, John M., and Trevor J. Hastie. 1993. Statistical Models in S. London: Chapman & Hall.
Wilkinson, G. N., and C. E. Rogers. 1973. “Symbolic Description of Factorial Models for Analysis of Variance.” Journal of the Royal Statistical Society Series C: Applied Statistics 22 (3): 392–99. https://www.jstor.org/stable/2346786.