Chapter 7 Correlation and Regression

The word “regression” is not a happy choice from an etymological point of view, but it is already so ingrained in statistical literature that we have not tried to replace it with another that would more conveniently express its essential properties. It was introduced by Galton, in connection with the heredity of height. Galton found that the children of parents whose heights differ by \(x\) inches in relation to the average height of all parents, present a deviation in relation to the average height of all children of less than \(x\) inches, that is, there is what Galton called a “regression into mediocrity”. In general, the idea linked to the word “regression” has nothing to do with this meaning, and should be considered only as a conventional expression. (Yule and Kendall 1948, 246)

For more details, see Galton (1889, 95–99) and Stigler (1997).

Correlation and regression are covered in most introductory statistics courses in Brazil. Because this is introductory material, that traditional terminology is used here. Correlation is covered in Section 7.1, simple linear regression in Section 7.2, and multiple linear regression in Section 7.3. More advanced treatments use the term linear model and can be found in the Advanced Statistics material (in Portuguese) by the same author.

References

Galton, Francis. 1889. Natural Inheritance. Macmillan and Company. https://archive.org/details/in.ernet.dli.2015.221860.
Stigler, Stephen M. 1997. “Regression Towards the Mean, Historically Considered.” Statistical Methods in Medical Research 6 (2): 103–14. https://doi.org/10.1177/096228029700600202.
Yule, G. Udny, and Maurice G. Kendall. 1948. Introdução à Teoria da Estatística. Translated by Evandro de Oliveira Silva. 13th ed. IBGE - Instituto Brasileiro de Geografia e Estatística.