9.6 Factor Analysis
The object of factor analysis is to find a lower-dimensional representation that accounts for the correlations among the features. (Duda, Hart, and Stork 2001, 580)
(Duda, Hart, and Stork 2001, 580) suggest regarding factor analysis as “a simple modification of the hierarchical methods” (Section 11.2), replacing “an \(n \times n\) matrix of distances between samples” with “a \(d \times d\) correlation matrix”, as in Eq. (4.24).
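A minimal base-R sketch of this idea follows. It rests on assumptions of our own. The dissimilarity \(1 - |r|\) between features and average linkage are illustrative choices. The built-in `ability.cov` data (six ability tests, so \(d = 6\), also used later in this section) stands in for a generic dataset. The hierarchical method receives the \(d \times d\) correlation matrix of the variables, not an \(n \times n\) matrix of distances between samples.

```r
# d x d correlation matrix of the six ability tests (ability.cov stores a covariance matrix)
R <- cov2cor(ability.cov$cov)

# Illustrative feature dissimilarity: strongly correlated tests are "close"
D <- as.dist(1 - abs(R))

# Cluster the variables (columns), not the individuals (rows)
hc <- hclust(D, method = "average")
hc$merge   # order in which the tests are merged
plot(hc)   # dendrogram of the six tests
```

Tests that share a common factor should merge early in the dendrogram, which is the intuition behind reading factor analysis as a variant of hierarchical grouping.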
Factor analysis belongs to a class of models involving latent variables, also called unobserved variables or, as the name suggests, factors. Variables of this kind are used when the phenomenon under study cannot be observed directly. Typical examples come from the behavioral sciences, where one wants to measure general intelligence, resilience, or extraversion.
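The sketch below makes the notion of a latent variable concrete by simulating a one-factor model. The variable `g` plays the role of the unobservable trait and is never passed to the analysis. Only the five noisy indicators `x1`, ..., `x5` go to `factanal()`. The estimated loadings should come out close to the ones used to generate the data. The loadings, sample size, and seed are arbitrary assumptions chosen for illustration.

```r
set.seed(1)                          # reproducibility
n      <- 1000                       # simulated individuals
lambda <- c(0.8, 0.7, 0.6, 0.5, 0.4) # hypothetical loadings on the latent factor

# Latent factor: generated here but never given to factanal()
g <- rnorm(n)

# Each observed variable = loading * g + unique noise, scaled so that Var(x_j) = 1
X <- sapply(lambda, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))
colnames(X) <- paste0("x", seq_along(lambda))

# One-factor maximum likelihood fit using only the observed variables
fa <- factanal(X, factors = 1)
print(fa$loadings, cutoff = 0)

# The sign of a factor is arbitrary, so compare absolute loadings
round(cbind(true = lambda, estimated = abs(as.vector(fa$loadings))), 2)
```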
9.6.1 Exploratory Factor Analysis (EFA)
library(lavaan)  # cfa(), fitMeasures()
library(semPlot) # graphical tools, e.g. semPaths()
x <- read.csv('https://filipezabala.com/data/gfactor.csv')
head(x)
## Years Months Age Pitch Light Weight Classics French English Mathematics Residuals.Pitch Residuals.Light
## 1 10 9 10.7500 50 10 4 16 19 10 7 35.42250 -3.33624
## 2 12 4 12.3333 3 10 6 5 6 6 5 -14.68840 -3.38897
## 3 11 1 11.0833 10 10 6 5 6 6 5 -5.23239 -3.34734
## 4 10 11 10.9167 60 10 9 22 23 22 22 45.09510 -3.34179
## 5 13 7 13.5833 4 12 5 1 1 1 2 -16.14450 -1.43059
## 6 12 6 12.5000 2 10 10 4 2 2 1 -16.01590 -3.39452
## Residuals.Weight Residuals.Classics Residuals.French Residuals.English Residuals.Mathematics
## 1 -7.80519 2.956350 5.770520 -2.99002 -5.81043
## 2 -6.09824 -1.709530 -0.643188 -1.49377 -2.34005
## 3 -5.86688 -6.710150 -5.842900 -5.83292 -6.65877
## 4 -2.83604 9.623100 10.463800 9.58853 9.76540
## 5 -7.32959 -0.708905 -0.443481 -2.15463 -1.02133
## 6 -2.12908 -2.042780 -3.949890 -4.91522 -5.76422
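# Keep Age and the seven test scores (drop Years, Months and the residual columns)
x2 <- x[, c('Age', 'Pitch', 'Light', 'Weight', 'Classics', 'French', 'English', 'Mathematics')]
factanal(x = x2, factors = 2)
##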
## Call:
## factanal(x = x2, factors = 2)
##
## Uniquenesses:
## Age Pitch Light Weight Classics French English Mathematics
## 0.517 0.933 0.802 0.549 0.062 0.005 0.279 0.175
##
## Loadings:
## Factor1 Factor2
## Age -0.693
## Pitch 0.259
## Light 0.435
## Weight 0.671
## Classics 0.961 0.123
## French 0.995
## English 0.784 0.327
## Mathematics 0.882 0.216
##
## Factor1 Factor2
## SS loadings 3.864 0.814
## Proportion Var 0.483 0.102
## Cumulative Var 0.483 0.585
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 11.12 on 13 degrees of freedom.
## The p-value is 0.6
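Two quantities in this output can be checked by hand. With an orthogonal rotation such as the default varimax, a variable's uniqueness is one minus its communality, the sum of its squared loadings. The degrees of freedom of the test with \(p\) variables and \(k\) factors are \(\tfrac{1}{2}[(p-k)^2 - (p+k)]\). The sketch below uses loadings copied from the printed output. The helper `fa_dof` is a hypothetical name introduced only for this illustration.

```r
# Loadings of three variables, copied from the printed output above (rounded to 3 decimals)
L <- rbind(Classics    = c(0.961, 0.123),
           English     = c(0.784, 0.327),
           Mathematics = c(0.882, 0.216))

# Communality = row sum of squared loadings; uniqueness = 1 - communality
communality <- rowSums(L^2)
uniqueness  <- 1 - communality
round(uniqueness, 3)   # 0.061 0.278 0.175; printed 0.062 0.279 0.175 (rounded loadings)

# Hypothetical helper: degrees of freedom of the test with p variables and k factors
fa_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2
fa_dof(p = 8, k = 2)   # 13, as in the output
```

The same formula gives 9 and 4 degrees of freedom for the one- and two-factor fits of `ability.cov` (six variables) shown later.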
# One-factor CFA: a single latent factor g measured by the eight observed variables
modelo1 <- 'g =~ Age + Pitch + Light + Weight + Classics + French + English + Mathematics'
fit1 <- cfa(modelo1, data = x2)
summary(fit1)
## lavaan 0.6-18 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 16
##
## Number of observations 23
##
## Model Test User Model:
##
## Test statistic 27.521
## Degrees of freedom 20
## P-value (Chi-square) 0.121
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## g =~
## Age 1.000
## Pitch -5.429 4.662 -1.165 0.244
## Light -0.508 0.883 -0.575 0.566
## Weight -0.203 1.588 -0.128 0.898
## Classics -8.563 1.948 -4.396 0.000
## French -8.447 1.934 -4.368 0.000
## English -7.086 1.905 -3.720 0.000
## Mathematics -7.765 1.924 -4.035 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Age 0.664 0.200 3.320 0.001
## .Pitch 272.895 80.592 3.386 0.001
## .Light 10.333 3.048 3.390 0.001
## .Weight 33.915 10.001 3.391 0.001
## .Classics 2.065 1.414 1.460 0.144
## .French 2.752 1.492 1.844 0.065
## .English 15.408 4.762 3.236 0.001
## .Mathematics 9.699 3.141 3.088 0.002
## g 0.599 0.319 1.875 0.061
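The counts in the header of this output can be checked by hand. By default `cfa()` fixes the first loading of each factor at 1. Here that is `Age`, which appears as `1.000` without a standard error. The free parameters are therefore 7 loadings, 8 residual variances and the variance of `g`. The degrees of freedom equal the number of distinct elements of the \(8 \times 8\) covariance matrix minus that count. A minimal arithmetic sketch:

```r
p <- 8                          # observed variables in modelo1
n_loadings  <- p - 1            # Age's loading is fixed at 1 to set the scale of g
n_residuals <- p                # one residual variance per variable (.Age, ..., .Mathematics)
n_factorvar <- 1                # variance of g
q <- n_loadings + n_residuals + n_factorvar
q                               # 16 free parameters

moments <- p * (p + 1) / 2      # distinct variances and covariances: 36
moments - q                     # 20 degrees of freedom
```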
fitMeasures(fit1, c('gfi', 'fmin', 'chisq', 'df', 'pvalue'))
## gfi fmin chisq df pvalue
## 0.799 0.598 27.521 20.000 0.121
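The printed `fmin` and `chisq` are linked. For a single group with lavaan's default maximum likelihood setting, the test statistic equals \(2N \cdot F_{\min}\). The sketch below assumes that relation and takes its values from the output above. It also recovers the p-value from the \(\chi^2_{20}\) distribution.

```r
N     <- 23                                  # number of observations
chisq <- 27.521                              # test statistic reported by lavaan
fmin  <- chisq / (2 * N)
round(fmin, 3)                               # 0.598, as printed
pchisq(chisq, df = 20, lower.tail = FALSE)   # about 0.121
```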
summary(fit1, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-18 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 16
##
## Number of observations 23
##
## Model Test User Model:
##
## Test statistic 27.521
## Degrees of freedom 20
## P-value (Chi-square) 0.121
##
## Model Test Baseline Model:
##
## Test statistic 153.306
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.940
## Tucker-Lewis Index (TLI) 0.916
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -509.459
## Loglikelihood unrestricted model (H1) -495.698
##
## Akaike (AIC) 1050.917
## Bayesian (BIC) 1069.085
## Sample-size adjusted Bayesian (SABIC) 1019.570
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.128
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.235
## P-value H_0: RMSEA <= 0.050 0.165
## P-value H_0: RMSEA >= 0.080 0.758
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.096
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## g =~
## Age 1.000 0.774 0.688
## Pitch -5.429 4.662 -1.165 0.244 -4.201 -0.246
## Light -0.508 0.883 -0.575 0.566 -0.393 -0.121
## Weight -0.203 1.588 -0.128 0.898 -0.157 -0.027
## Classics -8.563 1.948 -4.396 0.000 -6.626 -0.977
## French -8.447 1.934 -4.368 0.000 -6.536 -0.969
## English -7.086 1.905 -3.720 0.000 -5.483 -0.813
## Mathematics -7.765 1.924 -4.035 0.000 -6.009 -0.888
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Age 0.664 0.200 3.320 0.001 0.664 0.526
## .Pitch 272.895 80.592 3.386 0.001 272.895 0.939
## .Light 10.333 3.048 3.390 0.001 10.333 0.985
## .Weight 33.915 10.001 3.391 0.001 33.915 0.999
## .Classics 2.065 1.414 1.460 0.144 2.065 0.045
## .French 2.752 1.492 1.844 0.065 2.752 0.061
## .English 15.408 4.762 3.236 0.001 15.408 0.339
## .Mathematics 9.699 3.141 3.088 0.002 9.699 0.212
## g 0.599 0.319 1.875 0.061 1.000 1.000
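The fit indices in this output can be reproduced from the printed statistics with the usual textbook formulas. All inputs in the sketch below are copied from the output above. The CFI, TLI and RMSEA formulas are the standard ones and are assumed, not quoted from lavaan's source. Using \(N = 23\) rather than \(N - 1\) in the RMSEA denominator is an empirical choice: it reproduces the printed 0.128. The last lines check the standardized loading of `Age`. `Std.lv` rescales by the standard deviation of `g`, and `Std.all` also divides by the model-implied standard deviation of `Age`.

```r
# Inputs copied from the summary output above
N       <- 23
chisq   <- 27.521;   dof   <- 20      # user model
chisq_b <- 153.306;  dof_b <- 28      # baseline (independence) model
ll_h0   <- -509.459; ll_h1 <- -495.698
q       <- 16                         # number of free parameters

cfi   <- 1 - max(chisq - dof, 0) / max(chisq_b - dof_b, chisq - dof, 0)
tli   <- (chisq_b / dof_b - chisq / dof) / (chisq_b / dof_b - 1)
rmsea <- sqrt(max(chisq - dof, 0) / (dof * N))  # N (not N - 1) matches the printed 0.128
aic   <- -2 * ll_h0 + 2 * q
bic   <- -2 * ll_h0 + log(N) * q
lrt   <- 2 * (ll_h1 - ll_h0)                    # likelihood-ratio form of the chi-square
round(c(CFI = cfi, TLI = tli, RMSEA = rmsea, AIC = aic, BIC = bic, LRT = lrt), 3)

# Standardized loading of Age (its unstandardized loading is fixed at 1)
psi_g   <- 0.599                                   # variance of g
theta   <- 0.664                                   # residual variance of Age
lambda  <- 1
std_lv  <- lambda * sqrt(psi_g)                    # Std.lv, about 0.774
std_all <- std_lv / sqrt(lambda^2 * psi_g + theta) # Std.all, about 0.689 (printed 0.688)
```

Differences in the third decimal come only from the rounded inputs.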
(ability.FA <- factanal(factors = 1, covmat = ability.cov))
##
## Call:
## factanal(factors = 1, covmat = ability.cov)
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.535 0.853 0.748 0.910 0.232 0.280
##
## Loadings:
## Factor1
## general 0.682
## picture 0.384
## blocks 0.502
## maze 0.300
## reading 0.877
## vocab 0.849
##
## Factor1
## SS loadings 2.443
## Proportion Var 0.407
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 75.18 on 9 degrees of freedom.
## The p-value is 1.46e-12
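The one-factor hypothesis is clearly rejected here. The sketch below collects the tests for one and two factors in a single table. It uses the components `STATISTIC`, `dof` and `PVAL` that `factanal()` returns when the number of observations is known; `ability.cov` carries `n.obs = 112`. A third factor is not tried. With six variables it would leave \(\tfrac{1}{2}[(6-3)^2 - (6+3)] = 0\) degrees of freedom, and so no test.

```r
# Chi-square test of "k factors are sufficient" for the six ability tests
tests <- t(sapply(1:2, function(k) {
  fa <- factanal(factors = k, covmat = ability.cov)
  c(factors   = k,
    statistic = unname(fa$STATISTIC),
    dof       = fa$dof,
    p.value   = unname(fa$PVAL))
}))
tests
```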
update(ability.FA, factors = 2)
##
## Call:
## factanal(factors = 2, covmat = ability.cov)
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.455 0.589 0.218 0.769 0.052 0.334
##
## Loadings:
## Factor1 Factor2
## general 0.499 0.543
## picture 0.156 0.622
## blocks 0.206 0.860
## maze 0.109 0.468
## reading 0.956 0.182
## vocab 0.785 0.225
##
## Factor1 Factor2
## SS loadings 1.858 1.724
## Proportion Var 0.310 0.287
## Cumulative Var 0.310 0.597
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
## The signs of factors and hence the signs of correlations are
## arbitrary with promax rotation.
update(ability.FA, factors = 2, rotation = "promax")
##
## Call:
## factanal(factors = 2, covmat = ability.cov, rotation = "promax")
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.455 0.589 0.218 0.769 0.052 0.334
##
## Loadings:
## Factor1 Factor2
## general 0.364 0.470
## picture 0.671
## blocks 0.932
## maze 0.508
## reading 1.023
## vocab 0.811
##
## Factor1 Factor2
## SS loadings 1.853 1.807
## Proportion Var 0.309 0.301
## Cumulative Var 0.309 0.610
##
## Factor Correlations:
## Factor1 Factor2
## Factor1 1.000 0.557
## Factor2 0.557 1.000
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
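Two details of the promax output deserve attention. The uniquenesses are identical to those of the varimax solution. The loading of `reading` exceeds 1. Both follow from rotation being applied after the maximum likelihood fit. An orthogonal rotation keeps each variable's sum of squared loadings, the communality. An oblique rotation such as promax produces correlated factors (0.557 here), and its pattern loadings no longer have that property. A sketch with base R's `factanal()` on `ability.cov` checks these facts:

```r
# Same two-factor ML fit, three rotations
fa_none    <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
fa_varimax <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")
fa_promax  <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Uniquenesses come from the fit itself, before any rotation
all.equal(fa_none$uniquenesses, fa_promax$uniquenesses)

# Orthogonal rotation (varimax) keeps each row's sum of squared loadings
all.equal(rowSums(unclass(fa_none$loadings)^2),
          rowSums(unclass(fa_varimax$loadings)^2))

# Oblique rotation (promax) does not: some pattern loadings even exceed 1
round(rowSums(unclass(fa_promax$loadings)^2), 3)
```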