9.6 Factor Analysis

The object of factor analysis is to find a lower-dimensional representation that accounts for the correlations among the features. (Duda, Hart, and Stork 2001, 580)

Duda, Hart, and Stork (2001, 580) suggest viewing factor analysis as “a simple modification of hierarchical methods” (Section 11.2), replacing “an \(n \times n\) matrix of distances between the samples” with “a \(d \times d\) correlation matrix”, as in Eq. (4.22).

Factor analysis belongs to a class of models that involve latent variables, also called unobserved variables or, as the name suggests, factors. This type of variable is used when the phenomenon under study cannot be observed directly. Typical examples come from the behavioral sciences, where one wishes to measure general intelligence, resilience, or extroversion.
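As a sketch of what "latent" means here, the usual linear factor model can be written as below. The notation is assumed for illustration and does not come from the text above: \(\mathbf{x}\) is the vector of \(d\) observed variables, \(\mathbf{f}\) the vector of \(k < d\) uncorrelated factors with unit variance, \(\boldsymbol{\Lambda}\) the \(d \times k\) matrix of loadings, and \(\boldsymbol{\varepsilon}\) an error term uncorrelated with \(\mathbf{f}\) whose covariance matrix \(\boldsymbol{\Psi}\) is diagonal.

\[
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
\]

Under these assumptions, all correlation among the observed variables is carried by \(\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top}\), while the diagonal of \(\boldsymbol{\Psi}\) holds the variance specific to each variable.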

9.6.1 Visualizing

https://github.com/dustinfife/flexplavaan

9.6.2 Exploratory Factor Analysis (EFA)

library(lavaan)  # cfa
library(semPlot) # graphical tools, semPaths
x <- read.csv('https://filipezabala.com/data/gfactor.csv')
head(x)

##   Years Months     Age Pitch Light Weight Classics French English Mathematics Residuals.Pitch Residuals.Light
## 1    10      9 10.7500    50    10      4       16     19      10           7        35.42250        -3.33624
## 2    12      4 12.3333     3    10      6        5      6       6           5       -14.68840        -3.38897
## 3    11      1 11.0833    10    10      6        5      6       6           5        -5.23239        -3.34734
## 4    10     11 10.9167    60    10      9       22     23      22          22        45.09510        -3.34179
## 5    13      7 13.5833     4    12      5        1      1       1           2       -16.14450        -1.43059
## 6    12      6 12.5000     2    10     10        4      2       2           1       -16.01590        -3.39452
##   Residuals.Weight Residuals.Classics Residuals.French Residuals.English Residuals.Mathematics
## 1         -7.80519           2.956350         5.770520          -2.99002              -5.81043
## 2         -6.09824          -1.709530        -0.643188          -1.49377              -2.34005
## 3         -5.86688          -6.710150        -5.842900          -5.83292              -6.65877
## 4         -2.83604           9.623100        10.463800           9.58853               9.76540
## 5         -7.32959          -0.708905        -0.443481          -2.15463              -1.02133
## 6         -2.12908          -2.042780        -3.949890          -4.91522              -5.76422

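x2 <- x[, 3:10]            # keep columns 3 to 10 (Age through Mathematics)
S <- cor(x2)               # correlation matrix of the selected variables
factanal(x2, factors = 2)  # maximum-likelihood factor analysis with 2 factors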

## 
## Call:
## factanal(x = x2, factors = 2)
## 
## Uniquenesses:
##         Age       Pitch       Light      Weight    Classics      French     English Mathematics 
##       0.517       0.933       0.802       0.549       0.062       0.005       0.279       0.175 
## 
## Loadings:
##             Factor1 Factor2
## Age         -0.693         
## Pitch        0.259         
## Light                0.435 
## Weight               0.671 
## Classics     0.961   0.123 
## French       0.995         
## English      0.784   0.327 
## Mathematics  0.882   0.216 
## 
##                Factor1 Factor2
## SS loadings      3.864   0.814
## Proportion Var   0.483   0.102
## Cumulative Var   0.483   0.585
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 11.12 on 13 degrees of freedom.
## The p-value is 0.6
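The degrees of freedom reported in the test at the end of the `factanal` output can be recovered by counting parameters: for \(p\) observed variables and \(k\) factors, the usual count is \(((p-k)^2 - (p+k))/2\). A minimal sketch follows; `efa_df` is a hypothetical helper name introduced only for illustration and is not part of any package.

```r
# Degrees of freedom of the likelihood-ratio test for an exploratory
# factor model with p observed variables and k factors.
# `efa_df` is a hypothetical helper, introduced only for illustration.
efa_df <- function(p, k) {
  ((p - k)^2 - (p + k)) / 2
}

efa_df(p = 8, k = 2)  # 13, matching the output above for the 8 columns of x2
```

The count has to be non-negative, which limits how many factors can be fitted to a given number of variables.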

modelo1 <- 'g =~  Age + Pitch + Light + Weight + Classics + French + English + Mathematics'
fit1 <- cfa(modelo1, data = x2)
summary(fit1)

## lavaan 0.6.17 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
## 
##   Number of observations                            23
## 
## Model Test User Model:
##                                                       
##   Test statistic                                27.521
##   Degrees of freedom                                20
##   P-value (Chi-square)                           0.121
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   g =~                                                
##     Age               1.000                           
##     Pitch            -5.429    4.662   -1.165    0.244
##     Light            -0.508    0.883   -0.575    0.566
##     Weight           -0.203    1.588   -0.128    0.898
##     Classics         -8.563    1.948   -4.396    0.000
##     French           -8.447    1.934   -4.368    0.000
##     English          -7.086    1.905   -3.720    0.000
##     Mathematics      -7.765    1.924   -4.035    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Age               0.664    0.200    3.320    0.001
##    .Pitch           272.895   80.592    3.386    0.001
##    .Light            10.333    3.048    3.390    0.001
##    .Weight           33.915   10.001    3.391    0.001
##    .Classics          2.065    1.414    1.460    0.144
##    .French            2.752    1.492    1.844    0.065
##    .English          15.408    4.762    3.236    0.001
##    .Mathematics       9.699    3.141    3.088    0.002
##     g                 0.599    0.319    1.875    0.061

fitMeasures(fit1, fit.measures = c('gfi','fmin', 'chisq', 'df', 'pvalue'))

##    gfi   fmin  chisq     df pvalue 
##  0.799  0.598 27.521 20.000  0.121
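The quantities returned by `fitMeasures()` are tied to one another. As a rough check, the sketch below uses only the rounded numbers printed above, so agreement can be expected only up to rounding. The relations assumed are the usual ones for ML estimation: the chi-square statistic is about `2 * N * fmin`, the degrees of freedom are the number of distinct variances and covariances minus the number of free parameters, and the p-value is the upper tail of a chi-square distribution.

```r
# Values copied from the lavaan output above (rounded to 3 decimals)
N     <- 23      # number of observations
p     <- 8       # observed variables in the model
npar  <- 16      # number of model parameters
fmin  <- 0.598
chisq <- 27.521

2 * N * fmin                  # close to chisq; the gap comes from rounding fmin

df <- p * (p + 1) / 2 - npar  # 36 distinct moments - 16 parameters = 20
df

# Upper-tail probability; compare with the p-value reported by lavaan
pval <- pchisq(chisq, df = df, lower.tail = FALSE)
round(pval, 3)
```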

summary(fit1, standardized = TRUE, fit.measures = TRUE); gc()

## lavaan 0.6.17 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
## 
##   Number of observations                            23
## 
## Model Test User Model:
##                                                       
##   Test statistic                                27.521
##   Degrees of freedom                                20
##   P-value (Chi-square)                           0.121
## 
## Model Test Baseline Model:
## 
##   Test statistic                               153.306
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.940
##   Tucker-Lewis Index (TLI)                       0.916
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -509.459
##   Loglikelihood unrestricted model (H1)       -495.698
##                                                       
##   Akaike (AIC)                                1050.917
##   Bayesian (BIC)                              1069.085
##   Sample-size adjusted Bayesian (SABIC)       1019.570
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.128
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.235
##   P-value H_0: RMSEA <= 0.050                    0.165
##   P-value H_0: RMSEA >= 0.080                    0.758
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.096
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     Age               1.000                               0.774    0.688
##     Pitch            -5.429    4.662   -1.165    0.244   -4.201   -0.246
##     Light            -0.508    0.883   -0.575    0.566   -0.393   -0.121
##     Weight           -0.203    1.588   -0.128    0.898   -0.157   -0.027
##     Classics         -8.563    1.948   -4.396    0.000   -6.626   -0.977
##     French           -8.447    1.934   -4.368    0.000   -6.536   -0.969
##     English          -7.086    1.905   -3.720    0.000   -5.483   -0.813
##     Mathematics      -7.765    1.924   -4.035    0.000   -6.009   -0.888
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Age               0.664    0.200    3.320    0.001    0.664    0.526
##    .Pitch           272.895   80.592    3.386    0.001  272.895    0.939
##    .Light            10.333    3.048    3.390    0.001   10.333    0.985
##    .Weight           33.915   10.001    3.391    0.001   33.915    0.999
##    .Classics          2.065    1.414    1.460    0.144    2.065    0.045
##    .French            2.752    1.492    1.844    0.065    2.752    0.061
##    .English          15.408    4.762    3.236    0.001   15.408    0.339
##    .Mathematics       9.699    3.141    3.088    0.002    9.699    0.212
##     g                 0.599    0.319    1.875    0.061    1.000    1.000

##            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells  9282161 495.8   17333572 925.8         NA 12225767 653.0
## Vcells 15571507 118.9   25842636 197.2     102400 21468847 163.8
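The incremental and approximate fit indices in the summary can be reproduced from the two chi-square tests it prints. The sketch below uses the common textbook definitions of CFI, TLI, and RMSEA with the rounded values copied from the output; lavaan's internal computation may differ in details, so this is a consistency check and not a description of its code.

```r
# Values copied from the summary() output above
N       <- 23
chisq   <- 27.521;  df   <- 20   # user model
chisq_b <- 153.306; df_b <- 28   # baseline (independence) model

# Comparative Fit Index
cfi <- 1 - max(chisq - df, 0) / max(chisq_b - df_b, chisq - df, 0)

# Tucker-Lewis Index
tli <- (chisq_b / df_b - chisq / df) / (chisq_b / df_b - 1)

# RMSEA; dividing by N (and not N - 1) reproduces the value printed above
rmsea <- sqrt(max(chisq - df, 0) / (df * N))

round(c(cfi = cfi, tli = tli, rmsea = rmsea), 3)
```

With only 23 observations the confidence interval for RMSEA printed above is very wide (0.000 to 0.235), so these indices should be read with caution.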

semPaths(fit1, 'std', mar=c(2,1.4,2.7,1.4), layout = 'circle')

semPaths(fit1, 'std', mar=c(2,1.4,2.7,1.4))
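The `'std'` argument passed to `semPaths` requests standardized parameter estimates. As a sketch of how the standardized columns of the summary relate to the raw estimates, the code below uses a few rounded values copied from the `summary(fit1, standardized = TRUE)` output; the results agree with the printed `Std.lv` and `Std.all` columns only up to rounding.

```r
# Rounded values copied from the summary(fit1, standardized = TRUE) output
var_g   <- 0.599                                       # estimated variance of g
est     <- c(Age = 1.000, Classics = -8.563, French = -8.447)  # raw loadings
std_all <- c(Age = 0.688, Classics = -0.977, French = -0.969)  # Std.all loadings

# Std.lv: loadings rescaled so that the latent variable has unit variance
std_lv <- est * sqrt(var_g)
round(std_lv, 3)

# Std.all residual variances: share of each indicator's variance
# not explained by g, i.e. one minus the squared standardized loading
resid_share <- 1 - std_all^2
round(resid_share, 3)
```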

(ability.FA <- factanal(factors = 1, covmat = ability.cov))

## 
## Call:
## factanal(factors = 1, covmat = ability.cov)
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.535   0.853   0.748   0.910   0.232   0.280 
## 
## Loadings:
##         Factor1
## general 0.682  
## picture 0.384  
## blocks  0.502  
## maze    0.300  
## reading 0.877  
## vocab   0.849  
## 
##                Factor1
## SS loadings      2.443
## Proportion Var   0.407
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 75.18 on 9 degrees of freedom.
## The p-value is 1.46e-12
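The blocks of the output above are linked: each uniqueness is approximately one minus the communality (the row sum of squared loadings), "SS loadings" is the column sum of squared loadings, and "Proportion Var" divides that sum by the number of variables. A minimal sketch using the built-in `ability.cov` data; the object names `fa1`, `L`, `communality`, and `ss` are chosen here for illustration.

```r
# ability.cov ships with R (datasets package), so this runs as is
fa1 <- factanal(factors = 1, covmat = ability.cov)

L <- unclass(loadings(fa1))   # 6 x 1 matrix of loadings
communality <- rowSums(L^2)   # variance explained by the factor

# Uniqueness is (approximately) one minus the communality
round(cbind(uniqueness = fa1$uniquenesses,
            one_minus_communality = 1 - communality), 3)

# "SS loadings" and "Proportion Var" from the printed table
ss <- colSums(L^2)
round(c(ss_loadings = unname(ss), proportion_var = unname(ss) / nrow(L)), 3)
```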

update(ability.FA, factors = 2)

## 
## Call:
## factanal(factors = 2, covmat = ability.cov)
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.455   0.589   0.218   0.769   0.052   0.334 
## 
## Loadings:
##         Factor1 Factor2
## general 0.499   0.543  
## picture 0.156   0.622  
## blocks  0.206   0.860  
## maze    0.109   0.468  
## reading 0.956   0.182  
## vocab   0.785   0.225  
## 
##                Factor1 Factor2
## SS loadings      1.858   1.724
## Proportion Var   0.310   0.287
## Cumulative Var   0.310   0.597
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191

# The signs of the factors, and hence the signs of the factor
# correlations, are arbitrary with promax rotation.
update(ability.FA, factors = 2, rotation = "promax")

## 
## Call:
## factanal(factors = 2, covmat = ability.cov, rotation = "promax")
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.455   0.589   0.218   0.769   0.052   0.334 
## 
## Loadings:
##         Factor1 Factor2
## general  0.364   0.470 
## picture          0.671 
## blocks           0.932 
## maze             0.508 
## reading  1.023         
## vocab    0.811         
## 
##                Factor1 Factor2
## SS loadings      1.853   1.807
## Proportion Var   0.309   0.301
## Cumulative Var   0.309   0.610
## 
## Factor Correlations:
##         Factor1 Factor2
## Factor1   1.000   0.557
## Factor2   0.557   1.000
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
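Comparing the varimax and promax outputs above, the loadings change but the uniquenesses and the chi-square test do not. The sketch below checks this directly on `ability.cov`; the object names are chosen here for illustration. It also checks that, for an orthogonal rotation such as varimax, the product of the loading matrix with its transpose is the same as for the unrotated solution.

```r
fa_none    <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
fa_varimax <- factanal(factors = 2, covmat = ability.cov)  # default rotation
fa_promax  <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Rotation changes the loadings but not the uniquenesses ...
all.equal(fa_varimax$uniquenesses, fa_promax$uniquenesses)
all.equal(fa_varimax$uniquenesses, fa_none$uniquenesses)

# ... nor the test statistic or its degrees of freedom
c(statistic = unname(fa_promax$STATISTIC), dof = fa_promax$dof)

# For an orthogonal rotation, L %*% t(L) is unchanged as well
L0 <- unclass(loadings(fa_none))
Lv <- unclass(loadings(fa_varimax))
all.equal(tcrossprod(L0), tcrossprod(Lv), tolerance = 1e-6)
```

This is why the choice of rotation is a matter of interpretability: it does not affect how well the two-factor model fits.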

9.6.3 Confirmatory Factor Analysis (CFA)

References

Duda, Richard O., Peter E. Hart, and David G. Stork. 2001. Pattern Classification. John Wiley & Sons, Inc.