8.2 Factor Analysis

The object of factor analysis is to find a lower-dimensional representation that accounts for the correlations among the features. (Duda, Hart, and Stork 2001, 580)

Factor analysis is a mathematical technique used to reduce a complex system of correlations to a smaller number of dimensions. It consists, literally, of decomposing a matrix into factors, usually a matrix of correlation coefficients. (Gould, Coelho, and Rocha 1991, 259)
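Gould's description, decomposing a correlation matrix into factors, can be made concrete with a minimal sketch. It assumes the standard factor model, in which the correlation matrix is approximated by a low-rank part (loadings) plus a diagonal part (uniquenesses), and uses the built-in `ability.cov` data purely for illustration.

```r
# Minimal sketch: the factor model decomposes a correlation matrix R as
#   R ~ Lambda %*% t(Lambda) + diag(Psi)
# Data: built-in ability.cov (datasets package), used only as an illustration.
R <- cov2cor(ability.cov$cov)                    # 6 x 6 correlation matrix
fa <- factanal(covmat = ability.cov, factors = 2, rotation = "none")
Lambda <- unclass(fa$loadings)                   # 6 x 2 loading matrix
Psi <- fa$uniquenesses                           # one uniqueness per variable
R_hat <- Lambda %*% t(Lambda) + diag(Psi)        # model-implied correlations
residual <- R - R_hat                            # what the 2 factors leave unexplained
print(round(residual, 3))
```

The diagonal of `residual` should be close to zero (communality plus uniqueness equals 1), while the off-diagonal entries show how much of each correlation the two factors fail to reproduce.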

Duda, Hart, and Stork (2001, 580) suggest viewing factor analysis as “a simple modification of the hierarchical methods” (Section 10.2), replacing “an \(n \times n\) matrix of distances between the samples” with “a \(d \times d\) correlation matrix”, as in Eq. (4.24).
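The idea of swapping the \(n \times n\) sample distance matrix for a \(d \times d\) correlation matrix can be sketched by clustering the features instead of the samples. This is only an illustration under assumptions: it uses the built-in `mtcars` data, the dissimilarity \(1 - |r|\) and average linkage, none of which come from the source.

```r
# Hypothetical illustration: hierarchical clustering of the d features
# (columns) using a correlation-based dissimilarity, instead of the n samples.
R <- cor(mtcars)                      # d x d correlation matrix (d = 11)
D <- as.dist(1 - abs(R))              # assumed dissimilarity: highly correlated features are "close"
hc <- hclust(D, method = "average")   # agglomerative clustering of the features
groups <- cutree(hc, k = 3)           # cut the dendrogram into 3 groups of features
print(groups)
```

Features that end up in the same group play a role similar to variables loading on a common factor, although hierarchical clustering gives a hard partition rather than continuous loadings.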

Factor analysis belongs to a class of models involving latent variables, also called unobserved variables or, as the name suggests, factors. This type of variable is used when the phenomenon under study cannot be observed directly. Typical examples come from the behavioral sciences, when one wishes to measure general intelligence, resilience, or extraversion.
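A latent variable can be illustrated by simulation: a factor `g` that is never observed generates several indicators, and `factanal()` sees only the indicators. The loadings, sample size and seed below are hypothetical values chosen for the sketch.

```r
# Minimal sketch (hypothetical loadings): one latent variable g generates
# four observed indicators; factanal() is given only the indicators.
set.seed(123)
n <- 500
g <- rnorm(n)                                    # latent factor, never observed
lambda <- c(0.9, 0.8, 0.7, 0.6)                  # assumed true loadings
# each indicator = loading * g + noise, scaled so that its variance is 1
X <- sapply(lambda, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))
colnames(X) <- paste0("x", 1:4)
fa <- factanal(X, factors = 1)
est <- abs(unclass(fa$loadings)[, 1])            # the sign of a factor is arbitrary
print(round(cbind(true = lambda, estimated = est), 2))
```

With 500 observations the estimated loadings should land close to the values used to generate the data, even though `g` itself was never passed to the function.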

8.2.1 EFA vs. CFA

Factor analysis is usually approached in two ways, exploratory (EFA) and confirmatory (CFA), summarized in the table below⁹.

| Criterion | EFA | CFA |
|---|---|---|
| Goal | Explore the structure of the data without prior hypotheses | Test a predefined theoretical factor structure |
| Assumptions | Does not require a prior model | Requires a specified theoretical model |
| Flexibility | Open; discovers factors from the data | Rigid; validates pre-established relations |
| Main methods | PCA, maximum likelihood, principal axis factoring | Structural equation modeling (SEM) |
| Typical application | Exploratory research, dimensionality reduction | Scale validation, theory confirmation |
| Fit or retention criteria | Eigenvalues (> 1), scree plot, % of variance explained | CFI, RMSEA, TLI, \(\chi^2\) |
| Variables vs. factors | Relations defined from the data | Tests predetermined relations |
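The EFA retention criteria in the table (eigenvalues greater than 1, scree plot, cumulative percentage of variance) all come from the eigenvalues of the correlation matrix. A minimal sketch, assuming the built-in `ability.cov` data as input (any correlation matrix would do):

```r
# Sketch of the EFA retention criteria: Kaiser's rule (eigenvalue > 1)
# and cumulative proportion of variance, on the built-in ability.cov data.
R <- cov2cor(ability.cov$cov)            # 6 x 6 correlation matrix
ev <- eigen(R, symmetric = TRUE)$values  # eigenvalues, in decreasing order
n_kaiser <- sum(ev > 1)                  # number of factors suggested by Kaiser's rule
cum_var <- cumsum(ev) / sum(ev)          # cumulative proportion of total variance
print(round(rbind(eigenvalue = ev, cumulative = cum_var), 3))
cat("Eigenvalues greater than 1:", n_kaiser, "\n")
# plot(ev, type = "b")                   # scree plot: look for the "elbow"
```

Because the trace of a correlation matrix equals the number of variables, the eigenvalues sum to 6 here, which is why "eigenvalue > 1" means "explains more than one variable's worth of variance".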

Example 8.3 Exploratory Factor Analysis (EFA) example, taken from the documentation of stats::factanal().

# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
# Last four cases are there to add noise
# and introduce a positive manifold (g factor)
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
cor(m1)
##           v1        v2        v3        v4        v5        v6
## v1 1.0000000 0.9393083 0.5128866 0.4320310 0.4664948 0.4086076
## v2 0.9393083 1.0000000 0.4124441 0.4084281 0.4363925 0.4326113
## v3 0.5128866 0.4124441 1.0000000 0.8770750 0.5128866 0.4320310
## v4 0.4320310 0.4084281 0.8770750 1.0000000 0.4320310 0.4323259
## v5 0.4664948 0.4363925 0.5128866 0.4320310 1.0000000 0.9473451
## v6 0.4086076 0.4326113 0.4320310 0.4323259 0.9473451 1.0000000
factanal(m1, factors = 3) # varimax (orthogonal, uncorrelated factors; the default)
## 
## Call:
## factanal(x = m1, factors = 3)
## 
## Uniquenesses:
##    v1    v2    v3    v4    v5    v6 
## 0.005 0.101 0.005 0.224 0.084 0.005 
## 
## Loadings:
##    Factor1 Factor2 Factor3
## v1 0.944   0.182   0.267  
## v2 0.905   0.235   0.159  
## v3 0.236   0.210   0.946  
## v4 0.180   0.242   0.828  
## v5 0.242   0.881   0.286  
## v6 0.193   0.959   0.196  
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.893   1.886   1.797
## Proportion Var   0.316   0.314   0.300
## Cumulative Var   0.316   0.630   0.929
## 
## The degrees of freedom for the model is 0 and the fit was 0.4755
factanal(m1, factors = 3, rotation = 'promax') # promax (allows correlation between factors)
## 
## Call:
## factanal(x = m1, factors = 3, rotation = "promax")
## 
## Uniquenesses:
##    v1    v2    v3    v4    v5    v6 
## 0.005 0.101 0.005 0.224 0.084 0.005 
## 
## Loadings:
##    Factor1 Factor2 Factor3
## v1          0.985         
## v2          0.951         
## v3                  1.003 
## v4                  0.867 
## v5  0.910                 
## v6  1.033                 
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.903   1.876   1.772
## Proportion Var   0.317   0.313   0.295
## Cumulative Var   0.317   0.630   0.925
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.462   0.460
## Factor2   0.462   1.000   0.501
## Factor3   0.460   0.501   1.000
## 
## The degrees of freedom for the model is 0 and the fit was 0.4755
# The following shows the g factor as PC1
prcomp(m1) # signs may depend on platform
## Standard deviations (1, .., p=6):
## [1] 3.0368683 1.6313757 1.5818857 0.6344131 0.3190765 0.2649086
## 
## Rotation (n x k) = (6 x 6):
##          PC1         PC2        PC3        PC4        PC5         PC6
## v1 0.4168038 -0.52292304  0.2354298 -0.2686501 -0.5157193  0.39907358
## v2 0.3885610 -0.50887673  0.2985906  0.3060519  0.5061522 -0.38865228
## v3 0.4182779  0.01521834 -0.5555132 -0.5686880  0.4308467  0.08474731
## v4 0.3943646  0.02184360 -0.5986150  0.5922259 -0.3558110 -0.09124977
## v5 0.4254013  0.47017231  0.2923345 -0.2789775 -0.3060409 -0.58397162
## v6 0.4047824  0.49580764  0.3209708  0.2866938  0.2682391  0.57719858
# formula interface
factanal(~v1+v2+v3+v4+v5+v6, factors = 3,
         scores = 'Bartlett')$scores
##       Factor1    Factor2    Factor3
## 1  -0.9039949 -0.9308984  0.9475392
## 2  -0.8685952 -0.9328721  0.9352330
## 3  -0.9082818 -0.9320093  0.9616422
## 4  -1.0021975 -0.2529689  0.8178552
## 5  -0.9039949 -0.9308984  0.9475392
## 6  -0.7452711  0.7273960 -0.7884733
## 7  -0.7098714  0.7254223 -0.8007795
## 8  -0.7495580  0.7262851 -0.7743704
## 9  -0.8080740  1.4033517 -0.9304636
## 10 -0.7452711  0.7273960 -0.7884733
## 11  0.9272282 -0.9307506 -0.8371538
## 12  0.9626279 -0.9327243 -0.8494600
## 13  0.9229413 -0.9318615 -0.8230509
## 14  0.8290256 -0.2528211 -0.9668378
## 15  0.9272282 -0.9307506 -0.8371538
## 16  0.4224366  2.0453079  1.2864761
## 17  1.4713902  1.2947716  0.5451562
## 18  1.8822320  0.3086244  1.9547752
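Two quantities in the output above can be recomputed by hand. In an orthogonal (varimax) solution each uniqueness is approximately one minus the communality, that is, one minus the row sum of squared loadings; and Bartlett scores are the weighted least-squares estimate \(\hat f = (\Lambda^\top \Psi^{-1} \Lambda)^{-1} \Lambda^\top \Psi^{-1} z\), where \(z\) is the standardized data. The sketch below assumes `factanal()` standardizes the data with `scale()` before computing the scores.

```r
# Sketch: recompute communalities and Bartlett scores for Example 8.3.
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)
fa <- factanal(m1, factors = 3, scores = "Bartlett")   # varimax by default
L   <- unclass(fa$loadings)                            # 6 x 3 loading matrix
psi <- fa$uniquenesses                                 # 6 uniquenesses

# (1) Orthogonal rotation: communality + uniqueness should be close to 1
communality <- rowSums(L^2)
print(round(cbind(communality, uniqueness = psi, total = communality + psi), 3))

# (2) Bartlett scores: f = (L' Psi^-1 L)^-1 L' Psi^-1 z
z <- scale(m1)                                  # center and scale each column
W <- t(L / psi)                                 # L' Psi^-1 (3 x 6): row i of L divided by psi[i]
f_bartlett <- t(solve(W %*% L, W %*% t(z)))     # 18 x 3 matrix of scores
max_diff <- max(abs(f_bartlett - fa$scores))
print(max_diff)
```

Check (1) does not hold for the promax loadings, because with correlated factors the communality also involves the factor correlation matrix.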

Example 8.4 Confirmatory Factor Analysis (CFA) example.

library(lavaan)  # cfa()
library(semPlot) # graphical tools, semPaths()
x <- read.csv('https://filipezabala.com/data/gfactor.csv')
head(x)
##   Years Months     Age Pitch Light Weight Classics French English Mathematics Residuals.Pitch Residuals.Light Residuals.Weight
## 1    10      9 10.7500    50    10      4       16     19      10           7        35.42250        -3.33624         -7.80519
## 2    12      4 12.3333     3    10      6        5      6       6           5       -14.68840        -3.38897         -6.09824
## 3    11      1 11.0833    10    10      6        5      6       6           5        -5.23239        -3.34734         -5.86688
## 4    10     11 10.9167    60    10      9       22     23      22          22        45.09510        -3.34179         -2.83604
## 5    13      7 13.5833     4    12      5        1      1       1           2       -16.14450        -1.43059         -7.32959
## 6    12      6 12.5000     2    10     10        4      2       2           1       -16.01590        -3.39452         -2.12908
##   Residuals.Classics Residuals.French Residuals.English Residuals.Mathematics
## 1           2.956350         5.770520          -2.99002              -5.81043
## 2          -1.709530        -0.643188          -1.49377              -2.34005
## 3          -6.710150        -5.842900          -5.83292              -6.65877
## 4           9.623100        10.463800           9.58853               9.76540
## 5          -0.708905        -0.443481          -2.15463              -1.02133
## 6          -2.042780        -3.949890          -4.91522              -5.76422
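x2 <- x[, 3:10]   # keep the 8 variables from Age to Mathematics
S <- cor(x2)      # correlation matrix of the 8 variables
factanal(x2, 2)   # exploratory 2-factor solution, for comparison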
## 
## Call:
## factanal(x = x2, factors = 2)
## 
## Uniquenesses:
##         Age       Pitch       Light      Weight    Classics      French     English Mathematics 
##       0.517       0.933       0.802       0.549       0.062       0.005       0.279       0.175 
## 
## Loadings:
##             Factor1 Factor2
## Age         -0.693         
## Pitch        0.259         
## Light                0.435 
## Weight               0.671 
## Classics     0.961   0.123 
## French       0.995         
## English      0.784   0.327 
## Mathematics  0.882   0.216 
## 
##                Factor1 Factor2
## SS loadings      3.864   0.814
## Proportion Var   0.483   0.102
## Cumulative Var   0.483   0.585
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 11.12 on 13 degrees of freedom.
## The p-value is 0.6
# one-factor CFA model: latent g measured by the 8 observed variables
modelo1 <- 'g =~  Age + Pitch + Light + Weight + Classics + French + English + Mathematics'
fit1 <- cfa(modelo1, data = x2)
summary(fit1)
## lavaan 0.6-19 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
## 
##   Number of observations                            23
## 
## Model Test User Model:
##                                                       
##   Test statistic                                27.521
##   Degrees of freedom                                20
##   P-value (Chi-square)                           0.121
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   g =~                                                
##     Age               1.000                           
##     Pitch            -5.429    4.662   -1.165    0.244
##     Light            -0.508    0.883   -0.575    0.566
##     Weight           -0.203    1.588   -0.128    0.898
##     Classics         -8.563    1.948   -4.396    0.000
##     French           -8.447    1.934   -4.368    0.000
##     English          -7.086    1.905   -3.720    0.000
##     Mathematics      -7.765    1.924   -4.035    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Age               0.664    0.200    3.320    0.001
##    .Pitch           272.895   80.592    3.386    0.001
##    .Light            10.333    3.048    3.390    0.001
##    .Weight           33.915   10.001    3.391    0.001
##    .Classics          2.065    1.414    1.460    0.144
##    .French            2.752    1.492    1.844    0.065
##    .English          15.408    4.762    3.236    0.001
##    .Mathematics       9.699    3.141    3.088    0.002
##     g                 0.599    0.319    1.875    0.061
fitMeasures(fit1, fit.measures = c('gfi','fmin', 'chisq', 'df', 'pvalue'))
##    gfi   fmin  chisq     df pvalue 
##  0.799  0.598 27.521 20.000  0.121
summary(fit1, standardized = TRUE, fit.measures = TRUE); gc()
## lavaan 0.6-19 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
## 
##   Number of observations                            23
## 
## Model Test User Model:
##                                                       
##   Test statistic                                27.521
##   Degrees of freedom                                20
##   P-value (Chi-square)                           0.121
## 
## Model Test Baseline Model:
## 
##   Test statistic                               153.306
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.940
##   Tucker-Lewis Index (TLI)                       0.916
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -509.459
##   Loglikelihood unrestricted model (H1)       -495.698
##                                                       
##   Akaike (AIC)                                1050.917
##   Bayesian (BIC)                              1069.085
##   Sample-size adjusted Bayesian (SABIC)       1019.570
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.128
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.235
##   P-value H_0: RMSEA <= 0.050                    0.165
##   P-value H_0: RMSEA >= 0.080                    0.758
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.096
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     Age               1.000                               0.774    0.688
##     Pitch            -5.429    4.662   -1.165    0.244   -4.201   -0.246
##     Light            -0.508    0.883   -0.575    0.566   -0.393   -0.121
##     Weight           -0.203    1.588   -0.128    0.898   -0.157   -0.027
##     Classics         -8.563    1.948   -4.396    0.000   -6.626   -0.977
##     French           -8.447    1.934   -4.368    0.000   -6.536   -0.969
##     English          -7.086    1.905   -3.720    0.000   -5.483   -0.813
##     Mathematics      -7.765    1.924   -4.035    0.000   -6.009   -0.888
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Age               0.664    0.200    3.320    0.001    0.664    0.526
##    .Pitch           272.895   80.592    3.386    0.001  272.895    0.939
##    .Light            10.333    3.048    3.390    0.001   10.333    0.985
##    .Weight           33.915   10.001    3.391    0.001   33.915    0.999
##    .Classics          2.065    1.414    1.460    0.144    2.065    0.045
##    .French            2.752    1.492    1.844    0.065    2.752    0.061
##    .English          15.408    4.762    3.236    0.001   15.408    0.339
##    .Mathematics       9.699    3.141    3.088    0.002    9.699    0.212
##     g                 0.599    0.319    1.875    0.061    1.000    1.000
##            used  (Mb) gc trigger   (Mb) limit (Mb) max used   (Mb)
## Ncells 14294294 763.4   23992167 1281.4         NA 23992167 1281.4
## Vcells 36879060 281.4   84353464  643.6     102400 84353464  643.6
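# path diagram with standardized estimates ('std'), circular layout
semPaths(fit1, 'std', mar=c(2,1.4,2.7,1.4), layout = 'circle')

# same diagram with the default tree layout
semPaths(fit1, 'std', mar=c(2,1.4,2.7,1.4))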
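The CFA fit indices printed by `summary(fit1, fit.measures = TRUE)` can be recomputed from the user-model and baseline chi-square statistics. The sketch below uses the usual textbook formulas for CFI, TLI and RMSEA; for RMSEA it puts \(N\) in the denominator, which reproduces lavaan's printed 0.128 for this model (texts that use \(N - 1\) would give about 0.131).

```r
# Sketch: recompute CFI, TLI and RMSEA for fit1 from the printed statistics.
chisq_m <- 27.521; df_m <- 20    # user model (one factor g)
chisq_b <- 153.306; df_b <- 28   # baseline (independence) model
N <- 23                          # number of observations

cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)
tli <- ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * N))

# Model degrees of freedom: 8 * 9 / 2 = 36 distinct (co)variances minus
# 16 free parameters (7 loadings, 8 residual variances, 1 factor variance)
df_check <- 8 * 9 / 2 - 16
print(round(c(CFI = cfi, TLI = tli, RMSEA = rmsea, df = df_check), 3))
```

The parameter count matches the "Number of model parameters 16" line: the loading of Age is fixed at 1 to set the scale of g, so only 7 loadings are free.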

# one-factor EFA on the built-in ability.cov covariance matrix
(ability.FA <- factanal(factors = 1, covmat = ability.cov))
## 
## Call:
## factanal(factors = 1, covmat = ability.cov)
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.535   0.853   0.748   0.910   0.232   0.280 
## 
## Loadings:
##         Factor1
## general 0.682  
## picture 0.384  
## blocks  0.502  
## maze    0.300  
## reading 0.877  
## vocab   0.849  
## 
##                Factor1
## SS loadings      2.443
## Proportion Var   0.407
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 75.18 on 9 degrees of freedom.
## The p-value is 1.46e-12
update(ability.FA, factors = 2)
## 
## Call:
## factanal(factors = 2, covmat = ability.cov)
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.455   0.589   0.218   0.769   0.052   0.334 
## 
## Loadings:
##         Factor1 Factor2
## general 0.499   0.543  
## picture 0.156   0.622  
## blocks  0.206   0.860  
## maze    0.109   0.468  
## reading 0.956   0.182  
## vocab   0.785   0.225  
## 
##                Factor1 Factor2
## SS loadings      1.858   1.724
## Proportion Var   0.310   0.287
## Cumulative Var   0.310   0.597
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
update(ability.FA, factors = 2, rotation = "promax")
## 
## Call:
## factanal(factors = 2, covmat = ability.cov, rotation = "promax")
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.455   0.589   0.218   0.769   0.052   0.334 
## 
## Loadings:
##         Factor1 Factor2
## general  0.364   0.470 
## picture          0.671 
## blocks           0.932 
## maze             0.508 
## reading  1.023         
## vocab    0.811         
## 
##                Factor1 Factor2
## SS loadings      1.853   1.807
## Proportion Var   0.309   0.301
## Cumulative Var   0.309   0.610
## 
## Factor Correlations:
##         Factor1 Factor2
## Factor1   1.000   0.557
## Factor2   0.557   1.000
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
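The degrees of freedom reported by `factanal()` follow from counting parameters: with \(p\) observed variables and \(k\) factors, \(\text{dof} = \frac{(p-k)^2 - (p+k)}{2}\). The sketch below, with a hypothetical helper `fa_dof()`, reproduces every value in this section and recomputes the p-values from the printed chi-square statistics.

```r
# Sketch: degrees of freedom and p-values of factanal()'s likelihood-ratio test.
fa_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2   # hypothetical helper

dofs <- c(m1_3      = fa_dof(6, 3),   # Example 8.3: 0, so no test is reported
          gfactor_2 = fa_dof(8, 2),   # factanal(x2, 2): 13
          ability_1 = fa_dof(6, 1),   # ability.FA: 9
          ability_2 = fa_dof(6, 2))   # update(ability.FA, factors = 2): 4
print(dofs)

# upper-tail chi-square probabilities for the printed statistics
p1 <- pchisq(75.18, df = 9, lower.tail = FALSE)  # 1 factor: rejected
p2 <- pchisq(6.11, df = 4, lower.tail = FALSE)   # 2 factors: not rejected
print(signif(c(one_factor = p1, two_factors = p2), 3))
```

Rotation does not change the test: the varimax and promax two-factor fits share the same uniquenesses, chi-square and p-value, since rotation only redistributes the loadings.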

References

Duda, Richard O, Peter E Hart, and David G Stork. 2001. Pattern Classification. John Wiley & Sons, Inc.
Gould, Stephen Jay, Ana Luísa de Hurbano Seco Coelho, and Jorge Rocha. 1991. A Falsa Medida Do Homem. Livraria Martins Fontes Editora LTDA.

  9. The table was adapted from a structure suggested by DEEPSEEK CHAT. Comparative table: Exploratory (EFA) vs. Confirmatory (CFA) Factor Analysis [response in an AI chat]. 2 Jun. 2025. Available at: https://chat.deepseek.com.