8.2 Factor Analysis
The object of factor analysis is to find a lower-dimensional representation that accounts for the correlations among the features. (Duda, Hart, and Stork 2001, 580)
Factor analysis is a mathematical technique used to reduce a complex system of correlations to a smaller number of dimensions. It consists, literally, of decomposing a matrix, usually a matrix of correlation coefficients, into factors. (Gould, Coelho, and Rocha 1991, 259)
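In the common factor model, this decomposition reads \(R \approx \Lambda \Lambda^\top + \Psi\): the \(d \times k\) matrix \(\Lambda\) holds the loadings of the \(d\) observed variables on \(k\) factors, and the diagonal matrix \(\Psi\) holds the uniquenesses. The sketch below uses made-up loadings (an assumption for illustration only) to show how two factors reproduce a \(4 \times 4\) correlation matrix.

```r
# Hypothetical loadings for d = 4 variables and k = 2 factors (illustration only).
# matrix() fills by column: first column = loadings on F1, second = loadings on F2.
Lambda <- matrix(c(0.9, 0.8, 0.0, 0.1,   # F1
                   0.1, 0.0, 0.7, 0.8),  # F2
                 ncol = 2,
                 dimnames = list(paste0("x", 1:4), c("F1", "F2")))

# Communality h2: share of each variable's variance explained by the factors
h2  <- rowSums(Lambda^2)
# Uniqueness psi: the remaining (specific + error) variance
psi <- 1 - h2

# Correlation matrix implied by the factor model: Lambda %*% t(Lambda) + Psi
R_implied <- Lambda %*% t(Lambda) + diag(psi)
round(R_implied, 3)
```

The diagonal of the implied matrix is 1 by construction, while each off-diagonal entry is the sum of products of loadings, e.g. 0.9 × 0.8 + 0.1 × 0.0 = 0.72 for x1 and x2.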
Duda, Hart, and Stork (2001, 580) suggest viewing factor analysis as “a simple modification of the hierarchical methods” (Section 10.2), replacing “an \(n \times n\) matrix of distances between samples” with “a \(d \times d\) correlation matrix”, as in Eq. (4.24).
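This idea can be sketched by running ordinary hierarchical clustering on the variables instead of on the samples. The dissimilarity \(1 - |r|\) and average linkage are assumptions made for this illustration, not choices prescribed by the cited text; the data are those of Example 8.3.

```r
# Data from the stats::factanal() example (Example 8.3)
m1 <- cbind(
  v1 = c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6),
  v2 = c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5),
  v3 = c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6),
  v4 = c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4),
  v5 = c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5),
  v6 = c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
)

R <- cor(m1)                   # d x d correlation matrix (d = 6), not n x n
D <- as.dist(1 - abs(R))       # assumption: dissimilarity = 1 - |r|
hc <- hclust(D, method = "average")

# Cut the dendrogram into three groups of variables
groups <- cutree(hc, k = 3)
groups
```

Because the within-pair correlations (about 0.88 to 0.95) are much larger than the between-pair ones (about 0.41 to 0.51), the three groups are {v1, v2}, {v3, v4} and {v5, v6}, matching the three factors found by factanal() in Example 8.3.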
Factor analysis belongs to a class of models involving latent variables, also called unobserved variables or, as the name suggests, factors. Latent variables are used when the phenomenon under study cannot be observed directly. Typical examples come from the behavioral sciences, where one wishes to measure general intelligence, resilience or extroversion.
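A small simulation makes the idea concrete: a latent variable g is generated but never handed to the analysis; only four noisy indicators are. The loadings, the item names and the sample size are hypothetical values chosen for this sketch. A one-factor factanal() on the indicators alone should recover loadings close to the true ones.

```r
set.seed(123)
n <- 5000
g <- rnorm(n)                           # latent factor: never observed directly
lambda <- c(0.8, 0.7, 0.6, 0.5)         # hypothetical true loadings

# Each indicator = loading * latent factor + independent noise,
# with noise variance chosen so that every indicator has unit variance
X <- sapply(lambda, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))
colnames(X) <- paste0("item", 1:4)      # hypothetical item names

fit <- factanal(X, factors = 1)
est <- abs(as.vector(fit$loadings))     # the sign of a factor is arbitrary
round(cbind(true = lambda, estimated = est), 3)
```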
8.2.1 EFA vs. CFA
Factor analysis is usually approached in two ways, exploratory (EFA) and confirmatory (CFA), summarized in the table below⁹.
Criterion | EFA | CFA |
---|---|---|
Goal | Explore the structure of the data without prior hypotheses | Test a predefined theoretical factor structure |
Assumptions | Requires no prior model | Requires a specified theoretical model |
Flexibility | Open: factors are discovered from the data | Rigid: pre-established relationships are validated |
Main methods | PCA, maximum likelihood, principal axis | Structural equation modeling (SEM) |
Typical application | Exploratory research, dimensionality reduction | Scale validation, theory confirmation |
Fit criteria | Eigenvalues (> 1), scree plot, % of variance explained | CFI, RMSEA, TLI, \(\chi^2\) |
Variables vs. factors | Relationships are defined from the data | Predetermined relationships are tested |
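The CFA fit criteria in the table are functions of chi-square statistics. As a sketch, the usual textbook formulas for CFI, TLI and RMSEA are applied below to the model and baseline chi-square values that lavaan reports for the CFA in Example 8.4. lavaan offers further variants (e.g. robust or scaled versions) that this sketch does not cover; the use of N rather than N − 1 in the RMSEA is an assumption chosen because it reproduces the printed value.

```r
# Chi-square statistics as printed by lavaan for the CFA in Example 8.4
chisq_m <- 27.521;  df_m <- 20   # user (one-factor) model
chisq_b <- 153.306; df_b <- 28   # baseline (independence) model
N <- 23                          # number of observations

# Comparative Fit Index: improvement over the baseline model
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# Tucker-Lewis Index: compares chi-square/df ratios
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: misfit per degree of freedom (N in the denominator, see lead-in)
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * N))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

The results agree with the CFI of 0.940, TLI of 0.916 and RMSEA of 0.128 printed by summary(fit1, fit.measures = TRUE).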
Example 8.3 Exploratory factor analysis (EFA) example, taken from the documentation of stats::factanal().
# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
# Last four cases are there to add noise
# and introduce a positive manifold (g factor)
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
cor(m1)
## v1 v2 v3 v4 v5 v6
## v1 1.0000000 0.9393083 0.5128866 0.4320310 0.4664948 0.4086076
## v2 0.9393083 1.0000000 0.4124441 0.4084281 0.4363925 0.4326113
## v3 0.5128866 0.4124441 1.0000000 0.8770750 0.5128866 0.4320310
## v4 0.4320310 0.4084281 0.8770750 1.0000000 0.4320310 0.4323259
## v5 0.4664948 0.4363925 0.5128866 0.4320310 1.0000000 0.9473451
## v6 0.4086076 0.4326113 0.4320310 0.4323259 0.9473451 1.0000000
factanal(m1, factors = 3) # varimax is the default rotation
##
## Call:
## factanal(x = m1, factors = 3)
##
## Uniquenesses:
## v1 v2 v3 v4 v5 v6
## 0.005 0.101 0.005 0.224 0.084 0.005
##
## Loadings:
## Factor1 Factor2 Factor3
## v1 0.944 0.182 0.267
## v2 0.905 0.235 0.159
## v3 0.236 0.210 0.946
## v4 0.180 0.242 0.828
## v5 0.242 0.881 0.286
## v6 0.193 0.959 0.196
##
## Factor1 Factor2 Factor3
## SS loadings 1.893 1.886 1.797
## Proportion Var 0.316 0.314 0.300
## Cumulative Var 0.316 0.630 0.929
##
## The degrees of freedom for the model is 0 and the fit was 0.4755
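Why zero degrees of freedom? A model with p variables and k factors fits p(p + 1)/2 distinct variances and covariances using pk + p − k(k − 1)/2 free parameters, which leaves ((p − k)² − (p + k))/2 degrees of freedom. With p = 6 and k = 3 nothing is left over, so the model is just identified and factanal() reports the fit value instead of a test. The helper name fa_df is introduced only for this sketch.

```r
# Degrees of freedom of the factor model with p variables and k factors
fa_df <- function(p, k) ((p - k)^2 - (p + k)) / 2

fa_df(6, 3)   # Example 8.3: 6 variables, 3 factors -> 0
fa_df(8, 2)   # Example 8.4: 8 variables, 2 factors -> 13
fa_df(6, 1)   # ability.cov data, 1 factor -> 9
fa_df(6, 2)   # ability.cov data, 2 factors -> 4
```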
factanal(m1, factors = 3, rotation = "promax")
##
## Call:
## factanal(x = m1, factors = 3, rotation = "promax")
##
## Uniquenesses:
## v1 v2 v3 v4 v5 v6
## 0.005 0.101 0.005 0.224 0.084 0.005
##
## Loadings:
## Factor1 Factor2 Factor3
## v1 0.985
## v2 0.951
## v3 1.003
## v4 0.867
## v5 0.910
## v6 1.033
##
## Factor1 Factor2 Factor3
## SS loadings 1.903 1.876 1.772
## Proportion Var 0.317 0.313 0.295
## Cumulative Var 0.317 0.630 0.925
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 0.462 0.460
## Factor2 0.462 1.000 0.501
## Factor3 0.460 0.501 1.000
##
## The degrees of freedom for the model is 0 and the fit was 0.4755
# The following shows the g factor as PC1
prcomp(m1) # signs may depend on platform
## Standard deviations (1, .., p=6):
## [1] 3.0368683 1.6313757 1.5818857 0.6344131 0.3190765 0.2649086
##
## Rotation (n x k) = (6 x 6):
## PC1 PC2 PC3 PC4 PC5 PC6
## v1 0.4168038 -0.52292304 0.2354298 -0.2686501 -0.5157193 0.39907358
## v2 0.3885610 -0.50887673 0.2985906 0.3060519 0.5061522 -0.38865228
## v3 0.4182779 0.01521834 -0.5555132 -0.5686880 0.4308467 0.08474731
## v4 0.3943646 0.02184360 -0.5986150 0.5922259 -0.3558110 -0.09124977
## v5 0.4254013 0.47017231 0.2923345 -0.2789775 -0.3060409 -0.58397162
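## v6 0.4047824 0.49580764 0.3209708 0.2866938 0.2682391 0.57719858
# Formula interface, returning the factor scores
factanal(~v1+v2+v3+v4+v5+v6, factors = 3, scores = "Bartlett")$scores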
## Factor1 Factor2 Factor3
## 1 -0.9039949 -0.9308984 0.9475392
## 2 -0.8685952 -0.9328721 0.9352330
## 3 -0.9082818 -0.9320093 0.9616422
## 4 -1.0021975 -0.2529689 0.8178552
## 5 -0.9039949 -0.9308984 0.9475392
## 6 -0.7452711 0.7273960 -0.7884733
## 7 -0.7098714 0.7254223 -0.8007795
## 8 -0.7495580 0.7262851 -0.7743704
## 9 -0.8080740 1.4033517 -0.9304636
## 10 -0.7452711 0.7273960 -0.7884733
## 11 0.9272282 -0.9307506 -0.8371538
## 12 0.9626279 -0.9327243 -0.8494600
## 13 0.9229413 -0.9318615 -0.8230509
## 14 0.8290256 -0.2528211 -0.9668378
## 15 0.9272282 -0.9307506 -0.8371538
## 16 0.4224366 2.0453079 1.2864761
## 17 1.4713902 1.2947716 0.5451562
## 18 1.8822320 0.3086244 1.9547752
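Varimax is an orthogonal rotation, so it changes the individual loadings but not the communalities (the row sums of squared loadings), and therefore not the uniquenesses either. Promax is oblique: its pattern loadings are regression-like weights rather than correlations, which is why values above 1 (such as 1.003 for v3 and 1.033 for v6) can appear. The sketch below checks the orthogonal case by refitting Example 8.3 without rotation; the helper communality() is introduced only for this illustration.

```r
m1 <- cbind(
  v1 = c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6),
  v2 = c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5),
  v3 = c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6),
  v4 = c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4),
  v5 = c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5),
  v6 = c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
)

fit_none    <- factanal(m1, factors = 3, rotation = "none")
fit_varimax <- factanal(m1, factors = 3, rotation = "varimax")

# Communality of each variable: row sum of squared loadings
communality <- function(fit) rowSums(unclass(fit$loadings)^2)

round(cbind(unrotated  = communality(fit_none),
            varimax    = communality(fit_varimax),
            uniqueness = fit_varimax$uniquenesses), 3)
```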
Example 8.4 Confirmatory factor analysis (CFA) example.
library(lavaan) # cfa
library(semPlot) # graphical tools, semPaths()
x <- read.csv('https://filipezabala.com/data/gfactor.csv')
head(x)
## Years Months Age Pitch Light Weight Classics French English Mathematics Residuals.Pitch Residuals.Light Residuals.Weight
## 1 10 9 10.7500 50 10 4 16 19 10 7 35.42250 -3.33624 -7.80519
## 2 12 4 12.3333 3 10 6 5 6 6 5 -14.68840 -3.38897 -6.09824
## 3 11 1 11.0833 10 10 6 5 6 6 5 -5.23239 -3.34734 -5.86688
## 4 10 11 10.9167 60 10 9 22 23 22 22 45.09510 -3.34179 -2.83604
## 5 13 7 13.5833 4 12 5 1 1 1 2 -16.14450 -1.43059 -7.32959
## 6 12 6 12.5000 2 10 10 4 2 2 1 -16.01590 -3.39452 -2.12908
## Residuals.Classics Residuals.French Residuals.English Residuals.Mathematics
## 1 2.956350 5.770520 -2.99002 -5.81043
## 2 -1.709530 -0.643188 -1.49377 -2.34005
## 3 -6.710150 -5.842900 -5.83292 -6.65877
## 4 9.623100 10.463800 9.58853 9.76540
## 5 -0.708905 -0.443481 -2.15463 -1.02133
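## 6 -2.042780 -3.949890 -4.91522 -5.76422
# Keep only the eight variables used in the models below
x2 <- x[, c('Age', 'Pitch', 'Light', 'Weight', 'Classics', 'French', 'English', 'Mathematics')]
# Exploratory step: two-factor maximum likelihood factor analysis
factanal(x2, factors = 2)
##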
## Call:
## factanal(x = x2, factors = 2)
##
## Uniquenesses:
## Age Pitch Light Weight Classics French English Mathematics
## 0.517 0.933 0.802 0.549 0.062 0.005 0.279 0.175
##
## Loadings:
## Factor1 Factor2
## Age -0.693
## Pitch 0.259
## Light 0.435
## Weight 0.671
## Classics 0.961 0.123
## French 0.995
## English 0.784 0.327
## Mathematics 0.882 0.216
##
## Factor1 Factor2
## SS loadings 3.864 0.814
## Proportion Var 0.483 0.102
## Cumulative Var 0.483 0.585
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 11.12 on 13 degrees of freedom.
## The p-value is 0.6
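The p-value printed by factanal() is the upper tail of a chi-square distribution with the model's degrees of freedom, evaluated at the reported statistic. A minimal check with the values above:

```r
# Test of "2 factors are sufficient": upper-tail chi-square probability
stat <- 11.12
dof  <- 13
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)
round(p_value, 2)
```

A large p-value, as here, means the test provides no evidence that two factors are insufficient; it does not prove that two is the correct number.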
modelo1 <- 'g =~ Age + Pitch + Light + Weight + Classics + French + English + Mathematics'
fit1 <- cfa(modelo1, data = x2)
summary(fit1)
## lavaan 0.6-19 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 16
##
## Number of observations 23
##
## Model Test User Model:
##
## Test statistic 27.521
## Degrees of freedom 20
## P-value (Chi-square) 0.121
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## g =~
## Age 1.000
## Pitch -5.429 4.662 -1.165 0.244
## Light -0.508 0.883 -0.575 0.566
## Weight -0.203 1.588 -0.128 0.898
## Classics -8.563 1.948 -4.396 0.000
## French -8.447 1.934 -4.368 0.000
## English -7.086 1.905 -3.720 0.000
## Mathematics -7.765 1.924 -4.035 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Age 0.664 0.200 3.320 0.001
## .Pitch 272.895 80.592 3.386 0.001
## .Light 10.333 3.048 3.390 0.001
## .Weight 33.915 10.001 3.391 0.001
## .Classics 2.065 1.414 1.460 0.144
## .French 2.752 1.492 1.844 0.065
## .English 15.408 4.762 3.236 0.001
## .Mathematics 9.699 3.141 3.088 0.002
## g 0.599 0.319 1.875 0.061
fitMeasures(fit1, c('gfi', 'fmin', 'chisq', 'df', 'pvalue'))
## gfi fmin chisq df pvalue
## 0.799 0.598 27.521 20.000 0.121
summary(fit1, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 16
##
## Number of observations 23
##
## Model Test User Model:
##
## Test statistic 27.521
## Degrees of freedom 20
## P-value (Chi-square) 0.121
##
## Model Test Baseline Model:
##
## Test statistic 153.306
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.940
## Tucker-Lewis Index (TLI) 0.916
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -509.459
## Loglikelihood unrestricted model (H1) -495.698
##
## Akaike (AIC) 1050.917
## Bayesian (BIC) 1069.085
## Sample-size adjusted Bayesian (SABIC) 1019.570
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.128
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.235
## P-value H_0: RMSEA <= 0.050 0.165
## P-value H_0: RMSEA >= 0.080 0.758
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.096
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## g =~
## Age 1.000 0.774 0.688
## Pitch -5.429 4.662 -1.165 0.244 -4.201 -0.246
## Light -0.508 0.883 -0.575 0.566 -0.393 -0.121
## Weight -0.203 1.588 -0.128 0.898 -0.157 -0.027
## Classics -8.563 1.948 -4.396 0.000 -6.626 -0.977
## French -8.447 1.934 -4.368 0.000 -6.536 -0.969
## English -7.086 1.905 -3.720 0.000 -5.483 -0.813
## Mathematics -7.765 1.924 -4.035 0.000 -6.009 -0.888
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Age 0.664 0.200 3.320 0.001 0.664 0.526
## .Pitch 272.895 80.592 3.386 0.001 272.895 0.939
## .Light 10.333 3.048 3.390 0.001 10.333 0.985
## .Weight 33.915 10.001 3.391 0.001 33.915 0.999
## .Classics 2.065 1.414 1.460 0.144 2.065 0.045
## .French 2.752 1.492 1.844 0.065 2.752 0.061
## .English 15.408 4.762 3.236 0.001 15.408 0.339
## .Mathematics 9.699 3.141 3.088 0.002 9.699 0.212
## g 0.599 0.319 1.875 0.061 1.000 1.000
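The Std.lv and Std.all columns can be recovered from the unstandardized estimates. Because Age is the marker variable (its loading is fixed at 1), the other loadings come out negative; the sign of g is arbitrary. The sketch below uses three of the printed estimates and the usual standardization formulas; small differences in the third decimal come from the rounding of the printed inputs.

```r
# Unstandardized estimates printed by lavaan (marker variable: Age)
lambda <- c(Age = 1.000, Pitch = -5.429, Classics = -8.563)  # loadings
theta  <- c(Age = 0.664, Pitch = 272.895, Classics = 2.065)  # residual variances
psi    <- 0.599                                               # variance of g

# Model-implied variance of each indicator: lambda^2 * psi + theta
var_x <- lambda^2 * psi + theta

std_lv  <- lambda * sqrt(psi)          # only the latent variable standardized
std_all <- lambda * sqrt(psi / var_x)  # latent and observed variables standardized
resid_std <- theta / var_x             # standardized residual variances

round(cbind(std_lv, std_all, resid_std), 3)
```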
# Another EFA example from the stats::factanal() documentation: the ability.cov data
factanal(factors = 1, covmat = ability.cov)
##
## Call:
## factanal(factors = 1, covmat = ability.cov)
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.535 0.853 0.748 0.910 0.232 0.280
##
## Loadings:
## Factor1
## general 0.682
## picture 0.384
## blocks 0.502
## maze 0.300
## reading 0.877
## vocab 0.849
##
## Factor1
## SS loadings 2.443
## Proportion Var 0.407
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 75.18 on 9 degrees of freedom.
## The p-value is 1.46e-12
factanal(factors = 2, covmat = ability.cov)
##
## Call:
## factanal(factors = 2, covmat = ability.cov)
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.455 0.589 0.218 0.769 0.052 0.334
##
## Loadings:
## Factor1 Factor2
## general 0.499 0.543
## picture 0.156 0.622
## blocks 0.206 0.860
## maze 0.109 0.468
## reading 0.956 0.182
## vocab 0.785 0.225
##
## Factor1 Factor2
## SS loadings 1.858 1.724
## Proportion Var 0.310 0.287
## Cumulative Var 0.310 0.597
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
factanal(factors = 2, covmat = ability.cov, rotation = "promax")
##
## Call:
## factanal(factors = 2, covmat = ability.cov, rotation = "promax")
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.455 0.589 0.218 0.769 0.052 0.334
##
## Loadings:
## Factor1 Factor2
## general 0.364 0.470
## picture 0.671
## blocks 0.932
## maze 0.508
## reading 1.023
## vocab 0.811
##
## Factor1 Factor2
## SS loadings 1.853 1.807
## Proportion Var 0.309 0.301
## Cumulative Var 0.309 0.610
##
## Factor Correlations:
## Factor1 Factor2
## Factor1 1.000 0.557
## Factor2 0.557 1.000
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
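The EFA criteria listed in the comparison table (eigenvalues greater than 1, scree plot, % of variance explained) can be computed from the eigenvalues of the correlation matrix. The sketch below applies them to the same ability.cov data; these are heuristics and need not agree with the chi-square test above, which rejects one factor and does not reject two.

```r
# ability.cov ships with R (package datasets); convert covariance to correlation
R  <- cov2cor(ability.cov$cov)
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

prop <- ev / sum(ev)   # proportion of total variance per component
data.frame(eigenvalue = round(ev, 3),
           proportion = round(prop, 3),
           cumulative = round(cumsum(prop), 3))

# Kaiser criterion: number of eigenvalues greater than 1
sum(ev > 1)

# Scree plot: look for the "elbow" where the eigenvalues level off
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue")
abline(h = 1, lty = 2)
```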
References
9. The table was adapted from a structure suggested by DEEPSEEK CHAT. Comparative table: Exploratory (EFA) vs. Confirmatory (CFA) Factor Analysis [response in an AI chat]. 2 Jun. 2025. Available at: https://chat.deepseek.com.