Chapter 2 Descriptive Statistics

Exploratory data analysis, or descriptive statistics, deals with organizing and describing data. It brings together a set of tools that help in understanding observed values. It is used, for example, to assess how observations are distributed, where they are located, and how variables are associated with one another.

This chapter presents concepts and methods of data exploration, a fundamental step before more advanced statistical analysis. For further discussion we recommend (Tukey 1977), a milestone in exploratory data analysis.

After reading this chapter, the reader should be able to interpret the following example, adapted from (Waring et al. 2022) at the suggestion of João Brito. More details at this link (in Portuguese) from Wiki R.

# Load packages
library(skimr)
library(tidyverse)

# Load data
data(starwars)

# An alternative to summary()
skimr::skim(starwars) # HTML and docx

Table 2.1: Data summary

Name                      starwars
Number of rows            87
Number of columns         14
_______________________
Column type frequency:
  character               8
  list                    3
  numeric                 3
________________________
Group variables           None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
name                   0           1.00    3   21      0        87           0
hair_color             5           0.94    4   13      0        12           0
skin_color             0           1.00    3   19      0        31           0
eye_color              0           1.00    3   13      0        15           0
sex                    4           0.95    4   14      0         4           0
gender                 4           0.95    8    9      0         2           0
homeworld             10           0.89    4   14      0        48           0
species                4           0.95    3   14      0        37           0

Variable type: list

skim_variable  n_missing  complete_rate  n_unique  min_length  max_length
films                  0              1        24           1           7
vehicles               0              1        11           0           2
starships              0              1        17           0           5

Variable type: numeric

skim_variable  n_missing  complete_rate    mean      sd  p0    p25  p50    p75  p100  hist
height                 6           0.93  174.36   34.77  66  167.0  180  191.0   264  ▁▁▇▅▁
mass                  28           0.68   97.31  169.46  15   55.6   79   84.5  1358  ▇▁▁▁▁
birth_year            44           0.49   87.57  154.69   8   35.0   52   72.0   896  ▇▁▁▁▁
# skimr::skim_without_charts(starwars) # PDF
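
To make the numeric block of the summary easier to read, the sketch below recomputes its columns by hand for a single variable. The helper name summarise_numeric() is hypothetical (it is not part of skimr), and the sketch assumes the percentiles are obtained with the default settings of quantile(); the values may also differ slightly from Table 2.1 depending on the installed version of the dplyr package, which provides the starwars data.

```r
# Load dplyr only to access the starwars data set
library(dplyr)
data(starwars)

# Hypothetical helper: reproduces the numeric columns shown by skim()
# for a single numeric vector x (assumes default quantile() settings)
summarise_numeric <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                na.rm = TRUE, names = FALSE)
  data.frame(
    n_missing     = sum(is.na(x)),       # number of NA values
    complete_rate = mean(!is.na(x)),     # proportion of non-missing values
    mean          = mean(x, na.rm = TRUE),
    sd            = sd(x, na.rm = TRUE),
    p0   = q[1],                         # minimum
    p25  = q[2],                         # first quartile
    p50  = q[3],                         # median
    p75  = q[4],                         # third quartile
    p100 = q[5]                          # maximum
  )
}

# Apply the helper to one numeric column of the data
summarise_numeric(starwars$height)
```

Read this way, each row of the numeric block is a set of ordinary summary measures computed after discarding missing values. Comparing mean with p50 is a quick check for asymmetry: in Table 2.1 the mean of mass (97.31) is above its third quartile (84.5), which suggests that a few very large values, such as the maximum of 1358, pull the mean upward.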

References

Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley Publishing Company.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2022. Skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.