1.3 Data

Statistics starts with data. (Breiman 2001, 199)

1.3.1 Kaggle

Kaggle is an online community of data scientists and machine learning practitioners. It allows users to find and publish datasets, explore and build models collaboratively, and enter competitions to solve data science challenges.

Exercise 1.4 Download the RAVDESS database, preferably via the command line.

  1. Apply the functions from the voice package.
  2. Compare the results by actor, gender, intensity, and emotion.
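
The grouping variables for the comparison in item 2 can be read from the file names themselves. RAVDESS names each file with seven two-digit codes separated by hyphens (modality, vocal channel, emotion, intensity, statement, repetition, actor); odd-numbered actors are male and even-numbered actors are female. The following is a minimal Python sketch under those assumptions. The helper name parse_ravdess_name is hypothetical, and the code tables should be checked against the documentation shipped with the copy you download.

```python
from pathlib import Path

# Code tables taken from the RAVDESS file-naming convention (assumed here;
# verify against the documentation of your download).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = {"01": "normal", "02": "strong"}


def parse_ravdess_name(filename):
    """Return the actor, gender, intensity and emotion encoded in a file name.

    Hypothetical helper: it only inspects the name and never opens the file.
    """
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(
            f"expected 7 hyphen-separated fields, got {len(fields)}: {filename!r}"
        )
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = fields
    actor_id = int(actor)
    return {
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "gender": "male" if actor_id % 2 == 1 else "female",
        "intensity": INTENSITIES[intensity],
        "emotion": EMOTIONS[emotion],
    }


print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
```

Applying the function to every file in the download gives one record per recording, which can then be joined to whatever features are extracted from the audio.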

1.3.2 UCI Machine Learning Repository

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the machine learning community uses for the empirical analysis of machine learning algorithms. More than 600 datasets are currently available, and all of them can be browsed through a searchable interface. Any material you use should be referenced in accordance with the repository's citation policy.

1.3.3 Rdatasets

Rdatasets is a collection of 2337 datasets that were originally distributed along with the R statistical software environment and some of its add-on packages.
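
Each dataset in the collection is also published as a CSV file, so it can be read from any language. The sketch below builds the address of one such file in Python. It assumes the URL layout used by the Rdatasets site at the time of writing (base address, then package name, then dataset name); the constant RDATASETS_CSV_BASE and the helper rdatasets_csv_url are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

# Assumed base address of the CSV files on the Rdatasets site.
RDATASETS_CSV_BASE = "https://vincentarelbundock.github.io/Rdatasets/csv"


def rdatasets_csv_url(package, item):
    """Build the CSV address for dataset `item` from the R package `package`."""
    # quote() escapes any character that is not safe inside a URL path.
    return f"{RDATASETS_CSV_BASE}/{quote(package)}/{quote(item)}.csv"


print(rdatasets_csv_url("datasets", "iris"))
```

The resulting address can be passed to any CSV reader that accepts URLs, for example pandas.read_csv if pandas is installed.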

1.3.4 Base dos Dados

Base dos Dados is “a non-profit and open source non-governmental organization that works to universalize access to quality data”. Its data are accessed through Google BigQuery, so you need a Google account and must associate it by following the project's step-by-step guide.

1.3.5 Open Data and National Data Catalog

The Brazilian Open Data Portal and National Data Catalog is a tool for finding data published by the federal government and local governments. You can use these data to conduct research, develop applications, and create new services.

References

Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231. https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full.