1.3 Data
Statistics starts with data. (Breiman 2001, 199)
1.3.1 Kaggle
Kaggle is an online community of data analysts. It allows users to find and publish datasets, explore and build models collaboratively, and enter competitions to solve data challenges.
1.3.2 UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. More than 600 datasets are currently available, all of which can be browsed through a searchable interface. Any material used should be cited in accordance with the repository's citation policy.
1.3.3 Rdatasets
Rdatasets is a collection of 2337 datasets that were originally distributed along with the R statistical software environment and some of its add-on packages.
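Since Rdatasets also distributes each dataset as a CSV file, the collection can be read from outside R. The sketch below shows one way to do this in Python using only the standard library. The base URL and the `package/item.csv` layout are assumptions about how the project's CSV mirror is organized and should be checked against the Rdatasets index. The helper names `rdataset_url`, `parse_csv`, and `load_rdataset` are hypothetical.

```python
import csv
import io
from urllib.request import urlopen

# Assumed location of the Rdatasets CSV mirror; check the project's index
# page before relying on this layout.
RDATASETS_CSV_BASE = "https://vincentarelbundock.github.io/Rdatasets/csv"


def rdataset_url(package: str, item: str) -> str:
    """Build the CSV URL for a dataset (hypothetical helper)."""
    return f"{RDATASETS_CSV_BASE}/{package}/{item}.csv"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rdataset(package: str, item: str) -> list[dict[str, str]]:
    """Download one dataset and parse it; requires network access."""
    with urlopen(rdataset_url(package, item)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

For example, `load_rdataset("datasets", "iris")` would request the `iris` dataset from R's `datasets` package, assuming the mirror follows the layout above. Values are returned as strings, so numeric columns need to be converted before analysis.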
1.3.4 Base dos Dados
Base dos Dados is “a non-profit and open source non-governmental organization that works to universalize access to quality data”. Note that using BigQuery requires a Google account, which must be associated by following the step-by-step guide.
1.3.5 Open Data and National Data Catalog
The Brazilian Open Data Portal and National Data Catalog is a tool for finding data published by the federal government and local governments. The data can be used to conduct research, develop applications, and create new services.