
Chapter 2. Exploratory Data Analysis
Exploratory data analysis is a core step in any data analysis project. It is an approach to analyzing data that summarizes the main characteristics of a dataset. Its main objective is to test hypotheses about the data in order to understand the dataset better.
Exploratory data analysis draws on many statistical techniques, both visual and nonvisual. When your study has to be communicated to peers as well as to audiences without a data science background, it is advisable to rely on visual techniques, because they communicate findings more clearly.
Exploratory data analysis is expected to yield insights into the data, the important variables in the dataset (depending on the problem to be solved), the outliers in the data, and the results of hypothesis tests. These results shape how the business problem is solved. If it is a modeling problem, they also inform which model to use and how to apply it to the dataset for better accuracy.
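To make the idea of identifying outliers concrete, the following is a minimal sketch that uses the interquartile range (IQR) rule, one common convention among several. The function name `iqr_outliers`, the multiplier `k=1.5`, and the fare values are assumptions made for illustration; they are not taken from the dataset used in this chapter. Only the Python standard library is used so that the sketch has no dependencies.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper for illustration; k=1.5 is a common convention,
    not a rule prescribed by this chapter.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up ticket fares, invented for this example only
fares = [7.25, 8.05, 7.9, 13.0, 26.0, 10.5, 8.6, 512.0]
outliers = iqr_outliers(fares)
print(outliers)
```

A flagged value is only a candidate for closer inspection. Whether it is an error or a genuine extreme observation depends on the problem being solved.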
In this chapter, you will learn how to perform exploratory data analysis, starting with a general overview of the data, then analyzing one variable at a time, then pairs of variables, and finally multiple variables together to better understand their interdependencies.
The topics that will be covered in this chapter are as follows:
- Titanic dataset
- Descriptive statistics
- Inferential statistics
- Univariate analysis
- Bivariate analysis
- Multivariate analysis (scatter plot with segments, heatmap, and tabulation)
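As a rough preview of how these stages fit together, here is a minimal sketch in plain Python. The passenger records, the field names (`sex`, `pclass`, `age`, `survived`), and the helper `rate_by` are all hypothetical and invented for illustration; they only mimic the kind of columns a passenger dataset might contain. The standard library is used to keep the sketch dependency-free, although in practice a data analysis library would usually do this work.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical passenger records, invented for illustration only
passengers = [
    {"sex": "female", "pclass": 1, "age": 38.0, "survived": 1},
    {"sex": "male", "pclass": 3, "age": 22.0, "survived": 0},
    {"sex": "female", "pclass": 3, "age": 26.0, "survived": 1},
    {"sex": "male", "pclass": 1, "age": 54.0, "survived": 0},
    {"sex": "male", "pclass": 3, "age": None, "survived": 0},
    {"sex": "female", "pclass": 2, "age": 14.0, "survived": 1},
    {"sex": "male", "pclass": 2, "age": 34.0, "survived": 1},
    {"sex": "female", "pclass": 3, "age": 31.0, "survived": 0},
]

# 1. General view: how many rows, and how many ages are missing?
n_rows = len(passengers)
missing_age = sum(p["age"] is None for p in passengers)

# 2. Univariate analysis: summarize one variable (age) on its own
ages = [p["age"] for p in passengers if p["age"] is not None]
age_summary = {"mean": mean(ages), "median": median(ages)}


def rate_by(records, *keys):
    """Mean of 'survived' for each combination of the given keys."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record["survived"])
    return {group: mean(flags) for group, flags in groups.items()}


# 3. Bivariate analysis: survival rate against one other variable
by_sex = rate_by(passengers, "sex")

# 4. Multivariate analysis: a small tabulation over two variables
by_sex_class = rate_by(passengers, "sex", "pclass")

print(n_rows, missing_age, age_summary)
print(by_sex)
print(by_sex_class)
```

Each step widens the view by one dimension: first the shape and completeness of the data, then a single variable, then a pair, then several at once.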