Applied Data Visualization with R and ggplot2

Activity: Creating a Histogram and Explaining its Features

Scenario

Histograms are useful when you want to find the peak and spread in a distribution. For example, suppose that a company wants to see what its client age distribution looks like. A two-dimensional distribution can show relationships; for example, one can create a scatter plot of the incomes and ages of credit card holders.

Aim

To create and analyze histograms for the given dataset.

Prerequisites

You should be able to use ggplot2 to create a histogram.

The template is an empty script in which the libraries are already loaded. You will write your code in it.

Steps for Completion

Use the template code and load the required dataset.
Create a histogram for each of the two cities, Vancouver and Miami.
Analyze and compare the two histograms to determine the points of difference.

Outcome

Two histograms should be created and compared. The complete code is as follows:

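library(ggplot2)
# Load the hourly temperature data (one column per city)
df_t <- read.csv("data/historical-hourly-weather-data/temperature.csv")
# Draw one histogram per city
ggplot(df_t, aes(x = Vancouver)) + geom_histogram()
ggplot(df_t, aes(x = Miami)) + geom_histogram()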

Refer to the complete code at https://goo.gl/tu7t4y.
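The two calls above differ only in the column name, so they can be wrapped in a small helper. The following is a minimal sketch, not the book's code: the function name plot_city_histogram, the binwidth of 1, and the axis labels are assumptions chosen for illustration. Setting binwidth explicitly avoids the message that geom_histogram() prints when it falls back to 30 bins, and na.rm = TRUE drops missing readings silently.

```r
library(ggplot2)

# Hypothetical helper: draw a histogram for one city column of a
# wide data frame (one column per city, as in temperature.csv).
plot_city_histogram <- function(df, city, binwidth = 1) {
  ggplot(df, aes(x = .data[[city]])) +
    geom_histogram(binwidth = binwidth, na.rm = TRUE) +
    labs(x = "Temperature", y = "Number of hourly readings", title = city)
}

# Usage, assuming the data file from the activity is present
path <- "data/historical-hourly-weather-data/temperature.csv"
if (file.exists(path)) {
  df_t <- read.csv(path)
  print(plot_city_histogram(df_t, "Vancouver"))
  print(plot_city_histogram(df_t, "Miami"))
}
```

Using the same binwidth for both cities keeps the bar heights comparable between the two plots.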

Take a look at the following output histogram for Vancouver:

From the preceding plot, we can determine the following information:

Vancouver's most frequent temperature (the peak of the histogram) is around 280.
Temperatures range between about 260 and 300.
The distribution is right-skewed.

Take a look at the following output histogram for Miami:

From the preceding plot, we can determine the following information:

Miami's most frequent temperature (the peak of the histogram) is around 300.
Temperatures range between about 280 and 308.
The distribution is left-skewed.
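The peak, range, and skew read off a histogram can also be checked numerically. The sketch below uses base R only; the function name describe_distribution and the binwidth of 1 are assumptions for illustration, and skewness is computed as the standardized third central moment (positive for a right-skewed sample, negative for a left-skewed one). The small vector at the end is synthetic and is there only to show the call.

```r
# Hypothetical helper: summarize a numeric vector the way one would
# read a histogram (range, tallest bin, direction of skew).
describe_distribution <- function(x, binwidth = 1) {
  x <- x[!is.na(x)]
  # Bin edges that cover the full range of the data
  breaks <- seq(floor(min(x)), ceiling(max(x)) + binwidth, by = binwidth)
  h <- hist(x, breaks = breaks, plot = FALSE)
  m <- mean(x)
  list(
    min = min(x),
    max = max(x),
    peak = h$mids[which.max(h$counts)],  # midpoint of the tallest bin
    skewness = mean((x - m)^3) / mean((x - m)^2)^1.5
  )
}

# Synthetic example with a long tail to the right
example <- c(0.5, 1.5, 1.5, 2.5, 2.5, 2.5, 9.5)
str(describe_distribution(example))
```

With the activity's data loaded as df_t, the same call would be describe_distribution(df_t$Vancouver) or describe_distribution(df_t$Miami).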

Differences

Miami's temperature distribution is left-skewed, while Vancouver's is right-skewed.
Miami is warmer overall: its histogram peaks at around 300, compared with around 280 for Vancouver, and its highest readings are also higher.
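These differences are easier to see when both cities share one set of axes. The following is a minimal sketch under assumptions: the function name compare_city_histograms, the binwidth, and the transparency value are illustrative choices, not part of the book's solution. It reshapes the two city columns into long form with base R and overlays semi-transparent histograms.

```r
library(ggplot2)

# Hypothetical helper: overlay the histograms of several city columns.
compare_city_histograms <- function(df, cities = c("Vancouver", "Miami"),
                                    binwidth = 1) {
  # Wide to long: one row per (city, reading) pair
  long <- data.frame(
    city = rep(cities, each = nrow(df)),
    temperature = unlist(df[cities], use.names = FALSE)
  )
  ggplot(long, aes(x = temperature, fill = city)) +
    # position = "identity" overlays the bars instead of stacking them
    geom_histogram(binwidth = binwidth, alpha = 0.5,
                   position = "identity", na.rm = TRUE) +
    labs(x = "Temperature", y = "Number of hourly readings", fill = "City")
}

# Usage, assuming the data file from the activity is present
path <- "data/historical-hourly-weather-data/temperature.csv"
if (file.exists(path)) {
  print(compare_city_histograms(read.csv(path)))
}
```

Overlaying rather than stacking keeps each city's bar heights readable on their own, at the cost of some visual clutter where the two distributions overlap.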