Activity: Creating a Histogram and Explaining its Features
Scenario
Histograms are useful when you want to find the peak and spread of a distribution. For example, a company might want to see how the ages of its clients are distributed. A two-dimensional distribution can show relationships between variables; for example, a scatter plot can relate the incomes and ages of credit card holders.
Aim
To create and analyze histograms for the given dataset.
Prerequisites
You should be able to use ggplot2 to create a histogram.
Start from the empty template code, in which the required libraries are already loaded, and write your code there.
Steps for Completion
- Use the template code and load the required datasets.
- Create the histogram for two cities.
- Analyze and compare the two histograms to determine how they differ.
Outcome
Two histograms should be created and compared. The complete code is as follows:
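# The template already loads ggplot2; otherwise, run library(ggplot2) first.
df_t <- read.csv("data/historical-hourly-weather-data/temperature.csv")
# One histogram per city; each column holds that city's temperature readings.
ggplot(df_t, aes(x = Vancouver)) + geom_histogram()
ggplot(df_t, aes(x = Miami)) + geom_histogram()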
Refer to the complete code at https://goo.gl/tu7t4y.
Take a look at the following output histogram for Vancouver:
From the preceding plot, we can determine the following information:
- Vancouver's most common temperature (the peak of the histogram) is around 280 K; the dataset records temperatures in kelvin.
- Temperatures range between about 260 K and 300 K.
- It's a right-skewed distribution.
Take a look at the following output histogram for Miami:
From the preceding plot, we can determine the following information:
- Miami's most common temperature (the peak of the histogram) is around 300 K.
- Temperatures range between about 280 K and 308 K.
- It's a left-skewed distribution.
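The features read off the two histograms (peak, range, and direction of skew) can also be checked numerically. The following is a minimal sketch in base R and is not part of the activity's code: describe_distribution is a hypothetical helper, and the commented calls assume df_t has been loaded as shown earlier. Its bins come from hist(), so the reported peak may differ slightly from the tallest bar that geom_histogram() draws.

```r
# Hypothetical helper (not part of the activity's code): summarize a numeric
# vector with the same three features read off a histogram.
describe_distribution <- function(x, bins = 30) {
  x <- x[!is.na(x)]                        # drop missing readings, if any
  h <- hist(x, breaks = bins, plot = FALSE) # 'breaks' is only a suggested bin count
  peak <- h$mids[which.max(h$counts)]      # midpoint of the tallest bin
  # Moment-based sample skewness:
  # negative means left-skewed, positive means right-skewed.
  centered <- x - mean(x)
  skew <- mean(centered^3) / mean(centered^2)^1.5
  list(peak = peak, range = range(x), skewness = skew)
}

# Assumes df_t was loaded as in the activity code:
# describe_distribution(df_t$Vancouver)
# describe_distribution(df_t$Miami)
```

A skewness value close to zero indicates a roughly symmetric distribution, so the sign is most informative when its magnitude is clearly away from zero.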
Differences
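- Miami's temperature distribution is skewed to the left, while Vancouver's is skewed to the right.
- Miami is warmer overall: its histogram peaks at around 300 K, compared with around 280 K for Vancouver, and its highest temperatures are also higher.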
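These differences are easier to see when both cities share one x-axis. The following is a sketch of one possible way to do that with ggplot2 facets; it is not the activity's solution. The helper name compare_cities, the 1-degree bin width, and the axis labels are illustrative assumptions, and the commented call assumes df_t has been loaded as shown earlier.

```r
library(ggplot2)

# Hypothetical helper: stack the chosen city columns into long format and
# draw one histogram panel per city on a shared x-axis.
compare_cities <- function(df, cities = c("Vancouver", "Miami"), binwidth = 1) {
  long <- stack(df[cities])                  # columns: values, ind (city name)
  names(long) <- c("temperature", "city")
  ggplot(long, aes(x = temperature)) +
    geom_histogram(binwidth = binwidth, na.rm = TRUE) +  # silently drop NAs
    facet_wrap(~ city, ncol = 1) +           # scales are fixed (shared) by default
    labs(x = "Temperature", y = "Number of readings")
}

# Assumes df_t was loaded as in the activity code:
# compare_cities(df_t)
```

Stacking the panels vertically keeps the temperature axis aligned, so a shift in the peak or a longer tail on one side shows up directly as a horizontal offset between the two panels.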