Hands-On Mathematics for Deep Learning

Classical probability

Suppose we have a random variable that maps the outcomes of random experiments to the properties that interest us. The probability distribution of that random variable describes how likely each value, or set of values, is to occur. Probability distributions are the foundation of the concepts we will study in this chapter.

There are three ideas that are of great importance in probability theory—probability space, random variables, and probability distribution. Let's start by defining some of the more basic, yet important, concepts.

The sample space is the set of all possible outcomes. We denote it by Ω. Suppose there are n possible outcomes; then we have $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$, where $\omega_i$ is a possible outcome. A subset of the sample space Ω is called an event.

Probability has a lot to do with sets, so let's go through some of the notation so that we can get a better grasp of the concepts and examples to come.

Suppose we have two events, A, B ⊆ Ω. We use the following notation:

  • The complement of A is $A^C$, so $A^C = \Omega \setminus A$.
  • If either A or B occurs, this is written as A ∪ B (read as A union B).
  • If both A and B occur, this is written as A ∩ B (read as A intersect B).
  • If A and B are mutually exclusive (or disjoint), then we write $A \cap B = \emptyset$.
  • If the occurrence of A implies the occurrence of B, this is written as A ⊆ B (so, $\omega \in A \Rightarrow \omega \in B$).
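The notation above maps directly onto Python's built-in set type. The following is a minimal sketch, assuming a single roll of a six-sided die as the experiment; the die and the events `A`, `B`, and `C` are illustrative choices, not taken from the text.

```python
# Hypothetical experiment: one roll of a six-sided die.
omega = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}        # event: the roll is even
B = {2, 3, 4, 5, 6}  # event: the roll is greater than 1
C = {1, 3, 5}        # event: the roll is odd

complement_A = omega - A    # A^C, every outcome not in A
union = A | B               # A ∪ B, either A or B occurs
intersection = A & B        # A ∩ B, both A and B occur
disjoint = A.isdisjoint(C)  # True when A ∩ C is the empty set
implies = A <= B            # True when A ⊆ B

print(complement_A, union, intersection, disjoint, implies)
```

Here every even roll is also greater than 1, so A ⊆ B holds, while an even roll can never be odd, so A and C are disjoint.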

Say we have an event, A ⊆ Ω, in a finite sample space whose outcomes are all equally likely. In this case, the probability of A occurring is defined as follows:

$$P(A) = \frac{|A|}{|\Omega|}$$

This is the number of outcomes in A divided by the total number of possible outcomes in the sample space.

Let's go through a simple example of flipping a coin. Here, the sample space consists of all the possible outcomes of flipping the coin. Say we are dealing with two coin tosses instead of one and h means heads and t means tails. So, the sample space is Ω = {hh, ht, th, tt}. 
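As a rough illustration of the two-toss example, the sketch below enumerates the sample space and computes a classical probability as a ratio of counts. It assumes a fair coin, so all four outcomes are equally likely; the helper name `classical_probability` and the chosen event are hypothetical.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """Return |event| / |sample_space|, assuming equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# Build the sample space {hh, ht, th, tt} by enumerating two tosses.
omega = {"".join(tosses) for tosses in product("ht", repeat=2)}

# Illustrative event: at least one of the two tosses is heads.
at_least_one_head = {w for w in omega if "h" in w}

p = classical_probability(at_least_one_head, omega)
print(p)  # 3/4
```

Using `Fraction` instead of floating-point division keeps the result exact, which makes it easier to compare against hand calculations.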

The events to which we can assign a probability make up the event space, $\mathcal{F}$, which is a collection of subsets of Ω. On finishing the experiment, we observe whether the outcome, ω ∈ Ω, is in A.

For each event, $A \in \mathcal{F}$, we denote by P(A) the probability that the event will happen, and we read P(A) as the probability of A occurring.

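Continuing on from the previous notation, the probability measure $P$, defined on the event space $\mathcal{F}$, must satisfy the following axioms:

  • $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$.
  • $P(\Omega) = 1$.
  • If the events $A_1, A_2, \ldots$ are disjoint, that is, $A_i \cap A_j = \emptyset$ for all $i \ne j$, then $P$ is countably additive: $P\left(\bigcup_{i} A_i\right) = \sum_{i} P(A_i)$.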

The triple $(\Omega, \mathcal{F}, P)$, made up of the sample space, the event space, and the probability measure, is known as the probability space.
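To make the idea of a probability space concrete, here is a small sketch that builds one for the two-toss experiment and checks the usual axioms by brute force: probabilities lie between 0 and 1, the whole sample space has probability 1, and probabilities of disjoint events add. It assumes a fair coin and takes the event space to be every subset of the sample space; the names `power_set`, `P`, and `F` are illustrative. On a finite space, countable additivity reduces to the pairwise check shown here.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({"hh", "ht", "th", "tt"})


def power_set(s):
    """Return all subsets of s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(sub) for sub in subsets}


def P(event):
    """Uniform probability measure on omega (assumes a fair coin)."""
    return Fraction(len(event), len(omega))


F = power_set(omega)  # event space: all 2^4 = 16 subsets of omega

bounded = all(0 <= P(A) <= 1 for A in F)
normalized = P(omega) == 1
additive = all(
    P(A | B) == P(A) + P(B)
    for A in F
    for B in F
    if A.isdisjoint(B)
)

print(len(F), bounded, normalized, additive)  # 16 True True True
```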

As a rule of thumb, when $P(A) = 1$, event A happens almost surely, and when $P(A) = 0$, event A happens almost never.

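Using the preceding axioms, and the fact that A and $A^C$ are disjoint with union Ω, we can derive the following:

$$1 = P(\Omega) = P(A \cup A^C) = P(A) + P(A^C)$$

So, $P(A^C) = 1 - P(A)$.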

Additionally, if we have two events, A and B, then we can deduce the following:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
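Two standard consequences of the probability axioms are the complement rule, P(A^C) = 1 − P(A), and the inclusion-exclusion rule for two events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). The sketch below checks both numerically on the two-toss sample space, assuming a fair coin; the events `A` and `B` are illustrative choices.

```python
from fractions import Fraction

omega = {"hh", "ht", "th", "tt"}


def P(event):
    """Classical probability on omega (assumes equally likely outcomes)."""
    return Fraction(len(event), len(omega))


A = {"hh", "ht"}  # event: the first toss is heads
B = {"hh", "th"}  # event: the second toss is heads

# Complement rule: P(A^C) = 1 - P(A)
complement_rule_holds = P(omega - A) == 1 - P(A)

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)

print(complement_rule_holds, lhs, rhs)  # True 3/4 3/4
```

Subtracting P(A ∩ B) corrects for the outcome hh, which would otherwise be counted once in P(A) and again in P(B).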

Continuing on from the preceding axioms, $P$ must satisfy the following:

To find the probability of anything, we usually have to count things. Let's say we have a bucket filled with tennis balls and we pick a ball from the bucket r times, so there are $n_1$ possibilities for the first pick, $n_2$ for the next pick, and so on. The total number of choices ends up being $n_1 \times n_2 \times \cdots \times n_r$.
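The multiplication rule described above can be checked by comparing the product of the counts with a full enumeration. This is a minimal sketch with made-up numbers: three picks with 3, 2, and 4 options respectively. The option labels are hypothetical and only serve to make the enumeration concrete.

```python
from itertools import product
from math import prod

# Hypothetical options for r = 3 picks, with n_1 = 3, n_2 = 2, n_3 = 4.
options_per_pick = [
    ["red", "green", "blue"],
    ["small", "large"],
    ["a", "b", "c", "d"],
]

# Multiplication rule: n_1 * n_2 * n_3
predicted = prod(len(options) for options in options_per_pick)

# Brute-force check: enumerate every possible sequence of picks.
enumerated = sum(1 for _ in product(*options_per_pick))

print(predicted, enumerated)  # 24 24
```

Enumeration is only practical for small cases, since the number of sequences grows multiplicatively with every extra pick, which is exactly why the counting rule is useful.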