Let \(A\), \(B\), \(A_1, A_2, \ldots, A_n\) be events.

**Union Bound:** \(\Pr[A \cup B] \leq \Pr[A] + \Pr[B]\).

**Conditional Probability:** If \(\Pr[A] \neq 0\), the conditional probability of \(B\) given \(A\) is defined to be \[\Pr[B \; | \;A]=\frac{\Pr[B\cap A]}{\Pr[A]}.\]

**Chain Rule:** \[\begin{aligned} & \Pr[A_1 \cap \cdots \cap A_n] = \\ & \qquad \Pr[A_1] \cdot \Pr[A_2 \; | \;A_1] \cdot \Pr[A_3 \; | \;A_1 \cap A_2] \cdots \Pr[A_n \; | \;A_1 \cap A_2 \cap \cdots \cap A_{n-1}]. \end{aligned}\]

**Expected Value of a Random Variable:** \[\mathbb E[\mathbf{X}] = \sum_{\ell \in \Omega} \Pr[\ell]\cdot \mathbf{X}(\ell) = \sum_{x \in \text{range}(\mathbf{X})} \Pr[\mathbf{X}=x]\cdot x.\]

**Linearity of Expectation:** \[\mathbb E[c_1\mathbf{X} + c_2\mathbf{Y}] = c_1\mathbb E[\mathbf{X}] + c_2\mathbb E[\mathbf{Y}].\]

**Markov’s Inequality:** If \(\mathbf{X}\) is a non-negative random variable and \(c > 0\), then \[\Pr[\mathbf{X} \geq c \cdot \mathbb E[\mathbf{X}]] \leq \frac{1}{c}.\]
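These definitions can be illustrated numerically. The sketch below (in Python, with exact fractions; a fair \(6\)-sided die is an assumption chosen just for illustration) computes an expectation from the definition and checks one instance of Markov's inequality.

```python
# Illustration of the definitions above, using a fair 6-sided die as X.
from fractions import Fraction as F

omega = range(1, 7)                  # sample space: die faces 1..6
pr = {x: F(1, 6) for x in omega}     # uniform probabilities
EX = sum(pr[x] * x for x in omega)   # E[X] by the first formula
assert EX == F(7, 2)

# Markov's inequality: Pr[X >= c * E[X]] <= 1/c.
# With c = 12/7 the threshold c * E[X] is exactly 6.
c = F(12, 7)
lhs = sum(pr[x] for x in omega if x >= c * EX)
assert lhs <= 1 / c                  # here 1/6 <= 7/12
```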

I flip a coin which comes up heads with probability \(1/3\) and tails with probability \(2/3\). If it comes up heads, I roll a \(3\)-sided die and a \(4\)-sided die. If it comes up tails, I roll two \(3\)-sided dice. Define the following events: \[A = \text{coin comes up tails}, \quad B = \text{dice sum to 6}, \quad C = \text{dice come up equal}.\] Compute \[\Pr[B], \quad \Pr[C], \quad \Pr[B \cup C], \quad \Pr[B \; | \;A], \quad \Pr[A \; | \;B], \quad \Pr[C \; | \;A], \quad \Pr[A \; | \;C].\]

Here is the probability tree associated with the random experiment described in the problem.

(We are showing part of the tree due to space constraints.)

**\(\Pr[B]\):** The event consists of the outcomes \((H,3,3), (H,2,4)\) and \((T,3,3)\). The first two outcomes each have probability \(\frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{4}\) and the last occurs with probability \(\frac{2}{3} \cdot \frac{1}{3} \cdot \frac{1}{3}\). In total, \(\Pr[B] = 7/54\).

**\(\Pr[C]\):** The event consists of the outcomes \((H,1,1), (H,2,2), (H,3,3)\) and \((T,1,1)\), \((T,2,2)\), \((T,3,3)\). The first three outcomes each occur with probability \(\frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{4}\) and the last three each occur with probability \(\frac{2}{3} \cdot \frac{1}{3} \cdot \frac{1}{3}\). In total, \(\Pr[C] = 11/36\).

**\(\Pr[B \cup C ]\):** The only outcome in \(B \setminus C\) is \((H,2,4)\). Since this outcome occurs with probability \(\frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{4}\), we conclude that \(\Pr[B \cup C ] = \Pr[C] + \Pr[\{(H,2,4)\}] = \frac{11}{36} + \frac{1}{36} = \frac{1}{3}\).

**\(\Pr[B \; | \;A]\) and \(\Pr[A \; | \;B]\):** The only outcome that belongs to \(A \cap B\) is \((T,3,3)\). Therefore, \(\Pr[A \cap B] = \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{1}{3} = 2/27\). Using the definition of conditional probability and \(\Pr[A] = 2/3\), we have \(\Pr[B \; | \;A] = \Pr[A \cap B] / \Pr[A] = 1/9\). Similarly, \(\Pr[A \; | \;B] = \Pr[A \cap B] / \Pr[B] = 4/7\).

**\(\Pr[C \; | \;A]\) and \(\Pr[A \; | \;C]\):** Note that \(A \cap C = \{(T,1,1), (T,2,2), (T,3,3)\}\), so \(\Pr[A \cap C] = \frac{6}{27} = \frac{2}{9}\). Using the definition of conditional probability, \(\Pr[A \; | \;C] = \Pr[A \cap C] / \Pr[C] = 8/11\) and \(\Pr[C \; | \;A ] = \Pr[A \cap C ] / \Pr[A] = 1/3\).
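As a sanity check (not part of the required solution), all of these probabilities can be verified by enumerating the outcome tree with exact fractions; a Python sketch:

```python
# Enumerate every leaf of the probability tree with its exact probability.
from fractions import Fraction as F

outcomes = {}  # (coin, die1, die2) -> probability
for d1 in range(1, 4):
    for d2 in range(1, 5):          # heads: 3-sided and 4-sided dice
        outcomes[("H", d1, d2)] = F(1, 3) * F(1, 3) * F(1, 4)
for d1 in range(1, 4):
    for d2 in range(1, 4):          # tails: two 3-sided dice
        outcomes[("T", d1, d2)] = F(2, 3) * F(1, 3) * F(1, 3)

def pr(event):
    return sum(p for o, p in outcomes.items() if event(o))

A = lambda o: o[0] == "T"           # coin comes up tails
B = lambda o: o[1] + o[2] == 6      # dice sum to 6
C = lambda o: o[1] == o[2]          # dice come up equal

assert pr(B) == F(7, 54)
assert pr(C) == F(11, 36)
assert pr(lambda o: B(o) or C(o)) == F(1, 3)
assert pr(lambda o: A(o) and B(o)) / pr(A) == F(1, 9)   # Pr[B | A]
assert pr(lambda o: A(o) and B(o)) / pr(B) == F(4, 7)   # Pr[A | B]
assert pr(lambda o: A(o) and C(o)) / pr(A) == F(1, 3)   # Pr[C | A]
assert pr(lambda o: A(o) and C(o)) / pr(C) == F(8, 11)  # Pr[A | C]
```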

Méré is making blinis for his guests at the salon. Unfortunately, his not-so-trusty griddle is not feeling very well today and randomly burns blinis with probability \(\frac{1}{3}\). In addition, thanks to Méré’s impressive blini-making skills, with a functioning griddle he burns a blini with probability only \(\frac{1}{4}\).

You may assume that he does not get demoralized if he burns a blini (i.e., he does not burn other blinis with higher or lower probability as a result).

What is the probability that a blini is burnt?

Suppose Méré made three blinis. Given that at least one blini burns, what is the probability that they all burn?

Prove Bayes’ rule, that for any two events \(A, B\), we have \[\Pr[A \; | \;B] = \frac{ \Pr[ B \; | \;A ] \cdot \Pr[A]} {\Pr[B \; | \;A] \cdot \Pr[A] + \Pr[B \; | \;\overline{A}] \cdot \Pr[\overline{A}]}.\]

Pascal joins Méré at his salon with a new griddle. Pascal’s blini-making skills are even more impressive: he burns a blini with probability only \(\frac{1}{5}\), and he makes blinis twice as fast! They make blinis on their individual griddles and collate the blinis in a single pile. Fermat comes along and grabs a burnt blini off the pile. What is the probability that Pascal burnt the blini?

Let \(G\) denote the event that the griddle doesn’t burn the blini. Let \(M\) denote the event that Méré doesn’t burn the blini. Then the problem statement tells us that \(\Pr[G] = 2/3\) and \(\Pr[M \; | \;G] = 3/4\). \[\begin{aligned} &~\Pr[\text{blini is burnt}]\\ =&~1 - \Pr[\text{blini is not burnt}]\\ =&~1 - \Pr[G \cap M]\\ =&~1 - \Pr[G] \cdot \Pr[M \; | \;G]\\ =&~1 - \frac{2}{3} \cdot \frac{3}{4}\\ =&~\frac{1}{2}. \end{aligned}\]
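The chain-rule computation above can be reproduced with exact fractions; a quick Python sketch (not part of the solution):

```python
# Check: Pr[burnt] = 1 - Pr[G] * Pr[M | G] = 1 - (2/3)(3/4) = 1/2.
from fractions import Fraction as F

pr_G = F(2, 3)           # griddle doesn't burn the blini
pr_M_given_G = F(3, 4)   # Méré doesn't burn it, given a working griddle
pr_burnt = 1 - pr_G * pr_M_given_G
assert pr_burnt == F(1, 2)
```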

Implicit in the problem description is the following: whether a blini is burnt by the griddle or not is not affected by whether previous blinis made on the griddle were burnt.

There are \(2^3=8\) possible outcomes, each of which happen with probability \(\left(\frac{1}{2}\right)^3 = \frac{1}{8}\). Of those outcomes, \(7\) of them result in at least one blini burning, and of those, one of them results in all the blinis burning. Thus, the probability we are looking for is \[\frac{1\cdot\frac{1}{8}}{7\cdot\frac{1}{8}} = \frac{1}{7}.\]
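The same count can be checked by listing the eight outcomes explicitly; a Python sketch:

```python
# Enumerate the 2^3 = 8 equally likely outcomes (1 = burnt, each w.p. 1/2)
# and confirm the conditional probability 1/7.
from itertools import product
from fractions import Fraction as F

outcomes = list(product([0, 1], repeat=3))
at_least_one = [o for o in outcomes if sum(o) >= 1]   # 7 outcomes
all_burn = [o for o in outcomes if sum(o) == 3]       # 1 outcome
assert F(len(all_burn), len(at_least_one)) == F(1, 7)
```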

This follows almost immediately from two definitions. The law of total probability says that the denominator on the right equals \(\Pr[B]\), and the numerator is \(\Pr[B \cap A]\). So we want to show that \(\Pr[A \; | \;B] = \frac{ \Pr[ B \cap A ]}{\Pr[B]}\), which is exactly the definition of conditional probability.

Let \(A\) be the event that Méré made the blini, and let \(B\) be the event that the blini burns. We want to compute \(\Pr[\overline{A} \; | \;B]\). We have \[\Pr[A] = \frac{1}{3}, \quad \Pr[\overline{A}] = \frac{2}{3}, \quad \Pr[B \; | \;A] = \frac{1}{2}, \quad \Pr[B \; | \;\overline{A}] = \frac{1}{5}.\] By Bayes’ Theorem, \[\Pr[\overline{A} \; | \;B] = \frac{\Pr[B \; | \;\overline{A}] \cdot \Pr[\overline{A}]}{\Pr[B \; | \;A] \cdot \Pr[A] + \Pr[B \; | \;\overline{A}] \cdot \Pr[\overline{A}]} = \frac{\frac{1}{5}\cdot\frac{2}{3}}{\frac{1}{2}\cdot\frac{1}{3} + \frac{1}{5}\cdot\frac{2}{3}} = \frac{4}{9}.\]
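The arithmetic in the Bayes computation can be double-checked with exact fractions; a Python sketch:

```python
# Bayes' rule with the numbers from the solution above.
from fractions import Fraction as F

pr_A = F(1, 3)             # Méré made the blini (Pascal is twice as fast)
pr_not_A = F(2, 3)         # Pascal made the blini
pr_B_given_A = F(1, 2)     # Méré burns a blini (from the first part)
pr_B_given_not_A = F(1, 5) # Pascal burns a blini

num = pr_B_given_not_A * pr_not_A
den = pr_B_given_A * pr_A + num
assert num / den == F(4, 9)
```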

We have an \(m\times n\) grid of squares (\(m,n\geq2\)). Each square is randomly colored black or white with probability \(\frac{1}{2}\) each, independently for all squares. (An example outcome with \(m = 3, n = 5\) is shown below.)

Define a *pane* to be a \(2 \times 2\) subgrid. Let \(\mathbf{X}\) be the number of panes whose four squares all have the same color. (These panes may overlap. E.g., above \(\mathbf{X}=4\) because there are 3 all-white panes and 1 all-black pane.) Determine (with proof) a formula for \(\mathbb E[\mathbf{X}]\) in terms of \(m\) and \(n\).

We will write \(\mathbf{X}\) as a sum of indicator random variables and then apply linearity of expectation to compute \(\mathbb E[\mathbf{X}]\).

For all \(1 \leq i \leq m-1\), \(1 \leq j \leq n-1\), define the following indicator random variables. \[\mathbf{I}_{i,j} = \begin{cases} 1 & \text{if the $2\times2$ pane with the $(i,j)$-th square in the top left corner is monochromatic,}\\ 0 & \text{otherwise.} \end{cases}\] Then observe that \(\mathbf{X} = \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \mathbf{I}_{i,j}\). Therefore by linearity of expectation, \[\mathbb E[\mathbf{X}] = \mathbb E\left[\sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \mathbf{I}_{i,j} \right] = \sum_{i=1}^{m-1}\sum_{j=1}^{n-1} \mathbb E[\mathbf{I}_{i,j}].\] The expectation of an indicator random variable is equal to the probability that the random variable is 1. Therefore, \[\begin{aligned} \mathbb E[\mathbf{I}_{i,j}] &= \Pr[\text{all four squares are black}] + \Pr[\text{all four squares are white}]\\ &= \left(\frac{1}{2}\right)^4 + \left(\frac{1}{2}\right)^4 \\ &= \frac{1}{8}.\end{aligned}\] We can plug this into our expression for \(\mathbb E[\mathbf{X}]\) above and conclude that \(\mathbb E[\mathbf{X}] = \frac{(m-1)(n-1)}{8}\).
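The formula \(\mathbb E[\mathbf{X}] = \frac{(m-1)(n-1)}{8}\) can be checked by brute force on small grids, enumerating all \(2^{mn}\) colorings exactly; a Python sketch:

```python
# Exact E[X] for a small m x n grid by enumerating every coloring.
from itertools import product
from fractions import Fraction as F

def expected_panes(m, n):
    total = F(0)
    for coloring in product([0, 1], repeat=m * n):
        grid = [coloring[i * n:(i + 1) * n] for i in range(m)]
        # count monochromatic 2x2 panes in this coloring
        x = sum(
            1
            for i in range(m - 1)
            for j in range(n - 1)
            if grid[i][j] == grid[i][j + 1] == grid[i + 1][j] == grid[i + 1][j + 1]
        )
        total += x
    return total / 2 ** (m * n)   # all colorings are equally likely

assert expected_panes(2, 2) == F(1, 8)    # (2-1)(2-1)/8
assert expected_panes(2, 3) == F(2, 8)    # (2-1)(3-1)/8
assert expected_panes(3, 3) == F(4, 8)    # (3-1)(3-1)/8
```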

Alice flips a fair coin \(n+1\) times and Bob flips a fair coin \(n\) times. What is the probability that Alice’s coin comes up heads strictly more than Bob’s does?

Let \(A\) and \(B\) be the number of heads Alice and Bob get, respectively. Let \(A'\) be the number of heads Alice gets on her first \(n\) flips, and let \(L\) be the event that Alice’s last flip lands heads. By the law of total probability, \[\begin{aligned} \Pr[A>B] & = \Pr[A'\geq B \; | \;L] \cdot \frac{1}{2} + \Pr[A'>B \; | \;\overline{L}] \cdot \frac{1}{2}\\ & = \frac{1}{2}\left(\Pr[A'\geq B] + \Pr[A'>B]\right) \tag{$A'$ and $B$ are independent of $L$}\\ & = \frac{1}{2}\left(\Pr[A'\geq B] + \Pr[A'<B]\right) \tag{by symmetry}\\ & = \frac{1}{2}\cdot1 = \frac{1}{2}.\end{aligned}\]
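The answer \(1/2\) can also be verified exactly for small \(n\) using binomial probabilities; a Python sketch:

```python
# Exact Pr[A > B] where A ~ Bin(n+1, 1/2) and B ~ Bin(n, 1/2), independent.
from math import comb
from fractions import Fraction as F

def pr_alice_wins(n):
    pa = [F(comb(n + 1, a), 2 ** (n + 1)) for a in range(n + 2)]
    pb = [F(comb(n, b), 2 ** n) for b in range(n + 1)]
    return sum(pa[a] * pb[b]
               for a in range(n + 2) for b in range(n + 1) if a > b)

for n in range(1, 6):
    assert pr_alice_wins(n) == F(1, 2)
```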

Méré rolls a \(7\)-sided die until the number \(7\) comes up. He claims that he is more likely to stop after an odd number of rolls than an even number of rolls, because “\(7\) is odd”. Is Méré’s dubious reasoning actually correct?

The probability of the game ending after exactly \(r\) rolls is \(\frac{1}{7} \cdot \left(\frac{6}{7}\right)^{r-1}\) (a geometric random variable). Let \(A\) be the event that the game ends after an odd number of rolls. Then adding up the probabilities of all the outcomes in \(A\), we have \(\Pr[A] = \frac{1}{7} \sum_{i=0}^\infty \left(\frac{6}{7}\right)^{2i}\). Similarly, let \(B\) be the event that the game ends after an even number of rolls and note that \(\Pr[B] = \frac{1}{7} \sum_{i=0}^\infty \left(\frac{6}{7}\right)^{2i+1}\). Now subtract the two and see that \[\Pr[A] - \Pr[B] = \frac{1}{7} \sum_{i=0}^\infty \left[\left(\frac{6}{7}\right)^{2i} - \left(\frac{6}{7}\right)^{2i+1}\right].\] Each term in the sum is positive, so the whole thing is strictly positive, meaning that \(A\) is more likely than \(B\) after all. (Summing the geometric series explicitly gives \(\Pr[A] = 7/13\) and \(\Pr[B] = 6/13\).)
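The two geometric series can be summed in closed form and checked with exact fractions; a Python sketch:

```python
# Pr[A] = p * sum q^(2i) = p / (1 - q^2), Pr[B] = p*q / (1 - q^2),
# with p = 1/7 (roll a 7) and q = 6/7 (don't).
from fractions import Fraction as F

p, q = F(1, 7), F(6, 7)
pr_A = p / (1 - q ** 2)       # game ends after an odd number of rolls
pr_B = p * q / (1 - q ** 2)   # game ends after an even number of rolls
assert pr_A == F(7, 13)
assert pr_B == F(6, 13)
assert pr_A + pr_B == 1       # the game ends with probability 1
assert pr_A > pr_B            # odd is indeed more likely
```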

Recall that a permutation of the set \(\{1,2,\ldots, n\}\) is a bijection \(\pi: \{1,2,\ldots, n\} \to \{1,2,\ldots, n\}\). Write pseudocode that takes as input a number \(n\) and outputs a uniformly random permutation of \(\{1, 2, \ldots, n\}\). Your algorithm should run in \(O(n)\) time. Your algorithm can use RandInt(\(m\)) (for \(m \leq n\)), which returns a uniformly random element of \(\{1,2,\ldots,m\}\), and Bernoulli(\(p\)), which returns \(1\) with probability \(p\) and \(0\) with probability \(1-p\). We assume calls to these functions take \(O(1)\) time, and that accessing an index of an array takes \(O(1)\) time. Prove that your algorithm outputs each possible permutation with probability \(1/n!\).

Here is the algorithm.

```
Input: n
- Create array A[1..n]
- Initialize A[i] = i for all i in {1,...,n}.
- For i = 1 to n:
    - k = RandInt(n - (i - 1)) + (i - 1)
    - Swap(A[i], A[k])
- Return A
```
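For concreteness, here is one way the pseudocode might be rendered in Python (a sketch: the 0-indexed list and `random.randint` stand in for the 1-indexed array and the RandInt primitive). For \(n=3\) we can also enumerate all \(3 \cdot 2 \cdot 1 = 6\) equally likely choice sequences and check that each produces a distinct permutation, which is the uniformity claim proved below.

```python
# Python rendering of the pseudocode (0-indexed).
import random
from itertools import product

def random_permutation(n):
    a = list(range(1, n + 1))
    for i in range(n):
        # k uniform in {i, ..., n-1}: mirrors k = RandInt(n-(i-1)) + (i-1)
        k = random.randint(i, n - 1)
        a[i], a[k] = a[k], a[i]
    return a

def permutation_from_choices(n, choices):
    # Same algorithm, but with the random choices supplied explicitly.
    a = list(range(1, n + 1))
    for i in range(n):
        k = choices[i]               # choices[i] is in {i, ..., n-1}
        a[i], a[k] = a[k], a[i]
    return tuple(a)

# For n = 3 there are 3*2*1 = 6 equally likely choice sequences;
# each one should yield a different permutation.
n = 3
seqs = product(*[range(i, n) for i in range(n)])
perms = {permutation_from_choices(n, s) for s in seqs}
assert len(perms) == 6

# random_permutation always returns a permutation of {1,...,n}.
assert sorted(random_permutation(5)) == [1, 2, 3, 4, 5]
```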

We claim that the above algorithm produces a permutation uniformly at random. Fix \(\pi\) to be an arbitrary permutation, and let \(E_i\) be the event \[E_i \; = \; (A[1]= \pi(1) ) \; \wedge \; (A[2] = \pi(2)) \; \wedge \; \dots \; \wedge \; (A[i]=\pi(i)).\]

**Claim:** At the end of the \(i\)’th iteration, \[\Pr[E_i] = \frac{1}{n} \cdot \frac{1}{n-1} \dots \frac{1}{n-i+1}. \quad (*)\]

We now prove this claim by induction on \(i\).

*Base Case:* At the end of the first iteration, \(A[1]\) is simply a uniformly random element from \(\{1, \dots, n\}\). Therefore \(\Pr[E_1] = 1/n\) as desired.

*Induction Step:* Suppose the claim holds at the end of all iterations up to \(i-1\); in particular, \[\Pr[E_{i-1}] = \frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-i+2}.\] Conditioned on the event \(E_{i-1}\), we have \(\Pr[A[i]= \pi(i) \; | \;E_{i-1}] = \frac{1}{n-i+1}\). This is because, conditioned on \(E_{i-1}\), the subarray \(A[i,\dots,n]\) contains exactly the elements \(\{1,\dots, n\} \setminus \{\pi(1),\dots, \pi(i-1)\}\), and \(A[i]\) is chosen uniformly at random from this set of \(n-i+1\) elements, so it equals \(\pi(i)\) with probability \(\frac{1}{n-i+1}\). Thus, we can conclude that \(\Pr[E_{i-1} \wedge (A[i] = \pi(i))] = \Pr[E_{i-1}] \cdot \Pr[ (A[i]=\pi(i)) \; | \;E_{i-1}] = \frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-i+1}\). Hence, \((*)\) holds true after the \(i\)’th iteration.

Since we have shown that \((*)\) holds true after each iteration, at the end of the program, the probability that \(A = \pi\) is exactly \(\frac{1}{n!}\). As \(\pi\) was an arbitrary permutation, the algorithm outputs every permutation with probability \(\frac{1}{n!}\), i.e., uniformly at random.