Text

Overview

Show that 2n>n2^n > n for all integers n1n \geq 1.

(Line numbers are added for easy referencing.)

  1. Fn=F_n =2n>n2^n > n

  2. F1=F_1 =2>12 > 1\checkmark

  3. Fn    Fn+1:F_n \implies F_{n+1}:

  4. 2n+1=22n>2n (induction) n+12^{n+1} = 2 \cdot 2^n > 2\cdot n \text{ (induction) } \geq n+1 because n1n \geq 1

  5. Therefore proved.

  1. Is the basic structure right?

  2. Are you starting the right way?

  3. Did you read your proof out loud?

  4. Is the purpose of every sentence clear?

  5. Are you giving the right level of detail?

  6. Should you break up your proof?

  7. Does your proof type-check?

  8. Are your references clear?

  9. Where are you using the assumptions?

  10. Is the proof idea clear?

  1. Fn=F_n =2n>n2^n > n

  2. F1=F_1 =2>12 > 1\checkmark

  3. Fn    Fn+1:F_n \implies F_{n+1}:

  4. 2n+1=22n>2n (induction) n+12^{n+1} = 2 \cdot 2^n > 2\cdot n \text{ (induction) } \geq n+1 because n1n \geq 1

  5. Therefore proved.

We will prove that for all integers n1n \geq 1, 2n>n2^n > n. The proof is by induction on nn.

Let’s start with the base case which corresponds to n=1n=1. In this case, the inequality 2n>n2^n > n translates to 21>12^1 > 1, which is indeed true.

To carry out the induction step, we want to argue that for all n1n \geq 1, 2n>n2^n > n implies 2n+1>n+12^{n+1} > n+1. We do so now. For an arbitrary n1n \geq 1, assume 2n>n2^n > n. Multiplying both sides of the inequality by 22, we get 2n+1>2n2^{n+1} > 2n. Note that since we are assuming n1n \geq 1, we have 2n=n+nn+12n = n+n \geq n+1. Therefore, we can conclude that 2n+1>n+12^{n+1} > n + 1, as desired.

If you line up any number of dominoes in a row, and knock the first one over, then all the dominoes will fall.

If you line up an infinite row of dominoes, one domino for each natural number, and you knock the first one over, then all the dominoes will fall.

Every natural number greater than 11 can be factored into primes.

  • Set up a proof by contradiction.

  • Let mm be the minimum number such that SmS_m is not true.

  • Show that SkS_k is not true for k<mk < m, which is the desired contradiction.

At any party, at any point in time, define a person’s parity as odd or even according to the number of hands they have shaken. Then the number of people with odd parity must be even.

  • We have a time-varying world state: W0,W1,W2,W_0, W_1, W_2, \ldots.

  • The goal is to prove that some statement SS is true for all world states.

  • Argue:

    • Statement SS is true for W0W_0.

    • If SS is true for WkW_k, then it remains true for Wk+1W_{k+1}.

Fix some finite set of symbols Σ\Sigma. We define a string over Σ\Sigma recursively as follows.

  • Base case: The empty sequence, denoted ϵ\epsilon, is a string.

  • Recursive rule: If xx is a string and aΣa \in \Sigma, then axax is a string.

We define a binary tree recursively as follows.

  • Base case: A single node rr is a binary tree with root rr.

  • Recursive rule: If TT' and TT'' are binary trees with roots r1r_1 and r2r_2, then TT, which has a node rr adjacent to r1r_1 and r2r_2 is a binary tree with root rr.

    image

Let TT be a binary tree. Let LTL_T be the number of leaves in TT and let ITI_T be the number of internal nodes in TT. Then LT=IT+1L_T = I_T + 1.

  • You have a recursively defined set of objects, and you want to prove a statement SS about those objects.

  • Check that SS is true for the base case(s) of the recursive definition.

  • Prove SS holds for “new” objects created by the recursive rule, assuming it holds for “old” objects used in the recursive rule.

We prove statement SS by induction on the height of a tree. Let’s first check the base case... and it is true.

For the induction step, take an arbitrary binary tree TT of height hh. Let TT' be the following tree of height h+1h+1:

image

[Some argument showing why SS is true for TT'] Therefore we are done.

Let LL be a set of binary strings (i.e. strings over Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}) defined recursively as follows:

  • (base case) the empty string, ϵ\epsilon, is in LL;

  • (recursive rule) if x,yLx, y \in L, then 0x1y0L{\texttt{0}}x{\texttt{1}}y{\texttt{0}} \in L.

This means that every string in LL is derived starting from the base case, and applying the recursive rule a finite number of times. Show that for any string wLw \in L, the number of 0{\texttt{0}}’s in ww is exactly twice the number of 1{\texttt{1}}’s in ww.

In an induction argument on recursively defined objects, if you say that you will use structural induction, then the assumption is that the parameter being inducted on is the minimum number of applications of the recursive rules needed to create an object. And in this case, explicitly stating the parameter being inducted on, or the induction hypothesis, is not needed.

An alphabet is a non-empty, finite set, and is usually denoted by Σ\Sigma. The elements of Σ\Sigma are called symbols or characters.

Given an alphabet Σ\Sigma, a string (or word) over Σ\Sigma is a (possibly infinite) sequence of symbols, written as a1a2a3a_1a_2a_3\ldots, where each aiΣa_i \in \Sigma. The string with no symbols is called the empty string and is denoted by ϵ\epsilon.

In our notation of a string, we do not use quotation marks. For instance, we write 1010{\texttt{1010}} rather than “1010{\texttt{1010}}”, even though the latter notation using the quotation marks is the standard one in many programming languages. Occasionally, however, we may use quotation marks to distinguish a string like “1010{\texttt{1010}}” from another type of object with the representation 10101010 (e.g. the binary number 10101010).

The length of a string ww, denoted w|w|, is the number of symbols in ww. If ww has an infinite number of symbols, then the length is undefined.

Let Σ\Sigma be an alphabet. We denote by Σ\Sigma^* the set of all strings over Σ\Sigma consisting of finitely many symbols. Equivalently, using set notation, Σ={a1a2an: nN, and aiΣ for all i}.\Sigma^* = \{a_1a_2\ldots a_n : \text{ $n \in \mathbb N$, and $a_i \in \Sigma$ for all $i$}\}.

We will use the words “string” and “word” to refer to a finite-length string/word. When we want to talk about infinite-length strings, we will explicitly use the word “infinite”.

By Definition (Alphabet, symbol/character), an alphabet Σ\Sigma cannot be the empty set. This implies that Σ\Sigma^* is an infinite set since there are infinitely many strings of finite length over a non-empty Σ\Sigma. We will later see that Σ\Sigma^* is always countably infinite.

The reversal of a string w=a1a2anw = a_1a_2\ldots a_n, denoted wRw^R, is the string wR=anan1a1w^R = a_na_{n-1}\ldots a_1.

The concatenation of strings uu and vv in Σ\Sigma^*, denoted by uvuv or uvu \cdot v, is the string obtained by joining together uu and vv.

For nNn \in \mathbb N, the nn’th power of a string uu, denoted by unu^n, is the word obtained by concatenating uu with itself nn times.

We say that a string uu is a substring of string ww if w=xuyw = xuy for some strings xx and yy.

For strings xx and yy, we say that yy is a prefix of xx if there exists a string zz such that x=yzx = yz. If zϵz \neq \epsilon, then yy is a proper prefix of xx.

We say that yy is a suffix of xx if there exists a string zz such that x=zyx = zy. If zϵz \neq \epsilon, then yy is a proper suffix of xx.

Let AA be a set and let Σ\Sigma be an alphabet. An encoding of the elements of AA, using Σ\Sigma, is an injective function Enc:AΣ\text{Enc}: A \to \Sigma^*. We denote the encoding of aAa \in A by a\left \langle a \right\rangle.

If wΣw \in \Sigma^* is such that there is some aAa \in A with w=aw = \left \langle a \right\rangle, then we say ww is a valid encoding of an element in AA.

A set that can be encoded is called encodable.

Describe an encoding of Z\mathbb Z using the alphabet Σ={1}\Sigma = \{{\texttt{1}}\}.

Let Σ\Sigma be an alphabet. Any function f:ΣΣf: \Sigma^* \to \Sigma^* is called a function problem over the alphabet Σ\Sigma.

A function problem is often derived from a function g:ISg: I \to S, where II is a set of objects called instances and SS is a set of objects called solutions. The derivation is done through encodings Enc:IΣ\text{Enc}: I \to \Sigma^* and Enc:SΣ\text{Enc}': S \to \Sigma^*. With these encodings, we can create the function problem f:ΣΣf : \Sigma^* \to \Sigma^*. In particular, if w=xw = \left \langle x \right\rangle for some xIx \in I, then we define f(w)f(w) to be Enc(g(x))\text{Enc}'(g(x)).

image

One thing we have to be careful about is defining f(w)f(w) for a word wΣw \in \Sigma^* that does not correspond to an encoding of an object in II (such a word does not correspond to an instance of the function problem). To handle this, we can identify one of the strings in Σ\Sigma^* as an error string and define f(w)f(w) to be that string.

Let Σ\Sigma be an alphabet. Any function f:Σ{0,1}f: \Sigma^* \to \{0,1\} is called a decision problem over the alphabet Σ\Sigma. The co-domain of the function is not important as long as it has two elements. One can think of the co-domain as {No,Yes},\{\text{No}, \text{Yes}\}, {False,True}\{\text{False}, \text{True}\} or {Reject,Accept}\{ \text{Reject}, \text{Accept}\}.

As with a function problem, a decision problem is often derived from a function g:I{0,1}g: I \to \{0,1\}, where II is a set of instances. The derivation is done through an encoding Enc:IΣ\text{Enc}: I \to \Sigma^*, which allows us to define the decision problem f:Σ{0,1}f: \Sigma^* \to \{0,1\}. Any word wΣw \in \Sigma^* that does not correspond to an encoding of an instance is mapped to 00 by ff.

Instances that map to 11 are often called yes-instances and instances that map to 00 are often called no-instances.

Any (possibly infinite) subset LΣL \subseteq \Sigma^* is called a language over the alphabet Σ\Sigma.

Since a language is a set, the size of a language refers to the size of that set. A language can have finite or infinite size. This is not in conflict with the fact that every language consists of finite-length strings.

The language {ϵ}\{\epsilon\} is not the same language as \varnothing. The former has size 11 whereas the latter has size 00.

There is a one-to-one correspondence between decision problems and languages. Let f:Σ{0,1}f:\Sigma^* \to \{0,1\} be some decision problem. Now define LΣL \subseteq \Sigma^* to be the set of all words in Σ\Sigma^* that ff maps to 1. This LL is the language corresponding to the decision problem ff. Similarly, if you take any language LΣL \subseteq \Sigma^*, we can define the corresponding decision problem f:Σ{0,1}f:\Sigma^* \to \{0,1\} as f(w)=1f(w) = 1 if and only if wLw \in L. We consider the set of languages and the set of decision problems to be the same set of objects.

image

It is important to get used to this correspondence since we will often switch between talking about languages and talking about decision problems without explicit notice.

When talking about decision/function problems or languages, if the alphabet is not specified, it is assumed to be the binary alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}.

You may be wondering why we make the definition of a language rather than always viewing a decision problem as a function from Σ\Sigma^* to {0,1}\{0,1\}. The reason is that some operations and notation are more conveniently expressed using sets rather than functions, and therefore in certain situations, it can be nicer to think of decision problems as sets.

The reversal of a language LΣL \subseteq \Sigma^*, denoted LRL^R, is the language LR={wRΣ:wL}.L^R = \{w^R \in \Sigma^* : w \in L\}.

The concatenation of languages L1,L2ΣL_1, L_2 \subseteq \Sigma^*, denoted L1L2L_1L_2 or L1L2L_1 \cdot L_2, is the language L1L2={uvΣ:uL1,vL2}.L_1L_2 = \{uv \in \Sigma^* : u \in L_1, v \in L_2\}.

For nNn \in \mathbb N, the nn’th power of a language LΣL \subseteq \Sigma^*, denoted LnL^n, is the language obtained by concatenating LL with itself nn times, that is, Ln=LLLLn times.L^n = \underbrace{L \cdot L \cdot L \cdots L}_{n \text{ times}}. Equivalently, Ln={u1u2unΣ:uiL for all i{1,2,,n}}.L^n = \{u_1u_2\cdots u_n \in \Sigma^* : u_i \in L \text{ for all } i \in \{1,2,\ldots,n\}\}.

The star of a language LΣL \subseteq \Sigma^*, denoted LL^*, is the language L=nNLn.L^* = \bigcup_{n \in \mathbb N} L^n. Equivalently, L={u1u2unΣ:nN,uiL for all i{1,2,,n}}.L^* = \{u_1u_2\cdots u_n \in \Sigma^* : n \in \mathbb N, u_i \in L \text{ for all } i \in \{1,2,\ldots,n\}\}. Note that we always have ϵL\epsilon\in L^*. Sometimes we may prefer to not include ϵ\epsilon by default, so we define L+=nN+Ln.L^+ = \bigcup_{n \in \mathbb N^+} L^n.

Prove or disprove: If L1,L2{a,b}L_1, L_2 \subseteq \{{\texttt{a}},{\texttt{b}}\}^* are languages, then (L1L2)=L1L2(L_1 \cap L_2)^* = L_1^* \cap L_2^*.

Prove or disprove the following:

  1. If A,B,C{a,b}A, B, C \subseteq \{{\texttt{a}},{\texttt{b}}\}^* are languages, then A(BC)=ABACA(B \cup C) = AB \cup AC.

  2. If A,B,C{a,b}A, B, C \subseteq \{{\texttt{a}},{\texttt{b}}\}^* are languages, then A(BC)=ABACA(B \cap C) = AB \cap AC.

Is it true that for any language LL, (L)R=(LR)(L^*)^R = (L^R)^*? Prove your answer.

Prove or disprove: For any language LL, L=(L)L^* = (L^*)^*.

We denote by ALL\mathsf{ALL} the set of all languages (over the default alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}).

  1. Given an alphabet Σ\Sigma, what is the definition of Σ\Sigma^*?

  2. What is the definition of a language?

  3. Is \varnothing a language? If it is, what is the decision problem corresponding to it?

  4. Is Σ\Sigma^* a language? If it is, what is the decision problem corresponding to it?

  5. Given two languages L1L_1 and L2L_2, what are the definitions of L1L2L_1L_2 and L1L_1^*?

  6. True or false: The language {011,101,110}\{{\texttt{011}}, {\texttt{101}}, {\texttt{110}}\}^* is the set of all binary strings containing twice as many 1{\texttt{1}}’s as 0{\texttt{0}}’s.

  7. True or false: The set {}\{\varnothing\} is a language, where \varnothing denotes the empty set.

  8. True or false: An infinite language must contain infinite-length strings.

  9. True or false: A language can contain infinite-length strings.

  10. Define finite languages L1L_1 and L2L_2 such that L1L2<L1L2|L_1L_2| < |L_1||L_2|.

  11. Define finite languages L1L_1 and L2L_2 such that L1L2=L1L2|L_1L_2| = |L_1||L_2|.

  12. True or false: For any language LL and any nNn \in \mathbb N, we have LnLn+1L^n \subseteq L^{n+1}.

  13. True or false: For any language LL and any nNn \in \mathbb N, we have LnLL^n \subseteq L^*.

  14. Let Σ\Sigma be an alphabet. For which languages LΣL \subseteq \Sigma^* is it true that LLL\cdot L is infinite?

  15. Let Σ\Sigma be an alphabet. For which languages LΣL \subseteq \Sigma^* is it true that LL^* is infinite?

  16. What is the definition of an encodable set? Which sets are encodable? How do we deal with sets that are not encodable?

  17. What is the motivation behind defining an encoding as a mapping to finite-length strings as opposed to both finite and infinite-length strings?

  18. What is the definition of a function problem? Give one example of a function problem.

  19. What is the definition of a decision problem? Give one example of a decision problem.

  20. Briefly explain the correspondence between decision problems and languages.

  21. Write down the primality testing decision problem as a language.

  22. Why do we focus on decision problems in theoretical computer science?

Here are the important things to keep in mind from this chapter.

  1. All the definitions and the notation introduced in this are fundamental, and we will use them liberally without reminders. As we move to more advanced topics, it is important that the definitions in this chapter do not add to the mental burden and make things overwhelming for you. Even when you are totally comfortable with the definitions, we will be covering concepts that are quite challenging. Therefore, you want to be able to spend all your mental energy on digesting the new material without getting confused by the definitions and terminology from previous chapters. If you invest the time to really understanding the definitions now, it will save you time later.

  2. There is only one data type in theoretical computer science: strings. Every other type of data (e.g. numbers, images, graphs, videos) can be encoded with strings. Algorithms/computers work with these encodings.

  3. A decision problem is a special kind of function problem. In a decision problem, for every possible input, the output is either True or False. Our focus for the next few chapters will be on decision problems.

  4. There is a one-to-one correspondence between decision problems and languages. It is important to be able to easily switch between a language and the corresponding decision problem in your mind.

Anything that processes information can be called a computer. However, there can be restrictions on how the information can be processed (either universal restrictions imposed by, say, the laws of physics, or restrictions imposed by the particular setting we are interested in).

A computational model is a set of allowed rules for information processing. Given a computational model, we can talk about the computers (or machines) allowed by the computational model. A computer is a specific instantiation of the computational model, or in other words, it is a specific collection of information processing rules allowed by the computational model.

Note that even though the terms “computer” and “machine” suggest a physical device, in this course we are not interested in physical representations that realize computers, but rather in the mathematical representations. The term algorithm is synonymous to computer (and machine) and makes it more clear that we are not necessarily talking about a physical device.

A deterministic finite automaton (DFA) MM is a 55-tuple M=(Q,Σ,δ,q0,F),M = (Q, \Sigma, \delta, q_0, F), where

  • QQ is a non-empty finite set (which we refer to as the set of states of the DFA);

  • Σ\Sigma is a non-empty finite set (which we refer to as the alphabet of the DFA);

  • δ\delta is a function of the form δ:Q×ΣQ\delta: Q \times \Sigma \to Q (which we refer to as the transition function of the DFA);

  • q0Qq_0 \in Q is an element of QQ (which we refer to as the start state of the DFA);

  • FQF \subseteq Q is a subset of QQ (which we refer to as the set of accepting states of the DFA).

It is very common to represent a DFA with a state diagram. Below is an example of how we draw a state diagram of a DFA:

image

In this example, Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}, Q={q0,q1,q2,q3}Q = \{q_0,q_1,q_2,q_3\}, F={q1,q2}F = \{q_1,q_2\}. The labeled arrows between the states encode the transition function δ\delta, which can also be represented with a table as shown below (row qiQq_i \in Q and column bΣb \in \Sigma contains δ(qi,b)\delta(q_i, b)).

image

The start state is labeled with q0q_0, but if another label is used, we can identify the start state in the state diagram by identifying the state with an arrow that does not originate from another state and goes into that state.

In the definition above, we used the labels Q,Σ,δ,q0,FQ, \Sigma, \delta, q_0, F. One can of course use other labels when defining a DFA as long as it is clear what each label represents.

We’ll consider two DFAs to be equivalent/same if they are the same machine up to renaming the elements of the sets QQ and Σ\Sigma. For instance, the two DFAs below are considered to be the same even though they have different labels on the states and use different alphabets.

image

The flexibility with the choice of labels allows us to be more descriptive when we define a DFA. For instance, we can give labels to the states that communicate the “purpose” or “role” of each state.

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA and let w=w1w2wnw = w_1w_2 \cdots w_n be a string over an alphabet Σ\Sigma (so wiΣw_i \in \Sigma for each i{1,2,,n}i \in \{1,2,\ldots,n\}). The computation path of MM with respect to ww is a specific sequence of states r0,r1,r2,,rnr_0, r_1, r_2, \ldots , r_n (where each riQr_i \in Q). We’ll often write this sequence as follows.

w1w_1 w2w_2 \ldots wnw_n
r0r_0 r1r_1 r2r_2 \ldots rnr_n

 
These states correspond to the states reached when the DFA reads the input ww one character at a time. This means r0=q0r_0 = q_0, and i{1,2,,n},δ(ri1,wi)=ri.\forall i \in \{1,2,\ldots,n\}, \quad \delta(r_{i-1},w_i) = r_i. An accepting computation path is such that rnFr_n \in F, and a rejecting computation path is such that rn∉Fr_n \not\in F.

We say that MM accepts a string wΣw \in \Sigma^* (or “M(w)M(w) accepts”, or “M(w)=1M(w) = 1”) if the computation path of MM with respect to ww is an accepting computation path. Otherwise, we say that MM rejects the string ww (or “M(w)M(w) rejects”, or “M(w)=0M(w) = 0”).

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA. The transition function δ:Q×ΣQ\delta : Q \times \Sigma \to Q extends to δ:Q×ΣQ\delta^* : Q \times \Sigma^* \to Q, where δ(q,w)\delta^*(q, w) is defined as the state we end up in if we start at qq and read the string w=w1wnw = w_1 \ldots w_n. Or in other words, δ(q,w)=δ(δ(δ(δ(q,w1),w2),w3),wn).\delta^*(q, w) = \delta(\ldots\delta(\delta(\delta(q,w_1),w_2),w_3)\ldots, w_n). The star in the notation can be dropped and δ\delta can be overloaded to represent both a function δ:Q×ΣQ\delta : Q \times \Sigma \to Q and a function δ:Q×ΣQ\delta : Q \times \Sigma^* \to Q. Using this notation, a word ww is accepted by the DFA MM if δ(q0,w)F\delta(q_0, w) \in F.

Let f:Σ{0,1}f: \Sigma^* \to \{0,1\} be a decision problem and let MM be a DFA over the same alphabet. We say that MM solves (or decides, or computes) ff if the input/output behavior of MM matches ff exactly, in the following sense: for all wΣw \in \Sigma^*, M(w)=f(w)M(w) = f(w).

If LL is the language corresponding to ff, the above definition is equivalent to saying that MM solves (or decides, or computes) LL if the following holds:

  • if wLw \in L, then MM accepts ww (i.e. M(w)=1M(w) = 1);

  • if w∉Lw \not \in L, then MM rejects ww (i.e. M(w)=0M(w) = 0).

For a DFA MM, there is a unique language LL that MM solves. This language is often denoted by L(M)L(M) and is referred to as the language of MM. Note that L(M)={wΣ:M accepts w}.L(M) = \{w \in \Sigma^* : \text{$M$ accepts $w$}\}.

For each DFA MM below, define L(M)L(M).

  1. image

  2. image

For each language below (over the alphabet Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}), draw a DFA solving it.

  1. {101,110}\{{\texttt{101}}, {\texttt{110}}\}

  2. {0,1}{101,110}\{{\texttt{0}},{\texttt{1}}\}^* \setminus \{{\texttt{101}}, {\texttt{110}}\}

  3. {x{0,1}:x starts and ends with the same bit}\{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ starts and ends with the same bit}\}

  4. {110}={ϵ,110,110110,110110110,}\{{\texttt{110}}\}^* = \{\epsilon, {\texttt{110}}, {\texttt{110110}}, {\texttt{110110110}}, \ldots\}

  5. {x{0,1}:x contains 110 as a substring}\{x \in \{{\texttt{0}},{\texttt{1}}\}^*: \text{$x$ contains ${\texttt{110}}$ as a substring}\}

Let LL be a finite language, i.e., it contains a finite number of words. At a high level, describe why there is a DFA solving LL.

A language LΣL \subseteq \Sigma^* is called a regular language if there is a deterministic finite automaton MM that solves LL.

Let Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}. Is the following language regular? L={w{0,1}:w has an equal number of occurrences of 01 and 10 as substrings}L = \{w \in \{{\texttt{0}},{\texttt{1}}\}^* : w \text{ has an equal number of occurrences of ${\texttt{01}}$ and ${\texttt{10}}$ as substrings}\}

We denote by REG\mathsf{REG} the set of all regular languages (over the default alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}).

Let Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}. The language L={0n1n:nN}L = \{{\texttt{0}}^n {\texttt{1}}^n: n \in \mathbb N\} is not regular.

In the proof of the above theorem, we defined the set P={0n:n{0,1,,k}}P = \{{\texttt{0}}^n : n \in \{0,1,\ldots,k\}\} and then applied the pigeonhole principle. Explain why picking the following sets would not have worked in that proof.

  1. P={1n:n{1,2,,k+1}}P = \{{\texttt{1}}^n : n \in \{1,2,\ldots,k+1\}\}

  2. P={1,11,000,0000,00000,,0k+1}P = \{{\texttt{1}}, {\texttt{11}}, {\texttt{000}}, {\texttt{0000}}, {\texttt{00000}}, \ldots, {\texttt{0}}^{k+1}\}

Let Σ={a,b,c}\Sigma = \{{\texttt{a}},{\texttt{b}},{\texttt{c}}\}. Prove that L={c251anb2n:nN}L = \{{\texttt{c}}^{251} {\texttt{a}}^n {\texttt{b}}^{2n}: n \in \mathbb N\} is not regular.

In Exercise (Would any set of pigeons work?) we saw that the “pigeon set” that we use to apply the pigeonhole principle must be chosen carefully. We’ll call a pigeon set a fooling pigeon set if it is a pigeon set that “works”. That is, given a DFA with kk states that is assumed to solve a non-regular LL, a fooling pigeon set of size k+1k+1 allows us to carry out the contradiction proof, and conclude that LL is non-regular. Identify the property that a fooling pigeon set should have.

Let Σ={a}\Sigma = \{{\texttt{a}}\}. The language L={a2n:nN}L = \{{\texttt{a}}^{2^n}: n \in \mathbb N\} is not regular.

In the next exercise, we’ll write the proof in a slightly different way to offer a slightly different perspective. In particular, we’ll phrase the proof such that the goal is to show no DFA solving LL can have a finite number of states. This is done by identifying an infinite set of strings that all must end up in a different state (this is the fooling pigeon set that we defined in Exercise (A fooling pigeon set)). Once you have a set of strings SS such that every string in SS must end up in a different state, we can conclude any DFA solving the language must have at least S|S| states.

This slight change in phrasing of the non-regularity proof makes it clear that even if LL is regular, the technique can be used to prove a lower bound on the number of states of any DFA solving LL.

Let Σ={a,b,c}\Sigma = \{{\texttt{a}},{\texttt{b}},{\texttt{c}}\}. Prove that L={anbncn:nN}L = \{{\texttt{a}}^n{\texttt{b}}^n{\texttt{c}}^n: n \in \mathbb N\} is not regular.

Is it true that if LL is regular, then its complement ΣL\Sigma^* \setminus L is also regular? In other words, are regular languages closed under the complementation operation?

Is it true that if LΣL \subseteq \Sigma^* is a regular language, then any LLL' \subseteq L is also a regular language?

Let Σ\Sigma be some finite alphabet. If L1ΣL_1 \subseteq \Sigma^* and L2ΣL_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1 \cup L_2 is also regular.

Let Σ\Sigma be some finite alphabet. If L1ΣL_1 \subseteq \Sigma^* and L2ΣL_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1 \cap L_2 is also regular.

Give a direct proof (without using the fact that regular languages are closed under complementation, union and intersection) that if L1L_1 and L2L_2 are regular languages, then L1L2L_1 \setminus L_2 is also regular.

  1. Suppose L1,,LkL_1, \ldots, L_k are all regular languages. Is it true that their union i=0kLi\bigcup_{i=0}^k L_i must be a regular language?

  2. Suppose L0,L1,L2,L_0, L_1, L_2, \ldots is an infinite sequence of regular languages. Is it true that their union i0Li\bigcup_{i\geq 0} L_i must be a regular language?

Suppose L1L_1 and L2L_2 are not regular languages. Is it always true that L1L2L_1 \cup L_2 is not a regular language?

Suppose LΣL \subseteq \Sigma^* is a regular language. Show that the following languages are also regular: SUFFIXES(L)={xΣ:yxL for some yΣ},PREFIXES(L)={yΣ:yxL for some xΣ}.\begin{aligned} \text{SUFFIXES}(L) & = \{x \in \Sigma^* : \text{$yx \in L$ for some $y \in \Sigma^*$}\}, \\ \text{PREFIXES}(L) & = \{y \in \Sigma^* : \text{$yx \in L$ for some $x \in \Sigma^*$}\}.\end{aligned}

If L1,L2ΣL_1, L_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1L_2 is also regular.

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA. We define the generalized transition function δ:(Q)×Σ(Q)\delta_\wp: \wp(Q) \times \Sigma \to \wp(Q) as follows. For SQS \subseteq Q and σΣ\sigma \in \Sigma, δ(S,σ)={δ(q,σ):qS}.\delta_{\wp}(S, \sigma) = \{\delta(q, \sigma) : q \in S\}.

In the proof of Theorem (Regular languages are closed under concatenation), we defined the set of states for the DFA solving L1L2L_1L_2 as Q=Q×(Q)Q'' = Q \times \wp(Q'). Construct a DFA for L1L2L_1L_2 in which QQ'' is equal to (QQ)\wp(Q \cup Q'), i.e., specify how δ\delta'', q0q_0'' and FF'' should be defined with respect to Q=(QQ)Q'' = \wp(Q \cup Q'). Proof of correctness is not required.

Critique the following argument that claims to establish that regular languages are closed under the star operation, that is, if LL is a regular language, then so is LL^*.

Let LL be a regular language. We know that by definition L=nNLnL^* = \bigcup_{n \in \mathbb N} L^n, where Ln={u1u2un:uiL for all i}.L^n = \{u_1 u_2 \ldots u_n : u_i \in L \text{ for all $i$} \}. We know that for all nn, LnL^n must be regular using Theorem (Regular languages are closed under concatenation). And since LnL^n is regular for all nn, we know LL^* must be regular using Theorem (Regular languages are closed under union).

Show that regular languages are closed under the star operation as follows: First show that if LL is regular, then so is L+L^+, which is defined as the union L+=nN+Ln.L^+ = \bigcup_{n \in \mathbb N^+} L^n. For this part, given a DFA for LL, show how to construct a DFA for L+L^+. A proof of correctness is not required. In order to conclude that LL^* is regular, observe that L=L+{ϵ}L^* = L^+ \cup \{\epsilon\}, and use the fact that regular languages are closed under union.

  1. What are the 55 components of a DFA?

  2. Let MM be a DFA. What does L(M)L(M) denote?

  3. Draw a DFA solving \varnothing. Draw a DFA solving Σ\Sigma^*.

  4. True or false: Given a language LL, there is at most one DFA that solves it.

  5. Fix some alphabet Σ\Sigma. How many DFAs are there with exactly one state?

  6. True or false: For any DFA, all the states are “reachable”. That is, if D=(Q,Σ,δ,q0,F)D = (Q, \Sigma, \delta, q_0, F) is a DFA and qQq \in Q is one of its states, then there is a string wΣw \in \Sigma^* such that δ(q0,w)=q\delta^*(q_0,w) = q.

  7. What is the set of all possible inputs for a DFA D=(Q,Σ,δ,q0,F)D = (Q, \Sigma, \delta, q_0, F)?

  8. Consider the set of all DFAs with kk states over the alphabet Σ={a}\Sigma =\{{\texttt{a}}\} such that all states are reachable from q0q_0. What is the cardinality of this set?

  9. What is the definition of a regular language?

  10. Describe the general strategy (the high level steps) for showing that a language is not regular.

  11. Give 33 examples of non-regular languages.

  12. Let L{a}L \subseteq \{{\texttt{a}}\}^* be a language consisting of all strings of a{\texttt{a}}’s of odd length except length 251251. Is LL regular?

  13. Let LL be the set of all strings in {0,1}\{{\texttt{0}},{\texttt{1}}\}^* that contain at least 251251 0{\texttt{0}}’s and at most 251251 1{\texttt{1}}’s. Is LL regular?

  14. Suppose you are given a regular language LL and you are asked to prove that any DFA solving LL must contain at least kk states (for some kk value given to you). What is a general proof strategy for establishing such a claim?

  15. Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA solving a language LL and let M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q'_0, F') be a DFA solving a language LL'. Describe the 55 components of a DFA solving LLL \cup L'.

  16. True or false: Let L1L2L_1 \oplus L_2 denote the set of all words in either L1L_1 or L2L_2, but not both. If L1L_1 and L2L_2 are regular, then so is L1L2L_1 \oplus L_2.

  17. True or false: For languages LL and LL', if LLL \subseteq L' and LL is non-regular, then LL' is non-regular.

  18. True or false: If LΣL \subseteq \Sigma^* is non-regular, then L=ΣL\overline{L} = \Sigma^* \setminus L is non-regular.

  19. True or false: If L1,L2ΣL_1, L_2 \subseteq \Sigma^* are non-regular languages, then so is L1L2L_1 \cup L_2.

  20. True or false: Let LL be a non-regular language. There exists KLK \subset L, KLK \neq L, such that KK is also non-regular.

  21. By definition a DFA has finitely many states. What is the motivation/reason for this restriction?

  22. Consider the following decision problem: Given as input a DFA, output True if and only if there exists some string sΣs \in \Sigma^* that the DFA accepts. Write the language corresponding to this decision problem using mathematical notation, and in particular, using set builder notation.

Here are the important things to keep in mind from this chapter.

  1. Given a DFA, describe in English the language that it solves.

  2. Given the description of a regular language, come up with a DFA that solves it.

  3. Given a non-regular language, prove that it is indeed non-regular. Make sure you are not just memorizing a template for these types of proofs, but that you understand all the details of the strategy being employed. Apply the Feynman technique and see if you can teach this kind of proof to someone else.

  4. In proofs establishing a closure property of regular languages, you start with one or more DFAs, and you construct a new one from them. In order to build the new DFA successfully, you have to be comfortable with the 5-tuple notation of a DFA. The constructions can get notation-heavy (with power sets and Cartesian products), but getting comfortable with mathematical notation is one of our learning objectives.

  5. With respect to closure properties of regular languages, prioritize having a very solid understanding of closure under the union operation before you move to the more complicated constructions for the concatenation operation and star operation.

A Turing machine (TM) MM is a 7-tuple M=(Q,Σ,Γ,δ,q0,qacc,qrej),M = (Q, \Sigma, \Gamma, \delta, q_0, q_\text{acc}, q_\text{rej}), where

  • QQ is a non-empty finite set
    (which we refer to as the set of states of the TM);

  • Σ\Sigma is a non-empty finite set that does not contain the blank symbol \sqcup
    (which we refer to as the input alphabet of the TM);

  • Γ\Gamma is a finite set such that Γ\sqcup\in \Gamma and ΣΓ\Sigma \subset \Gamma
    (which we refer to as the tape alphabet of the TM);

  • δ\delta is a function of the form δ:Q×ΓQ×Γ×{L,R}\delta: Q \times \Gamma \to Q \times \Gamma \times \{\text{L}, \text{R}\}
    (which we refer to as the transition function of the TM);

  • q0Qq_0 \in Q is an element of QQ
    (which we refer to as the initial state of the TM or the starting state of the TM);

  • qaccQq_\text{acc}\in Q is an element of QQ
    (which we refer to as the accepting state of the TM);

  • qrejQq_\text{rej}\in Q is an element of QQ such that qrejqaccq_\text{rej}\neq q_\text{acc}
    (which we refer to as the rejecting state of the TM).

Below is an example of a state diagram of a TM with 55 states:

image

In this example, Σ={a,b}\Sigma = \{{\texttt{a}},{\texttt{b}}\}, Γ={a,b,}\Gamma = \{{\texttt{a}},{\texttt{b}},\sqcup\}, Q={q0,qa,qb,qacc,qrej}Q = \{q_0,q_a,q_b,q_\text{acc},q_\text{rej}\}. The labeled arrows between the states encode the transition function δ\delta. As an example, the label on the arrow from q0q_0 to qaq_a is aR{\texttt{a}} \overset{\text{R}}{\rightharpoondown} \sqcup, which represents δ(q0,a)=(qa,,R)\delta(q_0, {\texttt{a}}) = (q_a, \sqcup, \text{R})

A Turing machine is always accompanied by a tape that is used as memory. The tape is just a sequence of cells that can hold any symbol from the tape alphabet. The tape can be defined so that it is infinite in two directions (so we could imagine indexing the cells using the integers Z\mathbb Z), or it could be infinite in one direction, to the right (so we could imagine indexing the cells using the natural numbers N\mathbb N). The input to a Turing machine is always a string wΣw \in \Sigma^*. The string w1wnΣw_1\ldots w_n \in \Sigma^* is put on the tape so that symbol wiw_i is placed on the cell with index i1i-1. We imagine that there is a tape head that initially points to index 0. The symbol that the tape head points to at a particular time is the symbol that the Turing machine reads. The tape head moves left or right according to the transition function of the Turing machine. These details are explained in lecture.

In these notes, we assume our tape is infinite in two directions.

Given any DFA and any input string, the DFA always halts and makes a decision to either reject or accept the string. The same is not true for TMs and this is an important distinction between DFAs and TMs. It is possible that a TM does not make a decision when given an input string, and instead, loops forever. So given a TM MM and an input string xx, there are 3 options when we run MM on xx:

  • MM accepts xx (denoted M(x)=1M(x) = 1);

  • MM rejects xx (denoted M(x)=0M(x) = 0);

  • MM loops forever (denoted M(x)=M(x) = \infty).

The formal definitions for these 3 cases is given below.

Let MM be a Turing machine where QQ is the set of states, \sqcup is the blank symbol, and Γ\Gamma is the tape alphabet. To understand how MM’s computation proceeds we generally need to keep track of three things: (i) the state MM is in; (ii) the contents of the tape; (iii) where the tape head is. These three things are collectively known as the “configuration” of the TM. More formally: a configuration for MM is defined to be a string uqv(ΓQ)uqv \in (\Gamma \cup Q)^*, where u,vΓu, v \in \Gamma^* and qQq \in Q. This represents that the tape has contents uv\cdots \sqcup\sqcup\sqcup uv \sqcup\sqcup\sqcup\cdots, the head is pointing at the leftmost symbol of vv, and the state is qq. A configuration is an accepting configuration if qq is MM’s accept state and it is a rejecting configuration if qq is MM’s reject state.

Suppose that MM reaches a certain configuration α\alpha (which is not accepting or rejecting). Knowing just this configuration and MM’s transition function δ\delta, one can determine the configuration β\beta that MM will reach at the next step of the computation. (As an exercise, make this statement precise.) We write αMβ\alpha \vdash_M \beta and say that “α\alpha yields β\beta (in MM)”. If it is obvious what MM we’re talking about, we drop the subscript MM and just write αβ\alpha \vdash \beta.

Given an input xΣx \in \Sigma^* we say that M(x)M(x) halts if there exists a sequence of configurations (called the computation path) α0,α1,,αT\alpha_0, \alpha_1, \dots, \alpha_{T} such that:

  1. α0=q0x\alpha_0 = q_0x, where q0q_0 is MM’s initial state;

  2. αtMαt+1\alpha_t \vdash_M \alpha_{t+1} for all t=0,1,2,,T1t = 0, 1, 2, \dots, T-1;

  3. αT\alpha_T is either an accepting configuration (in which case we say M(x)M(x) accepts) or a rejecting configuration (in which case we say M(x)M(x) rejects).

Otherwise, we say M(x)M(x) loops.

Let MM denote the Turing machine shown below, which has input alphabet Σ={0}\Sigma = \{{\texttt{0}}\} and tape alphabet Γ={0,x,}\Gamma = \{{\texttt{0}},{\texttt{x}}, \sqcup\}.

image

Write out the computation path α0Mα1MMαT\alpha_0 \vdash_M \alpha_1 \vdash_M \cdots \vdash_M \alpha_T for M(0000)M({\texttt{0000}}) and determine whether it accepts or rejects.

Let f:Σ{0,1}f: \Sigma^* \to \{0,1\} be a decision problem and let MM be a TM with input alphabet Σ\Sigma. We say that MM solves (or decides, or computes) ff if the input/output behavior of MM matches ff exactly, in the following sense: for all wΣw \in \Sigma^*, M(w)=f(w)M(w) = f(w).

If LL is the language corresponding to ff, the above definition is equivalent to saying that MM solves (or decides, or computes) LL if the following holds:

  • if wLw \in L, then MM accepts ww (i.e. M(w)=1M(w) = 1);

  • if w∉Lw \not \in L, then MM rejects ww (i.e. M(w)=0M(w) = 0).

If MM solves a decision problem (language), MM must halt on all inputs. Such a TM is called a decider.

The language of a TM MM is L(M)={wΣ:M accepts w}.L(M) = \{w \in \Sigma^* : \text{$M$ accepts $w$}\}. Given a TM MM, we cannot say that MM solves/decides L(M)L(M) because MM may loop forever for inputs ww not in L(M)L(M). Only a decider TM can decide a language. And if MM is indeed a decider, then L(M)L(M) is the unique language that MM solves/decides.

Give a description of the language decided by the TM shown in Note (State diagram of a TM).

For each language below, draw the state diagram of a TM that decides the language. You can use any finite tape alphabet Γ\Gamma containing the elements of Σ\Sigma and the symbol \sqcup.

  1. L={0n1n:nN}L = \{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\}, where Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}.

  2. L={02n:nN}L = \{{\texttt{0}}^{2^n} : n \in \mathbb N\}, where Σ={0}\Sigma = \{{\texttt{0}}\}.

A language LL is called decidable (or computable) if there exists a TM that decides (i.e. solves) LL.

We denote by R\mathsf{R} the set of all decidable languages (over the default alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}).

Let MM be a TM with input alphabet Σ\Sigma. We say that on input xx, MM outputs the string yy if the following hold:

  • M(x)M(x) halts with the halting configuration being uqvuqv (where q{qacc,qrej}q \in \{q_\text{acc}, q_\text{rej}\}),

  • the string uvuv equals yy.

In this case we write M(x)=yM(x) = y.

We say MM solves (or computes) a function problem f:ΣΣf : \Sigma^* \to \Sigma^* if for all xΣx \in \Sigma^*, M(x)=f(x)M(x) = f(x).

The Church-Turing Thesis (CTT) states that any computation that can be conducted in this universe (constrained by the laws of physics of course), can be carried out by a TM. In other words, the set of problems that are in principle computable in this universe is captured by the complexity class R\mathsf{R}.

There are a couple of important things to highlight. First, CTT says nothing about the efficiency of the simulation. Second, CTT is not a mathematical statement, but a physical claim about the universe we live in (similar to claiming that the speed of light is constant). The implications of CTT is far-reaching. For example, CTT claims that any computation that can be carried out by a human can be carried out by a TM. Other implications are discussed in lecture.

We will consider three different ways of describing TMs.

  1. A low-level description of a TM is given by specifying the 7-tuple in its definition. This information is often presented using a picture of its state diagram.

  2. A medium-level description is an English description of the movement and behavior of the tape head, as well as how the contents of the tape is changing, as the computation is being carried out.

  3. A high-level description is pseudocode or an algorithm written in English. Usually, an algorithm is written in a way so that a human can read it, understand it, and carry out its steps. By CTT, there is a TM that can carry out the same computation.

Unless explicitly stated otherwise, you can present a TM using a high-level description. From now on, we will do the same.

Let LL and KK be decidable languages. Show that L=ΣL\overline{L} = \Sigma^* \setminus L, LKL \cap K and LKL \cup K are also decidable by presenting high-level descriptions of TMs deciding them.

Fix Σ={3}\Sigma = \{{\texttt{3}}\} and let L{3}L \subseteq \{{\texttt{3}}\}^* be defined as follows: xLx \in L if and only if xx appears somewhere in the decimal expansion of π\pi. For example, the strings ϵ\epsilon, 3{\texttt{3}}, and 33{\texttt{33}} are all definitely in LL, because π=3.1415926535897932384626433\pi = 3.1415926535897932384626433\dots Prove that LL is decidable. No knowledge in number theory is required to solve this question.

In Chapter 1 we saw that we can use the notation \left \langle \cdot \right\rangle to denote an encoding of objects belonging to any countable set. For example, if DD is a DFA, we can write D\left \langle D \right\rangle to denote the encoding of DD as a string. If MM is a TM, we can write M\left \langle M \right\rangle to denote the encoding of MM. There are many ways one can encode DFAs and TMs. We will not be describing a specific encoding scheme as this detail will not be important for us.

Recall that when we want to encode a tuple of objects, we use the comma sign. For example, if M1M_1 and M2M_2 are two Turing machines, we write M1,M2\left \langle M_1, M_2 \right\rangle to denote the encoding of the tuple (M1,M2)(M_1,M_2). As another example, if MM is a TM and xΣx \in \Sigma^*, we can write M,x\left \langle M, x \right\rangle to denote the encoding of the tuple (M,x)(M,x).

The fact that we can encode a Turing machine (or a piece of code) means that an input to a TM can be the encoding of another TM (in fact, a TM can take the encoding of itself as the input). This point of view has several important implications, one of which is the fact that we can come up with a Turing machine, which given as input the description of any Turing machine, can simulate it. This simulator Turing machine is called a universal Turing machine.

Let Σ\Sigma be some finite alphabet. A universal Turing machine UU is a Turing machine that takes M,x\left \langle M, x \right\rangle as input, where MM is a TM and xx is a word in Σ\Sigma^*, and has the following high-level description:

def    U(TM M,  string x):1.    Simulate M on input x (i.e. run M(x))2.    If it accepts, accept.3.    If it rejects, reject.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; U(\left \langle \text{TM } M, \; \text{string } x \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Simulate $M$ on input $x$ (i.e. run $M(x)$)}\\ &{\scriptstyle 2.}~~~~\texttt{If it accepts, accept.}\\ &{\scriptstyle 3.}~~~~\texttt{If it rejects, reject.}\\ \hline\hline\end{aligned}

Note that if M(x)M(x) loops forever, then UU loops forever as well. To make sure MM always halts, we can add a third input, an integer kk, and have the universal machine simulate the input TM for at most kk steps.

Above, when denoting the encoding of a tuple (M,x)(M, x) where MM is a Turing machine and xx is a string, we used TM M,string x\left \langle \text{TM } M, \text{string } x \right\rangle to make clear what the types of MM and xx are. We will make use of this notation from now on and specify the types of the objects being referenced within the encoding notation.

When we give a high-level description of a TM, we often assume that the input given is of the correct form/type. For example, with the Universal TM above, we assumed that the input was the encoding TM M,  string x\left \langle \text{TM } M, \; \text{string } x \right\rangle. But technically, the input to the universal TM (and any other TM) is allowed to be any finite-length string. What do we do if the input string does not correspond to a valid encoding of an expected type of input object?

Even though this is not explicitly written, we will implicitly assume that the first thing our machine does is check whether the input is a valid encoding of objects with the expected types. If it is not, the machine rejects. If it is, then it will carry on with the specified instructions.

The important thing to keep in mind is that in our descriptions of Turing machines, this step of checking whether the input string has the correct form (i.e. that it is a valid encoding) will never be explicitly written, and we don’t expect you to explicitly write it either. That being said, be aware that this check is implicitly there.

Fix some alphabet Σ\Sigma.

  • We call a DFA DD satisfiable if there is some input string that DD accepts. In other words, DD is satisfiable if L(D)L(D) \neq \varnothing.

  • We say that a DFA DD self-accepts if DD accepts the string D\left \langle D \right\rangle. In other words, DD self-accepts if DL(D)\left \langle D \right\rangle \in L(D).

We define the following languages: ACCEPTSDFA={D,x:D is a DFA that accepts the string x},SADFA=SELF-ACCEPTSDFA={D:D is a DFA that self-accepts},SATDFA={D:D is a satisfiable DFA},NEQDFA={D1,D2:D1 and D2 are DFAs with L(D1)L(D2)}.\begin{aligned} \text{ACCEPTS}_{\text{DFA}}& = \{\left \langle D,x \right\rangle : \text{$D$ is a DFA that accepts the string $x$}\}, \\ \text{SA}_{\text{DFA}}= \text{SELF-ACCEPTS}_{\text{DFA}}& = \{\left \langle D \right\rangle : \text{$D$ is a DFA that self-accepts}\}, \\ \text{SAT}_{\text{DFA}}& = \{\left \langle D \right\rangle : \text{$D$ is a satisfiable DFA} \}, \\ \text{NEQ}_{\text{DFA}}& = \{ \left \langle D_1, D_2 \right\rangle : \text{$D_1$ and $D_2$ are DFAs with $L(D_1) \neq L(D_2)$} \}.\end{aligned}

The languages ACCEPTSDFA\text{ACCEPTS}_{\text{DFA}} and SADFA\text{SA}_{\text{DFA}} are decidable.

The language SATDFA\text{SAT}_{\text{DFA}} is decidable.

The language NEQDFA\text{NEQ}_{\text{DFA}} is decidable.

Suppose LL and KK are two languages and KK is decidable. We say that solving LL reduces to solving KK if given a decider MKM_K for KK, we can construct a decider for LL that uses MKM_K as a subroutine (i.e. helper function), thereby establishing LL is also decidable. For example, the proof of Theorem (NEQDFA\text{NEQ}_{\text{DFA}} is decidable) shows that solving NEQDFA\text{NEQ}_{\text{DFA}} reduces to solving SATDFA\text{SAT}_{\text{DFA}}.

A reduction is conceptually straightforward but also a powerful tool to expand the landscape of decidable languages.

  1. Let L={D1,D2:D1 and D2 are DFAs with L(D1)L(D2)}L = \{\left \langle D_1, D_2 \right\rangle: \text{$D_1$ and $D_2$ are DFAs with $L(D_1) \subsetneq L(D_2)$}\}. Show that LL is decidable.

  2. Let K={D:D is a DFA that accepts wR whenever it accepts w}K = \{\left \langle D \right\rangle: \text{$D$ is a DFA that accepts $w^R$ whenever it accepts $w$}\}, where wRw^R denotes the reversal of ww. Show that KK is decidable. For this problem, you can use the fact that given a DFA DD, there is an algorithm to construct a DFA DD' such that L(D)=L(D)R={wR:wL(D)}L(D') = L(D)^R = \{w^R : w \in L(D)\}.

We can relax the notion of a TM MM solving a language LL in the following way. We say that MM semi-decides LL if for all wΣw \in \Sigma^*, wLM(w) accepts.w \in L \quad \Longleftrightarrow \quad M(w) \text{ accepts}. Equivalently,

  • if wLw \in L, then MM accepts ww (i.e. M(w)=1M(w) = 1);

  • if w∉Lw \not \in L, then MM does not accept ww (i.e. M(w){0,}M(w) \in \{0,\infty\}).

(Note that if w∉Lw \not \in L, M(w)M(w) may loop forever.)

We call MM a semi-decider for LL.

A language LL is called semi-decidable if there exists a TM that semi-decides LL.

We denote by RE\mathsf{RE} the set of all semi-decidable languages (over the default alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}).

A language LL is decidable if and only if both LL and L=ΣL\overline{L} = \Sigma^* \setminus L are semi-decidable.

Observe that any regular language is decidable since for every DFA, we can construct a TM with the same behavior (we ask you to make this precise as an exercise below). Also observe that if a language is decidable, then it is also semi-decidable. Based on these observations, we know REGRREALL\mathsf{REG}\subseteq \mathsf{R}\subseteq \mathsf{RE}\subseteq \mathsf{ALL}.

image

Furthermore, we know {0n1n:nN}\{{\texttt{0}}^n {\texttt{1}}^n : n \in \mathbb N\} is decidable, but not regular. Therefore we know that the first inclusion above is strict. We will next investigate whether the other inclusions are strict or not.

  1. What are the 77 components in the formal definition of a Turing machine?

  2. True or false: It is possible that in the definition of a TM, Σ=Γ\Sigma = \Gamma, where Σ\Sigma is the input alphabet, and Γ\Gamma is the tape alphabet.

  3. True or false: On every valid input, any TM either accepts or rejects.

  4. In the 77-tuple definition of a TM, the tape does not appear. Why is this the case?

  5. What is the set of all possible inputs for a TM M=(Q,Σ,Γ,δ,q0,qacc,qrej)M = (Q, \Sigma, \Gamma, \delta, q_0, q_\text{acc}, q_\text{rej})?

  6. In the definition of a TM, the number of states is restricted to be finite. What is the reason for this restriction?

  7. State 4 differences between TMs and DFAs.

  8. True or false: Consider a TM such that the starting state q0q_0 is also the accepting state qaccq_\text{acc}. It is possible that this TM does not halt on some inputs.

  9. Let D=(Q,Σ,δ,q0,F)D = (Q, \Sigma, \delta, q_0, F) be a DFA. Define a decider TM MM (specifying the components of the 7-tuple) such that L(M)=L(D)L(M) = L(D).

  10. What are the 3 components of a configuration of a Turing machine? How is a configuration typically written?

  11. True or false: We say that a language KK is a decidable language if there exists a Turing machine MM such that K=L(M)K = L(M).

  12. What is a decider TM?

  13. True or false: For each decidable language, there is exactly one TM that decides it.

  14. What do we mean when we say “a high-level description of a TM”?

  15. What is a universal TM?

  16. True or false: A universal TM is a decider.

  17. What is the Church-Turing thesis?

  18. What is the significance of the Church-Turing thesis?

  19. True or false: LΣL \subseteq \Sigma^* is undecidable if and only if Σ\L\Sigma^* \backslash L is undecidable.

  20. Is the following statement true, false, or hard to determine with the knowledge we have so far? \varnothing is decidable.

  21. Is the following statement true, false, or hard to determine with the knowledge we have so far? Σ\Sigma^* is decidable.

  22. Is the following statement true, false, or hard to determine with the knowledge we have so far? The language {M:M is a TM with L(M)}\{\left \langle M \right\rangle : \text{$M$ is a TM with $L(M) \neq \varnothing$}\} is decidable.

  23. True or false: Let L{0,1}L \subseteq \{{\texttt{0}},{\texttt{1}}\}^* be defined as follows: L={{0n1n:nN}if the Goldbach conjecture is true;{12n:nN}otherwise.L = \begin{cases} \{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\} & \text{if the Goldbach conjecture is true;} \\ \{{\texttt{1}}^{2^n} : n \in \mathbb N\} & \text{otherwise.} \end{cases} LL is decidable. (Feel free to Google what Goldbach conjecture is.)

  24. Let LL and KK be two languages. What does “LL reduces to KK” mean? Give an example of LL and KK such that LL reduces to KK.

Here are the important things to keep in mind from this chapter.

  1. Understand the key differences and similarities between DFAs and TMs are.

  2. Given a TM, describe in English the language that it solves.

  3. Given the description of a decidable language, come up with a TM that decides it. Note that there are various ways we can describe a TM (what we refer to as the low level, medium level, and the high level representations). You may need to present a TM in any of these levels.

  4. What is the definition of a configuration and why is it important?

  5. The TM computational model describes a very simple programming language.

  6. Even though the definition of a TM is far simpler than any other programming language that you use, convince yourself that it is powerful enough to capture any computation that can be expressed in your favorite programming language. This is a fact that can be proved formally, but we do not do so since it would be extremely long and tedious.

  7. What is the Church-Turing thesis and what does it imply? Note that we do not need to invoke the Church-Turing thesis for the previous point. The Church-Turing thesis is a much stronger claim. It captures our belief that any kind of algorithm that can be carried out by any kind of a natural process can be simulated on a Turing machine. This is not a mathematical statement that can be proved but rather a statement about our universe and the laws of physics.

  8. The set of Turing machines is encodable (i.e. every TM can be represented using a finite-length string). This implies that an algorithm (or Turing machine) can take as input (the encoding of) another Turing machine. This idea is the basis for the universal Turing machine. It allows us to have one (universal) machine/algorithm that can simulate any other machine/algorithm that is given as input.

  9. This chapter contains several algorithms that take encodings of DFAs as input. So these are algorithms that take as input the text representations of other algorithms. There are several interesting questions that one might want to answer given the code of an algorithm (i.e. does it accept a certain input, do two algorithms with different representations solve exactly the same computational problem, etc.) We explore some of these questions in the context of DFAs. It is important you get used to the idea of treating code (which may represent a DFA or a TM) as data/input.

  10. This chapter also introduces the idea of a reduction (the idea of solving a computational problem by using, as a helper function, an algorithm that solves a different problem). Reductions will come up again in future chapters, and it is important that you have a solid understanding of what it means.

Let XX and YY be two (possibly infinite) sets.

  • A function f:XYf: X \to Y is called injective if for any x,xXx,x' \in X such that xxx \neq x', we have f(x)f(x)f(x) \neq f(x'). We write XYX \hookrightarrow Y if there exists an injective function from XX to YY.

  • A function f:XYf: X \to Y is called surjective if for all yYy \in Y, there exists an xXx \in X such that f(x)=yf(x) = y. We write XYX \twoheadrightarrow Y if there exists a surjective function from XX to YY.

  • A function f:XYf: X \to Y is called bijective (or one-to-one correspondence) if it is both injective and surjective. We write XYX \leftrightarrow Y if there exists a bijective function from XX to YY.

Let X,YX, Y and ZZ be three (possibly infinite) sets. Then,

  • XYX \hookrightarrow Y if and only if YXY \twoheadrightarrow X;

  • if XYX \hookrightarrow Y and YZY \hookrightarrow Z, then XZX \hookrightarrow Z;

  • XYX \leftrightarrow Y if and only if XYX \hookrightarrow Y and YXY \hookrightarrow X.

Let XX and YY be two sets.

  • We write X=Y|X| = |Y| if XYX \leftrightarrow Y.

  • We write XY|X| \leq |Y| (or YX|Y| \geq |X|) if XYX \hookrightarrow Y, or equivalently, if YXY \twoheadrightarrow X.

  • We write X<Y|X| < |Y| (or Y>X|Y| > |X|) if it is not the case that XY|X| \geq |Y|.

Theorem (Relationships between different types of functions) justifies the use of the notation ==, \leq, \geq, << and >>. The properties we would expect to hold for this type of notation indeed do hold. For example,

  • X=Y|X| = |Y| if and only if XY|X| \leq |Y| and YX|Y| \leq |X|,

  • if XYZ|X| \leq |Y| \leq |Z|, then XZ|X| \leq |Z|,

  • if XY<Z|X| \leq |Y| < |Z|, then X<Z|X| < |Z|, and so on.

If XYX \subseteq Y, then XY|X| \leq |Y| since the identity function that maps xXx \in X to xYx \in Y is an injection.

Let S={0,1,4,9,}\mathbb S= \{0, 1, 4, 9,\ldots\} be the set of squares. Then S=N|\mathbb S| = |\mathbb N|.

Let Z\mathbb Z be the set of integers. Then Z=N|\mathbb Z| = |\mathbb N|.

Let Z×Z\mathbb Z\times \mathbb Z be the set of tuples with integer coordinates. Then Z×Z=N|\mathbb Z\times \mathbb Z| = |\mathbb N|.

Let Q\mathbb Q be the set of rational numbers. Then Q=N|\mathbb Q| = |\mathbb N|.

Let Σ\Sigma be a finite non-empty set. Then Σ=N|\Sigma^*| = |\mathbb N|.

If SS is an infinite set and SN|S| \leq |\mathbb N|, then S=N|S| = |\mathbb N|.

Let P={2,3,5,7,}\mathbb{P} = \{2, 3, 5, 7, \ldots\} be the set of prime numbers. Then P=N|\mathbb{P}| = |\mathbb N|.

  • A set SS is called countable if SN|S| \leq |\mathbb N|.

  • A set SS is called countably infinite if it is countable and infinite.

  • A set SS is called uncountable if it is not countable, i.e. S>N|S| > |\mathbb N|.

Theorem (No cardinality strictly between finite and N|\mathbb N|) implies that if SS is countable, there are two options: either SS is finite, or S=N|S| = |\mathbb N|.

Recall that a set SS is encodable if for some alphabet Σ\Sigma, there is an injection from SS to Σ\Sigma^*, or equivalently, SΣ|S| \leq |\Sigma^*| (Definition (Encoding elements of a set)). Since N=Σ|\mathbb N| = |\Sigma^*|, we see that a set is countable if and only if it is encodable. Therefore, one can show that a set is countable by showing that it is encodable. We will call this the “CS method” for showing countability.

The standard definition of countability (SN|S| \leq |\mathbb N|) highlights the following heuristic.

  • If you can list the elements of SS in a way that every element appears in the list eventually, then SS is countable.

The encodability definition highlights another heuristic that is familiar in computer science.

  • If you can “write down” each element of SS using a finite number of symbols, then SS is countable.

The set of all polynomials in one variable with rational coefficients is countable.

Show that the following sets are countable using the CS method.

  1. Z×Z×Z\mathbb Z\times \mathbb Z\times \mathbb Z.

  2. The set of all functions f:SNf: S \to \mathbb N, where SS is some fixed finite set.

The following are some examples of how various mathematical objects have a natural representation as a function.

  • Sets. A set SXS \subseteq X can be viewed as a function fS:X{0,1}f_S: X \to \{0,1\}, where fS(x)=1f_S(x) = 1 if and only if xSx \in S. This is called the characteristic function of the set. Recall that we made this observation in Important Note (Correspondence between decision problems and languages).

  • Finite sequences. A sequence ss of length kk with elements from a set YY can be viewed as a function fs:{0,1,2,,k1}Yf_s : \{0,1,2,\ldots,k-1\} \to Y, where fs(i)f_s(i) is the ii’th element of the sequence (starting counting from 00). In other words, given fsf_s, the corresponding sequence is (fs(0),fs(1),,fs(k1))(f_s(0), f_s(1), \ldots, f_s(k-1)).

  • Infinite sequences. Similarly, an infinite-length sequence ss with elements from YY can be viewed as a function fs:NYf_s : \mathbb N\to Y.

  • Numbers. Numbers can be viewed as functions since the binary representation of a number is a sequence of bits (possibly infinite-length).

    For example, all real numbers between 00 and 11 (inclusive) is denoted by [0,1][0,1]. Every function f:N{0,1}f : \mathbb N\to \{0,1\} represents a real number in [0,1][0,1] in binary, namely 0.f(0)f(1)f(2)0.f(0)f(1)f(2)\ldots.

Let XX be any set, and let F\mathcal{F} be a set of functions f:XYf: X \to Y where Y2|Y| \geq 2. If XF|X| \geq |\mathcal{F}|, we can construct a function fD:XYf_D : X \to Y such that fD∉Ff_D \not \in \mathcal{F}.

When we apply the above lemma to construct an explicit fD∉Ff_D \not\in \mathcal{F}, we call that diagonalizing against the set F\mathcal{F}. And we call fDf_D a diagonal element. Typically there are several choices for the definition of fDf_D:

  • Different injections ϕ:FX\phi : \mathcal{F}\to X can lead to different diagonal elements.

  • If Y>2|Y| > 2, we have more than one choice on what value we assign to fD(xf)f_D(x_f) that makes fD(xf)f(xf)f_D(x_f) \neq f(x_f) (here xfx_f denotes ϕ(f)\phi(f)).

  • If there are elements xXx \in X not in the range of ϕ\phi, then we can define fD(x)f_D(x) any way we like.

Note that diagonalizing against a set F\mathcal{F} produces an explicit function fDf_D not in F\mathcal{F}. Therefore, in situations where you wish to find an explicit object that is not in a given set, you should consider if diagonalization could be applied.

If F\mathcal{F} is the set of all functions f:XYf: X \to Y (and Y2|Y| \geq 2), then X<F|X| < |\mathcal{F}|.

Using the corollary above, give an example of an uncountable set.

For any set XX, X<(X)|X| < |\wp(X)|.

In the famous Russell’s Paradox, we consider the set of all sets that do not contain themselves. That is, we consider D={set S:S∉S}.D = \{\text{set } S : S \not \in S\}. Then we ask whether DD is in DD or not. If DDD \in D, then by the definition of DD, D∉DD \not \in D. And if D∉DD \not \in D, again by the definition of DD, DDD \in D. Either way we get a contradiction. Therefore, the conclusion is that such a set DD should not be mathematically definable.

This paradox is also an application of diagonalization. For any set XX, we cannot diagonalize against the set of all functions f:X{0,1}f : X \to \{0,1\}. If we imagine diagonalizing against this set, where XX is the set of all sets, the natural diagonal element fDf_D that comes out of diagonalization is precisely the characteristic function of the set DD defined above.

For any infinite set SS, (S)\wp(S) is uncountable. In particular, (N)\wp(\mathbb N) is uncountable.

In a common scenario where diagonalization is applied, both F\mathcal{F} and XX are countably infinite sets. So we can list the elements of F\mathcal{F} as f1,  f2,  f3,  f_1,\; f_2,\; f_3,\; \ldots as well as the elements of XX as x1,  x2,  x3,  x_1,\; x_2,\; x_3,\; \ldots Then for all ii, define fD(xi)f_D(x_i) such that fD(xi)fi(xi)f_D(x_i) \neq f_i(x_i). If Y={0,1}Y = \{0,1\}, for example, fD(xi)=not  fi(xi)f_D(x_i) = \textbf{not} \; f_i(x_i). The construction of fDf_D can be nicely visualized with a table, as shown below. Here, an entry corresponding to row fif_i and column xjx_j contains the value fi(xj)f_i(x_j).

image

By construction, the diagonal element fDf_D differs from every fif_i, i{1,2,3,}i \in \{1,2,3,\ldots\}. In particular, it differs from fif_i with respect to the input xix_i.

There exists an irrational real number, that is, there exists a number in RQ\mathbb R\setminus \mathbb Q.

Prove that R\mathbb R is uncountable.

Let Σ\Sigma be some finite alphabet. We denote by Σ\Sigma^\infty the set of all infinite length words over the alphabet Σ\Sigma.

The set {0,1}\{{\texttt{0}},{\texttt{1}}\}^\infty is uncountable.

Prove that if XX is uncountable and XYX \subseteq Y, then YY is also uncountable.

If we want to show that a set XX is uncountable, it suffices to show that XY|X| \geq |Y| for some uncountable set YY. Typically, a good choice for such a YY is {0,1}\{{\texttt{0}},{\texttt{1}}\}^\infty. And one strategy for establishing X{0,1}|X| \geq |\{{\texttt{0}},{\texttt{1}}\}^\infty| is to identify a subset SS of XX such that S{0,1}S \leftrightarrow\{{\texttt{0}},{\texttt{1}}\}^\infty.

Show that the following sets are uncountable.

  1. The set of all bijective functions from N\mathbb N to N\mathbb N.

  2. {x1x2x3{1,2}: for all n1  i=1nxi≢0mod  4}\{x_1x_2 x_3 \ldots \in \{{\texttt{1}},{\texttt{2}}\}^\infty : \text{ for all $n \geq 1$, $\; \sum_{i=1}^n x_i \not\equiv 0 \mod 4$}\}

  1. For sets X,YX, Y, what are the definitions of XY|X| \leq |Y|, XY|X| \geq |Y|, X=Y|X| = |Y|, and X<Y|X| < |Y|?

  2. What is the definition of a countable set?

  3. What is the CS method for showing that a set is countable?

  4. True or false: There exists an infinite set SS such that S<N|S| < |\mathbb N|.

  5. True or false: {0,1}{0,1}=\{{\texttt{0}},{\texttt{1}}\}^* \cap \{{\texttt{0}},{\texttt{1}}\}^\infty = \varnothing.

  6. True or false: {0,1,2}=Q×Q|\{{\texttt{0}},{\texttt{1}},{\texttt{2}}\}^*| = |\mathbb Q\times \mathbb Q|.

  7. State the Diagonalization Lemma and specify how the diagonal function fD:XYf_D : X \to Y is constructed (the construction should make sense for any XX so do not assume XX is finite or countably infinite).

  8. What is the connection between Diagonalization Lemma and uncountability?

  9. What is Cantor’s Theorem?

  10. True or false: ({0,1})=(({0,1}))|\wp(\{{\texttt{0}},{\texttt{1}}\}^\infty)| = |\wp(\wp(\{{\texttt{0}},{\texttt{1}}\}^\infty))|.

  11. True or false: There is a surjection from {0,1}\{{\texttt{0}},{\texttt{1}}\}^\infty to {0,1,2,3}\{{\texttt{0}},{\texttt{1}},{\texttt{2}},{\texttt{3}}\}^\infty.

  12. True or false: Let Σ\Sigma be an alphabet. The set of encodings mapping N\mathbb N to Σ\Sigma^* is countable.

  13. True or false: Let Σ={1}\Sigma = \{{\texttt{1}}\} be a unary alphabet. The set of all languages over Σ\Sigma is countable.

  14. State a technique for proving that a given set is uncountable.

Here are the important things to keep in mind from this chapter.

  1. The definitions of injective, surjective and bijective functions are fundamental.

  2. The concepts of countable and uncountable sets, and their precise mathematical definitions.

  3. When it comes to showing that a set is countable, the CS method is the best choice, almost always.

  4. The Diagonalization Lemma is one of the most important and powerful techniques in all mathematics.

  5. When showing that a set is uncountable, establishing an injection from {0,1}\{0,1\}^\infty (or a surjection to {0,1}\{{\texttt{0}},{\texttt{1}}\}^\infty) is one of the best techniques you can use.

Fix some input alphabet and tape alphabet. The set of all Turing machines T={M:M is a TM}\mathcal{T}= \{M : M \text{ is a TM}\} is countable.

Fix some alphabet Σ\Sigma. There are languages LΣL \subseteq \Sigma^* that are not semi-decidable (and therefore undecidable as well).

In the previous chapter, we presented diagonalization with respect to a set of functions. However, Lemma (Diagonalization) can be applied in a slightly more general setting: it can be applied when F\mathcal{F} is a set of objects that map elements of XX to elements of YY (e.g. F\mathcal{F} can be a set of Turing machines which map elements of Σ\Sigma^* to elements of {0,1,}\{0,1,\infty\}). In other words, elements of F\mathcal{F} do not have to be functions as long as they are “function-like”. Once diagonalization is applied, one gets an explicit function fD:XYf_D : X \to Y such that no object in F\mathcal{F} matches the input/output behavior of fDf_D. We will illustrate this in the proof of the next theorem.

We say that a TM MM self-accepts if M(M)=1M(\left \langle M \right\rangle) = 1. The self-accepts problem is defined as the decision problem corresponding to the language SATM=SELF-ACCEPTSTM={M:M is a TM which self-accepts}.\text{SA}_{\text{TM}}= \text{SELF-ACCEPTS}_{\text{TM}}= \{\left \langle M \right\rangle : \text{$M$ is a TM which self-accepts} \}. The complement problem corresponds to the language SATM=SELF-ACCEPTSTM={M:M is a TM which does not self-accept}.\overline{\text{SA}_{\text{TM}}}= \overline{\text{SELF-ACCEPTS}_{\text{TM}}}= \{\left \langle M \right\rangle : \text{$M$ is a TM which does not self-accept}\}. Note that if a TM MM does not self-accept, then M(M){0,}M(\left \langle M \right\rangle) \in \{0, \infty\}.

The language SATM\overline{\text{SA}_{\text{TM}}} is undecidable.

Note that the proof of the above theorem establishes something stronger: the language SATM\overline{\text{SA}_{\text{TM}}} is not semi-decidable. To see this, recall that if MM is a semi-decider for fDf_D, by definition, this implies that for all xΣx \in \Sigma^*:

  • if fD(x)=1f_D(x) = 1, then M(x)=1M(x) = 1,

  • if fD(x)=0f_D(x) = 0, then M(x){0,}M(x) \in \{0, \infty\}.

However, by construction of fDf_D, we know that for input x=Mx = \left \langle M \right\rangle, if fD(M)=1f_D(\left \langle M \right\rangle) = 1 then M(M){0,}M(\left \langle M \right\rangle) \in \{0, \infty\}, and if fD(M)=0f_D(\left \langle M \right\rangle) = 0, then M(M)=1M(\left \langle M \right\rangle) = 1. So for all TMs MM, MM fails to semi-decide fDf_D on input x=Mx = \left \langle M \right\rangle.

The language SATM\text{SA}_{\text{TM}} is undecidable.

The general principle that the above theorem highlights is that since decidable languages are closed under complementation, if LL is an undecidable language, then L=ΣL\overline{L} = \Sigma^* \setminus L must also be undecidable.

We define the following languages: ACCEPTSTM={M,x:M is a TM which accepts input x},HALTSTM={M,x:M is a TM which halts on input x},\begin{aligned} \text{ACCEPTS}_{\text{TM}}& = \{\left \langle M,x \right\rangle : \text{$M$ is a TM which accepts input $x$}\}, \\ \text{HALTS}_{\text{TM}}& = \{\left \langle M,x \right\rangle : \text{$M$ is a TM which halts on input $x$} \}, \\ \end{aligned}

The language ACCEPTSTM\text{ACCEPTS}_{\text{TM}} is undecidable.

The language HALTSTM\text{HALTS}_{\text{TM}} is undecidable.

The languages SATM\text{SA}_{\text{TM}}, ACCEPTSTM\text{ACCEPTS}_{\text{TM}}, and HALTSTM\text{HALTS}_{\text{TM}} are all semi-decidable, i.e., they are all in RE\mathsf{RE}.

RRE\mathsf{R}\neq \mathsf{RE}.

In the last section, we have used the same technique in all the proofs. It will be convenient to abstract away this technique and give it a name. Fix some alphabet Σ\Sigma. Let LL and KK be two languages. We say that LL reduces to KK, written LKL \leq K, if we are able to do the following: assume KK is decidable (for the sake of argument), and then show that LL is decidable by using the decider for KK as a black-box subroutine (i.e. a helper function). Here the languages LL and KK may or may not be decidable to begin with. But observe that if LKL \leq K and KK is decidable, then LL is also decidable. Or in other words, if LKL \leq K and KRK \in \mathsf{R}, then LRL \in \mathsf{R}. Equivalently, taking the contrapositive, if LKL \leq K and LL is undecidable, then KK is also undecidable. So when LKL \leq K, we think of KK as being at least as hard as LL with respect to decidability (or that LL is no harder than KK), which justifies using the less-than-or-equal-to sign.

image

Even though in the diagram above we have drawn one instance of MKM_K being called within MLM_L, MLM_L is allowed to call MKM_K multiple times with different inputs.

In the literature, the above idea is formalized using the notion of a Turing reduction (with the corresponding symbol T\leq_T). In order to define it formally, we need to define Turing machines that have access to an oracle. See Section (Bonus: Oracle Turing Machines) for details.

Note the difference between the notations LK|L| \leq |K| and LKL \leq K. These have very different meanings. The notation LKL \leq K only applies to languages and denotes a reduction. The notation LK|L| \leq |K| applies to any two sets LL and KK and denotes the existence of an injective function from the set LL to the set KK.

Many reductions LKL \leq K have the following very special structure. The TM MLM_L is such that on input xx, it first transforms xx into a new string y=f(x)y = f(x) by applying some computable transformation ff. Then f(x)f(x) is fed into MKM_K. The output of MLM_L is the same as the output of MKM_K. In other words ML(x)=MK(f(x))M_L(x) = M_K(f(x)).

image

These special kinds of reductions are called mapping reductions (or many-one reductions), with the corresponding notation LmKL \leq_m K. Note that the reduction we have seen from SATM\text{SA}_{\text{TM}} to ACCEPTSTM\text{ACCEPTS}_{\text{TM}} is a mapping reduction. However, the reduction from SATM\overline{\text{SA}_{\text{TM}}} to SATM\text{SA}_{\text{TM}} is not a mapping reduction because the output of MKM_K is flipped, or in other words, we have ML(x)=not MK(f(x))M_L(x) = \textbf{not } M_K(f(x)).

Almost all the reductions in this section will be mapping reductions.

Note that to specify a mapping reduction from LL to KK, all one needs to do is specify a TM MfM_f that computes f:ΣΣf : \Sigma^* \to \Sigma^* with the property that for any xΣx \in \Sigma^*, xLx \in L if and only if f(x)Kf(x) \in K.

image

A TM MM is satisfiable if there is some input string that MM accepts. In other words, MM is satisfiable if L(M)L(M) \neq \varnothing. We now define the following languages: SATTM={M:M is a satisfiable TM},NEQTM={M1,M2:M1 and M2 are TMs with L(M1)L(M2)},HALTS-EMPTYTM={M:M is a TM such that M(ϵ) halts},FINITETM={M:M is a TM such that L(M) is finite}.\begin{aligned} \text{SAT}_{\text{TM}}& = \{\left \langle M \right\rangle : \text{$M$ is a satisfiable TM}\}, \\ \text{NEQ}_{\text{TM}}& = \{\left \langle M_1,M_2 \right\rangle : \text{$M_1$ and $M_2$ are TMs with $L(M_1) \neq L(M_2)$}\}, \\ \text{HALTS-EMPTY}_{\text{TM}}& = \{\left \langle M \right\rangle : \text{$M$ is a TM such that $M(\epsilon)$ halts}\}, \\ \text{FINITE}_{\text{TM}}& = \{\left \langle M \right\rangle : \text{$M$ is a TM such that $L(M)$ is finite}\}.\end{aligned}

The language SATTM\text{SAT}_{\text{TM}} is undecidable.

The language NEQTM\text{NEQ}_{\text{TM}} is undecidable.

Show that the following languages are undecidable.

  1. HALTS-EMPTYTM={M: M is a TM and M(ϵ) halts}\text{HALTS-EMPTY}_{\text{TM}}= \{\left \langle M \right\rangle : \text{ $M$ is a TM and $M(\epsilon)$ halts}\}

  2. FINITETM={M:M is a TM that accepts finitely many strings}\text{FINITE}_{\text{TM}}= \{\left \langle M \right\rangle : \text{$M$ is a TM that accepts finitely many strings}\}

SATTMHALTSTM\text{SAT}_{\text{TM}}\leq \text{HALTS}_{\text{TM}}.

Describe a TM that semi-decides SATTM\text{SAT}_{\text{TM}}. Proof of correctness is not required.

Let L,K{0,1}L, K \subseteq \{{\texttt{0}},{\texttt{1}}\}^* be languages. Prove or disprove the following claims.

  1. If LKL \leq K then KLK \leq L.

  2. If LKL \leq K and KK is regular, then LL is regular.

In all the reductions LKL \leq K presented in this chapter, the TM MLM_L calls the TM MKM_K exactly once. (And in fact, almost all the reductions in this chapter are mapping reductions). Present a reduction from HALTSTM\text{HALTS}_{\text{TM}} to ACCEPTSTM\text{ACCEPTS}_{\text{TM}} in which MHALTSM_\text{HALTS} calls MACCEPTSM_\text{ACCEPTS} twice, and both calls/outputs are used in a meaningful way.

If a language LL is semi-decidable but undecidable, then L\overline{L} is not semi-decidable. In other words, if LRERL \in \mathsf{RE}\setminus \mathsf{R}, then L∉RE\overline{L} \not \in \mathsf{RE}.

The languages SATM\overline{\text{SA}_{\text{TM}}}, ACCEPTSTM\overline{\text{ACCEPTS}_{\text{TM}}}, and HALTSTM\overline{\text{HALTS}_{\text{TM}}} are not semi-decidable, i.e. they are not in RE\mathsf{RE}.

The relationship among the complexity classes REG\mathsf{REG}, R\mathsf{R}, and RE\mathsf{RE} and can be summarized as follows.

  • REGR\mathsf{REG}\subsetneq \mathsf{R} since {0n1n:nN}\{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\} is not regular but is decidable.

  • There are languages like SATM\text{SA}_{\text{TM}}, ACCEPTSTM\text{ACCEPTS}_{\text{TM}}, HALTSTM\text{HALTS}_{\text{TM}}, SATTM\text{SAT}_{\text{TM}} that are undecidable (not in R\mathsf{R}), but are semi-decidable (in RE\mathsf{RE}).

  • The complements of the languages above are not in RE\mathsf{RE}.

image

Given some decision problem g:Σ{0,1}g : \Sigma^* \to \{0,1\}, we define a gg-oracle Turing machine (or simply gg-OTM, or just OTM if gg is understood from the context) as a Turing machine with an additional oracle instruction which can be used in the TM’s high-level description. The additional instruction has the following form. For any string xx and any variable name yy of our choosing, we are allowed to use y=g(x).y = g(x). After this instruction, the variable yy holds the value g(x)g(x) and can be used in the rest of the Turing machine’s description. In this context, the function gg is called an oracle.

We’ll use a superscript of gg (e.g. MgM^g) to indicate that a machine is a gg-OTM.

Similar to TMs, we write Mg(x)=1M^g(x) = 1 if the oracle TM MgM^g, on input xx, accepts. We write Mg(x)=0M^g(x) = 0 if it rejects. And we write Mg(x)=M^g(x) = \infty if it loops forever.

If f:Σ{0,1}f : \Sigma^* \to \{0,1\} is a decision problem such that for all xΣx \in \Sigma^*, Mg(x)=f(x)M^g(x) = f(x), then we say MgM^g describes ff (and the language corresponding to ff).

Let LL and KK be two languages, and let kk be the decision problem corresponding to KK. We say that LL Turing-reduces to KK, written LTKL \leq_T K, if there is an oracle Turing machine MkM^k describing LL.

Observe that if LTKL \leq_T K and KK is decidable, then LL is decidable. This is because if LTKL \leq_T K, then by definition, there is an oracle Turing machine MkM^k that describes LL. If KK is decidable, there is a TM MKM_K deciding KK. Then in MkM^k, if we replace all calls to the oracle kk with MKM_K, we end up with a TM deciding LL.

It follows (taking the contrapositive of the observation) that if LTKL \leq_T K and LL is undecidable, then KK is undecidable.

As we have already seen, the set of all languages, (Σ)\wp(\Sigma^*), is not encodable. Most languages do not have a finite description. In addition to thinking about the TM model as a computational model, it is useful to also think of it as a finite description model: Every TM finitely describes/defines the language that it decides (or semi-decides).

The TM model is not a comprehensive finite description model. There are many finitely describable languages (e.g., SATM\overline{\text{SA}_{\text{TM}}}, ACCEPTSTM\text{ACCEPTS}_{\text{TM}}, HALTSTM\text{HALTS}_{\text{TM}}) that a TM cannot finitely describe. The oracle TM model extends the reach of the TM model as a finite description model.

If we inspect the proof of Turing’s 1st Undecidability Theorem, we can see that we never really made use of the fact that TMs represent a computational model. The crucial part of the argument was that the set of TMs is encodable/countable. Or in other words, we only needed that the TM model is a finite description model for languages/decision problems. This means that for other finite description models, like the oracle TM model, we can carry out the same argument.

An important lesson here is that the main limitation Turing’s 1st Undecidability Theorem highlights about TMs is not that they form a computing model, but that they form a finite description model.

Fix an oracle gg. Then we can define the languages ACCEPTSOTM\text{ACCEPTS}_{\text{OTM}}, HALTSOTM\text{HALTS}_{\text{OTM}}, SAOTM\text{SA}_{\text{OTM}}, and so on, analogous to how they are defined for TMs. For example, HALTSOTM={Mg,x:Mg is a g-OTM which halts on input x}.\text{HALTS}_{\text{OTM}}= \{\left \langle M^g,x \right\rangle : \text{$M^g$ is a $g$-OTM which halts on input $x$} \}.

Fix an oracle gg. Then there is no oracle Turing machine that describes SAOTM\overline{\text{SA}_{\text{OTM}}}.

  1. State 4 languages that are undecidable.

  2. State an undecidable language whose description does not involve Turing machines.

  3. Describe 3 languages that are not semi-decidable.

  4. Describe how one can show that a language is undecidable using the notion of a reduction.

  5. True or false: For languages KK and LL, if KLK \leq L, then LL is undecidable.

  6. True or false: For languages KK and LL, if KLK \leq L and LKL \leq K, then LL and KK are both decidable.

  7. True or false: Fix some alphabet Σ\Sigma. The set of regular languages over Σ\Sigma is countable.

  8. True or false: Fix an alphabet Σ\Sigma. The set of undecidable languages is countable.

  9. True or false: Fix an alphabet Σ\Sigma. The set of decidable languages is infinite.

  10. True or false: If languages KK and LL are both undecidable, then their union is also undecidable.

  11. True or false: If a language LL is undecidable, then LL is infinite.

  12. True or false: Σ\Sigma^* \leq \varnothing.

  13. True or false: HALTSTMΣ\text{HALTS}_{\text{TM}}\leq \Sigma^*.

  14. True or false: Every decidable language reduces to HALTSTM\text{HALTS}_{\text{TM}}.

  15. True or false: For all unary languages LL (i.e. languages over the alphabet Σ={1}\Sigma = \{{\texttt{1}}\}), LL is decidable.

  16. In a previous chapter, we saw that the language SATDFA\text{SAT}_{\text{DFA}} is decidable. In particular, we saw that given the encoding of a DFA DD, we can determine if DD accepts a string by checking if in the state diagram of DD, there is a directed path from the initial state to one of the accepting states.

    In this chapter, we saw that the language SATTM\text{SAT}_{\text{TM}} is undecidable. Show that the strategy above that we used for DFAs does not apply to TMs by drawing the state diagram of a TM MM with L(M)=L(M) = \varnothing, but there is a path from the starting state to the accepting state.

Here are the important things to keep in mind from this chapter.

  1. The existence of undecidable languages follows by a counting argument: The set of all languages is uncountable whereas the set of decidable languages is countable. This shows that almost all languages are undecidable, however, it does not give us an explicit undecidable language.

  2. We can use diagonalization to come up with an explicit language that is undecidable.

  3. Reductions are very important. We have seen in an earlier chapter that reductions can be used to expand the landscape of decidable languages. In this chapter, you see that it can be used to expand the landscape of undecidable languages.

  4. This chapter has many examples of undecidability proofs. It is easy to fall in the trap of trying to memorize the template of such proofs rather than really understanding the logic behind how and why these proofs work. If you struggle to understand these proofs, it is usually a sign that there are gaps in your knowledge from previous chapters. Come talk to us so we can help you identify those gaps.

Note that both Prover\text{Prover} and Verifier\text{Verifier} above represent computational processes. So a key part in formalizing GORM\text{GORM} is the formalization of computation. Luckily, we already have a great model for computation: Turing machines.

Even though our main interest is formalizing all mathematical reasoning, it is useful to build a general framework for formalizing any subset of mathematical reasoning. For instance, we may want to come up with a formalization that captures mathematical reasoning for plane geometry. Or we may want to formalize basic arithmetic. Or we may want to formalize probability theory. For any area AA of mathematical reasoning, we will let GORMA\text{GORM}_A denote Good Old Regular Mathematics restricted to AA. Without the subscript, GORM\text{GORM} represents all mathematical reasoning.

Let AA be some area of mathematics. A proof system PSA\text{PS}_A for AA is a mathematical formalization of GORMA\text{GORM}_A with the following properties.

  • For every statement SS in GORMA\text{GORM}_A with a truth value, there is a precise representation of SS in PSA\text{PS}_A.

  • For every argument PP in GORMA\text{GORM}_A, there is a precise representation of PP in PSA\text{PS}_A.

  • PSA\text{PS}_A specifies a decider TM VV (called a verifier) such that V(S,P)V(\left \langle S,P \right\rangle) accepts if and only if PP is a proof of SS.

Since the verification process represents a computation, there is an implicit requirement that the set of statements in PSA\text{PS}_A as well as the set of proofs in PSA\text{PS}_A are encodable.

Let PS\text{PS} be a proof system and let SS be a statement in PS\text{PS}.

  • Statement SS is provable means there is a proof of SS in PS\text{PS}.

    • If SS is provable, we say PS\text{PS} proves SS.

    • We use Provable()\text{Provable}(\cdot) to denote the function such that Provable(S)\text{Provable}(\left \langle S \right\rangle) returns True if and only if SS is provable. We can view Provable()\text{Provable}(\cdot) as a decision problem.

  • Statement SS is refutable means PS\text{PS} proves the negation of SS, (i.e. ¬S{\neg S}).

  • Statement SS is resolvable means that SS is either provable or refutable.

  • If statement SS is not resolvable, we say SS is independent of PS\text{PS}.

Let PS\text{PS} be a proof system.

  • PS\text{PS} is consistent means for all SS, if SS is provable, then ¬S{\neg S} is not provable. Or equivalently, for all SS, at most one of SS and ¬S{\neg S} is provable.

  • PS\text{PS} is sound means for all SS, if SS is provable, then SS is true (i.e. Provable(S)S\text{Provable}(\left \langle S \right\rangle) \to S). This is equivalent to saying PS\text{PS} does not prove false statements.

  • PS\text{PS} is complete means every statement in PS\text{PS} is resolvable. Or equivalently, for all SS, at least one of SS and ¬S{\neg S} is provable.

  • PS\text{PS} is decidable means that the Provable\text{Provable} decision problem is decidable.

  • If PS\text{PS} is not consistent, then every statement is provable. To see this, suppose SS is such that both SS and ¬S\neg S are provable. And suppose you want to prove some statement TT. Then assume, for the sake of contradiction, ¬T\neg T. Then derive SS as well as ¬S\neg S (both are provable). This is the desired contradiction.

  • If PS\text{PS} is sound, then it must be consistent. So we think of soundness as a stronger requirement.

  • If PS\text{PS} is both consistent and complete, then for all statements SS, exactly one of SS and ¬S{\neg S} is provable.

  • If PS\text{PS} is both sound and complete, then in addition to the observation in the previous point, provable statements would be true. So for all statements SS, we would have SProvable(S)S \leftrightarrow\text{Provable}(\left \langle S \right\rangle), and truth would correspond exactly to provability.

  • It is not hard to see that for any proof system PS\text{PS}, Provable\text{Provable} is semi-decidable. Recalling Definition (TM semi-deciding a language or decision problem), to show that Provable\text{Provable} is semi-decidable, we need to come up with a TM MM such that if SS is provable, M(S)M(\left \langle S \right\rangle) accepts, and if SS is not provable, then M(S)M(\left \langle S \right\rangle) does not accept (i.e. rejects or loops forever). The brute-force search algorithm described below accomplishes this. (Recall VV denotes the verifier for PS\text{PS}.) def    MRESOLVE(S):1.    For k=1,2,3,2.        For every string w of length k:3.            If V(S,w) accepts: return True.4.            If V(¬S,w) accepts: return False.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\text{RESOLVE}(\left \langle S \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{For $k = 1, 2, 3, \ldots$}\\ &{\scriptstyle 2.}~~~~~~~~\texttt{For every string $w$ of length $k$:}\\ &{\scriptstyle 3.}~~~~~~~~~~~~\texttt{If $V(\left \langle S, w \right\rangle)$ accepts: return True.}\\ &{\scriptstyle 4.}~~~~~~~~~~~~\texttt{If $V(\left \langle {\neg S}, w \right\rangle)$ accepts: return False.}\\ \hline\hline \end{aligned} Note that this is not a decider for Provable\text{Provable} because it is possible that SS is not resolvable (i.e. independent), and in that case, MRESOLVE(S)M_\text{RESOLVE}(\left \langle S \right\rangle) would loop forever.

Come up with a proof system PS\text{PS} that formalizes GORM\text{GORM}, and furthermore:

  • Prove that PS\text{PS} is consistent.

  • Prove that PS\text{PS} is complete.

  • Prove that PS\text{PS} is decidable.

Here is a fun question to think about. In Hilbert’s program, why doesn’t Hilbert ask for a proof of soundness and instead asks for a proof of consistency?

It is worth noting that the set of deduction rules that the Hilbert system specifies is known to be complete (this is Gödel’s Completeness Theorem) in the sense that if you are not able to deduce some truth in the Hilbert System, it is not because you did not have enough deduction rules. It must be because you did not have enough axioms (i.e. some truths were not a logical consequence of the axioms).

Every precise mathematical statement and proof that can be expressed in GORM\text{GORM} can be expressed using the ZFC axiomatic proof system. In other words, every GORM\text{GORM}-statement and GORM\text{GORM}-proof has a corresponding ZFC-statement and ZFC-proof.

Note that the Church-Turing Thesis is really a GORA\text{GORA}-to-TM thesis, where GORA\text{GORA} means Good Old Regular Algorithms. It says that any algorithm can be expressed using a TM. Or in other words, every algorithm “compiles down” to a TM.

You should think of GORM\text{GORM}-to-ZFC Thesis in the same light. Every mathematics proof compiles down to a proof in ZFC.

When we are proving properties of algorithms, we don’t really directly work with TMs, but rather appeal to the Church-Turing Thesis: we work with high-level algorithms with the understanding that they can be translated to TMs. Similarly, when we prove properties of mathematical reasoning, we won’t directly work with ZFC, but appeal to the GORM\text{GORM}-to-ZFC Thesis: we will use high-level mathematical statements and proofs with the understanding that they can be translated into ZFC statements and proofs.

ZFC cannot be both sound and complete. In other words, if ZFC is sound, then it must be incomplete.

If ZFC is sound, then the statement SMDS_{M_D} is independent of ZFC.

If MM is a TM such that M(x)M(x) accepts, then the statement M(x) accepts”\text{``}M(x) \text{ accepts}\text{''} is provable. Similarly, if M(x)M(x) rejects, then the statement M(x) rejects”\text{``}M(x) \text{ rejects}\text{''} is provable.

Let MM be a TM such that M(x)M(x) halts. And let S=M(x) accepts”S = \text{``}M(x) \text{ accepts}\text{''}. Then if ZFC is consistent, SMRESOLVE(S)S \leftrightarrow M_\text{RESOLVE}(\left \langle S \right\rangle) (i.e. ZFC is sound with respect to SS).

If ZFC is consistent, then the statement SMDS_{M_D} is independent of ZFC.

If ZFC is consistent, then SMDS_{M_D} is false.

We say proving SS reduces to proving TT (or simply SS reduces to TT) if TST \to S is provable.

If ZFC is consistent, then the statement “ZFC is consistent” is not provable in ZFC.

If ZFC is consistent, the Continuum Hypothesis is independent of ZFC.

If ZFC is sound, then Provable\text{Provable} is undecidable.

For f:R+R+f: \mathbb R^+ \to \mathbb R^+ and g:R+R+g: \mathbb R^+ \to \mathbb R^+, we write f(n)=O(g(n))f(n) = O(g(n)) if there exist constants C>0C > 0 and n0>0n_0 > 0 such that for all nn0n \geq n_0, f(n)Cg(n).f(n) \leq Cg(n). In this case, we say that f(n)f(n) is big-O of g(n)g(n).

Show that 3n2+10n+303n^2 + 10n + 30 is O(n2)O(n^2).

Using the fact that for all m>0m > 0, logm<m\log m < m, show that for all ϵ>0\epsilon > 0 and k>0k > 0, logkn\log^k n is O(nϵ)O(n^\epsilon).

For f:R+R+f: \mathbb R^+ \to \mathbb R^+ and g:R+R+g: \mathbb R^+ \to \mathbb R^+, we write f(n)=Ω(g(n))f(n) = \Omega(g(n)) if there exist constants c>0c > 0 and n0>0n_0 > 0 such that for all nn0n \geq n_0, f(n)cg(n).f(n) \geq cg(n). In this case, we say that f(n)f(n) is big-Omega of g(n)g(n).

Show that n!2n!^2 is Ω(nn)\Omega(n^n).

For f:R+R+f: \mathbb R^+ \to \mathbb R^+ and g:R+R+g: \mathbb R^+ \to \mathbb R^+, we write f(n)=Θ(g(n))f(n) = \Theta(g(n)) if f(n)=O(g(n))andf(n)=Ω(g(n)).f(n) = O(g(n)) \quad \quad \text{and} \quad \quad f(n) = \Omega(g(n)). This is equivalent to saying that there exists constants c,C,n0>0c,C,n_0 > 0 such that for all nn0n \geq n_0, cg(n)f(n)Cg(n).cg(n) \leq f(n) \leq Cg(n). In this case, we say that f(n)f(n) is Theta of g(n)g(n).

For any constant b>1b > 1, logbn=Θ(logn).\log_b n = \Theta(\log n).

Since the base of a logarithm only changes the value of the log function by a constant factor, it is usually not relevant in big-O, big-Omega or Theta notation. So most of the time, when you see a log\log function present inside O()O(\cdot), Ω()\Omega(\cdot), or Θ()\Theta(\cdot), the base will be ignored. E.g. instead of writing lnn=Θ(log2n)\ln n = \Theta(\log_2 n), we actually write lnn=Θ(logn)\ln n = \Theta(\log n). That being said, if the log\log appears in the exponent, the base matters. For example, nlog25n^{\log_2 5} is asymptotically different from nlog35n^{\log_3 5}.

Show that log2(n!)=Θ(nlogn)\log_2 (n!) = \Theta(n \log n).

Suppose we are using some computational model in which what constitutes a step in an algorithm is understood. Suppose also that for any input xx, we have an explicit definition of its length. The worst-case running time of an algorithm AA is a function TA:NNT_A : \mathbb N\to \mathbb N defined by TA(n)=maxinstances/inputs xof length n number of steps A takes on input x.T_A(n) = \max_{\substack{\text{instances/inputs $x$} \\ \text{of length $n$}}} \text{ number of steps $A$ takes on input $x$}. We drop the subscript AA and just write T(n)T(n) when AA is clear from the context.

We use nn to denote the input length. Unless specified otherwise, nn is defined to be the number of bits in a reasonable binary encoding of the input. It is also common to define nn in other ways. For example, if the input is an array or a list, nn can denote the number of elements.

In the Turing machine model, a step in the computation corresponds to one application of the transition function of the machine. However, when measuring running time, often we will not be considering the Turing machine model.

If we don’t specify a particular computational model, by default, our model will be closely related to the Random Access Machine (RAM) model. Compared to TMs, this model aligns better with the architecture of the computers we use today. We will not define this model formally, but instead point out two properties of importance.

First, given a string or an array, accessing any index counts as 1 step.

Second, arithmetic operations count as 1 step as long as the numbers involved are “small”. We say that a number yy is small if it can be upper bounded by a polynomial in nn, the input length. That is, yy is small if there is some constant kk such that yy is O(nk)O(n^k). As an example, suppose we have an algorithm AA that contains a line like x=y+zx = y + z, where yy and zz are variables that hold integer values. Then we can count this line as a single step if yy and zz are both small. Note that whether a number is small or not is determined by the length of the input to the algorithm AA.

We say that a number is large, if it is not small, i.e., if it cannot be upper bounded by a polynomial in nn. In cases where we are doing arithmetic operations involving large numbers, we have to consider the algorithms used for the arithmetic operations and figure out their running time. For example, in the line x=y+zx = y + z, if yy or zz is a large number, we need to specify what algorithm is being used to do the addition and what its running time is. A large number should be treated as a string of digits/characters. Arithmetic operations on large numbers should be treated as string manipulation operations and their running time should be figured out accordingly.

The expression of the running time of an algorithm using big-O, big-Omega or Theta notation is referred to as asymptotic complexity estimate of the algorithm.

Constant time:T(n)=O(1).Logarithmic time:T(n)=O(logn).Linear time:T(n)=O(n).Quadratic time:T(n)=O(n2).Polynomial time:T(n)=O(nk) for some constant k>0.Exponential time:T(n)=O(2nk) for some constant k>0.\begin{aligned} \text{Constant time:} & \quad T(n) = O(1). \\ \text{Logarithmic time:} & \quad T(n) = O(\log n). \\ \text{Linear time:} & \quad T(n) = O(n). \\ \text{Quadratic time:} & \quad T(n) = O(n^2). \\ \text{Polynomial time:} & \quad T(n) = O(n^k) \text{ for some constant $k > 0$}. \\ \text{Exponential time:} & \quad T(n) = O(2^{n^k}) \text{ for some constant $k > 0$}.\end{aligned}

We denote by P\mathsf{P} the set of all languages that can be decided in polynomial-time, i.e., in time O(nk)O(n^k) for some constant k>0k>0.

Suppose that we have an algorithm AA that runs another algorithm AA' once as a subroutine. We know that the running time of AA' is O(nk)O(n^k), k1k \geq 1, and the work done by AA is O(nt)O(n^t), t1t \geq 1, if we ignore the subroutine AA' (i.e., we don’t count the steps taken by AA'). What kind of upper bound can we give for the total running-time of AA (which includes the work done by AA')?

The intrinsic complexity of a computational problem refers to the asymptotic time complexity of the most efficient algorithm that computes the problem.

The intrinsic complexity of L={0k1k:kN}L = \{0^k1^k : k \in \mathbb N\} is Θ(n)\Theta(n).

In the TM model, a step corresponds to one application of the transition function. Show that L={0k1k:kN}L = \{0^k1^k : k \in \mathbb N\} can be decided by a TM in time O(nlogn)O(n \log n). Is this statement directly implied by Proposition (Intrinsic complexity of {0k1k:kN}\{0^k1^k : k \in \mathbb N\})?

Assume the languages L1L_1 and L2L_2 are decidable in polynomial time. Prove or give a counter-example: L1L2L_1L_2 is decidable in polynomial time.

Given a computational problem with an integer input xx, notice that xx is a large number (if xx is nn bits long, its value can be about 2n2^n, so it cannot be upper bounded by a polynomial in nn). Therefore arithmetic operations involving xx cannot be treated as 1-step operations. Computational problems with integer input(s) are the most common examples in which we have to deal with large numbers, and in these situations, one should be particularly careful about analyzing running time.

In the integer addition problem, we are given two nn-bit numbers xx and yy, and the output is their sum x+yx+y. In the integer multiplication problem, we are given two nn-bit numbers xx and yy, and the output is their product xyxy.

Consider the following algorithm for the integer addition problem (we’ll assume the inputs are natural numbers for simplicity).

def    Addition(natural x,  natural y):1.    For i=1 to x:2.        y=y+1.3.    Return y.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{Addition}(\langle \text{natural } x,\; \text{natural } y \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{For $i = 1$ to $x$:}\\ &{\scriptstyle 2.}~~~~~~~~\texttt{$y = y + 1$.}\\ &{\scriptstyle 3.}~~~~\texttt{Return $y$.}\\ \hline\hline\end{aligned}

This algorithm has a loop that repeats xx many times. Since xx is an nn-bit number, the worst-case complexity of this algorithm is Ω(2n)\Omega(2^n).

In comparison, the following well-known algorithm for integer addition has time complexity O(n)O(n).

def    Addition(natural x,  natural y):1.    carry=0.2.    For i=0 to n1:3.        columnSum=x[i]+y[i]+carry.4.        z[i]=columnSum  %  2.5.        carry=(columnSumz[i])/2.6.    z[n]=carry.7.    Return z.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{Addition}(\langle \text{natural } x,\; \text{natural } y \rangle):\\ &{\scriptstyle 1.}~~~~\text{carry} = 0.\\ &{\scriptstyle 2.}~~~~\texttt{For $i = 0$ to $n-1$:}\\ &{\scriptstyle 3.}~~~~~~~~\text{columnSum} = x[i] + y[i] + \text{carry}.\\ &{\scriptstyle 4.}~~~~~~~~z[i] = \text{columnSum} \; \% \;2.\\ &{\scriptstyle 5.}~~~~~~~~\text{carry} = (\text{columnSum} - z[i])/2.\\ &{\scriptstyle 6.}~~~~z[n] = \text{carry}.\\ &{\scriptstyle 7.}~~~~\texttt{Return $z$.}\\ \hline\hline\end{aligned}

Note that the arithmetic operations inside the loop are all O(1)O(1) time since the numbers involved are all bounded (i.e., their values do not depend on nn). Since the loop repeats nn times, the overall complexity is O(n)O(n).

It is easy to see that the intrinsic complexity of integer addition is Ω(n)\Omega(n) since it takes at least nn steps to write down the output, which is either nn or n+1n+1 bits long. Therefore we can conclude that the intrinsic complexity of integer addition is Θ(n)\Theta(n). The same is true for integer subtraction.

Consider the following problem: Given as input a positive integer NN, output a non-trivial factor of NN if one exists, and output False otherwise. Give a lower bound using the Ω()\Omega(\cdot) notation for the running-time of the following algorithm solving the problem:

def    Non-Trivial-Factor(natural N):1.    For i=2 to N1:2.        If N  %  i==0: Return i.3.    Return False.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{Non-Trivial-Factor}(\langle \text{natural } N \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{For $i = 2$ to $N-1$:}\\ &{\scriptstyle 2.}~~~~~~~~\texttt{If $N \;\%\; i == 0$: Return $i$.}\\ &{\scriptstyle 3.}~~~~\texttt{Return False.}\\ \hline\hline\end{aligned}

The grade-school algorithms for the integer multiplication and division problems have time complexity O(n2)O(n^2).

The integer multiplication problem can be solved in time O(n1.59)O(n^{1.59}).

The programming language Python uses Karatsuba algorithm for multiplying large numbers. There are algorithms for multiplying integers that are asymptotically better than the Karatsuba algorithm.

Consider the following computational problem. Given as input a number ANA \in \mathbb{N}, output A1/251\lfloor A^{1/251}\rfloor. Determine whether this problem can be computed in worst-case polynomial-time, i.e. O(nk)O(n^k) time for some constant kk, where nn denotes the number of bits in the binary representation of the input AA. If you think the problem can be solved in polynomial time, give an algorithm in pseudocode, explain briefly why it gives the correct answer, and argue carefully why the running time is polynomial. If you think the problem cannot be solved in polynomial time, then provide a proof.

  1. What is the formal definition of “f=Ω(g)f = \Omega(g)”?

  2. True or false: nlog25=Θ(nlog35)n^{\log_2 5} = \Theta(n^{\log_3 5}).

  3. True or false: nlog2n=Ω(n15251)n^{\log_2 n} = \Omega(n^{15251}).

  4. True or false: 3n=Θ(2n)3^n = \Theta(2^n).

  5. True or false: Suppose f(n)=Θ(g(n))f(n) = \Theta(g(n)). Then we can find constants c>0c > 0 and n0>0n_0 > 0 such that for all nn0n \geq n_0, f(n)=cg(n)f(n) = cg(n).

  6. True or false: f(n)=O(g(n))f(n) = O(g(n)) if and only if g(n)=Ω(f(n))g(n) = \Omega(f(n)).

  7. What is the definition of “worst-case running time of an algorithm AA”?

  8. True or false: Let Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\} and let L={0k:kN+}L = \{{\texttt{0}}^k : k \in \mathbb{N}^+\}. There is a Turing machine AA deciding LL whose running time TAT_A satisfies “TA(n)T_A(n) is O(n)O(n)”.

  9. True or false: Continuing previous question, every Turing machine BB that decides LL has running time TBT_B satisfying “TB(n)T_B(n) is Ω(n)\Omega(n)”.

  10. True or false: The intrinsic complexity of the integer multiplication problem is Θ(nlog23)\Theta(n^{\log_2 3}).

  11. True or false: For languages L1L_1 and L2L_2, if L1L2L_1 \leq L_2 and L2L_2 is polynomial time decidable, then L1L_1 is polynomial time decidable.

  12. Explain under what circumstances we count arithmetic operations as constant-time operations, and under what circumstances we don’t.

  13. Consider the algorithm below. What is nn, the input length? Is the algorithm polynomial time?

    def isPrime(N):
        if (N < 2): return False
        if (N == 2): return True
        if (N mod 2 == 0): return False
        maxFactor = ceiling(N**0.5) # N**0.5 = square root of N
        for factor in range(3,maxFactor+1,2):
            if (N mod factor == 0): return False
        return True

Here are the important things to keep in mind from this chapter.

  1. Both an intuitive and formal understanding of big-O, big-Omega and Theta notations are important.

  2. The definition of “Worst-case running time of an algorithm” is fundamental.

  3. Understanding how to analyze the running time of algorithms that involve integer inputs is one of the main goals of this chapter. The key here is to always keep in mind that if an algorithm has an integer NN as input, the input length nn is not NN, but rather the logarithm of NN. If you are not careful about this, you can fool yourself into thinking that exponential time algorithms are actually polynomial time algorithms.

  4. Make sure you understand when arithmetic operations count as constant time operations and when they do not.

  5. You should be comfortable solving recurrence relations using the tree method, as illustrated in the proof of Theorem (Karatsuba algorithm for integer multiplication).

An undirected graph GG is a pair (V,E)(V,E), where

  • VV is a finite non-empty set called the set of vertices (or nodes),

  • EE is a set called the set of edges, and every element of EE is of the form {u,v}\{u,v\} for distinct u,vVu, v \in V.

Given a graph G=(V,E)G=(V,E), we usually use nn to denote the number of vertices V|V| and mm to denote the number of edges E|E|.

There are two common ways to represent a graph. Let v1,v2,,vnv_1,v_2,\ldots,v_n be some arbitrary ordering of the vertices. In the adjacency matrix representation, a graph is represented by an n×nn \times n matrix AA such that A[i,j]={1if {vi,vj}E,0otherwise.A[i,j] = \begin{cases} 1 & \quad \text{if $\{v_i,v_j\} \in E$,}\\ 0 & \quad \text{otherwise.}\\ \end{cases} The adjacency matrix representation is not always the best representation of a graph. In particular, it is wasteful if the graph has very few edges. For such graphs, it can be preferable to use the adjacency list representation. In the adjacency list representation, you are given an array of size nn and the ii’th entry of the array contains a pointer to a linked list of vertices that vertex ii is connected to via an edge.

In an nn-vertex graph, what is the maximum possible value for the number of edges in terms of nn?

Let G=(V,E)G=(V,E) be a graph, and e={u,v}Ee = \{u,v\} \in E be an edge in the graph. In this case, we say that uu and vv are neighbors or adjacent. We also say that uu and vv are incident to ee. For vVv \in V, we define the neighborhood of vv, denoted N(v)N(v), as the set of all neighbors of vv, i.e. N(v)={u:{v,u}E}N(v) = \{u : \{v,u\} \in E\}. The size of the neighborhood, N(v)|N(v)|, is called the degree of vv, and is denoted by deg(v)\deg(v).

A graph G=(V,E)G = (V,E) is called dd-regular if every vertex vVv \in V satisfies deg(v)=d\deg(v) = d.

Let G=(V,E)G=(V,E) be a graph. Then vVdeg(v)=2m.\sum_{v \in V} \deg(v) = 2m.

Is it possible to have a party with 251 people in which everyone knows exactly 5 other people in the party?

Let G=(V,E)G=(V,E) be a graph. A path of length kk in GG is a sequence of distinct vertices v0,v1,,vkv_0,v_1,\ldots,v_k such that {vi1,vi}E\{v_{i-1}, v_i\} \in E for all i{1,2,,k}i \in \{1,2,\ldots,k\}. In this case, we say that the path is from vertex v0v_0 to vertex vkv_k (or that the path is between v0v_0 and vkv_k).

A cycle of length kk (also known as a kk-cycle) in GG is a sequence of vertices v0,v1,,vk1,v0v_0, v_1, \ldots, v_{k-1}, v_0 such that v0,v1,,vk1v_0, v_1, \ldots, v_{k-1} is a path, and {v0,vk1}E\{v_0,v_{k-1}\} \in E. In other words, a cycle is just a “closed” path. The starting vertex in the cycle is not important. So for example, v1,v2,,vk1,v0,v1v_1, v_2, \ldots, v_{k-1},v_0,v_1 would be considered the same cycle. Also, if we list the vertices in reverse order, we consider it to be the same cycle. For example, v0,vk1,vk2,v1,v0v_0, v_{k-1}, v_{k-2} \ldots, v_1, v_0 represents the same cycle as before.

A graph that contains no cycles is called acyclic.

Let G=(V,E)G = (V,E) be a graph. We say that two vertices in GG are connected if there is a path between those two vertices. A connected graph GG is such that every pair of vertices in GG is connected.

A subset SVS \subseteq V is called a connected component of GG if GG restricted to SS, i.e. the graph G=(S,E={{u,v}E:u,vS})G' = (S, E' = \{\{u,v\} \in E : u,v \in S\}), is a connected graph, and SS is disconnected from the rest of the graph (i.e. {u,v}∉E\{u,v\} \not \in E when uSu \in S and v∉Sv \not \in S). Note that a connected graph is a graph with only one connected component.

Let G=(V,E)G=(V,E) be a graph. The distance between vertices u,vVu, v \in V, denoted dist(u,v)\text{dist}(u,v), is defined to be the length of the shortest path between uu and vv. If u=vu = v, then dist(u,v)=0\text{dist}(u,v) = 0, and if uu and vv are not connected, then dist(u,v)\text{dist}(u,v) is defined to be \infty.

Let G=(V,E)G = (V,E) be a connected graph with nn vertices and mm edges. Then mn1m \geq n-1. Furthermore, m=n1m = n-1 if and only if GG is acyclic.

A graph satisfying two of the following three properties is called a tree:

  1. connected,

  2. m=n1m = n-1,

  3. acyclic.

A vertex of degree 1 in a tree is called a leaf. And a vertex of degree more than 1 is called an internal node.

Show that if a graph has two of the properties listed in Definition (Tree, leaf, internal node), then it automatically has the third as well.

Let TT be a tree with at least 22 vertices. Show that TT must have at least 22 leaves.

Let TT be a tree with LL leaves. Let Δ\Delta be the largest degree of any vertex in TT. Prove that ΔL\Delta \leq L.

Given a tree, we can pick an arbitrary node to be the root of the tree. In a rooted tree, we use “family tree” terminology: parent, child, sibling, ancestor, descendant, lowest common ancestor, etc. (We assume you are already familiar with these terms.) Level ii of a rooted tree denotes the set of all vertices in the tree at distance ii from the root.

A directed graph GG is a pair (V,A)(V,A), where

  • VV is a non-empty finite set called the set of vertices (or nodes),

  • AA is a finite set called the set of directed edges (or arcs), and every element of AA is a tuple (u,v)(u,v) for u,vVu, v \in V. If (u,v)A(u,v) \in A, we say that there is a directed edge from uu to vv. Note that (u,v)(v,u)(u,v) \neq (v,u) unless u=vu = v.

Below is an example of how we draw a directed graph:

image

Let G=(V,A)G = (V,A) be a directed graph. For uVu \in V, we define the neighborhood of uu, N(u)N(u), as the set {vV:(u,v)A}\{v \in V : (u,v) \in A\}. The vertices in N(u)N(u) are called the neighbors of uu. The out-degree of uu, denoted degout(u)\deg_\text{out}(u), is N(u)|N(u)|. The in-degree of uu, denoted degin(u)\deg_\text{in}(u), is the size of the set {vV:(v,u)A}\{v \in V : (v,u) \in A\}. A vertex with out-degree 0 is called a sink. A vertex with in-degree 0 is called a source.

The notions of paths and cycles naturally extend to directed graphs. For example, we say that there is a path from uu to vv if there is a sequence of distinct vertices u=v0,v1,,vk=vu = v_0,v_1,\ldots,v_k = v such that (vi1,vi)A(v_{i-1}, v_i) \in A for all i{1,2,,k}i \in \{1,2,\ldots,k\}.

The arbitrary-first search algorithm, denoted AFS, is the following generic algorithm for searching a given graph. Below, “bag” refers to an arbitrary data structure that allows us to add and retrieve objects. The algorithm, given a graph GG and some vertex ss in GG, traverses all the vertices in the connected componenet of GG containing ss.

def    AFS(graph G=(V,E),  vertex sV):1.    Put s into bag.2.    While bag is non-empty:3.        Pick an arbitrary vertex v from bag.4.        If v is unmarked:5.            Mark v.6.            For each neighbor w of v:7.                Put w into bag.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{AFS}(\langle \text{graph } G = (V,E), \; \text{vertex } s \in V \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Put $s$ into bag.}\\ &{\scriptstyle 2.}~~~~\texttt{While bag is non-empty:}\\ &{\scriptstyle 3.}~~~~~~~~\texttt{Pick an arbitrary vertex $v$ from bag.}\\ &{\scriptstyle 4.}~~~~~~~~\texttt{If $v$ is unmarked:}\\ &{\scriptstyle 5.}~~~~~~~~~~~~\texttt{Mark $v$.}\\ &{\scriptstyle 6.}~~~~~~~~~~~~\texttt{For each neighbor $w$ of $v$:}\\ &{\scriptstyle 7.}~~~~~~~~~~~~~~~~\texttt{Put $w$ into bag.}\\ \hline\hline\end{aligned}

Note that when a vertex ww is added to the bag, it gets there because it is the neighbor of a vertex vv that has been just marked by the algorithm. In this case, we’ll say that vv is the parent of ww (and ww is the child of vv). Explicitly keeping track of this parent-child relationship is convenient, so we modify the above algorithm to keep track of this information. Below, a tuple of vertices (v,w)(v,w) has the meaning that vertex vv is the parent of ww. The initial vertex ss has no parent, so we denote this situation by (,s)(\perp, s).

def    AFS(graph G=(V,E),  vertex sV):1.    Put (,s) into bag.2.    While bag is non-empty:3.        Pick an arbitrary tuple (p,v) from bag.4.        If v is unmarked:5.            Mark v.6.            parent(v)=p.7.            For each neighbor w of v:8.                Put (v,w) into bag.\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{AFS}(\langle \text{graph } G = (V,E), \; \text{vertex } s \in V \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Put $(\perp, s)$ into bag.}\\ &{\scriptstyle 2.}~~~~\texttt{While bag is non-empty:}\\ &{\scriptstyle 3.}~~~~~~~~\texttt{Pick an arbitrary tuple $(p, v)$ from bag.}\\ &{\scriptstyle 4.}~~~~~~~~\texttt{If $v$ is unmarked:}\\ &{\scriptstyle 5.}~~~~~~~~~~~~\texttt{Mark $v$.}\\ &{\scriptstyle 6.}~~~~~~~~~~~~\text{parent}(v) = p.\\ &{\scriptstyle 7.}~~~~~~~~~~~~\texttt{For each neighbor $w$ of $v$:}\\ &{\scriptstyle 8.}~~~~~~~~~~~~~~~~\texttt{Put $(v,w)$ into bag.}\\ \hline\hline\end{aligned}

The parent pointers in the second algorithm above defines a tree, rooted at ss, spanning all the vertices in the connected component of the initial vertex ss. We’ll call this the tree induced by the search algorithm. Having a visualization of this tree and how the algorithm traverses the vertices in the tree can be a useful way to understand how the algorithm works. Below, we will actualize AFS by picking specific data structures for the bag. We’ll then comment on the properties of the induced tree.

Note that AFS(GG, ss) visits all the vertices in the connected component that ss is a part of. If we want to traverse all the vertices in the graph, and the graph has multiple connected components, then we can do: def    AFS2(graph G=(V,E)):1.    For v not marked as visited:2.        Run AFS(G,v)\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{AFS2}(\langle \text{graph } G = (V,E) \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{For $v$ not marked as visited:}\\ &{\scriptstyle 2.}~~~~~~~~\texttt{Run } \text{AFS}(\langle G, v \rangle)\\ \hline\hline\end{aligned}

The breadth-first search algorithm, denoted BFS, is AFS where the bag is chosen to be a queue data structure.

The running time of BFS(G,sG,s) is O(m)O(m), where mm is the number of edges of the input graph. If we do a BFS for each connected component, the total running time is O(m+n)O(m+n), where nn is the number of vertices. (We are assuming the graph is given as an adjacency list.)

The induced tree of BFS is called the BFS-tree. In this tree, vertices vv at level ii of the tree are exactly those vertices with dist(s,v)=i\text{dist}(s,v) = i. BFS first traverses vertices at level 1, then level 2, then level 3, and so on. This is why the name of the algorithm is breadth-first search.

Assuming GG is connected, we can think of GG as the BFS-tree, plus, the “extra” edges in GG that are not part of the BFS-tree. Observe that an extra edge is either between two vertices at the same level, or is between two vertices at consecutive levels in the BFS-tree.

The depth-first search algorithm, denoted DFS, is AFS where the bag is chosen to be a stack data structure.

There is a natural recursive representation of the DFS algorithm, as follows. def    DFS(graph G=(V,E),  vertex sV):1.    Mark s.2.    For each neighbor v of s:3.        If v is unmarked:4.            Run DFS(G,v).\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \text{DFS}(\langle \text{graph } G = (V,E), \; \text{vertex } s \in V \rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Mark $s$.}\\ &{\scriptstyle 2.}~~~~\texttt{For each neighbor $v$ of $s$:}\\ &{\scriptstyle 3.}~~~~~~~~\texttt{If $v$ is unmarked:}\\ &{\scriptstyle 4.}~~~~~~~~~~~~\texttt{Run } \text{DFS}(\langle G, v \rangle).\\ \hline\hline\end{aligned}

The running time of DFS(G,sG,s) is O(m)O(m), where mm is the number of edges of the input graph. If we do a DFS for each connected component, the total running time is O(m+n)O(m+n), where nn is the number of vertices. (We are assuming the graph is given as an adjacency list.)

The induced tree of DFS is called the DFS-tree. At a high level, DFS goes as deep as it can in the DFS-tree until it cannot go deeper, in which case it backtracks until it can go deeper again. This is why the name of the algorithm is depth-first search.

Assuming GG is connected, we can think of GG as the DFS-tree, plus, the “extra” edges in GG that are not part of the DFS-tree. Observe that an extra edge must be between two vertices such that one is the ancestor of the other in the DFS-tree.

The search algorithms presented above can be applied to directed graphs as well.

In the minimum spanning tree problem, the input is a connected undirected graph G=(V,E)G = (V,E) together with a cost function c:ER+c : E \to \mathbb R^+. The output is a subset of the edges of minimum total cost such that, in the graph restricted to these edges, all the vertices of GG are connected. For convenience, we’ll assume that the edges have unique edge costs, i.e. ee    c(e)c(e)e \neq e' \implies c(e) \neq c(e').

With unique edge costs, the minimum spanning tree is unique.

Suppose we are given an instance of the MST problem. For any non-trivial subset SVS \subseteq V, let e={u,v}e^* = \{u^*, v^*\} be the cheapest edge with the property that uSu^* \in S and vV\Sv^* \in V \backslash S. Then ee^* must be in the minimum spanning tree.

There is an algorithm that solves the MST problem in polynomial time.

Suppose an instance of the Minimum Spanning Tree problem is allowed to have negative costs for the edges. Explain whether we can use the Jarník-Prim algorithm to compute the minimum spanning tree in this case.

Consider the problem of computing the maximum spanning tree, i.e., a spanning tree that maximizes the sum of the edge costs. Explain whether the Jarník-Prim algorithm solves this problem if we modify it so that at each iteration, the algorithm chooses the edge between VV' and V\VV\backslash V' with the maximum cost.

  1. What is the maximum possible value for the number of edges in an undirected graph (no self-loops, no multi-edges) in terms of nn, the number of vertices?

  2. How do you prove the Handshake Theorem?

  3. What is the definition of a tree?

  4. True or false: If G=(V,E)G = (V, E) is a tree, then for any u,vVu, v \in V, there exists a unique path from uu to vv.

  5. True or false: For a graph G=(V,E)G = (V, E), if for any u,vVu, v \in V there exists a unique path from uu to vv, then GG is a tree.

  6. True or false: If a graph on nn vertices has n1n-1 edges, then it must be acyclic.

  7. True or false: If a graph on nn vertices has n1n-1 edges, then it must be connected.

  8. True or false: If a graph on nn vertices has n1n-1 edges, then it must be a tree.

  9. True or false: A tree with n2n \geq 2 vertices can have a single leaf.

  10. What is the minimum number of edges possible in a connected graph with nn vertices? And what are the high-level steps of the proof?

  11. True or false: An acyclic graph with kk connected components has nkn-k edges.

  12. True or false: For every tree on nn vertices, vVdeg(v)\sum_{v \in V} \deg(v) has exactly the same value.

  13. True or false: Let GG be a 55-regular graph (i.e. a graph in which every vertex has degree exactly 55). It is possible that GG has 1525115251 edges.

  14. True or false: There exists n0Nn_0 \in \mathbb{N} such that for all n>n0n > n_0, there is a graph GG with nn vertices that is 33-regular.

  15. What is the difference between BFS and DFS?

  16. True or false: DFS algorithm runs in O(n)O(n) time for a connected graph, where nn is the number of vertices of the input graph.

  17. Explain why the running time of BFS and DFS is O(m+n)O(m+n); in particular where does the mm come from and where does the nn come from?

  18. What is the MST cut property?

  19. Consider the following “proof” of the MST cut property:

    Assume for the sake of contradiction that the MST TT does not contain the edge ee^*. Since TT is a spanning tree, it must contain an edge ee' with one endpoint in SS and one endpoint not in SS. Since ee^* is the cheapest edge with this property, it follows that T=(T\{e}){e}T^* = (T \backslash \{e'\}) \cup \{e^*\} is a spanning tree that is cheaper than TT, a contradiction.

    Expose the flaw in this argument by giving a specific example where it fails.

  20. Explain at a high-level how the Jarník-Prim algorithm works.

  21. True or false: Suppose a graph has 22 edges with the same cost. Then there are at least 22 minimum spanning trees of the graph.

Here are the important things to keep in mind from this chapter.

  1. There are many definitions in this chapter. This is unfortunately inevitable in order to effectively communicate with each other. On the positive side, the definitions are often very intuitive and easy to guess. It is important that you are comfortable with all the definitions.

  2. In the first section of the chapter, we see a few proof techniques on graphs: degree counting arguments, induction arguments, the trick of removing all the edges of graph and adding them back in one by one. Make sure you understand these techniques well. They can be very helpful when you are coming up with proofs yourself.

  3. The second section of the chapter contains some well-known graph algorithms. These algorithms are quite fundamental, and you may have seen them (or will see them) in other courses. Make sure you understand at a high level how the algorithms work. The implementation details are not important in this course.

A matching in a graph G=(V,E)G=(V,E) is a subset of the edges that do not share an endpoint. A maximum matching in GG is a matching with the maximum number of edges among all possible matchings. A maximal matching is a matching with the property that if we add any other edge to the matching, it is no longer a matching. A perfect matching is a matching that covers all the vertices of the graph.

The size of a matching MM refers to the number of edges in the matching, and is denoted by M|M|. Note that this coincides with the size of the set that MM represents.

In the maximum matching problem the input is an undirected graph G=(V,E)G=(V,E) and the output is a maximum matching in GG.

Let G=(V,E)G = (V,E) be a graph and let MEM \subseteq E be a matching in GG. An augmenting path in GG with respect to MM is a path such that

  1. the path is an alternating path, which means that the edges in the path alternate between being in MM and not in MM,

  2. the first and last vertices in the path are not a part of the matching MM.

An augmenting path does not need to contain all the edges in MM. It is also possible that it does not contain any of the edges of MM. A single edge {u,v}\{u, v\}, where uu is not matched and vv is not matched, is an augmenting path.

Let G=(V,E)G=(V,E) be a graph. A matching MEM \subseteq E is maximum if and only if there is no augmenting path in GG with respect to MM.

Let G=(V,E)G = (V,E) be a graph such that all vertices have degree at most 22. Then prove that every connected component of GG is either a path or a cycle (where we count an isolated vertex as a path of length 0).

Show that a tree can have at most one perfect matching.

A graph G=(V,E)G = (V,E) is called bipartite if there is a partition of VV into sets XX and YY such that all the edges in EE have one endpoint in XX and the other in YY. Sometimes the bipartition is given explicitly and the graph is denoted by G=(X,Y,E)G = (X, Y, E).

Let G=(V,E)G = (V,E) be a graph. Let kN+k \in \mathbb{N}^+. A kk-coloring of VV is just a map χ:VC\chi : V \to C where CC is a set of cardinality kk. (Usually the elements of CC are called colors. If k=3k = 3 then C={red,green,blue}C = \{\text{red},\text{green},\text{blue}\} is a popular choice. If kk is large, we often just call the “colors” 1,2,,k1,2, \dots, k.) A kk-coloring is said to be legal for GG if every edge in EE is bichromatic, meaning that its two endpoints have different colors. (I.e., for all {u,v}E\{u,v\} \in E it is required that χ(u)χ(v)\chi(u)\neq\chi(v).) Finally, we say that GG is kk-colorable if it has a legal kk-coloring.

A graph G=(V,E)G=(V,E) is bipartite if and only if it is 22-colorable. The 22-coloring corresponds to partitioning the vertex set VV into XX and YY such that all the edges have one endpoint in XX and the other in YY.

A graph is bipartite if and only if it contains no odd-length cycles.

There is a polynomial time algorithm to solve the maximum matching problem in bipartite graphs.

The high-level algorithm above presented in the proof of Theorem (Finding a maximum matching in bipartite graphs) is in fact applicable to general (not necessarily bipartite) graphs. However, the step of finding an augmenting path with respect to a matching turns out to be much more involved, and therefore we do not cover it in this chapter. See https://en.wikipedia.org/wiki/Blossom_algorithm if you would like to learn more.

Let G=(X,Y,E)G = (X,Y,E) be a bipartite graph. For a subset SS of the vertices, let N(S)=vSN(v)N(S) = \bigcup_{v \in S} N(v). Then GG has a matching covering all the vertices in XX if and only if for all SXS \subseteq X, we have SN(S)|S| \leq |N(S)|.

Let G=(X,Y,E)G = (X,Y,E) be a bipartite graph. Then GG has a perfect matching if and only if X=Y|X| = |Y| and for any SXS \subseteq X, we have SN(S)|S| \leq |N(S)|.

Sometimes people call the above corollary Hall’s Theorem.

  1. Let GG be a bipartite graph on 2n2n vertices such that every vertex has degree at least nn. Prove that GG must contain a perfect matching.

  2. Let G=(X,Y,E)G = (X,Y,E) be a bipartite graph with X=Y|X| = |Y|. Show that if GG is connected and every vertex has degree at most 22, then GG must contain a perfect matching.

  1. True or false: A maximum matching is a maximal matching.

  2. True or false: A perfect matching is a maximum matching.

  3. True or false: There exist graphs with more than one perfect matching.

  4. How many perfect matchings are there in a complete bipartite graph (all possible edges are present) where both sides of the bipartition contain exactly nn vertices?

  5. Suppose a graph with nn vertices has a perfect matching. What is the size of the perfect matching?

  6. What is the definition of an augmenting path?

  7. True or false: A single edge {u,v}\{u, v\}, where uu is not matched and vv is not matched, is an augmenting path.

  8. True or false: Given a matching MM, there can be at most one augmenting path with respect to MM.

  9. True or false: A matching MM in a non-bipartite graph GG is maximum if and only if there is no augmenting path with respect to MM.

  10. Describe the high-level steps of a polynomial-time algorithm for finding a maximum matching in a graph.

  11. Describe a polynomial-time algorithm that given a bipartite graph and a matching in the graph, determines if an augmenting path exists with respect to the matching.

  12. True or false: The graph below is bipartite.

    image

  13. True or false: If a graph is kk-colorable, then it is (k+1)(k+1)-colorable.

  14. True or false: A graph which is not bipartite must contain an odd-length cycle.

  15. Explain how one can 22-color a graph that does not contain an odd-length cycle.

  16. Describe a polynomial-time algorithm to determine if an input undirected graph is bipartite or not.

  17. How do you prove that a tree has at most one perfect matching?

  18. True or false: Any graph with more than one perfect matching must contain a cycle.

  19. In this chapter we saw the definition of a bipartite graph. We can think of a bipartite graph as a special case of a kk-partite graph where k=2k = 2. How would you define the general notion of a kk-partite graph, and how does it relate to the colorability of the graph?

Here are the important things to keep in mind from this chapter.

  1. As always, all the definitions in this chapter are important (matchings, augmenting path, bipartite graph, kk-coloring of a graph).

  2. Theorem (Characterization of bipartite graphs) gives an important characterization for bipartite graphs. Make sure to understand its proof.

  3. Theorem (Characterization for maximum matchings) gives an important characterization for maximum matchings. The proof is one of the harder proofs we have done in the course so far. A good understanding of the proof is expected and important since it will help you raise your level of mathematical maturity.

  4. Make sure to understand how we can use Theorem (Characterization for maximum matchings) to come up with an algorithm for finding a maximum matching in a graph.

An instance of the stable matching problem is a tuple of sets (X,Y)(X, Y) with X=Y|X| = |Y|, and a preference list for each element of XX and YY. A preference list for an element in XX is an ordering of the elements in YY, and a preference list for an element in YY is an ordering of the elements of XX. Below is an example of an instance of the stable matching problem:

image

The output of the stable matching problem is a stable matching, which is a subset SS of {(x,y):xX,yY}\{(x, y) : x \in X, y \in Y\} with the following properties:

  • The matching is a perfect matching, which means every xXx \in X and every yYy \in Y appear exactly once in SS. If (x,y)S(x,y) \in S, we say xx and yy are matched.

  • There are no unstable pairs. A pair (x,y)(x,y) where xXx \in X and yYy \in Y is called unstable if (x,y)∉S(x,y) \not \in S, but they both prefer each other to the elements they are matched to.

There is a polynomial time algorithm which, given an instance of the stable matching problem, always returns a stable matching.

Consider an instance of the stable matching problem. We say that xXx \in X is a valid partner of yYy \in Y (or yy is a valid partner of xx) if there is some stable matching in which xx and yy are matched. For zXYz \in X \cup Y, we define the best valid partner of zz, denoted best(z)\text{best}(z), to be the highest ranked valid partner of zz. Similarly, we define the worst valid partner of zz, denoted worst(z)\text{worst}(z), to be the lowest ranked valid partner of zz.

The Gale-Shapley algorithm always matches xXx \in X with its best valid partner, i.e., it returns {(x,best(x)):xX}\{(x, \text{best}(x)): x \in X \}.

Note that it is not a priori clear at all that {(x,best(x)):xX}\{(x, \text{best}(x)): x \in X \} would be a matching, not to mention a stable matching.

Show that the Gale-Shapley algorithm always matches yYy \in Y with its worst valid partner, i.e., it returns {(worst(y),y):yY}\{(\text{worst}(y), y): y \in Y \}.

Give a polynomial time algorithm that determines if a given instance of the stable matching problem has a unique solution or not.

Suppose we are given an instance of the stable matching problem in which all xXx \in X have identical preference lists, and all yYy \in Y have identical preference lists. Prove or disprove: there is only one stable matching for such an instance.

Consider the following variant of the stable matching problem. The input is a set of nn people, where nn is even. Each person has a preference list that includes every other person. The goal is to find a stable matching. Give an example to show that a stable matching does not always exist.

Call xXx \in X and yYy \in Y “soulmates” if they are paired with each other in every stable matching.

  1. Given xXx \in X and yYy \in Y, design a polynomial-time algorithm to determine if they are soulmates.

  2. Give a polynomial time algorithm to determine if an instance of the stable matching problem has a unique stable matching.

  1. What is the definition of a stable matching?

  2. Give an example of a stable matching problem instance that has at least two stable matchings.

  3. What are the definitions of valid partner, best valid partner, and worst valid partner?

  4. Describe the Gale-Shapley algorithm.

  5. True or false: In any instance of the stable matching problem, a stable matching always exists.

  6. True or false: In any instance of the stable matching problem, there is at most one stable matching.

  7. What does it mean for a matching to be XX-optimal?

  8. What does it mean for a matching to be XX-pessimal?

  9. True or false: The Gale-Shapley algorithm can output different matchings based on the order of the companies proposing.

  10. True or false: For every stable matching instance, the XX-optimal and YY-optimal stable matchings differ in at least one pairing.

  11. True or false: Given an instance of the stable matching problem, it is not possible for two companies to have the same best valid partner.

  12. True or false: If the Gale-Shapley algorithm ends up pairing xXx \in X with its worst possible partner yYy \in Y, then xx and yy must be paired/matched in all stable matchings.

  1. Your main goal should be to really understand the stable matching problem, the Gale-Shapley algorithm that solves the stable matching problem, and the proof of correctness of the algorithm.

  2. Best and worst valid partners are important notions. Make sure you understand what they mean and how they relate to the Gale-Shapley algorithm’s output.

In the kk-coloring problem, the input is an undirected graph G=(V,E)G=(V,E), and the output is True if and only if the graph is kk-colorable (see Definition (kk-colorable graphs)). We denote this problem by kCOLk\text{COL}. The corresponding language is {G:G is a k-colorable graph}.\{\langle G \rangle: \text{$G$ is a $k$-colorable graph}\}.

Let G=(V,E)G = (V,E) be an undirected graph. A subset SS of the vertices is called a clique if for all u,vSu, v \in S with uvu \neq v, {u,v}E\{u,v\} \in E. We say that GG contains a kk-clique if there is a subset of the vertices of size kk that forms a clique.

In the clique problem, the input is an undirected graph G=(V,E)G=(V,E) and a number kN+k \in \mathbb N^+, and the output is True if and only if the graph contains a kk-clique. We denote this problem by CLIQUE\text{CLIQUE}. The corresponding language is {G,k:G is a graph, kN+G contains a k-clique}.\{\langle G, k \rangle: \text{$G$ is a graph, $k \in \mathbb N^+$, $G$ contains a $k$-clique}\}.

Let G=(V,E)G = (V,E) be an undirected graph. A subset of the vertices is called an independent set if there is no edge between any two vertices in the subset. We say that GG contains an independent set of size kk if there is a subset of the vertices of size kk that forms an independent set.

In the independent set problem, the input is an undirected graph G=(V,E)G=(V,E) and a number kN+k \in \mathbb N^+, and the output is True if and only if the graph contains an independent set of size kk. We denote this problem by IS\text{IS}. The corresponding language is {G,k:G is a graph, kN+G contains an independent set of size k}.\{\langle G, k \rangle: \text{$G$ is a graph, $k \in \mathbb N^+$, $G$ contains an independent set of size $k$}\}.

We denote by ¬\neg the unary NOT operation, by \wedge the binary AND operation, and by \vee the binary OR operation. In particular, we can write the truth tables of these operations as follows:

image

Let x1,,xnx_1,\ldots,x_n be Boolean variables, i.e., variables that can be assigned True or False. A literal refers to a Boolean variable or its negation. A clause is an “OR” of literals. For example, x1¬x3x4x_1 \vee \neg x_3 \vee x_4 is a clause. A Boolean formula in conjunctive normal form (CNF) is an “AND” of clauses. For example, (x1¬x3)(x2x2x4)(x1¬x1¬x5)x4(x_1 \vee \neg x_3) \wedge (x_2 \vee x_2 \vee x_4) \wedge (x_1 \vee \neg x_1 \vee \neg x_5) \wedge x_4 is a CNF formula. We say that a Boolean formula is a satisfiable formula if there is a truth assignment (which can also be viewed as a 0/10/1 assignment) to the Boolean variables that makes the formula evaluate to true (or 11).

In the CNF satisfiability problem, the input is a CNF formula, and the output is True if and only if the formula is satisfiable. We denote this problem by SAT\text{SAT}. The corresponding language is {φ:φ is a satisfiable CNF formula}.\{\langle \varphi \rangle: \text{$\varphi$ is a satisfiable CNF formula}\}. In a variation of SAT\text{SAT}, we restrict the input formula such that every clause has exactly 3 literals (we call such a formula a 3CNF formula). For instance, the following is a 3CNF formula: For example, (x1¬x3x4)(x2x2x4)(x1¬x1¬x5)(¬x2¬x3¬x3)(x_1 \vee \neg x_3 \vee x_4) \wedge (x_2 \vee x_2 \vee x_4) \wedge (x_1 \vee \neg x_1 \vee \neg x_5) \wedge (\neg x_2 \vee \neg x_3 \vee \neg x_3) This variation of the problem is denoted by 3SAT\text{3SAT}.

A Boolean circuit with nn-input variables (n0n \geq 0) is a directed acyclic graph with the following properties. Each node of the graph is called a gate and each directed edge is called a wire. There are 55 types of gates that we can choose to include in our circuit: AND gates, OR gates, NOT gates, input gates, and constant gates. There are 2 constant gates, one labeled 00 and one labeled 11. These gates have in-degree/fan-in 00. There are nn input gates, one corresponding to each input variable. These gates also have in-degree/fan-in 00. An AND gate corresponds to the binary AND operation \wedge and an OR gate corresponds to the binary OR operation \vee. These gates have in-degree/fan-in 22. A NOT gate corresponds to the unary NOT operation ¬\neg, and has in-degree/fan-in 11. One of the gates in the circuit is labeled as the output gate. Gates can have out-degree more than 11, except for the output gate, which has out-degree 00.

For each 0/1 assignment to the input variables, the Boolean circuit produces a one-bit output. The output of the circuit is the output of the gate that is labeled as the output gate. The output is calculated naturally using the truth tables of the operations corresponding to the gates. The input-output behavior of the circuit defines a function f:{0,1}n{0,1}f:\{0,1\}^n \to \{0,1\} and in this case, we say that ff is computed by the circuit.

A satisfiable circuit is such that there is 0/1 assignment to the input gates that makes the circuit output 1. In the circuit satisfiability problem, the input is a Boolean circuit, and the output is True if and only if the circuit is satisfiable. We denote this problem by CIRCUIT-SAT\text{CIRCUIT-SAT}. The corresponding language is {C:C is a Boolean circuit that is satisfiable}.\{\langle C \rangle: \text{$C$ is a Boolean circuit that is satisfiable}\}.

The name of a decision problem above refers both to the decision problem and the corresponding language.

Recall that in all the decision problems above, the input is an arbitrary word in Σ\Sigma^*. If the input does not correspond to a valid encoding of an object expected as input (e.g. a graph in the case of kkCOL), then those inputs are rejected (i.e., they are not in the corresponding language).

All the problems defined above are decidable and have exponential-time algorithms solving them.

Fix some alphabet Σ\Sigma. Let AA and BB be two languages. We say that AA polynomial-time reduces to BB, written APBA \leq^PB, if there is a polynomial-time decider for AA that uses a decider for BB as a black-box subroutine. Polynomial-time reductions are also known as Cook reductions, named after Stephen Cook.

Suppose APBA \leq^PB. Observe that if BPB \in \mathsf{P}, then APA \in \mathsf{P}. Equivalently, taking the contrapositive, if A∉PA \not \in \mathsf{P}, then B∉PB \not \in \mathsf{P}. So when APBA \leq^PB, we think of BB as being at least as hard as AA with respect to polynomial-time decidability.

Note that if APBA \leq^PB and BPCB \leq^PC, then APCA \leq^PC.

Let AA and BB be two languages. Suppose that there is a polynomial-time computable function (also called a polynomial-time transformation) f:ΣΣf: \Sigma^* \to \Sigma^* such that xAx \in A if and only if f(x)Bf(x) \in B. Then we say that there is a polynomial-time mapping reduction (or a Karp reduction, named after Richard Karp) from AA to BB, and denote it by AmPBA \leq_m^PB.

To show that there is a Karp reduction from AA to BB, you need to do the following things.

  1. Present a computable function f:ΣΣf: \Sigma^* \to \Sigma^*.

  2. Show that xA    f(x)Bx \in A \implies f(x) \in B.

  3. Show that x∉A    f(x)∉Bx \not \in A \implies f(x) \not \in B (it is usually easier to argue the contrapositive).

  4. Show that ff can be computed in polynomial time.

In picture, the transformation ff looks as follows.

image

Note that ff need not be injective and it need not be surjective.

Recall that a mapping reduction is really a special kind of a Turing reduction. In particular, if there is a mapping reduction from language AA to language BB, then one can construct a regular (Turing) reduction from AA to BB. We explain this below as a reminder.

To establish a Turing reduction from AA to BB, we need to show how we can come up with a decider MAM_A for AA given that we have a decider MBM_B for BB. Now suppose we have a mapping reduction from AA to BB. This means we have a computable function ff as in the definition of a mapping reduction. This ff then allows us to build MAM_A as follows. Given any input xx, first feed xx into ff, and then feed the output y=f(x)y = f(x) into MBM_B. The output of MAM_A is the output of MBM_B. We illustrate this construction with the following picture.

image

Take a moment to verify that this reduction from AA to BB is indeed correct given the definition of ff.

Even though a mapping reduction can be viewed as a regular (Turing) reduction, not all reductions are mapping reductions.

CLIQUEmPIS\text{CLIQUE} \leq_m^P\text{IS}.

How can you modify the above reduction to show that ISmPCLIQUE\text{IS} \leq_m^P\text{CLIQUE}?

Let G=(V,E)G=(V,E) be a graph. A Hamiltonian path in GG is a path that visits every vertex of the graph exactly once. The HAMPATH problem is the following: given a graph G=(V,E)G=(V,E), output True if it contains a Hamiltonian path, and output False otherwise.

  1. Let L={G,k:G is a graph, kNG has a path of length k}L = \{\langle G, k \rangle : \text{$G$ is a graph, $k \in \mathbb N$, $G$ has a path of length $k$}\}. Show that HAMPATHmPL\text{HAMPATH}\leq_m^PL.

  2. Let K={G,k:G is a graph, kNG has a spanning tree with k leaves}K = \{\langle G, k \rangle : \text{$G$ is a graph, $k \in \mathbb N$, $G$ has a spanning tree with $\leq k$ leaves}\}. Show that HAMPATHmPK\text{HAMPATH}\leq_m^PK.

CIRCUIT-SATmP3COL\text{CIRCUIT-SAT}\leq_m^P3\text{COL}.

Show that if AmPBA \leq_m^PB and BmPCB \leq_m^PC, then AmPCA \leq_m^PC.

Let C\mathcal{C} be a set of languages containing P\mathsf{P}.

  • We say that LL is C\mathcal{C}-hard (with respect to Cook reductions) if for all languages KCK \in \mathcal{C}, KPLK \leq^PL.
    (With respect to polynomial time decidability, a C\mathcal{C}-hard language is at least as “hard” as any language in C\mathcal{C}.)

  • We say that LL is C\mathcal{C}-complete if LL is C\mathcal{C}-hard and LCL \in \mathcal{C}.
    (A C\mathcal{C}-complete language represents the “hardest” language in C\mathcal{C} with respect to polynomial time decidability.)

Suppose LL is C\mathcal{C}-complete. Then observe that LPC=PL \in \mathsf{P}\Longleftrightarrow \mathcal{C}= \mathsf{P}.

Above we have defined C\mathcal{C}-hardness using Cook reductions. In literature, however, they are often defined using Karp reductions, which actually leads to a different notion of C\mathcal{C}-hardness. There are good reasons to use this restricted form of reductions. More advanced courses may explore some of these reasons.

  1. What is a Cook reduction? What is a Karp reduction? What is the difference between the two?

  2. Explain why every Karp reduction can be viewed as a Cook reduction.

  3. Explain why every Cook reduction cannot be viewed as a Karp reduction.

  4. True or false: ΣmP\Sigma^* \leq_m^P\varnothing.

  5. True or false: For languages AA and BB, AmPBA \leq_m^PB if and only if BmPAB \leq_m^PA.

  6. Define the complexity class P\mathsf{P}.

  7. True or false: The language 251CLIQUE={G: G is a graph containing a clique of size 251}\text{251CLIQUE} = \{\langle G \rangle : \text{ $G$ is a graph containing a clique of size 251}\} is in P\mathsf{P}.

  8. True or false: Let L,KΣL, K \subseteq \Sigma^* be two languages. Suppose there is a polynomial-time computable function f:ΣΣf : \Sigma^*\to\Sigma^* such that xLx\in L iff f(x)Kf(x)\notin K. Then LL Cook-reduces to KK.

  9. True or false: There is a Cook reduction from 3SAT to HALTS.

  10. For a complexity class C\mathcal{C} containing P\mathsf{P}, define what it means to be C\mathcal{C}-hard.

  11. For a complexity class C\mathcal{C} containing P\mathsf{P}, define what it means to be C\mathcal{C}-complete.

  12. Suppose LL is C\mathcal{C}-complete. Then argue why LPC=PL \in \mathsf{P}\Longleftrightarrow \mathcal{C}= \mathsf{P}.

  1. There are two very important notions of reductions introduced in this chapter: Cook reductions and Karp reductions. Make sure you understand the similarities and differences between them. And make sure you are comfortable presenting and proving the correctness of reductions. This is the main goal of the chapter.

  2. The chapter concludes with a couple of important definitions: C\mathcal{C}-hardness, C\mathcal{C}-completeness. We present these definitions in this chapter as they are closely related to reductions. However, we will make use of these definitions in the next chapter after we introduce the complexity class NP\mathsf{NP}. For now, make sure you understand the definitions and also Note (C\mathcal{C}-completeness and P\mathsf{P}).

Fix some alphabet Σ\Sigma. We say that a language LL can be decided in non-deterministic polynomial time if there exists

  1. a polynomial-time decider TM VV that takes two strings as input, and

  2. a constant k>0k > 0,

such that for all xΣx \in \Sigma^*:

  • if xLx \in L, then there exists uΣu \in \Sigma^* with uxk|u| \leq |x|^k such that V(x,u)V(x,u) accepts,

  • if x∉Lx \not \in L, then for all uΣu \in \Sigma^*, V(x,u)V(x,u) rejects.

If xLx \in L, a string uu that makes V(x,u)V(x,u) accept is called a proof (or certificate) of xx being in LL. The TM VV is called a verifier for LL.

We denote by NP\mathsf{NP} the set of all languages which can be decided in non-deterministic polynomial time.

3COLNP\text{3COL} \in \mathsf{NP}.

Showing that a language LL is in NP\mathsf{NP} involves the following steps:

  1. Present a TM VV (that takes two inputs xx and uu).

  2. Argue that VV has polynomial running time.

  3. Argue that VV works correctly, which involves arguing the following for some constant k>0k>0:

    • for all xLx \in L, there exists uΣu \in \Sigma^* with uxk|u| \leq |x|^k such that V(x,u)V(x,u) accepts;

    • for all x∉Lx \not \in L and for all uΣu \in \Sigma^*, V(x,u)V(x,u) rejects.

CIRCUIT-SATNP\text{CIRCUIT-SAT}\in \mathsf{NP}.

Show that CLIQUENP\text{CLIQUE}\in \mathsf{NP}.

Show that ISNP\text{IS}\in \mathsf{NP}.

Show that 3SATNP\text{3SAT} \in \mathsf{NP}.

PNP\mathsf{P}\subseteq \mathsf{NP}.

We denote by EXP\mathsf{EXP} the set of all languages that can be decided in at most exponential-time, i.e., in time O(2nC)O(2^{n^C}) for some constant C>0C > 0.

Show that NPEXP\mathsf{NP}\subseteq \mathsf{EXP}.

Recall Definition (C\mathcal{C}-hard, C\mathcal{C}-complete). We use this definition in this section with C\mathcal{C} being NP\mathsf{NP}. However, our notions of NP\mathsf{NP}-hardness and NP\mathsf{NP}-completeness will be restricted to Karp reductions. As mentioned in Note (C\mathcal{C}-hardness with respect to Cook and Karp reductions), hardness and completeness is often defined using Karp reductions. We may explore some of the reasons for this in a homework problem.

To show that a language is NP\mathsf{NP}-hard, you should be using Karp reductions.

CIRCUIT-SAT\text{CIRCUIT-SAT} is NP\mathsf{NP}-complete.

Cook and Levin actually proved that SAT\text{SAT} is NP\mathsf{NP}-complete, however, the theorem is often stated using CIRCUIT-SAT\text{CIRCUIT-SAT} rather than SAT\text{SAT} because the version above is easier to prove.

To show that a language LL is NP\mathsf{NP}-hard, by the transitivity of polynomial-time reductions, it suffices to show that KmPLK \leq_m^PL for some language KK which is known to be NP\mathsf{NP}-hard. In particular, using Cook-Levin Theorem, it suffices to show that CIRCUIT-SATmPL\text{CIRCUIT-SAT}\leq_m^PL.

3COL\text{3COL} is NP\mathsf{NP}-complete.

3SAT\text{3SAT} is NP\mathsf{NP}-complete.

CLIQUE\text{CLIQUE} is NP\mathsf{NP}-complete.

When we want to show that a language BB is NP\mathsf{NP}-hard, we take a known NP\mathsf{NP}-hard language AA and show a Karp reduction from AA to BB. The diagram of a typical transformation is as shown below.

image

So typically, the transformation does not cover all the elements of BB, but only a subset BBB' \subset B. This means that the Karp reduction not only establishes that BB is NP\mathsf{NP}-hard, but also that BB' is NP\mathsf{NP}-hard.

IS\text{IS} is NP\mathsf{NP}-complete.

The collection of reductions that we have shown can be represented as follows:

image

The MSAT problem is defined very similarly to SAT (for the definition of SAT, see Definition (Boolean satisfiability problem)). In SAT, we output True if and only if the input CNF formula has at least one satisfying truth assignment to the variables. In MSAT, we want to output True if and only if the input CNF formula has at least two distinct satisfying assignments.

Show that MSAT is NP\mathsf{NP}-complete.

In the PARTITION problem, we are given nn non-negative integers a1,a2,,ana_1,a_2,\ldots, a_n, and we want to output True if and only if there is a way to partition the integers into two parts so that the sum of the two parts are equal. In other words, we want to output True if and only if there is a set S{1,2,,n}S \subseteq \{1,2,\ldots, n\} such that iSai=j{1,,n}\Saj\sum_{i \in S} a_i = \sum_{j \in \{1,\ldots,n\} \backslash S} a_j.

In the BIN problem we are given a set of n items with non-negative sizes s1,s2,,snNs_1, s_2, \ldots , s_n \in \mathbb N (not necessarily distinct), a capacity C0C \geq 0 (not necessarily an integer), and an integer kNk \in \mathbb N. We want to output True if and only if the items can be placed into at most kk bins such that the total size of items in each bin does not exceed the capacity CC.

Show that BIN is NP\mathsf{NP}-complete assuming PARTITION is NP\mathsf{NP}-hard.

  1. Informally, at a high-level, what does it mean for a language to be in NP\mathsf{NP}?

  2. Give an example of a language in NP\mathsf{NP} and at a high level explain why it is in NP\mathsf{NP}.

  3. What are the things you need to show in order to formally argue that a language LL is in NP\mathsf{NP}?

  4. Why is the complexity class NP\mathsf{NP} named “Non-deterministic polynomial time”?

  5. True or false: Every language in NP\mathsf{NP} is decidable.

  6. True or false: Any finite language is in NP\mathsf{NP}.

  7. True or false: {0,1}NP\{0,1\}^* \in \mathsf{NP}.

  8. True or false: {0k1k:kN}NP\{0^k1^k : k \in \mathbb N\} \in \mathsf{NP}.

  9. Explain why P\mathsf{P} is contained in NP\mathsf{NP}.

  10. True or false: If there is a polynomial-time algorithm for 3COL, then P=NP\mathsf{P}= \mathsf{NP}.

  11. True or false: HALTS is NP\mathsf{NP}-complete.

  12. Explain why NP\mathsf{NP} is contained in EXP\mathsf{EXP}.

  13. State the Cook-Levin Theorem and explain its significance.

  14. Let LΣL \subseteq \Sigma^*, and define its complement L={wΣ:w∉L}\overline{L} = \{w \in \Sigma^* : w \not\in L\}. Define coNP={LΣ:LNP}\textsf{coNP} = \{L \subseteq \Sigma^* : \overline{L} \in \mathsf{NP}\}. True or false: If coNPNP\textsf{coNP} \neq \mathsf{NP}, then PNP\mathsf{P}\neq \mathsf{NP}.

  15. True or false: If every language in NP\mathsf{NP} is NP\mathsf{NP}-complete (under Cook reductions) then P=NP\mathsf{P}= \mathsf{NP}.

  16. True or false: If P=NP\mathsf{P}= \mathsf{NP} then every language in NP\mathsf{NP} is NP\mathsf{NP}-complete (under Cook reductions).

  17. True or false: For any two NP\mathsf{NP}-complete languages AA and BB, there is a Cook reduction from AA to BB.

  18. True or false: Let L={0k1k:kN}L = \{0^k1^k : k \in \mathbb N\}. There is a Karp reduction from LL to 3COL.

  19. Assume PNP\mathsf{P}\neq \mathsf{NP}. Does 2COL Karp reduce to 3COL?

  20. Assume PNP\mathsf{P}\neq \mathsf{NP}. Does 3COL Karp reduce to 2COL?

  1. Non-deterministic polynomial time is one of the most important definitions in the course. The formal definition is not easy to understand at first since it has several layers. Take your time to really digest it, and be careful with the order of quantifiers.

  2. When we ask you to show that a language is in NP\mathsf{NP}, we expect you to use the formal definition of NP\mathsf{NP}. Your proofs should resemble the proofs presented in this chapter.

  3. It is guaranteed that you will see an NP\mathsf{NP}-completeness proof in the homework and exam. In this course, your reductions establishing NP\mathsf{NP}-hardness must be Karp reductions. This chapter contains various such arguments, and we expect your own arguments to follow a similar structure. As usual, do not fall in the trap of memorizing templates. Focus on understanding why the proofs are structured the way they are.

A finite probability space is a tuple (Ω,Pr)(\Omega, \Pr), where

  • Ω\Omega is a non-empty finite set called the sample space;

  • Pr:Ω[0,1]\Pr : \Omega \to [0,1] is a function, called the probability distribution, with the property that ΩPr[]=1\sum_{\ell \in \Omega} \Pr[\ell] = 1.

The elements of Ω\Omega are called outcomes or samples. If Pr[]=p\Pr[\ell] = p, then we say that the probability of outcome \ell is pp.

The abstract definition above of a finite probability space helps us to mathematically model and reason about situations involving randomness and uncertainties (these situations are often called “random experiments” or just “experiments”). For example, consider the experiment of flipping a single coin. We model this as follows. We let Ω={Heads,Tails}\Omega = \{\text{Heads}, \text{Tails}\} and we define function Pr\Pr such that Pr[Heads]=1/2\Pr[\text{Heads}] = 1/2 and Pr[Tails]=1/2\Pr[\text{Tails}] = 1/2. This corresponds to our intuitive understanding that the probability of seeing the outcome Heads is 1/21/2 and the probability of seeing the outcome Tails is also 1/21/2.

image

If we flip two coins, the sample space would look as follows (H represents “Heads” and T represents “Tails”).

image

One can visualize the probability space as a circle or pie with area 1. Each outcome gets a slice of the pie proportional to its probability.

image

In this course, we’ll usually restrict ourselves to finite sample spaces. In cases where we need a countably infinite Ω\Omega, the above definition will generalize naturally.

How would you model a roll of a single 66-sided die using Definition (Finite probability space, sample space, probability distribution)? How about a roll of two dice? How about a roll of a die and a coin toss together?

Sometimes, the modeling of a real-world random experiment as a probability space can be non-trivial or tricky. It helps a lot to have a step in between where you first go from the real-world experiment to computer code/algorithm (that makes calls to random number generators), and then you define your probability space based on the computer code. In this course, we allow our programs to have access to the functions Bernoulli(p)\text{Bernoulli}(p) and RandInt(n)\text{RandInt}(n). The function Bernoulli(p)\text{Bernoulli}(p) takes a number 0p10 \leq p \leq 1 as input and returns 11 with probability pp and 00 with probability 1p1-p. The function RandInt(n)\text{RandInt}(n) takes a positive integer nn as input and returns a random integer from 11 to nn (i.e., every number from 11 to nn has probability 1/n1/n). Here is a very simple example of going from a real-world experiment to computer code. The experiment is as follows. You flip a fair coin. If it’s heads, you roll a 33-sided die. If it is tails, you roll a 44-sided die. This experiment can be represented as:

    flip = Bernoulli(1/2)
    if flip == 0:
        die = RandInt(3)
    else:
        die = RandInt(4)

If we were to ask “What is the probability that you roll a 3 or higher?”, this would correspond to asking what is the probability that after the above code is executed, the variable named die stores a value that is 3 or higher.

One advantage of modeling with randomized code is that if there is any ambiguity in the description of the random (real-world) experiment, then it would/should be resolved in this step of creating the randomized code.

The second advantage is that it allows you to easily imagine a probability tree associated with the randomized code. The probability tree gives you clear picture on what the sample space is and how to compute the probabilities of the outcomes. The probability tree corresponding to the above code is as follows.

image

This simple example may not illustrate the usefulness of having a computer code representation of the random experiment, but one can appreciate its value with more sophisticated examples and we do encourage you to think of random experiments as computer code: real-world experimentcomputer code / probability treeprobability space (Ω,Pr).\text{real-world experiment} \longrightarrow \text{computer code / probability tree} \longrightarrow \text{probability space } (\Omega, \Pr).

If a probability distribution Pr:Ω[0,1]\Pr : \Omega \to [0,1] is such that Pr[]=1/Ω\Pr[\ell] = 1/|\Omega| for all Ω\ell \in \Omega, then we call it a uniform distribution.

Let (Ω,Pr)(\Omega, \Pr) be a probability space. Any subset of outcomes EΩE \subseteq \Omega is called an event. We extend the Pr[]\Pr[\cdot] notation and write Pr[E]\Pr[E] to denote EPr[]\sum_{\ell \in E} \Pr[\ell]. Using this notation, Pr[]=0\Pr[\varnothing] = 0 and Pr[Ω]=1\Pr[\Omega] = 1. We use the notation E\overline{E} to denote the event Ω\E\Omega \backslash E.

  1. Suppose we roll two 66-sided dice. How many events are there? Write down the event corresponding to “we roll a double” and determine its probability.

  2. Suppose we roll a 33-sided die and see the number dd. We then roll a dd-sided die. How many different events are there? Write down the event corresponding to “the second roll is a 22” and determine its probability.

Let AA and BB be two events. Prove the following.

  • If ABA \subseteq B, then Pr[A]Pr[B]\Pr[A] \leq \Pr[B].

  • Pr[A]=1Pr[A]\Pr[\overline{A}] = 1 - \Pr[A].

  • Pr[AB]=Pr[A]+Pr[B]Pr[AB]\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B].

We say that two events AA and BB are disjoint events if AB=A \cap B = \varnothing.

Let A1,A2,,AnA_1,A_2,\ldots,A_n be events. Then Pr[A1A2An]Pr[A1]+Pr[A2]++Pr[An].\Pr[A_1 \cup A_2 \cup \cdots \cup A_n] \leq \Pr[A_1] + \Pr[A_2] + \cdots + \Pr[A_n]. We get equality if and only if the AiA_i’s are pairwise disjoint.

Let EE be an event with Pr[E]0\Pr[E] \neq 0. The conditional probability of outcome Ω\ell \in \Omega given EE, denoted Pr[    E]\Pr[\ell \; | \;E], is defined as Pr[    E]={0if ∉EPr[]Pr[E]if E\Pr[\ell \; | \;E] = \begin{cases} 0 &\text{if $\ell \not \in E$}\\ \frac{\Pr[\ell]}{\Pr[E]} &\text{if $\ell \in E$} \end{cases} For an event AA, the conditional probability of AA given EE, denoted Pr[A    E]\Pr[A \; | \;E], is defined as Pr[A    E]=Pr[AE]Pr[E].\begin{aligned} \Pr[A \; | \;E] = \frac{\Pr[A \cap E]}{\Pr[E]}.\end{aligned}

Although it may not be immediately obvious, the above definition of conditional probability does correspond to our intuitive understanding of what conditional probability should represent. If we are told that event EE has already happened, then we know that the probability of any outcome outside of EE should be 00. Therefore, we can view the conditioning on event EE as a transformation of our probability space where we revise the probabilities (i.e., we revise the probability function Pr[]\Pr[\cdot]). In particular, the original probability space (Ω,Pr)(\Omega, \Pr) gets transformed to (Ω,PrE)(\Omega, \Pr_E), where PrE\Pr_E is such that for any ∉E\ell \not \in E, we have PrE[]=0\Pr_E[\ell] = 0, and for any E\ell \in E, we have PrE[]=Pr[]/Pr[E]\Pr_E[\ell] = \Pr[\ell] / \Pr[E]. The 1/Pr[E]1/\Pr[E] factor here is a necessary normalization factor that ensures the probabilities of all the outcomes sum to 11 (which is required by Definition (Finite probability space, sample space, probability distribution)). Indeed, ΩPrE[]=∉EPrE[]+EPrE[]=0+EPr[]/Pr[E]=1Pr[E]EPr[]=1.\begin{aligned} \textstyle \sum_{\ell \in \Omega} \Pr_E[\ell] & = \textstyle \sum_{\ell \not \in E} \Pr_E[\ell] + \sum_{\ell \in E} \Pr_E[\ell] \\ & = \textstyle 0 + \sum_{\ell \in E} \Pr[\ell]/\Pr[E] \\ & = \textstyle \frac{1}{\Pr[E]} \sum_{\ell \in E} \Pr[\ell] \\ & = 1.\end{aligned} If we are interested in the event “AA given EE” (denoted by A    EA \; | \;E) in the probability space (Ω,Pr)(\Omega, \Pr), then we are interested in the event AA in the probability space (Ω,PrE)(\Omega, \Pr_E). That is, Pr[A    E]=PrE[A]\Pr[A \; | \;E] = \Pr_E[A]. Therefore, Pr[A    E]=PrE[A]=PrE[AE]=Pr[AE]Pr[E],\textstyle \Pr[A \; | \;E] = \Pr_E[A] = \Pr_E[A \cap E] = \displaystyle \frac{\Pr[A \cap E]}{\Pr[E]}, where the last equality holds by the definition of PrE[]\Pr_E[ \cdot ]. We have thus recovered the equality in Definition (Conditional probability).

Conditioning on event EE can also be viewed as redefining the sample space Ω\Omega to be EE, and then renormalizing the probabilities so that Pr[Ω]=Pr[E]=1\Pr[\Omega] = \Pr[E] = 1.

image

Suppose we roll a 33-sided die and see the number dd. We then roll a dd-sided die. We are interested in the probability that the first roll was a 11 given that the second roll was a 11. First express this probability using the notation of conditional probability and then determine what the probability is.

Let n2n \geq 2 and let A1,A2,,AnA_1,A_2,\ldots,A_n be events. Then Pr[A1An]=Pr[A1]Pr[A2    A1]Pr[A3    A1A2]Pr[An    A1A2An1].\begin{aligned} & \Pr[A_1 \cap \cdots \cap A_n] = \\ & \qquad \Pr[A_1] \cdot \Pr[A_2 \; | \;A_1] \cdot \Pr[A_3 \; | \;A_1 \cap A_2] \cdots \Pr[A_n \; | \;A_1 \cap A_2 \cap \cdots \cap A_{n-1}].\end{aligned}

Suppose there are 100 students in 15-251 and 5 of the students are Andrew Yang supporters. We pick 3 students from class uniformly at random. Calculate the probability that none of them are Andrew Yang supporters using Proposition (Chain rule).

  • Let AA and BB be two events. We say that AA and BB are independent events if Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A]\cdot \Pr[B]. Note that if Pr[B]0\Pr[B] \neq 0, then this is equivalent to Pr[A    B]=Pr[A]\Pr[A \; | \;B] = \Pr[A]. If Pr[A]0\Pr[A] \neq 0, it is also equivalent to Pr[B    A]=Pr[B]\Pr[B \; | \;A] = \Pr[B].

  • Let A1,A2,,AnA_1, A_2,\ldots, A_n be events. We say that A1,,AnA_1,\ldots,A_n are independent if for any subset S{1,2,,n}S \subseteq \{1,2,\ldots,n\}, Pr[iSAi]=iSPr[Ai].\Pr\left[\bigcap_{i \in S} A_i\right] = \prod_{i\in S} \Pr[A_i].

Above we have given the definition of independent events as presented in 100% of the textbooks on probability theory. Yet, there is something deeply unsatisfying about this definition. In many situations people want to compute a probability of the form Pr[AB]\Pr[A \cap B], and if possible (if they are independent), would like to use the equality Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B] to simplify the calculation. In order to do this, they will informally argue that events AA and BB are independent in the intuitive sense of the word. For example, they argue that if BB happens, then this doesn’t affect the probability of AA happening (this argument is not done by calculation, but by informal argument). Then using this, they justify using the equality Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B] in their calculations. So really, secretly, people are not using Definition (Independent events) but some other non-formal intuitive definition of independence, and then concluding what the formal definition says, which is Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B].

To be a bit more explicit, recall that the approach to answering probability related questions is to go from a real-world experiment we want to analyze to a formal probability space model: real-world experimentprobability space (Ω,Pr).\text{real-world experiment} \longrightarrow \text{probability space } (\Omega, \Pr). People often argue the independence of events AA and BB on the left-hand-side in order to use Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B] on the right-hand-side. The left-hand-side, however, is not really a formal setting and may have ambiguities. And why does our intuitive notion of independence allow us to conclude Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B]? In these situations, it helps to add the “computer code” step in between: real-world experimentcomputer codeprobability space (Ω,Pr).\text{real-world experiment} \longrightarrow \text{computer code} \longrightarrow \text{probability space } (\Omega, \Pr). Computer code has no ambiguities and we can give a formal definition of independence using it. Suppose you have a randomized code modeling the real-world experiment, and suppose that you can divide the code into two separate parts. Suppose AA is an event that depends only on the first part of the code, and BB is an event that depends only on the second part of the code. If you can prove that the two parts of the code cannot affect each other, then we say that AA and BB are independent. When AA and BB are independent in this sense, then one can verify that indeed the equality Pr[AB]=Pr[A]Pr[B]\Pr[A \cap B] = \Pr[A] \Pr[B] holds.

Let A1,A2,,An,BA_1,A_2,\ldots,A_n,B be events such that the AiA_i’s form a partition of the sample space Ω\Omega. Then Pr[B]=Pr[BA1]+Pr[BA2]++Pr[BAn].\Pr[B] = \Pr[B \cap A_1] + \Pr[B \cap A_2] + \cdots + \Pr[B \cap A_n]. Equivalently, Pr[B]=Pr[A1]Pr[B    A1]+Pr[A2]Pr[B    A2]++Pr[An]Pr[B    An].\Pr[B] = \Pr[A_1]\cdot \Pr[B \; | \;A_1] + \Pr[A_2]\cdot \Pr[B \; | \;A_2] + \cdots + \Pr[A_n]\cdot \Pr[B \; | \;A_n].

There are 2 bins. Bin 1 contains 6 red balls and 4 blue balls. Bin 2 contains 3 red balls and 7 blue balls. We choose a bin uniformly at random, and then choose one of the balls in that bin uniformly at random. Calculate the probability that the chosen ball is red using Proposition (Law of total probability).

A random variable is a function X:ΩR\mathbf{X} : \Omega \to \mathbb R, where Ω\Omega is a sample space.

Note that a random variable is just a labeling of the elements in Ω\Omega with some real numbers. One can think of this as a transformation of the original sample space into one that contains only numbers. For example, suppose the original sample space corresponds to flipping two coins. Then we can define a random variable X\mathbf{X} which maps an outcome in the sample space to the number of tails in the outcome. Since we are only flipping two coins, the possible outputs of X\mathbf{X} are 00, 11, and 22.

image

This kind of transformation is often desirable. For example, the transformation allows us to take a weighted average of the elements in the new sample space, where the weights correspond to the probabilities of the elements (if the distribution is uniform, the weighted average is just the regular average). This is called the expectation of the random variable and is formally defined below in Definition (Expected value of a random variable). Without this transformation into real numbers, the concept of an “expected value” (i.e. averaging) would not be possible to define.

Almost always, the random variables we consider in this course will have a range that is a subset of N\mathbb N.

Let X\mathbf{X} be a random variable and xRx \in \mathbb R be some real value. We use X=x to denote the event {Ω:X()=x},Xx to denote the event {Ω:X()x},Xx to denote the event {Ω:X()x},X<x to denote the event {Ω:X()<x},X>x to denote the event {Ω:X()>x}.\begin{aligned} \mathbf{X}=x & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) = x \}$},\\ \mathbf{X} \leq x & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) \leq x \}$},\\ \mathbf{X} \geq x & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) \geq x \}$},\\ \mathbf{X} < x & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) < x \}$},\\ \mathbf{X} > x & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) > x \}$}.\end{aligned} For example, Pr[X=x]\Pr[\mathbf{X}=x] denotes Pr[{Ω:X()=x}]\Pr[\{\ell \in \Omega : \mathbf{X}(\ell) = x\}]. More generally, for SRS \subseteq \mathbb R, we use XS to denote the event {Ω:X()S}.\begin{aligned} \mathbf{X} \in S & \quad \text{ to denote the event $\{\ell \in \Omega : \mathbf{X}(\ell) \in S \}$}.\end{aligned}

Suppose we roll two 66-sided dice. Let X\mathbf{X} be the random variable that denotes the sum of the numbers we see. Explicitly write down the input-output pairs for the function X\mathbf{X}. Calculate Pr[X7]\Pr[\mathbf{X} \geq 7].

Using the randomized code and probability tree point of view, we can simply define a random variable as a numerical variable in some randomized code (more accurately, the variable’s value at the end of the execution of the code). For example, consider the following randomized code.

    S = RandInt(6) + RandInt(6)
    if S == 12:
        I = 1
    else:
        I = 0

Here we have two random variables corresponding to the variables in the code, S\mathbf{S} and I\mathbf{I}. From the probability tree picture, we can see how these two random variables can be viewed as functions from the sample space (the set of leaves) to R\mathbb R since they map each leaf/outcome to some numerical value.

image

Given some probability space (Ω,Pr)(\Omega, \Pr) and a random variable X:ΩR\mathbf{X} : \Omega \to \mathbb R, we often forget about the original sample space and consider the sample space to be the range of X\mathbf{X}, range(X)={X():Ω}\text{range}(\mathbf{X}) = \{\mathbf{X}(\ell): \ell \in \Omega\}.

Let X:ΩR\mathbf{X} : \Omega \to \mathbb R be a random variable. The probability mass function (PMF) of X\mathbf{X} is a function pX:R[0,1]p_{\mathbf{X}} : \mathbb R\to [0,1] such that for any xRx \in \mathbb R, pX(x)=Pr[X=x]p_{\mathbf{X}}(x) = \Pr[\mathbf{X} = x].

Related to the previous remark, we sometimes “define” a random variable by just specifying its probability mass function. In particular, we make no mention of the underlying sample space.

Let X\mathbf{X} be a random variable. The expected value of X\mathbf{X}, denoted E[X]\mathbb E[\mathbf{X}], is defined as follows: E[X]=ΩPr[]X().\mathbb E[\mathbf{X}] = \sum_{\ell \in \Omega} \Pr[\ell]\cdot \mathbf{X}(\ell). Equivalently, E[X]=xrange(X)Pr[X=x]x,\mathbb E[\mathbf{X}] = \sum_{x \in \text{range}(\mathbf{X})} \Pr[\mathbf{X}=x]\cdot x, where range(X)={X():Ω}\text{range}(\mathbf{X}) = \{\mathbf{X}(\ell): \ell \in \Omega\}.

Prove that the above two expressions for E[X]\mathbb E[\mathbf{X}] are equivalent.

Suppose we roll two 66-sided dice. Let X\mathbf{X} be the random variable that denotes the sum of the numbers we see. Calculate E[X]\mathbb E[\mathbf{X}].

Fix some sample space Ω\Omega and consider random variables on this sample space. Since random variables are functions from Ω\Omega to R\mathbb R, arithmetic operations on random variables are arithmetic operations on functions, and these arithmetic operations produce a new random variable. For example, if X\mathbf{X} and Y\mathbf{Y} are random variables, then an expression like X+2XY\mathbf{X} + 2\mathbf{X}\mathbf{Y} denotes a new random variable, because it is also a function from Ω\Omega to R\mathbb R.

Let X\mathbf{X} and Y\mathbf{Y} be two random variables, and let c1,c2Rc_1,c_2 \in \mathbb R be some constants. Then E[c1X+c2Y]=c1E[X]+c2E[Y].\mathbb E[c_1\mathbf{X}+c_2\mathbf{Y}] = c_1\mathbb E[\mathbf{X}] + c_2\mathbb E[\mathbf{Y}].

Let X1,X2,,Xn\mathbf{X}_1,\mathbf{X}_2,\ldots,\mathbf{X}_n be random variables, and c1,c2,,cnRc_1,c_2,\ldots,c_n \in \mathbb R be some constants. Then E[c1X1+c2X2++cnXn]=c1E[X1]+c2E[X2]++cnE[Xn].\mathbb E[c_1\mathbf{X}_1 + c_2\mathbf{X}_2 + \cdots + c_n\mathbf{X}_n] = c_1\mathbb E[\mathbf{X}_1] + c_2\mathbb E[\mathbf{X}_2] + \cdots + c_n\mathbb E[\mathbf{X}_n]. In particular, when all the cic_i’s are 1, we get E[X1+X2++Xn]=E[X1]+E[X2]++E[Xn].\mathbb E[\mathbf{X}_1 + \mathbf{X}_2 + \cdots + \mathbf{X}_n] = \mathbb E[\mathbf{X}_1] + \mathbb E[\mathbf{X}_2] + \cdots + \mathbb E[\mathbf{X}_n].

Suppose we roll three 1010-sided dice. Let X\mathbf{X} be the sum of the three values we see. Calculate E[X]\mathbb E[\mathbf{X}].

Define two random variables X\mathbf{X} and Y\mathbf{Y} such that E[XY]E[X]E[Y]\mathbb E[\mathbf{X}\mathbf{Y}] \neq \mathbb E[\mathbf{X}] \mathbb E[\mathbf{Y}].

Let EΩE \subseteq \Omega be some event. The indicator random variable with respect to EE is denoted by IE\mathbf{I}_E and is defined as IE()={1if E,0otherwise.\mathbf{I}_E(\ell) = \begin{cases} 1 & \quad \text{if $\ell \in E$,}\\ 0 & \quad \text{otherwise.}\\ \end{cases}

Let EE be an event. Then E[IE]=Pr[E]\mathbb E[\mathbf{I}_E] = \Pr[E].

Suppose that you are interested in computing E[X]\mathbb E[\mathbf{X}] for some random variable X\mathbf{X}. If you can write X\mathbf{X} as a sum of indicator random variables, i.e., if X=jIEj\mathbf{X} = \sum_j \mathbf{I}_{E_j} where IEj\mathbf{I}_{E_j} are indicator random variables, then by linearity of expectation, E[X]=E[jIEj]=jE[IEj].\mathbb E[\mathbf{X}] = \mathbb E\left[\sum_j \mathbf{I}_{E_j}\right] = \sum_j \mathbb E[\mathbf{I}_{E_j}]. Furthermore, by Proposition (Expectation of an indicator random variable), we know E[IEj]=Pr[Ej]\mathbb E[\mathbf{I}_{E_j}] = \Pr[E_j]. Therefore E[X]=jPr[Ej]\mathbb E[\mathbf{X}] = \sum_j \Pr[E_j]. This often provides an extremely convenient way of computing E[X]\mathbb E[\mathbf{X}]. This combination of indicator random variables together with linearity expectation is one of the most useful tricks in probability theory!

  1. There are nn balls and nn bins. For each ball, you pick one of the bins uniformly at random and drop the ball in that bin. What is the expected number of balls in bin 1? What is the expected number of empty bins?

  2. Suppose you randomly color the vertices of the complete graph on nn vertices one of kk colors. What is the expected number of paths of length cc (where we assume c3c \geq 3) such that no two adjacent vertices on the path have the same color?

Let X\mathbf{X} be a non-negative random variable. Then for any c>0c > 0, Pr[XcE[X]]1c.\Pr[\mathbf{X} \geq c \cdot \mathbb E[\mathbf{X}]] \leq \frac{1}{c}. Or equivalently, Pr[Xc]E[X]c.\Pr[\mathbf{X} \geq c] \leq \frac{\mathbb E[\mathbf{X}]}{c}.

During the Spring 2022 semester, the 15-251 TAs decide to strike because they are not happy with the lack of free food in grading sessions. Without the TA support, the performance of the students in the class drop dramatically. The class average on the first midterm exam is 15%. Using Markov’s Inequality, give an upper bound on the fraction of the class that got an A (i.e., at least a 90%) in the exam.

Let 0<p<10 < p < 1 be some parameter. If X\mathbf{X} is a random variable with probability mass function pX(1)=pp_{\mathbf{X}}(1) = p and pX(0)=1pp_{\mathbf{X}}(0) = 1-p, then we say that X\mathbf{X} has a Bernoulli distribution with parameter pp (we also say that X\mathbf{X} is a Bernoulli random variable). We write XBernoulli(p)\mathbf{X} \sim \text{Bernoulli}(p) to denote this. The parameter pp is often called the success probability.

A Bernoulli random variable Bernoulli(p)\text{Bernoulli}(p) captures a random experiment where we toss a pp-biased coin where the probability of heads is pp (and we assign this the numerical outcome of 11) and the probability of tails is 1p1-p (and we assign this the numerical outcome of 00).

Note that E[X]=0pX(0)+1pX(1)=pX(1)=p\mathbb E[\mathbf{X}] = 0 \cdot p_{\mathbf{X}}(0) + 1 \cdot p_{\mathbf{X}}(1) = p_{\mathbf{X}}(1) = p.

Let X=X1+X2++Xn\mathbf{X} = \mathbf{X}_1 + \mathbf{X}_2 + \cdots + \mathbf{X}_n, where the Xi\mathbf{X}_i’s are independent and for all ii, XiBernoulli(p)\mathbf{X}_i \sim \text{Bernoulli}(p). Then we say that X\mathbf{X} has a binomial distribution with parameters nn and pp (we also say that X\mathbf{X} is a binomial random variable). We write XBin(n,p)\mathbf{X} \sim \text{Bin}(n,p) to denote this.

A Binomial random variable XBin(n,p)\mathbf{X} \sim \text{Bin}(n,p) captures a random experiment where we toss a pp-biased coin nn times. We are interested in the probability of seeing kk heads among those nn coin tosses, where kk ranges over {0,1,2,,n}=range(X)\{0,1,2,\ldots, n\} = \text{range}(\mathbf{X}).

Note that we can view a Bernoulli random variable as a special kind of a binomial random variable where n=1n = 1.

Let X\mathbf{X} be a random variable with XBin(n,p)\mathbf{X} \sim \text{Bin}(n,p). Determine E[X]\mathbb E[\mathbf{X}] (use linearity of expectation). Also determine X\mathbf{X}’s probability mass function.

We toss a coin 55 times. What is the probability that we see at least 4 heads? What is the expected number of heads?

Let X\mathbf{X} be a random variable with probability mass function pXp_{\mathbf{X}} such that for n{1,2,}n \in \{1,2,\ldots\}, pX(n)=(1p)n1pp_{\mathbf{X}}(n) = (1-p)^{n-1}p. Then we say that X\mathbf{X} has a geometric distribution with parameter pp (we also say that X\mathbf{X} is a geometric random variable). We write XGeometric(p)\mathbf{X} \sim \text{Geometric}(p) to denote this.

A Geometric random variable Geometric(p)\text{Geometric}(p) captures a random experiment where we successively toss a pp-biased coin until we see heads for the first time, and we stop. We are interested in the probability of making nn coin tosses in total before we stop, where nn ranges over {1,2,}\{1,2,\ldots\}.

image

Let X\mathbf{X} be a geometric random variable. Verify that n=1pX(n)=1\sum_{n = 1}^\infty p_{\mathbf{X}}(n) = 1.

Suppose we repeatedly flip a coin until we see a heads for the first time. What is the probability that we will flip the coin more than 55 times?

Let X\mathbf{X} be a random variable with XGeometric(p)\mathbf{X} \sim \text{Geometric}(p). Determine E[X]\mathbb E[\mathbf{X}].

Here are some general tips on probability calculations (this is not meant to be an exhaustive list).

  • If you are trying to upper bound Pr[A]\Pr[A], you can try to find BB with ABA \subseteq B, and then bound Pr[B]\Pr[B]. Note that if an event AA implies an event BB, then this means ABA \subseteq B. Similarly, if you are trying to lower bound Pr[A]\Pr[A], you can try to find BB with BAB \subseteq A, and then bound Pr[B]\Pr[B].

  • If you are trying to upper bound Pr[A]\Pr[A], you can try to lower bound Pr[A]\Pr[\overline{A}] since Pr[A]=1Pr[A]\Pr[A] = 1-\Pr[\overline{A}]. Similarly, if you are trying to lower bound Pr[A]\Pr[A], you can try to upper bound Pr[A]\Pr[\overline{A}].

  • In some situations, law of total probability can be very useful in calculating (or bounding) Pr[A]\Pr[A].

  • If you need to calculate Pr[A1An]\Pr[A_1 \cap \cdots \cap A_n], try the chain rule. If the events are independent, then this probability is equal to the product Pr[A1]Pr[An]\Pr[A_1] \cdots \Pr[A_n]. Note that the event “for all i{1,,n}i \in \{1,\ldots,n\}, AiA_i” is the same as A1AnA_1 \cap \cdots \cap A_n.

  • If you need to upper bound Pr[A1An]\Pr[A_1 \cup \cdots \cup A_n], you can try to use the union bound. Note that the event “there exists an i{1,,n}i \in \{1,\ldots,n\} such that AiA_i” is the same as A1AnA_1 \cup \cdots \cup A_n.

  • When trying to calculate E[X]\mathbb E[\mathbf{X}], try:

    1. directly using the definition of expectation;

    2. writing X\mathbf{X} as a sum of indicator random variables, and then using linearity of expectation.

  1. Describe what a probability tree is.

  2. True or false: If two events AA and BB are independent, then their complements A\overline{A} and B\overline{B} are also independent. (The complement of an event AA is A=Ω\A\overline{A} = \Omega \backslash A.)

  3. True or false: If events AA and BB are disjoint, then they are necessarily independent.

  4. True or false: For all events A,BA,B, Pr[A    B]Pr[A]\Pr[A \; | \;B] \le \Pr[A].

  5. True or false: For all events A,BA,B, Pr[A    B]=1Pr[A    B]\Pr[\overline{A} \; | \;B] = 1- \Pr[A \; | \;B].

  6. True or false: For all events A,BA,B, Pr[A    B]=1Pr[A    B]\Pr[A \; | \;\overline{B}] = 1 - \Pr[A \; | \;B].

  7. True or false: Assume that every time a baby is born, there is 1/21/2 chance that the baby is a boy. A couple has two children. At least one of the children is a boy. The probability that both children are boys is 1/21/2.

  8. What is the union bound?

  9. What is the chain rule?

  10. What is a random variable?

  11. What is an indicator random variable?

  12. What is the expectation of a random variable?

  13. What is linearity of expectation?

  14. When calculating the expectation of a random variable X\mathbf{X}, the strategy of writing X\mathbf{X} as a sum of indicator random variables and then using linearity of expectation is quite powerful. Explain how this strategy is carried out.

  15. True or false: Let X\mathbf{X} be a random variable. If E[X]=μ\mathbb E[\mathbf{X}] = \mu, then Pr[X=μ]>0\Pr[\mathbf{X} = \mu] > 0.

  16. True or false: For any random variable X\mathbf{X}, E[1/X]=1/E[X]\mathbb E[1/\mathbf{X}] = 1/\mathbb E[\mathbf{X}].

  17. True or false: For any random variable X\mathbf{X}, Pr[XE[X]]>0\Pr[\mathbf{X} \geq \mathbb E[\mathbf{X}]] > 0.

  18. True or false: For any non-negative random variable X\mathbf{X}, E[X2]E[X]2\mathbb E[\mathbf{X}^2] \le \mathbb E[\mathbf{X}]^2.

  19. True or false: For any random variable X\mathbf{X}, E[X3]=E[X3]\mathbb E[-\mathbf{X}^3] = - \mathbb E[\mathbf{X}^3].

  20. What is Markov’s inequality?

  21. What is a Bernoulli random variable? Give one example.

  22. What is the expectation of a Bernoulli random variable, and how do you derive it?

  23. What is a Binomial random variable? Give one example.

  24. What is the expectation of a Binomial random variable, and how do you derive it?

  25. What is a Geometric random variable? Give one example.

  26. What is the expectation of a Geometric random variable (justification not required)?

  27. Let X\mathbf{X} and Y\mathbf{Y} be random variables. Does the expression E[X    Y]=0\mathbb E[\mathbf{X} \; | \;\mathbf{Y}] = 0 type-check?

  28. Let AA be an event. Does the expression E[A]=0\mathbb E[A] = 0 type-check?

Let f:ΣΣf : \Sigma^* \to \Sigma^* be a computational problem. Let 0ϵ<10 \leq \epsilon < 1 be some parameter and T:NNT :\mathbb N\to \mathbb N be some function. Suppose AA is a randomized algorithm such that

  • for all xΣx \in \Sigma^*, Pr[A(x)f(x)]ϵ\Pr[A(x) \neq f(x)] \leq \epsilon;

  • for all xΣx \in \Sigma^*, Pr[number of steps A(x) takes is at most T(x)]=1\Pr[\text{number of steps $A(x)$ takes is at most $T(|x|)$}] = 1.

(Note that the probabilities are over the random choices made by AA.) Then we say that AA is a T(n)T(n)-time Monte Carlo algorithm that computes ff with ϵ\epsilon probability of error.

Let f:ΣΣf : \Sigma^* \to \Sigma^* be a computational problem. Let T:NNT :\mathbb N\to \mathbb N be some function. Suppose AA is a randomized algorithm such that

  • for all xΣx \in \Sigma^*, Pr[A(x)=f(x)]=1\Pr[A(x) = f(x)] = 1, where the probability is over the random choices made by AA;

  • for all xΣx \in \Sigma^*, E[number of steps A(x) takes]T(x)\mathbb{E}[\text{number of steps $A(x)$ takes}] \leq T(|x|).

Then we say that AA is a T(n)T(n)-time Las Vegas algorithm that computes ff.

One can adapt the definitions above to define the notions of Monte Carlo algorithms and Las Vegas algorithms that compute decision problems (i.e. languages).

Suppose you are given a Las Vegas algorithm AA that solves f:ΣΣf:\Sigma^* \to \Sigma^* in expected time T(n)T(n). Show that for any constant ϵ>0\epsilon > 0, there is a Monte Carlo algorithm that solves ff in time O(T(n))O(T(n)) and error probability ϵ\epsilon.

Suppose you are given a Monte Carlo algorithm AA that runs in worst-case T1(n)T_1(n) time and solves f:ΣΣf:\Sigma^* \to \Sigma^* with success probability at least pp (i.e., for every input, the algorithm gives the correct answer with probability at least pp and takes at most T1(n)T_1(n) steps). Suppose it is possible to check in T2(n)T_2(n) time whether the output produced by AA is correct or not. Show how to convert AA into a Las Vegas algorithm that runs in expected time O((T1(n)+T2(n))/p)O((T_1(n)+T_2(n))/p).

In the minimum cut problem, the input is a connected undirected graph GG, and the output is a 22-coloring of the vertices, where each color is used at least once, such that the number of edges whose endpoints are colored differently is minimized (such edges are called cut edges). Equivalently, we want to output a non-empty subset SVS \subsetneq V such that the number of edges between SS and V\SV\backslash S is minimized. Such a set SS is called a cut and the size of the cut is the number of edges between SS and V\SV\backslash S (note that the size of the cut is not the number of vertices). We denote this problem by MIN-CUT\text{MIN-CUT}.

A multi-graph G=(V,E)G=(V,E) is an undirected graph in which EE is allowed to be a multi-set. In other words, a multi-graph can have multiple edges between two vertices.

Let G=(V,E)G=(V,E) be a multi-graph and let u,vVu,v \in V be two vertices in the graph. Contraction of uu and vv produces a new multi-graph G=(V,E)G'=(V',E'). Informally, in GG', we collapse/contract the vertices uu and vv into one vertex and preserve the edges between these two vertices and the other vertices in the graph. Formally, we remove the vertices uu and vv, and create a new vertex called uvuv, i.e. V=V\{u,v}{uv}V' = V \backslash \{u,v\} \cup \{uv\}. The multi-set of edges EE' is defined as follows:

  • for each {u,w}E\{u,w\} \in E with wvw \neq v, we add {uv,w}\{uv,w\} to EE';

  • for each {v,w}E\{v,w\} \in E with wuw \neq u, we add {uv,w}\{uv,w\} to EE';

  • for each {w,w}E\{w,w'\} \in E with w,w∉{u,v}w,w' \not \in \{u,v\}, we add {w,w}\{w,w'\} to EE'.

Below is an example:

image

There is a polynomial-time Monte-Carlo algorithm that solves the MIN-CUT problem with error probability at most 1/en1/e^n, where nn is the number of vertices in the input graph.

This question asks you to boost the success probability of a Monte Carlo algorithm computing a decision problem with one-sided error.

Let f:Σ{0,1}f:\Sigma^* \to \{0,1\} be a decision problem, and let AA be a Monte Carlo algorithm for ff such that if xx is a YES instance, then AA always gives the correct answer, and if xx is a NO instance, then AA gives the correct answer with probability at least 1/2. Suppose AA runs in worst-case O(T(n))O(T(n)) time. Design a new Monte Carlo algorithm AA' for ff that runs in O(nT(n))O(nT(n)) time and has error probability at most 1/2n1/2^n.

This question asks you to boost the success probability of a Monte Carlo algorithm computing a decision problem with two-sided error.

Let f:Σ{0,1}f:\Sigma^* \to \{0,1\} be a decision problem, and let AA be a Monte Carlo algorithm for ff with error probability 1/41/4, i.e., for all xΣx \in \Sigma^*, Pr[A(x)f(x)]1/4\Pr[A(x) \neq f(x)] \leq 1/4. We want to boost the success probability to 11/2n1-1/2^n, and our strategy will be as follows. Given xx, run A(x)A(x) 6n6n times (where n=xn = |x|), and output the more common output bit among the 6n6n output bits (breaking ties arbitrarily). Show that the probability of outputting the wrong answer is at most 1/2n1/2^n.

Using the analysis of the randomized minimum cut algorithm, show that a graph can have at most n(n1)/2n(n-1)/2 distinct minimum cuts.

Suppose we modify the min-cut algorithm seen in lecture so that rather than picking an edge uniformly at random, we pick 2 vertices uniformly at random and contract them into a single vertex. True or False: The success probability of the algorithm (excluding the part that boosts the success probability) is 1/nk1/n^k for some constant kk, where nn is the number of vertices. Justify your answer.

  1. True or false: When analyzing a randomized algorithm, we assume that the input is chosen uniformly at random.

  2. What is the difference between a Monte Carlo algorithm and a Las Vegas algorithm?

  3. Describe the probability tree induced by a Monte Carlo algorithm on a given input.

  4. Describe the probability tree induced by a Las Vegas algorithm on a given input.

  5. Describe at a high level how to convert a Las Vegas algorithm into a Monte Carlo algorithm.

  6. Describe at a high level how to covert a Monte Carlo algorithm into a Las Vegas algorithm.

  7. In this chapter, we defined the MIN-CUT problem. In the MAX-CUT problem, given a graph G=(V,E)G=(V,E), we want to output a subset SVS \subseteq V such that the number of edges between SS and V\SV\backslash S is maximized. Suppose for every vertex vVv \in V, we flip a fair coin and put the vertex in SS if the coin comes up heads. Show that the expected number of cut edges is E/2|E|/2.

  8. Argue, using the result from previous question, why every graph with E|E| edges contains a cut of size at least E/2|E|/2.

  9. Outline a general strategy for boosting the success probability of a Monte Carlo algorithm that computes a decision problem.

  10. True or false: For any decision problem, there is a polynomial-time Monte Carlo algorithm that computes it with error probability equal to 1/21/2.

  11. Suppose we have a Monte-Carlo algorithm for a decision problem with error probability equal to 1/21/2. Can we boost the success probability of this algorithm by repeated trials?

  12. True or false: The cut output by the contraction algorithm is uniformly random among all possible cuts.

  13. True or false: The size of a minimum cut in a graph is equal to the minimum degree of a vertex in the graph.

  14. Give an example of a problem for which we have a polynomial-time randomized algorithm, but we do not know of a polynomial-time deterministic algorithm.

  1. As always, understanding the definitions is important. Make sure you are comfortable with the definitions of Monte Carlo and Las Vegas algorithms.

  2. We require worst-case guarantees for randomized algorithms. In particular, in this course, we never consider a randomly chosen input.

  3. One of the best ways to understand randomized algorithms is to have a clear understanding on how they induce a probability tree and what properties those probability trees have. We have emphasized this point of view in lecture even though it does not appear in this chapter.

  4. It is important to know how to convert a Monte Carlo algorithm into a Las Vegas algorithm and vice versa.

  5. The main example in this chapter is the analysis of the contraction algorithm for MIN-CUT. There are several interesting ideas in the analysis, and you should make note of those ideas/tricks. Especially the idea of boosting the success probability of a randomized algorithm via repeated trials is important.

  6. One extremely useful trick that was highlighted in the previous chapter is the calculation of an expectation of a random variable by writing it as a sum of indicator random variables and using linearity of expectation (see Important Note (Combining linearity of expectation and indicators) and the exercise after it). This trick comes up a lot in the context of randomized algorithms. In particular, you will very likely be asked a question of the form “Give a randomized algorithm for problem X with the property that the expected number of Y is equal to Z.” As an example, see the analysis of the randomized algorithm for MAX-CUT problem covered in lecture.

Let A,BZA, B \in \mathbb Z. We say that AA divides BB (or AA is a divisor of BB), denoted ABA | B, if there is a number CZC \in \mathbb Z such that B=ACB = AC.

Let PNP \in \mathbb N. We say that PP is a prime number if P2P \geq 2 and the only divisors of PP are 11 and PP.

Let A,BA, B and NN be positive integers. We denote by AmodNA \bmod N the remainder you get when you divide AA by NN. Note that AmodN{0,1,2,,N1}A \bmod N \in \{0,1,2,\ldots,N-1\}. We say that AA and BB are congruent modulo NN, denoted ANBA \equiv_N B (or ABmodNA \equiv B \bmod N), if AmodN=BmodNA \bmod N = B \bmod N.

Show that ANBA \equiv_N B if and only if N(AB)N | (A - B).

The above characterization of ANBA \equiv_N B can be taken as the definition of ANBA \equiv_N B. Indeed, it is used a lot in proofs.

We write gcd(A,B)\gcd(A, B) to denote the greatest common divisor of AA and BB. Note that for any AA, gcd(A,1)=1\gcd(A,1) = 1 and gcd(A,0)=A\gcd(A, 0) = A.

We say that AA and BB are relatively prime if gcd(A,B)=1\gcd(A,B) = 1.

We let ZN\mathbb Z_N denote the set {0,1,2,,N1}\{0,1,2,\ldots,N-1\}.

For A,BZNA, B \in \mathbb Z_N, we define the addition of AA and BB, denoted A+NBA +_N B, as (A+B)modN(A+B) \bmod N. When NN is clear from the context, we can drop the subscript NN from +N+_N and write ++. For N=5N=5, we can represent the addition operation in Z5\mathbb Z_5 using the following table.

image

In ZN\mathbb Z_N, the element 0 is called the additive identity. It has the property that for any AZNA \in \mathbb Z_N, A+N0=0+NA=AA +_N 0 = 0 +_N A = A.

  • Show that if ANBA \equiv_N B and ANBA' \equiv_N B', then A+ANB+BA+ A' \equiv_N B+B'.

  • Show that for any A,BZA,B \in \mathbb Z, (A+B)modN=(AmodN)+N(BmodN).(A + B) \bmod N = (A \bmod N) +_N (B \bmod N).

Let AZNA \in \mathbb Z_N. The additive inverse of AA, denoted A-A, is defined to be an element in ZN\mathbb Z_N such that A+NA=0A +_N -A = 0.

Show that every element of ZN\mathbb Z_N has a unique additive inverse.

Let A,BZNA, B \in \mathbb Z_N. We define “AA minus BB”, denoted ANBA -_N B, as A+NBA +_N -B.

Show that in the addition table of ZN\mathbb Z_N, every row and column is a permutation of the elements ZN\mathbb Z_N.

For A,BZNA, B \in \mathbb Z_N, we define the multiplication of AA and BB, denoted ANBA \cdot_N B, as ABmodNAB \bmod N. If NN is clear from the context, we can drop the subscript NN from N\cdot_N and write \cdot. Furthermore, we can even drop \cdot and represent ANBA \cdot_N B as simply ABAB.

  • Show that if ANBA \equiv_N B and ANBA' \equiv_N B', then AANBBAA' \equiv_N BB'.

  • Show that for any A,BZA,B \in \mathbb Z, ABmodN=(AmodN)N(BmodN).AB \bmod N = (A \bmod N) \cdot_N (B \bmod N).

Let AZNA \in \mathbb Z_N. The multiplicative inverse of AA, denoted A1A^{-1}, is defined to be an element in ZN\mathbb Z_N such that ANA1=1A \cdot_N A^{-1} = 1.

Let A,NNA,N \in \mathbb N. The multiplicative inverse of AA in ZN\mathbb Z_N exists if and only if gcd(A,N)=1\gcd(A,N) = 1.

Let A,BZNA, B \in \mathbb Z_N, where BB has a multiplicative inverse B1B^{-1}. Then we define “AA divided by BB”, denoted A/NBA /_N B, as ANB1A \cdot_N B^{-1}.

We let ZN\mathbb Z_N^* denote the set {AZN:gcd(A,N)=1}\{A \in \mathbb Z_N : \gcd(A, N) = 1\}. In other words, ZN\mathbb Z_N^* is the set of all elements of ZN\mathbb Z_N that have a multiplicative inverse.

Show that ZN\mathbb Z_N^* is closed under multiplication, i.e., A,BZN    ANBZNA,B \in \mathbb Z_N^* \implies A \cdot_N B \in \mathbb Z_N^*.

Similar to an addition table for ZN\mathbb Z_N, one can consider a multiplication table for ZN\mathbb Z_N^*. For example, Z8={1,3,5,7}\mathbb Z_8^* = \{1, 3, 5, 7\}, and the multiplication table is as below:

image

Show that in the multiplication table of ZN\mathbb Z_N^*, every row and column is a permutation of the elements ZN\mathbb Z_N^*.

The Euler totient function φ:NN\varphi : \mathbb N\to \mathbb N is defined as φ(N)=ZN\varphi(N) = |\mathbb Z_N^*|. By convention, φ(0)=0\varphi(0) = 0.

Show that for PP a prime number, φ(P)=P1\varphi(P) = P-1. Also show that for PP and QQ distinct prime numbers, φ(PQ)=(P1)(Q1)\varphi(PQ) = (P-1)(Q-1).

Let AZNA \in \mathbb Z_N and EZE \in \mathbb Z. We write AEA^E to denote ANANNAE times.\underbrace{A \cdot_N A \cdot_N \cdots \cdot_N A}_{E \text{ times}}.

For any AZNA \in \mathbb Z_N^*, Aφ(N)=1A^{\varphi(N)} = 1. Equivalently, for any A,NZA,N \in \mathbb Z with gcd(A,N)=1\gcd(A,N) = 1, Aφ(N)N1A^{\varphi(N)} \equiv_N 1.

When NN is a prime number, then Euler’s Theorem is known as Fermat’s Little Theorem.

Let AZNA \in \mathbb Z_N^* and EZE \in \mathbb Z. Show that AENAEmodφ(N)A^E \equiv_N A^{E \bmod \varphi(N)}.

Compute by hand 10298mod7102^{98} \bmod 7.

What the previous two exercises demonstrate is that if we are exponentiating an element AZNA \in \mathbb Z_N^*, then we can effectively think of the exponent as living in the set Zφ(N)\mathbb Z_{\varphi(N)}. This will be important to keep in mind when we cover Cryptography later.

Let AZNA \in \mathbb Z_N^*. We say that AA is a generator if {AE:EZφ(N)}=ZN.\{A^E : E \in \mathbb Z_{\varphi(N)}\} = \mathbb Z_N^*.

If PP is a prime number, then ZP\mathbb Z_P^* contains a generator.

gcd(A,B)\gcd(A, B):

  • If B=0B = 0, then return AA.

  • Else return gcd(B,AmodB)\gcd(B, A \bmod B).

Show that if ABA \geq B, gcd(A,B)=gcd(AB,B)\gcd(A,B) = \gcd(A-B, B). Use this to show that Euclid’s Algorithm correctly computes the greatest common divisor of two numbers.

Suppose AA and BB can be represented with at most nn bits each. Give an upper bound on the number of recursive calls Euclid’s Algorithm makes in terms of nn.

Let A,B,CNA,B,C \in \mathbb N. We say that CC is a miix of AA and BB if C=kA+BC = k A + \ell B for some k,Zk, \ell \in \mathbb Z.

Let A,B,CNA,B,C \in \mathbb N. Show that if CC is a miix of AA and BB then CC is a multiple of gcd(A,B)\gcd(A,B).

Show how to extend Euclid’s algorithm so that it outputs kk and \ell such that gcd(A,B)=kA+B\gcd(A,B) = k A + \ell B. Then conclude that if CC is any multiple of gcd(A,B)\gcd(A,B), then CC is a miix of AA and BB.

Prove Proposition (Multiplicative inverse characterization) using the previous two exercises.

FME(A,E,N)\text{FME}(A, E, N):

  • Repeatedly square AA to obtain
    A2modNA^2 \bmod N, A4modNA^4 \bmod N, A8modNA^8 \bmod N, \ldots, A2nmodNA^{2^n} \bmod N.

  • Multiply together (modulo NN) the powers of AA so that the product is AEA^E.
    To figure out which powers to multiply, look at the binary representation of EE.

Suppose A,EA, E and NN are integers that can be represented using at most nn bits. Give an upper bound on the running time of the above algorithm in terms of nn.

  1. Wilson’s theorem states that NN is prime if and only if (N1)!NN1(N-1)! \equiv_N N-1. Can we use this fact in the obvious way to design a polynomial time algorithm for isPrime?

  2. True or false: Suppose that AKN1A^K \equiv_N 1, where A1A \neq 1. Then φ(N)K\varphi(N) | K.

  3. Determine, with proof, 75010002(mod251)750^{10002} \pmod{251}. Note that 251251 is prime.

  4. Determine, with proof, 251(15251)(mod7)251^{(15^{251})} \pmod{7}.

  5. Let GG be a generator in ZN\mathbb Z_N^*, and let RZφ(N)R \in \mathbb Z_{\varphi(N)} be chosen uniformly at random. For every AZNA \in \mathbb Z_N^*, determine Pr[GR=A]\Pr[G^R = A].

  6. Let RZNR \in \mathbb Z_N^* be chosen uniformly at random. For every A,BZNA, B \in \mathbb Z_N^*, determine Pr[ANR=B]\Pr[A \cdot_N R = B].

  7. What is Euler’s Theorem?

  8. What is a generator in ZN\mathbb Z_N^*?

  9. Consider the addition, subtraction, multiplication, division, exponentiation, taking logarithm, and taking root. Which of these operations are known to be polynomial-time solvable when the universe is the integers? Which of them are known to be polynomial-time solvable in the modular universe?

  10. What is an efficient algorithm for finding the multiplicative inverse of an element in ZN\mathbb Z_N^*?

  1. The theory behind modular arithmetic should mostly be review. It is important to know how the basic operations are defined and what kind of properties they have.

  2. For which modular arithmetic operations do we have an efficient algorithm for (and what is that algorithm)? For which operations do we not expect to have an efficient algorithm? And for these operations, why don’t the “obvious” algorithms work? You should have a solid understanding of all these. Please internalize Important Note (Our model when measuring running time) and Important Note (Integer inputs are large numbers).

Note that in the one-time pad cryptographic scheme described above, the underlying universe that we are dealing with is Z2n\mathbb Z_2^n with the operation being bit-wise xor. Describe a similar scheme that uses ZN\mathbb Z_N^* as the universe.

  1. Consider the one-time pad cryptographic system. Show that for any plaintext M{0,1}nM \in \{0,1\}^n, if the key K{0,1}nK \in \{0,1\}^n is chosen uniformly at random, then the ciphertext CC is a uniformly random element of {0,1}n\{0,1\}^n.

  2. Describe the Diffie-Hellman secret key exchange protocol.

  3. Quantum computers can compute the discrete log problem efficiently. Explain how quantum computers can be used to break the Diffie-Hellman secret key exchange protocol.

  4. Describe the RSA public-key protocol.

  5. Quantum computers can compute the discrete root problem efficiently. Explain how quantum computers can be used to break the RSA public-key protocol.

  6. Quantum computers can compute the factoring problem (given an integer NN, output its prime factors) efficiently. Explain how quantum computers can be used to break the RSA public-key protocol.

  1. Understand the general outline of how private-key protocols and public-key protocols work, independent of specific implementations of these ideas.

  2. Make sure you understand how the one-time pad private-key protocol works and why it is a perfectly secure protocol.

  3. You should be able to describe the Diffie-Hellman secret key exchange protocol and how it relates to the assumed hardness of the discrete log problem.

  4. You should be able to describe the RSA public-key protocol and how it relates to the assumed hardness of the discrete root problem. You should also understand the relevance of the factoring problem in this setting.