
MTH 6109: Combinatorics

Dr David Ellis

Autumn Semester 2012

Contents

1 Counting
  1.1 Introduction
  1.2 Counting sequences
  1.3 Counting subsets
  1.4 The inclusion-exclusion principle
  1.5 Counting surjections
  1.6 Permutations and derangements

2 Recurrence relations & generating series
  2.1 Introduction
  2.2 Solving recurrence relations
  2.3 Generating series

3 Graphs
  3.1 Introduction
  3.2 Trees
  3.3 Bipartite graphs and matchings

4 Latin squares
  4.1 Introduction
  4.2 Orthogonal latin squares
  4.3 Upper bounds on the number of latin squares
  4.4 Transversals in latin squares


Chapter 1

Counting sequences, subsets, integer partitions, and permutations

1.1 Introduction

Combinatorics is a very broad, rich part of mathematics. It is mostly about the size and properties of finite structures. Often in combinatorics, we want to know whether it is possible to arrange a set of objects into a pattern satisfying certain rules. If it is possible, we want to know how many such patterns there are. And can we come up with an explicit recipe, or algorithm, for producing such a pattern?

Here is an example. The great mathematician Leonhard Euler asked the following question in 1782.

‘There are 6 different regiments. Each regiment has 6 soldiers, one of each of 6 different ranks. Can these 36 soldiers be arranged in a square formation so that each row and each column contains one soldier of each rank and one from each regiment?’

Euler conjectured that the answer is ‘no’, but it was not until 1900 that this was proved correct. He also conjectured that the answer is ‘no’ if six is replaced by 10, 14, or any number congruent to 2 mod 4. He was completely wrong about this, but this was not discovered until the 1960s.

Euler’s formations are known as mutually orthogonal latin squares; we will study them later in the course.

Note that if we replace ‘6’ with ‘3’, then such an arrangement is possible: if the regiments are labelled 1, 2 and 3, and the ranks are labelled a, b and c, then the following works:


a1 b2 c3
b3 c1 a2
c2 a3 b1

Challenge: work out how many such arrangements there are! (You may want to wait until after the first 6 lectures, before tackling this.)

While it includes many interesting and entertaining puzzles, combinatorics is also of great importance in the modern digital world. Much of computer science can be seen as combinatorics, and indeed, computer scientists and combinatorialists are interested in many of the same problems. Even Euler’s ‘puzzle’ turns out to be relevant to the construction of error-correcting codes.

1.2 Counting sequences

In this chapter, we’ll be concerned with working out how many there are of some very common mathematical patterns, or structures.

Let’s start with some simple, but important, examples.

Example 1. How many sequences of length 3 can we make using the letters a, b, c, d, e? (Order is important, repetition is allowed.)

Answer: There are 5 choices for the first letter. For each choice of the first letter, there are 5 choices for the second letter. And for each choice of the first two letters, there are 5 choices for the third letter. So the answer is 5 × 5 × 5 = 125.

Example 2. How many sequences in Example 1 have no repetitions?

Answer: There are 5 choices for the first letter. For each choice of the first letter, there are 4 choices for the second letter. For each choice of the first two letters, there are 3 choices for the third letter. So the answer is 5 × 4 × 3 = 60.

Example 3. How many sequences in Example 2 contain the letter a?

Answer: there are 3 choices of where to put the letter a. There are then 4 choices of which letter to put in the first remaining space, and then 3 choices of which letter to put in the second remaining space. So the answer is 3 × 4 × 3 = 36.
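The three answers above are small enough to check by listing every sequence. The following is only a brute-force sketch (the variable names are illustrative and not part of the notes); it enumerates all length-3 sequences over the letters a, b, c, d, e and filters them as in Examples 2 and 3.

```python
from itertools import product

letters = "abcde"

# Example 1: all sequences of length 3, repetition allowed.
all_seqs = list(product(letters, repeat=3))

# Example 2: keep only the sequences whose three letters are distinct.
no_repeats = [s for s in all_seqs if len(set(s)) == 3]

# Example 3: of those, keep the sequences that contain the letter a.
with_a = [s for s in no_repeats if "a" in s]

print(len(all_seqs), len(no_repeats), len(with_a))  # 125 60 36
```

Enumeration like this only works for tiny cases, but it is a useful sanity check on a counting argument.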

We can generalize Example 1 as follows.

Example 4. Suppose X is a set of n elements. How many sequences of length k can we make, using elements of X?

Answer: There are n choices for the first letter in the sequence. For each choice of the first letter, there are n choices for the second. And so on, until, for each choice of the first k − 1 letters, there are n choices for the kth letter. So the answer is

$$\underbrace{n \times n \times \cdots \times n}_{k \text{ times}} = n^k.$$


Aside: making proofs formal

It is intuitively obvious that this is the right answer, but the argument above is not a totally formal proof, because of the ‘and so on’ in the middle. To make it formal, we need to use two principles: the principle of induction, and the bijection principle.

You are all familiar with the principle of mathematical induction.

Principle 1 (The principle of mathematical induction). For each natural number n, let P(n) be a statement, which can either be true or false. For example, P(n) might be ‘$1 + 2 + \cdots + n = \tfrac{1}{2}n(n+1)$.’ Suppose that:

• P (1) is true;

• For each n, P (n) implies P (n+ 1).

Then P (n) is true for all natural numbers n.

To state the bijection principle, we need some definitions. Let X and Y be sets, and let f : X → Y be a function.

Definition. We say that f is an injection if f(x) = f(x′) implies that x = x′. In other words, any element of Y has at most one pre-image.

Definition. We say that f is a surjection if for every y ∈ Y, there exists an x ∈ X such that f(x) = y. In other words, every element of Y has at least one pre-image.

Definition. We say that f is a bijection if it is both an injection and a surjection. In other words, every element of Y has exactly one pre-image.

We can now state the bijection principle.

Principle 2 (The bijection principle). If X and Y are finite sets, and there exists a bijection from X to Y, then |X| = |Y|.

A bijection from X to Y is simply a way of pairing up the elements of X with the elements of Y, so that each element of Y is paired with exactly one element of X. It is also known as a ‘one-to-one correspondence’ between X and Y. If there is a one-to-one correspondence between X and Y, we sometimes denote this fact using a double arrow, X ↔ Y.

Remark 1. Recall that f : X → Y is a bijection if and only if there exists a function g : Y → X such that

• $g \circ f = \mathrm{Id}_X$, the identity function on X, and

• $f \circ g = \mathrm{Id}_Y$, the identity function on Y.


The function g is called the inverse of f .

Armed with these two principles, we can now give a formal proof of the answer to Example 4.

Theorem 1. Let n and k be positive integers. Let X be a set of size n. Then the number of sequences of length k which can be made using elements of X is $n^k$.

Proof. Let us write $X^k$ for the set of sequences of length k which can be made using elements of X. Our aim is to show that

$$|X^k| = n^k \qquad (1.1)$$

for all k ∈ N.

First, observe that $|X^k| = n|X^{k-1}|$. This is because, for every sequence of length k − 1, we can construct n sequences of length k by choosing any element of X and joining it to the end of the sequence. Every sequence of length k is obtained exactly once in this way. (Formally, the function

$$f : X^{k-1} \times X \to X^k; \qquad (S, x) \mapsto (S \text{ followed by } x)$$

is a bijection.) This shows that $|X^k| = n|X^{k-1}|$.

We can now prove (1.1) using induction on k. There are n sequences of length 1, so $|X^1| = n$, so (1.1) holds for k = 1. Now suppose (1.1) holds for k − 1, i.e. $|X^{k-1}| = n^{k-1}$. Then by the above fact, we have $|X^k| = n|X^{k-1}| = n \times n^{k-1} = n^k$, so (1.1) holds for k also. Therefore, by induction, (1.1) holds for all k ∈ N.

In general, the words ‘and so on’ (or . . .), in a proof, indicate an induction argument, which is sufficiently obvious that it need not be spelt out. When answering coursework or exam questions, you do not need to write down the formal argument in the ‘aside’, but it is good to know what lies underneath ‘. . .’ in a proof!

Remark 2. Observe that the number of sequences of length k, using elements of X, is just the same as the number of functions from {1, 2, . . . , k} to X. Indeed, there is a one-to-one correspondence, or bijection, between $X^k$ and the set of functions from {1, 2, . . . , k} to X: just pair up the sequence $(x_1, \ldots, x_k)$ with the function

$$i \mapsto x_i \qquad (i \in \{1, 2, \ldots, k\}).$$

Sequences without repetition

Example 5. Suppose X is a set of n elements. How many sequences of length k can we make, using elements of X, without repetition?


Answer: there are n choices for the first letter in the sequence. For each choice of the first letter, there are n − 1 choices for the second. For each choice of the first two letters, there are n − 2 choices for the third. And so on, until for each choice of the first k − 1 letters, there are n − (k − 1) = n − k + 1 choices for the kth letter. So the answer is

n(n − 1) . . . (n − k + 1).

Remark 3. This can be made into a formal proof, just like Theorem 1, using induction. Exercise: write down this proof!

Remark 4. Note that the number of sequences of length k which we can make using elements of X without repetition is just the same as the number of injections from {1, 2, . . . , k} to X. This should be ‘obvious’ by now. But it is worth bearing in mind that ‘a mathematical statement is obvious if the proof writes itself.’ Exercise: write down this proof! (One sentence.)

Example 6. How many sequences of length n can we make using the elements of {1, 2, . . . , n} without repetition?

This is just a special case of Example 5, where n = k, so the answer is n(n − 1) . . . (2)(1).

Of course, a sequence of length n made out of the numbers in {1, 2, . . . , n} without repetition must contain each number exactly once. So it is just a reordering of the numbers 1, 2, . . . , n. There is an obvious one-to-one correspondence between reorderings of 1, 2, . . . , n, and bijections from {1, 2, . . . , n} to itself. As you know, a bijection from a set X to itself is known as a permutation of X. So we see that the number of permutations of {1, 2, . . . , n} is n(n − 1) . . . (2)(1).

This number is so important that we give it a name; it is written as n!, pronounced ‘n factorial’:

$$n! := n(n-1) \cdots (2)(1).$$

In terms of factorials, we can rewrite the answer to Example 5 as

$$n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}.$$

This number is known as the kth falling factorial moment of n. It is sometimes written as ${}^nP_k$.
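As a quick numerical check of the formula for sequences without repetition, here is a small Python sketch. The helper name `falling_factorial` and the choice n = 6, k = 3 are illustrative assumptions, not part of the notes; `itertools.permutations(range(n), k)` lists exactly the length-k sequences of distinct elements.

```python
from itertools import permutations
from math import factorial, perm

def falling_factorial(n: int, k: int) -> int:
    """Return n(n-1)...(n-k+1), computed as a product of k factors."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

n, k = 6, 3

# Brute force: list every length-k sequence of distinct elements of {0, ..., n-1}.
brute_force = sum(1 for _ in permutations(range(n), k))

assert brute_force == falling_factorial(n, k)
assert brute_force == factorial(n) // factorial(n - k)
assert brute_force == perm(n, k)  # the standard library's version of the same number
print(brute_force)  # 120
```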

1.3 Counting subsets

Example 7. If X is an n-element set, how many subsets of size k does it have?


A k-element subset can be viewed as an unordered sequence of k distinct elements of X. We know already that the number of (ordered) k-element sequences of distinct elements of X is n(n − 1) . . . (n − k + 1). Let’s try to relate this to the number of k-element subsets of X.

We can generate (ordered) k-element sequences of distinct elements of X as follows. First choose a k-element subset of X, and then choose a way of ordering its elements to produce a length-k sequence of distinct elements of X. Each length-k sequence of distinct elements of X is produced exactly once by this process. There are k! orderings of each k-element set, so we obtain:

n(n − 1) . . . (n − k + 1) = (number of k-element subsets of X) × k!.

Therefore, the number of k-element subsets of X is

$$\frac{n(n-1) \cdots (n-k+1)}{k!} = \frac{n!}{k!(n-k)!}.$$

This number is written $\binom{n}{k}$, pronounced ‘n choose k’:

$$\binom{n}{k} := \frac{n!}{k!(n-k)!} \qquad (1.2)$$

(It is also sometimes written as ${}^nC_k$, but we will not use this notation.)

This argument is an example of the proof-technique of ‘double-counting’, which occurs very often in combinatorics. We want to count a certain quantity, so we do it by counting another related quantity in two different ways, and then rearranging to get an expression for the first quantity.
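The double-counting argument can be watched in action on a small case. The sketch below (with an arbitrary choice of n = 6 and k = 3, and an illustrative variable name) groups the ordered sequences of distinct elements by the subset they come from, and checks that each subset is hit exactly k! times.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

n, k = 6, 3

# For each k-element subset, count how many ordered sequences have it as their set of entries.
orderings_per_subset = Counter(frozenset(seq) for seq in permutations(range(n), k))

# Every subset arises from exactly k! orderings ...
assert all(count == factorial(k) for count in orderings_per_subset.values())

# ... so (number of subsets) x k! = n(n-1)...(n-k+1), i.e. the number of subsets is n choose k.
assert len(orderings_per_subset) * factorial(k) == n * (n - 1) * (n - 2)
assert len(orderings_per_subset) == comb(n, k)
print(len(orderings_per_subset))  # 20
```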

Exercise 1. Show that $\binom{n}{k} = \binom{n}{n-k}$, where n and k are non-negative integers with 0 ≤ k ≤ n.

(i) Using the formula (1.2);

(ii) By means of a suitable bijection between k-element subsets of {1, 2, . . . , n} and (n − k)-element subsets of {1, 2, . . . , n}.

Example 8. If X is an n-element set, how many subsets of X are there (no restriction on size)?

In fact, this turns out to be easier than counting k-element subsets.

Theorem 2. If X is an n-element set, then the number of subsets of X is $2^n$.

Proof 1. Let $X = \{x_1, \ldots, x_n\}$. Observe that we can choose a subset S of X using the following n-stage process.

Stage 1: either $x_1 \in S$ or $x_1 \notin S$ (2 choices);

Stage 2: either $x_2 \in S$ or $x_2 \notin S$ (2 choices);

. . .

Stage n: either $x_n \in S$ or $x_n \notin S$ (2 choices).

Hence there are $2^n$ subsets altogether.

Proof 2. We observe that there is a one-to-one correspondence (a bijection) between subsets of X and length-n sequences of 0’s and 1’s. As above, let $X = \{x_1, \ldots, x_n\}$. For each S ⊂ X, pair up S with the sequence which has a 1 in the ith position if $x_i \in S$, and a 0 in the ith position if $x_i \notin S$, for each i ∈ {1, 2, . . . , n}.

For example, when n = 5, the set $\{x_1, x_3, x_4\}$ is paired up with the sequence (1, 0, 1, 1, 0).

It is easy to see that this is a one-to-one correspondence. We already know that the number of length-n sequences we can make using elements of {0, 1} is $2^n$. So the number of subsets of X is also $2^n$.

As in section 1.2, there is a one-to-one correspondence between length-n sequences of 0’s and 1’s, and functions from X to {0, 1}: simply pair up the sequence $(\varepsilon_1, \ldots, \varepsilon_n) \in \{0,1\}^n$ with the function

$$x_i \mapsto \varepsilon_i.$$

This suggests a way of rewriting proof 2, using an explicit one-to-one correspondence between subsets of X and functions from X to {0, 1}.

Proof 3. We observe that there is a one-to-one correspondence between subsets of X and functions from X to the set {0, 1}. Indeed, if A ⊂ X, let $\chi_A : X \to \{0,1\}$ denote the function with $\chi_A(x) = 1$ if x ∈ A, and $\chi_A(x) = 0$ if x ∉ A. It is easy to see that $A \leftrightarrow \chi_A$ is a one-to-one correspondence. We already know that the number of functions from X to {0, 1} is $2^n$, so the number of subsets of X is also $2^n$.

Remark 5. The function $\chi_A$ defined in proof 3 is called the characteristic function of the subset A. The characteristic function is a very useful object indeed, as we will see later.
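The correspondence between subsets and 0/1 sequences used in Proofs 2 and 3 can be written down explicitly. In the sketch below, the function names `subset_to_bits` and `bits_to_subset` and the string labels for the elements are illustrative choices, not notation from the notes.

```python
from itertools import product

X = ["x1", "x2", "x3", "x4", "x5"]

def subset_to_bits(S):
    """Pair a subset S of X with its 0/1 sequence (the values of its characteristic function)."""
    return tuple(1 if x in S else 0 for x in X)

def bits_to_subset(bits):
    """Inverse map: recover the subset from its 0/1 sequence."""
    return frozenset(x for x, b in zip(X, bits) if b == 1)

# The example from Proof 2.
assert subset_to_bits({"x1", "x3", "x4"}) == (1, 0, 1, 1, 0)

all_bits = list(product([0, 1], repeat=len(X)))
subsets = {bits_to_subset(bits) for bits in all_bits}

# The two maps are mutually inverse, so this is a bijection and there are 2^n subsets.
assert all(subset_to_bits(bits_to_subset(bits)) == bits for bits in all_bits)
assert len(subsets) == len(all_bits) == 2 ** len(X)
```

Having a map in each direction and checking that they undo one another is exactly the criterion in Remark 1.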

We now come to a very useful tool: the binomial theorem.

Theorem 3 (The Binomial Theorem). If n is any positive integer, then

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

Proof. Consider $(x+y)^n$ as a product of n factors $B_1 B_2 \cdots B_n$, where

$$B_1 = B_2 = \cdots = B_n = (x+y).$$

To get a term $x^k y^{n-k}$ in this product, we need to choose an x from exactly k of the factors $B_1, B_2, \ldots, B_n$, and a y from the remaining n − k factors. The number of ways of doing this is just the number of k-element subsets of {1, 2, . . . , n}, which is $\binom{n}{k}$. Hence in the expansion of the product there are exactly $\binom{n}{k}$ terms $x^k y^{n-k}$. In other words, the coefficient of $x^k y^{n-k}$ is $\binom{n}{k}$.
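The proof can be mirrored directly in code: expanding the product means picking x or y from each of the n factors, and the coefficient of $x^k y^{n-k}$ is the number of picks with exactly k x's. The following sketch does this for n = 5 (an arbitrary illustrative value).

```python
from collections import Counter
from itertools import product
from math import comb

n = 5

# Each term of the expanded product corresponds to choosing "x" or "y" from each of the n factors.
# Record, for each choice, how many factors contributed an x.
coefficients = Counter(choice.count("x") for choice in product("xy", repeat=n))

# The number of choices with exactly k x's is n choose k.
assert all(coefficients[k] == comb(n, k) for k in range(n + 1))
print([coefficients[k] for k in range(n + 1)])  # [1, 5, 10, 10, 5, 1]
```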

Corollary 4. If n is any positive integer, then

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$

Proof 1. Put x = y = 1 in the Binomial Theorem.

Proof 2. Let X be an n-element set. The number of subsets of X is $2^n$, and the number of k-element subsets of X is $\binom{n}{k}$, for each k ∈ {0, 1, 2, . . . , n}. So

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n,$$

as required.

Exercise 2. (i) Use the binomial theorem to show that if X is an n-element set, then the number of even-sized subsets of X is equal to the number of odd-sized subsets of X.

(ii) When n is odd, this can also be proved using a bijection: pair up the subset A with $A^c$; check that this works.

(iii) When n is even, the bijection above does not work. Find another bijection which works for both n even and n odd. (This is Exercise 6 in Assignment 1.)

Exercise 3. Write down the first 7 rows of Pascal’s triangle. (Recall that we construct Pascal’s triangle by starting with

              1
            1   1
          1   2   1
        1           1
      1               1
    .                   .

and then writing in each space the sum of the two numbers above that space, working down the triangle row by row.) What is the number in row n and space k (from the left)? Write down the identity (in terms of binomial coefficients) you used to construct Pascal’s triangle. Now prove it:

(i) Directly, from the formula (1.2);

(ii) By counting subsets in two different ways.

Example 9. Let n and k be positive integers with 1 ≤ k ≤ n. How many length-k sequences of non-negative integers are there which sum to n? In other words, how many sequences $(x_1, x_2, \ldots, x_k)$ of non-negative integers have $\sum_{i=1}^{k} x_i = n$?

There will be small prizes for correct answers (with proofs) of this at the beginning of Lecture 3, provided I am reasonably convinced that it is your own work!

Answer: there is a one-to-one correspondence (a bijection) between solutions to this equation, and diagrams containing n stars and k − 1 bars in a row. Given a sequence $(x_1, \ldots, x_k)$ of non-negative integers with $x_1 + \cdots + x_k = n$, we pair it up with a diagram as follows. We first place k − 1 vertical bars in a row, and then we place $x_i$ stars between the (i − 1)th bar and the ith bar, for i = 2, . . . , k − 1; we place $x_1$ stars before the first bar, and $x_k$ stars after the (k − 1)th bar. For example, when n = 5 and k = 4, the solution (1, 2, 2, 0) corresponds to the diagram

∗ | ∗ ∗ | ∗ ∗ |

(Check that this defines a bijection.) How many such diagrams are there? The answer is $\binom{n+k-1}{k-1}$. Why? The diagram is a row of n + k − 1 symbols, and we must choose k − 1 of the symbols to be bars. (The rest are stars.) The number of ways of choosing which symbols are to be bars is equal to the number of (k − 1)-element subsets of an (n + k − 1)-element set, which is $\binom{n+k-1}{k-1}$. Hence

$$\text{number of sequences} = \text{number of diagrams} = \binom{n+k-1}{k-1}.$$
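The stars-and-bars count can be compared against a direct enumeration for small n and k. In this sketch the function names are illustrative, and the brute-force search simply tries every sequence with entries between 0 and n.

```python
from itertools import product
from math import comb

def count_solutions(n: int, k: int) -> int:
    """Count length-k sequences of non-negative integers summing to n, by brute force."""
    return sum(1 for xs in product(range(n + 1), repeat=k) if sum(xs) == n)

def to_diagram(xs) -> str:
    """Turn a sequence into its stars-and-bars diagram: k - 1 bars separating k runs of stars."""
    return "|".join("*" * x for x in xs)

# The example from the text: n = 5, k = 4.
assert to_diagram((1, 2, 2, 0)) == "*|**|**|"

for n in range(1, 7):
    for k in range(1, n + 1):
        assert count_solutions(n, k) == comb(n + k - 1, k - 1)

print(count_solutions(5, 4))  # 56
```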

More complicated counting

Examples 1, 5, 6 and 8 can be seen as applications of the following simple principle.

Principle 3 (The Multiplication Principle). Suppose that a finite set X is described to us, and we want to find |X|, the number of elements in X. Suppose we can generate the elements of X using a process consisting of k steps such that:

(i) The number of possible choices at the ith step is $t_i$, and this number is independent of which choices we made in the previous steps;

(ii) Each element of X is produced by exactly one sequence of choices. (So if we change the choice at any step of the process, we get a different element of X.)

Then $|X| = t_1 t_2 \cdots t_k$.

In Example 4, we have $t_i = n$ for all i with 1 ≤ i ≤ k, and in Example 5, we have $t_i = n - i + 1$ for all i with 1 ≤ i ≤ k.

Here is an example where we cannot use the multiplication principle straight away.

Example 10. If n ≥ 5, how many k-element subsets of {1, 2, . . . , n} contain at least 3 elements of the set {1, 2, 3, 4, 5}?

Answer: Let $\mathcal{F}$ be the family of k-element subsets of {1, 2, . . . , n} containing at least 3 elements of the set {1, 2, 3, 4, 5}. We want to find $|\mathcal{F}|$ by generating the sets $S \in \mathcal{F}$ using a sequence of choices, generating each set in $\mathcal{F}$ exactly once.

The natural thing to do is to generate S in the following sequence of steps:

Step 1: Choose the number of elements of {1, 2, 3, 4, 5} that S contains.

Step 2: Choose exactly which elements of {1, 2, 3, 4, 5} S contains.

Step 3: Choose exactly which elements of {6, 7, . . . , n} S contains.

Obviously, in Step 1, there are 3 choices for the number of elements of {1, 2, 3, 4, 5} which S contains: 3, 4 or 5. However:

• If we choose ‘3’ in Step 1, then at Step 2, there are $\binom{5}{3} = 10$ possible choices for which 3 elements of {1, 2, 3, 4, 5} S contains (we just have to choose a 3-element subset of {1, 2, 3, 4, 5}). For each such choice, there are $\binom{n-5}{k-3}$ possible choices at Step 3 for which elements of {6, 7, . . . , n} S contains. (We just have to choose a (k − 3)-element subset of {6, 7, . . . , n}.) So the total number of possibilities in this case is

$$\binom{5}{3}\binom{n-5}{k-3} = 10\binom{n-5}{k-3}.$$

• If we choose ‘4’ in Step 1, then at Step 2, there are $\binom{5}{4} = 5$ possible choices for which 4 elements of {1, 2, 3, 4, 5} S contains (we just have to choose a 4-element subset of {1, 2, 3, 4, 5}). For each such choice, there are $\binom{n-5}{k-4}$ possible choices at Step 3 for which elements of {6, 7, . . . , n} S contains. (We just have to choose a (k − 4)-element subset of {6, 7, . . . , n}.) So the total number of possibilities in this case is

$$\binom{5}{4}\binom{n-5}{k-4} = 5\binom{n-5}{k-4}.$$

• If we choose ‘5’ in Step 1, then at Step 2, there is just $\binom{5}{5} = 1$ choice for which elements of {1, 2, 3, 4, 5} S contains (S must contain all 5 of them). There are then $\binom{n-5}{k-5}$ possible choices at Step 3 for which elements of {6, 7, . . . , n} S contains. (We just have to choose a (k − 5)-element subset of {6, 7, . . . , n}.) So the total number of possibilities in this case is

$$\binom{5}{5}\binom{n-5}{k-5} = \binom{n-5}{k-5}.$$

Here, the number of possible choices at Step 2 and Step 3 depends upon the choice at Step 1, so we cannot use the multiplication principle straight away.

However, observe that once we have made the choice at Step 1, the number of possible choices at Step 3 does not depend upon the choice at Step 2, only upon the choice at Step 1. So, to calculate the total number of possible choices after Step 1, we can use the multiplication principle. So the right thing to do is to sum over all the choices in Step 1: the total number of possible choices is

$$\binom{5}{3}\binom{n-5}{k-3} + \binom{5}{4}\binom{n-5}{k-4} + \binom{5}{5}\binom{n-5}{k-5} = 10\binom{n-5}{k-3} + 5\binom{n-5}{k-4} + \binom{n-5}{k-5}.$$

We have generated each set in $\mathcal{F}$ exactly once, so

$$|\mathcal{F}| = 10\binom{n-5}{k-3} + 5\binom{n-5}{k-4} + \binom{n-5}{k-5}.$$

Another way of explaining what we are doing is that we are partitioning $\mathcal{F}$ according to the number of elements of {1, 2, 3, 4, 5} a set contains: if we let $\mathcal{F}_i$ be the family of k-element subsets of {1, 2, . . . , n} containing exactly i elements of {1, 2, 3, 4, 5}, then we are saying that

$$|\mathcal{F}| = \sum_{i=3}^{5} |\mathcal{F}_i| = |\mathcal{F}_3| + |\mathcal{F}_4| + |\mathcal{F}_5| = 10\binom{n-5}{k-3} + 5\binom{n-5}{k-4} + \binom{n-5}{k-5}.$$
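The formula for Example 10 can be checked against a direct enumeration for small n. The sketch below assumes the usual convention that a binomial coefficient is 0 when its lower index is negative or exceeds the upper index (the notes do not state this explicitly, and Python's `math.comb` rejects negative arguments, hence the hypothetical helper `binom`).

```python
from itertools import combinations
from math import comb

def binom(a: int, b: int) -> int:
    """Binomial coefficient, taken to be 0 when b is out of the range 0..a (an assumed convention)."""
    return comb(a, b) if 0 <= b <= a else 0

def count_by_formula(n: int, k: int) -> int:
    """Sum over the choice made in Step 1 (i = 3, 4 or 5 elements of {1, ..., 5})."""
    return sum(binom(5, i) * binom(n - 5, k - i) for i in (3, 4, 5))

def count_by_brute_force(n: int, k: int) -> int:
    """List all k-element subsets of {1, ..., n} and keep those meeting {1, ..., 5} in at least 3 elements."""
    first_five = {1, 2, 3, 4, 5}
    return sum(1 for S in combinations(range(1, n + 1), k) if len(first_five & set(S)) >= 3)

for n in range(5, 11):
    for k in range(n + 1):
        assert count_by_formula(n, k) == count_by_brute_force(n, k)

print(count_by_formula(10, 4))  # 55
```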

Counting sets by partitioning

Recall that if X is a set, and $X_1, \ldots, X_k$ are subsets of X, we say that $\{X_1, \ldots, X_k\}$ is a partition of X if:

• the $X_i$’s are pairwise disjoint (meaning that $X_i \cap X_j = \emptyset$ for all i ≠ j), and

• $X = \bigcup_{i=1}^{k} X_i$.

Partitioning is often useful for counting. Often, when we want to work out the size of a set X, we cannot apply the multiplication principle to X straight away. But even then, we can sometimes still find a partition of X into disjoint sets $X_1, X_2, \ldots, X_k$ such that we can apply the multiplication principle to each $X_i$ separately. We then have

$$|X| = \sum_{i=1}^{k} |X_i|,$$

enabling us to count X. This is what happens in Example 10.

1.4 The inclusion-exclusion principle

If $A_1, \ldots, A_n$ are finite sets which are all disjoint from one another, it is easy to calculate the size of their union: we simply have

$$|A_1 \cup A_2 \cup \cdots \cup A_n| = |A_1| + |A_2| + \cdots + |A_n|.$$

Often, we want to calculate the size of the union of n sets which are not all disjoint from one another. The inclusion-exclusion formula gives us a way of doing this in terms of intersections.

It is easy to see that if $A_1, A_2$ are subsets of a finite set X, then

$$|A_1 \cup A_2| = |A_1| + |A_2| - |A_1 \cap A_2|.$$

Equivalently, taking complements,

$$|X \setminus (A_1 \cup A_2)| = |X| - |A_1| - |A_2| + |A_1 \cap A_2|.$$

The inclusion-exclusion formula generalises this to n arbitrary subsets.

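Theorem 5 (The inclusion-exclusion formula). Let X be a finite set. Let $A_1, A_2, \ldots, A_n$ be subsets of X. Then

$$\begin{aligned}
\left|\bigcup_{i=1}^{n} A_i\right| &= |A_1 \cup A_2 \cup \cdots \cup A_n| \\
&= |A_1| + |A_2| + \cdots + |A_n| \\
&\quad - (|A_1 \cap A_2| + |A_1 \cap A_3| + \cdots + |A_{n-1} \cap A_n|) \\
&\quad + (|A_1 \cap A_2 \cap A_3| + \cdots + |A_{n-2} \cap A_{n-1} \cap A_n|) \\
&\quad - \cdots + (-1)^{n-1} |A_1 \cap A_2 \cap \cdots \cap A_n| \\
&= \sum_{I \subseteq \{1,2,\ldots,n\} :\, I \neq \emptyset} (-1)^{|I|-1} \left|\bigcap_{i \in I} A_i\right|.
\end{aligned}$$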


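Equivalently, taking complements, we have

$$\begin{aligned}
\left|X \setminus \left(\bigcup_{i=1}^{n} A_i\right)\right| &= |X| - (|A_1| + |A_2| + \cdots + |A_n|) \\
&\quad + (|A_1 \cap A_2| + |A_1 \cap A_3| + \cdots + |A_{n-1} \cap A_n|) \\
&\quad - (|A_1 \cap A_2 \cap A_3| + \cdots + |A_{n-2} \cap A_{n-1} \cap A_n|) \\
&\quad + \cdots + (-1)^{n} |A_1 \cap A_2 \cap \cdots \cap A_n| \\
&= \sum_{I \subseteq \{1,2,\ldots,n\}} (-1)^{|I|} \left|\bigcap_{i \in I} A_i\right|. \qquad (1.3)
\end{aligned}$$

(Note that the term in the above sum where I = ∅ is |X|; by convention, the intersection of no subsets of X is simply X.)

For example, if $A_1, A_2, A_3$ are arbitrary subsets of a finite set X, then the above becomes

$$|A_1 \cup A_2 \cup A_3| = |A_1| + |A_2| + |A_3| - |A_1 \cap A_2| - |A_1 \cap A_3| - |A_2 \cap A_3| + |A_1 \cap A_2 \cap A_3|$$

and

$$|X \setminus (A_1 \cup A_2 \cup A_3)| = |X| - |A_1| - |A_2| - |A_3| + |A_1 \cap A_2| + |A_1 \cap A_3| + |A_2 \cap A_3| - |A_1 \cap A_2 \cap A_3|.$$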
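The first version of the formula translates directly into a short function that sums over all non-empty index sets I, grouped by size. This is only a sketch: the function name is hypothetical, and the test sets (the multiples of 2, 3, 5 and 7 among 1, . . . , 100) are chosen purely for illustration.

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """Size of the union of the given finite sets, computed via the inclusion-exclusion formula."""
    total = 0
    for j in range(1, len(sets) + 1):
        sign = (-1) ** (j - 1)
        # Sum over all index sets I of size j; `chosen` holds the sets A_i with i in I.
        for chosen in combinations(sets, j):
            total += sign * len(set.intersection(*chosen))
    return total

X = range(1, 101)
A = [{x for x in X if x % d == 0} for d in (2, 3, 5, 7)]

# Compare with the size of the union computed directly.
assert union_size_by_inclusion_exclusion(A) == len(set.union(*A))
print(union_size_by_inclusion_exclusion(A))  # 78
```

The function visits all $2^n - 1$ non-empty index sets, so it is only practical for a small number of sets; its purpose here is to make the shape of the formula concrete.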

Proof. (Non-examinable.) Notice the similarity between the above formula for $|X \setminus (\bigcup_{i=1}^{n} A_i)|$, and the equation of polynomials

$$(1 - X_1)(1 - X_2) \cdots (1 - X_n) = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \prod_{i \in I} X_i. \qquad (1.4)$$

For any set S ⊂ X, write $\chi_S$ for its characteristic function, defined by

$$\chi_S : X \to \{0,1\}; \qquad \chi_S(x) = \begin{cases} 1 & \text{if } x \in S; \\ 0 & \text{if } x \notin S. \end{cases}$$

Observe that for any set S ⊂ X, we have

$$|S| = \sum_{x \in X} \chi_S(x).$$

This equation enables us to express the size of a set as a sum of values of a function, which can then be analysed using (1.4).

Let $B = X \setminus (\bigcup_{i=1}^{n} A_i)$. First, observe that

$$\chi_B = \prod_{i=1}^{n} (1 - \chi_{A_i}).$$

Second, observe that

$$\prod_{i=1}^{n} (1 - \chi_{A_i}(x)) = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \prod_{i \in I} \chi_{A_i}(x) \qquad \forall x \in X,$$

by substituting $X_i = \chi_{A_i}(x)$ (which is just some real number) into the equation (1.4). In other words,

$$\prod_{i=1}^{n} (1 - \chi_{A_i}) = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \prod_{i \in I} \chi_{A_i},$$

as real-valued functions on X. Therefore,

$$\chi_B = \prod_{i=1}^{n} (1 - \chi_{A_i}) = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \prod_{i \in I} \chi_{A_i}.$$

(Here, all equalities are between real-valued functions on X.) But note that for any subset I ⊂ {1, 2, . . . , n}, we have

$$\prod_{i \in I} \chi_{A_i} = \chi_{(\bigcap_{i \in I} A_i)}.$$

(This is another useful property of characteristic functions: the characteristic function of an intersection of sets is equal to the product of all their characteristic functions. Check this!) Therefore,

$$\chi_B = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \prod_{i \in I} \chi_{A_i} = \sum_{I \subset \{1,\ldots,n\}} (-1)^{|I|} \chi_{(\bigcap_{i \in I} A_i)}. \qquad (1.5)$$

Hence, summing over all x ∈ X gives

$$|B| = \sum_{I \subseteq \{1,2,\ldots,n\}} (-1)^{|I|} \left|\bigcap_{i \in I} A_i\right|.$$

This proves the second version of the inclusion-exclusion formula. The first version follows from taking complements: we have

$$\left|\bigcup_{i=1}^{n} A_i\right| = |X| - |B| = \sum_{\emptyset \neq I \subseteq \{1,2,\ldots,n\}} (-1)^{|I|-1} \left|\bigcap_{i \in I} A_i\right|.$$


Example 11. How many primes are there between 1 and 100?

Answer: let X = {1, 2, . . . , 100}. Suppose x ∈ X is not prime and x ≠ 1. Then we may write x = yz, where y is prime and y ≤ z. Hence, $y \le \sqrt{x} \le \sqrt{100} = 10$, so y = 2, 3, 5 or 7.

Now let $A_i = \{x \mid 1 \le x \le 100 \text{ and } i \text{ divides } x\}$, for i = 2, 3, 5, 7. Then the set of all primes in X is

$$\left( X \setminus \left( \{1\} \cup \bigcup_{i \in \{2,3,5,7\}} A_i \right) \right) \cup \{2, 3, 5, 7\}.$$

Now we compute the sizes of all the intersections of the $A_i$.

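$$\begin{aligned}
|A_2| &= \lfloor 100/2 \rfloor = 50 \\
|A_3| &= \lfloor 100/3 \rfloor = 33 \\
|A_5| &= \lfloor 100/5 \rfloor = 20 \\
|A_7| &= \lfloor 100/7 \rfloor = 14 \\
|A_2 \cap A_3| &= \lfloor 100/6 \rfloor = 16 \\
|A_2 \cap A_5| &= \lfloor 100/10 \rfloor = 10 \\
|A_2 \cap A_7| &= \lfloor 100/14 \rfloor = 7 \\
|A_3 \cap A_5| &= \lfloor 100/15 \rfloor = 6 \\
|A_3 \cap A_7| &= \lfloor 100/21 \rfloor = 4 \\
|A_5 \cap A_7| &= \lfloor 100/35 \rfloor = 2 \\
|A_2 \cap A_3 \cap A_5| &= \lfloor 100/30 \rfloor = 3 \\
|A_2 \cap A_3 \cap A_7| &= \lfloor 100/42 \rfloor = 2 \\
|A_2 \cap A_5 \cap A_7| &= \lfloor 100/70 \rfloor = 1 \\
|A_3 \cap A_5 \cap A_7| &= \lfloor 100/105 \rfloor = 0 \\
|A_2 \cap A_3 \cap A_5 \cap A_7| &= \lfloor 100/210 \rfloor = 0
\end{aligned}$$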

Here, we are using the fact that if $p_1, \ldots, p_l$ are distinct primes, then

$$A_{p_1} \cap A_{p_2} \cap \cdots \cap A_{p_l} = A_{p_1 p_2 \cdots p_l},$$

which follows from the Fundamental Theorem of Arithmetic.

So, by the inclusion-exclusion formula, we have

$$|A_2 \cup A_3 \cup A_5 \cup A_7| = (50 + 33 + 20 + 14) - (16 + 10 + 7 + 6 + 4 + 2) + (3 + 2 + 1 + 0) - 0 = 117 - 45 + 6 = 78.$$

Hence the number of primes in X is 100 − 78 − 1 + 4 = 25.

The following corollary to the inclusion-exclusion formula is useful when all the $A_i$’s ‘look the same’.

Corollary 6. Let X be a finite set. Suppose that $A_1, A_2, \ldots, A_n$ are subsets of X, and assume that for every j with 1 ≤ j ≤ n and for every I ⊆ {1, 2, . . . , n} with |I| = j we have

$$\left|\bigcap_{i \in I} A_i\right| = a_j.$$

(We also set $a_0 = |X|$, as the intersection of no sets is understood to be X.)

Then

$$\left|\bigcup_{i=1}^{n} A_i\right| = \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} a_j,$$

or equivalently,

$$\left|X \setminus \left(\bigcup_{i=1}^{n} A_i\right)\right| = \sum_{j=0}^{n} (-1)^{j} \binom{n}{j} a_j.$$

Proof. We will prove the second version. If |I| = j, then the contribution from I to the sum in the inclusion-exclusion formula (1.3) is $(-1)^j a_j$. Adding this up over all the sets of size j gives $\binom{n}{j} (-1)^j a_j$. Finally, adding up over all j gives the result. Again, the first version follows by taking complements.

1.5 Counting surjections

Suppose S and T are sets with |S| = k and |T| = n. Recall that the number of functions from S to T is $n^k$. (This follows from Theorem 1 and Remark 2.) We also saw that the number of injections from S to T is n(n − 1) · · · (n − k + 1) (which is 0 if k > n); see Remark 4. But we did not count the number of surjections. Let’s do this now.

We can count the number of surjections using the inclusion-exclusion formula.

Let S be a k-element set and let T be an n-element set. Without loss of generality, we may assume that T = {1, 2, . . . , n}. Let $\mathcal{F}$ denote the set of all functions from S to {1, 2, . . . , n}. Let

$$A_i = \{f \in \mathcal{F} : i \notin f(S)\}$$


denote the set of all functions in $\mathcal{F}$ whose image does not contain i. A surjection is precisely a function which does not lie in any of the sets $A_i$, so the set of surjections is

$$\mathcal{F} \setminus \left(\bigcup_{i=1}^{n} A_i\right).$$

We can calculate the size of this set using the inclusion-exclusion formula. Let I ⊂ {1, 2, . . . , n} with |I| = j. The intersection $C = \bigcap_{i \in I} A_i$ is simply the set of all functions in $\mathcal{F}$ whose image does not contain any i ∈ I. There is an obvious one-to-one correspondence between C and the set of all functions from S to {1, 2, . . . , n} \ I. Therefore, the number of functions in C is $(n - |I|)^k = (n - j)^k$, the same as the number of functions from S to {1, 2, . . . , n} \ I, and therefore

$$\left|\bigcap_{i \in I} A_i\right| = (n - j)^k.$$

This depends only on |I| = j, so we can use the version of the inclusion-exclusion formula in Corollary 6, with $a_j = (n - j)^k$, giving:

$$\text{number of surjections from } S \text{ to } T = \left|\mathcal{F} \setminus \left(\bigcup_{i=1}^{n} A_i\right)\right| = \sum_{j=0}^{n} (-1)^j \binom{n}{j} (n - j)^k.$$

Note that the term with j = n is zero (there are no functions from S to {1, 2, . . . , n} whose image contains none of {1, 2, . . . , n}), so we can rewrite this as

$$\text{number of surjections from } S \text{ to } T = \left|\mathcal{F} \setminus \left(\bigcup_{i=1}^{n} A_i\right)\right| = \sum_{j=0}^{n-1} (-1)^j \binom{n}{j} (n - j)^k.$$

Example 12. How many ways are there to choose 3 teams (an A-team, a B-team and a C-team) from a class of 7 children? (Each team must have at least one child in it, and the names of the teams are important, so swapping the children in the A-team with the children in the B-team produces a different choice.)

Answer: this is simply the number of surjections from the set of children to T = {A, B, C}, the set of team-names. Hence, we simply apply the above formula with k = 7 and n = 3, giving

$$\begin{aligned}
\text{number of ways} &= \sum_{j=0}^{2} (-1)^j \binom{3}{j} (3 - j)^7 \\
&= \binom{3}{0} \cdot 3^7 - \binom{3}{1} \cdot 2^7 + \binom{3}{2} \cdot 1^7 \\
&= 3^7 - 3 \cdot 2^7 + 3 \\
&= 1806.
\end{aligned}$$
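The surjection formula can be compared with a direct count. In this sketch (function names are illustrative), a function from a k-element set to an n-element set is represented as a length-k sequence of values, as in Remark 2, and it is a surjection exactly when all n values occur.

```python
from itertools import product
from math import comb

def surjections_by_formula(k: int, n: int) -> int:
    """Number of surjections from a k-element set to an n-element set, via inclusion-exclusion."""
    return sum((-1) ** j * comb(n, j) * (n - j) ** k for j in range(n + 1))

def surjections_by_brute_force(k: int, n: int) -> int:
    """List all n^k functions (as sequences of values) and keep those whose image is everything."""
    return sum(1 for f in product(range(n), repeat=k) if len(set(f)) == n)

# Example 12: 7 children, 3 named teams.
assert surjections_by_formula(7, 3) == surjections_by_brute_force(7, 3)
print(surjections_by_formula(7, 3))  # 1806
```

When k = n the count should equal n!, since a surjection between finite sets of the same size is a bijection; for instance `surjections_by_formula(4, 4)` gives 24.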


1.6 Permutations and derangements

Recall that a permutation of a set X is defined to be a bijection from X to itself. We saw before that if X is an n-element set, then the number of permutations of X is

$$n! := n(n-1) \cdots (2)(1).$$

When studying permutations, the names of the elements of the set X do not matter, so from now on, we will work with permutations of the set {1, 2, . . . , n}. We write $S_n$ for the set of all permutations of {1, 2, . . . , n}. If $f \in S_n$, we can write f as a 2 × n matrix, as follows:

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ f(1) & f(2) & \cdots & f(n) \end{pmatrix}.$$

This is known as the two-line notation for permutations. For example, when n = 3,

$$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$$

represents the permutation which sends 1 to 3, 2 to 1, and 3 to 2.

Remark 6. The set $S_n$ is a group under the binary operation of composition of permutations. (Check that it satisfies the group axioms: closure, associativity, identity and inverses.) We call it the symmetric group on {1, 2, . . . , n}.

We can also write a permutation in disjoint cycle notation. We do this by example. Consider the permutation

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 1 & 6 & 2 & 5 & 3 \end{pmatrix} \in S_6.$$


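We can represent $f$ diagrammatically as follows. Place 6 points in the plane, label them with the numbers 1 to 6, and draw an arrow from $i$ to $f(i)$ for each $i \in \{1, 2, 3, 4, 5, 6\}$:

[Figure: arrow diagram of $f$ on the points 1 to 6, showing the cycles 1 → 4 → 2 → 1, 3 → 6 → 3 and 5 → 5.]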

This produces a set of disjoint cycles, in which each of the numbers occurs exactly once. We now list these cycles in any order. Choose one of the cycles (say the top one), and write it out as a sequence, starting at any point (1 say):

(1 4 2).

Now choose any of the other cycles (say the second one), and write it out as a sequence, starting at any point (3 say):

(1 4 2)(3 6).

Do the same with the last cycle:

(1 4 2)(3 6)(5).

This is a disjoint cycle representation for this permutation. Each cycle is a list of iterates of the permutation: $f$ sends 1 to 4, 4 to 2, and 2 to 1, it sends 3 to 6 and 6 to 3, and it sends 5 to itself.

Fact. It is easy to see that for any permutation, this process always produces a list of disjoint cycles, in which each number in $\{1, 2, \ldots, n\}$ occurs exactly once.

Exercise 4. Check this fact!

The disjoint cycle representation of a permutation is not unique, as we can choose what order we list the cycles in, and we can choose where to start each cycle. (These are the only choices we have, however.)

As an example, we could have represented the permutation above as

(6 3)(5)(4 2 1),


if we had first chosen to start with 6, and then with 5, and then with 4.

The cycles of length 1 in a disjoint cycle representation are the fixed points of the permutation. When $n$ is given beforehand, some authors abbreviate the disjoint cycle notation by leaving out the cycles of length 1 from the disjoint cycle representation. So the disjoint cycle representation above would be abbreviated to:

(1 4 2)(3 6).

However, in this course, you should write out the cycles of length 1 as well, for clarity.

To compute a disjoint cycle representation without drawing the picture above, we start by writing down any number, say 1, and then we write down the iterates of the permutation until we get back to 1 again. We get the cycle:

(1 4 2).

If we have written down all the numbers, we stop. Otherwise (as in this case), we pick another number, say 3, and repeat the process. We now have two cycles:

(1 4 2)(3 6).

We repeat this process until we have written down all the numbers. We end up with a list of cycles, in this case

(1 4 2)(3 6)(5).
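The procedure just described can be sketched in Python as follows. The function name `disjoint_cycles` and the convention of passing the bottom row of the two-line notation as a list are assumptions made for this illustration. Each new cycle is started at the smallest number not yet written down, which is one of the allowed choices.

```python
def disjoint_cycles(image):
    """Return the cycles of a permutation f of {1, ..., n}.

    `image` is the bottom row of the two-line notation, so image[i - 1] = f(i).
    Each cycle starts at the smallest number not yet written down.
    """
    n = len(image)
    seen = set()
    cycles = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle = []
        i = start
        while i not in seen:  # follow the iterates until we return to start
            seen.add(i)
            cycle.append(i)
            i = image[i - 1]
        cycles.append(tuple(cycle))
    return cycles


f = [4, 1, 6, 2, 5, 3]  # the permutation f in S_6 used above
print(disjoint_cycles(f))
```

For the permutation $f$ above, this should reproduce the cycles (1 4 2), (3 6) and (5).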

How do we find the number of permutations with cycles of given lengths? Let’s start with a simple example.

Example 13. How many permutations of $\{1, 2, 3, 4, 5, 6\}$ are cycles of length 6 (‘6-cycles’)?

Answer: we can produce a 6-cycle (in disjoint cycle notation) by first choosing an ordering of the set $\{1, 2, 3, 4, 5, 6\}$ (this produces an ordered sequence containing each element of $\{1, 2, 3, 4, 5, 6\}$ exactly once), e.g.

(3, 2, 1, 6, 4, 5)

and then turning it into a cycle:

(3 2 1 6 4 5).

There are 6! choices for the sequence (the same number as the number of permutations of $\{1, 2, 3, 4, 5, 6\}$!), but notice that the permutations

(3 2 1 6 4 5),

(2 1 6 4 5 3),

(1 6 4 5 3 2),

(6 4 5 3 2 1),

(4 5 3 2 1 6),

(5 3 2 1 6 4)


are all the same: 6 different sequences produce the same permutation. (The cycle is the same, whichever number you choose to write at the start.) In general, there are exactly 6 different ways of writing a cycle of length 6 (you just have to choose which number to write at the start), so the process above produces each 6-cycle exactly 6 times. Therefore,

$$6 \times (\text{number of 6-cycles in } S_6) = 6!,$$

so

$$\text{number of 6-cycles in } S_6 = 6!/6 = 5! = 120.$$

Exercise 5. In exactly the same way, show that the number of permutations in $S_n$ which are $n$-cycles is $(n-1)!$.

Now let’s do a slightly harder example.

Example 14. How many permutations of $\{1, 2, 3, 4, 5, 6\}$ are there with two cycles of length 3?

We can produce these permutations by first choosing an ordering of the set $\{1, 2, 3, 4, 5, 6\}$, e.g.

(3, 2, 5, 4, 1, 6)

and then turning it into a permutation by bracketing the first three numbers together, and then bracketing the last three numbers together, so the above example produces

(3 2 5)(4 1 6).

As always, there are 6! choices for the ordering, but how many times do we produce each permutation? The answer is, each permutation is produced exactly

$$3 \times 3 \times 2! = 18$$

times. Why? There are 3 choices for where to start the first 3-cycle, 3 choices for where to start the second 3-cycle, and 2! choices for the order of the two 3-cycles. So

$$(\text{number of the above permutations}) \times 3 \times 3 \times 2 = 6!,$$

and therefore

$$\text{number of the above permutations} = \frac{6!}{3 \times 3 \times 2} = 40.$$
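Since $S_6$ has only 720 elements, the counts in Examples 13 and 14 can be confirmed by brute force. The sketch below is an illustration only; the helper name `cycle_type` is an assumption. It tallies every permutation of $\{1, \ldots, 6\}$ by its sorted list of cycle lengths.

```python
from collections import Counter
from itertools import permutations


def cycle_type(image):
    """Sorted tuple of cycle lengths of the permutation i -> image[i - 1]."""
    seen = set()
    lengths = []
    for start in range(1, len(image) + 1):
        length = 0
        i = start
        while i not in seen:
            seen.add(i)
            length += 1
            i = image[i - 1]
        if length:  # length is 0 when `start` was already in an earlier cycle
            lengths.append(length)
    return tuple(sorted(lengths))


# Tally all 6! = 720 permutations of {1, ..., 6} by cycle type.
tally = Counter(cycle_type(p) for p in permutations(range(1, 7)))
print(tally[(6,)])    # number of 6-cycles (Example 13)
print(tally[(3, 3)])  # number of permutations with two 3-cycles (Example 14)
```

The same tally gives the count for every other cycle type in $S_6$, which is a convenient way to check answers to similar exercises.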

Permutations with no fixed points are called derangements, and are quite useful in various parts of mathematics and computer science. It is easy to read off the number of fixed points of a permutation $f$ from a disjoint cycle representation of $f$: it is just the number of cycles of length 1.

Consider the following puzzle.


Example 15. There are 100 guests at a party. When they arrive, they all turn out to have identical coats. They put them in a big pile all together, and the coats get thoroughly mixed up. When the guests leave, one by one, each is equally likely to pick up any of the remaining coats. What is the probability that no-one leaves with their own coat?

If we label the guests with the numbers from 1 to $n = 100$, and we (invisibly) label each person’s coat with the same number, then the matching between guests and coats after the party defines a permutation $f$ of $\{1, 2, \ldots, n\}$: if person $i$ takes home coat $j$, we define $f(i) = j$. The condition that each guest is equally likely to pick up any of the remaining coats means that $f$ is equally likely to be any of the $n!$ permutations of $\{1, 2, \ldots, n\}$. (Think about this for a moment, to see why.)

Notice that person $i$ leaves with their own coat if and only if the permutation $f$ fixes $i$. So the probability that no-one leaves with their own coat is simply

$$\frac{\text{number of derangements of } \{1, 2, \ldots, 100\}}{\text{total number of permutations of } \{1, 2, \ldots, 100\}}.$$

Writing down all the permutations in $S_{100}$ in disjoint cycle notation, and counting how many have no fixed points, would take far too much paper: there are more than $9 \times 10^{157}$ permutations in $S_{100}$, and there are only around $10^{87}$ particles in the universe!

In order to answer this question, we want to find a way of counting (or estimating) the number of permutations of $\{1, 2, \ldots, n\}$ which are derangements, for general $n$.

In fact, we can use the inclusion-exclusion formula. If we let $X$ be $S_n$, the set of all permutations of $\{1, 2, \ldots, n\}$, and we let $A_i$ be the set of all permutations of $\{1, 2, \ldots, n\}$ which have $i$ as a fixed point, then the set of all derangements of $\{1, 2, \ldots, n\}$ is

$$X \setminus \left( \bigcup_{i=1}^{n} A_i \right).$$

We want a formula for the size of this set. To apply inclusion-exclusion, we must calculate

$$\left| \bigcap_{i \in I} A_i \right|$$

for each subset $I \subset \{1, 2, \ldots, n\}$. The set $\bigcap_{i \in I} A_i$ is simply the set of permutations of $\{1, 2, \ldots, n\}$ which fix every number in the set $I$. Therefore, it is in one-to-one correspondence with the set of all permutations of $\{1, 2, \ldots, n\} \setminus I$, so it has size $(n - |I|)!$. Since this just depends on $|I|$, we can again apply Corollary 6, this time with $a_j = (n-j)!$, to get:

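$$\begin{aligned}
\text{number of derangements of } \{1, 2, \ldots, n\} &= \left| X \setminus \left( \bigcup_{i=1}^{n} A_i \right) \right| \\
&= \sum_{j=0}^{n} (-1)^j \binom{n}{j} (n-j)! \\
&= \sum_{j=0}^{n} (-1)^j \frac{n!}{(n-j)!\, j!} (n-j)! \\
&= \sum_{j=0}^{n} (-1)^j \frac{n!}{j!} \\
&= n! \sum_{j=0}^{n} \frac{(-1)^j}{j!}.
\end{aligned}$$

Therefore,

$$\frac{\text{number of derangements of } \{1, 2, \ldots, n\}}{\text{total number of permutations of } \{1, 2, \ldots, n\}} = \frac{n! \sum_{j=0}^{n} \frac{(-1)^j}{j!}}{n!} = \sum_{j=0}^{n} \frac{(-1)^j}{j!}.$$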

Now does this series remind you of anything? What would we get if we took the sum to infinity? We would get

$$\sum_{j=0}^{\infty} \frac{(-1)^j}{j!},$$

which is one of the expansions of $1/e$. This suggests that the number of permutations of $\{1, 2, \ldots, n\}$ which are derangements is approximately $n!/e$. But how good is this approximation?

The error in this approximation is

$$\begin{aligned}
\left| \frac{n!}{e} - n! \sum_{j=0}^{n} \frac{(-1)^j}{j!} \right| &= \left| n! \sum_{j=0}^{\infty} \frac{(-1)^j}{j!} - n! \sum_{j=0}^{n} \frac{(-1)^j}{j!} \right| \\
&= \left| \sum_{j=n+1}^{\infty} (-1)^j \frac{n!}{j!} \right| \\
&= \left| \sum_{j=n+1}^{\infty} \frac{(-1)^j}{j(j-1) \cdots (n+1)} \right|.
\end{aligned}$$

Since

$$\sum_{j=n+1}^{\infty} \frac{(-1)^j}{j(j-1) \cdots (n+1)}$$

is a sum of terms of alternating signs whose absolute values decrease to zero, it must converge, and its absolute value is at most the absolute value of the first term. (This rule is known as the alternating series test.) Hence,

$$\left| \sum_{j=n+1}^{\infty} \frac{(-1)^j}{j(j-1) \cdots (n+1)} \right| \le \frac{1}{n+1} < \frac{1}{2} \quad \text{for } n \ge 2.$$

It follows that the number of permutations of $\{1, \ldots, n\}$ which are derangements is $n!/e$ to the nearest integer, the best possible approximation we could hope for! (The calculation above only works for $n \ge 2$, but it’s easy to see that the statement holds for $n = 1$ as well.) So we see that, with an astonishingly high degree of accuracy, the proportion of permutations of $\{1, 2, \ldots, n\}$ which are derangements is $1/e$, and

$$\text{Probability[None of the 100 guests leaves with their own coat]} = \frac{[100!/e]}{100!}.$$

(Here, if $x$ is a real number, $[x]$ denotes the integer nearest to $x$, rounded down if $x$ is of the form $m + 1/2$ for some integer $m$.)
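As a numerical sanity check, the sketch below compares three quantities for small $n$: the inclusion-exclusion formula for the number of derangements, a direct enumeration, and $n!/e$ rounded to the nearest integer. The function names are illustrative assumptions. The formula is evaluated in exact integer arithmetic; the floating-point rounding is only used for small $n$, where it is reliable.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial


def derangements(n: int) -> int:
    """n! * sum_{j=0}^{n} (-1)^j / j!, computed exactly with integers."""
    return sum((-1) ** j * (factorial(n) // factorial(j)) for j in range(n + 1))


def derangements_brute_force(n: int) -> int:
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))


for n in range(1, 9):
    print(n, derangements(n), derangements_brute_force(n), round(factorial(n) / e))

# Exact probability for the 100 guests, compared with 1/e.
p100 = Fraction(derangements(100), factorial(100))
print(float(p100), 1 / e)
```

The three columns should agree for every $n$ shown, and the probability for 100 guests should be indistinguishable from $1/e$ at double precision.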

Some group-theoretic aspects of permutations

For studying more group-theoretic aspects of permutations, disjoint cycle notation is very useful. In this section, when $n$ is understood, we will sometimes abbreviate the disjoint cycle notation for permutations in $S_n$, by missing out the fixed points (the cycles of length 1). For example, if

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 5 & 4 & 1 \end{pmatrix},$$

then we abbreviate the disjoint cycle notation

$f = (1\ 3\ 5)(2)(4) \in S_5$

to

f = (1 3 5).

The identity permutation (which consists only of 1-cycles) is often abbreviated to Id.

Recall that the set $S_n$ of all permutations of $\{1, 2, \ldots, n\}$ is a group under the multiplication operation of composition of permutations. If $f, g \in S_n$, the composition $gf$ is defined as follows:

$$(gf)(i) = g(f(i)) \quad (i \in \{1, 2, \ldots, n\}).$$

(We compose starting from the right: first do $f$, then do $g$.) It is easy to check that $gf$ is a permutation.

Observe that if $f \in S_n$ has disjoint cycle notation

$$f = C_l C_{l-1} \cdots C_2 C_1,$$

where $C_1, \ldots, C_l$ are cycles, then $f$ is simply the product (composition) of cyclic permutations

$$f = c_l c_{l-1} \cdots c_2 c_1,$$

where $c_i$ is the cyclic permutation which has $C_i$ as a cycle, and which fixes all the numbers not in $C_i$. So we can view the disjoint cycle notation as an expression of a permutation as a composition of cyclic permutations. For example, if

$f = (1\ 2\ 3)(4\ 5)(6) \in S_6$

in disjoint cycle notation, then

$$f = c_3 c_2 c_1,$$

where

$c_1 = (6) = \mathrm{Id} = (1)(2)(3)(4)(5)(6),$

$c_2 = (4\ 5) = (4\ 5)(1)(2)(3)(6),$

$c_3 = (1\ 2\ 3) = (1\ 2\ 3)(4)(5)(6).$

It is easy to multiply permutations which are written in disjoint cycle notation. We will do this by example. Suppose we wish to find $gf$, where

f = (1 3 5)(2 4), g = (1 3 2 5)(4).

Then

gf = [(1 3 2 5)(4)][(1 3 5)(2 4)].

Since multiplication is associative, and since we can think of $f$ and $g$ as compositions of the cycles in their disjoint cycle notation, this can be viewed as a product of 4 cyclic permutations:

gf = (1 3 2 5)(4)(1 3 5)(2 4);

the four permutations are

(2 4),

(1 3 5),

(4) = Id,

(1 3 2 5),


written here in abbreviated disjoint cycle notation (i.e., missing out the fixed points). Since multiplying by the identity permutation has no effect, we can shorten the product above by missing out the identity permutation:

gf = (1 3 2 5)(1 3 5)(2 4).

To obtain a disjoint cycle representation of $gf$, we begin by choosing any number (say 1), to start the first cycle of $gf$:

gf = (1 . . .

What is $(gf)(1)$? Remember that we compose permutations from the right. The cycle (2 4) takes 1 to 1 (it leaves 1 fixed), then the cycle (1 3 5) takes 1 to 3, and finally the cycle (1 3 2 5) takes 3 to 2. So $(gf)(1) = 2$, and so we write 2 next:

gf = (1 2 . . .

What is $(gf)(2)$? The cycle (2 4) takes 2 to 4, the cycle (1 3 5) takes 4 to 4, and finally the cycle (1 3 2 5) takes 4 to 4. So $(gf)(2) = 4$. Continuing in this way, we find that $(gf)(4) = 5$, $(gf)(5) = 3$ and $(gf)(3) = 1$, so we obtain:

gf = (1 2 4 5 3).
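The product just computed, with $f = (1\ 3\ 5)(2\ 4)$ and $g = (1\ 3\ 2\ 5)(4)$, can be reproduced mechanically. The sketch below is illustrative; the helper names `from_cycles`, `compose` and `to_cycles` are assumptions, and permutations are stored as Python dictionaries mapping $i$ to $f(i)$.

```python
def from_cycles(cycles, n):
    """Build the map i -> f(i) on {1, ..., n} from a list of disjoint cycles."""
    f = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        # Each entry of the cycle is sent to the next one, the last to the first.
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            f[a] = b
    return f


def compose(g, f):
    """Return gf: first apply f, then apply g."""
    return {i: g[f[i]] for i in f}


def to_cycles(f):
    """Full disjoint cycle representation, each cycle started at its smallest unseen entry."""
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = f[i]
        cycles.append(tuple(cycle))
    return cycles


f = from_cycles([(1, 3, 5), (2, 4)], 5)
g = from_cycles([(1, 3, 2, 5), (4,)], 5)
print(to_cycles(compose(g, f)))
```

Swapping the arguments of `compose` gives $fg$ instead, which is a quick way to see that composition of permutations is not commutative in general.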

Transpositions

Transpositions are the simplest type of permutation: a transposition just swaps two elements, and leaves all the rest fixed. So, in disjoint cycle notation, a transposition in $S_n$ consists of one 2-cycle, and $(n-2)$ 1-cycles. In abbreviated disjoint cycle notation, the transpositions in $S_n$ are

$$\{(i\ j) : 1 \le i < j \le n\};$$

there are $\binom{n}{2}$ of them.

We have the following

Fact. The transpositions generate $S_n$ as a group, meaning that any permutation in $S_n$ is a product (composition) of a finite number of transpositions.

Since any permutation is a composition of the cycles in its disjoint cycle notation, to prove the above fact, it is enough to prove that any cycle is a composition (product) of a finite number of transpositions. In fact, a $k$-cycle is a product of $k-1$ transpositions: observe that

$$(1\ 2\ 3\ \ldots\ k) = (1\ 2)(2\ 3)(3\ 4) \cdots (k-1\ \ k),$$

using abbreviated disjoint cycle notation.

Using this fact, it is easy to see that any permutation $f \in S_n$ is a product of at most $n-1$ transpositions.

Exercise 6. Show that if $f \in S_n$ is an $n$-cycle, then at least $n-1$ transpositions are needed to express $f$ as a product of transpositions. Show that if $f \in S_n$ is not an $n$-cycle, $f$ can be expressed as a product of at most $n-2$ transpositions.
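The identity expressing a $k$-cycle as a product of $k-1$ transpositions can be checked directly, remembering that products are composed from the right. In this sketch the names `cycle_as_transpositions` and `apply_product` are illustrative assumptions.

```python
def cycle_as_transpositions(cycle):
    """Write (a1 a2 ... ak) as the product (a1 a2)(a2 a3)...(a_{k-1} ak)."""
    return list(zip(cycle, cycle[1:]))


def apply_product(transpositions, x):
    """Apply a product of transpositions to x, rightmost factor first."""
    for a, b in reversed(transpositions):
        if x == a:
            x = b
        elif x == b:
            x = a
    return x


cycle = (1, 2, 3, 4, 5)
factors = cycle_as_transpositions(cycle)
print(factors)                                      # k - 1 = 4 transpositions
print([apply_product(factors, x) for x in cycle])   # images of 1, 2, 3, 4, 5
```

The second line of output should list the images of 1, ..., 5 under the 5-cycle, namely each number moved on by one place with 5 sent back to 1.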


The sign of a permutation

We can express the same permutation as products of different numbers of transpositions. For example, if $f = (1\ 2\ 3) \in S_3$, then we have

f = (1 2)(2 3) = (2 3)(1 2)(1 3)(2 3).

However, we have the following theorem.

Theorem 7. Let $f \in S_n$. Whenever we write $f$ as a product of transpositions,

$$f = t_1 t_2 \cdots t_l,$$

the number of transpositions $l$ always has the same parity: it is either always even or always odd.

Proof 1. If $f \in S_n$, we define $c(f)$ to be the number of disjoint cycles in a full disjoint cycle representation of $f$. (Full means that we include cycles of length 1.) For example, if $f = (1\ 2)(3)(4) \in S_4$, then $c(f) = 3$. We now define

$$\varepsilon(f) = (-1)^{n - c(f)}.$$

Our aim is to show that $\varepsilon(f) = 1$ if $f$ is a product of an even number of transpositions, and $\varepsilon(f) = -1$ if $f$ is a product of an odd number of transpositions.

Observe that if $f$ is a transposition, then $\varepsilon(f) = -1$, since $f$ has $n-1$ cycles in its full disjoint cycle representation. I now make the following

Claim: For any permutation $f \in S_n$ and any transposition $(p\ q)$,

$$\varepsilon(f(p\ q)) = -\varepsilon(f).$$

Proof of claim: Let $f \in S_n$, and let

$$f = C_1 C_2 \cdots C_l$$

be a (full) disjoint cycle representation of $f$. We have two cases to deal with:

Case (i): $p$ and $q$ are in different cycles of $f$.

Case (ii): $p$ and $q$ are in the same cycle of $f$.

First, suppose we are in case (i): $p$ and $q$ are in different cycles of $f$. Notice that we can reorder disjoint cycles in a disjoint cycle representation, without changing the permutation. Therefore, we may assume that $p$ is in $C_{l-1}$ and $q$ is in $C_l$. Suppose $C_{l-1} = (p\ x_1\ x_2\ \ldots\ x_M)$ and $C_l = (q\ y_1\ y_2\ \ldots\ y_N)$, so that

$$f = C_1 C_2 \cdots C_{l-2}\, (p\ x_1\ x_2\ \ldots\ x_M)(q\ y_1\ y_2\ \ldots\ y_N).$$

We can now write down a disjoint cycle representation for $f(p\ q)$:

$$\begin{aligned}
f(p\ q) &= C_1 C_2 \cdots C_{l-2}\, (p\ x_1\ x_2\ \ldots\ x_M)(q\ y_1\ y_2\ \ldots\ y_N)(p\ q) \\
&= C_1 C_2 \cdots C_{l-2}\, (p\ y_1\ y_2\ \ldots\ y_N\ q\ x_1\ x_2\ \ldots\ x_M).
\end{aligned}$$

Therefore, $f(p\ q)$ has one fewer cycle than $f$ in its (full) disjoint cycle representation, so $\varepsilon(f(p\ q)) = -\varepsilon(f)$.

Now suppose we are in case (ii): $p$ and $q$ are in the same cycle of $f$. Since we can reorder disjoint cycles in a disjoint cycle representation, without changing the permutation, we may assume that $p$ and $q$ are both in the cycle $C_l$. Suppose $C_l = (p\ x_1\ x_2\ \ldots\ x_M\ q\ y_1\ y_2\ \ldots\ y_N)$, so that

$$f = C_1 C_2 \cdots C_{l-1}\, (p\ x_1\ x_2\ \ldots\ x_M\ q\ y_1\ y_2\ \ldots\ y_N).$$

We can now write down a disjoint cycle representation for $f(p\ q)$:

$$\begin{aligned}
f(p\ q) &= C_1 C_2 \cdots C_{l-1}\, (p\ x_1\ x_2\ \ldots\ x_M\ q\ y_1\ y_2\ \ldots\ y_N)(p\ q) \\
&= C_1 C_2 \cdots C_{l-1}\, (p\ y_1\ y_2\ \ldots\ y_N)(q\ x_1\ x_2\ \ldots\ x_M).
\end{aligned}$$

Therefore, $f(p\ q)$ has one more cycle than $f$ in its (full) disjoint cycle representation, so again, $\varepsilon(f(p\ q)) = -\varepsilon(f)$. This proves the claim.

It now follows (by induction on $r$) that if $f \in S_n$, and $t_1, \ldots, t_r$ are transpositions, then

$$\varepsilon(f t_1 t_2 \cdots t_r) = (-1)^r \varepsilon(f).$$

Therefore, taking $f = \mathrm{Id}$,

$$\varepsilon(t_1 t_2 \cdots t_r) = (-1)^r \varepsilon(\mathrm{Id}) = (-1)^r.$$

Hence, if $g$ is a product of an even number of transpositions, then $\varepsilon(g) = 1$; if $g$ is a product of an odd number of transpositions, then $\varepsilon(g) = -1$. It follows that no permutation can be written as a product of an even number of transpositions and also as a product of an odd number of transpositions. This proves the theorem.

Proof 2 (non-examinable). We now give a more algebraic proof. If $P(X_1, \ldots, X_n)$ is any polynomial in $X_1, \ldots, X_n$ with real coefficients, and $f \in S_n$, we define

$$f(P) = P(X_{f(1)}, X_{f(2)}, \ldots, X_{f(n)}).$$

In other words, to produce $f(P)$, we just take the polynomial $P$ and replace $X_i$ with $X_{f(i)}$, for each $i$. This defines an action of $S_n$ on the set of all real polynomials in $X_1, \ldots, X_n$. In particular, for any two permutations $f, g \in S_n$, we have

$$(gf)(P) = g(f(P)). \tag{1.6}$$

(Obviously, replacing $i$ with $f(i)$ and then replacing $f(i)$ with $g(f(i))$, for each $i$, is the same as replacing $i$ with $(gf)(i)$, for each $i$.)

Now let $\Delta$ be the polynomial

$$\Delta = \prod_{1 \le i < j \le n} (X_i - X_j).$$

So, if $f \in S_n$, the polynomial $f(\Delta)$ is defined by

$$f(\Delta) = \prod_{1 \le i < j \le n} (X_{f(i)} - X_{f(j)}).$$

Observe that $f(\Delta) = \pm\Delta$. This is because, if we write out the pairs

$$\{(f(i), f(j)) : 1 \le i < j \le n\},$$

then for each $(a, b)$ with $1 \le a < b \le n$, exactly one of $(a, b)$ and $(b, a)$ appears. When $(a, b)$ appears, we get a factor of $(X_a - X_b)$ in $f(\Delta)$, just as we do in $\Delta$. When $(b, a)$ appears, we get a factor of $(X_b - X_a) = -(X_a - X_b)$ in $f(\Delta)$, instead of a factor of $X_a - X_b$ (a sign-change). So

$$f(\Delta) = (-1)^{\text{number of sign changes}} \Delta = \pm\Delta.$$

If $f(\Delta) = \Delta$, we say that $\mathrm{sign}(f) = 1$; if $f(\Delta) = -\Delta$, we say that $\mathrm{sign}(f) = -1$.

Now observe that if $f$ is a transposition, $f(\Delta) = -\Delta$. To see this, suppose that $f = (p\ q)$, with $p < q$, and consider the factor $X_{f(i)} - X_{f(j)}$ for each pair $i < j$. If neither $i$ nor $j$ is equal to $p$ or $q$, then $X_{f(i)} - X_{f(j)} = X_i - X_j$ (no sign change). If $i = p$ and $j \ne q$, then $X_{f(p)} - X_{f(j)} = X_q - X_j$ (sign change if and only if $p < j < q$). If $i \ne p$ and $j = q$, then $X_{f(i)} - X_{f(q)} = X_i - X_p$ (sign change if and only if $p < i < q$). If $i = q$ or $j = p$, there is no sign change. Finally, if $i = p$ and $j = q$, then $X_{f(p)} - X_{f(q)} = X_q - X_p$ (a sign change). The total number of sign changes is $2(q - p - 1) + 1$, which is odd, so $f(\Delta) = -\Delta$, as claimed.

The same argument shows that, if $(p\ q)$ is any transposition, and $g \in S_n$, then

$$(p\ q)(g(\Delta)) = -g(\Delta).$$

Therefore, using (1.6),

$$((p\ q)g)(\Delta) = (p\ q)(g(\Delta)) = -g(\Delta).$$

Induction on $l$ now shows that for any $f \in S_n$, if $f$ is a product of $l$ transpositions, then

$$f(\Delta) = (-1)^l \Delta,$$

so

$$\mathrm{sign}(f) = (-1)^l.$$

So if $\mathrm{sign}(f) = 1$, then $l$ is always even; if $\mathrm{sign}(f) = -1$, then $l$ is always odd. This proves the theorem.


Definition. If $f \in S_n$ has $\mathrm{sign}(f) = 1$ (meaning that $f$ is a product of an even number of transpositions), we call it an even permutation; if it has $\mathrm{sign}(f) = -1$ (meaning that $f$ is a product of an odd number of transpositions), we call it an odd permutation.

Remark 7. The two proofs of Theorem 7 show that $\varepsilon(f) = \mathrm{sign}(f)$.

It follows immediately from Theorem 7 that sign is a group homomorphism from $S_n$ to the cyclic group $(\{\pm 1\}, \times)$, meaning that

$$\mathrm{sign}(fg) = \mathrm{sign}(f)\, \mathrm{sign}(g) \quad \forall f, g \in S_n.$$

Therefore, $\mathrm{Kernel}(\mathrm{sign})$ (which is the set of all even permutations in $S_n$) is a normal subgroup of $S_n$. It is called the alternating group of degree $n$, written $A_n$. By the Isomorphism Theorem, we have

$$S_n / \mathrm{Kernel}(\mathrm{sign}) \cong \mathrm{Image}(\mathrm{sign}) = \{\pm 1\},$$

and therefore

$$\frac{|S_n|}{|\mathrm{Kernel}(\mathrm{sign})|} = |\{\pm 1\}| = 2.$$

So $|A_n| = n!/2$. In Exercise Sheet 3, question 9, you are asked to give a bijection between the set of even permutations and the set of odd permutations in $S_n$; this gives another proof that $|A_n| = n!/2$.
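The two descriptions of the sign can be compared by brute force on a small symmetric group. The sketch below is illustrative only: the function names are assumptions, and for convenience it works with permutations of $\{0, 1, \ldots, n-1\}$ rather than $\{1, \ldots, n\}$. It computes $\varepsilon(f) = (-1)^{n - c(f)}$ from the cycle count, computes the sign from the number of factors of $\Delta$ that change sign (pairs $i < j$ with $f(i) > f(j)$), and counts the even permutations.

```python
from itertools import permutations
from math import factorial


def num_cycles(p):
    """c(f): number of cycles, including 1-cycles, of the map i -> p[i]."""
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = p[i]
    return count


def epsilon(p):
    """(-1)^(n - c(f)), as in Proof 1."""
    return (-1) ** (len(p) - num_cycles(p))


def sign_via_delta(p):
    """One factor of -1 for each pair i < j with f(i) > f(j), as in Proof 2."""
    n = len(p)
    flips = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return (-1) ** flips


n = 5
perms = list(permutations(range(n)))
print(all(epsilon(p) == sign_via_delta(p) for p in perms))
print(sum(1 for p in perms if epsilon(p) == 1), factorial(n) // 2)
```

The first line of output checks Remark 7 on $S_5$, and the second compares the number of even permutations with $n!/2$.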

Chapter 2

Recurrence relations & generating series

2.1 Introduction

In combinatorics, we often want to find the solution to a sequence of counting problems. For example, we calculated the number of permutations of an $n$-element set, for each positive integer $n$. This leads us to study sequences,

$$s_1, s_2, \ldots$$

where $s_n$, the $n$th element of the sequence, is the number of objects of a certain kind, which are at ‘level’ $n$. We call this a combinatorial sequence.

Often, we can express the $n$th element of a combinatorial sequence in terms of earlier elements, for each $n$. For example, if $s_n$ is the number of orderings of $\{1, 2, \ldots, n\}$ (i.e., the number of permutations of $\{1, 2, \ldots, n\}$), then $s_n = n s_{n-1}$. This is called a recurrence relation. Recurrence relations are very useful, both for calculating early values in a sequence, and for proving general formulae.

Let’s see one of the most famous early examples of a recurrence relation.

Example 16. Leonardo Fibonacci was an Italian mathematician of the 13th century. His most important work was the introduction of the Arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to Europe. In order to show how much easier it is to calculate with these than with the Roman numerals previously used, he posed the following problem as an exercise in his book Liber Abaci (The Book of Calculation):

‘A pair of rabbits do not breed in their first month of life, but at the end of the second and every subsequent month they produce one pair of offspring (one male, and one female). If I acquire a new-born pair of rabbits at the beginning of the year, how many pairs of rabbits will I have at the end of the year?’

Answer. Under these conditions, the number of pairs of rabbits after $n$ months is called the $n$th Fibonacci number, $F_n$. How do we calculate these numbers?


We have $F_0 = 1$, since there is one pair of rabbits after 0 months, and $F_1 = 1$, since no breeding takes place in the first month. For each $n \ge 2$, we have $F_n = F_{n-1} + F_{n-2}$, since there are $F_{n-1}$ pairs of rabbits at the beginning of the $n$th month, and only the $F_{n-2}$ pairs born before the $(n-1)$th month are old enough to breed at the end of the $n$th month; each such pair produces exactly one new pair.

Hence, it takes only 11 additions to calculate the number of rabbits after 12 months:

$$\begin{aligned}
F_0 &= 1 \\
F_1 &= 1 \\
F_2 &= F_0 + F_1 = 2 \\
F_3 &= F_1 + F_2 = 3 \\
F_4 &= 5 \\
F_5 &= 8 \\
F_6 &= 13 \\
F_7 &= 21 \\
F_8 &= 34 \\
F_9 &= 55 \\
F_{10} &= 89 \\
F_{11} &= 144 \\
F_{12} &= 233.
\end{aligned}$$

So there are $2 \times 233 = 466$ rabbits after 12 months. This was easy using the new Arabic numerals, but not so easy using Roman numerals (try it!).
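The table above can be regenerated with a few lines of Python. This is a minimal sketch; the function name `fibonacci_pairs` is an illustrative assumption, and it uses the indexing of this example, in which $F_0 = F_1 = 1$ (many other sources start from $F_0 = 0$).

```python
def fibonacci_pairs(n: int) -> int:
    """F_n with F_0 = F_1 = 1 and F_n = F_{n-1} + F_{n-2}."""
    a, b = 1, 1  # (F_0, F_1)
    for _ in range(n):
        a, b = b, a + b  # shift the window (F_i, F_{i+1}) one step along
    return a


print([fibonacci_pairs(n) for n in range(13)])
print(2 * fibonacci_pairs(12))  # number of rabbits after 12 months
```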

As is well known, the Fibonacci numbers occur as the number of spirals on the seed heads of many different plants.

The recurrence relation $F_n = F_{n-1} + F_{n-2}$ is an example of a 2-term recurrence relation: $F_n$ is given in terms of $F_{n-1}$ and $F_{n-2}$. Before investigating it in more detail, let’s look at 1-term recurrence relations, the simplest type of recurrence relation.

2.2 Solving recurrence relations

Example 17. A bacterium reproduces by dividing into 2 identical bacteria after it has lived for 1 minute. Suppose we start with one such bacterium. Let $x_n$ be the number of bacteria after $n$ minutes. Find a recurrence relation for $x_n$, and find $x_n$ as a function of $n$.

Answer: Clearly, we have $x_0 = 1$ and $x_n = 2x_{n-1}$ for all $n \ge 1$. It follows (formally, by induction) that $x_n = 2^n$ for all $n \ge 0$. So after an hour, there will be $2^{60} > 10^{18}$ bacteria. Yikes!


The recurrence relation $x_n = 2x_{n-1}$ is an example of a 1-term recurrence relation.

Example 18. A new kind of bacterium is discovered which reproduces by dividing into $k$ identical bacteria after it has lived for 1 minute. Suppose we start with $s$ of these bacteria. Let $y_n$ be the number of bacteria after $n$ minutes. Find a recurrence relation for $y_n$, and find $y_n$ as a function of $n$.

Answer: clearly, we have $y_0 = s$ and $y_n = k y_{n-1}$ for all $n \ge 1$. It follows that $y_n = s k^n$ for all $n \ge 0$.

Now let’s look at 2-term recurrence relations.

Example 19. If $f_n = f_{n-1} + 2f_{n-2}$, $f_0 = 0$, and $f_1 = 1$, find a formula for $f_n$ as a function of $n$.

Inspired by our success for 1-term recurrence relations, let’s try a solution of the form $f_n = t^n$. Substituting this into the recurrence relation gives:

$$t^n = t^{n-1} + 2t^{n-2}.$$

Rearranging,

$$t^n - t^{n-1} - 2t^{n-2} = 0.$$

Factorizing,

$$t^{n-2}(t - 2)(t + 1) = 0.$$

This has solutions $t = 0$, $t = -1$, $t = 2$, so we know that the function

$$A t^n$$

satisfies the recurrence relation for $t = 0$, $-1$ or $2$. (You can check this directly.) Since the recurrence relation is linear, any linear combination of these functions also satisfies the recurrence relation, so

$$A(-1)^n + B 2^n$$

satisfies the recurrence relation, for any real numbers $A$ and $B$. This gives a family of solutions; it is called the ‘general solution’. We must now find the correct values of $A$ and $B$ to satisfy the initial conditions $f_0 = 0$, $f_1 = 1$. We do this by substituting these initial conditions into the ‘general solution’

$$f_n = A(-1)^n + B 2^n.$$

Substituting $n = 0$, we get

$$0 = A + B;$$

substituting $n = 1$ we get

$$1 = -A + 2B.$$

To solve this pair of simultaneous equations, we do the usual thing: eliminate $A$ by adding the two equations together, getting $3B = 1$, so $B = 1/3$. Substituting this back into the first equation gives $A = -1/3$. So

$$f_n = \tfrac{1}{3}\left(2^n - (-1)^n\right)$$

satisfies the recurrence relation and the two initial conditions. Is it the only solution to the problem? Yes, because the problem has exactly one solution: the value of $f_0$, the value of $f_1$, and the recurrence relation, tell us exactly what the other $f_n$’s must be. So the solution to the problem is

$$f_n = \tfrac{1}{3}\left(2^n - (-1)^n\right).$$
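A closed form like this is easy to test against the recurrence. The sketch below uses the initial conditions $f_0 = 0$, $f_1 = 1$ that were substituted into the general solution above; the function names are illustrative assumptions.

```python
def by_recurrence(n_terms: int) -> list:
    """First n_terms values of f_n = f_{n-1} + 2 f_{n-2} with f_0 = 0, f_1 = 1."""
    f = [0, 1]
    while len(f) < n_terms:
        f.append(f[-1] + 2 * f[-2])
    return f[:n_terms]


def by_formula(n: int) -> int:
    """(2^n - (-1)^n) / 3; the numerator is always divisible by 3."""
    return (2 ** n - (-1) ** n) // 3


print(by_recurrence(10))
print([by_formula(n) for n in range(10)])
```

The two printed lists should be identical.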

Exercise 7. Give an example of a real-world population (like Fibonacci’s rabbits) which satisfies the conditions of Example 19.

Let’s now apply this method to find a formula for the Fibonacci numbers, as a function of $n$. We substitute $F_n = t^n$ into the recurrence relation $F_n = F_{n-1} + F_{n-2}$. This gives

$$t^n = t^{n-1} + t^{n-2}.$$

Rearranging,

$$t^n - t^{n-1} - t^{n-2} = 0.$$

Taking out a factor of $t^{n-2}$,

$$t^{n-2}(t^2 - t - 1) = 0.$$

Solving the equation $t^2 - t - 1 = 0$ for $t$, we get $t = \frac{1}{2}(1 \pm \sqrt{5})$. So we try a ‘general solution’ of the form

$$F_n = A \left( \frac{1 + \sqrt{5}}{2} \right)^n + B \left( \frac{1 - \sqrt{5}}{2} \right)^n.$$

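Substituting in $n = 0$ gives

$$1 = A + B.$$

Substituting in $n = 1$ gives

$$1 = A \left( \frac{1 + \sqrt{5}}{2} \right) + B \left( \frac{1 - \sqrt{5}}{2} \right).$$

This is just a pair of simultaneous equations we have to solve. To simplify the calculation, let’s write $\alpha = (1 + \sqrt{5})/2$ and $\beta = (1 - \sqrt{5})/2$. The equations now become

$$\begin{aligned}
1 &= A + B \\
1 &= A\alpha + B\beta \\
\Rightarrow \quad \alpha &= A\alpha + B\alpha \\
\Rightarrow \quad \alpha - 1 &= B(\alpha - \beta) \\
\Rightarrow \quad (\sqrt{5} - 1)/2 &= B\sqrt{5} \\
\Rightarrow \quad B &= -\beta/\sqrt{5} \\
\Rightarrow \quad A &= \alpha/\sqrt{5}.
\end{aligned}$$

Hence,

$$F_n = A\alpha^n + B\beta^n = \frac{1}{\sqrt{5}} \left( \alpha^{n+1} - \beta^{n+1} \right) = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1} \right]$$

satisfies both the recurrence relation and the two initial conditions. As before, there can only be one solution to the problem, and we have found it. So we have solved Fibonacci’s problem!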
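The closed form can be compared numerically with the recurrence. The sketch below uses floating-point arithmetic, so the formula is rounded to the nearest integer before comparing; that is an implementation choice of this illustration and is only trustworthy for moderate $n$. The names `fib_formula` and `fib_recurrence` are assumptions, and the indexing $F_0 = F_1 = 1$ is the one used in this chapter.

```python
from math import sqrt

ALPHA = (1 + sqrt(5)) / 2
BETA = (1 - sqrt(5)) / 2


def fib_formula(n: int) -> float:
    """(alpha^(n+1) - beta^(n+1)) / sqrt(5), evaluated in floating point."""
    return (ALPHA ** (n + 1) - BETA ** (n + 1)) / sqrt(5)


def fib_recurrence(n: int) -> int:
    """F_n from F_0 = F_1 = 1 and F_n = F_{n-1} + F_{n-2}."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([round(fib_formula(n)) for n in range(13)])
print([fib_recurrence(n) for n in range(13)])
print(fib_recurrence(30) / fib_recurrence(29), ALPHA)  # ratio of consecutive terms
```

The last line prints the ratio of two consecutive Fibonacci numbers next to $\alpha$; the two values should agree to many decimal places.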

Observe that $\alpha > 1$ and $|\beta| < 1$, so the ratio between two consecutive Fibonacci numbers satisfies

$$\frac{F_n}{F_{n-1}} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha^n - \beta^n} \to \alpha = \frac{1 + \sqrt{5}}{2} \quad \text{as } n \to \infty.$$

As you probably know, this limit is known as the Golden Ratio. The 16th century Franciscan friar Luca Pacioli thought that the golden ratio had a special spiritual significance, and called it De Divina Proportione. Some have claimed that it is the architectural ratio most pleasing to the human eye, but (sadly) there is no statistical evidence for this. The claim that the Parthenon was built using the Golden Ratio is also, sadly, wrong! (Look at some photographs!) But it does occur in some beautiful natural spirals.

Now let’s look at a trickier example.

Example 20. Solve the recurrence relation $f_n = 4f_{n-1} - 4f_{n-2}$ with the initial conditions $f_0 = 1$, $f_1 = 4$. (I.e., find $f_n$ as a function of $n$.)

In this case putting $f_n = t^n$ yields the equation $t^{n-2}(t - 2)^2 = 0$, with repeated root $t = 2$, so certainly $f_n = A 2^n$ is a solution to the recurrence relation. But no choice of $A$ satisfies both initial conditions, so it looks like we’re stuck. However, the key thing to notice is that there is actually another solution: $f_n = n 2^n$ is also a solution, for in this case

$$\begin{aligned}
f_n - 4f_{n-1} + 4f_{n-2} &= n 2^n - 4(n-1) 2^{n-1} + 4(n-2) 2^{n-2} \\
&= 2^n \left( n - 2(n-1) + (n-2) \right) = 0.
\end{aligned}$$

So we try the general solution

$$f_n = (A + Bn) 2^n.$$

Now substitute in the initial conditions, to get

$$\begin{aligned}
f_0 &= 1 = A \\
f_1 &= 4 = 2A + 2B \\
\Rightarrow \quad B &= 1, \quad A = 1,
\end{aligned}$$

so the solution is $f_n = (n+1) 2^n$.

In this course, we will mostly be concerned with linear recurrence relations, which are of the form

$$f_n = \sum_{j=1}^{n} A_{j,n} f_{n-j},$$

where the coefficients $A_{j,n}$ are real numbers. A $k$-term recurrence relation is one that expresses $f_n$ in terms of $f_{n-1}, f_{n-2}, \ldots, f_{n-k}$ alone. A $k$-term linear recurrence relation with constant coefficients is of the form

$$f_n = \sum_{j=1}^{k} c_j f_{n-j} = c_1 f_{n-1} + c_2 f_{n-2} + \cdots + c_k f_{n-k}.$$

(The coefficients are said to be constant because $c_j$, the coefficient of $f_{n-j}$, is only allowed to depend on $j$.) These are the most important ones for this course, and are the easiest to solve; indeed, we will now see a general method for solving them.

A general method for solving k-term linear recurrence relations with constant coefficients

We now describe a general method for solving $k$-term linear recurrence relations with constant coefficients.

Suppose we want to solve the recurrence relation

$$f_n = c_1 f_{n-1} + c_2 f_{n-2} + \cdots + c_k f_{n-k}$$

(for $n \ge k$), where $c_1, \ldots, c_k$ are constants, subject to initial values $f_0 = a_0$, $f_1 = a_1$, \ldots, $f_{k-1} = a_{k-1}$. (Notice that in order for there to be a unique solution, there will be $k$ initial conditions.)

Step 1: Write down the characteristic equation: this is given by substituting $f_n = t^n$ into the recurrence relation and cancelling the factor of $t^{n-k}$. So in our case, it is:

$$t^k - c_1 t^{k-1} - c_2 t^{k-2} - \cdots - c_{k-1} t - c_k = 0.$$

Now find the roots of the characteristic equation, with their multiplicities. Suppose the roots are $\alpha_1$ with multiplicity $m_1$, and $\alpha_2$ with multiplicity $m_2$, and so on, up to $\alpha_r$ with multiplicity $m_r$. Then $m_1 + m_2 + \cdots + m_r = k$.

Step 2: the solutions corresponding to each $\alpha_i$ are

$$\left( A_i + B_i n + C_i n^2 + \cdots + Z_i n^{m_i - 1} \right) \alpha_i^n.$$

The number of arbitrary constants in this expression is $m_i$. Putting $f_n$ equal to the sum of all of these, for $1 \le i \le r$, gives an expression with $m_1 + m_2 + \cdots + m_r = k$ arbitrary constants.

Step 3: substitute in the values $f_0 = a_0, \ldots, f_{k-1} = a_{k-1}$ to get $k$ simultaneous linear equations in $k$ unknowns $A_1, \ldots, Z_r$. Solve these to get the unique solution for $f_n$.

Example 21. Find a formula for $f_n$ defined by the recurrence relation $f_n = 3f_{n-2} + 2f_{n-3}$ (for $n \ge 3$) and the initial conditions $f_0 = 2$, $f_1 = 0$, $f_2 = 7$.

Answer: if $f_n = t^n$, then $t^n - 3t^{n-2} - 2t^{n-3} = 0$. Cancelling the factor of $t^{n-3}$, we get the characteristic equation:

$$t^3 - 3t - 2 = 0.$$

Factorizing this gives

$$(t + 1)^2 (t - 2) = 0.$$

This has roots 2 (with multiplicity 1) and $-1$ (with multiplicity 2), so the general solution is

$$f_n = A 2^n + B(-1)^n + C n (-1)^n.$$

Substituting $n = 0, 1, 2$ gives the three simultaneous equations

$$\begin{aligned}
2 &= A + B \\
0 &= 2A - B - C \\
7 &= 4A + B + 2C
\end{aligned}$$

which you solve in the usual way to get $A = 1$, $B = 1$, $C = 1$. Hence, the solution is

$$f_n = 2^n + (n+1)(-1)^n.$$
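Whatever method is used to find a closed form, it is worth checking it against values generated directly from the recurrence. The following sketch does this for Examples 20 and 21; the helper name `iterate` and its argument conventions are assumptions made for this illustration. It does not implement the three-step method itself, only the check.

```python
def iterate(coeffs, initial, n_terms):
    """First n_terms values of f_n = c_1 f_{n-1} + ... + c_k f_{n-k}.

    coeffs = [c_1, ..., c_k] and initial = [f_0, ..., f_{k-1}].
    """
    f = list(initial)
    while len(f) < n_terms:
        # f[-j] is f_{n-j} when we are about to append f_n.
        f.append(sum(c * f[-j] for j, c in enumerate(coeffs, start=1)))
    return f[:n_terms]


# Example 20: f_n = 4 f_{n-1} - 4 f_{n-2}, f_0 = 1, f_1 = 4.
ex20 = iterate([4, -4], [1, 4], 15)
print(ex20 == [(n + 1) * 2 ** n for n in range(15)])

# Example 21: f_n = 0 f_{n-1} + 3 f_{n-2} + 2 f_{n-3}, f_0 = 2, f_1 = 0, f_2 = 7.
ex21 = iterate([0, 3, 2], [2, 0, 7], 15)
print(ex21 == [2 ** n + (n + 1) * (-1) ** n for n in range(15)])
```

Both comparisons should print `True` if the closed forms found above are right.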

More complicated recurrence relations

We will now see a more complicated example of a recurrence relation.

The $n$th Bell number $B_n$ is defined as the number of partitions of a set with $n$ elements. The names of the elements do not matter, so we might as well suppose our set is $X = \{1, 2, \ldots, n\}$.

If $n = 0$, then $X = \emptyset$, and there is a unique partition of $X$, namely $\emptyset$. Hence, $B_0 = 1$.

If $n = 1$, then $X = \{1\}$, and there is a unique partition of $X$, namely $\{\{1\}\}$, so $B_1 = 1$.

If $n = 2$, then $X = \{1, 2\}$, and there are exactly two partitions of $X$, one partition into two parts and one partition into just one part, that is $\{\{1\}, \{2\}\}$ and $\{\{1, 2\}\}$. So $B_2 = 2$.

When $n = 3$, we have one partition into a single part, $\{\{1, 2, 3\}\}$, and one partition into three parts, $\{\{1\}, \{2\}, \{3\}\}$, and three partitions into two parts, $\{\{1\}, \{2, 3\}\}$, $\{\{2\}, \{1, 3\}\}$, and $\{\{3\}, \{1, 2\}\}$. Therefore $B_3 = 5$.

Theorem 8. The Bell numbers satisfy the following recurrence relation, for all $n \geq 1$:

$$B_n = \sum_{k=1}^{n} \binom{n-1}{k-1} B_{n-k}.$$

Proof. Let $\mathcal{B}_n$ be the set of all partitions of $\{1, 2, \ldots, n\}$, so that $B_n = |\mathcal{B}_n|$.

Now we divide up $\mathcal{B}_n$ according to the size of the part of the partition containing $n$. Let $T_k$ be the set of those partitions $\pi$ of $\{1, 2, \ldots, n\}$ such that the part of $\pi$ which contains $n$ has size $k$. In symbols,

$$T_k = \{\pi \in \mathcal{B}_n : |S| = k, \text{ where } S \text{ is the part of } \pi \text{ which contains } n\}.$$

Now every partition $\pi$ of $\{1, 2, \ldots, n\}$ has a unique part $S \in \pi$ such that $n \in S$, and this part must have some size, between $1$ and $n$ inclusive. So

$$B_n = |\mathcal{B}_n| = \sum_{k=1}^{n} |T_k|.$$

Next we need to work out the size of $T_k$, for each $k$. We can pick the partitions in $T_k$ by a two-stage process: first pick the part of the partition which contains $n$; then pick the rest of the partition.

Stage 1: We need to pick the set $S$, of size $k$, such that $n \in S$. In other words, we need to pick $k-1$ more elements of $S$, from the set $\{1, 2, \ldots, n-1\}$. There are $\binom{n-1}{k-1}$ ways of doing this.

Stage 2: We have already put $k$ elements into one part of the partition, so now we have to partition the remaining $n-k$ elements. This can be done in $B_{n-k}$ ways.

Therefore, we have

$$|T_k| = \binom{n-1}{k-1} B_{n-k}.$$

Hence,

$$B_n = \sum_{k=1}^{n} |T_k| = \sum_{k=1}^{n} \binom{n-1}{k-1} B_{n-k},$$

proving the theorem.

This recurrence relation does not have constant coefficients, and is not $k$-term for any fixed $k$, so we cannot use our earlier methods to solve it. However, it can be used to compute small Bell numbers relatively quickly.

Example 22. Use the recurrence relation to compute B4 and B5.

Answer:

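$$\begin{aligned} B_4 &= \binom{3}{0}B_3 + \binom{3}{1}B_2 + \binom{3}{2}B_1 + \binom{3}{3}B_0 \\ &= 5 + 3 \times 2 + 3 + 1 \\ &= 15, \end{aligned}$$

$$\begin{aligned} B_5 &= B_4 + 4B_3 + 6B_2 + 4B_1 + B_0 \\ &= 15 + 20 + 12 + 4 + 1 \\ &= 52. \end{aligned}$$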
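The hand computation in Example 22 can be automated. The sketch below simply iterates the recurrence of Theorem 8 starting from $B_0 = 1$; the function name `bell_numbers` is chosen here for illustration only.

```python
from math import comb


def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] using B_n = sum_{k=1}^{n} C(n-1, k-1) * B_{n-k}."""
    bell = [1]  # B_0 = 1: the empty set has exactly one partition
    for n in range(1, n_max + 1):
        bell.append(sum(comb(n - 1, k - 1) * bell[n - k] for k in range(1, n + 1)))
    return bell


print(bell_numbers(5))
```

The first six values agree with $B_0, \ldots, B_5 = 1, 1, 2, 5, 15, 52$ found above. Each new term needs all earlier ones, so computing $B_n$ this way takes on the order of $n^2$ arithmetic operations.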

In fact, we can find an explicit formula for $B_n$, using a completely different argument, from Probability!

Theorem 9. The Bell numbers are given by the formula

$$B_n = \frac{1}{e} \sum_{r=0}^{\infty} \frac{r^n}{r!}.$$

Proof. (Non-examinable.) Let $\mathcal{B}_n$ denote the set of all partitions of the set $\{1, 2, \ldots, n\}$. First, I make the following claim.

Claim. For any real number $t$, we have

$$t^n = \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}. \tag{2.1}$$

Here, $|\pi|$ denotes the number of parts of the partition $\pi$, and

$$(t)_r = t(t-1) \cdots (t-r+1)$$

denotes the $r$th falling factorial of $t$; this is defined for all real numbers $t$.


First, I prove (2.1) whenever $t$ is a positive integer. In this case, the left-hand side is the total number of functions from $\{1, 2, \ldots, n\}$ to $X$, where $X$ is a $t$-element set. But I can also choose a function $f$ from $\{1, 2, \ldots, n\}$ to $X$ as follows. First, I choose a partition $\pi = \{S_1, S_2, \ldots, S_k\}$ of $\{1, 2, \ldots, n\}$. Then I choose a sequence of $k$ distinct elements of $X$, $(t_1, \ldots, t_k)$ say, and I put $f(i) = t_j$ for all $i \in S_j$, for each $j$. In other words, I am choosing $k$ distinct values for the function $f$ to take, and then I force it to take the $j$th value on every number in the $j$th part, for each $j$. The number of ways of choosing the sequence of $k$ distinct elements of $X$ is simply $t(t-1)(t-2) \cdots (t-k+1) = (t)_k$. Every function arises exactly once in this way, since the partition is determined by the function: its parts are the sets of numbers on which $f$ takes a common value. Hence, the number of functions from $\{1, 2, \ldots, n\}$ to $X$ is also equal to

$$\sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}.$$

This proves (2.1) whenever $t \in \mathbb{N}$. Notice that

$$t^n - \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}$$

is a polynomial of degree at most $n$ in the variable $t$. We have shown that every positive integer is a root of this polynomial. A polynomial which is not identically zero has only finitely many roots, so the above polynomial must be the zero polynomial. It follows that

$$t^n = \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}$$

for all real numbers $t$, proving the claim.

It follows that if $T$ is a Poisson random variable with mean 1, then

$$T^n = \sum_{\pi \in \mathcal{B}_n} (T)_{|\pi|}.$$

Taking the expectation of both sides, it follows that

$$E[T^n] = \sum_{\pi \in \mathcal{B}_n} E[(T)_{|\pi|}].$$

By definition, we have

$$E[T^n] = \sum_{r=0}^{\infty} e^{-1} \frac{r^n}{r!}.$$

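Moreover, for any non-negative integer $k$, we have

$$E[(T)_k] = \sum_{r=0}^{\infty} e^{-1}\, r(r-1) \cdots (r-k+1) \frac{1}{r!} = \frac{1}{e} \sum_{r=k}^{\infty} \frac{1}{(r-k)!} = \frac{1}{e} \sum_{l=0}^{\infty} \frac{1}{l!} = \frac{e}{e} = 1,$$

where the terms with $r < k$ vanish because the product $r(r-1) \cdots (r-k+1)$ then contains a factor of zero.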


Hence, we have

$$\sum_{r=0}^{\infty} e^{-1} \frac{r^n}{r!} = E[T^n] = \sum_{\pi \in \mathcal{B}_n} E[(T)_{|\pi|}] = \sum_{\pi \in \mathcal{B}_n} 1 = |\mathcal{B}_n| = B_n,$$

proving the theorem.
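The infinite series in Theorem 9 converges very quickly, so it can be checked numerically. The sketch below truncates the sum after a fixed number of terms (the cut-off of 60 is an arbitrary choice made here, not something from the notes) and rounds the floating-point result to the nearest integer.

```python
from math import e, factorial


def bell_via_series(n, terms=60):
    """Approximate B_n = (1/e) * sum_{r>=0} r^n / r! by truncating the series."""
    # Python evaluates 0**0 as 1, which gives the correct r = 0 term when n = 0.
    return sum(r ** n / factorial(r) for r in range(terms)) / e


print([round(bell_via_series(n)) for n in range(8)])
```

For small $n$ the rounded values agree with the Bell numbers produced by the recurrence of Theorem 8. For large $n$ the floating-point error would eventually matter, and the recurrence is the safer way to get exact values.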

Example 23. For each positive integer $n$, let $C_n$ denote the number of possible triangulations of a convex $(n+2)$-sided polygon, by non-intersecting diagonals. (A diagonal is a straight line between two non-adjacent vertices.) For example, we have $C_1 = 1$ and $C_2 = 2$, since there are 2 possible triangulations of a convex quadrilateral. For convenience, we define $C_0 = 1$. The $C_n$'s are called the Catalan numbers. Find a recurrence relation for $C_n$ in terms of $C_{n-1}, C_{n-2}, \ldots, C_0$.

Answer: Suppose $n \geq 2$. Let the vertices of our $(n+2)$-sided polygon be

$$v_1, v_2, \ldots, v_{n+2}.$$

We can generate all the possible triangulations (generating each triangulation exactly once) as follows. Choose a triangle for the side $v_1 v_{n+2}$ to be in; say it is in the triangle $v_1 v_i v_{n+2}$, where $i \in \{2, 3, \ldots, n+1\}$. After including this triangle, we must finish off the triangulation. How many ways of doing this are there? If $i = 2$, then we just have to triangulate the $(n+1)$-sided polygon $v_2 v_3 \ldots v_{n+2}$; there are $C_{n-1}$ ways of doing this. Similarly, if $i = n+1$, then we just have to triangulate the $(n+1)$-sided polygon $v_1 v_2 \ldots v_{n+1}$; there are $C_{n-1}$ ways of doing this. If $3 \leq i \leq n$, then we must triangulate the $i$-sided polygon $v_1 v_2 \ldots v_i$, and then we must triangulate the $(n+3-i)$-sided polygon $v_i v_{i+1} \ldots v_{n+2}$; in total, there are $C_{i-2} C_{n+1-i}$ ways of doing this. So altogether, there are

$$C_{n-1} + C_1 C_{n-2} + C_2 C_{n-3} + \ldots + C_{n-3} C_2 + C_{n-2} C_1 + C_{n-1}$$

ways of triangulating the $(n+2)$-sided polygon, so

$$\begin{aligned} C_n &= C_{n-1} + C_1 C_{n-2} + C_2 C_{n-3} + \ldots + C_{n-3} C_2 + C_{n-2} C_1 + C_{n-1} \\ &= C_0 C_{n-1} + C_1 C_{n-2} + C_2 C_{n-3} + \ldots + C_{n-3} C_2 + C_{n-2} C_1 + C_{n-1} C_0 \\ &= \sum_{k=0}^{n-1} C_k C_{n-1-k}, \end{aligned}$$

for all $n \geq 2$.
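The recurrence is easy to iterate by machine. The following sketch starts from $C_0 = 1$; note that the recurrence also returns the correct value $C_1 = C_0 C_0 = 1$ when applied with $n = 1$, so the code uses it from $n = 1$ onwards rather than treating $C_1$ as a separate initial condition. The function name is illustrative.

```python
def catalan_numbers(n_max):
    """Return [C_0, ..., C_{n_max}] using C_n = sum_{k=0}^{n-1} C_k * C_{n-1-k}."""
    catalan = [1]  # C_0 = 1 by convention
    for n in range(1, n_max + 1):
        catalan.append(sum(catalan[k] * catalan[n - 1 - k] for k in range(n)))
    return catalan


print(catalan_numbers(6))
```

The output begins $1, 1, 2, 5, 14$, matching $C_1 = 1$ and $C_2 = 2$ from the example.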

Note that this is not a linear recurrence relation; there is no easy way to solve it! To find a formula for the $C_n$'s, we consider another problem which produces the same recurrence relation.


Example 24. Let $L_n$ denote the number of $(2n)$-step paths in the $xy$-plane which go from $(0, 0)$ to $(n, n)$ by moving either right by 1 (from $(x, y)$ to $(x+1, y)$) or up by 1 (from $(x, y)$ to $(x, y+1)$) at each step, and never rise above the line $y = x$. For example, $L_1 = 1$, and $L_2 = 2$. We define $L_0 = 1$ for convenience. Find a recurrence relation for $L_n$, and then, using a different argument, find a general formula for $L_n$ in terms of $n$.

Answer: Suppose $n \geq 2$. Such a path must start off by moving right, from $(0, 0)$ to $(1, 0)$. Suppose it moves above the line $y = x - 1$ for the first time when it goes from $(i+1, i)$ to $(i+1, i+1)$, where $i \in \{0, 1, \ldots, n-1\}$. The total number of paths from $(1, 0)$ to $(i+1, i)$ which never move above the line $y = x - 1$ is $L_i$, and the total number of paths from $(i+1, i+1)$ to $(n, n)$ which never move above the line $y = x$ is $L_{n-1-i}$, so the total number of paths which move above the line $y = x - 1$ for the first time when going from $(i+1, i)$ to $(i+1, i+1)$ is $L_i L_{n-1-i}$. It follows that

$$L_n = L_0 L_{n-1} + L_1 L_{n-2} + L_2 L_{n-3} + \ldots + L_{n-3} L_2 + L_{n-2} L_1 + L_{n-1} L_0 = \sum_{k=0}^{n-1} L_k L_{n-1-k},$$

for all $n \geq 2$. So the $L_n$'s satisfy the same recurrence relation as the $C_n$'s! Since $L_0 = C_0 = 1$ and $L_1 = C_1 = 1$, it follows that $L_n = C_n$ for all $n$.

Let's now find a general formula for the $L_n$'s, without using the recurrence relation above. To do this, we let $\mathcal{P}_n$ denote the set of $(2n)$-step paths in the $xy$-plane which go from $(0, 0)$ to $(n, n)$ by moving either right by 1 (from $(x, y)$ to $(x+1, y)$) or up by 1 (from $(x, y)$ to $(x, y+1)$) at each step, and we let $\mathcal{Q}_n$ denote the subset of these paths which do rise above the line $y = x$. Clearly, we have

$$L_n = |\mathcal{P}_n| - |\mathcal{Q}_n| \quad \forall n.$$

Notice that

$$|\mathcal{P}_n| = \binom{2n}{n}.$$

This is because any path in $\mathcal{P}_n$ has $2n$ steps in total, and to choose a path in $\mathcal{P}_n$, we just have to choose which $n$ of those $2n$ steps are steps to the right. The number of ways of doing this is simply the number of $n$-element subsets of $\{1, 2, \ldots, 2n\}$, which is $\binom{2n}{n}$.

Now let's find a formula for $|\mathcal{Q}_n|$. We do this by finding a bijection from $\mathcal{Q}_n$ to a 'simpler' set $\mathcal{R}_n$ which we know how to count.

For any path $q$ in $\mathcal{Q}_n$, let $(i, i) \to (i, i+1)$ be the first step at which it moves above the line $y = x$. Reflect the portion of the path after $(i, i+1)$ in the line $y = x + 1$. The resulting path $r$ goes from $(0, 0)$ to $(n-1, n+1)$ by taking $n-1$ steps to the right and $n+1$ steps upwards (in some order), and any path of this form is obtained from exactly one path in $\mathcal{Q}_n$. Hence, the map $q \mapsto r$ defines a bijection from $\mathcal{Q}_n$ to the set $\mathcal{R}_n$, where $\mathcal{R}_n$ is the set of $(2n)$-step paths that go from $(0, 0)$ to $(n-1, n+1)$ by taking $n-1$ steps to the right and $n+1$ steps upwards (in some order). Notice that

$$|\mathcal{R}_n| = \binom{2n}{n-1},$$

since to choose a path in $\mathcal{R}_n$, we must simply choose $n-1$ steps out of $2n$ in which to move right. Therefore,

$$|\mathcal{Q}_n| = |\mathcal{R}_n| = \binom{2n}{n-1}.$$

Hence,

$$L_n = |\mathcal{P}_n| - |\mathcal{Q}_n| = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}.$$

So

$$C_n = \frac{1}{n+1}\binom{2n}{n}.$$

We have found a formula for the Catalan numbers!
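For small $n$, the formula can be checked by listing every path in Example 24 and counting those that never rise above the line $y = x$. This brute-force sketch chooses which $n$ of the $2n$ steps go right, exactly as in the count of all $(2n)$-step paths above; the function name is an illustrative choice.

```python
from itertools import combinations
from math import comb


def count_paths_not_above_diagonal(n):
    """Count right/up paths from (0, 0) to (n, n) that never rise above y = x."""
    count = 0
    for right_steps in combinations(range(2 * n), n):
        right = set(right_steps)
        x = y = 0
        valid = True
        for step in range(2 * n):
            if step in right:
                x += 1
            else:
                y += 1
            if y > x:  # the path has risen above the line y = x
                valid = False
                break
        if valid:
            count += 1
    return count


for n in range(7):
    print(n, count_paths_not_above_diagonal(n), comb(2 * n, n) // (n + 1))
```

The enumeration visits all $\binom{2n}{n}$ paths, so it is only practical for small $n$; its purpose is to confirm the formula, not to replace it.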

Bijective proofs

We proved that $L_n = C_n$ by showing that the sequences satisfied the same recurrence relation and initial conditions. If we know (or suspect) that two sets have the same size, an elegant way of proving this (without doing any calculations) is to construct a bijection between the two sets. Such proofs are called bijective proofs, and are greatly prized by combinatorialists!

In this section, we will show that the Catalan numbers count two other types of objects, by constructing bijections.

Example 25. In a group, multiplication is always associative:

$$(a \cdot b) \cdot c = a \cdot (b \cdot c)$$

for all $a$, $b$ and $c$. However, some common multiplication operations are non-associative. For example, the cross-product of vectors in $\mathbb{R}^3$ is non-associative: we have

$$(\mathbf{i} \times \mathbf{i}) \times \mathbf{j} = \mathbf{0} \times \mathbf{j} = \mathbf{0}, \qquad \mathbf{i} \times (\mathbf{i} \times \mathbf{j}) = \mathbf{i} \times \mathbf{k} = -\mathbf{j}.$$

Suppose $\ast$ is a non-associative multiplication operation. In order to make the product

$$a \ast b \ast c$$

well-defined, we must place brackets to indicate the order in which we multiply. There are two ways of bracketing a product of 3 elements:

$$(a \ast b) \ast c, \qquad a \ast (b \ast c).$$

Let $M_n$ denote the number of ways of bracketing a product of $n+1$ elements. Show that $M_n = C_n$, the $n$th Catalan number, by constructing an appropriate bijection.

Answer: let $\mathcal{T}_n$ denote the set of all triangulations of a fixed, convex $(n+2)$-gon. Let $\mathcal{M}_n$ denote the set of all different ways of bracketing the product

$$a_1 \ast a_2 \ast a_3 \ast \ldots \ast a_{n+1}.$$

Our aim is to define a function $f : \mathcal{T}_n \to \mathcal{M}_n$, and to show that it is a bijection. To do this, take a convex $(n+2)$-gon, $P$ say, and label its sides with the symbols $a_1, a_2, \ldots, a_{n+1}$ in an anticlockwise order starting from any side. Leave the last side blank. Let $T$ be a triangulation of $P$. Look at the triangles of $T$. If any triangle has two of its sides already labelled and its third side unlabelled, label the third side with the product

$$(\text{label of the first side}) \ast (\text{label of the second side}),$$

where the order of (first side, second side, third side) is anticlockwise.

Repeat this process, until you have labelled every line which is a side of a triangle in the triangulation. The last line to be labelled will be the 'blank' side of $P$; the label on this side is a bracketing of the product $a_1 \ast a_2 \ast \ldots \ast a_{n+1}$. We define $f(T)$ to be this bracketing. For example, replacing $a_1, \ldots, a_5$ with $a, b, c, d, e$, we would obtain the labelling shown in the figure:

[Figure: a triangulated hexagon with five sides labelled a, b, c, d, e; the labelling process produces the further labels a*b, (a*b)*c and d*e, and finally ((a*b)*c)*(d*e) on the blank side.]

There are two things we must check, to be completely rigorous. Firstly, we must check that this process always actually produces a bracketing (i.e., that the function $f$ is well-defined). And secondly, we must check that the function $f$ is actually a bijection.

First of all, we have to be able to start the process, so we must make sure that for any triangulation $T$, there is one triangle of $T$ which has two sides that are both labelled sides of $P$. There are $n+2$ sides and $n$ triangles, so there must be at least two triangles ($\Delta_1$ and $\Delta_2$, say), each of which shares two sides with $P$. There is only one unlabelled side of $P$, so one of these two triangles ($\Delta_1$ say) must share two labelled sides of $P$, $a_i$ and $a_{i+1}$ say. So we can label the third side of $\Delta_1$ as $a_i \ast a_{i+1}$. We can then replace the two sides $a_i$ and $a_{i+1}$ with the new side $a_i \ast a_{i+1}$, producing a triangulated $(n+1)$-gon, and repeat the above process on the triangulation of the $(n+1)$-gon.

It is easy to see that if, at any stage, we have a choice of two or more triangles whose remaining side we can label, it does not matter which we choose (we will always get the same bracketing at the end). Moreover, the unlabelled side of $P$ is the last side to be labelled by this process.

To show that $f$ is a bijection, we will show that it has an inverse, $g$. Again, take a convex $(n+2)$-gon, $P$ say, and label its sides with the symbols $a_1, a_2, \ldots, a_{n+1}$ in an anticlockwise order starting from any side. Leave the last side blank. Given a bracketing of $a_1 \ast a_2 \ast \ldots \ast a_{n+1}$, we produce a triangulation of $P$ as follows. Choose any pair $(a_i \ast a_{i+1})$ which is bracketed together, and include the triangle with sides $a_i$ and $a_{i+1}$ in the triangulation. Let $b$ be the other side of this triangle. It remains for us to triangulate the convex $(n+1)$-gon $P'$ produced by replacing the two sides $a_i$ and $a_{i+1}$ with the side $b$. Replace $(a_i \ast a_{i+1})$ with $b$ in the bracketing, and use this new bracketing to triangulate $P'$, by repeating the above process.

Example 26. Let $\mathcal{V}_n$ denote the set of all sequences of $X$'s and $Y$'s such that there are $n$ $X$'s, $n$ $Y$'s, and for any $k$, the number of $Y$'s in the first $k$ terms of the sequence never exceeds the number of $X$'s. Let $V_n = |\mathcal{V}_n|$ denote the number of these sequences. For example, $V_2 = 2$: we have

$$XXYY, \qquad XYXY.$$

Show that $V_n = M_n$, the number of bracketings of a product of length $n+1$, by constructing a bijection between the sets $\mathcal{V}_n$ and $\mathcal{M}_n$.

Answer: our bijection $f$ is defined as follows. Given a bracketing of $a_1 \ast a_2 \ast \ldots \ast a_{n+1}$, add one left bracket on the far left and one right bracket on the far right. Directly below this new bracketing, write an $X$ underneath every $\ast$ and a $Y$ underneath every right bracket. The sequence of $X$'s and $Y$'s that you get is in $\mathcal{V}_n$, since there are $n$ $\ast$'s, $n$ right brackets, and there cannot be more right brackets than $\ast$'s up to any point in the bracketing. (Each right bracket corresponds to a unique $\ast$, the last operation which it encloses, and this $\ast$ must lie before it.) For example, the bracketing

$$(a_1 \ast a_2) \ast ((a_3 \ast a_4) \ast a_5)$$

produces the new bracketing

$$((a_1 \ast a_2) \ast ((a_3 \ast a_4) \ast a_5)),$$

from which we get the sequence

$$XYXXYXYY.$$
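The map $f$ just described is mechanical enough to write down as a short program. In the sketch below a bracketing is represented as a plain string such as `"(a1*a2)*((a3*a4)*a5)"`; this string encoding and the function names are assumptions made for illustration, not notation from the notes.

```python
def bracketing_to_xy(bracketing):
    """Wrap the bracketing in one extra pair of brackets, then read off
    an X for every '*' and a Y for every right bracket."""
    wrapped = "(" + bracketing + ")"
    return "".join("X" if ch == "*" else "Y" for ch in wrapped if ch in "*)")


def in_V(sequence):
    """Check the defining property: equally many X's and Y's, and no
    initial segment contains more Y's than X's."""
    balance = 0
    for ch in sequence:
        balance += 1 if ch == "X" else -1
        if balance < 0:
            return False
    return balance == 0


word = bracketing_to_xy("(a1*a2)*((a3*a4)*a5)")
print(word, in_V(word))
```

Running it on the bracketing from the example reproduces the sequence $XYXXYXYY$, and `in_V` confirms that the result satisfies the prefix condition.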

To show that $f$ is a bijection, we will show that it has an inverse, $g$. Define $g$ as follows. Given a sequence in $\mathcal{V}_n$, write a $\ast$ below each $X$ and a right bracket below each $Y$. Now place an $a_1$ before the first $\ast$, and an $a_{i+1}$ directly after the $i$th $\ast$, for each $i \in \{1, \ldots, n\}$. How do we choose where the left brackets go? At some point in our sequence of $a_i$'s, $\ast$'s and right brackets, we must have a string of the form

$$a_i \ast a_{i+1})$$

(There is one more $a_i$ than right bracket, and the sequence ends with a right bracket, so at some point there must be two $a_i$'s separated only by a $\ast$.) Place a left bracket just before $a_i$, to produce

$$\ldots (a_i \ast a_{i+1}) \ldots$$

Now draw a box around $(a_i \ast a_{i+1})$, and regard it as a single 'letter', $b$ say. We now have a sequence of $n$ letters,

$$a_1, \ldots, a_{i-1}, b, a_{i+2}, a_{i+3}, \ldots, a_{n+1},$$

and $n-1$ right brackets. Repeat the above process on the new sequence, to choose where the next left bracket goes. Continuing, we eventually obtain a bracketing of $a_1 \ast a_2 \ast \ldots \ast a_{n+1}$, with an extra left bracket on the far left and an extra right bracket on the far right. Deleting the two extra brackets produces our final bracketing. For example, from the sequence

$$XXYYXYXY$$

we get

$$\ast \ast )) \ast ) \ast ),$$

then

$$a_1 \ast a_2 \ast a_3)) \ast a_4) \ast a_5),$$

$$a_1 \ast (a_2 \ast a_3)) \ast a_4) \ast a_5),$$

$$(a_1 \ast (a_2 \ast a_3)) \ast a_4) \ast a_5),$$

$$((a_1 \ast (a_2 \ast a_3)) \ast a_4) \ast a_5),$$

$$(((a_1 \ast (a_2 \ast a_3)) \ast a_4) \ast a_5),$$


so the final bracketing is

$$((a_1 \ast (a_2 \ast a_3)) \ast a_4) \ast a_5.$$

Example 27. Let $\mathcal{L}_n$ denote the set of all paths in Example 24, so that $L_n = |\mathcal{L}_n|$. Show that $L_n = V_n$, by finding a bijection between $\mathcal{L}_n$ and $\mathcal{V}_n$.

Answer: there is an obvious bijection, $f$ say: given a sequence of $X$'s and $Y$'s, $X$ means 'go right by 1' and $Y$ means 'go up by 1'.

In the three examples above, we constructed bijections:

$$\mathcal{T}_n \xrightarrow{\;f\;} \mathcal{M}_n \xrightarrow{\;f\;} \mathcal{V}_n \xrightarrow{\;f\;} \mathcal{L}_n.$$

The composition $f \circ f \circ f$ of these bijections is a bijection from $\mathcal{T}_n$ to $\mathcal{L}_n$. This gives an alternative proof that $C_n = L_n = \frac{1}{n+1}\binom{2n}{n}$, without using recurrence relations!

This is a very beautiful argument, but it requires a certain amount of ingenuity (or luck!) to get the formula

$$L_n = \frac{1}{n+1}\binom{2n}{n}.$$

In the next section, we will see how to use generating series to get a formula for the $n$th term of a sequence like the Catalan numbers, relying only upon a recurrence relation for the sequence, and a collection of simple tools not requiring any special ingenuity.

2.3 Generating series

We will now see a new way of investigating a combinatorial sequence. As we said before, a lot of combinatorics is about sequences of numbers,

(a0, a1, a2, . . .).

We’ve seen such sequences as

1, 1, 2, 3, 5, 8, 13, 21, 34, . . .

(the Fibonacci numbers), and

1, 1, 2, 6, 24, 120, 720, . . .

(the factorials). A very useful device for investigating a combinatorial sequence is to take its terms to be the coefficients in a power series,

$$\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots.$$


You've encountered things called 'power series' in calculus, and maybe in analysis also. In combinatorics, they have a slightly different meaning. We are not doing calculus, so we don't necessarily have to worry about whether a power series 'converges' or not. For us, a power series is just a way of combining infinitely many numbers into a single mathematical object; in the words of Herbert S. Wilf, 'it is a clothesline, on which we hang up the numbers for display'.

For example, if our sequence is the factorials above, then the power series is

$$\sum_{n=0}^{\infty} n!\, x^n = 1 + x + 2x^2 + 6x^3 + 24x^4 + 120x^5 + 720x^6 + \ldots.$$

If you remember the ratio test from calculus, you should be able to show that, if we view $x$ as a real number, this series only converges when $x = 0$. The ratio of successive terms is

$$\frac{(n+1)!\, x^{n+1}}{n!\, x^n} = (n+1)x,$$

which, for any fixed $x \neq 0$, tends to infinity in absolute value as $n \to \infty$. But this power series is still useful!

In formal mathematical language, a power series

$$\sum_{n=0}^{\infty} a_n x^n$$

is just an element of a ring, where we define addition by

$$\sum_{n=0}^{\infty} a_n x^n + \sum_{n=0}^{\infty} b_n x^n = \sum_{n=0}^{\infty} (a_n + b_n) x^n,$$

and multiplication by

$$\left(\sum_{n=0}^{\infty} a_n x^n\right)\left(\sum_{n=0}^{\infty} b_n x^n\right) = \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} a_k b_{n-k}\right) x^n;$$

in other words, we add and multiply power series as if they were polynomials. This ring is denoted by $\mathbb{R}[[x]]$, if the power series have coefficients in $\mathbb{R}$. Two power series are defined to be equal if and only if they have the same coefficient of $x^n$ for every $n$. We are not regarding $x$ as a variable which can take different values (yet!), so what happens when you substitute particular values for $x$ is totally irrelevant, for now!

If $a_0, a_1, a_2, \ldots$ is a combinatorial sequence, meaning that $a_n$ is the number of objects of a certain kind, for each $n$, then the power series

$$\sum_{n=0}^{\infty} a_n x^n$$

is known as the generating series for $(a_n)$. (It is often called the generating function for $(a_n)$, but this is misleading, as it is defined to be a power series, rather than a function of $x$, so we will not use the term 'generating function' very much at first.)

Multiplying the generating series for two combinatorial sequences has a useful combinatorial interpretation. Suppose $\mathcal{A}$ and $\mathcal{B}$ are families of sets of different sizes, where every set in $\mathcal{A}$ is disjoint from every set in $\mathcal{B}$. For each $n$, let $\mathcal{A}_n$ be the family of all sets in $\mathcal{A}$ with size $n$, and let $\mathcal{B}_n$ be the family of all sets in $\mathcal{B}$ with size $n$. Define the combinatorial sequences

$$a_n = |\mathcal{A}_n| = \text{number of } n\text{-element sets in } \mathcal{A},$$

$$b_n = |\mathcal{B}_n| = \text{number of } n\text{-element sets in } \mathcal{B}.$$

Now let's build a new family, $\mathcal{C}$, consisting of all sets that are a union of a set in $\mathcal{A}$ and a set in $\mathcal{B}$. Let $\mathcal{C}_n$ be the family of all $n$-element sets in $\mathcal{C}$, and let $c_n = |\mathcal{C}_n|$, for each $n$. What is $c_n$ in terms of the $a_i$'s and the $b_i$'s? The answer is

$$c_n = \sum_{k=0}^{n} a_k b_{n-k},$$

since to choose an $n$-element set in $\mathcal{C}$, we must first choose an integer $k$ between $0$ and $n$, then a $k$-element set from $\mathcal{A}$ ($a_k$ choices), and then an $(n-k)$-element set from $\mathcal{B}$ ($b_{n-k}$ choices). So altogether,

$$c_n = |\mathcal{C}_n| = \sum_{k=0}^{n} a_k b_{n-k}.$$

This means that

$$\sum_{n=0}^{\infty} c_n x^n = \left(\sum_{n=0}^{\infty} a_n x^n\right)\left(\sum_{n=0}^{\infty} b_n x^n\right);$$

that is, the generating series for the sequence $(c_n)$ is just the product of the generating series for $(a_n)$ and the generating series for $(b_n)$.
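The multiplication rule is a convolution of coefficient sequences, which is straightforward to compute on truncated series. In the sketch below a power series is represented by the list of its first few coefficients; this list representation and the name `multiply_series` are assumptions made for illustration.

```python
def multiply_series(a, b):
    """Multiply two truncated power series given as coefficient lists.

    The coefficient of x^n in the product is sum_{k=0}^{n} a[k] * b[n-k].
    Only as many coefficients as are known for both inputs are returned.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]


# (1 + x + x^2 + ...)^2 = 1 + 2x + 3x^2 + ...
print(multiply_series([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))
```

Because the coefficient of $x^n$ in a product depends only on coefficients up to index $n$ in each factor, truncating both inputs to the same length loses nothing in the returned coefficients.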

Now we define some other useful operations on power series.

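Definition (Reciprocal). Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series with $a_0 \neq 0$. Its reciprocal is the power series $\sum_{n=0}^{\infty} b_n x^n$ satisfying

$$\left(\sum_{n=0}^{\infty} a_n x^n\right)\left(\sum_{n=0}^{\infty} b_n x^n\right) = 1.$$

Equating the coefficients on both sides, this is equivalent to

$$a_0 b_0 = 1 \quad (\text{equating coefficients of } x^0),$$

$$\sum_{i=1}^{n} a_i b_{n-i} + a_0 b_n = 0 \quad \forall n \geq 1 \quad (\text{equating coefficients of } x^n).$$

Rearranging, this is equivalent to

$$b_0 = 1/a_0, \qquad b_n = -\frac{1}{a_0} \sum_{i=1}^{n} a_i b_{n-i} \quad \forall n \geq 1,$$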

which gives a recursive definition for the sequence (bn).
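The recursive definition translates directly into code. The sketch below uses exact rational arithmetic via `fractions.Fraction` (a design choice made here so that division by $a_0$ introduces no rounding); a series is again represented by a finite list of leading coefficients, with missing coefficients treated as zero.

```python
from fractions import Fraction


def reciprocal_series(a, n_terms):
    """First n_terms coefficients b_0, b_1, ... of the reciprocal of sum a_n x^n.

    Uses b_0 = 1/a_0 and b_n = -(1/a_0) * sum_{i=1}^{n} a_i * b_{n-i}.
    """
    if a[0] == 0:
        raise ValueError("the reciprocal is only defined when a_0 is non-zero")

    def coeff(i):
        # Coefficients beyond the end of the list are taken to be zero.
        return a[i] if i < len(a) else 0

    b = [Fraction(1) / a[0]]
    for n in range(1, n_terms):
        total = sum(coeff(i) * b[n - i] for i in range(1, n + 1))
        b.append(-total / a[0])
    return b


# Reciprocal of 1 - x is 1 + x + x^2 + ...
print(reciprocal_series([1, -1], 6))
```

Applied to $1 - x$, the recursion returns all ones, the geometric series.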

Definition (Substitution). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ and $B(x) = \sum_{n=0}^{\infty} b_n x^n$ are power series, and $a_0 = 0$, then we define

$$B(A(x)) = \sum_{n=0}^{\infty} b_n (A(x))^n,$$

where $(A(x))^n$ is calculated using the multiplication rule.

Note that if $a_0 \neq 0$, then the above formula would not give a finite expression for the coefficient of $x^0$ in $B(A(x))$; instead, we would get

$$b_0 + b_1 a_0 + b_2 a_0^2 + \ldots.$$

If we have $a_0 = 0$, however, then $(A(x))^n$ only contributes to the coefficient of $x^i$ in $B(A(x))$ if $n \leq i$, so the above formula gives a finite expression for all the coefficients in $B(A(x))$.

Definition (Derivative). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ is a power series, its (formal) derivative is defined by

$$A'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}.$$

Definition (Integral). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ is a power series, its (formal) integral is defined by

$$\int A(x) = \sum_{n=0}^{\infty} \frac{1}{n+1} a_n x^{n+1}.$$

We now come to what is perhaps the most important tool for manipulating power series.

Theorem 10 (General binomial theorem). For any rational number $a$,

$$(1 + x)^a = \sum_{n=0}^{\infty} \binom{a}{n} x^n.$$

Here, if $a$ is a rational number, we define the binomial coefficient

$$\binom{a}{n} = \frac{a(a-1) \cdots (a-n+1)}{n!};$$

this agrees with our definition when $a$ is a positive integer.

The general binomial theorem can be viewed in two ways. Firstly, it can be interpreted as a statement about power series: if we view $1 + x$ as a power series, and we define $(1 + x)^a$ to be the power series above, then all the usual rules of exponents hold, namely

$$(1 + x)^a (1 + x)^b = (1 + x)^{a+b}, \qquad ((1 + x)^a)^b = (1 + x)^{ab}.$$

It can also be interpreted as a statement about functions of $x$. Namely, for any real number $x$ with $-1 < x < 1$, the right-hand side converges (by the ratio test), and is equal to the left-hand side, provided we take the left-hand side to be the $a$th power of $(1 + x)$ which is real and positive.

An important special case of Theorem 10 is when $a = -1$; then we have

$$\binom{-1}{n} = \frac{(-1)(-2)(-3) \cdots (-n)}{n!} = (-1)^n,$$

so

$$(1 + x)^{-1} = \sum_{n=0}^{\infty} (-1)^n x^n = 1 - x + x^2 - x^3 + x^4 - \ldots.$$

Substituting $-x$ for $x$ in the above, we get

$$(1 - x)^{-1} = \sum_{n=0}^{\infty} (-1)^n (-x)^n = \sum_{n=0}^{\infty} (-1)^{2n} x^n = \sum_{n=0}^{\infty} x^n.$$

Interpreting the two sides as functions of $x$, this is the familiar formula for the sum of a geometric progression.
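The generalised binomial coefficient is easy to evaluate exactly for rational $a$. The sketch below builds the product $a(a-1) \cdots (a-n+1)/n!$ one factor at a time using `fractions.Fraction`; the name `gen_binom` is chosen here for illustration.

```python
from fractions import Fraction


def gen_binom(a, n):
    """Generalised binomial coefficient a(a-1)...(a-n+1) / n! for rational a."""
    result = Fraction(1)
    for j in range(n):
        # Multiply in the factor (a - j) and divide by (j + 1).
        result = result * (Fraction(a) - j) / (j + 1)
    return result


# The special case a = -1 gives the alternating signs (-1)^n.
print([gen_binom(-1, n) for n in range(6)])
# A fractional exponent: the first few coefficients of (1 + x)^(1/2).
print([gen_binom(Fraction(1, 2), n) for n in range(4)])
```

For a positive integer $a$ the function returns the ordinary binomial coefficient, and it returns zero once $n > a$, so the series terminates and the usual binomial theorem is recovered.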

Generating series and recurrence relations

If $a_0, a_1, a_2, \ldots$ is a combinatorial sequence, and we know a recurrence relation for $a_n$ with initial conditions, we can often use generating series to come up with a formula for $a_n$ as a function of $n$. Let's see an example of how we do this with the Fibonacci numbers.

Example 28. Recall that the Fibonacci numbers are defined by $F_0 = F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2$. Use the generating series for the sequence $(F_n)$ to derive a formula for $F_n$ as a function of $n$.


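Answer: Let

$$F(x) = \sum_{n=0}^{\infty} F_n x^n$$

be the generating series of $(F_n)$. Using the recurrence relation $F_n = F_{n-1} + F_{n-2}$, we obtain:

$$\begin{aligned} F(x) &= \sum_{n=0}^{\infty} F_n x^n \\ &= F_0 + F_1 x + \sum_{n=2}^{\infty} F_n x^n \\ &= 1 + x + \sum_{n=2}^{\infty} (F_{n-1} + F_{n-2}) x^n \\ &= 1 + x + \sum_{n=2}^{\infty} F_{n-1} x^n + \sum_{n=2}^{\infty} F_{n-2} x^n \\ &= 1 + x + x \sum_{n=1}^{\infty} F_n x^n + x^2 \sum_{n=0}^{\infty} F_n x^n \\ &= 1 + x + x \left(\sum_{n=0}^{\infty} F_n x^n - 1\right) + x^2 \sum_{n=0}^{\infty} F_n x^n \\ &= 1 + x \sum_{n=0}^{\infty} F_n x^n + x^2 \sum_{n=0}^{\infty} F_n x^n \\ &= 1 + x F(x) + x^2 F(x). \end{aligned}$$

Rearranging, we obtain:

$$(1 - x - x^2) F(x) = 1.$$

Taking reciprocals,

$$F(x) = \frac{1}{1 - x - x^2}.$$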

Notice that we have found a very simple formula for $F(x)$, without any sums; this is known as a 'closed form' expression for $F(x)$. The formula on the right-hand side is really a power series (although it can also be viewed as a function of $x$). If we can find the coefficients of this power series, we will be done! To make this task easier, let's now express the right-hand side using partial fractions; this will enable us to use the binomial theorem to find the coefficients of the power series. First, let us factorise

$$1 - x - x^2 = (1 - \alpha x)(1 - \beta x);$$

then $\frac{1}{\alpha}$ and $\frac{1}{\beta}$ are the two roots of the equation $1 - x - x^2 = 0$, so

$$\alpha = \frac{1 + \sqrt{5}}{2}, \qquad \beta = \frac{1 - \sqrt{5}}{2}.$$

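Write

$$\frac{1}{1 - x - x^2} = \frac{1}{(1 - \alpha x)(1 - \beta x)} = \frac{A}{1 - \alpha x} + \frac{B}{1 - \beta x};$$

then we must have

$$A(1 - \beta x) + B(1 - \alpha x) = 1.$$

Equating coefficients of $1$ and $x$ on both sides, we get

$$A + B = 1, \qquad -\beta A - \alpha B = 0.$$

Solving this pair of simultaneous equations gives

$$A = \frac{\alpha}{\alpha - \beta} = \frac{\alpha}{\sqrt{5}}, \qquad B = -\frac{\beta}{\alpha - \beta} = -\frac{\beta}{\sqrt{5}}.$$

Hence,

$$F(x) = \frac{1}{1 - x - x^2} = \frac{1}{(1 - \alpha x)(1 - \beta x)} = \frac{\alpha/\sqrt{5}}{1 - \alpha x} - \frac{\beta/\sqrt{5}}{1 - \beta x}.$$

We can now expand the right-hand side as a power series, using the binomial theorem: we get

$$\begin{aligned} F(x) &= \frac{\alpha}{\sqrt{5}} \sum_{n=0}^{\infty} (\alpha x)^n - \frac{\beta}{\sqrt{5}} \sum_{n=0}^{\infty} (\beta x)^n \\ &= \sum_{n=0}^{\infty} \frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} x^n. \end{aligned}$$

Equating coefficients in $F(x)$, we obtain

$$F_n = \frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} \quad \forall n \geq 0,$$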

the same formula as we obtained before.
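The formula can be compared numerically with the recurrence. The sketch below evaluates the closed form in floating point and rounds to the nearest integer, which is reliable only for moderate $n$ (the range tested here is an arbitrary choice); it uses the convention $F_0 = F_1 = 1$ of these notes.

```python
from math import sqrt


def fib_by_recurrence(n_terms):
    """First n_terms Fibonacci numbers with F_0 = F_1 = 1."""
    fib = [1, 1]
    for n in range(2, n_terms):
        fib.append(fib[n - 1] + fib[n - 2])
    return fib[:n_terms]


def fib_closed_form(n):
    """Evaluate (alpha^(n+1) - beta^(n+1)) / sqrt(5), rounded to an integer."""
    root5 = sqrt(5)
    alpha = (1 + root5) / 2
    beta = (1 - root5) / 2
    return round((alpha ** (n + 1) - beta ** (n + 1)) / root5)


print(fib_by_recurrence(10))
print([fib_closed_form(n) for n in range(10)])
```

Both lines print the same ten values, starting $1, 1, 2, 3, 5, 8$.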

Exponential Generating Series

There is another type of generating series which can be more useful for some combinatorial problems. This is called the exponential generating series of a sequence. Let $a_0, a_1, a_2, \ldots$ be a sequence of real numbers. Its exponential generating series is the power series

$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n.$$

It is called the exponential generating series because of the close relation with the exponential function

$$\exp(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!}.$$

Note that the generating series

$$A(x) = \sum_{n=0}^{\infty} a_n x^n$$

we worked with before is often called the ordinary generating series for $(a_n)$, to distinguish it from the exponential generating series. Like the ordinary generating series, the exponential generating series is just defined to be a formal power series, rather than a function of $x$ (although in many applications later on, it will be useful to regard it as a function of $x$).

Taking the derivative of an exponential generating series has a particularly simple interpretation. Let $a_0, a_1, a_2, \ldots$ be a sequence of real numbers, and let

$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n$$

be its exponential generating series. Then

$$\begin{aligned} A_e'(x) &= \sum_{n=1}^{\infty} \frac{n a_n}{n!} x^{n-1} \\ &= \sum_{n=1}^{\infty} \frac{a_n}{(n-1)!} x^{n-1} \\ &= \sum_{m=0}^{\infty} \frac{a_{m+1}}{m!} x^m, \end{aligned}$$

which is just the exponential generating series for the sequence $(b_n)$ defined by $b_n = a_{n+1}$ for all $n \geq 0$, i.e., the sequence you get by shifting the original sequence left by 1. Similarly, the $r$th derivative of $A_e(x)$ is the exponential generating series for the sequence you get by shifting $(a_n)$ to the left by $r$ places.

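Multiplying two exponential generating series also has a nice combinatorial interpretation. Let $a_0, a_1, a_2, \ldots$ and $b_0, b_1, b_2, \ldots$ be sequences of real numbers, and let

$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n, \qquad B_e(x) = \sum_{n=0}^{\infty} \frac{b_n}{n!} x^n$$

be their exponential generating series. Then their product is the exponential generating series $C_e(x)$ for the sequence $c_0, c_1, c_2, \ldots$, where

$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$

Indeed, we have

$$\begin{aligned} A_e(x) B_e(x) &= \left(\sum_{m=0}^{\infty} \frac{a_m}{m!} x^m\right)\left(\sum_{m=0}^{\infty} \frac{b_m}{m!} x^m\right) \\ &= \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{a_k b_{n-k}}{k!(n-k)!} x^n \\ &= \sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{k=0}^{n} \frac{n!}{k!(n-k)!} a_k b_{n-k}\right) x^n \\ &= \sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}\right) x^n \\ &= \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n \\ &= C_e(x). \end{aligned}$$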
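The product rule for exponential generating series amounts to a 'binomial convolution' of the two sequences. The sketch below computes it on finite lists of leading terms; the function name is an illustrative choice.

```python
from math import comb


def binomial_convolution(a, b):
    """Return c_n = sum_{k=0}^{n} C(n, k) * a_k * b_{n-k} for the terms both lists supply."""
    n_terms = min(len(a), len(b))
    return [sum(comb(n, k) * a[k] * b[n - k] for k in range(n + 1))
            for n in range(n_terms)]


# a_n = b_n = 1 has exponential generating series exp(x) in both cases,
# so the product exp(x) * exp(x) = exp(2x) should correspond to c_n = 2^n.
print(binomial_convolution([1] * 6, [1] * 6))
```

The output $1, 2, 4, 8, 16, 32$ reflects the identity $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.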

If $(a_n)$ and $(b_n)$ are combinatorial sequences, what is the interpretation of the sequence $(c_n)$? Suppose $\mathcal{A}$ and $\mathcal{B}$ are families of structures. For each $n \geq 0$, let $a_n$ be the number of structures in $\mathcal{A}$ which contain $n$ points, and let $b_n$ be the number of structures in $\mathcal{B}$ which contain $n$ points. Suppose we now create a new family of structures, $\mathcal{C}$, as follows. For each pair of structures $(A, B)$ with $A \in \mathcal{A}$ and $B \in \mathcal{B}$, we place $A$ and $B$ side-by-side (without overlap) to create a new structure, and then we relabel the points. Let $\mathcal{C}$ be the set of all structures we can produce in this way, and for each $n \geq 0$, let $c_n$ be the number of structures in $\mathcal{C}$ which contain $n$ points. Often, $c_n$ will be given by

$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$

Let's see an example where this happens. For each $n \geq 3$, let $a_n$ be the number of permutations in $S_n$ which have a single cycle of length $n$, and define $a_0 = a_1 = a_2 = 0$. For each $n \geq 0$, define $b_n$ to be the number of permutations in $S_n$ which only have cycles of lengths 1 and 2 (so $b_0 = 1$, counting the empty permutation). Then

$$\sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}$$

is the number of permutations in $S_n$ which have exactly one cycle of length greater than 2. (Choose a length $k$ for the cycle of length greater than 2. Then choose which $k$ numbers go in this cycle: $\binom{n}{k}$ choices. Then choose how these $k$ numbers are arranged in the cycle: $a_k$ choices. Finally, choose the cycles formed by the other $n - k$ numbers: $b_{n-k}$ choices.)
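The permutation example can be confirmed by brute force for small $n$. The sketch below lists all permutations of $\{0, \ldots, n-1\}$, computes their cycle lengths, and compares the direct count with the binomial-convolution formula. It assumes the standard fact that there are $(k-1)!$ cyclic arrangements of $k$ points, and the convention $b_0 = 1$; all function names are illustrative.

```python
from itertools import permutations
from math import comb, factorial


def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple mapping i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths


def a(n):
    """Permutations of n points forming a single n-cycle (zero for n < 3, by definition)."""
    return factorial(n - 1) if n >= 3 else 0


def b(n):
    """Permutations of n points with all cycles of length 1 or 2, counted by brute force."""
    return sum(1 for p in permutations(range(n))
               if all(length <= 2 for length in cycle_lengths(p)))


def c_formula(n):
    return sum(comb(n, k) * a(k) * b(n - k) for k in range(n + 1))


def c_brute_force(n):
    """Permutations of n points with exactly one cycle of length greater than 2."""
    return sum(1 for p in permutations(range(n))
               if sum(length > 2 for length in cycle_lengths(p)) == 1)


for n in range(7):
    print(n, c_formula(n), c_brute_force(n))
```

The case $n = 6$ is a useful test, because permutations made of two 3-cycles must be excluded from both counts.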


Like ordinary generating series, exponential generating series are useful for solving recurrence relations. Whether they are better or worse than ordinary generating series depends upon the form of the recurrence relation.

Recall that the Fibonacci numbers satisfy $F_0 = F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2$. Since exponential generating series behave nicely when we shift a sequence to the left, they give us another method of obtaining the general formula for the Fibonacci numbers, which is easier in some ways (and harder in others) than the method using ordinary generating series.

Example 29. Use exponential generating series to obtain again the general formula for the Fibonacci numbers.

Answer: recall that the Fibonacci numbers satisfy $F_0 = F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2$. Let

$$f(x) = \sum_{n=0}^{\infty} \frac{F_n}{n!} x^n$$

denote the exponential generating series for $(F_n)$. We have

$$F_{n+2} = F_{n+1} + F_n \quad \forall n \geq 0,$$

and taking the exponential generating series of both sides of this equation gives:

$$\sum_{n=0}^{\infty} \frac{F_{n+2}}{n!} x^n = \sum_{n=0}^{\infty} \frac{F_{n+1}}{n!} x^n + \sum_{n=0}^{\infty} \frac{F_n}{n!} x^n.$$

Therefore,

$$f''(x) = f'(x) + f(x). \tag{2.2}$$

Let us now consider $f(x)$ as a function of $x$, so $f$ is a function from $\mathbb{R}$ to $\mathbb{R}$. Equation (2.2) is a second-order linear differential equation for $f$, and we know how to solve these. Let us try to solve it, to obtain an explicit expression for $f$ as a function of $x$. As always with second-order, linear differential equations, we try a solution of the form

$$f(x) = e^{tx}.$$

Substituting this into (2.2) yields:

$$t^2 e^{tx} = t e^{tx} + e^{tx}.$$

Rearranging,

$$(t^2 - t - 1) e^{tx} = 0.$$

Cancelling the factor of $e^{tx}$ gives

$$t^2 - t - 1 = 0.$$


This has roots $t = \alpha, \beta$ where $\alpha = \frac{1 + \sqrt{5}}{2}$ and $\beta = \frac{1 - \sqrt{5}}{2}$. Therefore, the general solution of (2.2) is

$$f(x) = A e^{\alpha x} + B e^{\beta x}. \tag{2.3}$$

We now want to use our initial conditions to find $A$ and $B$. We can do this easily by finding the values of $f(0)$ and $f'(0)$. Note that if we evaluate the function $f(x)$ at zero, we get $f(0) = F_0 = 1$. Note also that

$$f'(x) = \sum_{n=0}^{\infty} \frac{F_{n+1}}{n!} x^n,$$

so $f'(0) = F_1 = 1$. Substituting these values into (2.3) yields the pair of simultaneous equations

$$\begin{aligned} A + B &= 1, \\ \alpha A + \beta B &= 1. \end{aligned}$$

Multiplying the first equation by $\beta$ gives

$$\begin{aligned} \beta A + \beta B &= \beta, \\ \alpha A + \beta B &= 1. \end{aligned}$$

Subtracting gives

$$(\alpha - \beta) A = 1 - \beta = \alpha \quad \Rightarrow \quad A = \alpha/\sqrt{5}.$$

Similarly, $B = -\beta/\sqrt{5}$.

Therefore, we have the following closed-form expression for the exponential generating series:

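$$f(x) = \frac{\alpha}{\sqrt{5}} e^{\alpha x} - \frac{\beta}{\sqrt{5}} e^{\beta x}.$$

Now let us expand the right-hand side as a power series; we obtain

$$f(x) = \frac{\alpha}{\sqrt{5}} \sum_{n=0}^{\infty} \frac{(\alpha x)^n}{n!} - \frac{\beta}{\sqrt{5}} \sum_{n=0}^{\infty} \frac{(\beta x)^n}{n!} = \sum_{n=0}^{\infty} \left(\frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} \cdot \frac{1}{n!}\right) x^n,$$

so

$$\sum_{n=0}^{\infty} \frac{F_n}{n!} x^n = f(x) = \sum_{n=0}^{\infty} \left(\frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} \cdot \frac{1}{n!}\right) x^n.$$

Equating coefficients of $x^n$ on both sides gives

$$F_n = \frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} \quad \forall n \geq 0,$$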

giving yet another derivation of our general formula for the Fibonacci numbers.
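As a numerical illustration of Example 29, the closed-form expression for $f(x)$ can be compared with a truncated version of the series $\sum F_n x^n / n!$ at a sample point. The evaluation point, the truncation length and the function names below are arbitrary choices made for this sketch.

```python
from math import exp, factorial, sqrt


def egf_closed_form(x):
    """Evaluate (alpha * e^(alpha x) - beta * e^(beta x)) / sqrt(5)."""
    root5 = sqrt(5)
    alpha = (1 + root5) / 2
    beta = (1 - root5) / 2
    return (alpha * exp(alpha * x) - beta * exp(beta * x)) / root5


def egf_truncated_series(x, terms=40):
    """Evaluate sum_{n < terms} F_n x^n / n!, with F_0 = F_1 = 1."""
    fib = [1, 1]
    while len(fib) < terms:
        fib.append(fib[-1] + fib[-2])
    return sum(fib[n] * x ** n / factorial(n) for n in range(terms))


x = 0.7
print(egf_closed_form(x), egf_truncated_series(x))
```

The two printed values agree to many decimal places, consistent with the series converging for every real $x$.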


Remark 8. Note that in this derivation, we actually considered the exponential generating series $f(x)$ as a function of $x$. When we consider (ordinary or exponential) generating series as functions from $\mathbb{R}$ to $\mathbb{R}$, in order for our proofs to work, we need the series to converge in some neighbourhood of zero. With most of the generating series we work with, this will be true, although you may recall that the ordinary generating series for $n!$,

$$\sum_{n=0}^{\infty} n!\, x^n,$$

converges only at $x = 0$, so it cannot be considered as a function of $x$. When you are answering exam questions using generating series, you will not be required to check convergence. (In the above example, the power series all converge for all $x \in \mathbb{R}$!)

Remark 9. Notice that this derivation using the exponential generating function of the Fibonacci sequence is in some ways easier, and in some ways harder, than the one using the ordinary generating function. Harder, because we have to solve a differential equation (although only one of the easiest kinds). Easier, because once we had the closed-form expression for $f(x)$, expanding it as a power series was very easy, using the power-series expansion of $e^t$. (With the ordinary generating series, we had to use partial fractions and the general Binomial Theorem.)

Now let’s see an example where the ordinary generating series is much betterthan the exponential one.

Example 30. Recall that the Catalan numbers $(C_n)$ may be defined by $C_0 = C_1 = 1$, $C_n = \sum_{k=0}^{n-1} C_k C_{n-1-k}$ for all $n \geq 2$. Use the generating series for the sequence $(C_n)$ to derive a formula for $C_n$ as a function of $n$.

Observe that the recurrence relation for $C_n$ looks very like a term in the square of the (ordinary) generating series of $(C_n)$. Let

\[ C(x) = \sum_{n=0}^{\infty} C_n x^n \]

be the generating series for $(C_n)$.


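Inspired by this observation, notice that

\[
\begin{aligned}
C(x)^2 &= \left( \sum_{n=0}^{\infty} C_n x^n \right)^2 \\
&= \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} C_k C_{n-k} \right) x^n \\
&= \sum_{n=0}^{\infty} C_{n+1} x^n \\
&= \frac{1}{x} \left( \sum_{n=0}^{\infty} C_n x^n - 1 \right) = \frac{1}{x} (C(x) - 1).
\end{aligned}
\]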

Therefore,

\[ x C(x)^2 - C(x) + 1 = 0. \]

Let us now think of $C(x)$ as a function of $x$, so for any real number $x$, $C(x)$ is just a real number. The quadratic equation above has two solutions,

\[ C(x) = \frac{1 \pm \sqrt{1 - 4x}}{2x}. \]

Notice that the solution $\frac{1 + \sqrt{1 - 4x}}{2x} \to \infty$ as $x \to 0$, whereas we have $C(0) = 1$. Therefore, we take the solution

\[ C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}. \]

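This is our ‘closed form’ expression for $C(x)$. Now let us use the binomial theorem to expand this as a power series. We have

\[
\begin{aligned}
(1 - 4x)^{1/2} &= \sum_{n=0}^{\infty} \binom{1/2}{n} (-4x)^n \\
&= 1 + \sum_{n=1}^{\infty} \frac{(\tfrac{1}{2})(-\tfrac{1}{2}) \cdots (\tfrac{3}{2} - n)}{n!} (-4x)^n \\
&= 1 + \sum_{n=1}^{\infty} \frac{(2n-3)(2n-5) \cdots (3)(1)}{n!\, 2^n} (-1)^{n-1} 4^n (-1)^n x^n \\
&= 1 - \sum_{n=1}^{\infty} \frac{(2n-2)!}{(n-1)!\, 2^{n-1}\, n!\, 2^n}\, 4^n x^n \\
&= 1 - \sum_{n=1}^{\infty} \frac{2}{n} \cdot \frac{(2n-2)!}{((n-1)!)^2}\, x^n \\
&= 1 - 2 \sum_{n=1}^{\infty} \frac{1}{n} \binom{2n-2}{n-1} x^n.
\end{aligned}
\]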

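It follows that

\[
\begin{aligned}
C(x) &= \frac{1}{2x} \left( 1 - \left( 1 - 2 \sum_{n=1}^{\infty} \frac{1}{n} \binom{2n-2}{n-1} x^n \right) \right) \\
&= \sum_{n=1}^{\infty} \frac{1}{n} \binom{2n-2}{n-1} x^{n-1} \\
&= \sum_{n=0}^{\infty} \frac{1}{n+1} \binom{2n}{n} x^n.
\end{aligned}
\]

Equating coefficients of $x^n$ in $C(x)$, we obtain

\[ C_n = \frac{1}{n+1} \binom{2n}{n} \quad \forall n \geq 0. \]

We have obtained our previous formula for the Catalan numbers. Note that the generating series approach requires none of the ingenuity of our previous method. In fact, it is useful for tackling a wide range of problems.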
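The formula is easy to test against the defining recurrence. The following is a minimal sketch (function names are illustrative, not part of the notes) that computes the first few Catalan numbers both ways.

```python
from math import comb


def catalan_by_recurrence(count):
    """Return [C_0, ..., C_{count-1}] using C_n = sum_{k=0}^{n-1} C_k C_{n-1-k}."""
    values = [1]
    for n in range(1, count):
        values.append(sum(values[k] * values[n - 1 - k] for k in range(n)))
    return values


def catalan_closed_form(n):
    """Return binom(2n, n) / (n + 1), which is always an integer."""
    return comb(2 * n, n) // (n + 1)


recurrence_values = catalan_by_recurrence(15)
closed_form_values = [catalan_closed_form(n) for n in range(15)]
```

Agreement on finitely many values is of course only a check, not a proof; the proof is the generating-series argument above.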

Remark 10. Note that in the above derivation, we are considering $C(x)$ as a function of $x$, so in order to be completely rigorous, we would need to check that $C(x)$ converges in some open neighbourhood of 0 (in fact, it converges for all $|x| < 1/4$). When you are answering a question and are using a generating series to derive a formula for a given recurrence relation, you do not need to check convergence. If we want to be completely rigorous, we can always use generating series to find the general formula, and then prove that it satisfies the recurrence relation. This is a common strategy in mathematics: we often try to ‘guess’ the answer to a question, by non-rigorous means; only once we have guessed the answer do we try to prove it rigorously.

Now, let’s see an example where it is best to use the exponential generating series.

Studying derangements via generating series

For each $n \in \mathbb{N}$, let $d_n$ denote the number of derangements of $\{1, 2, \ldots, n\}$. (Recall that a derangement of $\{1, 2, \ldots, n\}$ is a permutation of $\{1, 2, \ldots, n\}$ which has no fixed point.) We proved that

\[ d_n = n! \sum_{j=0}^{n} \frac{(-1)^j}{j!}, \]

using the inclusion-exclusion formula. We’ll now see two more proofs of this, both using the exponential generating function for the sequence $(d_n)$. For the first proof, we need the following recurrence relation for $d_n$.

Theorem 11. The derangement numbers $(d_n)$ satisfy

\[ d_n = (n-1)(d_{n-2} + d_{n-1}) \quad \forall n \geq 3. \]

Proof. If $f \in S_n$ is a derangement, then $f(n) \in \{1, 2, \ldots, n-1\}$. Our aim is to count the number of derangements with $f(n) = i$, for each $i \in \{1, 2, \ldots, n-1\}$, in terms of $d_{n-1}$ and $d_{n-2}$. If $f \in S_n$ is a derangement with $f(n) = i$, then either

(a) $f(i) = n$, i.e. $f$ swaps $n$ and $i$, or else

(b) $f(i) \neq n$.

Let $A$ denote the set of all derangements in $S_n$ with $f(n) = i$ and $f(i) = n$, and let $B$ denote the set of all derangements in $S_n$ with $f(n) = i$ but $f(i) \neq n$. Our task is to find $|A|$ and $|B|$.

Observe that if $f \in A$, then $f$ has a disjoint cycle representation of the form

\[ f = (i\ n)g, \]

where $g$ consists of a collection of disjoint cycles (of length $> 1$) involving all the numbers except for $i$ and $n$. In other words, $g$ is a derangement of $\{1, 2, \ldots, i-1, i+1, i+2, \ldots, n-1\}$. The number of choices for $g$ is simply $d_{n-2}$, the number of derangements of a set of size $n-2$, and therefore $|A| = d_{n-2}$.


We now turn our attention to $B$. Let $C$ be the set of all permutations in $S_n$ which fix $n$, but do not fix any other number; I will now construct a bijection from $B$ to $C$. Consider the map

\[ \Phi : S_n \to S_n; \quad f \mapsto (i\ n)f. \]

Note that $\Phi$ is a bijection from $S_n$ to itself. It is its own inverse, since

\[ (i\ n)(i\ n)f = f \quad \forall f \in S_n. \]

I claim that $\Phi(B) = C$. To see this, first note that if $f \in B$, and $g = \Phi(f) = (i\ n)f$, then

\[ g(n) = (i\ n)f(n) = (i\ n)(i) = n. \]

Second, note that if $f \in B$, then $\Phi(f) = (i\ n)f$ cannot fix any $j \in \{1, 2, \ldots, n-1\}$. Indeed, suppose $f \in B$ and $g = (i\ n)f$ fixes some $j \in \{1, 2, \ldots, n-1\}$. If $j = i$, then $f(i) = (i\ n)g(i) = (i\ n)(i) = n$, contradicting $f(i) \neq n$. If $j \neq i$, then

\[ f(j) = (i\ n)g(j) = (i\ n)(j) = j, \]

so $f$ also fixes $j$, a contradiction. Therefore, we have

\[ \Phi(B) \subset C. \tag{2.4} \]

Finally, a similar argument shows that if $g \in C$, then $\Phi(g) \in B$, so

\[ \Phi(C) \subset B. \]

Applying $\Phi$ to both sides, we have

\[ \Phi(\Phi(C)) \subset \Phi(B). \]

Since $\Phi$ is its own inverse, we have $\Phi(\Phi(C)) = C$. It follows that

\[ C \subset \Phi(B). \tag{2.5} \]

Combining (2.4) and (2.5) proves that $\Phi(B) = C$, as claimed.

Let $\phi : B \to C$ be the restriction of $\Phi$ to $B$; then $\phi$ is a bijection from $B$ to $C$. It follows that $|B| = |C|$. Note that $|C| = d_{n-1}$, the number of derangements of an $(n-1)$-element set. We conclude that $|B| = d_{n-1}$.

We are finally ready to prove the recurrence relation. The number of derangements satisfies

\[ d_n = (n-1)(d_{n-2} + d_{n-1}), \]

since there are $n-1$ choices for the image $i$ of $n$, and for each choice, we have $|A| = d_{n-2}$ and $|B| = d_{n-1}$.
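The recurrence can be checked against a brute-force count for small $n$. The sketch below is illustrative only (the function names are not from the notes); it uses the convention $d_0 = 1$, $d_1 = 0$, and also evaluates the inclusion-exclusion formula using exact integer arithmetic.

```python
from itertools import permutations
from math import factorial


def derangements_brute_force(n):
    """Count permutations of {0, ..., n-1} with no fixed point by enumeration."""
    return sum(1 for p in permutations(range(n)) if all(p[j] != j for j in range(n)))


def derangements_by_recurrence(count):
    """Return [d_0, ..., d_{count-1}] using d_n = (n-1)(d_{n-2} + d_{n-1})."""
    d = [1, 0]
    for n in range(2, count):
        d.append((n - 1) * (d[n - 2] + d[n - 1]))
    return d[:count]


def derangements_by_formula(n):
    """Evaluate n! * sum_{j=0}^{n} (-1)^j / j!; each term n!/j! is an integer."""
    return sum((-1) ** j * (factorial(n) // factorial(j)) for j in range(n + 1))


brute_force_values = [derangements_brute_force(n) for n in range(8)]
recurrence_values = derangements_by_recurrence(8)
formula_values = [derangements_by_formula(n) for n in range(8)]
```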


Our aim is now to use the exponential generating series to turn the recurrence relation above into a formula for $d_n$ in terms of $n$. First, it is helpful to define $d_0 = 1$; then the recurrence relation $d_n = (n-1)(d_{n-2} + d_{n-1})$ holds for all $n \geq 2$.

Let

\[ D(x) = \sum_{n=0}^{\infty} \frac{d_n}{n!} x^n \]

denote the exponential generating function of the sequence $(d_n)$.

We first rewrite our recurrence relation in a form which is easier to use with the exponential generating series:

\[ d_{n+1} = n(d_{n-1} + d_n) \quad \forall n \geq 1. \]

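We now multiply each side by $x^n/n!$ and sum over all $n \geq 1$:

\[ \sum_{n=1}^{\infty} \frac{d_{n+1}}{n!} x^n = \sum_{n=1}^{\infty} \frac{n\, d_{n-1}}{n!} x^n + \sum_{n=1}^{\infty} \frac{n\, d_n}{n!} x^n. \tag{2.6} \]

Since $d_1 = 0$, we may rewrite the left-hand side of (2.6) as

\[ \sum_{n=0}^{\infty} \frac{d_{n+1}}{n!} x^n = D'(x). \]

We rewrite the right-hand side of (2.6) as

\[
\begin{aligned}
x \sum_{n=1}^{\infty} \frac{d_{n-1}}{(n-1)!} x^{n-1} + x \sum_{n=1}^{\infty} \frac{d_n}{(n-1)!} x^{n-1} &= x \sum_{n=0}^{\infty} \frac{d_n}{n!} x^n + x \sum_{n=0}^{\infty} \frac{d_{n+1}}{n!} x^n \\
&= x D(x) + x D'(x).
\end{aligned}
\]

Therefore, we have

\[ D'(x) = x D'(x) + x D(x). \tag{2.7} \]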

Let us now regard $D(x)$ as a function of $x$. Equation (2.7) is now a differential equation, which is separable, so we can solve it! Rearranging, we have

\[ \frac{D'(x)}{D(x)} = \frac{x}{1-x} = \frac{1}{1-x} - 1. \]

Integrating both sides, we get

\[ \ln(D(x)) = -\ln(1-x) - x + C, \]

where $C$ is the constant of integration. Exponentiating both sides, we get

\[ D(x) = \frac{A e^{-x}}{1-x}, \]


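where $A = e^C$. To find $A$, note that we need $D(0) = d_0 = 1$, giving $A = 1$. Therefore,

\[ D(x) = \frac{e^{-x}}{1-x}, \]

so we have found a closed form expression for $D$. Expanding the right-hand side as a power series in powers of $x$ gives:

\[ D(x) = \left( \sum_{m=0}^{\infty} \frac{(-1)^m}{m!} x^m \right) \left( \sum_{m=0}^{\infty} x^m \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} \frac{(-1)^k}{k!} \right) x^n. \]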

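Therefore, we have

\[ \sum_{n=0}^{\infty} \frac{d_n}{n!} x^n = D(x) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} \frac{(-1)^k}{k!} \right) x^n. \]

Equating coefficients of $x^n$ on both sides, we have

\[ d_n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!} \quad \forall n \geq 0. \]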

We have re-proved our formula for the derangement numbers!

We now give another proof, which is much shorter. Observe that the derangement numbers also satisfy the equation

\[ n! = \sum_{k=0}^{n} \binom{n}{k} d_{n-k}. \]

This is because both sides count the number of permutations in $S_n$: to choose a permutation in $S_n$, we can first choose exactly how many fixed points it has, say $k \in \{0, 1, \ldots, n\}$. We can then choose which $k$ points are fixed: there are $\binom{n}{k}$ choices. The other cycles of the permutation must form a derangement of the other $n-k$ numbers; there are $d_{n-k}$ choices for this derangement. The total number of possibilities is therefore

\[ \sum_{k=0}^{n} \binom{n}{k} d_{n-k}. \]

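Recall the formula for multiplying two exponential generating series:

\[ \left( \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n \right) \left( \sum_{n=0}^{\infty} \frac{b_n}{n!} x^n \right) = \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n, \]

where

\[ c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}. \]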


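If we choose $a_n = 1$ for all $n$ and $b_n = d_n$ for all $n$, then we get $c_n = n!$ for all $n$. Therefore,

\[ \left( \sum_{n=0}^{\infty} \frac{1}{n!} x^n \right) \left( \sum_{n=0}^{\infty} \frac{d_n}{n!} x^n \right) = \sum_{n=0}^{\infty} \frac{n!}{n!} x^n. \]

Thinking of both sides of the above equation as functions, we obtain

\[ e^x D(x) = \frac{1}{1-x}, \]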

so we recover the same closed form expression for D(x) as we had above.
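Both ingredients of this shorter proof can be checked by machine for small $n$. The sketch below (illustrative names; exact arithmetic via `fractions.Fraction`) verifies the identity $n! = \sum_k \binom{n}{k} d_{n-k}$ using the product rule for exponential generating series, and checks that $d_n/n!$ equals the coefficient of $x^n$ in $e^{-x}/(1-x)$, i.e. the partial sum $\sum_{k \le n} (-1)^k/k!$.

```python
from fractions import Fraction
from math import comb, factorial


def derangement_numbers(count):
    """Return [d_0, ..., d_{count-1}] from d_n = (n-1)(d_{n-2} + d_{n-1}), d_0 = 1, d_1 = 0."""
    d = [1, 0]
    for n in range(2, count):
        d.append((n - 1) * (d[n - 2] + d[n - 1]))
    return d[:count]


def egf_product(a, b):
    """Return c with c_n = sum_k binom(n, k) a_k b_{n-k} (product of two EGFs)."""
    size = min(len(a), len(b))
    return [sum(comb(n, k) * a[k] * b[n - k] for k in range(n + 1)) for n in range(size)]


N = 10
d = derangement_numbers(N)
product = egf_product([1] * N, d)  # EGF of (1, 1, 1, ...) is e^x

# Coefficients of e^{-x} / (1 - x): partial sums of (-1)^k / k!.
partial_sums = []
running = Fraction(0)
for k in range(N):
    running += Fraction((-1) ** k, factorial(k))
    partial_sums.append(running)
```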

Composing exponential generating functions

It turns out that composing two exponential generating series has a useful combinatorial interpretation. (Composing two ordinary generating series is not so useful, in general.) This is given by the following theorem.

Theorem 12. Let

\[ A(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n, \]

where $a_0 = 0$, and let

\[ B(x) = \sum_{n=0}^{\infty} \frac{b_n}{n!} x^n. \]

Then

\[ B(A(x)) = \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n, \]

where

\[ c_n = \sum_{\pi = \{S_1, \ldots, S_k\} \in B_n} b_k\, a_{|S_1|} a_{|S_2|} \cdots a_{|S_k|}. \]

(Recall that $B_n$ denotes the set of all partitions of $\{1, 2, \ldots, n\}$.)

This has many uses. One is counting permutations with a given number of cycles of given lengths. Here is an example.

Example 31. Use exponential generating series to find an explicit formula (as a function of $n$) for the number of permutations in $S_n$ whose cycles all have odd lengths.

Answer: Define

\[ a_n = \begin{cases} (n-1)! & \text{if } n \text{ is odd}, \\ 0 & \text{if } n \text{ is even}, \end{cases} \]

and define $b_n = 1$ for all $n \geq 0$. If $c_n$ is given by the formula in Theorem 12, then we have

\[
\begin{aligned}
c_n &= \sum_{\pi = \{S_1, \ldots, S_k\} \in B_n} b_k\, a_{|S_1|} a_{|S_2|} \cdots a_{|S_k|} \\
&= \sum_{\substack{\pi = \{S_1, \ldots, S_k\} \in B_n: \\ |S_i| \text{ is odd } \forall i}} b_k\, (|S_1| - 1)!\, (|S_2| - 1)! \cdots (|S_k| - 1)!,
\end{aligned}
\]

which is precisely the number of permutations whose cycles all have odd lengths! (The partition $\pi$ of $\{1, 2, \ldots, n\}$ tells us which numbers go in the same cycles as one another; there are $(|S_i| - 1)!$ cycles we can form from the numbers in the part $S_i$.) Therefore, by Theorem 12, if $A(x)$, $B(x)$, $C(x)$ are the exponential generating series for $(a_n)$, $(b_n)$ and $(c_n)$ respectively, then we have

\[ C(x) = B(A(x)). \]

Note that $B(x) = \exp(x)$, when viewed as a function of $x$. We have

\[ A(x) = \sum_{n \geq 1,\ n \text{ odd}} \frac{(n-1)!}{n!} x^n = \sum_{n \geq 1,\ n \text{ odd}} \frac{x^n}{n}. \]

Can we write $A$ as a familiar function of $x$? Yes, we can! Notice the similarity to the power-series expansion of $\ln(1+x)$:

\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots. \]

Similarly,

\[ \ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \ldots. \]

Therefore,

\[ A(x) = \sum_{n \geq 1,\ n \text{ odd}} \frac{x^n}{n} = \tfrac{1}{2} \left( \ln(1+x) - \ln(1-x) \right) = \ln \sqrt{\frac{1+x}{1-x}}. \]

Hence,

\[ C(x) = \exp \left( \ln \sqrt{\frac{1+x}{1-x}} \right) = \sqrt{\frac{1+x}{1-x}}. \]

To find the number of permutations with all cycles of odd lengths, we now expand $C(x)$ as a power series in powers of $x$. To do this, it is easier to rewrite

\[ C(x) = \frac{\sqrt{1-x^2}}{1-x}, \]

and then to expand the numerator as a power series.


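By the general binomial theorem, we have

\[
\begin{aligned}
\sqrt{1-x^2} &= \sum_{n=0}^{\infty} \binom{1/2}{n} (-x^2)^n \\
&= 1 + \sum_{n=1}^{\infty} \frac{(\tfrac{1}{2})(-\tfrac{1}{2}) \cdots (\tfrac{3}{2} - n)}{n!} (-1)^n x^{2n} \\
&= 1 + \sum_{n=1}^{\infty} \frac{(2n-3)(2n-5) \cdots (3)(1)}{n!\, 2^n} (-1)^{2n-1} x^{2n} \\
&= 1 - \sum_{n=1}^{\infty} \frac{(2n-2)!}{(n-1)!\, 2^{n-1}\, n!\, 2^n}\, x^{2n} \\
&= 1 - \sum_{n=1}^{\infty} \frac{(2n-2)!}{(n-1)!\, n!\, 2^{2n-1}}\, x^{2n} \\
&= 1 - \sum_{n=1}^{\infty} \frac{1}{2^{2n-1} n} \binom{2n-2}{n-1} x^{2n}.
\end{aligned}
\]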

Now observe that it is easy to multiply a power series by $1/(1-x)$: for any sequence $(f_n)$ of real numbers, we have

\[ \left( \sum_{n=0}^{\infty} f_n x^n \right) \left( \frac{1}{1-x} \right) = \left( \sum_{n=0}^{\infty} f_n x^n \right) \left( \sum_{n=0}^{\infty} x^n \right) = \sum_{n=0}^{\infty} g_n x^n, \]

where

\[ g_n = \sum_{i=0}^{n} f_i. \]

In other words, if we multiply a power series by $1/(1-x)$, the coefficients of the new power series are just the partial sums of the coefficients of the old power series.
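This partial-sums rule is easy to see on a concrete example. The sketch below multiplies a truncated power series by the truncated geometric series and compares the result with running totals; the coefficient list is an arbitrary illustrative choice.

```python
from itertools import accumulate


def multiply_truncated(f, g):
    """Cauchy product of two coefficient lists, truncated to the shorter length."""
    size = min(len(f), len(g))
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(size)]


f = [3, -1, 4, 1, -5, 9]          # arbitrary coefficients f_0, ..., f_5
geometric = [1] * len(f)          # coefficients of 1 / (1 - x)
product = multiply_truncated(f, geometric)
partial_sums = list(accumulate(f))
```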

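So in our case, we have

\[ C(x) = \frac{\sqrt{1-x^2}}{1-x} = \left( 1 - \sum_{n=1}^{\infty} \frac{1}{2^{2n-1} n} \binom{2n-2}{n-1} x^{2n} \right) \left( \frac{1}{1-x} \right) = \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n, \]

where

\[ \frac{c_n}{n!} = 1 - \sum_{1 \leq i \leq n/2} \frac{1}{2^{2i-1} i} \binom{2i-2}{i-1}. \]

Hence, $c_n$, the number of permutations in $S_n$ with all cycles of odd lengths, satisfies

\[ c_n = n! \left( 1 - \sum_{1 \leq i \leq n/2} \frac{1}{2^{2i-1} i} \binom{2i-2}{i-1} \right) = n! \left( 1 - \sum_{1 \leq i \leq n/2} \frac{C_{i-1}}{2^{2i-1}} \right), \]

where $C_m = \frac{1}{m+1} \binom{2m}{m}$ denotes the $m$th Catalan number.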
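As a check on the final formula, one can count permutations with all cycles odd by brute force for small $n$ and compare. This is a sketch only, with illustrative function names; the formula is evaluated exactly using `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a tuple on {0, ..., n-1}."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths


def count_all_odd_brute_force(n):
    """Count permutations in S_n whose cycles all have odd length."""
    return sum(
        1 for p in permutations(range(n))
        if all(length % 2 == 1 for length in cycle_lengths(p))
    )


def count_all_odd_formula(n):
    """Evaluate n! * (1 - sum_{1 <= i <= n/2} C_{i-1} / 2^(2i-1))."""
    total = Fraction(1)
    for i in range(1, n // 2 + 1):
        catalan = comb(2 * i - 2, i - 1) // i  # C_{i-1}
        total -= Fraction(catalan, 2 ** (2 * i - 1))
    value = factorial(n) * total
    assert value.denominator == 1
    return value.numerator


brute_force_values = [count_all_odd_brute_force(n) for n in range(8)]
formula_values = [count_all_odd_formula(n) for n in range(8)]
```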

We’ll now use exponential generating series to prove a rather surprising theorem.

Theorem 13. If $n \geq 2$ is an even integer, then the number of permutations in $S_n$ with all cycles of even lengths is equal to the number of permutations in $S_n$ with all cycles of odd lengths.

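Proof. Define

\[ g_n = \begin{cases} n! & \text{if } n \text{ is even}, \\ 0 & \text{if } n \text{ is odd}, \end{cases} \]

and let

\[ G(x) = \sum_{n=0}^{\infty} \frac{g_n}{n!} x^n = \sum_{n \text{ even},\ n \geq 0} x^n = 1 + x^2 + x^4 + \ldots \]

be the exponential generating series for $(g_n)$.

Let $e_n$ be the number of permutations in $S_n$ with all cycles even, for each $n \geq 0$, and let

\[ o_n = \begin{cases} \text{number of permutations in } S_n \text{ with all cycles odd} & \text{if } n \text{ is even}, \\ 0 & \text{if } n \text{ is odd}. \end{cases} \]

(For convenience, we define $e_0 = o_0 = 1$.) Let

\[ E(x) = \sum_{n=0}^{\infty} \frac{e_n}{n!} x^n, \qquad O(x) = \sum_{n=0}^{\infty} \frac{o_n}{n!} x^n \]

be their exponential generating series. Our aim is to show that in fact, $E(x) = O(x)$. To do this, we’ll first show that

\[ E(x) O(x) = G(x). \]

Indeed, we have

\[ \left( \sum_{n=0}^{\infty} \frac{e_n}{n!} x^n \right) \left( \sum_{n=0}^{\infty} \frac{o_n}{n!} x^n \right) = \sum_{n=0}^{\infty} \frac{f_n}{n!} x^n, \]

where

\[ f_n = \sum_{k=0}^{n} \binom{n}{k} e_k o_{n-k}. \]

But if $n$ is even, then $f_n$ is just the number of permutations in $S_n$, namely $n!$: to choose a permutation in $S_n$, we can first choose how many numbers are in even cycles (say $k$), then choose exactly which numbers are in even cycles ($\binom{n}{k}$ choices), then choose the even cycles ($e_k$ choices), and then choose the odd cycles ($o_{n-k}$ choices).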


If $n$ is odd, then $f_n = 0$, since $e_k = 0$ if $k$ is odd, and $o_{n-k} = 0$ if $n-k$ is odd, and either $k$ or $n-k$ must be odd. So $f_n = g_n$, for all $n \geq 0$. This proves that

\[ E(x) O(x) = G(x). \]

Viewing $G(x)$ as a function of $x$, we have

\[ G(x) = 1 + x^2 + x^4 + x^6 + \ldots = \frac{1}{1-x^2}. \]

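Now we’ll express $E(x)$ as a function of $x$. To do this, we proceed much as in Example 31. Define the sequence

\[ a_n = \begin{cases} (n-1)! & \text{if } n \text{ is even and } n \geq 2, \\ 0 & \text{otherwise}, \end{cases} \]

so $a_n$ just counts even cycles; define $b_n = 1$ for all $n \geq 0$. Let

\[ A(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n, \qquad B(x) = \sum_{n=0}^{\infty} \frac{b_n}{n!} x^n \]

be their exponential generating series. Using Theorem 12, we have

\[ B(A(x)) = \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n, \]

where

\[
\begin{aligned}
c_n &= \sum_{\pi = \{S_1, \ldots, S_k\} \in B_n} b_k\, a_{|S_1|} a_{|S_2|} \cdots a_{|S_k|} \\
&= \sum_{\substack{\pi = \{S_1, \ldots, S_k\} \in B_n: \\ |S_i| \text{ is even } \forall i}} (|S_1| - 1)!\, (|S_2| - 1)! \cdots (|S_k| - 1)! \\
&= e_n,
\end{aligned}
\]

by the same argument as in Example 31. So

\[ E(x) = B(A(x)). \]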

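As before, $B(x) = \exp(x)$, when viewed as a function of $x$. Now observe that

\[
\begin{aligned}
A(x) &= \sum_{n \text{ even},\ n \geq 2} \frac{x^n}{n} \\
&= \tfrac{1}{2} \left( \sum_{n=1}^{\infty} \frac{x^n}{n} + \sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n} \right) = \tfrac{1}{2} \left( -\ln(1-x) - \ln(1+x) \right) \\
&= \ln \left( \frac{1}{\sqrt{1-x^2}} \right).
\end{aligned}
\]

Therefore,

\[ E(x) = \exp \left( \ln \left( \frac{1}{\sqrt{1-x^2}} \right) \right) = \frac{1}{\sqrt{1-x^2}}. \]

Since $E(x) O(x) = \frac{1}{1-x^2}$, we must have

\[ O(x) = \frac{1}{\sqrt{1-x^2}} = E(x). \]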

If two power series are the same as functions of $x$, then they must have the same coefficients: if

\[ C(x) = \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n, \]

then we can calculate the $n$th coefficient by differentiating the function $n$ times and setting $x = 0$:

\[ c_n = \left. \frac{d^n}{dx^n} C(x) \right|_{x=0}. \]

It follows that $e_n = o_n$ for all $n$, proving the theorem.

Remark 11. To be completely rigorous, we must note that all of the above power series converge for $-1 < x < 1$, so the above proof is indeed a valid one!

Remark 12. There is, in fact, a bijective proof of Theorem 13, but it is not a particularly nice one!
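For small even $n$, Theorem 13 can be confirmed by direct enumeration. The following sketch (illustrative names, brute force only) counts the permutations in $S_n$ with all cycles even and those with all cycles odd.

```python
from itertools import permutations


def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a tuple on {0, ..., n-1}."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths


def count_by_cycle_parity(n):
    """Return (number with all cycles even, number with all cycles odd) in S_n."""
    all_even = all_odd = 0
    for p in permutations(range(n)):
        lengths = cycle_lengths(p)
        if all(length % 2 == 0 for length in lengths):
            all_even += 1
        if all(length % 2 == 1 for length in lengths):
            all_odd += 1
    return all_even, all_odd


counts = {n: count_by_cycle_parity(n) for n in (2, 4, 6)}
```

For instance, in $S_4$ the permutations with all cycles even are the six 4-cycles and the three products of two disjoint transpositions, while those with all cycles odd are the identity and the eight 3-cycles.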

Chapter 3

Graph Theory

3.1 Introduction

A graph is one of the most basic and important objects in Combinatorics. Informally, a graph is a set of points (called vertices), together with a set of lines (called edges) where each edge joins a pair of vertices together, and each pair of vertices has at most one edge between them. Hence, the London Underground can be thought of as a graph: the vertices are the stations, and two stations have an edge between them if they are adjacent stops on one of the Underground Lines. Here is part of it:

[Figure: part of the London Underground, showing the stations Baker Street, Bond Street, Green Park, Oxford Circus, Piccadilly Circus and Regent’s Park.]

The formal mathematical definition of a graph is as follows.

Definition. A graph $G$ is a pair of sets $(V, E)$ where $E$ is a set of pairs of elements of $V$. The set $V$ is called the set of vertices of $G$ (or the vertex-set of $G$, for short), and the set $E$ is called the set of edges of $G$ (or the edge-set of $G$, for short).

For example, the picture above corresponds to the graph with vertex-set

{Baker Street, Bond Street, Green Park, Oxford Circus, Regent’s Park, Piccadilly Circus},

and edge-set

{{Baker Street, Bond Street}, {Bond Street, Green Park}, {Green Park, Piccadilly Circus}, {Piccadilly Circus, Oxford Circus}, {Oxford Circus, Regent’s Park}, {Regent’s Park, Baker Street}, {Green Park, Oxford Circus}, {Oxford Circus, Bond Street}}.

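Often, we will work with graphs whose vertex-sets are $\{1, 2, \ldots, n\}$ for some natural number $n$. Here is a picture of the graph with vertex-set $\{1, 2, 3, 4\}$ and edge-set $\{\{1, 2\}, \{2, 3\}, \{1, 3\}, \{2, 4\}\}$:

[Figure: the graph with vertex-set $\{1, 2, 3, 4\}$ and edge-set $\{\{1, 2\}, \{2, 3\}, \{1, 3\}, \{2, 4\}\}$.]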

If $G$ is a graph, we will often denote its vertex-set by $V(G)$ and its edge-set by $E(G)$.

In this course, we will only be concerned with finite graphs: graphs where the vertex-set (and therefore the edge-set) is finite. (Infinite graphs are interesting mathematical objects too, but we will not study them in this course.) From now on, ‘graph’ will always mean ‘finite graph.’

If $G$ is a finite graph, we will often write $v(G)$ for the number of vertices of $G$, and $e(G)$ for the number of edges of $G$.

Here are some of the simplest examples of graphs.

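The path $P_n$:

[Figure: the path $P_n$ on the vertices $1, 2, \ldots, n$.]

The cycle $C_n$:

[Figure: the cycle $C_n$ on the vertices $1, 2, \ldots, n$.]

The empty graph, $E_n$, consisting of $n$ vertices and no edges:

[Figure: the empty graph $E_n$ on the vertices $1, 2, \ldots, n$.]

The complete graph, $K_n$, consisting of $n$ vertices which are all joined to one another by edges:

[Figure: the complete graph $K_5$ on the vertices $1, 2, 3, 4, 5$.]

(Note that $e(K_n) = \binom{n}{2}$.)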

Notice that in the above definition of a graph, the vertices (points) are labelled, and the labels matter: the graph

[Figure: the graph $G_1$ on the vertices 1, 2, 3.]

is different to the graph

[Figure: the graph $G_2$ on the vertices 1, 2, 3.]

Some authors use the term ‘labelled graph’ for what we defined as a ‘graph’. An unlabelled graph is different: informally, it is a ‘graph’ where we do not label the vertices. Both of the graphs $G_1$ and $G_2$ above define the same unlabelled graph:

[Figure: the unlabelled graph defined by $G_1$ and $G_2$.]

Informally, an unlabelled graph is produced by ‘forgetting about the labels on the vertices of a labelled graph’. Formally, we can define unlabelled graphs as follows.

If $G$ and $H$ are labelled graphs, we say that they are isomorphic if there exists a bijection $f : V(G) \to V(H)$ such that $\{u, v\} \in E(G)$ if and only if $\{f(u), f(v)\} \in E(H)$. The bijection $f$ is then called an isomorphism from $G$ to $H$. For example, the bijection

\[ 1 \mapsto 1, \quad 2 \mapsto 3, \quad 3 \mapsto 2 \]

is an isomorphism from $G_1$ to $G_2$. (Informally, an isomorphism from $G$ to $H$ can be thought of as a way of relabelling the vertices of $G$ with the names of vertices of $H$, in such a way as to turn $G$ into $H$.)

We define a relation $\sim$ on the set of labelled graphs as follows. We say that $G \sim H$ if $G$ is isomorphic to $H$. Note that $\sim$ is an equivalence relation: it satisfies the three axioms for an equivalence relation.

Reflexive: $G \sim G$ for all graphs $G$. (The identity map on $V(G)$ is an isomorphism from $G$ to itself.)

Symmetric: If $G \sim H$, then $H \sim G$. (If $f$ is an isomorphism from $G$ to $H$, then $f^{-1}$ is an isomorphism from $H$ to $G$.)

Transitive: If $G \sim H$ and $H \sim K$, then $G \sim K$. (If $f$ is an isomorphism from $G$ to $H$, and $f'$ is an isomorphism from $H$ to $K$, then $f' \circ f$ is an isomorphism from $G$ to $K$.)

Recall that if $X$ is a set, and $\sim$ is an equivalence relation on $X$, then we can partition $X$ into $\sim$-equivalence classes. (The equivalence classes are defined by $x \sim x'$ if and only if $x$ and $x'$ are in the same equivalence class.) An unlabelled graph is defined to be a $\sim$-equivalence class of labelled graphs.

This constructs unlabelled graphs (as formal mathematical objects), starting with labelled graphs, which in turn were defined in terms of sets. It is nice to know that we can do this, but when you are solving problems, you should just think of an unlabelled graph as a graph whose vertices do not have labels.
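The definition of isomorphism translates directly into a brute-force search over all bijections, which is practical only for very small graphs. In the sketch below, the function name and the example graphs are hypothetical: the edges of $G_1$ and $G_2$ cannot be recovered from the text, so two small graphs on $\{1, 2, 3\}$ are assumed purely for illustration.

```python
from itertools import permutations


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Return a dict f: V(G) -> V(H) that is an isomorphism, or None if none exists."""
    vertices_g = sorted(vertices_g)
    vertices_h = sorted(vertices_h)
    edge_set_g = {frozenset(e) for e in edges_g}
    edge_set_h = {frozenset(e) for e in edges_h}
    if len(vertices_g) != len(vertices_h) or len(edge_set_g) != len(edge_set_h):
        return None
    # Try every bijection V(G) -> V(H) and test whether it maps E(G) onto E(H).
    for image in permutations(vertices_h):
        f = dict(zip(vertices_g, image))
        mapped = {frozenset(f[v] for v in e) for e in edge_set_g}
        if mapped == edge_set_h:
            return f
    return None


# Hypothetical example graphs on {1, 2, 3} (assumed, not taken from the figures).
example_g1 = [(1, 2), (2, 3)]
example_g2 = [(1, 3), (3, 2)]
found = find_isomorphism([1, 2, 3], example_g1, [1, 2, 3], example_g2)

# A path and a star on four vertices have the same number of edges but are not isomorphic.
path_edges = [(1, 2), (2, 3), (3, 4)]
star_edges = [(1, 2), (1, 3), (1, 4)]
not_found = find_isomorphism([1, 2, 3, 4], path_edges, [1, 2, 3, 4], star_edges)
```

Because $f$ is a bijection, checking that the image of $E(G)$ equals $E(H)$ is equivalent to the ‘if and only if’ condition in the definition.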

It is useful to have a few more definitions. If $G$ and $H$ are labelled graphs, we say that $H$ is a subgraph of $G$ if $V(H) \subset V(G)$ and $E(H) \subset E(G)$. For example, if

[Figure: a graph $G$ on the vertices 1, 2, 3, 4, and a graph $H$ on the vertices 1, 2, 4.]

then $H$ is a subgraph of $G$. We will sometimes write $H \subset G$ to mean that $H$ is a subgraph of $G$.

If $H$ is a subgraph of $G$, we say that $H$ is an induced subgraph of $G$ if whenever $u, v \in V(H)$ with $\{u, v\} \in E(G)$, we have $\{u, v\} \in E(H)$ as well. The graph $H$ (above) is not an induced subgraph of $G$ (above) because it does not contain the edge $\{1, 4\}$, but the graph

[Figure: a graph on the vertices 2, 3, 4.]

is an induced subgraph of $G$.

If $H$ is a subgraph of $G$, we say that $H$ is a spanning subgraph of $G$ if $V(H) = V(G)$.

Now we come to an important property which graphs can have. We say that a graph $G$ is connected if for any two distinct vertices $u, v \in V(G)$, there is a walk along edges of the graph which begins at $u$ and ends at $v$; in other words, there is a sequence of vertices $w_1, w_2, \ldots, w_l$ where $w_1 = u$, $w_l = v$, and $\{w_i, w_{i+1}\}$ is an edge of $G$ for all $i \in \{1, 2, \ldots, l-1\}$.

It is easy to see that if $G$ is a graph, and $u, v \in V(G)$, then there is a walk (in $G$) from $u$ to $v$ if and only if there is a path (in $G$) from $u$ to $v$. (A walk can repeat vertices, but a path cannot.) Exercise: write down a formal proof of this!

So equivalently, a graph is connected if and only if for any two distinct vertices $u, v \in V(G)$, there is a path in $G$ from $u$ to $v$.

If a graph is not connected, it is said to be disconnected.

If $G$ is a graph, then we can partition it into maximal connected subgraphs. The maximal connected subgraphs of a graph are called the components of the graph.


For example, the graph $G$ below has components $G_1$, $G_2$ and $G_3$:

[Figure: a graph $G$ on the vertices $1, 2, \ldots, 10$, with its three components $G_1$, $G_2$ and $G_3$ marked.]

Here, saying that $H$ is a ‘maximal’ connected subgraph of $G$ means that if you try to extend $H$ to a larger subgraph of $G$, you get a disconnected subgraph. In general, the term ‘maximal’ is different to the term ‘maximum’. A ‘maximum’ object means ‘one of the largest objects of its kind’. A ‘maximal’ object means ‘an object which cannot be made any larger without destroying the key property.’ In the above example, the maximum connected subgraphs of $G$ are $G_1$ and $G_2$, but the maximal connected subgraphs of $G$ are $G_1$, $G_2$ and $G_3$.

Notice that a component of a graph G is always an induced subgraph of G.By definition, two distinct components of a graph cannot share any vertices.
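The components of a graph can be found by repeatedly exploring outwards from an unvisited vertex. The sketch below uses breadth-first search; the function name and the example graph are illustrative assumptions, not taken from the figure above.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the vertex-sets of the components, each as a sorted list."""
    vertices = list(vertices)
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in sorted(vertices):
        if start in seen:
            continue
        # Everything reachable from `start` by a walk lies in its component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components


# Hypothetical example: a path on {1, 2, 3}, an edge {4, 5}, and two isolated vertices.
example_components = connected_components(range(1, 8), [(1, 2), (2, 3), (4, 5)])
```

A graph is connected precisely when this function returns a single component.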


3.2 Trees

A tree is a simple but important type of graph.

Definition. A tree is a connected graph which contains no cycles.

If a graph contains no cycles, we say that it is acyclic. So a tree is precisely a connected, acyclic graph. Notice that if $G$ is an acyclic graph, then each of its components is a connected, acyclic graph (i.e., a tree). So an acyclic graph is often called a forest.

The following lemma tells us two useful conditions which are equivalent to the condition of being a tree.

Lemma 14. Let G be a graph. The following three conditions are equivalent.

(a) G is a tree.

(b) G is a minimal connected graph.

(c) G is a maximal acyclic graph.

Remark 13. Here, saying that a graph $G$ is ‘minimal connected’ means that it is connected, but removing any edge from $G$ produces a disconnected graph. Similarly, saying that a graph $G$ is ‘maximal acyclic’ means that it is acyclic, but adding any edge to $G$ (between two vertices of $G$ with no edge of $G$ between them) produces a graph with a cycle.

Proof of Lemma 14. (a) ⇒ (b): Suppose $G$ is a tree. Then it is connected; we must show that it is a minimal connected graph. Let $e = \{u, v\}$ be an edge of $G$. Remove $e$ to produce a new graph $G'$. I claim that $G'$ is disconnected. To see this, suppose for a contradiction that $G'$ is connected; then there is a path in $G'$ from $u$ to $v$. Together with the edge $e$, this forms a cycle in $G$, contradicting the fact that $G$ is acyclic. Hence, $G$ is a minimal connected graph.

(a) ⇒ (c): Suppose $G$ is a tree. Then it is acyclic; we must show that it is a maximal acyclic graph. Let $u, v \in V(G)$ such that $\{u, v\} \notin E(G)$. Produce a new graph $G'$ by adding in the edge $\{u, v\}$. I claim that $G'$ contains a cycle. To see this, observe that since $G$ is connected, there is a path from $u$ to $v$ in $G$. Together with the new edge $\{u, v\}$, this forms a cycle in $G'$. Hence, $G$ is a maximal acyclic graph.

(b) ⇒ (a): Suppose $G$ is a minimal connected graph. We must show that it is acyclic. Suppose for a contradiction that $G$ has a cycle, $C$ say. Let $e$ be any edge of the cycle $C$. Produce a new graph $G'$ by removing $e$. I claim that the graph $G'$ is connected (this will contradict the fact that $G$ is a minimal connected graph). To see this, let $u, v \in V(G')$. Since $G$ is connected, there must be a path in $G$ from $u$ to $v$. If this path does not use the edge $e$, then it is also a path from $u$ to $v$ in $G'$. If the path does use the edge $e$, then we can produce a walk from $u$ to $v$ in $G'$ by replacing the edge $e$ by the other edges of the cycle. Therefore, $G'$ is connected, proving the claim.

(c) ⇒ (a): Suppose $G$ is a maximal acyclic graph. We must show that $G$ is connected. Let $u, v \in V(G)$; we will show that there exists a path in $G$ from $u$ to $v$. If $\{u, v\} \in E(G)$, then we are done, so we may assume that $\{u, v\} \notin E(G)$. Let $G'$ be the graph produced from $G$ by adding in the edge $\{u, v\}$. Then $G'$ contains a cycle. This cycle must contain $\{u, v\}$, otherwise $G$ itself would have had a cycle. The other edges of the cycle form a path from $u$ to $v$. Therefore, $G$ is connected.

We can use Lemma 14 to prove the following useful fact.

Theorem 15. If $G$ is connected, then $G$ has a spanning tree. (A spanning tree of $G$ is a subgraph of $G$ which contains all the vertices of $G$, and is a tree.)

Proof. Let $G'$ be a minimal connected spanning subgraph of $G$. (This means that $G'$ is a spanning subgraph of $G$, which is connected, but removing any edge from $G'$ produces a disconnected graph. Observe that $G$ is a connected spanning subgraph of itself, so there does exist at least one connected spanning subgraph of $G$, and so there must be a minimal one.) By Lemma 14, $G'$ is a tree, so it is a spanning tree, proving the theorem.
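The proof suggests a simple (if inefficient) procedure: start with $G$ and keep deleting edges for as long as the graph stays connected. The following sketch implements this idea; the function names are illustrative, and the example is the graph with edge-set $\{\{1,2\},\{2,3\},\{1,3\},\{2,4\}\}$ from Section 3.1.

```python
def is_connected(vertices, edges):
    """Return True if the graph (vertices, edges) is connected."""
    vertices = list(vertices)
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def spanning_tree_by_deletion(vertices, edges):
    """Delete edges one at a time while the graph stays connected.

    The edges that remain form a minimal connected spanning subgraph,
    which is a spanning tree by Lemma 14.
    """
    if not is_connected(vertices, edges):
        raise ValueError("the graph must be connected")
    kept = list(edges)
    for e in list(edges):
        trial = [x for x in kept if x != e]
        if is_connected(vertices, trial):
            kept = trial
    return kept


tree_edges = spanning_tree_by_deletion([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (2, 4)])
```

One pass over the edges suffices: an edge whose removal disconnected the graph at the time it was tried still disconnects the smaller graph left at the end.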

Choosing a minimal (or maximal) object with a certain property is a very important method of proof in combinatorics. Our next theorem is also proved using this kind of technique. First, we need some more definitions.

Definition. Let $G$ be a graph, and let $v$ be a vertex of $G$. The degree of $v$ is the number of edges of $G$ which meet $v$; it is denoted by $d(v)$. The neighbours of $v$ are the vertices which are joined to $v$ by edges of $G$; the neighbourhood of $v$ is the set of neighbours of $v$.

Definition. Let $G$ be a graph. If $e \in E(G)$, then $G - e$ is the graph produced from $G$ by removing the edge $e$. If $v \in V(G)$, then $G - v$ is the graph produced from $G$ by removing the vertex $v$, and all the edges which meet $v$.

Theorem 16. Let $T$ be a tree with at least two vertices. Then $T$ contains at least two leaves.

Proof. Choose any vertex $v \in T$. Let $P$ be a maximal path in $T$ which goes through $v$. (Here, ‘maximal’ means that there is no edge of $T$ which we could add to $P$ to produce a longer path. Note that, since $T$ is a connected graph with at least two vertices, $P$ has at least one edge.)

We shall prove that an endpoint of $P$ must be a leaf of $T$, that is, a vertex of degree 1. Let $a$ be an endpoint of $P$; I claim that the only neighbour of $a$ in $T$ is the vertex adjacent to $a$ on $P$. Suppose for a contradiction that there is another vertex $c$ which is also a neighbour of $a$ in $T$. If $c$ does not lie on $P$, then we could extend $P$ to $c$, contradicting the maximality of $P$. If $c$ does lie on $P$, then together with the part of $P$ between $a$ and $c$, the edge $\{a, c\}$ forms a cycle in $T$, contradicting the fact that $T$ is acyclic. This proves that $a$ has just one neighbour in $T$, so is a leaf. Since $P$ has two endpoints, $T$ has at least two leaves.

We can use this to prove the following theorem.

Theorem 17. Let T be a tree with n vertices. Then e(T ) = n− 1.

Proof. By induction on $n$. When $n = 1$, this is clear. Let $n \geq 2$, and suppose that the statement of the theorem holds for all trees with $n-1$ vertices. Let $T$ be a tree with $n$ vertices. By the previous theorem, $T$ has a leaf, $v$ say. Let $u$ be the neighbour of $v$. Remove $v$ and the edge $\{u, v\}$ to produce a new graph, $T'$. Note that $T'$ is a tree. By the induction hypothesis, we have $e(T') = n-2$, and therefore $e(T) = e(T') + 1 = n-1$. This completes the proof of the inductive step, proving the theorem.

Induction on the number of vertices (or edges) of a graph is another very frequently used method of proof in graph theory.

We can now give two other conditions which are equivalent to a graph being a tree.

Theorem 18. Let $G$ be a graph with $n$ vertices. The following three conditions are equivalent.

(a) G is a tree.

(b) G is a connected graph with e(G) = n− 1.

(c) G is an acyclic graph with e(G) = n− 1.

Proof. (a) ⇒ (b) and (c): this is Theorem 17.

(b) ⇒ (a): Let $G$ be a connected graph with $n$ vertices and $n-1$ edges. By Theorem 15, $G$ has a spanning tree, $T$ say. By Theorem 17, $T$ also has $n-1$ edges, so $G = T$, so $G$ is a tree.

(c) ⇒ (a): Let $G$ be an acyclic graph with $n$ vertices and $n-1$ edges. Now we build a new graph $G'$ (with the same set of vertices as $G$) as follows. Starting with $G$, let us produce $G'$ by trying to add as many edges as we can to $G$ without producing any cycles. The graph $G'$ will be a maximal acyclic graph, and therefore a tree, by Lemma 14. Hence, by Theorem 17, it has $n-1$ edges. But $G$ also has $n-1$ edges, so in fact, $G = G'$, so $G$ is a tree.

So far, we have mainly been interested in the properties of trees. As combinatorialists, we are also interested in how many trees there are!

Question: How many (labelled) trees are there with vertex-set {1, 2, . . . , n}?

The answer is surprisingly simple.


Theorem 19. If $n \geq 2$, then there are $n^{n-2}$ labelled trees with vertex-set $\{1, 2, \ldots, n\}$.
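Before proving this, it is reassuring to check the count for small $n$ by brute force. The sketch below (illustrative names) uses Theorem 18: a graph on $n$ vertices with $n-1$ edges is a tree if and only if it is acyclic, and acyclicity is tested with a simple union-find structure.

```python
from itertools import combinations


def is_acyclic(n, edges):
    """Return True if the given edges on vertices 1..n contain no cycle."""
    parent = list(range(n + 1))

    def find(v):
        # Follow parent pointers to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v are already joined, so this edge closes a cycle
        parent[root_u] = root_v
    return True


def count_labelled_trees(n):
    """Count the acyclic graphs on {1, ..., n} with exactly n - 1 edges."""
    all_edges = list(combinations(range(1, n + 1), 2))
    return sum(1 for chosen in combinations(all_edges, n - 1) if is_acyclic(n, chosen))


tree_counts = [count_labelled_trees(n) for n in range(2, 7)]
```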

Proof. Let $T_n$ denote the set of labelled trees with vertex-set $\{1, 2, \ldots, n\}$. Let $S_n$ denote the set of all sequences of $n-2$ numbers, where each number is an integer between 1 and $n$. We know from Chapter 1 that $|S_n| = n^{n-2}$. We shall give a bijection $f : T_n \to S_n$. (This bijection was discovered by Prüfer.)

Given a tree $T \in T_n$, we define the sequence $f(T)$ as follows. Let $i_1$ be the leaf of $T$ with the lowest label, and let $j_1$ be its neighbour in $T$. Write down $j_1$ as the first element of the sequence, and then produce a new tree ($T_2$ say) by removing the leaf $i_1$ and the edge $\{i_1, j_1\}$ from $T$. Now repeat this process on the new tree $T_2$: let $i_2$ be the leaf of $T_2$ with the lowest label, let $j_2$ be its neighbour in $T_2$, and write down $j_2$ as the second element of the sequence. Produce a new tree $T_3$ by deleting the leaf $i_2$ and the edge $\{i_2, j_2\}$. Repeat this process until you are left with a tree which has just one edge, and then stop. Since $T$ had $n-1$ edges, and we removed an edge at each stage, the sequence we produce has length $n-2$.

Here is an example:

[Figure: a tree $T$ on the vertices $1, 2, \ldots, 8$, followed by the trees obtained by removing the leaves 4, 6, 2, 1, 7 and 5 in turn, ending with the single edge $\{3, 8\}$.]

The tree $T$ above has $f(T) = (1, 2, 1, 3, 5, 3)$.

Challenge: prove that $f$ is a bijection! (There will be small prizes for correct answers to this.)
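The map $f$ is straightforward to compute. In the sketch below, the function name is illustrative, and the edge list of the example tree is an assumption: it is inferred from the sequence $(1, 2, 1, 3, 5, 3)$ and the order in which vertices disappear in the example, not read off the original picture.

```python
def prufer_sequence(n, edges):
    """Return the sequence f(T) for a tree T on {1, ..., n} given by its edge list."""
    adjacency = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    sequence = []
    for _ in range(n - 2):
        # The leaf with the lowest label, and its unique neighbour.
        leaf = min(v for v, neighbours in adjacency.items() if len(neighbours) == 1)
        neighbour = next(iter(adjacency[leaf]))
        sequence.append(neighbour)
        # Remove the leaf and its edge from the tree.
        adjacency[neighbour].remove(leaf)
        del adjacency[leaf]
    return sequence


# Assumed edge list for the example tree (inferred, see the note above).
example_edges = [(1, 4), (2, 6), (1, 2), (1, 3), (5, 7), (3, 5), (3, 8)]
example_sequence = prufer_sequence(8, example_edges)
```

Here the leaves removed are 4, 6, 2, 1, 7 and 5, in that order, and the last remaining edge is $\{3, 8\}$.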


3.3 Bipartite graphs and matchings

We now come to another important type of graph.

Definition. Let $r \in \mathbb{N}$ with $r \geq 2$. We say that a graph $G$ is $r$-partite if we can partition $V(G)$ into $r$ sets (or ‘classes’) such that none of these classes contains an edge of $G$.

For example, the following graph is 4-partite; a suitable partition into 4 classes is indicated by circles.

[Figure: a graph on the vertices $1, 2, \ldots, 9$, with a partition of its vertices into 4 classes indicated by circles.]

(It is also 3-partite; can you see why?)

A very important special case of this definition is when $r = 2$. A 2-partite graph is usually called a bipartite graph. A graph is bipartite if its vertex-set can be partitioned into two sets ($X$ and $Y$ say) such that every edge of the graph has one end in $X$ and the other in $Y$.

A tree is an example of a bipartite graph. (Can you prove this before reading on?) The complete bipartite graph $K_{m,n}$ is another example: this is a graph with bipartition $X \cup Y$, where $|X| = m$ and $|Y| = n$, where every vertex of $X$ is joined to every vertex of $Y$:

[Figure: the complete bipartite graph $K_{3,4}$, with classes labelled $X$ and $Y$, one consisting of the vertices $a, b, c$ and the other of the vertices $p, q, r, s$.]

Remarkably, there is a very simple characterization of bipartite graphs.

Theorem 20. Let $G$ be a graph. Then $G$ is bipartite if and only if it contains no odd cycles.

To prove this, we need to define the distance between two vertices in a graph.

Definition. Let $G$ be a graph, and let $u, v \in V(G)$. The distance from $u$ to $v$ is the length of the shortest path (in $G$) from $u$ to $v$; it is written $d(u, v)$.

Proof of Theorem 20. First suppose that G is bipartite; we must prove that Gcontains no odd cycle. Suppose for a contradiction that G does have an oddcycle; let v1v2 . . . v2l+1v1 be an odd cycle in G. Since G is bipartite, there is apartition V (G) = X ∪ Y such that all the edges of G go between X and Y . Byrelabelling if necessary, we may assume that v1 ∈ X. Then v2 ∈ Y , so v3 ∈ X,and so on — in general, if i is odd, then vi ∈ X, and if i is even, then vi ∈ Y .This implies that v2l+1 ∈ X — but then v1 ∈ X, a contradiction. Therefore, abipartite graph cannot contain an odd cycle.

For the other direction, suppose that G has no odd cycles; we must prove that G is bipartite. Let G_1, ..., G_N be the components of G; we shall deal with each component separately. Let us prove that G_1 is bipartite.

Choose any vertex v_0 ∈ V(G_1). For each i, let

V_i = {w ∈ V(G_1) : d(v_0, w) = i}

be the set of all vertices of G_1 which are at a distance of exactly i from v_0. If l is the maximum possible distance a vertex of G_1 can be from v_0, then

V_0 ∪ V_1 ∪ ... ∪ V_l

is a partition of V(G_1).

Firstly, observe that there are no edges of G_1 between V_i and V_j if j > i + 1. Indeed, if there was an edge of G_1 between V_i and V_j, say {a, b} ∈ E(G_1) with a ∈ V_i and b ∈ V_j, then we would have

d(v_0, b) ≤ d(v_0, a) + 1 = i + 1 < j,

contradicting b ∈ V_j. (This is true whether or not the graph G_1 has odd cycles.)


[Figure: the distance classes V_0, V_1, V_2, ..., V_i, ..., V_j, ..., V_l.]

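Secondly, observe that there are no edges of G_1 within any of the V_i. Indeed, if there was an edge {a, b} of G_1 within V_i, then there would be a shortest path of length i back to v_0 from each of its two endpoints a and b. Let w be the first vertex on the path from a which also lies on the path from b, and suppose that w ∈ V_m. Since both paths are shortest paths, each of them reaches w after exactly i − m steps, so the two path segments from a to w and from b to w, together with the edge {a, b}, form a cycle of length 2(i − m) + 1. This is an odd cycle, a contradiction.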


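We may conclude that each edge of G_1 goes between V_i and V_{i+1}, for some i. Therefore, the bipartition

X = ⋃_{i even} V_i,   Y = ⋃_{i odd} V_i

shows that G_1 is a bipartite graph. Similarly, each of the other components G_2, ..., G_N is bipartite. Taking the union of the classes X over all the components, and likewise for the classes Y, gives a bipartition of G, so G is bipartite.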

Theorem 20 makes it easy for us to decide whether a given graph is bipartite: we just need to check whether it has any odd cycles. Unfortunately, there is no such theorem for 3-partite graphs (or for r-partite graphs, for any r > 2).
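The second half of the proof of Theorem 20 is really an algorithm: split each component into distance classes from a chosen vertex, and put the even classes in X and the odd classes in Y. Here is a minimal Python sketch of that idea, using a breadth-first search to compute the parity of each distance. The function name `bipartition` and the dictionary-of-neighbour-sets representation are our own choices for illustration, not part of the notes.

```python
from collections import deque

def bipartition(adj):
    """Return a bipartition (X, Y) of the graph, or None if it has an odd cycle.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    parity = {}  # vertex -> parity of its distance from the root of its component
    for root in adj:                      # deal with each component separately
        if root in parity:
            continue
        parity[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parity:
                    parity[w] = 1 - parity[u]
                    queue.append(w)
                elif parity[w] == parity[u]:
                    return None           # an edge inside a class gives an odd cycle
    X = {v for v, p in parity.items() if p == 0}
    Y = {v for v, p in parity.items() if p == 1}
    return X, Y

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}   # a cycle of length 4
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}            # a cycle of length 3
print(bipartition(square))    # a bipartition of the 4-cycle
print(bipartition(triangle))  # None, since a triangle is an odd cycle
```

The search visits each vertex and edge a bounded number of times, so this test is fast even for large graphs, in contrast with the situation for 3-partite graphs mentioned above.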

Matchings and Hall’s Marriage Theorem

Suppose there are a certain number of women, each of whom has some male friends. Is it possible for each woman to marry a man whom she knows? (Each man is only allowed to marry one woman!) If there are four women (A, B, C and D), and four men (P, Q, R and S), and

• A knows P and Q only,


• B knows P and Q only,

• C knows P, Q, R and S,

• D knows P and Q only,

then this is not possible: between them, A, B and D know only two men (P and Q). Hall’s Marriage Theorem gives us a condition which tells us exactly when it is possible. (In fact, the kind of situation above is the only obstacle.)

The question can be phrased in terms of a bipartite graph. Let X be the set of women, and let Y be the set of men. Define a bipartite graph with bipartition X ∪ Y by joining each woman to all the men she knows; what we want to find is a set of edges (M say) in this graph, such that each vertex in X meets an edge of M, and no two edges of M share any vertex. This is called a matching.

Definition. Let G be a bipartite graph with bipartition X ∪ Y. A set of edges M in G is called a matching (from X to Y) if each vertex in X meets exactly one edge of M, and no two edges of M share any vertex.

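For example, the bold edges in the graph below form a matching from X to Y:

[Figure: a bipartite graph with classes X = {a, b, c, d, e} and Y = {p, q, r, s, t, u}; the edges of a matching from X to Y are drawn in bold.]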

Before stating Hall’s Marriage Theorem, we need some more notation. If G is a graph, and S is a set of vertices of G, we define

Γ(S) = {v ∈ V(G) : {v, s} ∈ E(G) for some s ∈ S}

to be the set of vertices joined to a vertex in S; this is sometimes called the neighbourhood of S in G.

If A is a set of vertices of G, we define G[A] to be the induced subgraph of G with vertex-set A; in other words, the subgraph of G whose vertex-set is A, and where two vertices are joined if and only if they are joined in G.

Here, then, is Hall’s Marriage Theorem.


Theorem 21 (Hall’s Marriage Theorem). Let G be a bipartite graph with bipartition X ∪ Y. Then G has a matching from X to Y if and only if

|Γ(A)| ≥ |A| ∀A ⊂ X (‘Hall’s condition’)

Proof. The forward direction is easy. Suppose G has a matching, M say. Then for any subset A ⊂ X, the edges of M join each vertex in A to a different vertex of Y. These |A| distinct vertices of Y are all in Γ(A), so |Γ(A)| ≥ |A|. Therefore, Hall’s condition holds.

Now let’s prove the other direction. We claim that any bipartite graph satisfying Hall’s condition must have a matching. We prove this claim by induction on |X|. If |X| = 0 then G trivially has a matching (the empty matching), so the claim holds when |X| = 0.¹ Now for the induction step. Let k ≥ 1, and suppose that the claim holds whenever |X| ≤ k − 1. Let G be a bipartite graph with bipartition X ∪ Y, where |X| = k, and suppose that G satisfies Hall’s condition. We must show that G has a matching from X to Y. We consider two cases:

case 1: For any subset S ⊂ X with S ≠ ∅, S ≠ X, we have |Γ(S)| > |S|;

case 2: There exists a subset T ⊂ X with T ≠ ∅, T ≠ X, with |Γ(T)| = |T|. (Such a subset T is called a critical subset.)

Suppose first that we are in case 1. Since |Γ(X)| ≥ |X| ≥ 1, G must have at least one edge. Choose u ∈ X and v ∈ Y such that {u, v} ∈ E(G). Now let G′ be the graph produced from G by removing both u and v (and all edges meeting them): in symbols, G′ = (G − u) − v. Let X′ = X \ {u} and let Y′ = Y \ {v}; then G′ is bipartite with bipartition X′ ∪ Y′. Observe that Hall’s condition holds for G′. Indeed, if B ⊂ X′ with B ≠ ∅, let Γ′(B) denote the neighbourhood of B in G′,

Γ′(B) = {y ∈ Y′ : {y, b} ∈ E(G′) for some b ∈ B}.

Since B ≠ ∅ and B ≠ X, the assumption of case 1 gives |Γ(B)| > |B|, i.e. |Γ(B)| − 1 ≥ |B|. Removing v deletes at most one vertex from the neighbourhood of B, so we have

|Γ′(B)| ≥ |Γ(B)| − 1 ≥ |B|   ∀B ⊂ X′ with B ≠ ∅.

Therefore, Hall’s condition holds for G′. So by the induction hypothesis, G′ contains a matching, M′ say. We can now combine M′ with the deleted edge {u, v} to produce a matching M in G. So G has a matching. This deals with case 1.

Suppose now that we are in case 2. Then there exists a subset T ⊂ X with T ≠ ∅, T ≠ X, with |Γ(T)| = |T|. Let G_1 = G[T ∪ Γ(T)] be the induced subgraph of G on vertex-set T ∪ Γ(T), and let G_2 = G[(X \ T) ∪ (Y \ Γ(T))] be the subgraph of G induced on the rest of the vertices of G. Our aim is to show that there is a matching in G_1 and a matching in G_2. Observe that Hall’s condition must hold in G_1, since all the edges of G which start in T must end in Γ(T), by the definition of Γ(T). So by the induction hypothesis (which applies as |T| < |X|), G_1 has a matching.

¹ If you don’t like this, you can start the induction with |X| = 1, as in the lectures.

Now we must check that Hall’s condition holds in G_2. If B ⊂ X \ T, let Γ_2(B) denote the neighbourhood of B in G_2. We must show that |Γ_2(B)| ≥ |B|. To do this, we use a trick: we consider B ∪ T. The neighbourhood Γ(B ∪ T) is the disjoint union of Γ(T) and Γ_2(B) = Γ(B) \ Γ(T), so

|Γ(B ∪ T)| = |Γ_2(B)| + |Γ(T)| = |Γ_2(B)| + |T|,

since T is a critical subset. So we can express |Γ_2(B)| in terms of the size of a neighbourhood in the original graph G, which we know all about. We have

|Γ_2(B)| = |Γ(B ∪ T)| − |T| ≥ |B ∪ T| − |T| = |B| + |T| − |T| = |B|,

using the fact that Hall’s condition holds in G (and that B and T are disjoint). Therefore, Hall’s condition holds in G_2. So by the induction hypothesis, G_2 has a matching. Put the matching in G_1 and the matching in G_2 together to produce a matching in G. So G has a matching. This deals with case 2, completing the proof of the induction step, and proving the theorem.
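For a small graph, Hall’s condition can be checked directly by running through all the subsets of X. The sketch below does this for the marriage example at the start of this subsection; the function names `neighbourhood` and `hall_violation`, and the dictionary `knows`, are our own illustrative choices. It takes exponential time in |X|, so it is only meant as a way of experimenting with the statement of the theorem.

```python
from itertools import combinations

def neighbourhood(knows, A):
    """Gamma(A): the set of vertices of Y joined to at least one vertex of A."""
    return set().union(*(knows[a] for a in A))

def hall_violation(knows):
    """Return a subset A of X with |Gamma(A)| < |A|, or None if Hall's condition holds."""
    X = sorted(knows)
    for size in range(1, len(X) + 1):
        for A in combinations(X, size):
            if len(neighbourhood(knows, A)) < size:
                return set(A)
    return None

# The example from the text: A, B and D know only P and Q.
knows = {"A": {"P", "Q"}, "B": {"P", "Q"}, "C": {"P", "Q", "R", "S"}, "D": {"P", "Q"}}
print(hall_violation(knows))   # a set of three women who know only two men

# If D also knows R, the obstacle disappears and Hall's condition holds.
knows_fixed = dict(knows, D={"P", "Q", "R"})
print(hall_violation(knows_fixed))   # None
```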

We shall now use Hall’s theorem to show that a special kind of bipartite graph always contains a matching.

We need one more piece of notation. If G is a graph, and S, T ⊂ V(G) with S ∩ T = ∅ (S and T are disjoint sets of vertices of G), then E(S, T) denotes the set of edges of G which go between S and T, and e(S, T) = |E(S, T)| denotes the number of edges of G which go between S and T.

Theorem 22. Let k be a positive integer. Suppose G is a bipartite graph with bipartition X ∪ Y, such that each vertex in X has degree at least k, and each vertex in Y has degree at most k. Then G has a matching from X to Y.

Proof. We check that Hall’s condition holds. Let A ⊂ X. We have

k|A| ≤ e(A, Y ) ≤ e(X,Γ(A)) ≤ k|Γ(A)|,

since there are at least k|A| edges coming out of A, the set of edges coming out of A is a subset of the edges going in to Γ(A), and there are at most k|Γ(A)| edges going in to Γ(A). Therefore |Γ(A)| ≥ |A|, so Hall’s condition holds, so G has a matching from X to Y.

Let us introduce some more definitions.

Definition. If G is a graph, the minimum degree of G is the minimum of the degrees of all the vertices of G; it is sometimes written as δ(G). In symbols,

δ(G) = min{d(v) : v ∈ V (G)}.

The maximum degree of G is the maximum of the degrees of all the vertices of G; it is sometimes written as ∆(G). In symbols,

∆(G) = max{d(v) : v ∈ V (G)}.


Definition. We say that a graph is k-regular if all its vertices have degree exactly k.

Theorem 22 implies that a k-regular bipartite graph always has a matching, for any positive integer k.

In fact, if G is a bipartite graph with bipartition X ∪ Y, and we know that G has a matching from X to Y, and we also know that each vertex in X has ‘high’ degree, then we can deduce that in fact, G has ‘many’ matchings from X to Y. This is the content of the following theorem, which will be useful in our study of latin squares.

Theorem 23. Let G be a bipartite graph with bipartition X ∪ Y. Suppose that G has a matching from X to Y, and that d(x) ≥ r for all x ∈ X, where r ∈ N. Then in fact, the number of matchings in G from X to Y is at least

    r!                                  if r ≤ |X|;
    r(r − 1)(r − 2) ... (r − |X| + 1)   if r > |X|.

Proof. We prove this by induction on |X|. When |X| = 1, the statement of the theorem holds: let X = {x}; then d(x) ≥ r, so there are at least r matchings from X to Y. Now for the induction step. Let k ≥ 2, and assume that the statement of the theorem holds whenever |X| ≤ k − 1. Now let G be a bipartite graph with bipartition X ∪ Y, where |X| = k. Suppose that G has a matching from X to Y, and that d(x) ≥ r for all x ∈ X. We split into two cases, as in the proof of Hall’s theorem:

case 1: For any subset S ⊂ X with S ≠ ∅, S ≠ X, we have |Γ(S)| > |S|;

case 2: There exists a subset T ⊂ X with T ≠ ∅, T ≠ X, with |Γ(T)| = |T|. (Such a subset T is called a critical subset.)

First suppose that we are in case 1. Choose any vertex u ∈ X. Now choose any vertex v ∈ Y such that {u, v} ∈ E(G), and remove u and v from G, producing the new bipartite graph G′ = (G − u) − v with bipartition X′ ∪ Y′, where X′ = X \ {u} and Y′ = Y \ {v}. (G′ is produced from G by removing u, v and all the edges meeting u or v.) If x ∈ X′, let d′(x) be the degree of x in G′. Note that d′(x) ≥ r − 1 for all x ∈ X′, since we have only deleted one vertex from Y. Moreover, as in the proof of Hall’s theorem, G′ satisfies Hall’s condition. Therefore, by Hall’s theorem, G′ must have a matching. So by the induction hypothesis (applied to G′, with r − 1 in place of r), the number of matchings in G′ is at least

    (r − 1)!                                                                   if r − 1 ≤ |X′| = |X| − 1;
    (r − 1)(r − 2) ... (r − 1 − |X′| + 1) = (r − 1)(r − 2) ... (r − |X| + 1)   if r − 1 > |X′| = |X| − 1.      (3.1)

Each one of these matchings in G′ can be extended to a matching in G by adding in the edge {u, v}. Since d(u) ≥ r, there must be at least r choices for v (and different choices of v give different matchings in G, as they contain different edges at u), so the total number of matchings in G we can produce in this way is at least r times (3.1), which is

    r!                                  if r ≤ |X|;
    r(r − 1)(r − 2) ... (r − |X| + 1)   if r > |X|.

This completes the proof of the induction step in case 1.

Now suppose that we are in case 2. Let T be a critical set. Then we have r ≤ |Γ(T)| = |T| ≤ |X|, so we must prove that G has at least r! matchings. Define G_1 = G[T ∪ Γ(T)] and G_2 = G[(X \ T) ∪ (Y \ Γ(T))], as in the proof of Hall’s theorem. Again, as in the proof of Hall’s theorem, G_1 satisfies Hall’s condition, so G_1 has a matching. For x ∈ V(G_1), let d_1(x) denote the degree of x in G_1. Every neighbour of a vertex x ∈ T lies in Γ(T), so we have d_1(x) = d(x) ≥ r for all x ∈ T. Therefore, by the induction hypothesis, G_1 has at least r! matchings (as r ≤ |T|). Also, as in the proof of Hall’s theorem, G_2 has a matching. We can use this matching in G_2 to extend any matching in G_1 to a matching in G, and different matchings in G_1 give different matchings in G, so G must have at least r! matchings as well. This completes the proof of the induction step in case 2, proving the theorem.
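To get a feel for Theorem 23, we can count matchings by brute force in a small example and compare the answer with the bound. The sketch below uses the bipartite graph on X = Y = {0, 1, 2, 3} in which x is joined to every y ≠ x, so that a matching from X to Y is the same thing as a derangement of {0, 1, 2, 3}. The names `count_matchings`, `theorem23_bound` and `derangement_graph` are hypothetical names chosen for this illustration.

```python
from itertools import permutations
from math import factorial, prod

def count_matchings(adj, Y):
    """Count the matchings from X to Y by brute force.

    adj maps each x in X to the set of its neighbours in Y.
    """
    X = sorted(adj)
    return sum(
        all(y in adj[x] for x, y in zip(X, image))
        for image in permutations(sorted(Y), len(X))   # injective maps X -> Y
    )

def theorem23_bound(r, size_x):
    """The lower bound of Theorem 23: r! if r <= |X|, else r(r-1)...(r-|X|+1)."""
    if r <= size_x:
        return factorial(r)
    return prod(range(r - size_x + 1, r + 1))

# Every vertex of X has degree 3, so we may take r = 3, and |X| = 4.
derangement_graph = {x: {y for y in range(4) if y != x} for x in range(4)}
print(count_matchings(derangement_graph, range(4)), ">=", theorem23_bound(3, 4))
```

In this example there are 9 matchings (the derangements of a 4-element set), while the theorem guarantees at least 3! = 6, so the bound holds but is not attained.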

Sometimes, we are interested in whether a bipartite graph has a partial matching of a certain size.

Definition. Let G be a graph. A partial matching of size k in G is a set of k edges of G, such that no two of these edges meet.

If G is a bipartite graph with bipartition X ∪ Y, then a partial matching of size k in G consists of a set of k edges which match k distinct vertices of X to k distinct vertices of Y. The following bipartite graph has a partial matching of size 3 (one is indicated in bold), but no matching:

[Figure: a bipartite graph with classes X and Y; a partial matching of size 3 is drawn in bold.]

We can use Hall’s theorem to deduce the following characterization of when a bipartite graph contains a partial matching of size k.

Theorem 24. Let k be a positive integer. Let G be a bipartite graph with bipartition X ∪ Y. Then G contains a partial matching of size k if and only if

|Γ(A)| ≥ |A| − (|X| − k) ∀A ⊂ X (∗).


Proof. As with Hall’s theorem, the forward direction is easy. Suppose that G contains a partial matching (M, say) of size k. Let A ⊂ X. Since M has at most |X \ A| = |X| − |A| endpoints in X \ A, it must have at least k − (|X| − |A|) endpoints in A; the other ends of these edges are all distinct neighbours of A in Y. So |Γ(A)| ≥ k − (|X| − |A|) = |A| − (|X| − k), so (∗) holds.

Now suppose that (∗) holds. (Taking A = ∅ in (∗) shows that k ≤ |X|.) Produce a new bipartite graph G′ by taking G and adding |X| − k new vertices to Y, each joined to every vertex of X. Then Hall’s condition holds in G′: if A ⊂ X is non-empty, then its neighbourhood in G′ consists of Γ(A) together with all |X| − k new vertices, so it has size |Γ(A)| + (|X| − k) ≥ |A|, by (∗). So by Hall’s theorem, G′ has a matching, M′ say. Delete all the edges of M′ which meet one of the |X| − k new vertices; there are at most |X| − k of these edges, so we are left with at least k edges. These form a partial matching of size at least k in G. So G has a partial matching of size k. This proves the theorem.
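Theorem 24 says that the size of a largest partial matching is the largest k for which (∗) holds, namely |X| minus the maximum of |A| − |Γ(A)| over all A ⊂ X. The sketch below checks this on the marriage example. The largest partial matching is found by an augmenting-path search, which is a standard technique that is not covered in these notes; the function names are our own and are only meant for illustration.

```python
from itertools import combinations

def max_partial_matching(adj):
    """Size of a largest partial matching, found by searching for augmenting paths."""
    partner = {}  # y -> the vertex x currently matched to y

    def augment(x, seen):
        # Try to match x, re-matching earlier vertices of X if necessary.
        for y in adj[x]:
            if y in seen:
                continue
            seen.add(y)
            if y not in partner or augment(partner[y], seen):
                partner[y] = x
                return True
        return False

    return sum(augment(x, set()) for x in adj)

def largest_k_satisfying_star(adj):
    """Largest k with |Gamma(A)| >= |A| - (|X| - k) for every subset A of X."""
    X = list(adj)
    worst = 0  # the maximum of |A| - |Gamma(A)|; the empty set contributes 0
    for size in range(1, len(X) + 1):
        for A in combinations(X, size):
            gamma = set().union(*(adj[a] for a in A))
            worst = max(worst, size - len(gamma))
    return len(X) - worst

knows = {"A": {"P", "Q"}, "B": {"P", "Q"}, "C": {"P", "Q", "R", "S"}, "D": {"P", "Q"}}
print(max_partial_matching(knows), largest_k_satisfying_star(knows))
```

Both numbers are 3 here: the set {A, B, D} has three vertices but only two neighbours, so at most three of the four women can be married.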

In some societies, a man can have more than one wife. And in some societies, a woman can have more than one husband! Suppose there is a society where a woman is allowed to have more than one husband, but each man can have at most one wife. Can each woman find two husbands, both of whom she knows?

In terms of bipartite graphs, what we are asking for is a set of edges from X to Y, such that each vertex of X meets exactly 2 of these edges, but each vertex of Y meets at most one of these edges. Such a set of edges is known as a 2-fold matching from X to Y. Similarly, we make the following general definition.

Definition. Let r be a positive integer. If G is a bipartite graph with bipartition X ∪ Y, an r-fold matching in G from X to Y is a set of edges of G such that each vertex of X meets exactly r of these edges, but each vertex of Y meets at most one of these edges.

We can use Hall’s theorem to deal with this situation, too.

Theorem 25. Let r be a positive integer, and let G be a bipartite graph with bipartition X ∪ Y. Then G has an r-fold matching from X to Y if and only if

|Γ(A)| ≥ r|A| ∀A ⊂ X (∗∗).

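Proof. If G has an r-fold matching from X to Y, then for any subset A ⊂ X, the edges of the r-fold matching join the vertices of A to r|A| distinct vertices of Γ(A). So |Γ(A)| ≥ r|A| for any subset A ⊂ X, so (∗∗) holds.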

Now suppose that (∗∗) holds. We produce a new bipartite graph G′ by replacing each vertex u ∈ X with r ‘copies’ of u, joined to all the same vertices as u was joined to. The graph G′ has bipartition X′ ∪ Y, where X′ consists of all the copies of vertices in X (so |X′| = r|X|). I claim that G′ satisfies Hall’s condition. Indeed, if B ⊂ X′, then let A be the set of vertices of X which B contains a copy of. Let Γ′(B) denote the neighbourhood of B in G′; then Γ′(B) = Γ(A).

Each vertex in X was copied exactly r times, so |B| ≤ r|A|. Therefore,

|Γ′(B)| = |Γ(A)| ≥ r|A| ≥ |B|,

so Hall’s condition holds in G′, as claimed. Therefore, by Hall’s theorem, G′ has a matching, M′ say. Replacing each copy of a vertex u ∈ X by u itself turns M′ into a set of edges of G in which each vertex of X meets exactly r edges and each vertex of Y meets at most one edge, i.e. an r-fold matching in the original graph G. This proves the theorem.
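The copying trick in this proof can be carried out quite literally. The sketch below builds the graph G′, looks for a matching in it using an augmenting-path search (a standard technique which is not covered in these notes), and then collapses the copies again. The function name `r_fold_matching` and the small example graph are assumptions made for the purpose of illustration.

```python
def r_fold_matching(adj, r):
    """Return {x: set of r partners of x} if an r-fold matching exists, else None.

    adj maps each x in X to the set of its neighbours in Y.
    """
    # The graph G': each vertex x is replaced by the copies (x, 0), ..., (x, r - 1).
    copies = {(x, c): adj[x] for x in adj for c in range(r)}
    partner = {}  # y -> the copy currently matched to y

    def augment(u, seen):
        for y in copies[u]:
            if y in seen:
                continue
            seen.add(y)
            if y not in partner or augment(partner[y], seen):
                partner[y] = u
                return True
        return False

    if not all(augment(u, set()) for u in copies):
        return None                      # G' has no matching from X' to Y
    result = {x: set() for x in adj}
    for y, (x, _) in partner.items():    # collapse each copy (x, c) back to x
        result[x].add(y)
    return result

example = {"a": {1, 2, 3}, "b": {2, 3, 4, 5}}
print(r_fold_matching(example, 2))   # each of a and b gets two partners of its own
print(r_fold_matching(example, 3))   # None: |Gamma(X)| = 5 < 3|X| = 6
```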

König’s Theorem is another useful consequence of Hall’s Theorem; it relates the maximum possible size of a partial matching to the minimum possible size of a vertex-cover in a bipartite graph.

Definition. Let G be a bipartite graph with bipartition X ∪ Y. A maximum partial matching in G is a partial matching from X to Y in G with the maximum possible size.

Definition. Let G be a bipartite graph with bipartition X ∪ Y. A vertex-cover in G is a set of vertices of G such that each edge of G meets at least one of these vertices. A minimum vertex-cover in G is a vertex-cover with the minimum possible number of vertices.

Here, then, is König’s theorem.

Theorem 26. Let G be a bipartite graph with bipartition X ∪ Y. Then the size of a maximum partial matching in G is equal to the size of a minimum vertex-cover in G.

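Proof. Let M be a partial matching in G with the maximum possible number of edges, and write e(M) for the number of edges of M. Let S be a minimum vertex-cover in G. Then S must contain at least one endpoint of each edge of M, and since no two edges of M share a vertex, these endpoints are all distinct. So each edge of M meets a different vertex of S, so |S| ≥ e(M).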

Now let |S| = k. Our aim is to prove that G has a partial matching of size k. I claim that

|Γ(A)| ≥ |A| − (|X| − k) ∀A ⊂ X.

(Theorem 24 will then imply that G has a partial matching of size k.) To prove the claim, observe that for any A ⊂ X, (X \ A) ∪ Γ(A) is a vertex-cover of G. Since any vertex-cover is at least as large as S (S was chosen to be a minimum vertex-cover), we must have

k ≤ |(X \ A) ∪ Γ(A)| = |X \ A| + |Γ(A)| = |X| − |A| + |Γ(A)|.

Rearranging, we get |Γ(A)| ≥ |A| − (|X| − k), as claimed. Therefore, by Theorem 24, G has a partial matching of size k. It follows that e(M) ≥ k = |S|. Hence, e(M) = |S|, proving the theorem.
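König’s theorem is easy to test by brute force on a small bipartite graph. The sketch below does this for the marriage example, running through all sets of edges and all sets of vertices; the helper names are our own, and the exhaustive search is only feasible because the graph is tiny.

```python
from itertools import combinations

def is_partial_matching(edges):
    """True if no two of the given edges share a vertex."""
    ends = [v for e in edges for v in e]
    return len(ends) == len(set(ends))

def max_matching_size(edges):
    """Largest k such that some k of the edges form a partial matching."""
    return max(
        k for k in range(len(edges) + 1)
        if any(is_partial_matching(c) for c in combinations(edges, k))
    )

def min_cover_size(edges):
    """Smallest k such that some k vertices meet every edge."""
    vertices = sorted({v for e in edges for v in e})
    return min(
        k for k in range(len(vertices) + 1)
        if any(all(x in c or y in c for x, y in edges)
               for c in map(set, combinations(vertices, k)))
    )

# The marriage example: A, B and D know P and Q only; C knows everyone.
edges = [("A", "P"), ("A", "Q"), ("B", "P"), ("B", "Q"), ("D", "P"), ("D", "Q"),
         ("C", "P"), ("C", "Q"), ("C", "R"), ("C", "S")]
print(max_matching_size(edges), min_cover_size(edges))
```

Both numbers are 3 here; for instance, {P, Q, C} is a vertex-cover of size 3.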

Chapter 4

Latin Squares

4.1 Introduction

In this chapter, we will study Latin Squares. These objects are quite simple to define, but display interesting and complicated behaviour, and the various constructions involved use techniques from both combinatorics and algebra.

Definition. A latin square of order n is an n by n array (i.e., matrix), where each entry is a symbol drawn from an alphabet of size n, and each symbol occurs exactly once in each row and exactly once in each column.

For example, here is a latin square of order 3, with symbols drawn from the alphabet {1, 2, 3}:

    1 2 3
    2 3 1
    3 1 2

The alphabet does not have to be the numbers {1, 2, ..., n}; here is a latin square of order 3 with alphabet {a, b, c}:

    c b a
    b a c
    a c b

Here is a latin square of order 4:

    1 2 3 4
    2 1 4 3
    3 4 1 2
    4 3 2 1

A natural question to ask is the following: does there exist a latin square of order n, for any positive integer n? The answer is ‘yes’: we can construct one using modular arithmetic. Let n ∈ N. We define an order-n latin square, L, as follows. The alphabet is

Zn = {0, 1, 2, . . . , n− 1},

‘the integers modulo n’. We index the rows of L with the numbers 0, 1, 2, ..., n − 1, and we index the columns of L with the numbers 0, 1, 2, ..., n − 1. We define L_{i,j} (the (i, j)th entry of L) by

L_{i,j} ≡ i + j (mod. n).

(In other words, L_{i,j} is produced by adding i and j in the Abelian group Z_n.) For example, when n = 5, we get

    L = 0 1 2 3 4
        1 2 3 4 0
        2 3 4 0 1
        3 4 0 1 2
        4 0 1 2 3

I make the following claim.

Claim. Each element of Z_n occurs at most once in each row of L and at most once in each column of L.

Proof of claim: Suppose L_{i,j} = L_{i,k}. Then i + j ≡ i + k (mod. n), so j ≡ k (mod. n), so j = k. So each element of Z_n occurs at most once in each row of L. Similarly, each element of Z_n occurs at most once in each column of L.

The above claim is equivalent to saying that L is a latin square: since the alphabet has exactly n symbols, if each symbol occurs at most once in a row of L, then each symbol must occur exactly once in that row, and similarly for a column.

Therefore, L is a latin square of order n. So we have constructed a latin square of order n, for each n.
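The construction above translates directly into a few lines of code. The following sketch builds the square L for a given n and checks the definition of a latin square; the function names `cyclic_latin_square` and `is_latin_square` are our own.

```python
def cyclic_latin_square(n):
    """The order-n latin square with entries L[i][j] = i + j (mod n)."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def is_latin_square(L):
    """True if L is an n x n array in which each of n symbols occurs
    exactly once in each row and exactly once in each column."""
    n = len(L)
    symbols = set(L[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in L)
    cols_ok = all({L[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok

for row in cyclic_latin_square(5):
    print(*row)
print(all(is_latin_square(cyclic_latin_square(n)) for n in range(1, 20)))
```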

In fact, the construction above is a special case of the following.

Exercise 8. Let G = {g_1, ..., g_n} be a finite group of order n, with multiplication ∗. We define the Cayley multiplication table of G to be the array L_G with (i, j)th entry g_i ∗ g_j. Prove that L_G is a latin square.

(Taking the group G to be Z_n, with the ‘multiplication’ operation ∗ being addition modulo n, we get the latin square L above.)

There is also a more combinatorial way of constructing latin squares. We can construct them row by row, without worrying about whether we will run into trouble in the future! In order to do this, we need to define k × n latin rectangles, for k < n.


Definition. Let n and k be positive integers with k ≤ n. A k × n latin rectangle is a k by n array (i.e., matrix), where each symbol is drawn from an alphabet of size n, and each symbol occurs exactly once in each row, and at most once in each column.

The following lemma will enable us to construct a latin square ‘row by row’.

Lemma 27. Let n and k be positive integers with k < n. Let R be a k × n latin rectangle. Then R can be extended to a (k + 1) × n latin rectangle by adding one more row. (In fact, R can be extended to at least (n − k)! different (k + 1) × n latin rectangles.)

Proof. Add a row of n empty cells to the bottom of the latin rectangle R. Let X be the set of these cells, and let Y be the set of symbols in R’s alphabet. Define a bipartite graph G with bipartition X ∪ Y as follows. Let us join the empty cell in column j to all of the symbols which have not yet occurred in column j. Notice that a matching in G is exactly what we need to extend R to a (k + 1) × n latin rectangle. I claim that G is an (n − k)-regular bipartite graph, meaning that every vertex in G has degree n − k. (We can then conclude, from Theorem 22, that G has a matching, and in fact, from Theorem 23 with r = n − k ≤ n = |X|, that it has at least (n − k)! matchings.) To prove this claim, first observe that each column of R contains exactly k symbols, so there are n − k symbols it does not contain, so each x ∈ X has degree n − k. Secondly, observe that each symbol y ∈ Y occurs exactly once in each of the k rows of R, and each time it occurs in a different column, so it occurs in exactly k columns of R, so there are exactly n − k columns where it does not appear. So each y ∈ Y has degree n − k. This proves the claim. Therefore, G has at least (n − k)! different matchings. So R can be extended to a (k + 1) × n latin rectangle in at least (n − k)! different ways.

Corollary 28. We can produce an order-n latin square with alphabet {1, 2, ..., n} by first choosing any ordering of 1, 2, ..., n for the first row, and then using the following step-by-step process. At each step, starting with a k × n latin rectangle, we can extend it to a (k + 1) × n latin rectangle, by the lemma above. If we continue for n − 1 steps, we have a latin square of order n.
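The row-by-row process can be sketched in code as follows. At each step we build the bipartite graph from the proof of Lemma 27 (columns on one side, symbols on the other) and find a matching in it. The matching is found by an augmenting-path search, which is a standard technique not covered in these notes; the function names are hypothetical.

```python
def extend_rectangle(R, n):
    """Add one row to a k x n latin rectangle R over the alphabet 1..n (k < n)."""
    # free[j] lists the symbols which have not yet occurred in column j.
    free = [[s for s in range(1, n + 1) if all(row[j] != s for row in R)]
            for j in range(n)]
    column_of = {}  # symbol -> the column it is currently matched to

    def augment(j, seen):
        for s in free[j]:
            if s in seen:
                continue
            seen.add(s)
            if s not in column_of or augment(column_of[s], seen):
                column_of[s] = j
                return True
        return False

    for j in range(n):
        if not augment(j, set()):
            raise ValueError("R is not a latin rectangle")  # cannot happen, by Lemma 27
    new_row = [0] * n
    for s, j in column_of.items():
        new_row[j] = s
    return R + [new_row]

def latin_square_from_first_row(first_row):
    """Build a latin square of order n, starting from any ordering of 1..n."""
    n = len(first_row)
    R = [list(first_row)]
    while len(R) < n:          # n - 1 extension steps
        R = extend_rectangle(R, n)
    return R

for row in latin_square_from_first_row([3, 1, 4, 2]):
    print(*row)
```

Which latin square is produced depends on which matching the search happens to find at each step; the lemma only guarantees that some matching always exists.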

Corollary 29. There are at least n!(n − 1)! ... 2!1! = ∏_{k=0}^{n−1} (n − k)! different latin squares of order n with alphabet {1, 2, ..., n}.

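Proof. If we construct latin squares using the previous corollary, there are n! choices for the first row (we must just choose any ordering of 1, 2, ..., n, i.e. any permutation in S_n), and for the kth step in the above process (1 ≤ k ≤ n − 1), which extends a k × n latin rectangle to a (k + 1) × n latin rectangle, there are at least (n − k)! choices. Different sequences of choices produce different latin squares, so there are at least n!(n − 1)! ... 2!1! latin squares in total.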


4.2 Orthogonal latin squares

We shall now look at orthogonal latin squares.

Definition. Let L = (L_{i,j}) and M = (M_{i,j}) be two latin squares of order n. Suppose L uses alphabet A, and M uses alphabet B. The two latin squares L and M are said to be orthogonal to one another if every ordered pair of symbols (a, b) ∈ A × B occurs exactly once when we list the ordered pairs (L_{i,j}, M_{i,j}).

For example, the two latin squares

    L = 1 2 3      M = a b c
        2 3 1          c a b
        3 1 2          b c a

are orthogonal to one another, since when we list the ordered pairs (L_{i,j}, M_{i,j}), we get the array

    (1, a) (2, b) (3, c)
    (2, c) (3, a) (1, b)
    (3, b) (1, c) (2, a)

in which each one of the 9 ordered pairs of symbols in {1, 2, 3} × {a, b, c} occurs exactly once.

Of course, if we have a pair of orthogonal latin squares, and we take one of them and replace it by a new latin square by relabelling the symbols in its alphabet, we get another pair of orthogonal latin squares. In the example above, if we relabel a as 1, b as 2, and c as 3, we get another pair of orthogonal latin squares:

    L = 1 2 3      M′ = 1 2 3
        2 3 1           3 1 2
        3 1 2           2 3 1

We can do this with both squares in a pair of orthogonal latin squares. So, ‘relabelling the symbols makes no difference’ to orthogonality.

When n = 2, there is no pair of orthogonal latin squares. The reason is as follows. Suppose that there was a pair of orthogonal latin squares of order 2. Then, by relabelling the alphabets as above, there would be a pair of orthogonal latin squares with alphabet {1, 2}. But there are only two latin squares with alphabet {1, 2}, namely

    S = 1 2      T = 2 1
        2 1          1 2

and these are not orthogonal to one another, as when we list all the pairs (S_{i,j}, T_{i,j}), we only get

    (1, 2) (2, 1)
    (2, 1) (1, 2)

so the pairs (1, 1) and (2, 2) do not appear. (Nor is either square orthogonal to itself, since pairing a square with itself only produces the pairs (1, 1) and (2, 2).) Hence, there is no pair of orthogonal latin squares of order 2.

You may remember from the first lecture that Euler asked the following question.

‘There are 6 different regiments. Each regiment has 6 soldiers, one of each of 6 different ranks. Can these 36 soldiers be arranged in a square formation so that each row and each column contains one soldier of each rank and one from each regiment?’

This is really asking whether there exists a pair of orthogonal latin squares of order 6. (Why?) Euler conjectured that the answer is ‘no’, but he could not prove this. (There are an awful lot of latin squares of order 6: at least 6!5!4!3!2! = 24,883,200, as we saw above, and in fact exactly 812,851,200, so it would take far too long to check all the possible pairs by hand.) In 1900, Tarry proved that there is no pair of orthogonal latin squares of order 6, using a clever argument with lots of different cases, but obviously (and impressively!) without the use of a computer.

Euler also made the daring conjecture that there exists no pair of orthogonal latin squares of order n, for any n which is congruent to 2 (mod. 4). In fact, this was completely false: Bose, Shrikhande and Parker proved in 1960 that there is a pair of orthogonal latin squares of order n for any n, except for n = 2 and n = 6. Often in combinatorics, ‘small-number’ behaviour can be deceptive!

Our aim is to prove an easier result: that whenever n ≡ 0, 1 or 3 (mod. 4), there exists a pair of orthogonal latin squares of order n. The easiest cases are when n ≡ 1 or 3 (mod. 4), i.e. when n is odd:

Theorem 30. If n is an odd positive integer, there is a pair of orthogonal latin squares of order n.

Proof. We use Z_n = {0, 1, ..., n − 1}, the integers modulo n, as our alphabet. Define the latin square L by

L_{i,j} ≡ i + j (mod. n),

as before, and define the matrix M by

M_{i,j} ≡ 2i + j (mod. n).

(As before, the rows and columns are both indexed by the elements of Z_n.) I claim that M is also a latin square. Indeed, if M_{i,j} = M_{i,k}, then 2i + j ≡ 2i + k (mod. n), so j ≡ k (mod. n), so j = k. Also, if M_{i,j} = M_{k,j}, then 2i + j ≡ 2k + j (mod. n), so 2i ≡ 2k (mod. n), so i ≡ k (mod. n), by multiplying both sides of the previous equation by the multiplicative inverse of 2 in Z_n. (Recall that if r ∈ Z_n, a multiplicative inverse for r is an element s ∈ Z_n such that rs = 1; an element r ∈ Z_n has a multiplicative inverse if and only if the highest common factor of r and n is 1. So 2 has a multiplicative inverse in Z_n, whenever n is odd.) So M is a latin square, as claimed.

I now claim that L and M are orthogonal. Indeed, take any (a, b) ∈ Z_n × Z_n; we must show that there exist i, j ∈ Z_n with (L_{i,j}, M_{i,j}) = (a, b). This is equivalent to

i + j ≡ a (mod. n)

2i + j ≡ b (mod. n)

which has solution i ≡ b − a (mod. n), j ≡ 2a − b (mod. n). So each pair (a, b) occurs at least once. Since there are n² possible pairs (a, b) and only n² possible entries (i, j), if each occurs at least once, then each must occur exactly once. So each pair (a, b) ∈ Z_n × Z_n occurs exactly once, so L and M are orthogonal to one another.
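The two squares in this proof are easy to generate and test by computer. The sketch below builds L and M for a given n and checks both claims; it also shows what goes wrong when n is even, where M fails to be a latin square because 2 has no multiplicative inverse. The function names are our own choices for illustration.

```python
def squares_from_theorem_30(n):
    """The arrays L[i][j] = i + j and M[i][j] = 2i + j, with entries taken mod n."""
    L = [[(i + j) % n for j in range(n)] for i in range(n)]
    M = [[(2 * i + j) % n for j in range(n)] for i in range(n)]
    return L, M

def is_latin_square(S):
    n = len(S)
    symbols = set(S[0])
    return (len(symbols) == n
            and all(set(row) == symbols for row in S)
            and all({S[i][j] for i in range(n)} == symbols for j in range(n)))

def are_orthogonal(L, M):
    """True if the n^2 ordered pairs (L[i][j], M[i][j]) are all different."""
    n = len(L)
    pairs = {(L[i][j], M[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

for n in (3, 5, 7, 9):
    L, M = squares_from_theorem_30(n)
    print(n, is_latin_square(M), are_orthogonal(L, M))

L4, M4 = squares_from_theorem_30(4)
print(4, is_latin_square(M4))   # False: column 0 of M reads 0, 2, 0, 2
```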

Our aim is now to construct a pair of orthogonal latin squares of order n, for every n ≡ 0 (mod. 4). To do this, we will use finite fields.

Recall that a field is a set F equipped with two binary operations, denoted by + and ·, such that (F, +) is an Abelian group (with identity element denoted by 0), (F \ {0}, ·) is an Abelian group (with identity element denoted by 1), and · is distributive over +. Writing out the axioms in full, this means that:

• x+ (y + z) = (x+ y) + z for all x, y, z ∈ F ;

• x+ 0 = 0 + x = x for all x ∈ F ;

• For any x ∈ F , there exists (−x) ∈ F such that x+ (−x) = (−x) + x = 0.

• x+ y = y + x for all x, y ∈ F ;

• x · (y · z) = (x · y) · z for all x, y, z ∈ F \ {0};

• x · 1 = 1 · x = x for all x ∈ F \ {0};

• For any x ∈ F \ {0}, there exists x^{−1} ∈ F such that x · x^{−1} = x^{−1} · x = 1;

• (x+ y) · z = x · z + y · z and z · (x+ y) = z · x+ z · y for all x, y, z ∈ F ;

• x · y = y · x for all x, y ∈ F .

The order of a finite field F is just the number of elements F has; it is denoted by |F|.

The simplest examples of finite fields are Z_p, the integers modulo p, for any prime p, under the usual operations of + and ×. In fact, there is a field F_{p^d} of order p^d for any prime p and any positive integer d. In particular, there is a field F_4 of order 2² = 4. (The integers modulo 4 do not form a field under + and ×; why?) The field F_4 = {0, 1, α, β} has addition and multiplication tables as follows:

    + | 0 1 α β        · | 0 1 α β
    --+--------        --+--------
    0 | 0 1 α β        0 | 0 0 0 0
    1 | 1 0 β α        1 | 0 1 α β
    α | α β 0 1        α | 0 α β 1
    β | β α 1 0        β | 0 β 1 α

If F is a finite field, and f ∈ F \ {0}, then we can define a latin square L(f) by

L(f)_{i,j} = f · i + j   (i, j ∈ F).

(Here, the alphabet is F, and the rows and columns are each indexed by the elements of F.) Let’s check that L(f) is indeed a latin square. If L(f)_{i,j} = L(f)_{i,k}, then f · i + j = f · i + k, so j = k. Similarly, if L(f)_{i,j} = L(f)_{k,j}, then f · i + j = f · k + j, so f · i = f · k, so i = k (multiplying both sides of the previous equation by f^{−1}, the multiplicative inverse of f). So each element of F occurs at most once in each row and at most once in each column, so it must occur exactly once in each row and exactly once in each column, so L(f) is indeed a latin square.

Moreover, I claim that for any two distinct elements f, g ∈ F \ {0}, L(f) and L(g) are a pair of orthogonal latin squares. To prove this, we must show that for any (a, b) ∈ F × F, there exist i, j ∈ F satisfying

L(f)i,j = a,

L(g)i,j = b

i.e.

f · i+ j = a,

g · i+ j = b.

This is just a pair of simultaneous equations (with variables i and j). We solve it as follows: subtract the first from the second to give

(g − f) · i = b − a ⇒ i = (g − f)^{−1} · (b − a)

(note that g − f ≠ 0, as f and g are distinct, so g − f has a multiplicative inverse). Substitute the value of i into the first equation to give j = a − f · i = a − f · (g − f)^{−1} · (b − a). So the pair (a, b) appears in the entry (i, j) where

i = (g − f)^{−1} · (b − a),   j = a − f · (g − f)^{−1} · (b − a).

Since there are |F|² pairs (a, b) and only |F|² entries (i, j), each pair appears exactly once. Therefore, the latin squares L(f) and L(g) are orthogonal to one another, as claimed.


If F is any finite field with |F| ≥ 3, then it must have at least two non-zero elements; choosing f and g to be any two distinct non-zero elements, we get a pair of orthogonal latin squares L(f), L(g) of order |F|. Since there is a field of order p^d for any prime p and any d ∈ N, we see that there is a pair of orthogonal latin squares of order p^d, for any prime p and any d ∈ N with p^d ≥ 3:

Theorem 31. If p is prime and d ∈ N with p^d ≥ 3, then there exists a pair of orthogonal latin squares of order p^d.

In particular, taking p = d = 2, there is a pair of orthogonal latin squares of order 4. Taking F_4 = {0, 1, α, β}, we can take f = 1 and g = α, giving

    L(1) = 0 1 α β      L(α) = 0 1 α β
           1 0 β α             α β 0 1
           α β 0 1             β α 1 0
           β α 1 0             1 0 β α

(Note that L(1)_{i,j} = 1 · i + j = i + j, so L(1) is just the addition table of F_4, which we saw above; L(α)_{i,j} = α · i + j, so we can work out the entries of L(α) using the multiplication of F_4, which we saw above.) You can see directly that L(1) and L(α) are orthogonal to each other.
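The squares L(f) over F_4 can also be computed mechanically from the two tables. In the sketch below we encode the elements 0, 1, α, β as the integers 0, 1, 2, 3; this encoding is our own assumption, chosen because the addition table of F_4 then coincides with the bitwise ‘exclusive or’ of the codes, while the multiplication table is simply copied from above.

```python
ALPHA, BETA = 2, 3   # our encoding: 0 -> 0, 1 -> 1, alpha -> 2, beta -> 3

def add(x, y):
    """Addition in F_4; under this encoding it is bitwise XOR."""
    return x ^ y

# MUL[x][y] = x . y, copied from the multiplication table of F_4.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def square(f):
    """The latin square L(f) with entries f . i + j, for a non-zero f in F_4."""
    return [[add(MUL[f][i], j) for j in range(4)] for i in range(4)]

def are_orthogonal(L, M):
    n = len(L)
    return len({(L[i][j], M[i][j]) for i in range(n) for j in range(n)}) == n * n

names = "01αβ"
for row in square(ALPHA):
    print(" ".join(names[x] for x in row))
print(are_orthogonal(square(1), square(ALPHA)))
```

In fact the three squares L(1), L(α) and L(β) are pairwise orthogonal, as the argument above shows for any two distinct non-zero elements of a finite field.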

We need one more tool to allow us to construct a pair of orthogonal latin squares of order n, for every n ≡ 0 (mod. 4). This is the product construction of latin squares.

Definition. Let A be a latin square of order m, using alphabet {1, 2, ..., m}, and let B be a latin square of order n. The product latin square A ◦ B is defined as follows. Produce m copies of B, say B_1, ..., B_m, by relabelling the alphabet of B using m different alphabets which are pairwise disjoint (meaning that no two of these m alphabets share any symbols). Now produce A ◦ B by taking A, and replacing each symbol i in A with the latin square B_i.

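For example, if

\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},
\qquad
B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix},
\]

then let us take

\[
B_1 = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix},
\qquad
B_2 = \begin{pmatrix} d & e & f \\ e & f & d \\ f & d & e \end{pmatrix}
\]

(note that we must use disjoint alphabets for $B_1$ and $B_2$, so we relabel $1, 2, 3$ as $a, b, c$ to produce $B_1$, and we relabel $1, 2, 3$ as $d, e, f$ to produce $B_2$). We get

\[
A \circ B = \begin{pmatrix} B_1 & B_2 \\ B_2 & B_1 \end{pmatrix}
= \begin{pmatrix}
a & b & c & d & e & f \\
b & c & a & e & f & d \\
c & a & b & f & d & e \\
d & e & f & a & b & c \\
e & f & d & b & c & a \\
f & d & e & c & a & b
\end{pmatrix}.
\]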

The squares $B_i$ are referred to as the ‘blocks’ of $A \circ B$. Exercise: check that $A \circ B$ is always a latin square!
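The product construction is easy to express in code. This is a sketch under one assumption: the copy $B_i$ is given the alphabet of pairs $(i, b)$, which is one convenient way of making the $m$ alphabets pairwise disjoint. The helper names are illustrative.

```python
def product_square(A, B):
    """A ∘ B: replace each symbol i of A by the copy B_i of B.

    The symbol b of B becomes (i, b) in the copy B_i, so different
    copies automatically use disjoint alphabets.
    """
    m, n = len(A), len(B)
    return [[(A[r][s], B[u][v]) for s in range(m) for v in range(n)]
            for r in range(m) for u in range(n)]

def is_latin(S):
    """True if every row and every column of S contains each symbol exactly once."""
    n = len(S)
    alphabet = set(S[0])
    rows_ok = all(set(row) == alphabet for row in S)
    cols_ok = all({S[i][j] for i in range(n)} == alphabet for j in range(n))
    return len(alphabet) == n and rows_ok and cols_ok

A = [[1, 2], [2, 1]]
B = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
P = product_square(A, B)

# Rename (1,1),(1,2),(1,3),(2,1),(2,2),(2,3) as a,b,c,d,e,f to match the example.
rows = ["".join("abcdef"[(i - 1) * 3 + (b - 1)] for (i, b) in row) for row in P]
for row in rows:
    print(row)
print(is_latin(P))
```

This checks the exercise for one example only; it is not a proof that $A \circ B$ is always a latin square.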

We can use this product construction to produce new pairs of orthogonal latin squares:

Lemma 32. Suppose $(A, C)$ is a pair of orthogonal latin squares of order $m$, and $(B, D)$ is a pair of orthogonal latin squares of order $n$. Then $(A \circ B, C \circ D)$ is a pair of orthogonal latin squares of order $mn$.

Proof. Since relabelling alphabets does not affect orthogonality, we may assume that $A$ and $C$ both use alphabet $\{1, 2, \ldots, m\}$. Let us produce $A \circ B$ by taking $m$ copies $B_1, \ldots, B_m$ of $B$ (with disjoint alphabets), and let us produce $C \circ D$ by taking $m$ copies $D_1, \ldots, D_m$ of $D$ (with disjoint alphabets). Let $X$ denote the alphabet of $A \circ B$ (which is the union of all the alphabets of $B_1, \ldots, B_m$), and let $Y$ denote the alphabet of $C \circ D$ (which is the union of all the alphabets of $D_1, \ldots, D_m$). We must prove that each ordered pair $(x, y) \in X \times Y$ appears exactly once when we list all the ordered pairs $((A \circ B)_{p,q}, (C \circ D)_{p,q})$.

Take any $x \in X$ and any $y \in Y$. Suppose that $x$ is in the alphabet of $B_i$ and that $y$ is in the alphabet of $D_j$. Since $A$ and $C$ are orthogonal, the pair of symbols $(i, j)$ appears exactly once when we list the ordered pairs $(A_{r,s}, C_{r,s})$. Suppose $(i, j)$ appears in entry $(r_0, s_0)$, so that $(A_{r_0,s_0}, C_{r_0,s_0}) = (i, j)$. Then the pair $(x, y)$ can only possibly appear in the $(r_0, s_0)$ ‘block’ when we list the ordered pairs $((A \circ B)_{p,q}, (C \circ D)_{p,q})$. Since the $(r_0, s_0)$ ‘block’ of $A \circ B$ is exactly $B_i$, and the $(r_0, s_0)$ ‘block’ of $C \circ D$ is exactly $D_j$, and $B_i$ and $D_j$ are orthogonal to one another (being relabellings of $B$ and $D$), the pair of symbols $(x, y)$ appears exactly once in the $(r_0, s_0)$ ‘block’. Therefore, each pair of symbols $(x, y) \in X \times Y$ appears exactly once when we list the ordered pairs $((A \circ B)_{p,q}, (C \circ D)_{p,q})$, so $(A \circ B, C \circ D)$ is a pair of orthogonal latin squares.

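It is helpful to illustrate this proof with an example. Let us take $m = n = 3$, and

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix},
\qquad
C = \begin{pmatrix} 3 & 1 & 2 \\ 2 & 3 & 1 \\ 1 & 2 & 3 \end{pmatrix},
\]

\[
B = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix},
\qquad
D = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix}.
\]

Then let us take copies

\[
B_1 = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix},
\qquad
B_2 = \begin{pmatrix} d & e & f \\ e & f & d \\ f & d & e \end{pmatrix},
\qquad
B_3 = \begin{pmatrix} g & h & i \\ h & i & g \\ i & g & h \end{pmatrix}
\]

and

\[
D_1 = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix},
\qquad
D_2 = \begin{pmatrix} d & e & f \\ f & d & e \\ e & f & d \end{pmatrix},
\qquad
D_3 = \begin{pmatrix} g & h & i \\ i & g & h \\ h & i & g \end{pmatrix}.
\]

The ‘blocks’ of $A \circ B$ and $C \circ D$ are as follows:

\[
A \circ B = \begin{pmatrix} B_1 & B_2 & B_3 \\ B_2 & B_3 & B_1 \\ B_3 & B_1 & B_2 \end{pmatrix},
\qquad
C \circ D = \begin{pmatrix} D_3 & D_1 & D_2 \\ D_2 & D_3 & D_1 \\ D_1 & D_2 & D_3 \end{pmatrix}.
\]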

Suppose $x$ is in the alphabet of $B_2$ and $y$ is in the alphabet of $D_3$. The only ‘block’ in which $B_2$ appears in $A \circ B$ and $D_3$ appears in $C \circ D$ is the $(3, 3)$ ‘block’. Since $B_2$ and $D_3$ are orthogonal to one another, the pair $(x, y)$ must appear exactly once in this ‘block’.
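The example can be replayed in code. As a sketch, the copies $B_i$ and $D_j$ are again modelled by tagging symbols, so the symbol $e$ of $B_2$ is written `(2, "b")` and the symbol $i$ of $D_3$ is written `(3, "c")`; this tagging and the helper names are assumptions of the sketch, not notation from the notes.

```python
def product_square(A, B):
    """A ∘ B, with the copy B_i using the alphabet of pairs (i, b)."""
    m, n = len(A), len(B)
    return [[(A[r][s], B[u][v]) for s in range(m) for v in range(n)]
            for r in range(m) for u in range(n)]

def orthogonal(P, Q):
    """True if the ordered pairs (P[i][j], Q[i][j]) are all different."""
    n = len(P)
    return len({(P[i][j], Q[i][j]) for i in range(n) for j in range(n)}) == n * n

def cells_with_pair(P, Q, x, y):
    """All positions (p, q) where P has symbol x and Q has symbol y."""
    n = len(P)
    return [(p, q) for p in range(n) for q in range(n)
            if P[p][q] == x and Q[p][q] == y]

A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
C = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]
B = [list("abc"), list("bca"), list("cab")]
D = [list("abc"), list("cab"), list("bca")]

AB = product_square(A, B)
CD = product_square(C, D)

x = (2, "b")    # a symbol in the alphabet of B_2
y = (3, "c")    # a symbol in the alphabet of D_3
cells = cells_with_pair(AB, CD, x, y)
(p, q), = cells                       # fails unless there is exactly one cell
block = (p // 3 + 1, q // 3 + 1)      # 1-indexed block containing that cell
print(orthogonal(AB, CD), block)
```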

We can use this to construct a pair of orthogonal latin squares of order $n$, for any $n \equiv 0 \pmod{4}$.

Corollary 33. If $n \equiv 0 \pmod{4}$, then there exists a pair of orthogonal latin squares of order $n$.

Proof. Write $n = 2^d q$, where $q$ is odd and $d \geq 2$. There exists a pair of orthogonal latin squares of order $2^d$ (by Theorem 31), and there exists a pair of orthogonal latin squares of order $q$ (by Theorem 30). Therefore, by Lemma 32, there exists a pair of orthogonal latin squares of order $2^d q$.
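As an illustration of the corollary, the sketch below builds a pair of orthogonal latin squares of order $12 = 2^2 \cdot 3$. It makes two assumptions that are not taken from the proof itself: $\mathbb{F}_4$ is encoded as the integers 0 to 3 (with XOR as addition), and the pair of order 3 is $L(1), L(2)$ over the field $\mathbb{Z}_3$ rather than whatever construction Theorem 30 uses. It handles only this one value of $n$.

```python
def split_power_of_two(n):
    """Write a positive integer n as 2^d * q with q odd; return (d, q)."""
    d = 0
    while n % 2 == 0:
        n //= 2
        d += 1
    return d, n

# F_4 encoded as 0, 1, 2 (= alpha), 3 (= beta); addition is bitwise XOR.
MUL4 = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

def L_f4(f):
    """L(f) over F_4: entry (i, j) is f*i + j."""
    return [[MUL4[f][i] ^ j for j in range(4)] for i in range(4)]

def L_zq(f, q):
    """L(f) over Z_q (q prime): entry (i, j) is f*i + j mod q."""
    return [[(f * i + j) % q for j in range(q)] for i in range(q)]

def product_square(A, B):
    """A ∘ B, with the copy B_i using the alphabet of pairs (i, b)."""
    m, n = len(A), len(B)
    return [[(A[r][s], B[u][v]) for s in range(m) for v in range(n)]
            for r in range(m) for u in range(n)]

def is_latin(S):
    n = len(S)
    alphabet = set(S[0])
    return (len(alphabet) == n
            and all(set(row) == alphabet for row in S)
            and all({S[i][j] for i in range(n)} == alphabet for j in range(n)))

def orthogonal(P, Q):
    n = len(P)
    return len({(P[i][j], Q[i][j]) for i in range(n) for j in range(n)}) == n * n

d, q = split_power_of_two(12)            # 12 = 2^2 * 3
A, C = L_f4(1), L_f4(2)                  # orthogonal pair of order 4
B, D = L_zq(1, q), L_zq(2, q)            # orthogonal pair of order 3
P, Q = product_square(A, B), product_square(C, D)
print(len(P), is_latin(P), is_latin(Q), orthogonal(P, Q))
```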

Sometimes, we are interested in finding a family of latin squares of order $n$, which are all orthogonal to one another. This motivates the following definition.

Definition. A family of mutually orthogonal latin squares is a family of latin squares such that any two distinct latin squares in the family are orthogonal to one another.

What is the maximum possible number of squares we can have in a family of mutually orthogonal latin squares? Recall that if $n = p^d$, where $p$ is prime and $d \in \mathbb{N}$, and $\mathbb{F}_{p^d}$ denotes a finite field of order $p^d$, then the latin squares

\[ \{ L(f) : f \in \mathbb{F}_{p^d} \setminus \{0\} \} \]

are all orthogonal to one another, so they form a family of $n - 1$ mutually orthogonal latin squares of order $n$. It turns out that we cannot have a larger family of mutually orthogonal latin squares:


Theorem 34. Let $n \in \mathbb{N}$, and let $\mathcal{A}$ be a family of mutually orthogonal latin squares of order $n$. Then $|\mathcal{A}| \leq n - 1$.

Proof. Suppose for a contradiction that there exists a family $\mathcal{A}$ of at least $n$ mutually orthogonal latin squares of order $n$. By removing some of the squares if necessary, we may produce a family $\mathcal{B}$ of $n$ mutually orthogonal latin squares of order $n$. Let $\mathcal{B} = \{B^{(1)}, \ldots, B^{(n)}\}$.

Since relabelling alphabets does not affect orthogonality, we may assume that each $B^{(i)}$ has alphabet $\{1, 2, \ldots, n\}$, and that the first row of each $B^{(i)}$ is $1, 2, \ldots, n$. Now, where can the symbol 1 go in the second row of the square $B^{(i)}$? It can never go in the $(2, 1)$-space, as the symbol 1 appears in the $(1, 1)$-space of $B^{(i)}$, and each $B^{(i)}$ is a latin square. So for each $B^{(i)}$, the symbol 1 must appear in one of the spaces $(2, 2), (2, 3), \ldots$ or $(2, n)$. But there are $n$ of these squares $B^{(i)}$, and only $n - 1$ of these spaces, so there must be two latin squares ($B^{(r)}$ and $B^{(s)}$, say) where the symbol 1 appears in the same space, $(2, j)$ say. But then the ordered pair of symbols $(1, 1)$ appears twice when we list the ordered pairs $(B^{(r)}_{p,q}, B^{(s)}_{p,q})$: once in the space $(1, 1)$ (when $p = q = 1$), and once in the space $(2, j)$ (when $p = 2$ and $q = j$). This contradicts the fact that $B^{(r)}$ and $B^{(s)}$ are orthogonal to one another, proving the theorem.
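The counting step in this proof can be seen at work in an extremal family. The sketch below assumes the field $\mathbb{Z}_5$ and the squares $L(f)_{i,j} = f \cdot i + j$ with alphabet $\{0, \ldots, 4\}$, so the symbol 0 plays the role of the symbol 1 in the proof. Each $L(f)$ already has first row $0, 1, \ldots, 4$, and the $n - 1 = 4$ squares place the symbol 0 in four different columns of the second row, using up every available space.

```python
from itertools import combinations

def L(f, p):
    """L(f) over Z_p: entry (i, j) is f*i + j mod p."""
    return [[(f * i + j) % p for j in range(p)] for i in range(p)]

def orthogonal(P, Q):
    n = len(P)
    return len({(P[i][j], Q[i][j]) for i in range(n) for j in range(n)}) == n * n

p = 5
family = [L(f, p) for f in range(1, p)]          # n - 1 = 4 squares

# The family is mutually orthogonal.
mutually_orthogonal = all(orthogonal(P, Q) for P, Q in combinations(family, 2))

# Every square has first row 0, 1, ..., p-1.
standard_first_rows = all(S[0] == list(range(p)) for S in family)

# Column (0-indexed) of the first-row-leading symbol 0 in the second row.
cols = [S[1].index(0) for S in family]
print(mutually_orthogonal, standard_first_rows, cols)
```

Since the columns in `cols` are distinct and none is column 0, an $n$th square would have to reuse one of them, which is exactly the contradiction in the proof.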

4.3 Upper bounds on the number of latin squares

How many latin squares of order $n$ are there with alphabet $\{1, 2, \ldots, n\}$? Let $L_n$ denote the set of all latin squares of order $n$, with alphabet $\{1, 2, \ldots, n\}$. Estimating $|L_n|$ accurately is an important unsolved problem in Combinatorics! The best known upper and lower bounds for $|L_n|$ are quite far apart. We saw previously (Corollary 29) that

\[ |L_n| \geq n! \, (n-1)! \, (n-2)! \cdots 2! \, 1!. \]

In this section, we shall prove some simple upper bounds on $|L_n|$. First, we have the following (very crude) upper bound.

Lemma 35. For any $n \in \mathbb{N}$, $|L_n| \leq (n!)^n$.

Proof. Let $L$ be a latin square with alphabet $\{1, 2, \ldots, n\}$. Each row of $L$ is a permutation of $\{1, 2, \ldots, n\}$ (thinking of a permutation as an ordering). If we choose a latin square row-by-row, there are $n!$ possibilities for the first row, and then there are at most $n!$ possibilities for each subsequent row. Therefore, there are at most $(n!)^n$ possibilities altogether.
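For very small $n$ the row-by-row choice in this proof can be carried out exhaustively, which gives the exact value of $|L_n|$ to compare with the lower bound of Corollary 29 and the crude upper bound of Lemma 35. This is a brute-force sketch (the function name is illustrative) and is only practical for $n \leq 4$ or so.

```python
from itertools import permutations
from math import factorial, prod

def count_latin_squares(n):
    """Count latin squares of order n on {1, ..., n}, choosing rows one at a time."""
    perms = list(permutations(range(1, n + 1)))

    def extend(rows):
        if len(rows) == n:
            return 1
        total = 0
        for row in perms:
            # A new row must differ from every earlier row in every column.
            if all(row[c] != prev[c] for prev in rows for c in range(n)):
                total += extend(rows + [row])
        return total

    return extend([])

for n in range(1, 5):
    lower = prod(factorial(k) for k in range(1, n + 1))   # n!(n-1)!...2!1!
    upper = factorial(n) ** n                             # (n!)^n
    count = count_latin_squares(n)
    assert lower <= count <= upper
    print(n, lower, count, upper)
```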

We can improve on this by noting that in fact, each row of a latin square below the first row must be a derangement of the first row, meaning that there is no column where the two rows have the same number.

First of all, notice the following fact.


Lemma 36. The number of latin squares in $L_n$ whose first row is

\[ f(1), f(2), \ldots, f(n) \]

is the same for all permutations $f \in S_n$.

Proof. The idea of the proof is just to relabel the alphabet of the latin squares. We define a bijection $\Phi_f$ from the set of all latin squares in $L_n$ with first row $1, 2, \ldots, n$ to the set of all latin squares in $L_n$ with first row $f(1), f(2), \ldots, f(n)$, as follows. If $L$ is a latin square with first row $1, 2, \ldots, n$, define $\Phi_f(L)$ to be the latin square produced from $L$ by replacing the symbol $i$ with the symbol $f(i)$, wherever $i$ occurs in $L$, for each $i$. It is clear that $\Phi_f$ is a bijection. Therefore, the number of latin squares with first row $f(1), f(2), \ldots, f(n)$ is equal to the number of latin squares with first row $1, 2, \ldots, n$, for all $f$, proving the lemma.
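The relabelling map $\Phi_f$ can be tested on small cases. The sketch below enumerates all latin squares of order 3 and 4 by brute force, groups them by first row, and checks for one permutation $f$ that relabelling sends the squares with first row $1, 2, 3$ exactly onto the squares with first row $f(1), f(2), f(3)$. The names `latin_squares` and `relabel` are illustrative.

```python
from collections import Counter
from itertools import permutations

def latin_squares(n):
    """Yield every latin square of order n on {1, ..., n} as a tuple of row tuples."""
    perms = list(permutations(range(1, n + 1)))

    def extend(rows):
        if len(rows) == n:
            yield tuple(rows)
            return
        for row in perms:
            if all(row[c] != prev[c] for prev in rows for c in range(n)):
                yield from extend(rows + [row])

    yield from extend([])

def relabel(square, f):
    """Phi_f: replace each symbol i by f(i), where f is given as (f(1), ..., f(n))."""
    return tuple(tuple(f[x - 1] for x in row) for row in square)

# Number of latin squares of order 4 with each possible first row.
by_first_row = Counter(sq[0] for sq in latin_squares(4))
print(len(by_first_row), set(by_first_row.values()))

# Phi_f maps squares with first row 1,2,3 onto squares with first row f.
f = (2, 3, 1)
all3 = list(latin_squares(3))
standard = [sq for sq in all3 if sq[0] == (1, 2, 3)]
image = {relabel(sq, f) for sq in standard}
target = {sq for sq in all3 if sq[0] == f}
print(image == target)
```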

Definition. A latin square of order $n$ is said to be row-standard if its first row is $1, 2, \ldots, n$. We let $\tilde{L}_n$ denote the set of all row-standard latin squares of order $n$.

It follows from Lemma 36 that

\[ |L_n| = n! \, |\tilde{L}_n|. \]

Notice that if $L \in \tilde{L}_n$, then any row of $L$ below row 1 must be a derangement of $\{1, 2, \ldots, n\}$, so there are at most $d_n$ choices for each row below row 1, where $d_n$ is the number of derangements of $\{1, 2, \ldots, n\}$. It follows that the number $|\tilde{L}_n|$ of row-standard latin squares satisfies

\[ |\tilde{L}_n| \leq (d_n)^{n-1}. \tag{4.1} \]

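We would like to use the estimate $d_n = [n!/e]$ to deduce that the number $|\tilde{L}_n|$ of row-standard latin squares satisfies

\[ |\tilde{L}_n| \leq (n!/e)^{n-1}. \tag{4.2} \]

(Recall that if $y$ is a real number, $[y]$ denotes the closest integer to $y$, rounded down if $y$ is of the form $m + 1/2$ for some integer $m$.)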
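The estimate $d_n = [n!/e]$ is easy to check numerically for small $n$. The sketch below computes $d_n$ from the standard recurrence $d_n = (n-1)(d_{n-1} + d_{n-2})$ with $d_0 = 1$, $d_1 = 0$, and compares it with $n!/e$ in floating point (accurate enough for the small range used here). It also records on which side of $n!/e$ the integer $d_n$ lies: below for odd $n$, above for even $n$.

```python
from math import e, factorial

def derangements(n):
    """d_n via the recurrence d_n = (n - 1)(d_{n-1} + d_{n-2}), d_0 = 1, d_1 = 0."""
    a, b = 1, 0                      # d_0, d_1
    if n == 0:
        return a
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)
    return b

for n in range(1, 11):
    d = derangements(n)
    x = factorial(n) / e
    assert d == round(x)             # d_n is the closest integer to n!/e
    assert (d < x) == (n % 2 == 1)   # d_n < n!/e exactly when n is odd
    print(n, d, round(x, 3))
```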

If $n$ is odd, then we have $d_n < n!/e$, so (4.2) follows directly from (4.1). If $n$ is even, however, we have $d_n > n!/e$, so we cannot just apply (4.1). Instead, note that the number of choices for row 2 of a row-standard latin square is at most $d_n$, but row 3 cannot equal row 2, so there are at most $d_n - 1$ choices for row 3 (and for all subsequent rows). This gives us the slightly better upper bound

\[ |\tilde{L}_n| \leq d_n (d_n - 1)^{n-2}. \]

If $n$ is even, then we have $d_n - 1/2 < n!/e < d_n$, so

\[ d_n (d_n - 1) < d_n^2 - d_n + 1/4 = (d_n - 1/2)^2 < (n!/e)^2, \]

and therefore

\[ |\tilde{L}_n| \leq d_n (d_n - 1)^{n-2} = d_n (d_n - 1)(d_n - 1)^{n-3} < (n!/e)^2 (n!/e)^{n-3} = (n!/e)^{n-1} \]

(provided $n > 2$), which is what we wanted. Note that the inequality obtained is strict in both cases: for odd $n$ because $d_n < n!/e$, and for even $n$ because $d_n(d_n - 1) < (n!/e)^2$. Since the total number of latin squares is $n!$ times the number of row-standard ones, it follows that

\[ |L_n| = n! \, |\tilde{L}_n| < n! \, (n!/e)^{n-1} = (n!)^n / e^{n-1}, \]

provided $n > 2$. (Note that this does not hold for $n = 2$, since $(2!)^2/e = 4/e < 2$, but there are two latin squares of order 2 with alphabet $\{1, 2\}$.) We have proved the following.

Theorem 37. If $n > 2$, then

\[ |L_n| < (n!)^n / e^{n-1}. \]
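For $n = 3, 4, 5$ the bounds can be compared with the true count. The sketch below counts row-standard latin squares by brute force (first row fixed as $1, 2, \ldots, n$), multiplies by $n!$ to get the total, and compares it with $n! \, d_n (d_n - 1)^{n-2}$ and with $(n!)^n / e^{n-1}$. The function names are illustrative, and the last bound is evaluated in floating point.

```python
from itertools import permutations
from math import e, factorial

def derangements(n):
    """d_n via d_n = (n - 1)(d_{n-1} + d_{n-2}), d_0 = 1, d_1 = 0."""
    a, b = 1, 0
    if n == 0:
        return a
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)
    return b

def count_row_standard(n):
    """Count latin squares of order n whose first row is 1, 2, ..., n."""
    perms = list(permutations(range(1, n + 1)))

    def extend(rows):
        if len(rows) == n:
            return 1
        return sum(extend(rows + [row]) for row in perms
                   if all(row[c] != prev[c] for prev in rows for c in range(n)))

    return extend([tuple(range(1, n + 1))])

for n in (3, 4, 5):
    d = derangements(n)
    total = factorial(n) * count_row_standard(n)        # number of latin squares
    refined = factorial(n) * d * (d - 1) ** (n - 2)     # n! d_n (d_n - 1)^(n-2)
    theorem = factorial(n) ** n / e ** (n - 1)          # (n!)^n / e^(n-1)
    assert total <= refined < theorem
    print(n, total, refined, round(theorem, 1))
```

Even for these small values the true count is far below the bound of Theorem 37 once $n \geq 4$, which fits the remark that the known upper and lower bounds are quite far apart.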

In fact, it is known that

\[ |L_n| \leq (n!)^n / e^{n^2 (1 - \varepsilon_n)}, \]

where $\varepsilon_n$ is a function of $n$ which tends to zero as $n$ tends to infinity, but the proof of this is slightly beyond the scope of this course.

It is a major open problem in combinatorics to find the ‘asymptotic behaviour’ of $|L_n|$, i.e., to find a function $f(n)$ such that

\[ |L_n| / f(n) \to 1 \quad \text{as } n \to \infty. \]

4.4 Transversals in Latin Squares

We conclude the chapter on latin squares with a discussion of the most well-known unsolved problem in the area.

Definition. Let $L$ be a latin square of order $n$. A transversal of $L$ is a set of $n$ entries of $L$ such that there is exactly one entry from each row and exactly one entry from each column, and each symbol occurs in exactly one of the entries.

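For example, the bold entries form transversals in the two latin squares below:

\[
\begin{pmatrix} \mathbf{1} & 2 & 3 \\ 2 & \mathbf{3} & 1 \\ 3 & 1 & \mathbf{2} \end{pmatrix}
\qquad
\begin{pmatrix} \mathbf{1} & 2 & 3 & 4 \\ 2 & 1 & \mathbf{4} & 3 \\ 3 & 4 & 1 & \mathbf{2} \\ 4 & \mathbf{3} & 2 & 1 \end{pmatrix}
\]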


Conjecture (Ryser’s Conjecture, 1967). Every latin square of odd order has a transversal.

This is perhaps the best-known unsolved problem on latin squares. It is a good example of a problem in mathematics which is very simple to state (indeed, it can be explained to someone without a formal mathematical training), but which has baffled mathematicians for a very long time!

The hypothesis that $n$ is odd is necessary: indeed, for every even $n$ there exists a latin square of order $n$ which has no transversal.

Exercise 9. Prove that if $n$ is even, then the addition table of $\mathbb{Z}_n$ has no transversal.
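Before attempting the exercise, it can be reassuring to check the statement by computer for a few small $n$. The sketch below searches for transversals by trying every way of choosing one column per row; the function names are illustrative. It is a brute-force check for small cases only, not a proof.

```python
from itertools import permutations

def transversals(L):
    """Yield each transversal of L as a tuple cols: row r uses the entry (r, cols[r])."""
    n = len(L)
    for cols in permutations(range(n)):
        # One entry per row and per column by construction; symbols must be distinct.
        if len({L[r][cols[r]] for r in range(n)}) == n:
            yield cols

def addition_table(n):
    """The addition table of Z_n, as a latin square on {0, ..., n-1}."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def count_transversals(L):
    return sum(1 for _ in transversals(L))

for n in range(1, 8):
    print(n, count_transversals(addition_table(n)))
```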

Notice that if $L$ is a latin square which has a latin square orthogonal to it, then $L$ has a transversal. To see this, suppose that $L$ is a latin square of order $n$, and that $M$ is a latin square orthogonal to $L$. Then for any symbol $i$ in the alphabet of $M$, the set of entries where $M$ has symbol $i$ is a transversal of $L$. So in fact, the $n^2$ entries of $L$ can be partitioned into $n$ disjoint transversals!
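This partition can be made concrete. The sketch below assumes the orthogonal pair $L(1), L(2)$ over $\mathbb{Z}_5$ (entries $f \cdot i + j$ modulo 5) as the example; for each symbol $i$ of the second square it collects the positions where that symbol occurs and checks that they form a transversal of the first square. The helper names are illustrative.

```python
def L(f, p):
    """L(f) over Z_p: entry (r, c) is f*r + c mod p."""
    return [[(f * r + c) % p for c in range(p)] for r in range(p)]

def cells_with_symbol(M, i):
    """Positions (r, c) where the square M has symbol i."""
    n = len(M)
    return [(r, c) for r in range(n) for c in range(n) if M[r][c] == i]

def is_transversal(S, cells):
    """True if cells has one entry per row, per column and per symbol of S."""
    n = len(S)
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    symbols = {S[r][c] for r, c in cells}
    return len(cells) == n and len(rows) == len(cols) == len(symbols) == n

p = 5
square, mate = L(1, p), L(2, p)          # an orthogonal pair of order 5

parts = [cells_with_symbol(mate, i) for i in range(p)]
all_transversals = all(is_transversal(square, part) for part in parts)

# The parts are disjoint and together cover all p*p entries.
covered = sorted(cell for part in parts for cell in part)
is_partition = covered == [(r, c) for r in range(p) for c in range(p)]
print(all_transversals, is_partition)
```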

We proved that whenever $n$ is not congruent to 2 (modulo 4), there exists a pair of orthogonal latin squares of order $n$. It follows that for any odd $n$, there exists a latin square of order $n$ that has a transversal. However, this does not prove that every latin square of that order has a transversal!
