
Number-theoretic Algorithms ∗

Richard P. Brent, ANU

∗Copyright © 2011, R. P. Brent. comp4600, 2011

1. Polynomials and integers

Reference: CLRS, Chapter 30. We first consider algorithms for integer and polynomial arithmetic, particularly multiplication. Let us formally define what we mean by a “polynomial”.

Polynomials over a ring

Let R be a ring. With a symbol x ∉ R we form the expressions

$$P(x) = \sum_{\nu} p_\nu x^\nu$$

in which the sum is taken over a finite number of different integers ν ≥ 0, and where the “coefficients” $p_\nu$ belong to the ring R. Such expressions are called “polynomials” or, more precisely, “polynomials in x over R”; the symbol x is called an indeterminate.

We regard two polynomials as equal if they differ only by zero coefficients. For example, $7x + 0x^2 = 0 + 7x = 7x$.

The ring R[x]

The set of polynomials in an indeterminate x over a ring R is written as R[x]. Addition and multiplication in R[x] are defined in the natural way. With this definition, R[x] forms a ring.

In applications of interest to us, R is often a field, e.g. the field ℚ of rationals, the field ℝ of real numbers, the field ℂ of complex numbers, or a finite (Galois) field GF(p) = ℤ/pℤ, where p is a prime.

We are sometimes interested in polynomials over rings which are not fields, e.g. the ring ℤ of integers, or the ring ℤ/mℤ of integers modulo m, where m is a composite number. In most cases the rings are commutative (an exception is the ring of n × n matrices over a field, n > 1).

The degree of P(x) ∈ R[x] is

$$\deg(P) = \max\bigl(\{\nu \mid p_\nu \ne 0\} \cup \{-\infty\}\bigr).$$

Note: −∞ is included in case P = 0.

Examples

$P(x) = 5 + 2x^3 - 9x^7$ is a polynomial over the ring of integers, and deg(P) = 7.

P(x) = 0 is a polynomial over whatever ring you choose, and deg(P) = −∞.

$P(x) = \pi x^{99}$ is a polynomial over ℝ.

$P(y) = 1 + 57y^2$ is a polynomial over ℚ.

$P(z) = (1 + \sqrt{2}) - (7 - 5\sqrt{2})z$ is a polynomial over ℚ(√2), a finite extension of ℚ.

If P, Q ∈ R[x] then

deg(P + Q) ≤ max(deg(P), deg(Q))

and

deg(PQ) = deg(P) + deg(Q).

(Equality in the second relation needs R to have no zero divisors, as is the case for a field or ℤ; over a general ring only ≤ holds.) This relation motivates our definition deg(0) = −∞ (as in Knuth but not in CLRS).

Polynomials in several variables

If x, y are indeterminates we can consider polynomials in y whose coefficients are polynomials in x, e.g.

$$P(x)(y) = P(x, y) = (5 + 3x) + (7 + 2x^2)\,y^3$$

is a polynomial in y whose coefficients are polynomials in x. In this case P ∈ R[x][y].

We usually write P(x)(y) as

$$P(x, y) = \sum_{\lambda,\nu} p_{\lambda,\nu}\, x^\lambda y^\nu .$$

There are several possible definitions of the degree of a multivariate polynomial. For example, we could define

$$\deg(P(x, y)) = \max\bigl(\{\lambda + \nu \mid p_{\lambda,\nu} \ne 0\} \cup \{-\infty\}\bigr).$$

Interpretation as functions

We often interpret a polynomial P(x) ∈ R[x] as a function f : R → R.

For example, the Chebyshev polynomials $T_n(x)$ are polynomials of degree n over ℝ or ℂ, defined by

$$T_0(x) = 1, \quad T_1(x) = x,$$

and

$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x) \quad \text{for } n \ge 1.$$

We can also regard $T_n(x)$ as a function which satisfies the equation

$$T_n(\cos\theta) = \cos n\theta \quad \text{for } \theta \in \mathbb{C}.$$
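The recurrence and the functional identity can be checked numerically. The following is a minimal sketch, assuming a polynomial is stored as a Python list of coefficients with the lowest degree first; the names `chebyshev_coeffs` and `evaluate` are illustrative and not part of the notes.

```python
import math

def chebyshev_coeffs(n):
    """Coefficients [c0, c1, ..., cn] of T_n(x), lowest degree first."""
    t_prev, t_cur = [1], [0, 1]              # T_0 = 1, T_1 = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        # T_{k+1} = 2x*T_k - T_{k-1}: multiplying by x shifts the list by one
        shifted = [0] + [2 * c for c in t_cur]
        padded = t_prev + [0] * (len(shifted) - len(t_prev))
        t_prev, t_cur = t_cur, [s - p for s, p in zip(shifted, padded)]
    return t_cur

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

theta = 0.3
print(chebyshev_coeffs(3))                                   # T_3 = 4x^3 - 3x
print(evaluate(chebyshev_coeffs(5), math.cos(theta)), math.cos(5 * theta))
```

The two numbers printed on the last line should agree up to rounding error, since one is the polynomial view of $T_5$ and the other is the function view.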


Formal power series

If we have an infinite sequence $(a_0, a_1, \ldots)$ and an indeterminate x then we can define a formal power series A(x) by

$$A(x) = \sum_{\nu \ge 0} a_\nu x^\nu .$$

The coefficients $a_\nu$ are assumed to lie in a ring R, which may in particular cases be a field such as ℝ or ℂ.

We can define addition and multiplication of power series in the obvious way: if C(x) = A(x) + B(x) then

$$c_\nu = a_\nu + b_\nu$$

and if C(x) = A(x)B(x) then

$$c_\nu = \sum_{0 \le \lambda \le \nu} a_\lambda b_{\nu-\lambda} .$$

With these definitions the formal power series over R form a ring.

Definition of ord

Analogous to the degree of a polynomial, it is useful to define

$$\operatorname{ord}(A) = \min\bigl(\{\nu \mid a_\nu \ne 0\} \cup \{+\infty\}\bigr),$$

where the +∞ is included in case A = 0.

Exercise: Show that

ord(A + B) ≥ min(ord(A), ord(B))

and

ord(AB) = ord(A) + ord(B).

Exercise: If we consider power series over a field F, then A(x) has a multiplicative inverse (i.e. B(x) such that A(x)B(x) = 1) iff ord(A) = 0.

Remark: If in the definition we allow a finite number of nonzero coefficients $a_\nu$ with ν < 0, then we get Laurent series. The Laurent series over a field F form a field.

Convergence

We have defined power series quite formally, so no question of convergence or divergence arises. For example, the power series

$$A(x) = \sum_{\nu \ge 0} 2^{2^\nu} x^\nu$$

is a perfectly well-defined formal power series. We can think of it as a generating function which “generates” the coefficients $2^{2^\nu}$.

However, if we want to regard a power series over a field F as a function then questions of convergence arise (and this does not always make sense, e.g. if F is a finite field).

In this course power series will only be used as generating functions, so we can ignore questions of convergence.

Truncated power series

If A(x) and B(x) are two power series over the same ring R, we write

$$A(x) = B(x) \bmod x^n$$

iff

$$\operatorname{ord}(A(x) - B(x)) \ge n .$$

In other words, iff $a_\nu = b_\nu$ for 0 ≤ ν < n.

If P(x) is a polynomial then we can regard P(x) as a power series with only a finite number of nonzero terms.

If A(x) is a power series and n ≥ 0 then clearly there is a unique polynomial P(x) such that deg(P) < n and A(x) = P(x) mod $x^n$.

Proof: define

$$p_\nu = \begin{cases} a_\nu & \text{if } 0 \le \nu < n, \\ 0 & \text{otherwise.} \end{cases}$$

Representation of polynomials

A polynomial P(x) of degree n can be represented as an array A[0 .. n] provided the base type of the array can represent the coefficients of P(x). In other words, we require A[ν] to represent $p_\nu$.

Similarly, a multivariate polynomial can be represented as a multidimensional array.

Sparse polynomials

A polynomial is said to be sparse if “most” of the coefficients $a_0, \ldots, a_{\deg(A)}$ are zero (and similarly for multivariate polynomials). We shall not attempt to define what “most” means but it typically means “at least 90%”.

In order to save storage (and arithmetic) it may be desirable to store sparse polynomials as linked lists, so that only the nonzero coefficients need to be stored.

Multiple-precision integers

A (large) nonnegative integer $N < \beta^t$ can be represented as

$$N = \sum_{\nu=0}^{t-1} a_\nu \beta^\nu ,$$

where β > 1 is the base or radix, the $a_\nu$ are “base β digits” and (usually) satisfy $0 \le a_\nu < \beta$, and t is the number of digits. (For signed integers we can use a “sign and magnitude” representation, i.e. N = s|N|, where s = ±1.)

Clearly there is a close correspondence between the integer N represented as above and the polynomial

$$P(x) = \sum_{\nu=0}^{t-1} a_\nu x^\nu .$$

Note that N = P(β).

Because of this correspondence, many algorithms for operating on large integers are closely related to algorithms for operating on polynomials.
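The correspondence N = P(β) is easy to see in code. Here is a small sketch, assuming digits are stored least significant first; `to_digits` and `horner` are hypothetical helper names chosen for this illustration.

```python
def to_digits(N, beta):
    """Digits a_0, ..., a_{t-1} of N >= 0 in base beta, least significant first."""
    digits = []
    while N > 0:
        N, a = divmod(N, beta)   # peel off the lowest digit
        digits.append(a)
    return digits or [0]

def horner(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

digits = to_digits(1234, 10)
print(digits)                 # the digit list doubles as a coefficient list
print(horner(digits, 10))     # evaluating P at beta recovers N
```

The same list can be read either as the digits of N or as the coefficients of P(x), which is why multiplication algorithms for one carry over to the other (apart from carry propagation).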


Other operations on polynomials and power series

There are some operations on polynomials which have no analogue for integers, e.g. differentiation, composition, reversion.

The formal derivative P′(x) of a polynomial or power series $P(x) = \sum p_\nu x^\nu$ is defined by

$$P'(x) = \sum_{\nu > 0} \nu\, p_\nu x^{\nu-1} .$$

For a polynomial P(x) over a field of characteristic zero (e.g. ℚ, ℝ or ℂ), we can define a formal integral by

$$\int_0^x P(t)\, dt = \sum_{\nu \ge 0} \frac{p_\nu x^{\nu+1}}{\nu + 1} .$$

Composition and reversion

The composition of two power series P(x) and Q(x), where ord(Q(x)) > 0, is defined to be the power series C(x), where

$$C(x) = P(Q(x)) = \sum_{\nu \ge 0} p_\nu Q(x)^\nu .$$

Note that $\operatorname{ord}(Q(x)^n) \ge n$, so each coefficient $c_n$ of C(x) is defined by a finite sum involving $p_0, \ldots, p_n$ and $q_1, \ldots, q_n$; thus no questions of convergence arise.

If P(x) and Q(x) are power series, ord(P(x)) = 1, ord(Q(x)) = 1, and

$$P(Q(x)) = Q(P(x)) = x ,$$

then we say that Q(x) is the reversion of P(x), and we write $Q(x) = P(x)^{(-1)}$.

For example, if $P(x) = x/(1 - x) = x + x^2 + x^3 + \cdots$, and $Q(x) = x/(1 + x) = x - x^2 + x^3 - \cdots$, then it is easy to verify that $Q(x) = P(x)^{(-1)}$.

Arithmetic on polynomials

Suppose we are given two polynomials A(x) and B(x) over R, of degree (at most) n, and want to compute the product C(x) = A(x)B(x). From the definition,

$$c_k = \sum_{i+j=k} a_i b_j ,$$

for 0 ≤ k ≤ 2n, where we assume 0 ≤ i ≤ n, 0 ≤ j ≤ n (sometimes it is convenient to define $a_i = 0$ if i > n, etc).

The number of terms in the sum for $c_k$ is k + 1 if 0 ≤ k ≤ n, and 2n + 1 − k if n < k ≤ 2n (check this!), so the total number of multiplications involved is $n^2 + O(n)$. There is a similar number of additions. Thus, assuming the time to perform a multiplication or addition in R is O(1), the time to multiply A(x) and B(x) is $O(n^2)$.

Exercise: the time to add A(x) and B(x) is O(n).
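The definition translates directly into a double loop. The sketch below assumes coefficient lists with the lowest degree first and integer coefficients; `poly_mul` and `terms_in_ck` are illustrative names, the latter simply counting the pairs (i, j) that contribute to $c_k$.

```python
def poly_mul(a, b):
    """Schoolbook product of two non-empty coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj          # a_i * b_j contributes to c_{i+j}
    return c

def terms_in_ck(k, n):
    """Number of pairs (i, j) with i + j = k and 0 <= i, j <= n."""
    return sum(1 for i in range(n + 1) if 0 <= k - i <= n)

n = 4
counts = [terms_in_ck(k, n) for k in range(2 * n + 1)]
print(counts)          # k + 1 up to k = n, then 2n + 1 - k
print(sum(counts))     # total multiplications, (n + 1)^2 = n^2 + O(n)
print(poly_mul([1, 2], [3, 4]))
```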


Karatsuba’s algorithm

It turns out that the obvious $O(n^2)$ result is not the best possible. Karatsuba discovered the following (not quite, but his idea was similar).

Suppose we can multiply polynomials of degree n − 1 in time M(n), and we are given two polynomials A(x) and B(x) of degree 2n − 1. We write

$$A(x) = A_0(x) + A_1(x)\,x^n ,$$

$$B(x) = B_0(x) + B_1(x)\,x^n ,$$

where $\deg(A_j(x)) < n$, $\deg(B_j(x)) < n$ for j = 0, 1.

Now

$$A(x)B(x) = A_0(x)B_0(x) + \bigl(A_0(x)B_1(x) + A_1(x)B_0(x)\bigr)x^n + A_1(x)B_1(x)\,x^{2n} .$$

Karatsuba continued

Suppose we compute

$$P_1(x) = A_0(x)B_0(x) ,$$

$$P_2(x) = A_1(x)B_1(x) ,$$

and

$$P_3(x) = (A_0(x) + A_1(x))(B_0(x) + B_1(x)) .$$

Then we easily see that

$$A_0(x)B_1(x) + A_1(x)B_0(x) = P_3(x) - P_1(x) - P_2(x)$$

so

$$A(x)B(x) = P_1(x) + \bigl(P_3(x) - P_1(x) - P_2(x)\bigr)x^n + P_2(x)\,x^{2n} .$$
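The three-product identity can be turned into a short recursive routine. This is a sketch only, assuming both inputs are coefficient lists (lowest degree first) of the same length, and that this length is a power of two; the helper names `add`, `sub` and `karatsuba` are chosen for this illustration.

```python
def add(a, b):
    """Coefficient-wise sum, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Coefficient-wise difference a - b."""
    return add(a, [-y for y in b])

def karatsuba(a, b):
    """Product of two coefficient lists of equal length 2^k."""
    size = len(a)
    if size == 1:
        return [a[0] * b[0]]
    n = size // 2
    a0, a1 = a[:n], a[n:]                       # A = A0 + A1 * x^n
    b0, b1 = b[:n], b[n:]                       # B = B0 + B1 * x^n
    p1 = karatsuba(a0, b0)
    p2 = karatsuba(a1, b1)
    p3 = karatsuba(add(a0, a1), add(b0, b1))
    middle = sub(sub(p3, p1), p2)               # A0*B1 + A1*B0
    result = [0] * (2 * size - 1)
    for i, c in enumerate(p1):
        result[i] += c
    for i, c in enumerate(middle):
        result[i + n] += c                      # multiply by x^n: shift by n
    for i, c in enumerate(p2):
        result[i + 2 * n] += c                  # multiply by x^{2n}
    return result

print(karatsuba([1, 2, 3, 4], [5, 6, 7, 8]))
```

Inputs of other lengths can be handled by padding with zero coefficients up to the next power of two.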


Complexity analysis

Using Karatsuba’s “divide and conquer” idea, we have reduced the problem to three multiplications of polynomials of degree n − 1, plus some additions/subtractions (which take time O(n)) and multiplications by powers of x (which do not require any arithmetic, just shifting array elements). Thus

$$M(2n) \le 3M(n) + O(n) .$$

We can easily deduce that

$$M(2^k) = O(3^k) .$$

When multiplying polynomials we can always “round up” the degree to the next integer of the form $2^k - 1$. Thus

$$M(n) = O(n^\alpha) ,$$

where $\alpha = \log_2 3 < 1.6$.

Generalisation

Karatsuba’s idea can be generalised to give

$$M((r + 1)n) \le (2r + 1)M(n) + O(n)$$

for any fixed integer r ≥ 1 (see Knuth §4.3.3). Thus

$$M(n) = O(n^\alpha) ,$$

where

$$\alpha = \log_{r+1}(2r + 1) = \frac{\log(2r + 1)}{\log(r + 1)} .$$

By choosing r sufficiently large, we can make α arbitrarily close to 1. Thus

$$M(n) = O(n^{1+\varepsilon})$$

for any ε > 0. (The constant hidden in the “O” notation depends on ε.) We omit the details because methods based on the fast Fourier transform (FFT) are better. If the FFT is applicable we can obtain a sharper bound

$$M(n) = O(n \log n) .$$

Fast integer multiplication

Karatsuba’s idea also applies to the problem of multiplying integers represented in base β with n digits (in fact it was originally presented for this problem). After applying the divide and conquer step we may have to normalise the digits (i.e. reduce them to the range [0, β − 1]), which involves “carry propagation”.

Algebraically, we use

$$a_{\nu+1}\beta^{\nu+1} + a_\nu\beta^\nu = (a_{\nu+1} + 1)\beta^{\nu+1} + (a_\nu - \beta)\beta^\nu .$$

As for the multiplication of power series, we can use a generalisation of Karatsuba’s algorithm to show that n-digit numbers can be multiplied in time $O(n^{1+\varepsilon})$ for any ε > 0.

2. Polynomial and integer division

We now consider algorithms for finding reciprocals, performing division, and related operations for polynomials, power series and large integers. It turns out that Newton’s method is helpful for reducing these operations to multiplication.

We shall show that in a certain sense the operations of multiplication, squaring, and forming reciprocals all have the same complexity (similar to what we already saw for matrix operations).

Newton’s method

Newton’s method is a well-known method for approximating zeros of a function by successive linear approximation. The iteration is

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

and under suitable conditions this converges to a zero ζ of f.

Newton’s method can be used to approximate the square root or reciprocal of a real number. For example, to approximate $\sqrt{a}$ for a > 0 we take $f(x) = x^2 - a$, so $f'(x) = 2x$ and the Newton iteration is

$$x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k}$$

which can be written as

$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right) .$$

Newton’s method for reciprocal

Similarly, to approximate the reciprocal of a real number a ≠ 0 we take

$$f(x) = a - \frac{1}{x} ,$$

so

$$f'(x) = \frac{1}{x^2}$$

and the Newton iteration is

$$x_{k+1} = x_k - \left(a - \frac{1}{x_k}\right) x_k^2$$

or

$$x_{k+1} = x_k(2 - a x_k) .$$

Note that this iteration only requires multiplication and subtraction. Thus, it can be used to approximate a reciprocal without doing any divisions!
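A few steps of the division-free iteration are enough to see how quickly it settles. The sketch below uses exact rational arithmetic (Python's `fractions.Fraction`) so that the residual $1 - a x_k$ can be inspected without rounding noise; the function name `newton_reciprocal` and the starting value 3/10 are assumptions made for the example.

```python
from fractions import Fraction

def newton_reciprocal(a, x0, steps):
    """Return [x_0, x_1, ..., x_steps] for x_{k+1} = x_k * (2 - a * x_k)."""
    x = x0
    xs = [x]
    for _ in range(steps):
        x = x * (2 - a * x)      # only multiplication and subtraction
        xs.append(x)
    return xs

a = Fraction(3)
for k, x in enumerate(newton_reciprocal(a, Fraction(3, 10), 4)):
    # the residual 1 - a*x_k is squared at every step
    print(k, float(x), 1 - a * x)
```

With this starting value the residual is 1/10 initially, so the printed residuals are 1/10, 1/100, 1/10000 and so on.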


Rate of convergence

Let us consider the error $\varepsilon_k$ defined by

$$a x_k = 1 - \varepsilon_k .$$

Then

$$1 - \varepsilon_{k+1} = (1 - \varepsilon_k)(1 + \varepsilon_k) ,$$

so

$$\varepsilon_{k+1} = \varepsilon_k^2 .$$

We see that the convergence is quadratic provided

$$|\varepsilon_0| < 1 .$$

In general, Newton’s method converges quadratically to a simple zero of a $C^{(2)}$ function f provided the initial approximation is sufficiently good; this is the Newton-Kantorovich theorem.

Application to power series

The reason for considering Newton’s method here is that it also applies to power series. In fact, using k steps of Newton’s method, we can compute (exactly!) the first $2^k$ coefficients in the reciprocal of a power series.

Let $A(x) = a_0 + a_1 x + \cdots$ be a power series over a field F, with $a_0 \ne 0$ (so ord(A) = 0). We take $b_0 = 1/a_0$ and take $B_0(x) = b_0$ to start the Newton iteration. Then, by analogy with the iteration for reciprocal considered previously, we define

$$B_{k+1}(x) = B_k(x)\bigl(2 - A(x)B_k(x)\bigr) \bmod x^{2^{k+1}} .$$

Why the mod $x^{2^{k+1}}$? Because terms past this point are “garbage” anyway (as we shall see when we consider the error in $B_k(x)$) so there is no point in wasting time computing them.

The error term

If $E_k(x) = 1 - A(x)B_k(x)$ then we have

$$E_0(x) = 0 \bmod x$$

and

$$E_{k+1}(x) = E_k(x)^2 \bmod x^{2^{k+1}} ,$$

so by induction on k,

$$E_k(x) = 0 \bmod x^{2^k} .$$

Expressed another way,

$$\operatorname{ord}(E_k(x)) \ge 2^k .$$

Thus, after k iterations, we have 1/A(x) with “error” $O(x^{2^k})$.

The time bound

The time required to compute $n = 2^k$ terms in the reciprocal of A(x) is

$$O\bigl(M(2^k) + M(2^{k-1}) + \cdots\bigr) + O(n)$$

and under plausible assumptions about the function M(n) this is O(M(n)) overall.
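The power series iteration can be sketched as follows. This is an illustrative implementation over the rationals, assuming coefficient lists with the constant term first; it uses a plain quadratic-time truncated product (`mul_trunc`, a hypothetical helper) rather than a fast multiplication, so it demonstrates the doubling of correct coefficients but not the O(M(n)) time bound.

```python
from fractions import Fraction

def mul_trunc(a, b, n):
    """Product of two coefficient lists, truncated mod x^n."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def series_reciprocal(a, k):
    """First 2^k coefficients of 1/A(x), assuming a[0] != 0."""
    b = [Fraction(1) / a[0]]                 # B_0 = 1/a_0
    for step in range(1, k + 1):
        n = 2 ** step
        ab = mul_trunc(a, b, n)              # A * B mod x^n
        two_minus_ab = [-c for c in ab]
        two_minus_ab[0] += 2                 # 2 - A * B
        b = mul_trunc(b, two_minus_ab, n)    # B * (2 - A * B) mod x^n
    return b

# 1/(1 - x) = 1 + x + x^2 + ...
print(series_reciprocal([1, -1], 3))
```

Each pass through the loop doubles the number of correct coefficients, mirroring $\operatorname{ord}(E_k) \ge 2^k$.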


Division of power series

If A(x) and B(x) are two power series over a field, and ord(B(x)) = 0, then we can compute A(x)/B(x) mod $x^n$ by first computing

$$C(x) = 1/B(x) \bmod x^n$$

and then

$$A(x)C(x) \bmod x^n .$$

Division of polynomials

If A(x) and B(x) are two polynomials of degree (at most) n over a field, and B(x) ≠ 0, then we can find polynomials Q(x) and R(x) such that

$$A(x) = Q(x)B(x) + R(x) ,$$

satisfying the condition

$$\deg(R(x)) < \deg(B(x)) .$$

It follows that, when deg(A(x)) ≥ deg(B(x)),

$$\deg(Q(x)) = \deg(A(x)) - \deg(B(x))$$

(otherwise Q(x) = 0).

There is a straightforward algorithm (Knuth, §4.6.1, Algorithm D) which takes time $O(n^2)$. In fact, this is very similar to the “long division” process which you (may have) learned at school for dividing one integer by another.

Reduction to triangular system

We are given A(x), B(x) and want to compute Q(x) and R(x) such that deg(R) < deg(B) and

$$A(x) = Q(x)B(x) + R(x) .$$

If deg(B) > deg(A) we can take Q = 0 and R = A, so suppose that

$$\deg(B) = m \le \deg(A) = n .$$

Let k = n − m = deg(Q). Equating coefficients of $x^n, \ldots, x^m$ we get a triangular system of equations for the coefficients of Q(x):

$$
\begin{pmatrix}
b_m & & & \\
b_{m-1} & b_m & & \\
\vdots & \ddots & \ddots & \\
b_{m-k} & \cdots & b_{m-1} & b_m
\end{pmatrix}
\begin{pmatrix} q_k \\ q_{k-1} \\ \vdots \\ q_0 \end{pmatrix}
=
\begin{pmatrix} a_n \\ a_{n-1} \\ \vdots \\ a_m \end{pmatrix}
$$

(Here we interpret $b_j = 0$ if j < 0.)

Solving the triangular system to compute Q(x) takes $O(k^2)$ operations, and then we can compute R(x) from R(x) = A(x) − Q(x)B(x).

Using power series

We can do better by reducing division of polynomials to division of power series, and using the fact that this can be done quickly by Newton’s method (for reciprocals) and Karatsuba’s method (or the FFT) for multiplication.

Let y = 1/x so (with k, m, n as before) we have

$$y^n A(1/y) = a_n + a_{n-1}y + \cdots + a_0 y^n ,$$

$$y^m B(1/y) = b_m + b_{m-1}y + \cdots + b_0 y^m ,$$

$$y^k Q(1/y) = q_k + q_{k-1}y + \cdots + q_0 y^k .$$

Thus, if we regard $y^n A(1/y)$ and $y^m B(1/y)$ as power series in y we can compute $y^k Q(1/y)$ by power series division:

$$y^k Q(1/y) = \frac{y^n A(1/y)}{y^m B(1/y)} \bmod y^{k+1}$$

and then find R by subtraction as before.
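The reversal trick can be sketched in a few lines. The code below is an illustration under stated assumptions: coefficients are rationals stored lowest degree first, the leading coefficient of B is nonzero, deg(A) ≥ deg(B), and the power series reciprocal is computed by a simple quadratic-time recurrence (`series_inverse`, a hypothetical helper) instead of Newton's method with fast multiplication, so only the structure of the reduction is shown, not its speed.

```python
from fractions import Fraction

def series_inverse(b, n):
    """First n coefficients of 1/B(y) for b[0] != 0 (O(n^2) recurrence)."""
    b = [Fraction(c) for c in b]
    inv = [1 / b[0]]
    for i in range(1, n):
        s = sum((b[j] * inv[i - j]
                 for j in range(1, min(i, len(b) - 1) + 1)), Fraction(0))
        inv.append(-s / b[0])
    return inv

def poly_divmod(a, b):
    """Return (q, r) with a = q*b + r and deg(r) < deg(b)."""
    n, m = len(a) - 1, len(b) - 1
    k = n - m
    ra, rb = a[::-1], b[::-1]            # y^n A(1/y) and y^m B(1/y)
    inv = series_inverse(rb, k + 1)      # 1 / (y^m B(1/y)) mod y^{k+1}
    rq = [sum(ra[j] * inv[i - j] for j in range(i + 1)) for i in range(k + 1)]
    q = rq[::-1]                         # undo the reversal to get Q
    r = [Fraction(c) for c in a]
    for i, qi in enumerate(q):           # R = A - Q*B
        for j, bj in enumerate(b):
            r[i + j] -= qi * bj
    return q, r[:m]

# (x^3 - 2x^2 + 4) divided by (x - 3)
print(poly_divmod([4, 0, -2, 1], [-3, 1]))
```

For the example shown, the quotient is $x^2 + x + 3$ and the remainder is 13.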


Reducibility and equivalence

We say that problem A is reducible to problem B if an algorithm for the solution of B can be used to solve A. Some conditions have to be imposed on the size of the problems and the overhead in making the reduction, but I shall not be specific here. If A is reducible to B and B is reducible to A then we say that A and B are equivalent.

Suppose problem A can be solved in time A(n) (for inputs of size n), and problem B can be solved in time B(n). We say that problems A and B are computationally equivalent if

$$A(n) = \Theta(B(n)) .$$

Recall that this means A(n) = O(B(n)) and B(n) = O(A(n)), so we can say that, apart from constant factors, the time required to solve problems A and B is the same.

Notation

As a shorthand notation which makes clear that we are talking about an equivalence relation, we write

$$A(n) \approx B(n)$$

if A(n) = Θ(B(n)).

Warning: this is not a standard notation. Other symbols which we might use include

∼, ≃, ≍, ≅, ⇔

Remark: we already considered equivalence of various matrix algorithms.

Definition of M, S, R and D

Let F be a field (with characteristic ≠ 2). We consider power series P(x), Q(x) ∈ F[[x]]. We are interested in the time required to perform operations such as multiplication, squaring, and division mod $x^n$ on such power series.

Let M(n) be the time required to form P(x)Q(x) mod $x^n$.

Let S(n) be the time required to form $P(x)^2$ mod $x^n$ (this is the case P = Q of the above).

Let R(n) be the time required to form 1/Q(x) mod $x^n$ for ord(Q) = 0.

Let D(n) be the time required to form P(x)/Q(x) mod $x^n$ for ord(Q) = 0.

A regularity assumption

Property B: We say that f(n) satisfies Property B if f(n) is positive, monotonic non-decreasing, i.e.

$$m \ge n \ge 1 \Rightarrow f(m) \ge f(n) > 0 ,$$

and there exist constants α, β ∈ (0, 1) such that

$$f(\lceil \alpha n \rceil) \le \beta f(n) \qquad (1)$$

for all sufficiently large n.

Condition (1) holds if $f(n) \sim n^a (\log n)^b (\log\log n)^c$ for some constants a > 0, b and c, so it is not very restrictive.

We shall assume that M(n) satisfies Property B.

Plausibility argument

The assumption that M(n) is positive (for n ≥ 1) and non-decreasing is extremely plausible.

Suppose $P_0$, $P_1$, $Q_0$ and $Q_1$ are polynomials of degree n − 1. We can compute

$$(P_0 + x^{2n}P_1)(Q_0 + x^{2n}Q_1)$$

in time M(6n), but from the result we can “pick out” $P_0Q_0$ and $P_1Q_1$. Thus, it is plausible that

$$2M(n) \le M(6n) .$$

(Why isn’t this a rigorous proof?)

Replacing n by n/6, we get the condition of Property B with α = 1/6, β = 1/2.

A simpler assumption

Property W: We say that f(n) satisfies Property W if f(n) is positive, monotonic non-decreasing, and

$$f(2n) \ge 2f(n)$$

for all sufficiently large n.

We could assume that M(n) satisfies Property W. This is simpler (though perhaps slightly less plausible) than assuming Property B.

Exercise: Property W ⇒ Property B.

A useful lemma

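Lemma: If f(n) satisfies Property B and 0 < c < 1, then

$$\sum_{k \ge 0,\ c^k n \ge 1} f(\lceil c^k n \rceil) = O(f(n)) .$$

Proof: Let α and β be as in the definition of Property B. Since 0 < c < 1, there is a positive K such that

$$c^K \le \alpha .$$

Thus

$$
\begin{aligned}
\sum_{k \ge 0,\ c^k n \ge 1} f(\lceil c^k n \rceil)
&\le K \sum_j f(\lceil \alpha^j n \rceil) + O(1) \\
&\le K \sum_j \beta^j f(n) + O(1) \\
&\le \left(\frac{K}{1 - \beta}\right) f(n) + O(1) \\
&= O(f(n)) .
\end{aligned}
$$

□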

Properties of M

By the Lemma and our assumption that M(n) satisfies Property B, we have

$$\sum_{k=0}^{\lfloor \lg n \rfloor} M(\lceil n/2^k \rceil) = O(M(n)) .$$

By definition of M(n), it is clear that

$$M(n) = \Omega(n) .$$

Using the identity

$$
\begin{aligned}
&(P_0 + x^n P_1 + x^{2n} P_2 + x^{3n} P_3) \times (Q_0 + x^n Q_1 + x^{2n} Q_2 + x^{3n} Q_3) \\
&\qquad = P_0Q_0 + x^n(P_0Q_1 + P_1Q_0) + \cdots \bmod x^{4n}
\end{aligned}
$$

we have

$$M(4n) \le 16M(2n) + O(n) = O(M(2n)) .$$

Thus, for any fixed c > 0,

$$M(cn) \approx M(n) .$$

Equivalence of some operations

Theorem: Under the assumption that M(n) satisfies Property B,

$$M(n) \approx S(n) \approx R(n) \approx D(n) .$$

Proof: The proof has three main steps.

1. We have

$$\frac{1}{1 - x^n P(x)} - 1 - x^n P(x) = x^{2n} P(x)^2 + O(x^{3n})$$

so

$$S(n) \le R(3n) + O(n) .$$

2. Using the Newton iteration

$$Q_k(x) = Q_{k-1}(x)\bigl(2 - P(x)Q_{k-1}(x)\bigr) \bmod x^{2^k}$$

we can compute 1/P(x) mod $x^{2^k}$ in time

$$2M(2^k) + 2M(2^{k-1}) + \cdots + O(n) ,$$

and by the Lemma above this is $O(M(2^k))$. We can choose k such that $2^{k-1} \le n \le 2^k$, and

$$M(2^k) \le M(2n) = O(M(n)) .$$

Thus, we have

$$R(n) = O(M(n)) .$$

This implies that R(3n) = O(M(3n)), but M(3n) ≈ M(n), so

$$R(3n) = O(M(n)) .$$

3. Since $4PQ = (P + Q)^2 - (P - Q)^2$, we have

$$M(n) \le 2S(n) + O(n) .$$

(Note: the assumption that char(F) ≠ 2 is essential here.)

From parts 1–3 we have S(n) = O(R(3n)), R(3n) = O(M(n)), and M(n) = O(S(n)). Thus

$$S(n) \approx R(3n) \approx M(n) .$$

Also, because M(n) ≈ M(3n), we have R(3n) ≈ M(3n), so R(n) ≈ M(n).

To complete the proof, note that

$$R(n) \le D(n) \le R(n) + M(n) ,$$

so D(n) ≈ R(n). □

Other equivalences

Our results have been expressed in terms of operations on power series mod $x^n$; they can also be expressed in terms of operations on polynomials of degree n.

Multiple-precision operations

Similar results hold for arithmetic operations on n-digit numbers: the operations of multiplication, squaring, finding reciprocals, and division are all computationally equivalent (in the sense defined above, i.e. ignoring constant factors). The proofs are similar to those for power series.

Exercise: Consider other operations such as computing square roots, logarithms, exponentials, and powers (where they are well-defined), for both power series and n-digit numbers.

3. Finite fields & modular arithmetic

We now consider some basic properties of finite fields, the use of modular arithmetic, the Chinese remainder theorem, and some applications. Reference: CLRS, Chapter 31.

Fields

Recall that a field F is a set with two operations, conventionally written as “+” (addition) and “×” (multiplication), satisfying the following properties:

1. (F, +) is an Abelian (i.e. commutative) group (called the “additive group” of the field). We write the zero element of this group as 0.

2. (F \ {0}, ×) is an Abelian group (called the “multiplicative group” of the field).

3. The distributive law holds, i.e. (a + b) × c = (a × c) + (b × c).

Comments

Property 3 means that (F, +, ×) is a ring. However, in general a ring does not satisfy property 2.

For definitions of Abelian groups etc, see any good book on modern algebra, e.g. van der Waerden, Modern Algebra (first published in 1931; the most recent edition is called simply Algebra).

Familiar examples

You are probably familiar with three fields: the fields ℚ, ℝ, ℂ of rational, real and complex numbers (respectively). These are infinite fields: ℚ is countable, ℝ and ℂ are uncountable.

Other examples are finite extension fields such as ℚ[√2], and the field of algebraic numbers (roots of polynomials over ℚ).

Notation (nothing surprising)

We may omit the symbol “×” and write ab for a × b. Also, we assume that “×” has higher precedence than “+”, so we can write ac + bc instead of (a × c) + (b × c).

The identity element of the additive group is written as 0, so a + 0 = 0 + a = a for all a ∈ F. The additive inverse of a is written as −a.

The identity element of the multiplicative group is written as 1, so a × 1 = 1 × a = a for all a ∈ F. The multiplicative inverse of a ≠ 0 is written as $a^{-1}$ or 1/a, and $b \times a^{-1}$ is written as b/a, etc.

If n is a positive integer, we write na for

$$\underbrace{a + a + \cdots + a}_{n} ,$$

0a for 0, and (−n)a for −(na), etc.

Similarly, we write $a^n$ for

$$\underbrace{a \times a \times \cdots \times a}_{n} ,$$

and if a ≠ 0 we write $a^{-n}$ for $(a^{-1})^n = (a^n)^{-1}$.

Finite fields

We are interested in finite fields, i.e. fields such that |F| is finite. One reason is that it is easy to represent elements of such fields in (finite) computer words, without any approximation or truncation error.

|F| is called the order of F. (Another notation is #F.) There is a well-known Theorem (which is not too difficult to prove, but it would take us too far afield), which characterises the possible orders of finite fields.

Theorem. Let F be a field with finite order q. Then q is a prime power, i.e. $q = p^n$ for some prime p and positive integer n. Moreover, for any prime power q, there exists a finite field F with order q. □

There is essentially only one field of any given prime power order q: any two such fields are the same up to isomorphism. The field is called the Galois field of order q and denoted by GF(q).

GF(p)

For simplicity, we are only going to consider Galois fields of prime order, i.e. the case n = 1, q = p. In this case GF(p) is essentially the same as the set of residues {0, 1, . . . , p − 1} with the natural operations of addition and multiplication mod p, usually written as ℤ/pℤ.

Warning. GF($p^n$) is not isomorphic to ℤ/$p^n$ℤ if n > 1. In fact, it is easy to see that ℤ/$p^n$ℤ is not a field in this case; it is only a ring.

Examples

The smallest finite field is GF(2), which consists of {0, 1} with addition and multiplication mod 2 (so addition is “exclusive or” and multiplication is “and” if we regard 0 as “false” and 1 as “true”). Although this field is almost trivial, there are many applications of polynomials over GF(2).

A less trivial example is GF(5).

Exercise: Write out the addition and multiplication tables for GF(5).

The characteristic of a field

If there is a positive integer p such that pa = 0 for all a ∈ F, then the least such p is called the characteristic of F, denoted char(F). Otherwise, we say that the characteristic is zero.

The Galois field GF($p^n$) has characteristic p.

Infinite fields can have characteristic zero or nonzero, but the familiar examples of ℚ, ℝ, ℂ all have characteristic zero.

Exercise: If m is not a multiple of char(F), and y ∈ F, then there exists a unique x ∈ F such that

$$mx = y .$$

Naturally we write x = y/m in this case.

Hint: Consider m × 1 and its inverse.

Fermat’s (little) theorem

Fermat has many “theorems”. These include his “big” theorem (proved by Wiles et al) that

$$a^n + b^n = c^n$$

has no solutions in positive integers a, b, c for n > 2. (Fermat claimed to have a proof, but it was too large to fit in the margin of the book he was annotating.)

Of more interest to us is Fermat’s “little” theorem:

Theorem. If p is a prime and 0 < a < p, then

$$a^{p-1} = 1 \bmod p .$$

Remark. We can conclude that $a^p = a \bmod p$ without any restriction on a.

The converse of Fermat’s little theorem is not true: the fact that $a^{p-1} = 1 \bmod p$ for some a (or even all a which are relatively prime to p) does not imply that p is prime.

Example (Carmichael number). Try p = 1729 = 7 · 13 · 19 and any a which is not divisible by 7, 13 or 19.
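The Carmichael example can be checked by brute force. This small sketch uses Python's built-in three-argument `pow` for modular exponentiation and `math.gcd`; the function name `fermat_holds_for_all_coprime_bases` is made up for the illustration.

```python
from math import gcd

def fermat_holds_for_all_coprime_bases(n):
    """True if a^(n-1) = 1 mod n for every a in [1, n-1] coprime to n."""
    return all(pow(a, n - 1, n) == 1
               for a in range(1, n) if gcd(a, n) == 1)

print(7 * 13 * 19)                                  # the composite 1729
print(fermat_holds_for_all_coprime_bases(1729))     # passes for every coprime base
print(fermat_holds_for_all_coprime_bases(15))       # an ordinary composite fails
```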


Structure of GF(p)

The multiplicative group of GF(p) has order p − 1 and there may be several Abelian groups with this order. For example, if p = 5, there are two non-isomorphic groups of order p − 1 = 4, $Z_4$ and $Z_2 \times Z_2$. (Here $Z_4$ = ℤ/4ℤ etc.)

Gauss proved that the multiplicative group of GF(p) is cyclic, i.e. it is generated by a single element. (The same is true for GF(q).) Considering arithmetic modulo p, this means that there is a primitive root a such that

$$a^m \ne 1 \bmod p \quad \text{for } 1 \le m < p - 1,$$

but (as we know from Fermat’s little theorem)

$$a^{p-1} = 1 \bmod p .$$

The existence proof is not constructive: it does not give an algorithm to find a primitive root. We can do this by trial and error, or by systematically checking a = 2, 3, . . ., but it is not clear how long this will take. (How large is the smallest primitive root mod p?)

Testing a primitive root

To test if a is a primitive root modulo p, it is not necessary to check $a^m \bmod p$ for all m ∈ [1, p − 2].

Exercise: Show that a is a primitive root modulo p iff $a^{p-1} = 1 \bmod p$ and $a^{(p-1)/r} \ne 1 \bmod p$ for all prime factors r of p − 1.
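The criterion in the exercise gives a direct test, sketched below. The factorisation of p − 1 is done here by trial division, which is only reasonable for small p; `prime_factors` and `is_primitive_root` are hypothetical names for this example.

```python
def prime_factors(m):
    """Set of prime factors of m >= 1, by trial division."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors

def is_primitive_root(a, p):
    """Test the criterion: a^(p-1) = 1 and a^((p-1)/r) != 1 for each prime r | p-1."""
    if pow(a, p - 1, p) != 1:
        return False
    return all(pow(a, (p - 1) // r, p) != 1 for r in prime_factors(p - 1))

# list the primitive roots modulo 7
print([a for a in range(1, 7) if is_primitive_root(a, 7)])
```

Only one modular power per prime factor of p − 1 is needed, rather than one per exponent m.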

A simple primality test

Suppose we want to test p for primality, but it is too large to test by trial division. If we can find an integer a satisfying the above conditions, then p must be prime.

This is not always a practical test because it requires that we factorise p − 1, but it does allow us to give “succinct certificates” of primality (i.e. short proofs which are easily checked).

Computing powers mod p

A problem which often arises is: given p, a ∈ [0, p − 1], and n > 0, compute $b = a^n \bmod p$. In applications both p and n may be large (e.g. 1024-bit numbers) so we need an efficient algorithm.

A good general rule, when computing mod p, is to perform a reduction to the range [0, p − 1] after each multiplication. Otherwise the intermediate results may grow extremely large.

For example, when computing $5^{p-1} \bmod p$ where p is a large prime, we would not compute $5^{p-1}$ and then find the remainder on division by p.

When computing $b = a^n \bmod p$ for large n and p, we would not perform n − 1 multiplications. It is possible to obtain the result with only O(log n) multiplications, by making use of the binary representation of n.

Binary powering algorithm

power(a, n, p)
    u ← a;
    b ← 1;
    while n > 0 do
    begin
        if odd(n) then b ← b × u mod p;
        u ← u × u mod p;
        n ← n div 2;
    end;
    return b.

Exercise: Prove that power(a, n, p) returns $a^n \bmod p$ (check exactly what happens if a = 0 or n = 0). If the numbers a, n, p are at most $2^t$, show that the time required is $O(t^3)$, or less if a fast t-bit multiplication algorithm is used.
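The pseudocode translates almost line for line into Python. This is a sketch for experimentation; the only liberty taken is reducing the initial values mod p so that the edge case p = 1 behaves like Python's built-in `pow(a, n, p)`, which is used here purely as a cross-check.

```python
def power(a, n, p):
    """Compute a^n mod p with O(log n) multiplications."""
    u = a % p
    b = 1 % p
    while n > 0:
        if n % 2 == 1:           # odd(n): current binary digit of n is 1
            b = (b * u) % p
        u = (u * u) % p          # u runs through a, a^2, a^4, a^8, ...
        n //= 2
    return b

p = 1000003
print(power(5, p - 1, p))        # compare with Fermat's little theorem if p is prime
print(power(5, p - 1, p) == pow(5, p - 1, p))
```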


Computing inverses mod p

Given a nonzero element a ∈ GF(p), we know that the inverse $a^{-1}$ exists and is unique, but how can we compute it efficiently?

Solution 1. From Fermat’s little theorem,

$$a^{-1} = a^{p-2} \bmod p ,$$

and we have seen that the right hand side can be computed efficiently.

Solution 2. Apply the extended Euclidean algorithm to (a, p). This gives λ, µ such that

$$\lambda a + \mu p = \mathrm{GCD}(a, p)$$

but (because p is prime) GCD(a, p) = 1, so

$$\lambda a = 1 - \mu p = 1 \bmod p ,$$

and we see that $\lambda = a^{-1} \bmod p$.

Exercise: Compare the work required by the two methods if p is an n-bit number and n is large. (The more or less “obvious” algorithms require $O(n^3)$ and $O(n^2)$ operations resp., so we expect solution 2 to be faster than solution 1.)

Exercise: Show that method 2 also works for composite p, provided GCD(a, p) = 1.
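Both solutions fit in a few lines. The sketch below is one common iterative formulation of the extended Euclidean algorithm, not necessarily the version in CLRS; the names `extended_gcd`, `inverse_euclid` and `inverse_fermat` are illustrative.

```python
def extended_gcd(a, b):
    """Return (g, lam, mu) with lam*a + mu*b == g == gcd(a, b)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_euclid(a, p):
    """Solution 2: inverse of a modulo p via the extended Euclidean algorithm."""
    g, lam, _ = extended_gcd(a, p)
    if g != 1:
        raise ValueError("a is not invertible modulo p")
    return lam % p

def inverse_fermat(a, p):
    """Solution 1: a^(p-2) mod p, valid only when p is prime."""
    return pow(a, p - 2, p)

print(inverse_euclid(3, 7), inverse_fermat(3, 7))
print(inverse_euclid(7, 30))     # composite modulus, GCD(7, 30) = 1
```

Note that only the Euclidean version applies to a composite modulus, as the second exercise suggests.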


Chinese remainder theorem (CRT)

To avoid working with large numbers, we can often choose a set of moduli $n_1, n_2, \ldots, n_k$ and perform the computation modulo each of the moduli separately. Provided the moduli are pairwise relatively prime, i.e. GCD($n_i$, $n_j$) = 1 for 1 ≤ i < j ≤ k, and the product $n = n_1 n_2 \cdots n_k$ is sufficiently large, we may be able to reconstruct the answer from the separate results. The following theorem, known as the “Chinese remainder theorem”, is helpful.

Theorem. Let $n_i$ be as above, $m_i = n/n_i$, $d_i = m_i^{-1} \bmod n_i$, and $c_i = m_i d_i$. Suppose residues $a_i$ are given. Then the set of equations

$$x = a_i \bmod n_i, \quad i = 1, 2, \ldots, k \qquad (2)$$

has a unique solution modulo n. The solution is given by

$$x = a_1 c_1 + a_2 c_2 + \cdots + a_k c_k \bmod n . \qquad (3)$$

Example of CRT

Before proving the CRT, it may be helpful to try a small but not quite trivial example. Let n = 2 · 3 · 5 · 7 = 210 and x = 123. Thus

$(n_1, n_2, n_3, n_4) = (2, 3, 5, 7)$,

$(a_1, a_2, a_3, a_4) = (1, 0, 3, 4)$,

$(m_1, m_2, m_3, m_4) = (105, 70, 42, 30)$,

$(d_1, d_2, d_3, d_4) = (1, 1, 3, 4)$,

$(c_1, c_2, c_3, c_4) = (105, 70, 126, 120)$,

and the Theorem gives

x = 1 × 105 + 0 × 70 + 3 × 126 + 4 × 120
  = 963
  = 123 mod 210

which is correct!

Exercise: Check my arithmetic and/or construct your own examples.
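The arithmetic in the example can be reproduced mechanically. The sketch below follows formula (3); it assumes Python 3.8 or later, where `math.prod` exists and `pow(m, -1, n)` returns a modular inverse. The function names `crt_constants` and `crt` are chosen for this illustration.

```python
from math import prod

def crt_constants(moduli):
    """Return (n, [c_1, ..., c_k]) with c_i = m_i * d_i as in the Theorem."""
    n = prod(moduli)
    cs = []
    for ni in moduli:
        mi = n // ni
        di = pow(mi, -1, ni)     # d_i = m_i^{-1} mod n_i
        cs.append(mi * di)
    return n, cs

def crt(residues, moduli):
    """Solve x = a_i mod n_i for pairwise relatively prime moduli."""
    n, cs = crt_constants(moduli)
    return sum(a * c for a, c in zip(residues, cs)) % n

moduli = [2, 3, 5, 7]
print(crt_constants(moduli))                         # n and the constants c_i
print(crt([123 % m for m in moduli], moduli))        # recovers x = 123
```

Since `crt_constants` depends only on the moduli, its result can be computed once and reused for many residue vectors.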


Outline proof of the CRT

A proof is given in CLRS (Thm. 31.27), so we give only an outline here. First, observe that $m_j = 0 \bmod n_i$ if i ≠ j; it follows that $c_j = 0 \bmod n_i$ if i ≠ j, so if x satisfies (3) then

$$x \bmod n_i = a_i c_i \bmod n_i .$$

However,

$$c_i \bmod n_i = m_i (m_i^{-1}) \bmod n_i = 1 \bmod n_i ,$$

so

$$x = a_i \bmod n_i ,$$

i.e. (2) holds. This proves the existence of a solution satisfying (2). Note the analogy with Lagrange interpolation.

To prove uniqueness modulo n of the solution, suppose there are two solutions, x′ and x′′. Considering x = x′ − x′′, we can restrict attention to the case $a_i = 0$. However, in this case $n_i \mid x$ for each i, so LCM($n_1, \ldots, n_k$) | x, i.e. n | x, so x is unique modulo n. □

Preconditioning

Suppose we want to apply the CRT many times with the same set of moduli $(n_1, \ldots, n_k)$. We only have to compute the constants $(c_1, \ldots, c_k)$ once. Thus, we save work every time we solve an equation of the form (2), except for the first time.

The computation (and saving for future use) of $(c_1, \ldots, c_k)$ is an example of preconditioning.

Generalisation to polynomials

The CRT can be generalised to handle moduli which are relatively prime polynomials rather than integers (which can be regarded as polynomials of degree 0). See Aho, Hopcroft and Ullman, Theorem 8.13.

FFT over a finite field

To avoid problems with rounding errors when working with the “classical” FFT, which involves $n$-th roots of unity and irrational numbers (except in the trivial cases $n = 1, 2, 4$), we may choose to work over one or more finite fields GF($p$). (If more than one, we may be able to use the CRT to obtain the final answer.)

In order to apply the FFT over $n$ points, we need $n$-th roots of unity. In the field GF($p$), the multiplicative group $G$ is cyclic with order $p - 1$. Suppose $w$ is a primitive $n$-th root of unity, i.e. $w^n = 1$ and $n$ is the least positive integer for which this holds. The subgroup $H = \langle w \rangle$ generated by $w$ consists of the $n$ elements $w, w^2, w^3, \ldots, w^{n-1}, w^n$.

A well-known theorem in finite group theory (Lagrange’s theorem) implies that $|H|$ is a divisor of $|G|$, i.e. $n \mid (p - 1)$. Thus, a primitive $n$-th root of unity can exist only if $n$ is a divisor of $p - 1$. Since $G$ is cyclic, any divisor of $p - 1$ is in fact possible.


Primes in arithmetic progressions

Suppose we want to apply a radix-2 FFT algorithm such as the Cooley-Tukey algorithm. Then $n = 2^k$ for some $k$ (the depth of recursion) and we need a prime $p$ of the form $\lambda 2^k + 1$.

Fortunately, by a theorem of Dirichlet (extending the “prime number theorem”), there are an infinite number of primes in each arithmetic progression $\{\alpha m + \beta \mid m \ge 0\}$, provided $\mathrm{GCD}(\alpha, \beta) = 1$. (This condition is obviously necessary.)

Applying Dirichlet’s theorem with $\alpha = 2^k$, $\beta = 1$, we see that there are an infinite number of primes we can use. (The question of how large the smallest such prime can be is interesting, but in practice it is not a problem.)
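To make this concrete, here is a small Python sketch that searches for the smallest prime of the form $\lambda 2^k + 1$ and then for a primitive $2^k$-th root of unity in GF($p$). The function names are hypothetical, and trial division is used only because the numbers are tiny; it is not what one would use for serious sizes. The root test relies on the fact that an element whose order divides $2^k$ has order exactly $2^k$ iff its $2^{k-1}$-th power is not 1.

```python
def is_prime(p):
    """Trial division; adequate only for the small p in this sketch."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def fft_prime(k):
    """Smallest prime of the form lam * 2**k + 1 with lam >= 1."""
    lam = 1
    while not is_prime(lam * 2**k + 1):
        lam += 1
    return lam * 2**k + 1

def primitive_root_of_unity(p, k):
    """A primitive n-th root of unity in GF(p), n = 2**k, k >= 1, n | p - 1."""
    n = 2**k
    if k < 1 or (p - 1) % n != 0:
        raise ValueError("need k >= 1 and 2**k dividing p - 1")
    for a in range(2, p):
        w = pow(a, (p - 1) // n, p)     # the order of w divides n
        if pow(w, n // 2, p) != 1:      # so the order is exactly n
            return w
    raise ValueError("no primitive root of unity found")

p = fft_prime(4)                        # 17 = 1 * 2**4 + 1
w = primitive_root_of_unity(p, 4)
print(p, w, pow(w, 16, p), pow(w, 8, p))
```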


4. Number-theoretic algorithms

In this lecture (maybe two) we consider algorithms for testing primality, integer factorisation, and some applications such as public-key cryptography. Many of the best algorithms are Monte Carlo or Las Vegas algorithms (CLRS §5.3).

Recommended reading: CLRS Ch. 31; Knuth, Vol. 2, §4.5.4; Motwani and Raghavan (1995), Ch. 14; Riesel, parts of Chs. 4–7.


Some number theory

Recall Fermat’s little theorem:

Theorem. If p is a prime and 0 < a < p, then

$a^{p-1} = 1 \bmod p$.

Suppose $p$ is an odd prime. From Fermat’s little theorem, the square of $a^{(p-1)/2}$ is $1 \bmod p$, and since $p$ is prime it follows that

$a^{(p-1)/2} = \pm 1 \bmod p$.

To determine the sign, we need to consider quadratic residues.

We say that $a$ is a quadratic residue mod $p$ if there is an $x$ such that

$a = x^2 \bmod p$;

otherwise $a$ is a quadratic nonresidue mod $p$.

If $p$ is an odd prime then, because $(+x)^2 = (-x)^2$, we see that exactly half of the numbers $1, 2, \ldots, p-1$ are quadratic residues mod $p$.


Euler’s criterion

The following result is useful for determining if a number is a quadratic residue.

Theorem (Euler’s criterion). If $p$ is an odd prime and $0 < a < p$, then $a$ is a quadratic residue mod $p$ iff

$a^{(p-1)/2} = 1 \bmod p$.

Proof. If $a$ is a quadratic residue, say $a = x^2 \bmod p$, then $a^{(p-1)/2} = x^{p-1} = 1 \bmod p$ by Fermat’s little theorem.

Conversely, suppose $a$ is not a quadratic residue. Let $g$ be a primitive root mod $p$. Thus $a = g^k \bmod p$ for some $k$. If $k$ is even then $x = g^{k/2} \bmod p$ satisfies $x^2 = a \bmod p$, contradicting our assumption. Thus $k$ is odd, say $k = 2m + 1$. Then

$a^{(p-1)/2} = g^{m(p-1) + (p-1)/2} = g^{(p-1)/2} \bmod p$.

Now $g^{(p-1)/2} \ne 1 \bmod p$ because $g$ has order $p - 1$. □
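Euler’s criterion is easy to check numerically for a small prime. The sketch below (the function name is hypothetical) compares the residues predicted by the criterion with the set of actual squares mod $p = 11$.

```python
def is_quadratic_residue(a, p):
    """Euler's criterion for an odd prime p and 0 < a < p."""
    return pow(a, (p - 1) // 2, p) == 1

p = 11
by_euler = {a for a in range(1, p) if is_quadratic_residue(a, p)}
by_squaring = {x * x % p for x in range(1, p)}
print(sorted(by_euler))       # [1, 3, 4, 5, 9]
print(by_euler == by_squaring)
```

Exactly $(p-1)/2 = 5$ of the 10 nonzero classes are residues, as claimed earlier.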


Finding a quadratic nonresidue

Using Euler’s criterion, we obtain a very simple Las Vegas algorithm for finding a quadratic nonresidue mod $p$ (where, as usual, $p$ is an odd prime).

repeat
    choose random $a \in \{1, 2, \ldots, p-1\}$
until $a^{(p-1)/2} = -1 \bmod p$;
return $a$.

The probability of success at each step is exactly 1/2, so the expected number of times the loop is repeated is 2.

Note that we cannot prove that the algorithm terminates. It is conceivable that it will keep choosing quadratic residues and never find a nonresidue. However, the probability of choosing $k$ quadratic residues in a row is $2^{-k}$, and the probability that the algorithm never terminates is zero. Thus, for practical purposes, we can have confidence that the algorithm will terminate.
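The loop translates almost line for line into Python. This is a sketch: the function name is hypothetical, the caller is trusted to pass an odd prime, and $-1 \bmod p$ is represented as `p - 1`.

```python
import random

def quadratic_nonresidue(p):
    """Las Vegas search for a quadratic nonresidue mod an odd prime p."""
    while True:
        a = random.randrange(1, p)              # a in {1, ..., p-1}
        if pow(a, (p - 1) // 2, p) == p - 1:    # Euler's criterion gives -1
            return a

a = quadratic_nonresidue(11)
print(a)    # one of 2, 6, 7, 8, 10; which one varies from run to run
```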


Testing primality – Algorithm RM

Suppose we want an algorithm to determine if a given odd positive integer $n$ is prime. The following Monte Carlo algorithm is due to Rabin, with improvements by Miller. Its expected run time is polynomial in $\log n$.

1. Write $n$ as $2^k q + 1$, where $q$ is odd and $k > 0$.

2. Choose a random integer $x \in \{2, \ldots, n-1\}$.

3. Compute $y = x^q \bmod n$. This can be done with $O(\log q)$ operations mod $n$, using the binary representation of $q$.

4. If $y = 1$ then return “yes”.

5. For $j = 1, 2, \ldots, k$ do

   if $y = n - 1$ then return “yes”

   else if $y = 1$ then return “no”

   else $y \leftarrow y^2 \bmod n$.

6. Return “no”.
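The six steps can be sketched in Python as follows. The names `rm_trial` and `probably_prime` are hypothetical, as is the optional argument `x` (added only so that a trial can be reproduced with a fixed base). The sketch assumes $n$ is odd and $n \ge 3$, and returns `True` for “yes” and `False` for “no”.

```python
import random

def rm_trial(n, x=None):
    """One trial of Algorithm RM for odd n >= 3."""
    # Step 1: write n = 2^k * q + 1 with q odd.
    k, q = 0, n - 1
    while q % 2 == 0:
        k += 1
        q //= 2
    # Step 2: random base in {2, ..., n-1} unless one is supplied.
    if x is None:
        x = random.randrange(2, n)
    # Step 3: y = x^q mod n (pow uses binary exponentiation).
    y = pow(x, q, n)
    # Step 4.
    if y == 1:
        return True
    # Step 5.
    for _ in range(k):
        if y == n - 1:
            return True
        if y == 1:
            return False
        y = y * y % n
    # Step 6.
    return False

def probably_prime(n, trials=10):
    """Repeat the trial with independent random bases."""
    return all(rm_trial(n) for _ in range(trials))

print(rm_trial(561, 2))     # False: 2 is a witness for the Carmichael number 561
print(rm_trial(2047, 2))    # True: a false positive, since 2047 = 23 * 89
print(probably_prime(97))   # True
```

The second call shows why a single trial is not enough: 2047 passes for the base 2 even though it is composite.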


Explanation of Algorithm RM

A slight extension of Fermat’s little Theorem is useful, because its converse is usually true.

If $n = 2^k q + 1$ is an odd prime, then either $x^q = 1 \bmod n$, or the sequence

$(x^{2^j q} \bmod n)_{j = 0, 1, \ldots, k}$

ends with 1, and the value just preceding the first appearance of 1 must be $n - 1$.

Proof: If $y^2 = 1 \bmod n$ then $n \mid (y - 1)(y + 1)$. Since $n$ is prime, $n \mid (y - 1)$ or $n \mid (y + 1)$. Thus $y = \pm 1 \bmod n$. □

The extension gives a necessary (but not sufficient) condition for primality of $n$. Algorithm RM just checks if this condition is satisfied for a random choice of $x$, and returns “yes” if it is.

If the answer is “no” then we say that $x$ is a witness to the compositeness of $n$. Fortunately, witnesses are common (there are at least $3n/4$ of them) so they are easy to find.


Reliability of Algorithm RM

Algorithm RM cannot give false negatives (unless we make an arithmetic mistake), but it can give false positives (i.e. “yes” when $n$ is composite). However, the probability of a false positive is less than 1/4. Usually it is much less – see Knuth, ex. 4.5.4.22. A weaker result, with 1/4 replaced by 1/2, is proved in CLRS §31.8, Theorem 31.38.

If we repeat the algorithm 10 times there is less than 1 in $10^6$ chance of a false positive, and if we repeat 100 times the results should satisfy anyone but a pure mathematician.

Algorithm RM works well even if the input is a Carmichael number.

Use of randomness

Note that in our examples randomness was introduced into the algorithms. We did not make any assumption about randomness of the inputs.


Summary of Algorithm RM

Given any $\varepsilon > 0$, we can check primality of a number $n$ in

$O((\log n)^3 \log(1/\varepsilon))$

bit-operations, provided we are willing to accept a probability of error of at most $\varepsilon$.

By way of comparison, the best known deterministic algorithm takes

$O((\log n)^6)$

bit-operations, and is much more complicated. This is the AKS algorithm¹, not mentioned in CLRS, with an improvement by Lenstra to reduce the exponent.

¹ Agrawal, Kayal and Saxena, Annals of Mathematics 160 (2004), 781–793.


Factorisation algorithms

The Rabin-Miller primality test is a Monte Carlo algorithm, because it can occasionally give the wrong answer (claiming that a composite number is prime).

There are several randomised algorithms for factoring integers. Because it is easy to check if a factorisation is correct (by multiplying the supposed factors), we can easily convert these algorithms into Las Vegas algorithms – they will never return the wrong answer, but the time taken to determine a correct answer is random. Examples are Pollard’s “rho” (ρ) method, Lenstra’s elliptic curve method (ECM), the multiple-polynomial quadratic sieve (MPQS), and the number field sieve (NFS).

We shall look at simplified versions of Pollard’s rho method, Pollard’s “$p - 1$” method, ECM (very briefly) and MPQS.


Pollard rho

Suppose we want to find a prime factor $q > 3$ of an odd composite integer $N$.

Take a random $x_0$, and a polynomial $P \in \mathbb{Z}[x]$ of the form

$P(x) = x^2 + a$,

where $a \ne 0, -2$. Define a sequence $(x_j)$ by

$x_j = P(x_{j-1}) \bmod N, \quad j \ge 1$,

and the “doubled” sequence $(y_j)$ by

$y_j = x_{2j}$.

Note that we can compute $(y_j)$ using the recurrence

$y_j = P(P(y_{j-1})) \bmod N$.


Simple version of rho

The simplest form of Pollard’s “rho” algorithm is

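$j \leftarrow 0$; $x \leftarrow x_0$; $y \leftarrow x_0$;
repeat
    $j \leftarrow j + 1$;
    $x \leftarrow P(x) \bmod N$;
    $y \leftarrow P(P(y) \bmod N) \bmod N$;
    $f \leftarrow \mathrm{GCD}(x - y, N)$
until $f > 1$;
return $f$.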
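In Python the simple version might look as follows. This is a sketch with hypothetical names and default values ($x_0 = 2$, $a = 1$) chosen only for the demonstration; it can return $N$ itself, in which case one would retry with a different $x_0$ or $a$.

```python
from math import gcd

def pollard_rho(N, x0=2, a=1):
    """Simple Pollard rho with P(x) = x^2 + a; returns a factor f > 1 of N."""
    def P(x):
        return (x * x + a) % N

    x = y = x0
    while True:
        x = P(x)             # x is x_j
        y = P(P(y))          # y is x_{2j}
        f = gcd(x - y, N)    # math.gcd accepts a negative first argument
        if f > 1:
            return f         # may equal N (failure)

print(pollard_rho(8051))     # 97, since 8051 = 83 * 97
```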

The algorithm may fail because it returns $f = N$. However, this is unlikely. Usually, if $q$ is the smallest prime factor of $N$, the algorithm returns $f = q$ in $O(\sqrt{q})$ steps.

To understand why, consider the sequence $(x'_j = x_j \bmod q)$. Note that

$x'_j = P(x'_{j-1}) \bmod q$

because $q \mid N$ and $P(x)$ is a polynomial with integer coefficients.


Pollard rho continued

The sequence $(x'_j)$ must repeat after at most $q$ steps, i.e. there is some period $p \le q$ such that

$x'_{n+p} = x'_n$

for all $n \ge t$. The integer $t$ is the length of the nonperiodic part of the sequence (possibly $t = 0$).

By a “tail chasing” argument, there is an $n = O(q)$ such that

$x'_{2n} = x'_n$,

but this means that $p \mid n$ and $q \mid (y_n - x_n)$.

Although the worst case is $\Omega(q)$, the “birthday paradox” argument shows that we can expect $n = O(\sqrt{q})$ if $P$ behaves like a random function. In practice, this seems to be the case.

For further details (and a more efficient version of the algorithm), see CLRS §31.9.


Example

Using a slightly modified version, Pollard and I found the factor

$q = 1238926361552897$

of the Fermat number

$F_8 = 2^{2^8} + 1$.

You can remember this factor by the epigram (the number of letters in each word gives a digit)

I am now entirely persuaded to employ the method,
a handy trick, on gigantic composite numbers


Pollard’s “p− 1” method

The Pollard “$p - 1$” method is interesting because it is the basis of the elliptic curve method, which we consider later.

The $p - 1$ method depends on Fermat’s little theorem. We want to find a prime factor $p$ of a number $N$. Suppose that (somehow) we know or guess a number $E$ such that $p - 1$ is a divisor of $E$. Choose some $a$, $1 < a < N - 1$. Since we are looking for a factor of $N$, we can assume that $\mathrm{GCD}(a, N) = 1$. Compute

$g = \mathrm{GCD}(a^E - 1, N)$.

By Fermat’s little theorem, $a^{p-1} = 1 \bmod p$ so $a^E = 1 \bmod p$, i.e. $p$ is a divisor of $a^E - 1$, so $p$ is also a divisor of $g$. Thus, unless we are unlucky and find the “trivial” result $g = N$, we get a nontrivial divisor of $N$.

How do we find $E$? We can take $E$ to be a product of all prime powers $q$ less than some bound $B$. If $B$ is sufficiently large, the method will work. In fact, $B$ has to be at least as large as the largest prime factor of $p - 1$.
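A small Python sketch of the method follows. As a simplifying assumption it uses $E = B!$ instead of a product of prime powers: $B!$ is a multiple of every integer whose prime-power factors are all at most $B$, so it serves the same purpose and is easy to build up one exponentiation at a time. The function name and the default $a = 2$ are hypothetical.

```python
from math import gcd

def pollard_p_minus_1(N, B, a=2):
    """Return GCD(a^E - 1, N) with E = B! (a stand-in for the product of prime powers)."""
    x = a % N
    for j in range(2, B + 1):
        x = pow(x, j, N)        # after the loop, x = a^(B!) mod N
    return gcd(x - 1, N)

# N = 299 = 13 * 23, where 13 - 1 = 2^2 * 3 and 23 - 1 = 2 * 11.
print(pollard_p_minus_1(299, 5))    # 13: 12 divides 5!, but 22 does not
print(pollard_p_minus_1(299, 11))   # 299: B too large, both p - 1 divide 11!
```

The second call illustrates the “trivial” result $g = N$: when $B$ is large enough to catch every prime factor at once, nothing is learned.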


The elliptic curve method (ECM)

The Pollard rho algorithm is an example of a factorisation algorithm whose expected running time depends mainly on the size of the factor $f$ which is found, and only secondarily on the size of the number $N$ which is being factored (only because arithmetic operations are performed mod $N$). The expected running time is $\Theta(\sqrt{f})$.

Pollard rho is not the best such algorithm. A more sophisticated algorithm is Lenstra’s elliptic curve algorithm/method (usually abbreviated ECM). This is a randomised algorithm which finds a factor $f$ in expected time

$O(\exp((1 + \varepsilon)\sqrt{2 \ln f \ln\ln f}))$,

where $\varepsilon \to 0$ as $f \to \infty$.


ECM continued

Because $\sqrt{f} = \exp((\ln f)/2)$ and

$\frac{\ln f}{2} \gg \sqrt{2 \ln f \ln\ln f}$

for large $f$, ECM is faster than Pollard rho for large factors $f$. The crossover point is for factors of about 12 decimal digits (so Pollard rho is a useful method for finding “small” factors).

ECM is useful for finding factors of up to about 40 decimal digits, and if you are lucky you might find larger ones (the current record is 73 decimal digits).

ECM is a randomised algorithm. It uses a number of “trials”, where each trial depends on choosing a random group (defined by an elliptic curve) and then trying to find a factor by an analogue of the Pollard “$p - 1$” algorithm.

The details of ECM are outside the scope of this course. If you are interested, look in the book by Riesel or in some of the papers accessible from my home page.


Pseudo-deterministic algorithms

Some randomised algorithms use many independent random numbers, and because of the “law of large numbers” their performance is very predictable. One example is the multiple-polynomial quadratic sieve (MPQS) algorithm for integer factorisation.

Suppose we want to factor a large composite number $N$ (not a perfect power). The key idea of MPQS is to generate a sufficiently large number of relations of the form

$y^2 = p_1^{\alpha_1} \cdots p_k^{\alpha_k} \bmod N$,

where $p_1, \ldots, p_k$ are small primes in a precomputed “factor base”, and $y$ is close to $\sqrt{N}$. Many $y$ are tried, and the “successful” ones are found efficiently by a sieving process. Finding a relation is a fairly rare event, but we have to find many (as many as there are primes in the factor base), so by the law of large numbers it is relatively easy to predict how long it will take.


Combining relations

After a sufficient number of relations have been found, we solve a system of linear equations (or, more accurately, find a linear dependency) over GF(2) to obtain a relation where all the exponents are even. This gives us a relation of the form

$y^2 = z^2 \bmod N$,

and we check $\mathrm{GCD}(y - z, N)$. With probability at least 0.5, this gives a nontrivial factor of $N$. (If it fails, we find another linear dependency and try again . . .)
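A toy illustration of combining relations, not taken from the lecture: for $N = 1649$ the two relations $41^2 = 2^5$ and $43^2 = 2^3 \cdot 5^2 \pmod N$ each have an odd exponent, but their product has only even exponents. The Python sketch below checks the arithmetic; the numbers are chosen by hand, with no sieving or linear algebra involved.

```python
from math import gcd, isqrt

N = 1649
r1 = 41 * 41 % N           # 32  = 2^5
r2 = 43 * 43 % N           # 200 = 2^3 * 5^2
y = 41 * 43 % N            # y^2 = r1 * r2 mod N
z = isqrt(r1 * r2)         # r1 * r2 = 2^8 * 5^2 = 80^2, so z = 80
f = gcd(y - z, N)
print(r1, r2, y, z, f)     # 32 200 114 80 17
```

Here $1649 = 17 \cdot 97$, so the GCD yields a nontrivial factor.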

Making some plausible assumptions, the expected run time of MPQS is

$T = O(\exp(\sqrt{c \log N \log\log N}))$,

where $c \simeq 1$. In practice, this estimate is good and the variance is small.


MPQS example

MPQS is currently the best general-purpose algorithm for factoring moderately large numbers $N$ whose factors are in the range $N^{1/3}$ to $N^{1/2}$. For example, Lenstra and Manasse found

$3^{329} + 1 = 2^2 \cdot 547 \cdot 16921 \cdot 256057 \cdot 36913801 \cdot 177140839 \cdot 1534179947851 \cdot p_{50} \cdot p_{67}$,

where the penultimate factor $p_{50}$ is a 50-digit prime 24677078822840014266652779036768062918372697435241, and the largest factor $p_{67}$ is a 67-digit prime.

The computation used a network of workstations for “sieving”, then a supercomputer for the solution of a very large linear system.


MPQS and NFS

MPQS has been used to factor numbers of up to 129 decimal digits, although a more sophisticated method (the number field sieve, NFS) is faster for numbers of more than about 110 decimals. It is now feasible to factor 232-digit (768-bit) numbers by NFS using a network of workstations.

Public key cryptography

Integer factorisation is of interest in cryptography because the security of the popular RSA (for Rivest, Shamir and Adleman) algorithm for public-key cryptography depends on the difficulty of factoring large integers (say products of 150-decimal digit primes). We give a brief outline, omitting some of the practical details. See CLRS §31.7 (or a recent book on cryptography, such as those mentioned below) for a more detailed description of RSA.


Setting up RSA

To set up an RSA scheme, Alice chooses two large primes $p$ and $q$. She computes $n = pq$ and $\phi = (p - 1)(q - 1)$. She chooses a random integer $e$, $1 < e < \phi$, such that $\mathrm{GCD}(e, \phi) = 1$. (Sometimes, for efficiency, a small $e$ such as 17 or even 3 is used.)

Using the extended Euclidean algorithm, she computes $d$ such that

$de = 1 \bmod \phi$.

$n$ is the modulus, $e$ is the encryption exponent, and $d$ is the decryption exponent.

Alice’s public key is $(n, e)$ and her private key is $d$. The numbers $p$, $q$ and $\phi$ should be kept secret (they may be discarded as they are no longer needed, although sometimes $p$ and $q$ are kept for efficiency reasons).


RSA encryption/decryption

For Bob to send a message $m$ to Alice, he obtains her public key $(n, e)$. We suppose that $m$ is an integer in the range $n^{1/3} < m < n$ (otherwise, pad $m$ or split $m$ into pieces, encode them as integers, and send each piece separately).

Bob computes

$c = m^e \bmod n$.

The ciphertext is $c$.

To decrypt the ciphertext and retrieve the plaintext $m$, Alice computes

$m = c^d \bmod n$.

This works because $m^{\phi + 1} = m \bmod n$, so

$c^d = m^{ed} = m \bmod n$.
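The whole cycle can be tried out with toy numbers in Python. The primes $p = 61$, $q = 53$ and the exponent $e = 17$ are illustrative choices only, far too small to be secure; `pow(e, -1, phi)` (Python 3.8 or later) stands in for the extended Euclidean algorithm, and the function names are hypothetical.

```python
from math import gcd

# Key set-up (Alice).
p, q = 61, 53
n = p * q                      # modulus, 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # encryption exponent
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # decryption exponent: d * e = 1 mod phi

def encrypt(m):
    """Bob: c = m^e mod n, using only the public key (n, e)."""
    return pow(m, e, n)

def decrypt(c):
    """Alice: m = c^d mod n, using the private key d."""
    return pow(c, d, n)

m = 65
c = encrypt(m)
print(d, decrypt(c) == m)      # 2753 True
```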


Breaking RSA

If an eavesdropper (Eve) can factor $n$, then she can compute $d$ and break the system. There is no obvious way to do this without factoring $n$ (although it has not been proved that the problems are equivalent).


Discrete logarithms

The discrete logarithm problem is to find an integer $x$ such that

$g^x = b \bmod n$,

where $g$, $n$ and $b$ are given integers. We write $x = \log_g b$. The solution (if it exists) is not unique, so we assume that $\log_g b$ is the smallest non-negative solution.

If $n$ is a large prime and $n - 1$ has at least one large prime factor, then the discrete logarithm problem seems to be difficult (at least as difficult as factoring integers of about the same size as $n$).

The concept of discrete logarithm can be generalised to other algebraic structures. We are just considering the simplest case here.


Diffie-Hellman key exchange

Suppose Bob and Alice want to send messages to each other using ordinary (not public-key) cryptography. They need to agree on a key $K$ to use for encrypting/decrypting their messages. This may be difficult if they are communicating by phone or email and someone (Eve) is eavesdropping. Diffie and Hellman suggested a nice solution.

First, Bob and Alice agree on a large prime $p$ and an element $g$ which is a primitive root mod $p$. Preferably $q = (p - 1)/2$ should be prime.

Bob/Alice can find suitable $p$ and $q$ using Algorithm RM, and then find $g$ by a randomized algorithm. Testing that $g$ is a primitive root is made easy because the factorisation of $p - 1$ is known.

Bob and Alice can make $p$ and $g$ public. It does not matter if Eve knows them.
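To see why the known factorisation helps: when $p - 1 = 2q$ with $q$ prime, the order of $g$ divides $2q$, so $g$ is a primitive root iff neither $g^2$ nor $g^q$ is $1 \bmod p$. The Python sketch below uses this test in a randomised search; the function names and the toy values $p = 23$, $q = 11$ are assumptions made for the demonstration.

```python
import random

def is_primitive_root(g, p, q):
    """Assumes p = 2q + 1 with p and q prime, and 1 < g < p."""
    return pow(g, 2, p) != 1 and pow(g, q, p) != 1

def find_primitive_root(p, q):
    """Randomised search; about half of all candidates succeed."""
    while True:
        g = random.randrange(2, p - 1)      # g in {2, ..., p-2}
        if is_primitive_root(g, p, q):
            return g

p, q = 23, 11
g = find_primitive_root(p, q)
powers = {pow(g, i, p) for i in range(1, p)}
print(g, len(powers))                        # g varies; the count is always 22
```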


Diffie-Hellman continued

The Diffie-Hellman algorithm for generating a key $K$ known to Bob and Alice, but not to Eve, is as follows.

1. Alice chooses a random $x \in \{2, \ldots, p - 2\}$, computes $X = g^x \bmod p$, and sends $X$ to Bob.

2. Bob chooses a random $y \in \{2, \ldots, p - 2\}$, computes $Y = g^y \bmod p$, and sends $Y$ to Alice.

3. Alice computes $K = Y^x \bmod p$.

4. Bob computes $K = X^y \bmod p$.

Now both Alice and Bob know $K = g^{xy} \bmod p$, so it can be used as a key or transformed into a key in some agreed manner.

Eve may know $p$, $g$, $X$ and $Y$. However, she does not know $x$ or $y$. Although it has not been proved, it seems that she cannot compute $K = X^y \bmod p = Y^x \bmod p$ without effectively finding $x$ or $y$, and this requires solving a discrete logarithm problem.
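The four steps can be simulated in a few lines of Python. The parameters $p = 23$, $g = 5$ are toy values (5 is a primitive root mod 23) and the helper `keypair` is hypothetical; real use needs a prime of thousands of bits.

```python
import secrets

p, g = 23, 5                    # public parameters (toy values)

def keypair(p, g):
    """Pick a secret exponent in {2, ..., p-2} and return it with g^secret mod p."""
    secret = 2 + secrets.randbelow(p - 3)
    return secret, pow(g, secret, p)

x, X = keypair(p, g)            # step 1: Alice sends X to Bob
y, Y = keypair(p, g)            # step 2: Bob sends Y to Alice
K_alice = pow(Y, x, p)          # step 3
K_bob = pow(X, y, p)            # step 4
print(K_alice == K_bob)         # True
```

Eve sees only `p`, `g`, `X` and `Y`; the secrets `x` and `y` never leave their owners.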


Encryption using discrete logarithms

There is a public-key encryption scheme, the El Gamal scheme, which depends on the difficulty of computing discrete logarithms, and is closely related to the Diffie-Hellman key exchange algorithm. For a complete description of the El Gamal algorithm, see

• B. Schneier, Applied Cryptography, 2nd edition, §19.6, or

• A. Menezes, P. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, §8.4 (we refer to this book as MOV).


El Gamal encryption/decryption

For public key encryption/decryption, Alice chooses a large random prime $p$ and a primitive root $g$. She also chooses a random exponent $x$, $2 \le x \le p - 2$, and computes $X = g^x \bmod p$. Alice’s public key is $(p, g, X)$ and her private key is $x$.

For Bob to send a message $m$ to Alice, he obtains Alice’s public key $(p, g, X)$. We assume that $m$ is an integer in the range $0 \le m < p$ (otherwise, split $m$ into pieces, encode them as integers, and send each piece separately).

Bob chooses a random integer $y$, $2 \le y \le p - 2$, and computes

$Y = g^y \bmod p$ and $Z = m X^y \bmod p$.

The ciphertext is $(Y, Z)$.

To decode the ciphertext $(Y, Z)$, Alice uses the fact that (with all computations in GF($p$))

$X^y = g^{xy} = g^{yx} = Y^x$,

so

$m = Z Y^{-x}$.
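A toy Python version of the scheme is sketched below, again with the illustrative parameters $p = 23$, $g = 5$ and hypothetical function names. The negative exponent in `pow(Y, -x, p)` computes $Y^{-x}$ in GF($p$) and requires Python 3.8 or later.

```python
import secrets

p, g = 23, 5                               # toy prime and primitive root

# Alice's key pair.
x = 2 + secrets.randbelow(p - 3)           # private key, 2 <= x <= p-2
X = pow(g, x, p)                           # public key is (p, g, X)

def encrypt(m, p, g, X):
    """Bob: a fresh random y for every message."""
    y = 2 + secrets.randbelow(p - 3)
    return pow(g, y, p), m * pow(X, y, p) % p      # (Y, Z)

def decrypt(Y, Z, p, x):
    """Alice: m = Z * Y^(-x) in GF(p)."""
    return Z * pow(Y, -x, p) % p

Y, Z = encrypt(20, p, g, X)
print(decrypt(Y, Z, p, x))                 # 20
```

Because `y` is drawn afresh inside `encrypt`, the same message generally yields different ciphertexts on different calls.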


Notes on El Gamal

1. Alice and Bob have essentially used the Diffie-Hellman scheme to agree on a key $g^{xy}$. The only difference is that Alice publishes $X$ as part of her public key, and it does not change.

2. It is very important for Bob to choose a different random $y$ each time.

3. There is an El Gamal scheme for signatures, but it is a little more complicated than the scheme for encryption/decryption.

4. El Gamal is a good alternative to RSA, especially if you have doubts about the difficulty of integer factorisation.

5. For added security (or to permit shorter keys) El Gamal can be generalised to use discrete logarithms over elliptic curves or other groups. This is the basis for elliptic curve cryptography (ECC).


Related topics

There is no time to cover the following topics in detail, but if you are interested you can find a discussion of them and further references in §3.6 of the book MOV mentioned above.

• Baby step, giant step method of Shanks for discrete logs.

• Pollard rho method for discrete logs. This is an alternative which uses less space than the method of Shanks.

• Pohlig-Hellman reduction. This is a way of reducing a discrete log problem mod $n$ to a sequence of discrete log problems mod $p$, where $p$ ranges over the prime factors of $n$.

• Implication of these algorithms for the El Gamal cryptosystem: the modulus $n$ (or, in generalisations, the group order) should have at least one large prime factor.


Secret sharing

Many years ago, the University of Oxford’s cash was stored in a large chest which had several locks, and no one person could open all the locks.

In more modern times, there might be eight directors of a company, and at least three of them might be required to sign cheques above a certain amount.

More generally, suppose that $1 \le t \le n$ and there are $n$ people who want to “distribute” a secret so that at least $t$ of them need to cooperate to reconstruct the secret. A way of implementing this is called a “$(t, n)$ threshold scheme”.

If $t = n$, an obvious way to do this is to split the secret into $t$ pieces and give one piece to each person. However, this is not very good, because if $t - 1$ people get together, they might be able to guess the remaining piece. Also, it is not clear how to generalise to the case $t < n$.


Shamir’s threshold scheme

Shamir proposed a $(t, n)$ threshold scheme based on polynomial interpolation. Suppose the secret is an integer $S \ge 0$. We choose a prime $p > \max(S, n)$ and define $a_0 = S$. We choose random, independent values $a_1, a_2, \ldots, a_{t-1} \in \mathrm{GF}(p)$. The coefficients $a_0, \ldots, a_{t-1}$ define a polynomial

$f(x) = \sum_{j=0}^{t-1} a_j x^j$.

We compute $S_i = f(i) \bmod p$ for $i = 1, \ldots, n$. Then person $i$ is given the “share” $(i, S_i)$.


Reconstructing the secret

If any $t$ people pool their shares, their values of $f(x)$ at $t$ distinct points define $f(x)$ uniquely, so they can find $S_0 = f(0)$, which is the secret $S = a_0$. (All this is in GF($p$), or, if you prefer, just consider values mod $p$.)

Recall the Lagrange interpolation formula for a polynomial interpolating $t$ points $(x_i, y_i)$:

$f(x) = \sum_{i=1}^{t} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}$,

where the product is over $j$ in the range $1 \le j \le t$, $j \ne i$. Because the computation is done in GF($p$), there is no problem with rounding errors.
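The scheme and the reconstruction step fit in a short Python sketch. The function names and the toy parameters ($p = 101$, $S = 42$, $t = 3$, $n = 8$) are assumptions for illustration; division in GF($p$) is done with `pow(den, -1, p)`, which needs Python 3.8 or later.

```python
import secrets

def make_shares(S, t, n, p):
    """Shares (i, f(i) mod p), i = 1..n, for a random f of degree < t with f(0) = S."""
    coeffs = [S] + [secrets.randbelow(p) for _ in range(t - 1)]

    def f(x):
        return sum(a_j * pow(x, j, p) for j, a_j in enumerate(coeffs)) % p

    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares, p):
    """Evaluate the Lagrange interpolation formula at x = 0, working in GF(p)."""
    S = 0
    for i, (x_i, y_i) in enumerate(shares):
        num = den = 1
        for j, (x_j, _) in enumerate(shares):
            if j != i:
                num = num * (0 - x_j) % p
                den = den * (x_i - x_j) % p
        S = (S + y_i * num * pow(den, -1, p)) % p
    return S

p = 101
shares = make_shares(42, t=3, n=8, p=p)
print(reconstruct(shares[:3], p))                          # 42
print(reconstruct([shares[1], shares[4], shares[7]], p))   # 42
```

Any three of the eight shares give the same answer; with only two shares, `reconstruct` returns a value that reveals nothing about the secret.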


Properties of Shamir’s scheme

Shamir’s scheme has some desirable properties:

• Given knowledge of any $t - 1$ or a smaller number of shares, all values $0 \le S < p$ of the shared secret remain equally probable.

• The size of one share is close to the size of the secret.

• New shares may be computed without changing existing shares.

• A single person can be given more than one share if desired.

• There are no unproven assumptions about the difficulty of solving some number-theoretic problem.


5. Annotated list of References

A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974. Rather old (as Computer Science books go) but still relevant. See Ch. 6 for matrix algorithms, Ch. 7 for the FFT and its applications, and Ch. 8 for integer and polynomial arithmetic.

A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms, Addison-Wesley, 1983. A revised and more elementary version of the first six chapters of the 1974 book by the same authors.

[CLRS] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, third edition, MIT Press, 2009. The main textbook for this course. In earlier editions some of the chapter numbers differ.


References continued

D. E. Knuth, The Art of Computer Programming, Addison-Wesley.
Volume 1: Fundamental Algorithms (third edition, 1997). Good for mathematics relevant to analysis of algorithms.
Volume 2: Seminumerical Algorithms (third edition, 1997). Good for random number generators, integer and polynomial arithmetic, etc.
Volume 3: Sorting and Searching (second edition, 1998). Probably contains more than you want to know about sorting and searching algorithms.
Volume 4A: Combinatorial Algorithms, Part 1 (first edition, 2011). We are still waiting for Volumes 4B–7.

In all 3½ volumes the exercises (and solutions) are a mine of information. Try to find the latest editions as they are significantly different from the earlier editions.


References continued

A. Menezes, P. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, CRC Press, 2001. A great reference. See http://www.cacr.math.uwaterloo.ca/hac/.

Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms, Cambridge University Press, 1995. A good introduction.

Hans Riesel, Prime Numbers and Computer Methods for Factorization, second edition, Birkhäuser, Boston, 1994. An introduction to algorithms for primality testing, integer factorisation, and some of their applications.

B. Schneier, Applied Cryptography, 2nd edition, Wiley, 1996. Describes many cryptographic algorithms.

B. L. van der Waerden, Algebra (Volumes 1-2), Springer, 2003. A classic. The first edition (1931) had the title Modern Algebra.
