Texts and Monographs in Computer Science - Springer978-1-4612-4800-2/1.pdf · Texts and Monographs...

Texts and Monographs in Computer Science Editor David Gries Advisory Board F.L. Bauer S.D. Brookes C.E. Leiserson M. Sipser


Texts and Monographs in Computer Science

Editor David Gries

Advisory Board
F.L. Bauer
S.D. Brookes
C.E. Leiserson
M. Sipser


Texts and Monographs in Computer Science

Suad Alagic Relational Database Technology

Suad Alagic and Michael A. Arbib The Design of Well-Structured and Correct Programs

S. Thomas Alexander Adaptive Signal Processing: Theory and Applications

Michael A. Arbib, A.J. Kfoury, and Robert N. Moll A Basis for Theoretical Computer Science

Michael A. Arbib and Ernest G. Manes Algebraic Approaches to Program Semantics

F.L. Bauer and H. Wössner Algorithmic Language and Program Development

Kaare Christian The Guide to Modula-2

Edsger W. Dijkstra Selected Writings on Computing: A Personal Perspective

Nissim Francez Fairness

Peter W. Frey, Ed. Chess Skill in Man and Machine, 2nd Edition

R. T. Gregory and E. V. Krishnamurthy Methods and Applications of Error-Free Computation

David Gries, Ed. Programming Methodology: A Collection of Articles by Members of IFIP WG2.3

David Gries The Science of Programming

Micha Hofri Probabilistic Analysis of Algorithms

A.J. Kfoury, Robert N. Moll, and Michael A. Arbib A Programming Approach to Computability

E. V. Krishnamurthy Error-Free Polynomial Matrix Computations

Franco P. Preparata and Michael Ian Shamos Computational Geometry: An Introduction

Brian Randell, Ed. The Origins of Digital Computers: Selected Papers

Arto Salomaa and Matti Soittola Automata-Theoretic Aspects of Formal Power Series

Jeffrey R. Sampson Adaptive Information Processing: An Introductory Survey

J.T. Schwartz, R.B. Dewar, E. Dubinsky, and E. Schonberg Programming with Sets: An Introduction to SETL

William M. Waite and Gerhard Goos Compiler Construction

Niklaus Wirth Programming in Modula-2


Micha Hofri

Probabilistic Analysis of Algorithms

On Computing Methodologies for Computer Algorithms Performance Evaluation

With 14 Illustrations

Springer-Verlag New York Berlin Heidelberg London Paris Tokyo


Micha Hofri Department of Computer Science Technion - IIT Haifa 32000 Israel

Series Editor David Gries Department of Computer Science Cornell University Ithaca, New York 14853 USA

Library of Congress Cataloging-in-Publication Data Hofri, Micha.

Probabilistic analysis of algorithms. (Texts and monographs in computer science) Bibliography: p. Includes index.

1. Electronic digital computers-Programming. 2. Algorithms. 3. Probabilities. I. Title. II. Series. QA76.6.H59 1987 005.1'2 87-16581

© 1987 by Springer-Verlag New York Inc. Softcover reprint of the hardcover 1st edition 1987 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc. in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Text prepared by the author in camera-ready form on LaserWriter Plus. Printed and bound by R.R. Donnelley & Sons, Harrisonburg, Virginia. Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-1-4612-9160-2 e-ISBN-13: 978-1-4612-4800-2 DOI: 10.1007/978-1-4612-4800-2




PREFACE

This book originated from notes used for a course I gave in the Department of Computer Science of the Technion during the Spring term of 1986. The course had the same title as this volume. It was intended for both graduates and undergraduates who are close to finishing their studies and are looking for a suitable area in which to specialize. This was one of the factors that determined the coverage of the course - and consequently of the text as well; the latter contains much material that I had no time to discuss in class, as well as a good deal more exercises than the students could be asked to tackle.

Until quite recently, "analysis of algorithms" was nearly synonymous with determining the "complexity class" of an algorithm. This has the objective, most often, of finding whether in all cases the running time (or storage requirements) of the algorithm's operation is or is not bounded by some specified function of the size of a suitably devised representation of the problem. It usually boils down to the consideration of some extreme, specially crafted problem instances. The realization that there is more one could say to characterize the cost of using an algorithm is probably due to the influence of Knuth's series on "The Art of Computer Programming", which started out in 1968. There, clearly, the operation of algorithms was shown to be associated with probabilistic concepts and processes.

Random elements, and hence the call for stochastic analysis, may enter algorithms in essentially two ways. On the one hand, we find the so-called "probabilistic algorithms", which choose part of their actions on the basis of random elements explicitly introduced into the algorithm specification (pseudo-random numbers, simulated coin flipping and the like). Numerous algorithms of this class were recently developed, some showing prowess well beyond anything one had believed hitherto possible (primality testing algorithms provide a good example). On the other hand, we find the operation of deterministic algorithms on input data over which some probability measure can be stipulated. While the sources of the randomness present a true dichotomy, the required analyses turn out typically to be of the same nature in both cases. Among the algorithms for which we provide detailed analyses, the reader will find examples of both varieties. While the analyses proper are similar, we show in Chapter 1 that the second type brings up methodological and conceptual problems that the first case need not entail. The difficulty there may be phrased as lending substance to the notion of two algorithms having the same complexity "on the average", or "in distribution". The problem may also be seen to reside in the attribution of a priori probability measures to the input instance space. At the time of writing there is no coherent accepted theory or even taxonomy for these vexing issues, comparable to standard complexity theory; we shall mostly skirt them, using reasonable - sometimes seemingly facile - assumptions, invoking naturalness as our guideline.
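The two ways in which randomness may enter can be illustrated with a minimal Python sketch. It is not taken from the text: the function names are hypothetical, the Fermat test merely stands in for the primality testing algorithms mentioned above, and the uniform distribution over permutations is an assumed input measure.

```python
import random


def count_max_updates(seq):
    """Deterministic scan for the largest element (hypothetical helper).

    Returns how many times the running maximum is replaced after the
    first element. Any randomness comes only from the input."""
    updates = 0
    current = seq[0]
    for x in seq[1:]:
        if x > current:
            current = x
            updates += 1
    return updates


def average_updates_on_random_input(n, trials=1000, seed=0):
    """Estimate the mean cost of the deterministic scan, assuming the
    input is a uniformly random permutation of 0..n-1."""
    rng = random.Random(seed)
    items = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(items)
        total += count_max_updates(items)
    return total / trials


def fermat_probably_prime(n, rounds=20, rng=random):
    """Randomized Fermat test: here the randomness is part of the
    algorithm itself. A prime is always accepted; some composites
    (for example Carmichael numbers) may be accepted wrongly."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n-2]
        if pow(a, n - 1, n) != 1:
            return False  # a is a witness that n is composite
    return True
```

In the first case the algorithm is fixed and the probability measure sits on the inputs; in the second the input is fixed and the algorithm draws its own random numbers.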

The probabilistic analysis of algorithms, as a discipline, draws on a fair number of branches of mathematics. Principally: probability theory (especially as applied to stochastic processes), graph theory, combinatorics, real and complex analysis, and occasionally algebra, number theory, computation theory, operational calculus and more. It was unreasonable to expect the students to have more than a cursory knowledge of most of the techniques we used, so much of the time was given over to introducing and exploring these methods as we went along. Arranging the text so it could be conveniently used both as a text and as a reference posed a problem which was solved by departing in the book version from the order of the class presentation to a large extent, collecting most of the methodological material in Chapter 2.

The prerequisites that were assumed are basic courses in discrete mathematics, calculus (including a smattering of differential equations), linear algebra, probability theory, data structures and graph algorithms; all these being required courses in our department. The last two are assumed to impart to the students some algorithmic literacy.

The emphasis throughout is on the analytic and probabilistic aspects of our activity, rather than the algorithmic ones. Good texts for the latter exist and are multiplying satisfactorily; references are scattered throughout the book. For a while I occupied myself with the question whether the text should have some chapter on the needed tools from probability theory, or an introduction to the subject, or at least an appendix - and decided against it. This subject has been royally served in the last decade or two, with very good books, at every level of depth and sophistication. Indeed, since I am not even aware of all the good sources that exist now, the reference section mentions only those recently used in my work, as well as for the preparation of this text. What I did include is a section - 2.6 - containing a few results from probability theory that are outside the normal curricula and whose usefulness I have had occasion to witness.

Computer algorithms deal with discrete quantities, and estimating their operations is very often reducible to a counting problem, in one guise or another. Hence the above-mentioned need for combinatorics. Chapter 2 contains many combinatorial tools and concepts, either in the text proper or in exercises. Further sources for this important area are discussed in Chapter 1.
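As a small illustration of how a cost estimate reduces to counting, the sketch below enumerates all permutations of n items and counts their inversions, which is the number of element moves insertion sort performs. The example and its function names are illustrative assumptions, not material from the text. Under a uniform measure on permutations, the exact average agrees with the well-known closed form n(n-1)/4.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    )


def average_inversions(n):
    """Exact average inversion count over all n! permutations,
    obtained by brute-force enumeration (feasible only for small n)."""
    total = sum(inversions(p) for p in permutations(range(n)))
    return Fraction(total, factorial(n))


def closed_form(n):
    """The classical value n(n-1)/4 for the average number of inversions."""
    return Fraction(n * (n - 1), 4)
```

Comparing `average_inversions(n)` with `closed_form(n)` for small n is a quick check that a counting argument and a direct enumeration agree.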

In addition to Chapter 1, which is used mainly to establish the context of the book, and the methodological Chapter 2 mentioned above, a few sample analyses are collected in the subsequent three chapters. The choice of the particular algorithms to analyze in detail was another non-trivial issue. The quest was easier for the first, simpler ones which are used mainly to introduce the nomenclature and a point of view. Here I could capitalize on ideas found in the so-far rather scanty literature on the subject; but beyond that I was mainly guided by personal preferences and experience, and to some extent by requests of students in the class, inasmuch as they agreed with the types of analysis I was interested in or able to display. The last reservation hints at some of the difficulty in teaching this subject: for some interesting analyses the sheer technical complexity (as distinct from their mathematical profundity) is such that presenting them in class is an unjustifiable harassment of the students (and the teacher too, no doubt). An analysis in which I was involved could serve as an illustration of such an unpleasantness (Fayolle et al., 1986).

The exercises are mostly extensions of the text, and sometimes were used to avoid giving much space to technicalities that tax the endurance of the reader and that the student should anyway gain some proficiency in performing on his own. Occasionally a large exercise presents a complete research problem, broken down to steps, with hints provided on the way to proceed.

Given the course (or book) prerequisites, the text is quite self-contained. There was no effort to keep the level of detail in the presentations uniform, and earlier sections tend to be more detailed. In the later sections a larger part of the derivation devolves to exercises, as mentioned above.

The material I used in class was (in order of presentation): Chapter 1, portions of sections 2.2.1, 2.2.2, 2.6.1, Chapter 3, 2.2.3-4, 2.4 and Chapter 4 (barring section 4.2.3). The teaching assistant, besides doing some of the exercises, also covered Section 2.1, most of 2.3, and some of 2.5.

NOTATION

The references are all cited by author(s) and year, as in the above example, (Fayolle et al., 1986), and are collected at the back of the volume. I make no claim there for either completeness or historical fairness; these are simply the sources I actually used in preparing the text.

Equations are numbered separately in each major section (1.2, 3.3 etc.). Equation (7) in section 3.2 will be denoted as (7) throughout 3.2, and by (3.2-7) elsewhere; similar notation applies to exercises. The notation (A.1.7) and (C.3) refers to equations number (1.7) and (3) in appendices A and C, respectively. The mark □ indicates the end of a proof, a theorem which is not followed by a proof, or the end of a longer example.


Symbols are defined as needed. I tried to follow a few ground rules for uniformity: random variables and processes are denoted by capital italics, realized values by lower-case ones; probability distribution-, density- and mass-functions by F, f, and p, respectively, with suitable subscripts. Generating functions of several varieties by forms of the letter g, or by a letter derived from the name of the generated series or variable. Integer indices by i, j, k, l, m, n (shades of FORTRAN?). Kronecker's delta by δ_ij. E and V are reserved for the expectation and variance of random variables. There are a few exceptions. Most of the latter resulted from my trying, when following an analysis developed elsewhere, to adhere to the notation used there, so as to facilitate reference to the original paper. Symbols that are frequently used are collected in the Notation Index, at the back of the volume. Acronyms are expanded in the general index.

Starred exercises are those for which no satisfactory solution has yet been worked out.

ACKNOWLEDGMENTS

The decision to write the book was taken following a conversation with Philippe Flajolet. Alexander H.G. Rinnooy-Kan suggested the chapter on bin-packing. The involvement of Ms Lynn Montz, the Computer Science Editor of Springer-Verlag, New York, in the book production, was of considerable help and is much appreciated. Mrs. Raya Anavi set an example in typing part of the material. The following colleagues and students read portions of the text and provided comments and corrections that greatly improved the presentation: Eyal Bardavid, John Bruno, Guy Fayolle, Aura Ganz, Zehava Koren, Shay Leshkovitz, Johann Makowsky, Hadas Shachnai and Moshe Sidi. Their contributions are gratefully acknowledged.

Still, their help does not entitle the above to any claim on errors and misrepresentations, which remain wholly mine. Help in further reducing these is avidly solicited, and will be very welcome.

The typesetting was done on equipment of the Department of Computer Science of the Technion. The help of the principal system engineers, Aythan Avior and Shlomo Goldberg, has been invaluable in getting around a myriad trivial obstacles that were as exasperating as they were unexpected.

Haifa, Israel, March 1987. M. Hofri


ACKNOWLEDGMENTS

Figure 3.4, p. 130 is based on Fig. 11, p. 87 of Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Vol 3, © 1973 by Addison Wesley Publishing Company, Inc. Reading Massachusetts. Reprinted with permission.

Figures 4.1 and 4.2, p. 149, and portions of Tables 1 and 2, p. 158, are reproduced from Figures 1 and 2, p. 276, and Tables 2 and 3, p. 280 of J.C. Lagarias, A.M. Odlyzko and D.B. Zagier: "On the Capacity of Disjointly Shared Networks", published in Computer Networks, 10, 275-285, © 1985 by North-Holland Publishing Company, Amsterdam. Reprinted with permission.

Figures 1.1 and 1.2, pp. 5, 6 are reprinted from unnumbered figures on pp. 2, 3 in Rainer Kemp, FUNDAMENTALS OF THE AVERAGE CASE ANALYSIS OF PARTICULAR ALGORITHMS, © 1984 by John Wiley & Sons Ltd. and B. G. Teubner, Stuttgart. Reprinted with permission.


Contents

CHAPTER 1 Introduction  1
1.1 Criteria for the Performance and Quality of Algorithms  1
1.2 The Analysis of a Very Simple Algorithm  5
1.3 Comments on Sources and Resources  8
    Exercises and Complements  9

CHAPTER 2 Tools of the Trade  11
2.1 Introduction to Asymptotics  11
    2.1.1 Asymptotic Notation  12
    2.1.2 Summation Asymptotics  13
    2.1.3 Euler's Summation Formula  18
    Exercises and Complements  20
2.2 Generating Functions  25
    2.2.1 Elementary Properties and Applications  25
    2.2.2 Probability gf's (pgf), Moment gf's  31
    2.2.3 Lagrange Expansion and Applications  32
    2.2.4 The Poisson Transform  35
    Exercises and Complements  39
2.3 Integral Transforms (it's)  44
    2.3.1 Laplace Transform  44
    2.3.2 Mellin Transform  48
    2.3.3 Mellin Summation Formula  50
    Exercises and Complements  51
2.4 Combinatorial Calculus (The Symbolic Operator Method)  53
    2.4.1 Elementary Examples  53
    2.4.2 Admissible Combinatorial Constructions  60
    2.4.3 Operator Methods  67
    Exercises and Complements  72
2.5 Asymptotics from Generating Functions  77
    2.5.1 Complex Functions - Definitions and Theorems  77
    2.5.2 Expansions at Singularities  80
    2.5.3 Entire Functions  90
    Exercises and Complements  95
2.6 Selected Results from Probability Theory  98
    2.6.1 The Representation of an Algorithm by a Markov Chain  98
    2.6.2 Inequalities for Sums of Bounded Random Variables  103
    2.6.3 Wald's Identity  106
    Exercises and Complements  108

CHAPTER 3 Algorithms over Permutations  112
3.1 MAX - Locating the Largest Term in a Permutation  112
    Exercises and Complements  117
3.2 Representations of Permutations  120
    3.2.1 Cycles in a Permutation  120
    3.2.2 Inversions  122
    Exercises and Complements  124
3.3 Analysis of Sorting Algorithms  127
    3.3.1 Insertion Sort  127
    3.3.2 Shell Sort  128
    3.3.3 Linear Probing Sort  134
    Exercises and Complements  143

CHAPTER 4 Algorithms for Communications Networks  148
4.1 The Efficiency of Multiple Connections  148
    4.1.1 Disjointly Shared Channels - Capacity Considerations  149
    4.1.2 Counting Realizable Configurations  152
    4.1.3 Asymptotic Capacity Estimates  157
    Exercises and Complements  159
4.2 Collision Resolution Stack Algorithms  160
    4.2.1 Channel Capacity Analysis  163
    4.2.2 Top of Stack Probabilities and Message Delay  170
    4.2.3 Message Delay via Renewal Considerations  176
    4.2.4 Note on Computations  180
    Exercises and Complements  181

CHAPTER 5 Bin Packing Heuristics  185
5.1 The Next-Fit Bin Packing Algorithm  187
    5.1.1 Regularity and Convergence Properties  188
    5.1.2 Next-Fit with X ~ U(0,1) - Expected Values  189
    5.1.3 Next-Fit with X ~ U(0,a) - Expected Values  191
    5.1.4 The Distribution of A_n(NF), when X ~ U(0,1)  195
    Exercises and Complements  200
5.2 The Next-Fit-Decreasing Bin Packing Algorithm  203
    5.2.1 Direct Evaluation of Bin Requirements  204
    5.2.2 Asymptotic Bounds on Moments  208
    Exercises and Complements  214

APPENDIX A: Binomial Coefficients  219
APPENDIX B: Stirling Numbers  222
APPENDIX C: Inequalities  225
References  228
Notation Index  234
Index  236