MMIXware - A RISC Computer for the Third Millennium - Knuth

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen 1750


Springer
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo


Donald E. Knuth

MMIXware

A RISC Computer for the Third Millennium

Springer


Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Author

Donald E. Knuth
Computer Science Department
Stanford University
Stanford, CA 94305-9045 USA

Cataloging-in-Publication Data

Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Knuth, Donald E.: MMIXware : a RISC computer for the third millennium / Donald E. Knuth - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999
(Lecture notes in computer science ; 1750)
ISBN 3-540-66938-8

CR Subject Classification (1998): C.1, C.5, D.4.8, B.3.2

ISSN 0302-9743
ISBN 3-540-66938-8 Springer-Verlag Berlin Heidelberg New York

The following copyright notice is included in all files of the MMIXware package:

© 1999 Donald E. Knuth

This file may be freely copied and distributed, provided that no changes whatsoever are made. All users are asked to help keep the MMIXware files consistent and "uncorrupted," identical everywhere in the world. Changes are permissible only if the modified file is given a new name, different from the names of existing files in the MMIXware package, and only if the modified file is clearly identified as not being part of that package. (The CWEB system has a "change file" facility by which users can easily make minor alterations without modifying the master source files in any way. Everybody is supposed to use change files instead of changing the files.) The author has tried his best to produce correct and useful programs, in order to help promote computer science research, but no warranty of any kind should be assumed.

Usage of those files in derived works is otherwise unrestricted.

All portions of the present book that are not distributed as part of the MMIXware files are copyright © 1999 by Springer-Verlag. All rights for those portions (including the special indexes) are reserved.

Printed in Germany

Typesetting: Camera-ready by author
SPIN: 10750063 06/3142 - 5 4 3 2 1 0 Printed on acid-free paper


README

MMIX is a computer intended to illustrate machine-level aspects of programming. In my books The Art of Computer Programming, it replaces MIX, the 1960s-style machine that formerly played such a role. MMIX's so-called RISC ("Reduced Instruction Set Computer") architecture is much better able to represent the computers being built at the turn of the millennium.

I strove to design MMIX so that its machine language would be simple, elegant, and easy to learn. At the same time I was careful to include all of the complexities needed to achieve high performance in practice, so that MMIX could in principle be built and even perhaps be competitive with some of the fastest general-purpose computers in the marketplace. I hope that MMIX will therefore prove to be a useful vehicle for people who are studying how to improve compilers and operating systems, and that other authors will like MMIX well enough to make use of it in their own textbooks. My goal in this work is to provide a clean, complete, and well-documented "machine-independent machine" that people all over the world will be able to use as a testbed for long-term research projects of lasting value, even as real computers continue to change rapidly.

This book is a collection of programs that make MMIX a virtual reality. One of the programs is an assembler, MMIXAL, which converts MMIX symbolic files to MMIX object files. There also are two simulators, which execute the programs in given object files. The first simulator, called MMIX-SIM or simply MMIX, executes a program one instruction at a time and allows convenient debugging. The second simulator, MMMIX, simulates a high-performance pipeline in which many aspects of the computation are overlapped in time. MMMIX is in fact a highly configurable "meta-simulator," capable of simulating an enormous variety of different kinds of pipelines with any number of functional units and with many possible strategies for caching, virtual address translation, branch prediction, super-scalar instruction issue, etc., etc.

The programs in this book are somewhat primitive, because they all are based on a simple terminal interface: Users type commands and the computer types out a reply. Still, these programs are adequate to provide a basis for future developments.

I'm hoping that at least one reader of this book will discover how much fun MMIX programming can be and will be motivated to create a nice graphical interface, so that other people will more easily be able to join in the fun. I don't have the time or talent to construct a good GUI myself, but I've tried to write the programs in such a way that modifications and enhancements will be easy to make.

The latest versions of all these programs can be downloaded from MMIX's home page

http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html

in a file called mmix.tar.gz. The programs are copyrighted, but anyone can use them without charge. Furthermore I explicitly allow anybody to copy and modify the programs in any way they like, provided only that the computer files are given different names whenever they have been changed. Nobody but me is allowed to make a correction or addition to the copyrighted file mmixal.w, for example, unless the corrected file is identified by some other name (possibly `turbo-mmixal.w' or `mmixal++.w', etc.).


The programs are all written in CWEB, a language that combines C with TeX in such a way that standard preprocessors can easily convert mmixal.w into a compilable file mmixal.c or a documentation file mmixal.tex. CWEB also includes a "change file" mechanism by which people can easily customize a master source file like mmixal.w without changing the master file in any way. (See

http://www-cs-faculty.stanford.edu/~knuth/cweb.html

for complete information about CWEB, including installation instructions for the related software.) Readers of the present book who are unfamiliar with CWEB might want to refer to the notes on "How to read CWEB programs" that appear on pages 70–73 of my book The Stanford GraphBase (New York: ACM Press, 1993), but the general ideas are almost self-explanatory so I decided not to reprint those notes here.

During the next several years, as I write Volume 4 of The Art of Computer Programming, I plan to prepare updates to Volumes 1–3 whenever Volume 4 needs to refer to new material that belongs more properly in earlier volumes. These updates, called "fascicles," will be available on the Internet via

http://www-cs-faculty.stanford.edu/~knuth/taocp.html

and they will also be published in hardcopy form. The first such fascicle is already finished and available for downloading; it is a tutorial introduction to MMIX and the MMIX assembly language. Everybody who is seriously interested in MMIX should read that First Fascicle, preferably before reading the programs in the present book.

I've tried to make the MMIXware programs interesting to read as well as useful. Indeed, the MMIX-PIPE program, which is the chief component of the MMMIX meta-simulator, is one of the most instructive programs I've ever had the pleasure of writing. But I don't expect a great number of people to study every part of this book closely, or even to study every part of MMIX-PIPE. The main purpose of this book is to provide a complete documentation of the MMIX computer and its assembly language. Many details about MMIX were too "picky" or too system-oriented to be appropriate for the First Fascicle, but every detail about MMIX can be found in the present book.

After the MMIXware programs have been installed on a UNIX-like system, they are typically used as follows. First a user program is written in assembly language and put into a file, say foo.mms. (The suffix .mms stands for "MMIX symbolic.") Then the command

mmixal foo.mms

will translate it into an object file, foo.mmo. Alternatively, a command such as

mmixal -l foo.lst foo.mms

could be used; this would produce a listing file, foo.lst, in addition to foo.mmo. The listing file, when printed, would show the contents of foo.mms together with the assembled machine language instructions.


Once an object file like foo.mmo exists, it can be run on the simple simulator by issuing a command such as

mmix foo

(or mmix foo.mmo). Many options are also possible; for example,

mmix -s foo

will print running time statistics when the program ends;

mmix -P foo

will print a profile that shows exactly how often each instruction was executed;

mmix -v foo

will give "verbose" details about everything the simulator did;

mmix -t2 foo

will trace each instruction the first two times it is performed; etc. Also

mmix -i foo

will run the simulator in interactive mode, obeying various online commands by which the user can watch exactly what is happening when key parts of the program are reached. The command

mmix foo bar

will run the simulator as if MMIX itself were running the command `foo bar' with a rudimentary operating system; any number of command-line arguments can follow the name of the program being simulated.

The MMMIX meta-simulator can also be applied to the same program, although a bit more preparation is necessary. First the command

mmix -Dfoo.mmb foo bar

will dump out a binary file foo.mmb containing the information needed to load `foo bar' into MMIX's memory. Then a command like

mmmix plain.mmconfig foo.mmb

will invoke the meta-simulator with a "plain" pipeline configuration. The meta-simulator always runs interactively, using the prompt `mmmix>' when it wants instructions about what to do next. Users can type `?' in response to this prompt if they want to be reminded about what the simulator can do. Typical responses are `vff' (run verbosely); `v0' (run quietly); `p' (show the pipeline); `g255' (show global register 255); `D' (show the D-cache); `b200' (pause when location #200 is fetched); `1000' (run 1000 cycles); etc. Some familiarity with MMIX-PIPE is necessary to understand the meta-simulator's reports of its activity, but users of mmmix are assumed to be able to extract high-level information from a mass of low-level details. (This talent, after all, is the hallmark of a computer scientist.)

The programs in this book appear in alphabetical order:

MMIX explains everything about the MMIX architecture.

MMIX-ARITH contains subroutines for 64-bit fixed and floating point arithmetic, using only 32-bit fixed point arithmetic.

MMIX-CONFIG processes configuration files for MMMIX.

MMIX-IO contains subroutines for the primitive input/output operations of a rudimentary operating system.

MMIX-MEM handles memory references of MMMIX in special cases associated with memory-mapped input/output.

MMIX-PIPE does the hard work of pipeline simulation.

MMIX-SIM is the program for the non-pipelined simulator.

MMIXAL is the assembly program.

MMMIX is the driver program for the meta-simulator.

MMOTYPE is a utility program that translates an MMIX object file into human-readable form.

The first of these, MMIX, is not actually a program, although it has been formatted as a CWEB document; it is a complete definition of MMIX, including the details of features that are used only by the operating system. It should be read first, but the other programs can be read in any order. (Actually MMIXAL or MMIX-SIM should probably be read next after MMIX, and MMIX-PIPE last. The program MMIX-SIM is the line-at-a-time simulator that is known simply as mmix after it has been compiled.)

Mini-indexes have been provided on each right-hand page of this book so that the programs can be read essentially as hypertext. Every identifier that is used on a two-page spread but defined on some other page is listed in the mini-index. For example, a mini-index entry such as `oplus: octa ( ), MMIX-ARITH §5' means that the identifier oplus denotes a function defined in §5 of the MMIX-ARITH module, returning a value of type octa. A master index to all uses of all identifiers appears at the end of this book.

I've tried to make this book error-free, but I must have blundered at least once. Therefore I shall use part of Internet page

http://www-cs-faculty.stanford.edu/~knuth/mmixware.html

as a bulletin board for posting all errors that are known to me. The first person who finds a hitherto unreported error will be entitled, as usual, to a reward of $2.56, gratefully paid.

Happy hacking!

Donald E. Knuth

Cambridge, Massachusetts

17 October 1999


CONTENTS

README (a preface)
CONTENTS (a table of contents)
MMIX (a definition)
MMIX-ARITH (a library)
MMIX-CONFIG (a part of MMMIX)
MMIX-IO (a library)
MMIX-MEM (a triviality)
MMIX-PIPE (a part of MMMIX)
MMIX-SIM (a simulator)
MMIXAL (an assembler)
MMMIX (a meta-simulator)
MMOTYPE (a utility program)
Master Index (a table of references)


MMIX

1. Introduction to MMIX. Thirty-eight years have passed since the MIX computer was designed, and computer architecture has been converging during those years towards a rather different style of machine. Therefore it is time to replace MIX with a new computer that contains even less saturated fat than its predecessor.

Exercise 1.3.1–25 in the third edition of Fundamental Algorithms speaks of an extended MIX called MixMaster, which is upward compatible with the old version. But MixMaster itself is hopelessly obsolete; although it allows for several gigabytes of memory, we can't even use it with ASCII code to get lowercase letters. And ouch, the standard subroutine calling convention of MIX is irrevocably based on self-modifying code! Decimal arithmetic and self-modifying code were popular in 1962, but they sure have disappeared quickly as machines have gotten bigger and faster. A completely new design is called for, based on the principles of RISC architecture as expounded in Computer Architecture by Hennessy and Patterson (Morgan Kaufmann, 1996).

So here is MMIX, a computer that will totally replace MIX in the "ultimate" editions of The Art of Computer Programming, Volumes 1–3, and in the first editions of the remaining volumes. I must confess that I can hardly wait to own a computer like this.

How do you pronounce MMIX? I've been saying "em-mix" to myself, because the first `M' represents a new millennium. Therefore I use the article "an" instead of "a" before the name MMIX in English phrases like "an MMIX simulator."

Incidentally, the Dictionary of American Regional English 3 (1996) lists "mommix" as a common dialect word used both as a noun and a verb; to mommix something means to botch it, to bollix it. Only time will tell whether I have mommixed the definition of MMIX.

2. The original MIX computer could be operated without an operating system; you could bootstrap it with punched cards or paper tape and do everything yourself. But nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, like all other computing machines made today, relies on an operating system to get jobs started in their own address spaces and to provide I/O capabilities.

Whenever anybody has asked if I will be writing about operating systems, my reply has always been "Nix." Therefore the name of MMIX's operating system, NNIX, will come as no surprise. From time to time I will necessarily have to refer to things that NNIX does for its users, but I am unable to build NNIX myself. Life is too short. It would be wonderful if some expert in operating system design became inspired to write a book that explains exactly how to construct a nice, clean NNIX kernel for an MMIX chip.

3. I am deeply grateful to the many people who have helped me shape the behavior of MMIX. In particular, John Hennessy and (especially) Dick Sites have made significant contributions.

D.E. Knuth: MMIXware, LNCS 1750, pp. 2-61, 1999. Springer-Verlag Berlin Heidelberg 1999


4. A programmer's introduction to MMIX appears in "Fascicle 1," a booklet containing tutorial material that will ultimately appear in the fourth edition of The Art of Computer Programming. The description in the following sections is rather different, because we are concerned about a complete implementation, including all of the features used by the operating system and invisible to normal programs. Here it is important to emphasize exceptional cases that were glossed over in the tutorial, and to consider nitpicky details about things that might go wrong.


5. MMIX basics. MMIX is a 64-bit RISC machine with at least 256 general-purpose registers and a 64-bit address space. Every instruction is four bytes long and has the form

OP X Y Z.

The 256 possible OP codes fall into a dozen or so easily remembered categories; an instruction usually means, "Set register X to the result of Y OP Z." For example,

32 1 2 3

sets register 1 to the sum of registers 2 and 3. A few instructions combine the Y and Z bytes into a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field. But the three bytes X, Y, Z usually have three-pronged significance independent of each other.

Instructions are usually represented in a symbolic form corresponding to the MMIX assembly language, in which each operation code has a mnemonic name. For example, operation 32 is ADD, and the instruction above might be written `ADD $1,$2,$3'; a dollar sign `$' symbolizes a register number. In general, the instruction ADD $X,$Y,$Z is the operation of setting $X = $Y + $Z. An assembly language instruction with two commas has three operand fields X, Y, Z; an instruction with one comma has two operand fields X, YZ; an instruction with no comma has one operand field, XYZ; an instruction with no operands has X = Y = Z = 0.

Most instructions have two forms, one in which the Z field stands for register $Z, and one in which Z is an unsigned "immediate" constant. Thus, for example, the command `ADD $X,$Y,$Z' has a counterpart `ADD $X,$Y,Z', which sets $X = $Y + Z. Immediate constants are always nonnegative. In the descriptions below we will introduce such pairs of instructions by writing just `ADD $X,$Y,$Z|Z' instead of naming both cases explicitly.

The operation code for ADD $X,$Y,$Z is 32, but the operation code for ADD $X,$Y,Z is 33. The MMIX assembler chooses the correct code by noting whether the third argument is a register number or not.

Register numbers and constants can be given symbolic names; for example, the assembly language instruction `x IS $1' makes x an abbreviation for register number 1. Similarly, `FIVE IS 5' makes FIVE an abbreviation for the constant 5. After these abbreviations have been specified, the instruction ADD x,x,FIVE increases $1 by 5, using opcode 33, while the instruction ADD x,x,x doubles $1 using opcode 32. Symbolic names that stand for register numbers conventionally begin with a lowercase letter, while names that stand for constants conventionally begin with an uppercase letter. This convention is not actually enforced by the assembler, but it tends to reduce a programmer's confusion.
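As an illustration only, the four-byte OP X Y Z layout can be modeled in a few lines of Python. This sketch is not part of MMIXware; the helper names encode, decode, and assemble_add are hypothetical, and it assumes that OP is the most significant byte of the instruction word. Only the two ADD opcodes quoted in the text (32 and 33) are used.

```python
# Hypothetical helpers (not from MMIXware) for the four-byte OP X Y Z format.
ADD, ADDI = 32, 33  # opcodes given in the text: register form and immediate form


def encode(op, x, y, z):
    """Pack four byte-sized fields into one 32-bit instruction word, OP first."""
    for field in (op, x, y, z):
        if not 0 <= field <= 255:
            raise ValueError("each field must fit in one byte")
    return (op << 24) | (x << 16) | (y << 8) | z


def decode(word):
    """Split a word into OP, X, Y, Z and the combined YZ and XYZ fields."""
    return {
        "op": (word >> 24) & 0xFF,
        "x": (word >> 16) & 0xFF,
        "y": (word >> 8) & 0xFF,
        "z": word & 0xFF,
        "yz": word & 0xFFFF,      # 16-bit field used by a few instructions
        "xyz": word & 0xFFFFFF,   # 24-bit field used by two jump instructions
    }


def assemble_add(x, y, z, z_is_register):
    """Pick opcode 32 or 33 depending on whether the third operand is a register."""
    return encode(ADD if z_is_register else ADDI, x, y, z)
```

With these definitions, `ADD $1,$2,$3' corresponds to assemble_add(1, 2, 3, True), and `ADD x,x,FIVE' with x = $1 and FIVE = 5 corresponds to assemble_add(1, 1, 5, False).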

6. A nybble is a 4-bit quantity, often used to denote a decimal or hexadecimal digit. A byte is an 8-bit quantity, often used to denote an alphanumeric character in ASCII code. The Unicode standard extends ASCII to essentially all the world's languages by using 16-bit-wide characters called wydes. (Weight watchers know that two nybbles make one byte, but two bytes make one wyde.) In the discussion below we use the term tetrabyte or "tetra" for a 4-byte quantity, and the similar term octabyte or "octa" for an 8-byte quantity. Thus, a tetra is two wydes, an octa is two tetras; an octabyte has 64 bits. Each MMIX register can be thought of as containing one octabyte, or two tetras, or four wydes, or eight bytes, or sixteen nybbles.

When bytes, wydes, tetras, and octas represent numbers they are said to be either signed or unsigned. An unsigned byte is a number between 0 and 2^8 − 1 = 255 inclusive; an unsigned wyde lies, similarly, between 0 and 2^16 − 1 = 65535; an unsigned tetra lies between 0 and 2^32 − 1 = 4,294,967,295; an unsigned octa lies between 0 and 2^64 − 1 = 18,446,744,073,709,551,615. Their signed counterparts use the conventions of two's complement notation, by subtracting respectively 2^8, 2^16, 2^32, or 2^64 times the most significant bit. Thus, the unsigned bytes 128 through 255 are regarded as the numbers −128 through −1 when they are evaluated as signed bytes; a signed byte therefore lies between −128 and +127, inclusive. A signed wyde is a number between −32768 and +32767; a signed tetra lies between −2,147,483,648 and +2,147,483,647; a signed octa lies between −9,223,372,036,854,775,808 and +9,223,372,036,854,775,807.

The virtual memory of MMIX is an array M of 2^64 bytes. If k is any unsigned octabyte, M[k] is a 1-byte quantity. MMIX machines do not actually have such vast memories, but programmers can act as if 2^64 bytes are indeed present, because MMIX provides address translation mechanisms by which an operating system can maintain this illusion.

We use the notation M_{2^t}[k] to stand for a number consisting of 2^t consecutive bytes starting at location k ∧ (2^64 − 2^t). (The notation k ∧ (2^64 − 2^t) means that the least significant t bits of k are set to 0, and only the least 64 bits of the resulting address are retained. Similarly, the notation k ∨ (2^t − 1) means that the least significant t bits of k are set to 1.) All accesses to 2^t-byte quantities by MMIX are aligned, in the sense that the address of the first byte is a multiple of 2^t.

Addressing is always "big-endian." In other words, the most significant (leftmost) byte of M_{2^t}[k] is M_1[k ∧ (2^64 − 2^t)] and the least significant (rightmost) byte is M_1[k ∨ (2^t − 1)]. We use the notation s(M_{2^t}[k]) when we want to regard this 2^t-byte number as a signed integer. Formally speaking, if l = 2^t,

\[ s(M_l[k]) = \bigl(M_1[k \wedge (-l)]\; M_1[k \wedge (-l) + 1]\; \ldots\; M_1[k \vee (l-1)]\bigr)_{256} \;-\; 2^{8l}\,\bigl[M_1[k \wedge (-l)] \ge 128\bigr]. \]

Here the subscript 256 marks a radix-256 number whose digits are the individual bytes, and the bracketed condition counts as 1 when it is true and 0 when it is false.
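The alignment, big-endian, and sign conventions of this section can be checked with a small Python model. It is a sketch under stated assumptions, not code from MMIXware: memory is represented by a dictionary from addresses to byte values (absent bytes read as zero), and the names to_signed, load_unsigned, and load_signed are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, nbytes):
    """Two's complement: subtract 2^(8*nbytes) when the most significant bit is 1."""
    bits = 8 * nbytes
    return value - (1 << bits) if value >> (bits - 1) else value


def load_unsigned(mem, k, t):
    """Model of M_{2^t}[k]: 2^t bytes, big-endian, with the low t bits of k cleared."""
    size = 1 << t
    start = k & (MASK64 + 1 - size)          # k AND (2^64 - 2^t)
    value = 0
    for offset in range(size):
        # Assumption of this sketch: bytes never written read as zero.
        value = (value << 8) | mem.get(start + offset, 0)
    return value


def load_signed(mem, k, t):
    """Model of s(M_{2^t}[k])."""
    return to_signed(load_unsigned(mem, k, t), 1 << t)
```

For example, if the bytes at locations #1000 through #1003 are #ff, #fe, #00, #80, then load_unsigned(mem, 0x1003, 2) ignores the two low address bits and yields #fffe0080, while load_signed(mem, 0x1001, 1) yields −2.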


7. Loading and storing. Several instructions can be used to get information from memory into registers. For example, the "load tetra unsigned" instruction LDTU $1,$4,$5 puts the four bytes M_4[$4 + $5] into register 1 as an unsigned integer; the most significant four bytes of register 1 are set to zero. The similar instruction LDT $1,$4,$5, "load tetra," sets $1 to the signed integer s(M_4[$4 + $5]). (Instructions generally treat numbers as signed unless the operation code specifically calls them unsigned.) In the signed case, the most significant four bytes of the register will be copies of the most significant bit of the tetrabyte loaded; thus they will be all 0s or all 1s, depending on whether the number is ≥ 0 or < 0.

• LDB $X,$Y,$Z|Z `load byte'.
Byte s(M[$Y + $Z]) or s(M[$Y + Z]) is loaded into register X as a signed number between −128 and +127, inclusive.

• LDBU $X,$Y,$Z|Z `load byte unsigned'.
Byte M[$Y + $Z] or M[$Y + Z] is loaded into register X as an unsigned number between 0 and 255, inclusive.

• LDW $X,$Y,$Z|Z `load wyde'.
Bytes s(M_2[$Y + $Z]) or s(M_2[$Y + Z]) are loaded into register X as a signed number between −32768 and +32767, inclusive. As mentioned above, our notation M_2[k] implies that the least significant bit of the address $Y + $Z or $Y + Z is ignored and assumed to be 0.

• LDWU $X,$Y,$Z|Z `load wyde unsigned'.
Bytes M_2[$Y + $Z] or M_2[$Y + Z] are loaded into register X as an unsigned number between 0 and 65535, inclusive.

• LDT $X,$Y,$Z|Z `load tetra'.
Bytes s(M_4[$Y + $Z]) or s(M_4[$Y + Z]) are loaded into register X as a signed number between −2,147,483,648 and +2,147,483,647, inclusive. As mentioned above, our notation M_4[k] implies that the two least significant bits of the address $Y + $Z or $Y + Z are ignored and assumed to be 0.

• LDTU $X,$Y,$Z|Z `load tetra unsigned'.
Bytes M_4[$Y + $Z] or M_4[$Y + Z] are loaded into register X as an unsigned number between 0 and 4,294,967,295, inclusive.

• LDO $X,$Y,$Z|Z `load octa'.
Bytes M_8[$Y + $Z] or M_8[$Y + Z] are loaded into register X. As mentioned above, our notation M_8[k] implies that the three least significant bits of the address $Y + $Z or $Y + Z are ignored and assumed to be 0.

• LDOU $X,$Y,$Z|Z `load octa unsigned'.
Bytes M_8[$Y + $Z] or M_8[$Y + Z] are loaded into register X. There is in fact no difference between the behavior of LDOU and LDO, since an octabyte can be regarded as either signed or unsigned. LDOU is included in MMIX just for completeness and consistency, in spite of the fact that a foolish consistency is the hobgoblin of little minds. (Niklaus Wirth made a strong plea for such consistency in his early critique of System/360; see JACM 15 (1967), 37–74.)

• LDHT $X,$Y,$Z|Z `load high tetra'.
Bytes M_4[$Y + $Z] or M_4[$Y + Z] are loaded into the most significant half of register X, and the least significant half is cleared to zero. (One use of "high tetra arithmetic" is to detect overflow easily when tetrabytes are added or subtracted.)

• LDA $X,$Y,$Z|Z `load address'.
The address $Y + $Z or $Y + Z is loaded into register X. This instruction is simply another name for the ADDU instruction discussed below; it can be used when the programmer is thinking of memory addresses instead of numbers. The MMIX assembler converts LDA into the same OP-code as ADDU.
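The load family described above differs only in operand size, sign extension, and (for LDHT) placement in the register. The following Python sketch illustrates this; it is not taken from the MMIX simulators. Memory is assumed to be a dictionary from addresses to bytes, registers are represented as unsigned 64-bit bit patterns, and the names read and load are hypothetical.

```python
MASK64 = (1 << 64) - 1


def read(mem, addr, size):
    """Aligned big-endian read of `size` bytes (size is 1, 2, 4, or 8)."""
    start = addr & MASK64 & -size            # clear the low address bits
    value = 0
    for i in range(size):
        value = (value << 8) | mem.get(start + i, 0)
    return value


def load(mem, y, z, size, signed=True, high=False):
    """Register contents after LDB/LDW/LDT/LDO, their U variants, or LDHT.

    y and z are the values being added to form the address ($Y + $Z or $Y + Z).
    """
    value = read(mem, y + z, size)
    if high:                                 # LDHT: upper half loaded, lower half zero
        return (value << 32) & MASK64
    if signed and value >> (8 * size - 1):   # copy the sign bit into the upper bytes
        value |= MASK64 ^ ((1 << (8 * size)) - 1)
    return value
```

A signed load of the tetra #80010203 therefore leaves #ffffffff80010203 in the register, whereas the unsigned load leaves #0000000080010203.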

8. Another family of instructions goes the other way, storing registers into memory. For example, the "store octa immediate" command STO $3,$2,17 puts the current contents of register 3 into M_8[$2 + 17].

• STB $X,$Y,$Z|Z `store byte'.
The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z]. An integer overflow exception occurs if $X is not between −128 and +127. (We will discuss overflow and other kinds of exceptions later.)

• STBU $X,$Y,$Z|Z `store byte unsigned'.
The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z]. STBU instructions are the same as STB instructions, except that no test for overflow is made.

• STW $X,$Y,$Z|Z `store wyde'.
The two least significant bytes of register X are stored into bytes M_2[$Y + $Z] or M_2[$Y + Z]. An integer overflow exception occurs if $X is not between −32768 and +32767.

• STWU $X,$Y,$Z|Z `store wyde unsigned'.
The two least significant bytes of register X are stored into bytes M_2[$Y + $Z] or M_2[$Y + Z]. STWU instructions are the same as STW instructions, except that no test for overflow is made.

• STT $X,$Y,$Z|Z `store tetra'.
The four least significant bytes of register X are stored into bytes M_4[$Y + $Z] or M_4[$Y + Z]. An integer overflow exception occurs if $X is not between −2,147,483,648 and +2,147,483,647.

• STTU $X,$Y,$Z|Z `store tetra unsigned'.
The four least significant bytes of register X are stored into bytes M_4[$Y + $Z] or M_4[$Y + Z]. STTU instructions are the same as STT instructions, except that no test for overflow is made.

• STO $X,$Y,$Z|Z `store octa'.
Register X is stored into bytes M_8[$Y + $Z] or M_8[$Y + Z].

• STOU $X,$Y,$Z|Z `store octa unsigned'.
Identical to STO $X,$Y,$Z|Z.

• STCO X,$Y,$Z|Z `store constant octabyte'.
An octabyte whose value is the unsigned byte X is stored into M_8[$Y + $Z] or M_8[$Y + Z].

• STHT $X,$Y,$Z|Z `store high tetra'.
The most significant four bytes of register X are stored into M_4[$Y + $Z] or M_4[$Y + Z].
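A matching Python sketch for the store family follows. It is illustrative only and rests on assumptions: memory is a dictionary from addresses to bytes, registers are unsigned 64-bit bit patterns, the function names are hypothetical, and overflow is merely reported as a returned flag, since the exception mechanism itself is discussed later in the text.

```python
MASK64 = (1 << 64) - 1


def as_signed64(reg):
    """Interpret a 64-bit register pattern as a signed octabyte."""
    return reg - (1 << 64) if reg >> 63 else reg


def store(mem, x, addr, size, signed=True):
    """Model of STB/STW/STT/STO and their U variants.

    Stores the low `size` bytes of x big-endian at the aligned address and
    returns True when the signed form would signal integer overflow.
    """
    start = addr & MASK64 & -size
    low = x & ((1 << (8 * size)) - 1)
    for i in range(size):
        mem[start + i] = (low >> (8 * (size - 1 - i))) & 0xFF
    limit = 1 << (8 * size - 1)
    return signed and size < 8 and not (-limit <= as_signed64(x) < limit)


def store_constant_octa(mem, x_byte, addr):
    """Model of STCO: the unsigned byte X becomes the value of a whole octabyte."""
    store(mem, x_byte & 0xFF, addr, 8, signed=False)


def store_high_tetra(mem, x, addr):
    """Model of STHT: the most significant four bytes of the register are stored."""
    store(mem, (x & MASK64) >> 32, addr, 4, signed=False)
```

Storing a register that holds 128 with the signed byte form reports overflow, while the same store in unsigned form does not; in both cases the byte #80 is written.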


9. Adding and subtracting. Once numbers are in registers, we can compute with them. Let's consider addition and subtraction first.

• ADD $X,$Y,$Z|Z `add'.
The sum $Y + $Z or $Y + Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the sum is ≥ 2^63 or < −2^63. (We will discuss overflow and other kinds of exceptions later.)

• ADDU $X,$Y,$Z|Z `add unsigned'.
The sum ($Y + $Z) mod 2^64 or ($Y + Z) mod 2^64 is placed into register X. These instructions are the same as ADD $X,$Y,$Z|Z commands except that no test for overflow is made. (Overflow could be detected if desired by using the command CMPU ovflo,$X,$Y after addition, where CMPU means "compare unsigned"; see below.)

• 2ADDU $X,$Y,$Z|Z `times 2 and add unsigned'.
The sum (2$Y + $Z) mod 2^64 or (2$Y + Z) mod 2^64 is placed into register X.

• 4ADDU $X,$Y,$Z|Z `times 4 and add unsigned'.
The sum (4$Y + $Z) mod 2^64 or (4$Y + Z) mod 2^64 is placed into register X.

• 8ADDU $X,$Y,$Z|Z `times 8 and add unsigned'.
The sum (8$Y + $Z) mod 2^64 or (8$Y + Z) mod 2^64 is placed into register X.

• 16ADDU $X,$Y,$Z|Z `times 16 and add unsigned'.
The sum (16$Y + $Z) mod 2^64 or (16$Y + Z) mod 2^64 is placed into register X.

• SUB $X,$Y,$Z|Z `subtract'.
The difference $Y − $Z or $Y − Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the difference is ≥ 2^63 or < −2^63.

• SUBU $X,$Y,$Z|Z `subtract unsigned'.
The difference ($Y − $Z) mod 2^64 or ($Y − Z) mod 2^64 is placed into register X. These two instructions are the same as SUB $X,$Y,$Z|Z except that no test for overflow is made.

• NEG $X,Y,$Z|Z `negate'.
The value Y − $Z or Y − Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the result is greater than 2^63 − 1. (Notice that in this case MMIX works with the "immediate" constant Y, not register Y. NEG commands are analogous to the immediate variants of other commands, because they save us from having to put one-byte constants into a register. When Y = 0, overflow occurs if and only if $Z = −2^63. The instruction NEG $X,1,2 has exactly the same effect as NEG $X,0,1.)

• NEGU $X,Y,$Z|Z `negate unsigned'.
The value (Y − $Z) mod 2^64 or (Y − Z) mod 2^64 is placed into register X. NEGU instructions are the same as NEG instructions, except that no test for overflow is made.
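The overflow conditions in this section can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not code from MMIX-ARITH: registers are unsigned 64-bit bit patterns, each signed operation returns a pair (result, overflow flag), and all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def signed(v):
    """Interpret a 64-bit pattern as a two's complement integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def in_range(total):
    return -(1 << 63) <= total < (1 << 63)


def add(y, z):
    """Model of ADD: (64-bit result, overflow flag)."""
    total = signed(y) + signed(z)
    return total & MASK64, not in_range(total)


def sub(y, z):
    """Model of SUB: (64-bit result, overflow flag)."""
    total = signed(y) - signed(z)
    return total & MASK64, not in_range(total)


def addu(y, z, scale=1):
    """Model of ADDU (scale=1) and of 2ADDU, 4ADDU, 8ADDU, 16ADDU."""
    return (scale * y + z) & MASK64


def addu_carried(y, z):
    """Unsigned carry test in the spirit of `CMPU ovflo,$X,$Y': is the sum below $Y?"""
    return addu(y, z) < (y & MASK64)


def neg(y_const, z):
    """Model of NEG: y_const is the immediate byte Y; z is $Z or the immediate Z."""
    total = y_const - signed(z)
    return total & MASK64, total > (1 << 63) - 1
```

In this model neg(1, 2) and neg(0, 1) both give the all-ones pattern (the value −1) without overflow, matching the remark about NEG $X,1,2 and NEG $X,0,1, and neg(0, 1 << 63) reports overflow because $Z = −2^63.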

CMPU, §15.


10. Bit fiddling. Before looking at multiplication and division, which take longer than addition and subtraction, let's look at some of the other things that MMIX can do fast. There are eighteen instructions for bitwise logical operations on unsigned numbers.

• AND $X,$Y,$Z|Z `bitwise and'.
Each bit of register Y is logically anded with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are both 1; in symbols, $X = $Y ∧ $Z or $X = $Y ∧ Z. This means in particular that AND $X,$Y,Z always zeroes out the seven most significant bytes of register X, because 0s are prefixed to the constant byte Z.

• OR $X,$Y,$Z|Z `bitwise or'.
Each bit of register Y is logically ored with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are both 0; in symbols, $X = $Y ∨ $Z or $X = $Y ∨ Z.

In the special case Z = 0, the immediate variant of this command simply copies register Y to register X. The MMIX assembler allows us to write `SET $X,$Y' as a convenient abbreviation for `OR $X,$Y,0'.

• XOR $X,$Y,$Z|Z `bitwise exclusive or'.
Each bit of register Y is logically xored with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are equal; in symbols, $X = $Y ⊕ $Z or $X = $Y ⊕ Z.

• ANDN $X,$Y,$Z|Z `bitwise and-not'.
Each bit of register Y is logically anded with the complement of the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bit of register Y is 1 and the other corresponding bit is 0; in symbols, $X = $Y \ $Z or $X = $Y \ Z. (This is the logical difference operation; if the operands are bit strings representing sets, we are computing the elements that lie in one set but not the other.)

• ORN $X,$Y,$Z|Z `bitwise or-not'.
Each bit of register Y is logically ored with the complement of the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bit of register Y is greater than or equal to the other corresponding bit; in symbols, $X = $Y ∨ ¬$Z or $X = $Y ∨ ¬Z. (This is the complement of $Z \ $Y or Z \ $Y.)


• NAND $X,$Y,$Z|Z `bitwise not and'.
Each bit of register Y is logically anded with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are both 1; in symbols, $X = ¬($Y ∧ $Z) or $X = ¬($Y ∧ Z).

• NOR $X,$Y,$Z|Z `bitwise not or'.
Each bit of register Y is logically ored with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are both 0; in symbols, $X = ¬($Y ∨ $Z) or $X = ¬($Y ∨ Z).

• NXOR $X,$Y,$Z|Z `bitwise not exclusive or'.
Each bit of register Y is logically xored with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are equal; in symbols, $X = ¬($Y ⊕ $Z) or $X = ¬($Y ⊕ Z).

• MUX $X,$Y,$Z|Z `bitwise multiplex'.
For each bit position j, the jth bit of register X is set either to bit j of register Y or to bit j of the other operand $Z or Z, depending on whether bit j of the special mask register rM is 1 or 0: if M_j then Y_j else Z_j. In symbols, $X = ($Y ∧ rM) ∨ ($Z ∧ ¬rM) or $X = ($Y ∧ rM) ∨ (Z ∧ ¬rM). (MMIX has several such special registers, associated with instructions that need more than two inputs or produce more than one output.)
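The less familiar members of this family are easy to model on Python integers. The following sketch assumes 64-bit registers held as nonnegative integers; the function names simply mirror the mnemonics and are not taken from any MMIX tool.

```python
MASK64 = (1 << 64) - 1

def andn(y, z):
    return y & ~z & MASK64          # bits of y that are not in z

def orn(y, z):
    return (y | ~z) & MASK64        # 1 wherever y's bit >= z's bit

def nand(y, z):
    return ~(y & z) & MASK64

def nor(y, z):
    return ~(y | z) & MASK64

def nxor(y, z):
    return ~(y ^ z) & MASK64

def mux(y, z, rM):
    """MUX: take y's bit where the mask rM is 1, z's bit where it is 0."""
    return (y & rM) | (z & ~rM & MASK64)
```

For example, `mux(0x1234, 0xabcd, 0xff00)` selects the byte 0x12 from the first operand and 0xcd from the second, and `orn(y, z)` is the 64-bit complement of `andn(z, y)`, as the ORN entry states.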


11. Besides the eighteen bitwise operations, MMIX can also perform unsigned bytewise and biggerwise operations that are somewhat more exotic.

• BDIF $X,$Y,$Z|Z `byte difference'.
For each byte position j, the jth byte of register X is set to byte j of register Y minus byte j of the other operand $Z or Z, unless that difference is negative; in the latter case, byte j of $X is set to zero.

• WDIF $X,$Y,$Z|Z `wyde difference'.
For each wyde position j, the jth wyde of register X is set to wyde j of register Y minus wyde j of the other operand $Z or Z, unless that difference is negative; in the latter case, wyde j of $X is set to zero.

• TDIF $X,$Y,$Z|Z `tetra difference'.
For each tetra position j, the jth tetra of register X is set to tetra j of register Y minus tetra j of the other operand $Z or Z, unless that difference is negative; in the latter case, tetra j of $X is set to zero.

• ODIF $X,$Y,$Z|Z `octa difference'.
Register X is set to register Y minus the other operand $Z or Z, unless $Z or Z exceeds register Y; in the latter case, $X is set to zero. The operands are treated as unsigned integers.

The BDIF and WDIF commands are useful in applications to graphics or video; TDIF and ODIF are also present for reasons of consistency. For example, if a and b are registers containing 8-byte quantities, their bytewise maxima c and bytewise minima d are computed by

BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;

similarly, the individual "pixel differences" e, namely the absolute values of the differences of corresponding bytes, are computed by

BDIF x,a,b; BDIF y,b,a; OR e,x,y.

To add individual bytes of a and b while clipping all sums to 255 if they don't fit in a single byte, one can say

NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;

in other words, complement a, apply BDIF, and complement the result. The operations can also be used to construct efficient operations on strings of bytes or wydes.

Exercise: Implement a "nybble difference" instruction that operates in a similar way on sixteen nybbles at a time.

Answer: AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m; BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx where register m contains the mask #0f0f0f0f0f0f0f0f.

(The ANDN operation can be regarded as a "bit difference" instruction that operates in a similar way on 64 bits at a time.)
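The tricks above can be replayed in Python. This is a sketch under the assumption that registers are integers below 2^64; `bdif` takes a hypothetical `width` parameter so that the same routine also models WDIF (16), TDIF (32), ODIF (64) and the imagined nybble difference (4). Each helper follows the corresponding instruction sequence in the text.

```python
MASK64 = (1 << 64) - 1

def bdif(y, z, width=8):
    """Saturating unsigned difference on each width-bit field of a 64-bit word."""
    field = (1 << width) - 1
    out = 0
    for shift in range(0, 64, width):
        d = ((y >> shift) & field) - ((z >> shift) & field)
        out |= max(d, 0) << shift
    return out

def bytewise_max_min(a, b):
    x = bdif(a, b)                                # BDIF x,a,b
    return (x + b) & MASK64, (a - x) & MASK64     # ADDU c,x,b; SUBU d,a,x

def pixel_diff(a, b):
    return bdif(a, b) | bdif(b, a)                # BDIF x,a,b; BDIF y,b,a; OR e,x,y

def clipped_sum(a, b):
    acomp = ~a & MASK64                           # NOR acomp,a,0
    return ~bdif(acomp, b) & MASK64               # BDIF x,acomp,b; NOR clippedsums,x,0

def nybble_diff(a, b):
    m = 0x0f0f0f0f0f0f0f0f
    x = bdif(a & m, b & m)                        # low nybble of every byte
    xx = bdif(a & ~m & MASK64, b & ~m & MASK64)   # high nybble of every byte
    return x | xx
```

With a = #10ff and b = #20f0 the model gives bytewise maxima #20ff, minima #10f0, pixel differences #100f and clipped sums #30ff; `nybble_diff` agrees with `bdif(..., width=4)`.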


12. Three more pairs of bit-fiddling instructions round out the collection of exotics.

• SADD $X,$Y,$Z|Z `sideways add'.
Each bit of register Y is logically anded with the complement of the corresponding bit of register Z or of the constant Z, and the number of 1 bits in the result is placed in register X. In other words, register X is set to the number of bit positions in which register Y has a 1 and the other operand has a 0; in symbols, $X = ν($Y \ $Z) or $X = ν($Y \ Z). When the second operand is zero this operation is sometimes called "population counting," because it counts the number of 1s in register Y.

• MOR $X,$Y,$Z|Z `multiple or'.
Suppose the 64 bits of register Y are indexed as

y_00 y_01 … y_07 y_10 y_11 … y_17 … y_70 y_71 … y_77;

in other words, y_ij is the jth bit of the ith byte, if we number the bits and bytes from 0 to 7 in big-endian fashion from left to right. Let the bits of the other operand, $Z or Z, be indexed similarly:

z_00 z_01 … z_07 z_10 z_11 … z_17 … z_70 z_71 … z_77.

The MOR operation replaces each bit x_ij of register X by the bit

y_0j z_i0 ∨ y_1j z_i1 ∨ ··· ∨ y_7j z_i7.

Thus, for example, if register Z contains the constant #0102040810204080, MOR reverses the order of the bytes in register Y, converting between little-endian and big-endian addressing. (The ith byte of $X depends on the bytes of $Y as specified by the ith byte of $Z or Z. If we regard 64-bit words as 8 × 8 Boolean matrices, with one byte per column, this operation computes the Boolean product $X = $Y $Z or $X = $Y Z. Alternatively, if we regard 64-bit words as 8 × 8 matrices with one byte per row, MOR computes the Boolean product $X = $Z $Y or $X = Z $Y with operands in the opposite order. The immediate form MOR $X,$Y,Z always sets the leading seven bytes of register X to zero; the other byte is set to the bitwise or of whatever bytes of register Y are specified by the immediate operand Z.)

Exercise: Explain how to compute a mask m that is #ff in byte positions where a exceeds b, #00 in all other bytes. Answer: BDIF x,a,b; MOR m,minusone,x; here minusone is a register consisting of all 1s. (Moreover, if we AND this result with #8040201008040201, then MOR with Z = 255, we get a one-byte encoding of m.)

• MXOR $X,$Y,$Z|Z `multiple exclusive or'.
This operation is like the Boolean multiplication just discussed, but exclusive or is used to combine the bits. Thus we obtain a matrix product over the field of two elements instead of a Boolean matrix product. This operation can be used to construct hash functions, among many other things. (The hash functions aren't bad, but they are not "universal" in the sense of exercise 6.4–72.)
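Because the index conventions of MOR are easy to get backwards, a direct Python transcription of the definition may help. This is a sketch only: `sadd`, `get_bytes` and `mor` are illustrative names, byte 0 is the leftmost byte and bit 0 the leftmost bit of a byte, as in the text, and the hypothetical flag `combine_xor` switches the same loop from MOR to MXOR.

```python
MASK64 = (1 << 64) - 1

def sadd(y, z):
    """SADD: count the positions where y has a 1 and z has a 0."""
    return bin(y & ~z & MASK64).count("1")

def get_bytes(w):
    """Bytes of a 64-bit word; index 0 is the leftmost (big-endian) byte."""
    return list(w.to_bytes(8, "big"))

def mor(y, z, combine_xor=False):
    """MOR, or MXOR when combine_xor is true: byte i of the result combines
    the bytes k of y for which bit z_ik is 1."""
    yb, zb = get_bytes(y), get_bytes(z)
    out = []
    for i in range(8):
        acc = 0
        for k in range(8):
            if (zb[i] >> (7 - k)) & 1:          # bit z_ik, counted from the left
                acc = acc ^ yb[k] if combine_xor else acc | yb[k]
        out.append(acc)
    return int.from_bytes(bytes(out), "big")
```

In this model `mor(0x0123456789abcdef, 0x0102040810204080)` returns `0xefcdab8967452301`, the byte reversal mentioned above, and `mor(MASK64, x)` turns every nonzero byte of x into #ff, which is the second step of the mask exercise.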


13. Sixteen "immediate wyde" instructions are available for the common case that a 16-bit constant is needed. In this case the Y and Z fields of the instruction are regarded as a single 16-bit unsigned number YZ.

• SETH $X,YZ `set to high wyde'; SETMH $X,YZ `set to medium high wyde'; SETML $X,YZ `set to medium low wyde'; SETL $X,YZ `set to low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and placed into register X. Thus, for example, SETML inserts a given value into the second-least-significant wyde of register X and sets the other three wydes to zero.

• INCH $X,YZ `increase by high wyde'; INCMH $X,YZ `increase by medium high wyde'; INCML $X,YZ `increase by medium low wyde'; INCL $X,YZ `increase by low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and added to register X, ignoring overflow; the result is placed back into register X.
If YZ is the hexadecimal constant #8000, the command INCH $X,YZ complements the most significant bit of register X. We will see below that this can be used to negate a floating point number.

• ORH $X,YZ `bitwise or with high wyde'; ORMH $X,YZ `bitwise or with medium high wyde'; ORML $X,YZ `bitwise or with medium low wyde'; ORL $X,YZ `bitwise or with low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and ored with register X; the result is placed back into register X.
Notice that any desired 4-wyde constant GH IJ KL MN can be inserted into a register with a sequence of four instructions such as

SETH $X,GH; INCMH $X,IJ; INCML $X,KL; INCL $X,MN;

any of these INC instructions could also be replaced by OR.

• ANDNH $X,YZ `bitwise and-not high wyde'; ANDNMH $X,YZ `bitwise and-not medium high wyde'; ANDNML $X,YZ `bitwise and-not medium low wyde'; ANDNL $X,YZ `bitwise and-not low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, then complemented and anded with register X; the result is placed back into register X.
If YZ is the hexadecimal constant #8000, the command ANDNH $X,YZ forces the most significant bit of register X to be 0. This can be used to compute the absolute value of a floating point number.
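A short Python sketch shows how the four shift amounts fit together. The names `set_w`, `inc_w`, `or_w`, `andn_w` and the table `SHIFT` are invented for this illustration; each function takes the wyde position ("H", "MH", "ML" or "L") as an argument instead of encoding it in the mnemonic.

```python
MASK64 = (1 << 64) - 1
SHIFT = {"H": 48, "MH": 32, "ML": 16, "L": 0}

def set_w(pos, yz):
    return yz << SHIFT[pos]                       # SETH, SETMH, SETML, SETL

def inc_w(x, pos, yz):
    return (x + (yz << SHIFT[pos])) & MASK64      # INCH, ..., overflow ignored

def or_w(x, pos, yz):
    return x | (yz << SHIFT[pos])                 # ORH, ...

def andn_w(x, pos, yz):
    return x & ~(yz << SHIFT[pos]) & MASK64       # ANDNH, ...

# Build the constant #0123456789abcdef one wyde at a time.
x = set_w("H", 0x0123)
x = inc_w(x, "MH", 0x4567)
x = or_w(x, "ML", 0x89ab)        # OR works as well as INC here
x = inc_w(x, "L", 0xcdef)
```

Applying `inc_w(x, "H", 0x8000)` flips the most significant bit, and `andn_w(x, "H", 0x8000)` clears it, which is why the ANDNH shift must be 48 for the absolute-value trick to work.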

14. MMIX knows several ways to shift a register left or right by any number of bits.

• SL $X,$Y,$Z|Z `shift left'.
The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from the right; the result is placed in register X. Register Y is treated as a signed number, but the second operand is treated as an unsigned number. The effect is the same as multiplication by 2^$Z or by 2^Z; an integer overflow exception occurs if the result is ≥ 2^63 or < −2^63. In particular, if the second operand is 64 or more, register X will become entirely zero, and integer overflow will be signaled unless register Y was zero.


• SLU $X,$Y,$Z|Z `shift left unsigned'.
The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from the right; the result is placed in register X. Both operands are treated as unsigned numbers. The SLU instructions are equivalent to SL, except that no test for overflow is made.

• SR $X,$Y,$Z|Z `shift right'.
The bits of register Y are shifted right by $Z or Z places, and copies of the leftmost bit (the sign bit) are shifted in from the left; the result is placed in register X. Register Y is treated as a signed number, but the second operand is treated as an unsigned number. The effect is the same as division by 2^$Z or by 2^Z and rounding down. In particular, if the second operand is 64 or more, register X will become zero if $Y was nonnegative, −1 if $Y was negative.

• SRU $X,$Y,$Z|Z `shift right unsigned'.
The bits of register Y are shifted right by $Z or Z places, and 0s are shifted in from the left; the result is placed in register X. Both operands are treated as unsigned numbers. The effect is the same as unsigned division of a 64-bit number by 2^$Z or by 2^Z; if the second operand is 64 or more, register X will become entirely zero.
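The four shifts, including their behaviour for shift counts of 64 or more, can be sketched in Python as follows. The function names are illustrative, registers are integers below 2^64, and the overflow exception of SL is modelled as a returned flag. Python's `>>` on negative integers already rounds toward −∞, which matches SR.

```python
MASK64 = (1 << 64) - 1

def to_signed(u):
    return u - (1 << 64) if u >> 63 else u

def sl(y, z):
    """SL: returns (result, overflow flag); y is signed, z is unsigned."""
    if z >= 64:
        return 0, y != 0
    exact = to_signed(y) << z
    return exact & MASK64, not (-(1 << 63) <= exact < (1 << 63))

def slu(y, z):
    return (y << min(z, 64)) & MASK64

def sr(y, z):
    return (to_signed(y) >> min(z, 64)) & MASK64   # sign bits shifted in

def sru(y, z):
    return y >> min(z, 64)                         # zeros shifted in
```

For instance, shifting −7 right by one place gives −4 (the floor of −3.5), and `sl(1, 63)` reports overflow because 2^63 is not representable as a signed octabyte.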


15. Comparisons. Arithmetic and logical operations are nice, but computer programs also need to compare numbers and to change the course of a calculation depending on what they find. MMIX has four comparison instructions to facilitate such decision-making.

• CMP $X,$Y,$Z|Z `compare'.
Register X is set to −1 if register Y is less than register Z or less than the unsigned immediate value Z, using the conventions of signed arithmetic; it is set to 0 if register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise it is set to 1. In symbols, $X = [$Y > $Z] − [$Y < $Z] or $X = [$Y > Z] − [$Y < Z].

• CMPU $X,$Y,$Z|Z `compare unsigned'.
Register X is set to −1 if register Y is less than register Z or less than the unsigned immediate value Z, using the conventions of unsigned arithmetic; it is set to 0 if register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise it is set to 1. In symbols, $X = [$Y > $Z] − [$Y < $Z] or $X = [$Y > Z] − [$Y < Z].
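The bracket formula translates directly into Python, where a comparison yields `True` or `False` and these behave as 1 and 0 in arithmetic. The sketch below uses made-up names and returns the result as an ordinary Python integer (−1, 0 or 1) instead of a 64-bit register pattern.

```python
MASK64 = (1 << 64) - 1

def to_signed(u):
    return u - (1 << 64) if u >> 63 else u

def cmp_signed(y, z):
    """CMP: [y > z] - [y < z] under signed interpretation."""
    ys, zs = to_signed(y), to_signed(z)
    return (ys > zs) - (ys < zs)

def cmp_unsigned(y, z):
    """CMPU: the same formula under unsigned interpretation."""
    return (y > z) - (y < z)
```

The two instructions differ only when the operands have different leading bits: the all-ones pattern is less than 1 for CMP (it means −1) but greater than 1 for CMPU.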


16. There also are 32 conditional instructions, which choose quickly between two alternative courses of action.

• CSN $X,$Y,$Z|Z `conditionally set if negative'.
If register Y is negative (namely if its most significant bit is 1), register X is set to the contents of register Z or to the unsigned immediate value Z. Otherwise nothing happens.

• CSZ $X,$Y,$Z|Z `conditionally set if zero'.
• CSP $X,$Y,$Z|Z `conditionally set if positive'.
• CSOD $X,$Y,$Z|Z `conditionally set if odd'.
• CSNN $X,$Y,$Z|Z `conditionally set if nonnegative'.
• CSNZ $X,$Y,$Z|Z `conditionally set if nonzero'.
• CSNP $X,$Y,$Z|Z `conditionally set if nonpositive'.
• CSEV $X,$Y,$Z|Z `conditionally set if even'.
These instructions are entirely analogous to CSN, except that register X changes only if register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or nonodd.

• ZSN $X,$Y,$Z|Z `zero or set if negative'.
If register Y is negative (namely if its most significant bit is 1), register X is set to the contents of register Z or to the unsigned immediate value Z. Otherwise register X is set to zero.

• ZSZ $X,$Y,$Z|Z `zero or set if zero'.
• ZSP $X,$Y,$Z|Z `zero or set if positive'.
• ZSOD $X,$Y,$Z|Z `zero or set if odd'.
• ZSNN $X,$Y,$Z|Z `zero or set if nonnegative'.
• ZSNZ $X,$Y,$Z|Z `zero or set if nonzero'.
• ZSNP $X,$Y,$Z|Z `zero or set if nonpositive'.
• ZSEV $X,$Y,$Z|Z `zero or set if even'.
These instructions are entirely analogous to ZSN, except that $X is set to $Z or Z if register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or even; otherwise $X is set to zero.
Notice that the two instructions CMPU r,s,0 and ZSNZ r,s,1 have the same effect. So do the two instructions CSNP r,s,0 and ZSP r,s,r. So do AND r,s,1 and ZSOD r,s,1.
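The three equivalences noted at the end of this section can be confirmed mechanically. In the sketch below, which is an illustration and not MMIX code, the table `CONDITION` and the functions `cs`, `zs` and `cmpu` are hypothetical names; `cs` receives the old contents of register X as its second argument because a conditional set may leave X unchanged.

```python
MASK64 = (1 << 64) - 1

def to_signed(u):
    return u - (1 << 64) if u >> 63 else u

# The eight conditions, tested on the signed value of register Y.
CONDITION = {
    "N": lambda v: v < 0,       "NN": lambda v: v >= 0,
    "Z": lambda v: v == 0,      "NZ": lambda v: v != 0,
    "P": lambda v: v > 0,       "NP": lambda v: v <= 0,
    "OD": lambda v: v % 2 == 1, "EV": lambda v: v % 2 == 0,
}

def cs(cond, x, y, z):
    """CSxx: new value of X; X keeps its old value x if the test fails."""
    return z if CONDITION[cond](to_signed(y)) else x

def zs(cond, y, z):
    """ZSxx: new value of X; X becomes zero if the test fails."""
    return z if CONDITION[cond](to_signed(y)) else 0

def cmpu(y, z):
    """CMPU result as a 64-bit pattern."""
    return ((y > z) - (y < z)) & MASK64
```

For every s, `zs("NZ", s, 1)` equals `cmpu(s, 0)` and `zs("OD", s, 1)` equals `s & 1`; for every r and s, `cs("NP", r, s, 0)` equals `zs("P", s, r)`.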


17. Branches and jumps. MMIX ordinarily executes instructions in sequence, proceeding from an instruction in tetrabyte M_4[λ] to the instruction in M_4[λ + 4]. But there are several ways to interrupt the normal flow of control, most of which use the Y and Z fields of an instruction as a combined 16-bit YZ field. For example, BNZ $3,@+4000 (branch if nonzero) is typical: It means that control should skip ahead 1000 instructions to the command that appears 4000 bytes after the BNZ, if register 3 is not equal to zero.

There are eight branch-forward instructions, corresponding to the eight conditions in the CS and ZS commands that we discussed earlier. And there are eight similar branch-backward instructions; for example, BOD $2,@-4000 (branch if odd) takes control to the instruction that appears 4000 bytes before this BOD command, if register 2 is odd. The numeric OP-code when branching backward is one greater than the OP-code when branching forward; the assembler takes care of this automatically, just as it takes care of changing ADD from 32 to 33 when necessary.

Since branches are relative to the current location, the MMIX assembler treats branch instructions in a special way. Suppose a programmer writes `BNZ $3,Case5', where Case5 is the address of an instruction in location l. If this instruction appears in location λ, the assembler first computes the displacement δ = ⌊(l − λ)/4⌋. Then if δ is nonnegative, the quantity δ is placed in the YZ field of a BNZ command, and it should be less than 2^16; if δ is negative, the quantity 2^16 + δ is placed in the YZ field of a BNZ command with OP-code increased by 1, and δ should not be less than −2^16.

The symbol @ used in our examples of BNZ and BOD above is interpreted by the assembler as an abbreviation for "the location of the current instruction." In the following notes we will define pairs of branch commands by writing, for example, `BNZ $X,@+4*YZ[-262144]'; this stands for a branch-forward command that branches to the current location plus four times YZ, as well as for a branch-backward command that branches to the current location plus four times (YZ − 65536).

• BN $X,@+4*YZ[-262144] `branch if negative'.
• BZ $X,@+4*YZ[-262144] `branch if zero'.
• BP $X,@+4*YZ[-262144] `branch if positive'.
• BOD $X,@+4*YZ[-262144] `branch if odd'.
• BNN $X,@+4*YZ[-262144] `branch if nonnegative'.
• BNZ $X,@+4*YZ[-262144] `branch if nonzero'.
• BNP $X,@+4*YZ[-262144] `branch if nonpositive'.
• BEV $X,@+4*YZ[-262144] `branch if even'.
If register X is respectively negative, zero, positive, odd, nonnegative, nonzero, nonpositive, or even, and if this instruction appears in memory location λ, the next instruction is taken from memory location λ + 4YZ (branching forward) or λ + 4(YZ − 2^16) (branching backward). Thus one can go from location λ to any location between λ − 262,144 and λ + 262,140, inclusive.
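The assembler's displacement rule can be sketched in a few lines of Python. The names `encode_branch` and `decode_branch` are hypothetical, and the "backward" flag stands for the OP-code being increased by 1; this is an illustration of the arithmetic in section 17, not the code of the actual assembler.

```python
def encode_branch(target, here):
    """Return (backward_flag, YZ) for a relative branch from `here` to `target`."""
    delta = (target - here) // 4            # floor division: δ = ⌊(l − λ)/4⌋
    if not -(1 << 16) <= delta < (1 << 16):
        raise ValueError("branch target out of range")
    if delta >= 0:
        return 0, delta                     # forward: YZ = δ
    return 1, (1 << 16) + delta             # backward: YZ = 2^16 + δ, OP-code + 1

def decode_branch(backward, yz, here):
    """Location reached by a taken branch: λ + 4·YZ or λ + 4·(YZ − 2^16)."""
    return here + 4 * (yz - (backward << 16))
```

Decoding the extreme fields gives the range stated above: `decode_branch(1, 0, here)` is `here - 262144` and `decode_branch(0, 65535, here)` is `here + 262140`.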

Sixteen additional branch instructions called probable branches are also provided. They have exactly the same meaning as ordinary branch instructions; for example, PBOD $2,@-4000 and BOD $2,@-4000 both go backward 4000 bytes if register 2 is odd. But they differ in running time: On some implementations of MMIX, a branch instruction takes longer when the branch is taken, while a probable branch takes longer when the branch is not taken. Thus programmers should use a B instruction when they think branching is relatively unlikely, but they should use PB when they expect branching to occur more often than not. Here is a list of the probable branch commands, for completeness:

• PBN $X,@+4*YZ[-262144] `probable branch if negative'.
• PBZ $X,@+4*YZ[-262144] `probable branch if zero'.
• PBP $X,@+4*YZ[-262144] `probable branch if positive'.
• PBOD $X,@+4*YZ[-262144] `probable branch if odd'.
• PBNN $X,@+4*YZ[-262144] `probable branch if nonnegative'.
• PBNZ $X,@+4*YZ[-262144] `probable branch if nonzero'.
• PBNP $X,@+4*YZ[-262144] `probable branch if nonpositive'.
• PBEV $X,@+4*YZ[-262144] `probable branch if even'.

18. Locations that are relative to the current instruction can be transformed into absolute locations with GETA commands.

• GETA $X,@+4*YZ[-262144] `get address'.
The value λ + 4YZ or λ + 4(YZ − 2^16) is placed in register X. (The assembly language conventions of branch instructions apply; for example, we can write `GETA $X,Addr'.)

19. MMIX also has unconditional jump instructions, which change the location of the next instruction no matter what.

• JMP @+4*XYZ[-67108864] `jump'.
A JMP command treats bytes X, Y, and Z as an unsigned 24-bit integer XYZ. It allows a program to transfer control from location λ to any location between λ − 67,108,864 and λ + 67,108,860 inclusive, using relative addressing as in the B and PB commands.

• GO $X,$Y,$Z|Z `go to location'.
MMIX takes its next instruction from location $Y + $Z or $Y + Z, and continues from there. Register X is set equal to λ + 4, the location of the instruction that would ordinarily have been executed next. (GO is similar to a jump, but it is not relative to the current location. Since GO has the same format as a load or store instruction, a loading routine can treat program labels with the same mechanism that is used to treat references to data.)
An old-fashioned type of subroutine linkage can be implemented by saying either `GO r,subloc,0' or `GETA r,@+8; JMP Sub' to enter a subroutine, then `GO r,r,0' to return. But subroutines are normally entered with the instructions PUSHJ or PUSHGO.
The two least significant bits of the address in a GO command are essentially ignored. They will, however, appear in the value of λ returned by GETA instructions, and in the return-jump register rJ after PUSHJ or PUSHGO instructions are performed, and in the where-interrupted register at the time of an interrupt. Therefore they could be used to send some kind of signal to a subroutine or (less likely) to an interrupt handler.

PUSHGO, §29. PUSHJ, §29.


20. Multiplication and division. Now for some instructions that make MMIX work harder.

• MUL $X,$Y,$Z|Z `multiply'.
The signed product of the number in register Y by either the number in register Z or the unsigned byte Z replaces the contents of register X. An integer overflow exception can occur, as with ADD or SUB, if the result is less than −2^63 or greater than 2^63 − 1. (Immediate multiplication by powers of 2 can be done more rapidly with the SL instruction.)

• MULU $X,$Y,$Z|Z `multiply unsigned'.
The lower 64 bits of the unsigned 128-bit product of register Y and either register Z or Z are placed in register X, and the upper 64 bits are placed in the special himult register rH. (Immediate multiplication by powers of 2 can be done more rapidly with the SLU instruction, if the upper half is not needed. Furthermore, an instruction like 4ADDU $X,$Y,$Y is faster than MULU $X,$Y,5.)


• DIV $X,$Y,$Z|Z `divide'.
The signed quotient of the number in register Y divided by either the number in register Z or the unsigned byte Z replaces the contents of register X, and the signed remainder is placed in the special remainder register rR. An integer divide check exception occurs if the divisor is zero; in that case $X is set to zero and rR is set to $Y. An integer overflow exception occurs if the number −2^63 is divided by −1; otherwise integer overflow is impossible. The quotient of y divided by z is defined to be ⌊y/z⌋, and the remainder is defined to be y − ⌊y/z⌋z (also written y mod z). Thus, the remainder is either zero or has the sign of the divisor. Dividing by z = 2^t gives exactly the same quotient as shifting right t via the SR command, and exactly the same remainder as anding with z − 1 via the AND command. Division of a positive 63-bit number by a positive constant can be accomplished more quickly by computing the upper half of a suitable unsigned product and shifting it right appropriately.

• DIVU $X,$Y,$Z|Z `divide unsigned'.
The unsigned 128-bit number obtained by prefixing the special dividend register rD to the contents of register Y is divided either by the unsigned number in register Z or by the unsigned byte Z, and the quotient is placed in register X. The remainder is placed in the remainder register rR. However, if rD is greater than or equal to the divisor (and in particular if the divisor is zero), then $X is set to rD and rR is set to $Y. (Unsigned arithmetic never signals an exceptional condition, even when dividing by zero.) If rD is zero, unsigned division by z = 2^t gives exactly the same quotient as shifting right t via the SRU command, and exactly the same remainder as anding with z − 1 via the AND command. Section 4.3.1 of Seminumerical Algorithms explains how to use unsigned division to obtain the quotient and remainder of extremely large numbers.
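Python's integer division happens to use the same floor convention as DIV, so the rules of this section are easy to try out. The sketch below uses invented names; `mulu` and `divu` work on unsigned integers below 2^64, while `div` works directly on signed Python integers and does not model the one overflow case (−2^63 divided by −1).

```python
MASK64 = (1 << 64) - 1

def mulu(y, z):
    """MULU: returns (low 64 bits for $X, high 64 bits for rH)."""
    product = y * z
    return product & MASK64, product >> 64

def div(y, z):
    """DIV on signed integers: returns (quotient for $X, remainder for rR)."""
    if z == 0:
        return 0, y                 # integer divide check: $X = 0, rR = $Y
    return y // z, y % z            # floor quotient; remainder has the sign of z

def divu(rD, y, z):
    """DIVU: divides the 128-bit number (rD, y) by z; returns ($X, rR)."""
    if rD >= z:                     # includes z == 0
        return rD, y
    return divmod((rD << 64) | y, z)
```

For example, `div(-7, 2)` is `(-4, 1)` and `div(7, -2)` is `(-4, -1)`, so the remainder follows the divisor's sign; `div(-7, 8)` matches a right shift by 3 and an AND with 7.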


21. Floating point computations. Floating point arithmetic conforming to the famous IEEE/ANSI Standard 754 is provided for arbitrary 64-bit numbers. The IEEE standard refers to such numbers as "double format" quantities, but MMIX calls them simply floating point numbers because 64-bit quantities are the norm.

A positive floating point number has 53 bits of precision and can range from approximately 10^−308 to 10^308. "Denormal numbers" between 10^−324 and 10^−308 can also be represented, but with fewer bits of precision. Floating point numbers can be infinite, and they satisfy such identities as 1.0/∞ = +0.0, −2.8 × ∞ = −∞. Floating point quantities can also be "Not-a-Numbers" or NaNs, which are further classified into signaling NaNs and quiet NaNs.

Five kinds of exceptions can occur during floating point computations, and they each have code letters: Floating overflow (O) or underflow (U); floating divide by zero (Z); floating inexact (X); and floating invalid (I). For example, the multiplication of sufficiently small integers causes no exceptions, and the division of 91.0 by 13.0 is also exception-free, but the division 1.0/3.0 is inexact. The multiplication of extremely large or extremely small floating point numbers is inexact and it also causes overflow or underflow. Invalid results occur when taking the square root of a negative number; mathematicians can remember the I exception by relating it to the square root of −1.0. Invalid results also occur when trying to convert infinity or a quiet NaN to a fixed-point integer, or when any signaling NaN is encountered, or when mathematically undefined operations like ∞ − ∞ or 0/0 are requested. (Programmers can be sure that they have not erroneously used uninitialized floating point data if they initialize all their variables to signaling NaN values.)

Four different rounding modes for inexact results are available: round to nearest (and to even in case of ties); round off (toward zero); round up (toward +∞); or round down (toward −∞). MMIX has a special arithmetic status register rA that specifies the current rounding mode and the user's current preferences for exception handling.

IEEE standard arithmetic provides an excellent foundation for scientific calculations, and it will be thoroughly explained in the fourth edition of Seminumerical Algorithms, Section 4.2. For our present purposes, we need not study all the details; but we do need to specify MMIX's behavior with respect to several things that are not completely defined by the standard. For example, the IEEE standard does not define the representations of NaNs.

When an octabyte represents a floating point number in MMIX's registers, the leftmost bit is the sign; then come 11 bits for an exponent e; and the remaining 52 bits are the fraction part f. We regard e as an integer between 0 and (11111111111)_2 = 2047, and we regard f as a fraction between 0 and (.111…1)_2 = 1 − 2^−52. Each octabyte has the following significance:

±0.0, if e = f = 0 (zero);
±2^(−1022) f, if e = 0 and f > 0 (denormal);
±2^(e−1023) (1 + f), if 0 < e < 2047 (normal);
±∞, if e = 2047 and f = 0 (infinite);
±NaN(f), if e = 2047 and 0 < f < 1/2 (signaling NaN);
±NaN(f), if e = 2047 and f ≥ 1/2 (quiet NaN).


Notice that +0.0 is distinguished from −0.0; this fact is important for interval arithmetic.

Exercise: What 64 bits represent the floating point number 1.0? Answer: We want e = 1023 and f = 0, so the answer is #3ff0000000000000.

Exercise: What is the largest finite floating point number? Answer: We want e = 2046 and f = 1 − 2^−52, so the answer is #7fefffffffffffff = 2^1024 − 2^971.
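The six cases of the significance table, and the two exercises, can be checked with exact rational arithmetic. The following Python sketch is an illustration only; the function name `classify` and its three-part return value (kind, sign, magnitude) are choices made here, with the magnitude given as an exact `Fraction` for finite numbers, `None` for infinities, and the fraction part f for NaNs.

```python
from fractions import Fraction

def classify(octa):
    """Split a 64-bit pattern into (kind, sign, magnitude) per the table."""
    sign = -1 if octa >> 63 else 1
    e = (octa >> 52) & 0x7FF                           # 11-bit exponent field
    f = Fraction(octa & ((1 << 52) - 1), 1 << 52)      # 52-bit fraction part
    if e == 0:
        return ("zero" if f == 0 else "denormal", sign, f * Fraction(2) ** -1022)
    if e < 2047:
        return ("normal", sign, (1 + f) * Fraction(2) ** (e - 1023))
    if f == 0:
        return ("infinite", sign, None)
    return ("signaling NaN" if f < Fraction(1, 2) else "quiet NaN", sign, f)
```

Here `classify(0x3ff0000000000000)` yields a normal number of magnitude 1, and `classify(0x7fefffffffffffff)` yields magnitude 2^1024 − 2^971, in agreement with the answers above.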


22. The seven IEEE floating point arithmetic operations (addition, subtraction, multiplication, division, remainder, square root, and nearest-integer) all share common features, called the standard floating point conventions in the discussion below: The operation is performed on floating point numbers found in two registers, $Y and $Z, except that square root and integerization ignore $Y since they involve only one operand. If neither input operand is a NaN, we first determine the exact result, then round it using the current rounding mode found in special register rA. Infinite results are exact and need no rounding. A floating overflow exception occurs if the rounded result is finite but needs an exponent greater than 2046. A floating underflow exception occurs if the rounded result needs an exponent less than 1 and either (i) the unrounded result cannot be represented exactly as a denormal number or (ii) the "floating underflow trip" is enabled in rA. (Trips are discussed below.) NaNs are treated specially as follows: If either $Y or $Z is a signaling NaN, an invalid exception occurs and the NaN is quieted by adding 1/2 to its fraction part. Then if $Z is a quiet NaN, the result is set to $Z; otherwise if $Y is a quiet NaN, the result is set to $Y.

• FADD $X,$Y,$Z `floating add'.
The floating point sum $Y + $Z is computed by the standard floating point conventions just described, and placed in register X. An invalid exception occurs if the sum is (+∞) + (−∞) or (−∞) + (+∞); in that case the result is NaN(1/2) with the sign of $Z. If the sum is exactly zero and the current mode is not rounding-down, the result is +0.0 except that (−0.0) + (−0.0) = −0.0. If the sum is exactly zero and the current mode is rounding-down, the result is −0.0 except that (+0.0) + (+0.0) = +0.0. These rules for signed zeros turn out to be useful when doing interval arithmetic: If the lower bound of an interval is +0.0 or if the upper bound is −0.0, the interval does not contain zero, so the numbers in the interval have a known sign.

Floating point underflow cannot occur unless the U-trip has been enabled, because any underflowing result of floating point addition can be represented exactly as a denormal number.

Silly but instructive exercise: Find all pairs of numbers ($Y, $Z) such that the commands FADD $X,$Y,$Z and ADDU $X,$Y,$Z both produce the same result in $X (although FADD may cause floating exceptions). Answer: Of course $Y or $Z could be zero, if the other one is not a signaling NaN. Or one could be signaling and the other #0008000000000000. Other possibilities occur when they are both positive and less than #0010000000000001; or when one operand is #0000000000000001 and the other is an odd number between #0020000000000001 and #002ffffffffffffd inclusive (rounding to nearest). And still more surprising possibilities exist, such as #7f6001b4c67bc809 + #ff5ffb6a4534a3f7. All eight families of solutions will be revealed some day in the fourth edition of Seminumerical Algorithms.
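Some of these claims can be spot-checked on any machine whose floating point hardware follows IEEE 754 with round-to-nearest, which is what standard Python builds use for `float`. The sketch below is an experiment under that assumption, not a model of MMIX's FADD: the helpers `bits`, `value`, `fadd_bits` and `addu` are hypothetical names, exceptions are not modelled, and NaN operands are avoided.

```python
import struct

MASK64 = (1 << 64) - 1
NEG_ZERO = 0x8000000000000000

def bits(v):
    """64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]

def value(u):
    """Python float with the given 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", u))[0]

def fadd_bits(y, z):
    """Host floating point addition (round to nearest) on bit patterns."""
    return bits(value(y) + value(z))

def addu(y, z):
    return (y + z) & MASK64
```

Two small denormals such as patterns 3 and 5 add to pattern 8 either way, and adding pattern 1 to the odd pattern #0020000000000001 is a tie that rounds to the even neighbor, which is again the integer sum. With the even pattern #0020000000000002 the tie rounds the other way and the two instructions disagree.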

• FSUB $X,$Y,$Z `floating subtract'.
This instruction is equivalent to FADD, but with the sign of $Z negated unless $Z is a NaN.

• FMUL $X,$Y,$Z `floating multiply'.
The floating point product $Y × $Z is computed by the standard floating point conventions, and placed in register X. An invalid exception occurs if the product is (±0.0) × (±∞) or (±∞) × (±0.0); in that case the result is ±NaN(1/2). No exception occurs for the product (±∞) × (±∞). If neither $Y nor $Z is a NaN, the sign of the result is the product of the signs of $Y and $Z.

• FDIV $X,$Y,$Z `floating divide'.
The floating point quotient $Y/$Z is computed by the standard floating point conventions, and placed in $X. A floating divide by zero exception occurs if the quotient is (normal or denormal)/(±0.0). An invalid exception occurs if the quotient is (±0.0)/(±0.0) or (±∞)/(±∞); in that case the result is ±NaN(1/2). No exception occurs for the quotient (±∞)/(±0.0). If neither $Y nor $Z is a NaN, the sign of the result is the product of the signs of $Y and $Z.

If a floating point number in register X is known to have an exponent between 2 and 2046, the instruction INCH $X,#fff0 will divide it by 2.0.
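The INCH trick works because adding #fff0 to the high wyde, modulo 2^64, subtracts 1 from the 11-bit exponent field and leaves the sign and fraction alone. A small Python model, offered only as an illustration (the names `inch` and `halve` are hypothetical, and the exponent is assumed to lie in the stated range):

```python
import struct

MASK = (1 << 64) - 1

def to_bits(x):
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def to_float(bits):
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def inch(reg, yz):
    """Model of INCH $X,YZ: add YZ * 2^48 to the register, modulo 2^64."""
    return (reg + (yz << 48)) & MASK

def halve(x):
    """Divide x by 2.0 using only the integer instruction model."""
    return to_float(inch(to_bits(x), 0xfff0))

for x in (3.0, -10.0, 1e300):
    print(x, halve(x), halve(x) == x / 2.0)
```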

• FREM $X,$Y,$Z `floating remainder'.
The floating point remainder $Y rem $Z is computed by the standard floating point conventions, and placed in register X. (The IEEE standard defines the remainder to be $Y − n × $Z, where n is the nearest integer to $Y/$Z, and n is an even integer in case of ties. This is not the same as the remainder $Y mod $Z computed by DIV or DIVU.) A zero remainder has the sign of $Y. An invalid exception occurs if $Y is infinite and/or $Z is zero; in that case the result is NaN(1/2) with the sign of $Y.
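The difference between the IEEE remainder and the familiar mod operation is easy to see numerically. Python's standard `math.remainder` follows the same IEEE definition, so the sketch below uses it as a stand-in for FREM; the wrapper name `frem` is hypothetical and exceptions are not modeled.

```python
import math

def frem(y, z):
    """IEEE remainder y - n*z, with n the nearest integer to y/z (ties to even)."""
    return math.remainder(y, z)

print(frem(5.0, 3.0), 5 % 3)   # n = 2, so the IEEE remainder is negative
print(frem(7.0, 2.0), 7 % 2)   # 7/2 = 3.5 is a tie, so n = 4
```

In both cases the IEEE remainder is −1.0 while the integer mod is positive.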

• FSQRT $X,$Z `floating square root'.
The floating point square root √$Z is computed by the standard floating point conventions, and placed in register X. An invalid exception occurs if $Z is a negative number (either infinite, normal, or denormal); in that case the result is −NaN(1/2). No exception occurs when taking the square root of −0.0 or +∞. In all cases the sign of the result is the sign of $Z.

• FINT $X,$Z `floating integer'.
The floating point number in register Z is rounded (if necessary) to a floating point integer, using the current rounding mode, and placed in register X. Infinite values and quiet NaNs are not changed; signaling NaNs are treated as in the standard conventions. Floating point overflow and underflow exceptions cannot occur.

The Y field of FSQRT and FINT can be used to specify a special rounding mode, as explained below.

23. Besides doing arithmetic, we need to compare floating point numbers with each other, taking proper account of NaNs and the fact that −0.0 should be considered equal to +0.0. The following instructions are analogous to the comparison operators CMP and CMPU that we have used for integers.

• FCMP $X,$Y,$Z `floating compare'.
Register X is set to −1 if $Y < $Z according to the conventions of floating point arithmetic, or to 1 if $Y > $Z according to those conventions. Otherwise it is set to 0. An invalid exception occurs if either $Y or $Z is a NaN; in such cases the result is zero.

• FEQL $X,$Y,$Z `floating equal to'.
Register X is set to 1 if $Y = $Z according to the conventions of floating point arithmetic. Otherwise it is set to 0. The result is zero if either $Y or $Z is a NaN, even if a NaN is being compared with itself. However, no invalid exception occurs, not even when $Y or $Z is a signaling NaN. (Perhaps MMIX differs slightly from the IEEE standard in this regard, but programmers sometimes need to look at signaling NaNs without encountering side effects. Programmers who insist on raising an invalid exception whenever a signaling NaN is compared for floating equality should issue the instructions FSUB $X,$Y,$Y; FSUB $X,$Z,$Z just before saying FEQL $X,$Y,$Z.)

Suppose w, x, y, and z are unsigned 64-bit integers with w < x < 2^63 ≤ y < z. Thus, the leftmost bits of w and x are 0, while the leftmost bits of y and z are 1. Then we have w < x < y < z when these numbers are considered as unsigned integers, but y < z < w < x when they are considered as signed integers, because y and z are negative. Furthermore, we have z < y ≤ w < x when these same 64-bit quantities are considered to be floating point numbers, assuming that no NaNs are present, because the leftmost bit of a floating point number represents its sign and the remaining bits represent its magnitude. The case y = w occurs in floating point comparison if and only if y is the representation of −0.0 and w is the representation of +0.0.
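The three orderings can be observed directly by sorting the same four bit patterns under each interpretation. The Python sketch below is only an illustration; the particular values of w, x, y, z (the patterns of +1.0, +2.0, −1.0, −2.0) are chosen here and do not come from the text.

```python
import struct

def to_float(bits):
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def to_signed(bits):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return bits - (1 << 64) if bits >> 63 else bits

w, x = 0x3ff0000000000000, 0x4000000000000000   # +1.0, +2.0
y, z = 0xbff0000000000000, 0xc000000000000000   # -1.0, -2.0
names = {w: "w", x: "x", y: "y", z: "z"}

def order(key):
    return " < ".join(names[v] for v in sorted(names, key=key))

print("unsigned:", order(lambda v: v))
print("signed:  ", order(to_signed))
print("floating:", order(to_float))
```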

• FUN $X,$Y,$Z `floating unordered'.
Register X is set to 1 if $Y and $Z are unordered according to the conventions of floating point arithmetic (namely, if either one is a NaN); otherwise register X is set to 0. No invalid exception occurs, not even when $Y or $Z is a signaling NaN.

The IEEE standard discusses 26 different possible relations on floating point numbers; MMIX implements 14 of them with single instructions, followed by a branch (or by a ZS to make a "pure" 0 or 1 result); all 26 can be evaluated with a sequence of at most four MMIX commands and a subsequent branch. The hardest case to handle is `?>=' (unordered or greater or equal, to be computed without exceptions), for which the following sequence makes $X ≥ 0 if and only if $Y ?>= $Z:

        FUN   $255,$Y,$Z
        BP    $255,1F     % skip ahead if unordered
        FCMP  $X,$Y,$Z    % $X=[$Y>$Z]-[$Y<$Z]; no exceptions will arise
1H      CSNZ  $X,$255,1   % $X=1 if unordered
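The control flow of this four-instruction sequence can be traced with a small Python model. This is a sketch only: `fun`, `fcmp`, and `unordered_or_ge` are hypothetical names, Python floats stand in for registers, and exception flags are not modeled.

```python
import math

def fun(y, z):
    """Model of FUN: 1 if either operand is a NaN, else 0."""
    return 1 if math.isnan(y) or math.isnan(z) else 0

def fcmp(y, z):
    """Model of FCMP: -1, 0, or +1 (0 also when unordered)."""
    return (y > z) - (y < z)

def unordered_or_ge(y, z):
    """Trace of the sequence above; true iff y ?>= z."""
    x = 0                   # previous contents of $X (irrelevant)
    t = fun(y, z)           # FUN  $255,$Y,$Z
    if not t > 0:           # BP   $255,1F skips the comparison if unordered
        x = fcmp(y, z)      # FCMP $X,$Y,$Z
    if t != 0:              # 1H CSNZ $X,$255,1
        x = 1
    return x >= 0

nan = float("nan")
for y, z in [(nan, 1.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (-0.0, 0.0)]:
    print(y, z, unordered_or_ge(y, z))
```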

24. Exercise: Suppose MMIX had no FINT instruction. Explain how to obtain the equivalent of FINT $X,$Z using other instructions. Your program should do the proper thing with respect to NaNs and exceptions. (For example, it should cause an invalid exception if and only if $Z is a signaling NaN; it should cause an inexact exception only if $Z needs to be rounded to another value.)

Answer: (The assembler prefixes hexadecimal constants by #.)

        SETH  $0,#4330    % $0=2^52
        SET   $1,$Z       % $1=$Z
        ANDNH $1,#8000    % $1=abs($Z)
        ANDN  $2,$Z,$1    % $2=signbit($Z)
        FUN   $3,$Z,$Z    % $3=[$Z is a NaN]
        BNZ   $3,1F       % skip ahead if $Z is a NaN
        FCMP  $3,$1,$0    % $3=[abs($Z)>2^52]-[abs($Z)<2^52]
        CSNN  $0,$3,0     % set $0=0 if $3>=0
        OR    $0,$2,$0    % attach sign of $Z to $0
1H      FADD  $1,$Z,$0    % $1=$Z+$0
        FSUB  $X,$1,$0    % $X=$1-$0

This program handles most cases of interest by adding and subtracting ±2^52 using floating point arithmetic. (The constant #4330000000000000 has exponent field #433 = 1075, so it represents 2^(1075−1023) = 2^52, the smallest power of two at which adjacent floating point numbers differ by exactly 1.) It would be incorrect to do this in all cases; for example, such addition/subtraction might fail to give the correct answer when $Z is a small negative quantity (if rounding toward zero), or when $Z is a number like 2^105 + 2^53 (if rounding to nearest).
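The add-and-subtract idea can be tried out in Python, whose floats are IEEE doubles operating in round-to-nearest mode. The sketch below mirrors the instruction sequence step by step under that single rounding mode; the name `fint_nearest` is hypothetical, and neither exception flags nor the sign of a zero result is checked here.

```python
import math

TWO52 = 2.0 ** 52              # the constant #4330000000000000

def fint_nearest(z):
    """Round z to an integer value by adding and subtracting +-2^52."""
    c = TWO52                            # SETH $0,#4330
    if not math.isnan(z):                # FUN / BNZ: NaNs skip the adjustment
        if abs(z) >= TWO52:              # FCMP / CSNN: already an integer
            c = 0.0
        c = math.copysign(c, z)          # OR: attach the sign of z
    t = z + c                            # FADD $1,$Z,$0
    return t - c                         # FSUB $X,$1,$0

for z in (2.5, 3.5, -7.75, 1e300):
    print(z, fint_nearest(z))
```

Ties go to the even neighbor, so 2.5 becomes 2.0 while 3.5 becomes 4.0.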

25. MMIX goes beyond the IEEE standard to define additional relations between floating point numbers, as suggested by the theory in Section 4.2.2 of Seminumerical Algorithms. Given a nonnegative number ε, each normal floating point number u = (f, e) has a neighborhood

    N_ε(u) = {x | |x − u| ≤ 2^(e−1022) ε};

we also define N_ε(0) = {0}, N_ε(u) = {x | |x − u| ≤ 2^(−1021) ε} if u is denormal; N_ε(±∞) = {±∞} if ε < 1, N_ε(±∞) = {everything except ∓∞} if 1 ≤ ε < 2, N_ε(±∞) = {everything} if ε ≥ 2. Then we write

    u ≺ v (ε), if u < N_ε(v) and N_ε(u) < v;
    u ∼ v (ε), if u ∈ N_ε(v) or v ∈ N_ε(u);
    u ≈ v (ε), if u ∈ N_ε(v) and v ∈ N_ε(u);
    u ≻ v (ε), if u > N_ε(v) and N_ε(u) > v.

• FCMPE $X,$Y,$Z `floating compare (with respect to epsilon)'.
Register X is set to −1 if $Y ≺ $Z (rE) according to the conventions of Seminumerical Algorithms as stated above; it is set to 1 if $Y ≻ $Z (rE) according to those conventions; otherwise it is set to 0. Here rE is a floating point number in the special epsilon register, which is used only by the floating point comparison operations FCMPE, FEQLE, and FUNE. An invalid exception occurs, and the result is zero, if any of $Y, $Z, or rE are NaN, or if rE is negative. If no such exception occurs, exactly one of the three conditions $Y ≺ $Z, $Y ∼ $Z, $Y ≻ $Z holds with respect to rE.

• FEQLE $X,$Y,$Z `floating equivalent (with respect to epsilon)'.
Register X is set to 1 if $Y ≈ $Z (rE) according to the conventions of Seminumerical Algorithms as stated above; otherwise it is set to 0. An invalid exception occurs, and the result is zero, if any of $Y, $Z, or rE are NaN, or if rE is negative. Notice that the relation $Y ≈ $Z computed by FEQLE is stronger than the relation $Y ∼ $Z computed by FCMPE.

• FUNE $X,$Y,$Z `floating unordered (with respect to epsilon)'.
Register X is set to 1 if $Y, $Z, or rE are exceptional as discussed for FCMPE and FEQLE; otherwise it is set to 0. No exceptions occur, even if $Y, $Z, or rE is a signaling NaN.

Exercise: What floating point numbers does FCMPE regard as ∼ 0.0 with respect to ε = 1/2, when no exceptions arise? Answer: Zero, denormal numbers, and normal numbers with f = 0. (The numbers similar to zero with respect to ε are zero, denormal numbers with f ≤ 2ε, normal numbers with f ≤ 2ε − 1, and ±∞ if ε ≥ 1.)
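The neighborhood definitions translate almost literally into exact rational arithmetic. The following Python sketch covers only finite operands and a nonnegative ε; the function names mirror the instructions but are otherwise hypothetical, and the NaN, infinity, and negative-ε cases are not handled.

```python
from fractions import Fraction
import struct

def exponent_field(u):
    """The 11-bit exponent e of the double u."""
    bits = struct.unpack(">Q", struct.pack(">d", u))[0]
    return (bits >> 52) & 0x7ff

def radius(u, eps):
    """Half-width of the neighborhood N_eps(u) for a finite double u."""
    if u == 0:
        return Fraction(0)
    e = exponent_field(u)
    if e == 0:                                      # denormal
        return Fraction(2) ** -1021 * Fraction(eps)
    return Fraction(2) ** (e - 1022) * Fraction(eps)

def fcmpe(y, z, eps):
    """-1 if y is definitely less, +1 if definitely greater, else 0."""
    d = Fraction(y) - Fraction(z)
    ry, rz = radius(y, eps), radius(z, eps)
    if d < -rz and d < -ry:        # y < N(z) and N(y) < z
        return -1
    if d > rz and d > ry:          # y > N(z) and N(y) > z
        return 1
    return 0

def feqle(y, z, eps):
    """1 if each operand lies in the other's neighborhood, else 0."""
    d = abs(Fraction(y) - Fraction(z))
    return 1 if d <= radius(y, eps) and d <= radius(z, eps) else 0

print(fcmpe(1.0, 1.5, 0.125), fcmpe(1.0, 1.5, 0.25), feqle(1.0, 1.5, 0.25))
print(fcmpe(1.0, 2.0, 0.25), feqle(1.0, 2.0, 0.25))   # similar, not equivalent
print(fcmpe(0.0, 1.0, 0.5), fcmpe(0.0, 1.5, 0.5))     # the exercise, f = 0 or not
```

The pair 1.0 and 2.0 with ε = 1/4 shows why FEQLE is the stronger relation: 1.0 lies in the neighborhood of 2.0, but not conversely.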

26. The IEEE standard also defines 32-bit floating point quantities, which it calls "single format" numbers. MMIX calls them short floats, and converts between 32-bit and 64-bit forms when such numbers are loaded from memory or stored into memory. A short float consists of a sign bit followed by an 8-bit exponent and a 23-bit fraction. After it has been loaded into one of MMIX's registers, its 52-bit fraction part will have 29 trailing zero bits, and its exponent e will be one of the 256 values 0, (01110000001)_2 = 897, (01110000010)_2 = 898, ..., (10001111110)_2 = 1150, or 2047, unless it was denormal; a denormal short float loads into a normal number with 874 ≤ e ≤ 896.

• LDSF $X,$Y,$Z|Z `load short float'.
Register X is set to the 64-bit floating point number corresponding to the 32-bit floating point number represented by M_4[$Y + $Z] or M_4[$Y + Z]. No arithmetic exceptions occur, not even if a signaling NaN is loaded.

• STSF $X,$Y,$Z|Z `store short float'.
The value obtained by rounding register X to a 32-bit floating point number is placed in M_4[$Y + $Z] or M_4[$Y + Z]. Rounding is done with the current rounding mode, in a manner exactly analogous to the standard conventions for rounding 64-bit results, except that the precision and exponent range are limited. In particular, floating overflow, underflow, and inexact exceptions might occur; a signaling NaN will trigger an invalid exception and it will become quiet. The fraction part of a NaN is truncated if necessary to a multiple of 2^(−23), by ignoring the least significant 29 bits.

If we load any two short floats and operate on them once with either FADD, FSUB, FMUL, FDIV, FREM, FSQRT, or FINT, and if we then store the result as a short float, we obtain the results required by the IEEE standard for single format arithmetic, because the double format can be shown to have enough precision to avoid any problems of "double rounding." But programmers are usually better off sticking to 64-bit arithmetic unless they have a strong reason to emulate the precise behavior of a 32-bit computer; 32 bits do not offer much precision.
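The exponent ranges quoted for loaded short floats can be confirmed with Python's `struct` module, which converts between the 32-bit and 64-bit IEEE formats. In this sketch `ldsf` and `stsf` are hypothetical models that work on bit patterns instead of memory, assume round-to-nearest, and ignore NaNs and exceptions.

```python
import struct

def ldsf(word):
    """Widen a 32-bit float pattern to the 64-bit pattern of the same value."""
    value = struct.unpack(">f", struct.pack(">I", word))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def stsf(bits):
    """Round a 64-bit float pattern to a 32-bit pattern (round to nearest)."""
    value = struct.unpack(">d", struct.pack(">Q", bits))[0]
    return struct.unpack(">I", struct.pack(">f", value))[0]

samples = {
    "smallest denormal": 0x00000001,
    "largest denormal":  0x007fffff,
    "smallest normal":   0x00800000,
    "largest normal":    0x7f7fffff,
}
for name, word in samples.items():
    bits = ldsf(word)
    print(f"{name:18} e={bits >> 52:4d} low29={bits & 0x1fffffff}")
```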

27. Of course we need to be able to go back and forth between integers and floating point values.

• FIX $X,$Z `convert floating to fixed'.
The floating point number in register $Z is converted to an integer as with the FINT instruction, and the resulting integer (mod 2^64) is placed in register X. An invalid exception occurs if $Z is infinite or a NaN; in that case $X is simply set equal to $Z. A float-to-fix exception occurs if the result is less than −2^63 or greater than 2^63 − 1.

• FIXU $X,$Z `convert floating to fixed unsigned'.
This instruction is identical to FIX except that no float-to-fix exception occurs.

• FLOT $X,$Z|Z `convert fixed to floating'.
The integer in $Z or the immediate constant Z is converted to the nearest floating point value (using the current rounding mode) and placed in register $X. A floating inexact exception occurs if rounding is necessary.

• FLOTU $X,$Z|Z `convert fixed to floating unsigned'.
FLOTU is like FLOT, but $Z is treated as an unsigned integer.

• SFLOT $X,$Z|Z `convert fixed to short float'; SFLOTU $X,$Z|Z `convert fixed to short float unsigned'.
The SFLOT instructions are like the FLOT instructions, except that they round to a floating point number whose fraction part is a multiple of 2^(−23). (Thus, the resulting value will not be changed by a "store short float" instruction.) Such conversions appear in MMIX's repertoire only to establish complete conformance with the IEEE standard; a programmer needs them only when emulating a 32-bit machine.

28. Since the variants of FIX and FLOT involve only one input operand ($Z or Z), their Y field is normally zero. A programmer can, however, force the mode of rounding used with these commands by setting

    Y = 1, ROUND_OFF;
    Y = 2, ROUND_UP;
    Y = 3, ROUND_DOWN;
    Y = 4, ROUND_NEAR;

for example, the instruction FLOTU $X,ROUND_OFF,$Z will set the exponent e of register X to 1086 − l if $Z is a nonzero quantity with l leading zero bits. Thus we can count leading zeros by continuing with SETL $0,1086; SR $X,$X,52; SUB $X,$0,$X; CSZ $X,$Z,64.

The Y field can also be used in the same way to specify any desired rounding mode in the other floating point instructions that have only a single operand, namely FSQRT and FINT. An illegal instruction interrupt occurs if Y exceeds 4 in any of these commands.
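The leading-zero count can be followed step by step in a Python model. Python's own int-to-float conversion rounds to nearest, so the sketch below first truncates the integer to 53 significant bits to imitate ROUND_OFF; the function names are hypothetical.

```python
import struct

def flotu_round_off(z):
    """Model of FLOTU $X,ROUND_OFF,$Z for 0 <= z < 2^64; returns a bit pattern."""
    excess = max(z.bit_length() - 53, 0)
    truncated = (z >> excess) << excess        # rounding toward zero
    return struct.unpack(">Q", struct.pack(">d", float(truncated)))[0]

def count_leading_zeros(z):
    x = flotu_round_off(z)     # FLOTU $X,ROUND_OFF,$Z
    x >>= 52                   # SR    $X,$X,52   (the exponent e)
    x = 1086 - x               # SUB   $X,$0,$X   with $0 = 1086
    if z == 0:                 # CSZ   $X,$Z,64
        x = 64
    return x

for z in (0, 1, 0xff, 1 << 63, (1 << 64) - 1):
    print(f"{z:#018x} {count_leading_zeros(z)}")
```

Rounding toward zero matters for inputs such as 2^64 − 1: under round-to-nearest that value would become 2^64, with exponent 1087, and the count would come out as −1.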

29. Subroutine linkage. MMIX has several special operations designed to facilitate the process of calling and implementing subroutines. The key notion is the idea of a hardware-supported register stack, which can coexist with a software-supported stack of variables that are not maintained in registers. From a programmer's standpoint, MMIX maintains a potentially unbounded list S[0], S[1], ..., S[τ − 1] of octabytes holding the contents of registers that are temporarily inaccessible; initially τ = 0. When a subroutine is entered, registers can be "pushed" on to the end of this list, increasing τ; when the subroutine has finished its execution, the registers are "popped" off again and τ decreases.

Our discussion so far has treated all 256 registers $0, $1, ..., $255 as if they were alike. But in fact, MMIX maintains two internal one-byte counters L and G, where 0 ≤ L ≤ G < 256, with the property that

    registers 0, 1, ..., L − 1 are "local";
    registers L, L + 1, ..., G − 1 are "marginal";
    registers G, G + 1, ..., 255 are "global."

A marginal register is zero when its value is read.

The G counter is normally set to a fixed value once and for all when a program is loaded, thereby defining the number of program variables that will live entirely in registers rather than in memory during the course of execution. A programmer may, however, change G dynamically using the PUT instruction described below.

The L counter starts at 0. If an instruction places a value into a register that is currently marginal, namely a register x such that L ≤ x < G, the value of L will increase to x + 1, and any newly local registers will be zero. For example, if L = 10 and G = 200, the instruction ADD $5,$15,1 would simply set $5 to 1. But the instruction ADD $15,$5,$200 would set $10, $11, ..., $14 to zero, $15 to $5 + $200, and L to 16. (The process of clearing registers and increasing L might take quite a few machine cycles in the worst case. We will see later that MMIX is able to take care of any high-priority interrupts that might occur during this time.)
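The rules for local, marginal, and global registers fit in a few lines of Python. The class below is a simplified model written for this illustration (its name and methods are hypothetical); it replays the two ADD instructions of the example with L = 10 and G = 200.

```python
class Registers:
    """Local registers $0..$(L-1), marginal $L..$(G-1), global $G..$255."""

    def __init__(self, L, G):
        self.G = G
        self.local = [0] * L
        self.glob = [0] * (256 - G)

    @property
    def L(self):
        return len(self.local)

    def read(self, x):
        if x >= self.G:
            return self.glob[x - self.G]
        return self.local[x] if x < self.L else 0    # marginal reads as zero

    def write(self, x, value):
        if x >= self.G:
            self.glob[x - self.G] = value
            return
        if x >= self.L:                              # marginal: L grows to x+1
            self.local.extend([0] * (x + 1 - self.L))
        self.local[x] = value

r = Registers(L=10, G=200)
r.write(200, 1000)                        # some value in global register $200
r.write(5, r.read(15) + 1)                # ADD $5,$15,1
print(r.read(5), r.L)                     # $15 was marginal, so $5 = 1
r.write(15, r.read(5) + r.read(200))      # ADD $15,$5,$200
print(r.read(15), r.L, r.local[10:15])
```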

• PUSHJ $X,@+4*YZ[-262144] `push registers and jump'.
• PUSHGO $X,$Y,$Z|Z `push registers and go'.
Suppose first that X < L. Register X is set equal to the number X, then registers 0, 1, ..., X are pushed onto the register stack as described below. If this instruction is in location λ, the value λ + 4 is placed into the special return-jump register rJ. Then control jumps to instruction λ + 4YZ or λ + 4YZ − 262144 or $Y + $Z or $Y + Z, as in a JMP or GO command.

Pushing the first X + 1 registers onto the stack means essentially that we set S[τ] ← $0, S[τ + 1] ← $1, ..., S[τ + X] ← $X, τ ← τ + X + 1, $0 ← $(X + 1), ..., $(L − X − 2) ← $(L − 1), L ← L − X − 1. For example, if X = 1 and L = 5, the current contents of $0 and the number 1 are placed on the register stack, where they will be temporarily inaccessible. Then control jumps to a subroutine with L reduced to 3; the registers that we had been calling $2, $3, and $4 appear as $0, $1, and $2 to the subroutine.

If X ≥ L the actions are similar, except that all of the local registers $0, ..., $(L − 1) are placed on the register stack followed by the number L, and L is reset to zero. In particular, the instruction PUSHGO $255,$Y,$Z pushes all the local registers onto the stack and sets L to zero, regardless of the previous value of L.

We will see later that MMIX is able to achieve the effect of pushing and renaming local registers without actually doing very much work at all.

• POP X,YZ `pop registers and return from subroutine'.
This command preserves X of the current local registers, undoes the effect of the most recent PUSHJ or PUSHGO, and jumps to the instruction in M_4[4YZ + rJ]. If X > 0, the value of $(X − 1) goes into the "hole" position where PUSHJ or PUSHGO stored the number of registers previously pushed.

The formal details of POP are slightly complicated, but we will see that they make sense: If X > L, we first replace X by L + 1. Then we essentially set x ← S[τ − 1] mod 256, S[τ − 1] ← $(X − 1), L ← min(x + X, G), $(L − 1) ← $(L − x − 2), ..., $(x + 1) ← $0, $x ← S[τ − 1], ..., $0 ← S[τ − x − 1], τ ← τ − x − 1; here x is the effective value of the X field on the previous PUSHGO. The operating system should arrange things so that a memory-protection interrupt will occur if a program does more pops than pushes. (If x > G, these formulas don't make sense as written; we actually set $j ← S[τ − x − 1 + j] for L > j ≥ 0 in that rare case.)

Suppose, for example, that a subroutine has three input parameters ($0, $1, $2) and produces two outputs ($0, $1). If the subroutine does not call any other subroutines, it can simply end with POP 2,0, because rJ will contain the return address. Otherwise it should begin by saving rJ, for example with the instruction GET $4,rJ if it will be using local registers $0 through $3, and it should use PUSHJ $5 or PUSHGO $5 when calling sub-subroutines; finally it should PUT rJ,$4 before saying POP 2,0. To call the subroutine from another routine that has, say, 6 local registers, we would put the input arguments into $7, $8, and $9, then issue the command PUSHGO $6,base,Subr; in due time the outputs of the subroutine will appear in $7 and $6.

Notice that the push and pop commands make use of a one-place "hole" in the register stack, between the registers that are pushed down and the registers that remain local. (The hole is position $6 in the example just considered.) MMIX needs this hole position to remember the number of registers that are pushed down. A subroutine with no outputs ends with POP 0,0 and the hole disappears (becomes marginal). A subroutine with one output $0 ends with POP 1,0 and the hole gets the former value of $0. A subroutine with two outputs ($0, $1) ends with POP 2,0 and the hole gets the former value of $1; in this case, therefore, the relative order of the two outputs has been switched on the register stack. If a subroutine has, say, five outputs ($0, ..., $4), it ends with POP 5,0 and $4 goes into the hole position, where it is followed by ($0, $1, $2, $3). MMIX makes this curious permutation in the case of multiple outputs because the hole is most easily plugged by moving one value down (namely $4) instead of by sliding each of five values down in the stack.
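The push and pop rules, including the hole and the permutation of outputs, can be simulated in Python. The class below is a simplified model for illustration only: its names are hypothetical, jumps and rJ are omitted, global registers are not modeled, and the rare case x > G is ignored. The demonstration replays the example of a caller with 6 local registers and a subroutine with three inputs and two outputs.

```python
class RegisterStack:
    def __init__(self, G=255):
        self.G = G
        self.local = []          # $0 .. $(L-1)
        self.S = []              # the register stack S[0..tau-1]

    def get(self, x):
        return self.local[x] if x < len(self.local) else 0

    def set(self, x, value):
        if x >= len(self.local):
            self.local.extend([0] * (x + 1 - len(self.local)))
        self.local[x] = value

    def push(self, X):
        """Register effects of PUSHJ/PUSHGO $X."""
        L = len(self.local)
        if X < L:
            self.local[X] = X                      # the hole records X
            self.S.extend(self.local[:X + 1])
            self.local = self.local[X + 1:]
        else:
            self.S.extend(self.local + [L])        # the hole records L
            self.local = []

    def pop(self, X):
        """Register effects of POP X,YZ."""
        L = len(self.local)
        if X > L:
            X = L + 1
        outputs = [self.get(i) for i in range(X)]
        x = self.S.pop() % 256                     # count stored in the hole
        saved = self.S[len(self.S) - x:]
        del self.S[len(self.S) - x:]
        if X > 0:                                  # last output plugs the hole
            saved += [outputs[-1]] + outputs[:-1]
        self.local = saved[:self.G]

m = RegisterStack()
for i in range(6):
    m.set(i, 100 + i)                  # the caller's own registers $0..$5
m.set(7, 11); m.set(8, 22); m.set(9, 33)
m.push(6)                              # PUSHGO $6,base,Subr
a, b, c = m.get(0), m.get(1), m.get(2)
m.set(0, a + b + c)                    # first output
m.set(1, a * b * c)                    # second output
m.pop(2)                               # POP 2,0
print(m.get(7), m.get(6), len(m.local))
```

After the return, $7 holds the subroutine's $0 and $6 holds its $1, as described above.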

These conventions for parameter passing are admittedly a bit confusing in the general case, and I suppose people who use them extensively might someday find themselves talking about "the infamous MMIX register shuffle." However, there is good use for subroutines that convert a sequence of register contents like (x, a, b, c) into (f, a, b, c) where f is a function of a, b, and c but not x. Moreover, PUSHGO and POP can be implemented with great efficiency, and subroutine linkage tends to be a significant bottleneck when other conventions are used.

Information about a subroutine's calling conventions needs to be communicated to a debugger. That can readily be done at the same time as we inform the debugger about the symbolic names of addresses in memory.

A subroutine that uses 50 local registers will not function properly if it is called by a program that sets G less than 50. MMIX does not allow the value of G to become less than 32. Therefore any subroutine that avoids global registers and uses at most 32 local registers can be sure to work properly regardless of the current value of G.

The rules stated above imply that a PUSHJ or PUSHGO instruction with X = 255 pushes all of the currently defined local registers onto the stack and sets L to zero. This makes G local registers available for use by the subroutine jumped to. If that subroutine later returns with POP 0,0, the former value of L and the former contents of $0, ..., $(L − 1) will be restored.

A POP instruction with X = 255 preserves all the local registers as outputs of the subroutine (provided that the total doesn't exceed G after popping), and puts zero into the hole. The best policy, however, is almost always to use POP with a small value of X, and in general to keep the value of L as small as possible by decreasing it when registers are no longer active. A smaller value of L means that MMIX can change context more easily when switching from one process to another.

30. System considerations. High-performance implementations of MMIX gain speed by keeping caches of instructions and data that are likely to be needed as computation proceeds. [See M. V. Wilkes, IEEE Transactions EC-14 (1965), 270–271; J. S. Liptay, IBM System J. 7 (1968), 15–21.] Careful programmers can make the computer run even faster by giving hints about how to maintain such caches.

• LDUNC $X,$Y,$Z|Z `load octa uncached'.
These instructions, which have the same meaning as LDO, also inform the computer that the loaded octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

• STUNC $X,$Y,$Z|Z `store octa uncached'.
These instructions, which have the same meaning as STO, also inform the computer that the stored octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

• PRELD X,$Y,$Z|Z `preload data'.
These instructions have no effect on registers or memory, but they inform the computer that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will probably be loaded and/or stored in the near future. No protection failure occurs if the memory is not accessible.

• PREGO X,$Y,$Z|Z `prefetch to go'.
These instructions have no effect on registers or memory, but they inform the computer that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will probably be used as instructions in the near future. No protection failure occurs if the memory is not accessible.

• PREST X,$Y,$Z|Z `prestore data'.
These instructions have no effect on registers or memory if the computer has no data cache. But when such a cache exists, they inform the computer that all of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will definitely be stored in the near future before they are loaded. (Therefore it is permissible for the machine to ignore the present contents of those bytes. Also, if those bytes are being shared by several processors, the current processor should try to acquire exclusive access.) No protection failure occurs if the memory is not accessible.

• SYNCD X,$Y,$Z|Z `synchronize data'.
When executed from nonnegative locations, these instructions have no effect on registers or memory if neither a write buffer nor a "write back" data cache is present. But when such a buffer or cache exists, they force the computer to make sure that all data for the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will be present in memory. (Otherwise the result of a previous store instruction might appear only in the cache; the computer is being told that now is the time to write the information back, if it hasn't already been written. A program can use this feature before outputting directly from memory.) No protection failure occurs if the memory is not accessible.

The action is similar when SYNCD is executed from a negative address, but in this case the specified bytes are also removed from the data cache (and from a secondary cache, if present). The operating system can use this feature when a page of virtual memory is being swapped out, or when data is input directly into memory.

• SYNCID X,$Y,$Z|Z `synchronize instructions and data'.
When executed from nonnegative locations these instructions have no effect on registers or memory if the computer has no instruction cache separate from a data cache. But when such a cache exists, they force the computer to make sure that the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will be interpreted correctly if used as instructions before they are next modified. (Generally speaking, an MMIX program is not expected to store anything in memory locations that are also being used as instructions. Therefore MMIX's instruction cache is allowed to become inconsistent with respect to its data cache. Programmers who insist on executing instructions that have been fabricated dynamically, for example when setting a breakpoint for debugging, must first SYNCID those instructions in order to guarantee that the intended results will be obtained.) A SYNCID command might be implemented in several ways; for example, the machine might update its instruction cache to agree with its data cache. A simpler solution, which is good enough because the need for SYNCID ought to be rare, removes instructions in the specified range from the instruction cache, if present, so that they will have to be fetched from memory the next time they are needed; in this case the machine also carries out the effect of a SYNCD command. No protection failure occurs if the memory is not accessible.

The behavior is more drastic, but faster, when SYNCID is executed from a negative location. Then all bytes in the specified range are simply removed from all caches, and the memory corresponding to any "dirty" cache blocks involving such bytes is not brought up to date. An operating system can use this version of the command when pages of virtual memory are being discarded (for example, when a program is being terminated).

31. MMIX is designed to work not only on a single processor but also in situations where several processors share a common memory. The following commands are useful for efficient operation in such circumstances.

• CSWAP $X,$Y,$Z|Z `compare and swap octabytes'.
If the octabyte M_8[$Y + $Z] or M_8[$Y + Z] is equal to the contents of the special prediction register rP, it is replaced in memory with the contents of register X, and register X is set equal to 1. Otherwise the octabyte in memory replaces rP and register X is set to zero. This is an atomic (indivisible, uninterruptible) operation, useful for interprocess communication when independent computers are sharing the same memory.

The compare-and-swap operation was introduced by IBM in late models of the System/370 architecture, and it soon spread to several other machines. Significant ways to use it are discussed, for example, in section 7.2.3 of Harold Stone's High-Performance Computer Architecture (Reading, Massachusetts: Addison–Wesley, 1987), and in sections 8.2 and 8.3 of Transaction Processing by Jim Gray and Andreas Reuter (San Francisco: Morgan Kaufmann, 1993).
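A typical use of compare-and-swap is the retry loop, in which a failed attempt leaves the fresh memory contents in rP so that the next attempt can start from them. The Python model below is single-threaded and purely illustrative: the class `Machine`, its dictionary memory, and the helper `atomic_increment` are assumptions of this sketch, and the atomicity that real hardware provides is simply taken for granted.

```python
class Machine:
    def __init__(self):
        self.mem = {}      # octabyte address -> value
        self.rP = 0        # the prediction register

    def cswap(self, x_value, addr):
        """Model of CSWAP; returns the new value of $X (1 on success, 0 on failure)."""
        if self.mem.get(addr, 0) == self.rP:
            self.mem[addr] = x_value
            return 1
        self.rP = self.mem.get(addr, 0)    # memory replaces rP
        return 0

def atomic_increment(m, addr):
    """Add 1 to the octabyte at addr, retrying until the swap succeeds."""
    m.rP = m.mem.get(addr, 0)              # initial prediction from an ordinary load
    while True:
        if m.cswap(m.rP + 1, addr):
            return m.mem[addr]
        # on failure rP now holds the current contents, so just try again

m = Machine()
m.mem[8] = 5
m.rP = 4
print(m.cswap(9, 8), m.rP, m.mem[8])   # wrong prediction: nothing is stored
print(m.cswap(9, 8), m.rP, m.mem[8])   # prediction now matches: 9 is stored
print(atomic_increment(m, 8))
```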

• SYNC XYZ `synchronize'.
If XYZ = 0, the machine drains its pipeline (that is, it stalls until all preceding instructions have completed their activity). If XYZ = 1, the machine controls its actions less drastically, in such a way that all store instructions preceding this SYNC will be completed before all store instructions after it. If XYZ = 2, the machine controls its actions in such a way that all load instructions preceding this SYNC will be completed before all load instructions after it. If XYZ = 3, the machine controls its actions in such a way that all load or store instructions preceding this SYNC will be completed before all load or store instructions after it. If XYZ = 4, the machine goes into a power-saver mode, in which instructions may be executed more slowly (or not at all) until some kind of "wake-up" signal is received. If XYZ = 5, the machine empties its write buffer and cleans its data caches, if any (including a possible secondary cache); the caches retain their data, but the cache contents also appear in memory. If XYZ = 6, the machine clears its virtual address translation caches (see below). If XYZ = 7, the machine clears its instruction and data caches, discarding any information in the data caches that wasn't previously in memory. ("Clearing" is stronger than "cleaning"; a clear cache remembers nothing. Clearing is also faster, because it simply obliterates everything.) If XYZ > 7, an illegal instruction interrupt occurs.

Of course no SYNC is necessary between a command that loads from or stores into memory and a subsequent command that loads from or stores into exactly the same location. However, SYNC might be necessary in certain cases even on a one-processor system, because input/output processes take place in parallel with ordinary computation.

The cases XYZ > 3 are privileged, in the sense that only the operating system can use them. More precisely, if a SYNC command is encountered with XYZ = 4 or XYZ = 5 or XYZ = 6 or XYZ = 7, a "privileged instruction interrupt" occurs unless that interrupt is currently disabled. Only the operating system can disable interrupts (see below).

32. Trips and traps. Special register rA records the current status information about arithmetic exceptions. Its least significant byte contains eight "event" bits called DVWIOUZX from left to right, where D stands for integer divide check, V for integer overflow, W for float-to-fix overflow, I for invalid operation, O for floating overflow, U for floating underflow, Z for floating division by zero, and X for floating inexact. The next least significant byte of rA contains eight "enable" bits with the same names DVWIOUZX and the same meanings. When an exceptional condition occurs, there are two cases: If the corresponding enable bit is 0, the corresponding event bit is set to 1. But if the corresponding enable bit is 1, MMIX interrupts its current instruction stream and executes a special "exception handler." Thus, the event bits record exceptions that have not been "tripped."

Floating point overflow always causes two exceptions, O and X. (The strictest interpretation of the IEEE standard would raise exception X on overflow only if floating overflow is not enabled, but MMIX always considers an overflowed result to be inexact.) Floating point underflow always causes both U and X when underflow is not enabled, and it might cause both U and X when underflow is enabled. If both enable bits are set to 1 in such cases, the overflow or underflow handler is called and the inexact handler is ignored. All other types of exceptions arise one at a time, so there is no ambiguity about which exception handler should be invoked unless exceptions are raised by "ropcode 2" (see below); in general the first enabled exception in the list DVWIOUZX takes precedence.

What about the six high-order bytes of the status register rA? At present, only two of those 48 bits are defined; the others must be zero for compatibility with possible future extensions. The two bits corresponding to 2^17 and 2^16 in rA specify a rounding mode, as follows: 00 means round to nearest (the default); 01 means round off (toward zero); 10 means round up (toward positive infinity); and 11 means round down (toward negative infinity).
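The layout of rA described above can be sketched in a few lines of Python. This is only an illustrative decoder, not part of MMIXware; the function name `decode_rA` and the returned tuple shape are assumptions made for the example.

```python
EVENT_NAMES = "DVWIOUZX"  # left to right within each byte, so D is bit 0x80 and X is bit 0x01

ROUNDING_MODES = {
    0b00: "round to nearest",
    0b01: "round off (toward zero)",
    0b10: "round up (toward positive infinity)",
    0b11: "round down (toward negative infinity)",
}

def decode_rA(rA):
    """Split an rA value into (event names, enable names, rounding mode)."""
    if rA >> 18:
        # Only the bits for 2^17 and 2^16 are defined among the six high-order bytes.
        raise ValueError("undefined bits of rA must be zero")
    event_byte = rA & 0xFF           # least significant byte
    enable_byte = (rA >> 8) & 0xFF   # next least significant byte
    events = {name for i, name in enumerate(EVENT_NAMES) if event_byte & (0x80 >> i)}
    enables = {name for i, name in enumerate(EVENT_NAMES) if enable_byte & (0x80 >> i)}
    return events, enables, ROUNDING_MODES[(rA >> 16) & 0b11]
```

For instance, the value #20401 has the X event bit set, the U enable bit set, and rounding mode 10.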

33. The execution of MMIX programs can be interrupted in several ways. We have just seen that arithmetic exceptions will cause interrupts if they are enabled; so will illegal or privileged instructions, or instructions that are emulated in software instead of provided by the hardware. Input/output operations or external timers are another common source of interrupts; the operating system knows how to deal with all gadgets that might be hooked up to an MMIX processor chip. Interrupts occur also when memory accesses fail, for example if memory is nonexistent or protected. Power failures that force the machine to use its backup battery power in order to keep running in an emergency, or hardware failures like parity errors, all must be handled as gracefully as possible.

Users can also force interrupts to happen by giving explicit TRAP or TRIP instructions:

• TRAP X,Y,Z `trap'; TRIP X,Y,Z `trip'.
Both of these instructions interrupt processing and transfer control to a handler. The difference between them is that TRAP is handled by the operating system but TRIP is handled by the user. More precisely, the X, Y, and Z fields of TRAP have special significance predefined by the operating system kernel. For example, a system call (say an I/O command, or a command to allocate more memory) might be invoked by certain settings of X, Y, and Z. The X, Y, and Z fields of TRIP, on the other hand, are definable by users for their own applications, and users also define their own handlers. "Trip handler" programs invoked by TRIP are interruptible, but interrupts are normally inhibited while a TRAP is being serviced. Specific details about the precise actions of TRIP and TRAP appear below, together with the description of another command called RESUME that returns control from a handler to the interrupted program.

Only two variants of TRAP are predefined by the MMIX architecture: If XYZ = 0 in a TRAP command, a user process should terminate. If XYZ = 1, the operating system should provide default action for cases in which the user has not provided any handler for a particular kind of interrupt (see below).

A few additional variants of TRAP are predefined in the rudimentary operating system used with MMIX simulators. These variants, which allow simple input/output operations to be done, all have X = 0, and the Y field is a small positive constant. For example, Y = 1 invokes the Fopen routine, which opens a file. (See the program MMIX-SIM for full details.)


34. Non-catastrophic interrupts in MMIX are always precise, in the sense that all legal instructions before a certain point have effectively been executed, and no instructions after that point have yet been executed. The current instruction, which may or may not have been completed at the time of interrupt and which may or may not need to be resumed after the interrupt has been serviced, is put into the special execution register rX, and its operands (if any) are placed in special registers rY and rZ. The address of the following instruction is placed in the special where-interrupted register rW. The instruction in rX may not be the same as the instruction in location rW − 4; for example, it may be an instruction that branched or jumped to rW. It might also be an instruction inserted internally by the MMIX processor. (For example, the computer silently inserts an internal instruction that increases L before an instruction like ADD $9,$1,$0 if L is currently less than 10. If an interrupt occurs between the inserted instruction and the ADD, the instruction in rX will say ADD, because an internal instruction retains the identity of the actual command that spawned it; but rW will point to the real ADD command.)

When an instruction has the normal meaning "set $X to the result of $Y op $Z" or "set $X to the result of $Y op Z," special registers rY and rZ will relate in the obvious way to the Y and Z operands of the instruction; but this is not always the case. For example, after an interrupted store instruction, the first operand rY will hold the virtual memory address ($Y plus either $Z or Z), and the second operand rZ will be the octabyte to be stored in memory (including bytes that have not changed, in cases like STB). In other cases the actual contents of rY and rZ are defined by each implementation of MMIX, and programmers should not rely on their significance.

Some instructions take an unpredictable and possibly long amount of time, so it may be necessary to interrupt them in progress. For example, the FREM instruction (floating point remainder) is extremely difficult to compute rapidly if its first operand has an exponent of 2046 and its second operand has an exponent of 1. In such cases the rY and rZ registers saved during an interrupt show the current state of the computation, not necessarily the original values of the operands. The value of rY rem rZ will still be the desired remainder, but rY may well have been reduced to a number that has an exponent closer to the exponent of rZ. After the interrupt has been processed, the remainder computation will continue where it left off. (Alternatively, an operation like FREM or even FADD might be implemented in software instead of hardware, as we will see later.)

Another example arises with an instruction like PREST (prestore), which can specify prestoring up to 256 bytes. An implementation of MMIX might choose to prestore only 32 or 64 bytes at a time, depending on the cache block size; then it can change the contents of rX to reflect the unfinished part of a partially completed PREST command.

Commands that decrease G, pop the stack, save the current context, or unsave an old context also are interruptible. Register rX is used to communicate information about partial completion in such a way that the interruption will be essentially "invisible" after a program is resumed.


35. Three kinds of interruption are possible: trips, forced traps, and dynamic traps. We will discuss each of these in turn.

A TRIP instruction puts itself into the right half of the execution register rX, and sets the 32 bits of the left half to #80000000. (Therefore rX is negative; this fact will tell the RESUME command not to TRIP again.) The special registers rY and rZ are set to the contents of the registers specified by the Y and Z fields of the TRIP command, namely $Y and $Z. Then $255 is placed into the special bootstrap register rB, and $255 is set to zero. MMIX now takes its next instruction from virtual memory address 0.

Arithmetic exceptions interrupt the computation in essentially the same way as TRIP, if they are enabled. The only difference is that their handlers begin at the respective addresses 16, 32, 48, 64, 80, 96, 112, and 128, for exception bits D, V, W, I, O, U, Z, and X of rA; registers rY and rZ are set to the operands of the interrupted instruction as explained earlier.

A 16-byte block of memory is more than enough for a sequence of commands like

    PUSHJ 255,Handler; GET $255,rB; RESUME

which will invoke a user's handler. And if the user does not choose to provide a custom-designed handler, the operating system provides a default handler via the instructions

    TRAP 1; GET $255,rB; RESUME.

A trip handler might simply record the fact that tripping occurred. But the handler for an arithmetic interrupt might want to change the default result of a computation. In such cases, the handler should place the desired substitute result into rZ, and it should change the most significant byte of rX from #80 to #02. This will have the desired effect, because of the rules of RESUME explained below, unless the exception occurred on a command like STB or STSF. (A bit more work is needed to alter the effect of a command that stores into memory.)

Instructions in negative virtual locations do not invoke trip handlers, either for TRIP or for arithmetic exceptions. Such instructions are reserved for the operating system, as we will see.
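The choice of handler address for an arithmetic exception can be illustrated with a short Python sketch. It assumes that the raised and enabled exceptions are given as 8-bit masks in DVWIOUZX order (D = #80, X = #01); the function name `trip_target` and its return convention are hypothetical, chosen only for this example. The sketch applies the stated precedence rule (first enabled exception in the list DVWIOUZX) and reports which event bits would merely be recorded because their enable bits are 0.

```python
EVENT_NAMES = "DVWIOUZX"  # D is bit 0x80, X is bit 0x01

def trip_target(raised, enabled):
    """Return (handler_address, recorded_event_bits).

    handler_address is 16, 32, ..., 128 for D, V, ..., X,
    or None when no raised exception is enabled.
    """
    recorded = raised & ~enabled & 0xFF   # exceptions that are not tripped set event bits
    tripped = raised & enabled
    for i in range(8):                    # scan D first, X last
        if tripped & (0x80 >> i):
            return 16 * (i + 1), recorded
    return None, recorded
```

With overflow (O and X raised together) and both enable bits set, the sketch selects address 80, the overflow handler; with only X enabled it selects 128 and records O as an event.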

G, §29. RESUME, §38. STB, §8. STSF, §26.


36. A TRAP instruction interrupts the computation essentially like TRIP, but with the following modifications: (i) $255 is set to the contents of the special "trap address register" rT, not zero; (ii) the interrupt mask register rK is cleared to zero, thereby inhibiting interrupts; (iii) control jumps to virtual memory address rT, not zero; (iv) information is placed in a separate set of special registers rBB, rWW, rXX, rYY, and rZZ, instead of rB, rW, rX, rY, and rZ. (These special registers are needed because a trap might occur while processing a TRIP.)

Another kind of forced trap occurs on implementations of MMIX that emulate certain instructions in software rather than in hardware. Such instructions cause a TRAP even though their opcode is something else like FREM or FADD or DIV. The trap handler can tell what instruction to emulate by looking at the opcode, which appears in rXX. In such cases the lefthand half of rXX is set to #02000000; the handler emulating FADD, say, should compute the floating point sum of rYY and rZZ and place the result in rZZ. A subsequent RESUME 1 will then place the value of rZZ in the proper register.

Implementations of MMIX might also emulate the process of virtual-address-to-physical-address translation described below, instead of providing for page table calculations in hardware. Then if, say, a LDB instruction does not know the physical memory address corresponding to a specified virtual address, it will cause a forced trap with the left half of rXX set to #03000000 and with rYY set to the virtual address in question. The trap handler should place the physical page address into rZZ; then RESUME 1 will complete the LDB.

37. The third and final kind of interrupt is called a dynamic trap. Such interruptions occur when one or more of the 64 bits in the special interrupt request register rQ have been set to 1, and when at least one corresponding bit of the special interrupt mask register rK is also equal to 1. The bit positions of rQ and rK have the general form

    | low-priority I/O (24 bits) | program (8 bits) | high-priority I/O (24 bits) | machine (8 bits) |

where the 8-bit "program" bits are called rwxnkbsp and have the following meanings:

r bit: instruction tries to load from a page without read permission;
w bit: instruction tries to store to a page without write permission;
x bit: instruction appears in a page without execute permission;
n bit: instruction refers to a negative virtual address;
k bit: instruction is privileged, for use by the "kernel" only;
b bit: instruction breaks the rules of MMIX;
s bit: instruction violates security (see below);
p bit: instruction comes from a privileged (negative) virtual address.

Negative addresses are for the use of the operating system only; a security violation occurs if an instruction in a nonnegative address is executed without the rwxnkbsp bits of rK all set to 1. (In such cases the s bits of both rQ and rK are set to 1.)

The eight "machine" bits of rQ and rK represent the most urgent kinds of interrupts. The rightmost bit stands for power failure, the next for memory parity error, the next for nonexistent memory, the next for rebooting, etc. Interrupts that need especially quick service, like requests from a high-speed network, also are allocated bit positions near the right end. Low priority I/O devices like keyboards are assigned to bits at the left. The allocation of input/output devices to bit positions will differ from implementation to implementation, depending on what devices are available.

Once rQ ∧ rK becomes nonzero, the machine waits briefly until it can give a precise interrupt. Then it proceeds as with a forced trap, except that it uses the special "dynamic trap address register" rTT instead of rT. The trap handler that begins at location rTT can figure out the reason for interrupt by examining rQ ∧ rK. (For example, after the instructions

    GET $0,rQ; GET $1,rK; AND $0,$0,$1; SUBU $1,$0,1;
    XOR $2,$0,$1; ANDN $1,$0,$1; SADD $2,$2,0

the highest-priority offending bit will be in $1 and its position will be in $2.)

If the interrupted instruction contributed 1s to any of the rwxnkbsp bits of rQ, the corresponding bits are set to 1 also in rX. A dynamic trap handler might be able to use this information (although it should service higher-priority interrupts first if the right half of rQ ∧ rK is nonzero).

The rules of MMIX are rigged so that only the operating system can execute instructions with interrupts suppressed. Therefore the operating system can in fact use instructions that would interrupt an ordinary program. Control of register rK turns out to be the ultimate privilege, and in a sense the only important one.

An instruction that causes a dynamic trap is usually executed before the interruption occurs. However, an instruction that traps with bits x, k, or b does nothing; a load instruction that traps with r or n loads zero; a store instruction that traps with any of rwxnkbsp stores nothing.
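The seven-instruction sequence in this section (GET, GET, AND, SUBU, XOR, ANDN, SADD) can be traced step by step in Python. The sketch below mirrors each MMIX instruction with one line of 64-bit integer arithmetic; the function name `highest_priority` is an assumption for the example, and rQ ∧ rK is assumed to be nonzero, as it is whenever a dynamic trap handler runs.

```python
MASK64 = (1 << 64) - 1

def highest_priority(rQ, rK):
    """Mirror the handler's instruction sequence; return the final ($1, $2)."""
    r0 = rQ & rK                  # GET $0,rQ; GET $1,rK; AND $0,$0,$1
    r1 = (r0 - 1) & MASK64        # SUBU $1,$0,1
    r2 = r0 ^ r1                  # XOR $2,$0,$1   (rightmost 1 and everything to its right)
    r1 = r0 & ~r1 & MASK64        # ANDN $1,$0,$1  (isolates the rightmost 1 bit)
    r2 = bin(r2).count("1")       # SADD $2,$2,0   (counts the 1 bits of $2)
    return r1, r2
```

Tracing the arithmetic shows that the isolated bit is the rightmost 1 of rQ ∧ rK, consistent with the most urgent interrupts being assigned to the right end, and that the position left in $2 counts from 1 at the rightmost bit.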

GET, §43. RESUME, §38. SADD, §12.


38. After a trip handler or trap handler has done its thing, it generally invokes the following command.

• RESUME Z `resume after interrupt'; the X and Y fields must be zero.
If the Z field of this instruction is zero, MMIX will use the information found in special registers rW, rX, rY, and rZ to restart an interrupted computation. If the execution register rX is negative, it will be ignored and instructions will be executed starting at virtual address rW; otherwise the instruction in the right half of the execution register will be inserted into the program as if it had appeared in location rW − 4, subject to certain modifications that we will explain momentarily, and the next instruction will come from rW.

If the Z field of RESUME is 1 and if this instruction appears in a negative location, registers rWW, rXX, rYY, and rZZ are used instead of rW, rX, rY, and rZ. Also, just before resuming the computation, mask register rK is set to $255 and $255 is set to rBB. (Only the operating system gets to use this feature.)

An interrupt handler within the operating system might choose to allow itself to be interrupted. In such cases it should save the contents of rBB, rWW, rXX, rYY, and rZZ on some kind of stack, before making rK nonzero. Then, before resuming whatever caused the base level interrupt, it must again disable all interrupts; this can be done with TRAP, because the trap handler can tell from the virtual address in rWW that it has been invoked by the operating system. Once rK is again zero, the contents of rBB, rWW, rXX, rYY, and rZZ are restored from the stack, the outer level interrupt mask is placed in $255, and RESUME 1 finishes the job.

Values of Z greater than 1 are reserved for possible later definition. Therefore they cause an illegal instruction interrupt (that is, they set the `b' bit of rQ) in the present version of MMIX.

If the execution register rX is nonnegative, its leftmost byte controls the way its righthand half will be inserted into the program. Let's call this byte the "ropcode." A ropcode of 0 simply inserts the instruction into the execution stream; a ropcode of 1 is similar, but it substitutes rY and rZ for the two operands, assuming that this makes sense for the operation considered.

Ropcode 2 inserts a command that sets $X to rZ, where X is the second byte in the right half of rX. This ropcode is normally used with forced-trap emulations, so that the result of an emulated instruction is placed into the correct register. It also uses the third-from-left byte of rX to raise any or all of the arithmetic exceptions DVWIOUZX, at the same time as rZ is being placed in $X. Emulated instructions and explicit TRAP commands can therefore cause overflow, say, just as ordinary instructions can. (Such new exceptions may, of course, spawn a trip interrupt, if any of the corresponding bits are enabled in rA.)

Finally, ropcode 3 is the same as ropcode 0, except that it also tells MMIX to treat rZ as the page table entry for the virtual address rY. (See the discussion of virtual address translation below.) Ropcodes greater than 3 are not permitted; moreover, only RESUME 1 is allowed to use ropcode 3.

The ropcode rules in the previous paragraphs should of course be understood to involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and rZ when the ropcode is seen by RESUME 1. Thus, in particular, ropcode 3 always applies to rYY and rZZ, never to rY and rZ.

Special restrictions must hold if resumption is to work properly: Ropcodes 0 and 3 must not insert a RESUME instruction; ropcode 1 must insert a "normal" instruction, namely one whose opcode begins with one of the hexadecimal digits #0, #1, #2, #3, #6, #7, #C, #D, or #E. (See the opcode chart below.) Some implementations may also allow ropcode 1 with SYNCD[I] and SYNCID[I], so that those instructions can conveniently be interrupted. Moreover, the destination register $X used with ropcode 1 or 2 must not be marginal. All of these restrictions hold automatically in normal use; they are relevant only if the programmer tries to do something tricky.

Notice that the slightly tricky sequence

    LDA $0,Loc; PUT rW,$0; LDT $1,Inst; PUT rX,$1; RESUME

will execute an almost arbitrary instruction Inst as if it had been in location Loc-4, and then will jump to location Loc (assuming that Inst doesn't branch elsewhere).
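The ropcode rules of this section can be summarized by a small Python sketch that classifies a 64-bit execution register value. The function name `decode_rX`, its `resume_z` parameter, and the descriptive strings it returns are all assumptions for illustration; for RESUME 1 the value passed in would be rXX rather than rX. The sketch does not check the further restrictions on which instructions may be inserted.

```python
def decode_rX(rX, resume_z=0):
    """Describe how RESUME would treat an execution register value (0 <= rX < 2**64)."""
    if rX >> 63:
        return ("continue at rW",)            # negative rX is ignored
    ropcode = rX >> 56                         # leftmost byte
    instruction = rX & 0xFFFFFFFF              # right half
    if ropcode == 0:
        return ("insert", instruction)
    if ropcode == 1:
        return ("insert with rY and rZ as operands", instruction)
    if ropcode == 2:
        x_field = (instruction >> 16) & 0xFF   # second byte of the right half
        exceptions = (rX >> 40) & 0xFF         # third-from-left byte: DVWIOUZX bits to raise
        return ("set $X to rZ", x_field, exceptions)
    if ropcode == 3 and resume_z == 1:
        return ("insert; rZ is the page table entry for rY", instruction)
    raise ValueError("ropcode not permitted")
```

For example, a handler that changes the leading byte of rX from #80 to #02 after a trip moves the value from the first case to the ropcode 2 case, so that rZ is delivered to $X.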

opcode chart, §51. SYNCD, §30. SYNCID, §30. virtual addresses, §47.


39. Special registers. Quite a few special registers have been mentioned so far, and MMIX actually has even more. It is time now to enumerate them all, together with their internal code numbers:

rA, arithmetic status register [21];
rB, bootstrap register (trip) [0];
rC, cycle counter [8];
rD, dividend register [1];
rE, epsilon register [2];
rF, failure location register [22];
rG, global threshold register [19];
rH, himult register [3];
rI, interval counter [12];
rJ, return-jump register [4];
rK, interrupt mask register [15];
rL, local threshold register [20];
rM, multiplex mask register [5];
rN, serial number [9];
rO, register stack offset [10];
rP, prediction register [23];
rQ, interrupt request register [16];
rR, remainder register [6];
rS, register stack pointer [11];
rT, trap address register [13];
rU, usage counter [17];
rV, virtual translation register [18];
rW, where-interrupted register (trip) [24];
rX, execution register (trip) [25];
rY, Y operand (trip) [26];
rZ, Z operand (trip) [27];
rBB, bootstrap register (trap) [7];
rTT, dynamic trap address register [14];
rWW, where-interrupted register (trap) [28];
rXX, execution register (trap) [29];
rYY, Y operand (trap) [30];
rZZ, Z operand (trap) [31].

In this list rG and rL are what we have been calling simply G and L; rC, rF, rI, rN, rO, rS, rU, and rV have not been mentioned before.
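The list above maps naturally onto a lookup table. The following Python sketch copies the code numbers from the list into a dictionary and builds the inverse mapping; the names `SPECIAL_CODES` and `SPECIAL_NAMES` are hypothetical, introduced only for this example.

```python
# Code numbers copied from the list of special registers above.
SPECIAL_CODES = {
    "rA": 21, "rB": 0,  "rC": 8,  "rD": 1,  "rE": 2,  "rF": 22, "rG": 19,
    "rH": 3,  "rI": 12, "rJ": 4,  "rK": 15, "rL": 20, "rM": 5,  "rN": 9,
    "rO": 10, "rP": 23, "rQ": 16, "rR": 6,  "rS": 11, "rT": 13, "rU": 17,
    "rV": 18, "rW": 24, "rX": 25, "rY": 26, "rZ": 27,
    "rBB": 7, "rTT": 14, "rWW": 28, "rXX": 29, "rYY": 30, "rZZ": 31,
}

# Inverse table: code number -> register name.
SPECIAL_NAMES = {code: name for name, code in SPECIAL_CODES.items()}
```

A quick consistency check is that the 32 names use each code from 0 through 31 exactly once, so the inverse table has no collisions.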

40. The cycle counter rC advances by 1 on every "clock pulse" of the MMIX processor. Thus if MMIX is running at 500 MHz, the cycle counter increases every 2 nanoseconds. There is no need to worry about rC overflowing; even if it were to increase once every nanosecond, it wouldn't reach 2^64 until more than 584.55 years have gone by.

The interval counter rI is similar, but it decreases by 1 on each cycle, and causes an interval interrupt when it reaches zero. Such interrupts can be extremely useful for "continuous profiling" as a means of studying the empirical running time of programs; see Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl, ACM Transactions on Computer Systems 15 (1997), 357–390. The interval interrupt is achieved by setting the leftmost bit of the "machine" byte of rQ equal to 1; this is the eighth-least-significant bit.

The usage counter rU consists of three fields (u_p, u_m, u_c), called the usage pattern u_p, the usage mask u_m, and the usage count u_c. The most significant byte of rU is the usage pattern; the next most significant byte is the usage mask; and the remaining 48 bits are the usage count. Whenever an instruction whose OP ∧ u_m = u_p has been executed, the value of u_c increases by 1 (mod 2^48). Thus, for example, the OP-code chart below implies that all instructions are counted if u_p = u_m = 0; all loads and stores are counted together with GO and PUSHGO if u_p = (10000000)_2 and u_m = (11000000)_2; all floating point instructions are counted together with fixed point multiplications and divisions if u_p = 0 and u_m = (11100000)_2; fixed point multiplications and divisions alone are counted if u_p = (00011000)_2 and u_m = (11111000)_2; completed subroutine calls are counted if u_p = POP and u_m = (11111111)_2. Instructions in negative locations, which belong to the operating system, are exceptional: They are included in the usage count only if the leading bit of u_c is 1.

Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with only two levels of logic, using an old trick called "carry-save addition" [see, for example, G. Metze and J. E. Robertson, Proc. International Conf. Information Processing (Paris: 1959), 389–396]. One nice embodiment of this idea is to represent a binary number x in a redundant form as the difference x′ − x″ of two binary numbers. Any two such numbers can be added without carry propagation as follows: Let

\[ f(x,y,z) = (x \land \bar{y}) \lor (x \land z) \lor (\bar{y} \land z), \qquad g(x,y,z) = x \oplus y \oplus z. \]

Then it is easy to check that x − y + z = 2f(x, y, z) − g(x, y, z); we need only verify this in the eight cases when x, y, and z are 0 or 1. Thus we can subtract 1 from a counter x′ − x″ by setting

\[ (x', x'') \leftarrow \bigl(f(x', x'', -1) \ll 1,\; g(x', x'', -1)\bigr); \]

we can add 1 by setting (x′, x″) ← (g(x″, x′, −1), f(x″, x′, −1) ≪ 1). The result is zero if and only if x′ = x″. We need not actually compute the difference x′ − x″ until we need to examine the register. The computation of f(x, y, z) and g(x, y, z) is particularly simple in the special cases z = 0 and z = −1. A similar trick works for rU, but extra care is needed in that case because several instructions might finish at the same time. (Thanks to Frank Yellin for his improvements to this paragraph.)
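The redundant-counter trick can be checked with a brief Python sketch that works on 64-bit words. The names `f`, `g`, `increment`, `decrement`, and `value` are chosen for this illustration; −1 is represented by the all-ones mask, and the true counter value is computed only by `value`, which is where the single carry-propagating subtraction happens.

```python
MASK64 = (1 << 64) - 1   # 64 one bits, playing the role of -1

def f(x, y, z):
    """Bitwise (x and not y) or (x and z) or (not y and z)."""
    return ((x & ~y) | (x & z) | (~y & z)) & MASK64

def g(x, y, z):
    """Bitwise exclusive-or of the three words."""
    return (x ^ y ^ z) & MASK64

def decrement(xp, xpp):
    """Subtract 1 from the counter xp - xpp without carry propagation."""
    return (f(xp, xpp, MASK64) << 1) & MASK64, g(xp, xpp, MASK64)

def increment(xp, xpp):
    """Add 1 to the counter xp - xpp without carry propagation."""
    return g(xpp, xp, MASK64), (f(xpp, xp, MASK64) << 1) & MASK64

def value(xp, xpp):
    """The counter's actual value mod 2**64."""
    return (xp - xpp) & MASK64
```

Each step uses only bitwise operations and a one-position shift, which is why the hardware needs no carry chain; the zero test reduces to comparing the two words for equality.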

G, §29. L, §29. opcode chart, §51.


41. The special serial number register rN is permanently set to the time this particular instance of MMIX was created (measured as the number of seconds since 00:00:00 Greenwich mean time on 1 January 1970), in its five least significant bytes. The three most significant bytes are permanently set to the version number of the MMIX architecture that is being implemented together with two additional bytes that modify the version number. This quantity serves as an essentially unique identification number for each copy of MMIX.

Version 1.0.0 of the architecture is described in the present document. Version 1.0.1 is similar, but simplified to avoid the complications of pipelines and operating systems. Other versions may become necessary in the future.

42. The register stack offset rO and register stack pointer rS are especially interesting, because they are used to implement MMIX's register stack S[0], S[1], S[2], ….

The operating system initializes a register stack by assigning a large area of virtual memory to each running process, beginning at an address like #6000000000000000. If this starting address is σ, stack entry S[k] will go into the octabyte M8[σ + 8k]. Stack underflow will be detected because the process does not have permission to read from M[σ − 1]. Stack overflow will be detected because something will give out (either the user's budget or the user's patience or the user's swap space) long before 2^61 bytes of virtual memory are filled by a register stack.

The MMIX hardware maintains the register stack by having two banks of 64-bit general-purpose registers, one for globals and one for locals. The global registers g[32], g[33], …, g[255] are used for register numbers that are ≥ G in MMIX commands; recall that G is always 32 or more. The local registers come from another array that contains 2^n registers for some n where 8 ≤ n ≤ 10; for simplicity of exposition we will assume that there are exactly 512 local registers, but there may be only 256 or there may be 1024.

The local register slots l[0], l[1], …, l[511] act as a cyclic buffer with addresses that wrap around mod 512, so that l[512] = l[0], l[513] = l[1], etc. This buffer is divided into three parts by three pointers, which we will call α, β, and γ.

Registers l[α], l[α + 1], …, l[β − 1] are what program instructions currently call $0, $1, …, $(L − 1); registers l[β], l[β + 1], …, l[γ − 1] are currently unused; and registers l[γ], l[γ + 1], …, l[α − 1] contain items of the register stack that have been pushed down but not yet stored in memory. Special register rS holds the virtual memory address where l[γ] will be stored, if necessary. Special register rO holds the address where l[α] will be stored; this always equals 8τ plus the address of S[0]. We can deduce the values of α, β, and γ from the contents of rL, rO, and rS, because

\[ \alpha = (\mathrm{rO}/8) \bmod 512, \qquad \beta = (\alpha + \mathrm{rL}) \bmod 512, \qquad \gamma = (\mathrm{rS}/8) \bmod 512. \]

To maintain this situation we need to make sure that the pointers α, β, and γ never move past each other. A PUSHJ or PUSHGO operation simply advances α toward β, so it is very simple. The first part of a POP operation, which moves β toward α, is also very simple. But the next part of a POP requires α to move downward, and memory accesses might be required. MMIX will decrease rS by 8 (thereby decreasing γ by 1) and set l[γ] ← M8[rS], one or more times if necessary, to keep α from decreasing past γ. Similarly, the operation of increasing L may cause MMIX to set M8[rS] ← l[γ] and increase rS by 8 (thereby increasing γ by 1) one or more times, to keep β from increasing past γ. If many registers need to be loaded or stored at once, these operations are interruptible.

[A somewhat similar scheme was introduced by David R. Ditzel and H. R. McLellan in SIGPLAN Notices 17, 4 (April 1982), 48–56, and incorporated in the so-called CRISP architecture developed at AT&T Bell Labs. An even more similar scheme was adopted in the late 1980s by Advanced Micro Devices, in the processors of their Am29000 series, a family of computers whose instructions have essentially the format `OP X Y Z' used by MMIX.]

Limited versions of MMIX, having fewer registers, can also be envisioned. For example, we might have only 32 local registers l[0], l[1], …, l[31] and only 32 global registers g[224], g[225], …, g[255]. Such a machine could run any MMIX program that maintains the inequalities L < 32 and G ≥ 224.
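The relation between rL, rO, rS and the three ring pointers of this section can be tried out with a few lines of Python. This sketch assumes the 512-register ring used in the exposition; the function names `ring_pointers` and `ring_usage` are hypothetical. The second function additionally assumes that the pointers have not moved past each other, so that the number of pushed-down registers still held in the ring is (rO − rS)/8.

```python
RING_SIZE = 512   # assumed number of local registers, as in the exposition

def ring_pointers(rL, rO, rS):
    """Return (alpha, beta, gamma) computed from the three special registers."""
    alpha = (rO // 8) % RING_SIZE
    beta = (alpha + rL) % RING_SIZE
    gamma = (rS // 8) % RING_SIZE
    return alpha, beta, gamma

def ring_usage(rL, rO, rS):
    """Return (current locals, unused slots, pushed-down slots not yet in memory)."""
    pushed = (rO - rS) // 8
    return rL, RING_SIZE - rL - pushed, pushed
```

For example, with a stack that starts at #6000000000000000, rO 520 octabytes above the start, rS 515 octabytes above it, and L = 3, the pointers are α = 8, β = 11, γ = 3, and slots l[3] through l[7] hold the five pushed-down items that have not yet been written to memory.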

G, §29. L, §29.


43. Access to MMIX's special registers is obtained via the GET and PUT commands.

• GET $X,Z `get from special register'; the Y field must be zero.
Register X is set to the contents of the special register identified by its code number Z, using the code numbers listed earlier. An illegal instruction interrupt occurs if Z ≥ 32.

Every special register is readable; MMIX does not keep secrets from an inquisitive user. But of course only the operating system is allowed to change registers like rK and rQ (the interrupt mask and request registers). And not even the operating system is allowed to change rC (the cycle counter) or rN (the serial number) or the stack pointers rO and rS.

• PUT X,$Z|Z `put into special register'; the Y field must be zero.
The special register identified by X is set to the contents of register Z or to the unsigned byte Z itself, if permissible. Some changes are, however, impermissible: Bits of rA that are always zero must remain zero; the leading seven bytes of rG and rL must remain zero, and rL must not exceed rG; special registers 8–11 (namely rC, rN, rO, and rS) must not change; special registers 12–18 (namely rI, rK, rQ, rT, rU, rV, and rTT) can be changed only if the privilege bit of rK is zero; and certain bits of rQ (depending on available hardware) might not allow software to change them from 0 to 1. Moreover, any bits of rQ that have changed from 0 to 1 since the most recent GET x,rQ will remain 1 after PUT rQ,z. The PUT command will not increase rL; it sets rL to the minimum of the current value and the new value. (A program should say SETL $99,0 instead of PUT rL,100 when rL is known to be less than 100.)

Impermissible PUT commands cause an illegal instruction interrupt, or (in the case of rI, rK, rQ, rT, rU, rV, and rTT) a privileged operation interrupt.
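Two of the PUT rules above depend only on the register's code number, so they are easy to sketch in Python. The function names `classify_put` and `put_rL`, the parameter `rK_privilege_bit`, and the result strings are assumptions for this illustration; the per-register value checks (for rA, rG, rL, and rQ) are deliberately left out, and code numbers outside 0 to 31 are rejected only because the sketch does not model them.

```python
def classify_put(code, rK_privilege_bit):
    """Classify PUT to special register `code` by the code-number rules only."""
    if not 0 <= code <= 31:
        raise ValueError("this sketch covers code numbers 0 through 31 only")
    if 8 <= code <= 11:                        # rC, rN, rO, rS must not change
        return "illegal instruction interrupt"
    if 12 <= code <= 18 and rK_privilege_bit:  # rI, rT, rTT, rK, rQ, rU, rV
        return "privileged operation interrupt"
    return "permitted, subject to value checks"

def put_rL(current_rL, new_value):
    """PUT never increases rL: the result is the smaller of the two values."""
    return min(current_rL, new_value)
```

The second function shows why PUT rL,100 cannot be used to raise rL to 100 when rL is currently smaller.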

• SAVE $X,0 `save process state'; UNSAVE 0,$Z `restore process state'; the Y field must be 0, and so must the Z field of SAVE, the X field of UNSAVE.
The SAVE instruction stores all registers and special registers that might affect the computation of the currently running process. First the current local registers $0, $1, …, $(L − 1) are pushed down as in PUSHGO $255, and L is set to zero. Then the current global registers $G, $(G + 1), …, $255 are placed above them in the register stack; finally rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and rZ are placed at the very top, followed by registers rG and rA packed into eight bytes:

    | rG (8 bits) | 0 (24 bits) | rA (32 bits) |

The address of the topmost octabyte is then placed in register X, which must be a global register. (This instruction is interruptible. If an interrupt occurs while the registers are being saved, we will have α = β = γ in the ring of local registers; thus rO will equal rS and rL will be zero. The interrupt handler essentially has a new register stack, starting on top of the partially saved context.) Immediately after a SAVE the values of rO and rS are equal to the location of the first byte following the stack just saved. The current register stack is effectively empty at this point; thus one shouldn't do a POP until this context or some other context has been unsaved.

The UNSAVE instruction goes the other way, restoring all the registers when given an address in register Z that was returned by a previous SAVE. Immediately after an UNSAVE the values of rO and rS will be equal. Like SAVE, this instruction is interruptible.

The operating system uses SAVE and UNSAVE to switch context between different processes. It can also use UNSAVE to establish suitable initial values of rO and rS. But a user program that knows what it is doing can in fact allocate its own register stack or stacks and do its own process switching.

Caution: UNSAVE is destructive, in the sense that a program can't reliably UNSAVE twice from the same saved context. Once an UNSAVE has been done, further operations are likely to change the memory record of what was saved. Moreover, an interrupt during the middle of an UNSAVE may have already clobbered some of the data in memory before the UNSAVE has completely finished, although the data will appear properly in all registers.


44. Virtual and physical addresses. Virtual 64-bit addresses are converted to physical addresses in a manner governed by the special virtual translation register rV. Thus M[A] really refers to m[τ(A)], where m is the physical memory array and τ(A) is determined by the physical mapping function τ. The details of this conversion are rather technical and of interest mainly to the operating system, but two simple rules are important to ordinary users:

• Negative addresses are mapped directly to physical addresses, by simply suppressing the sign bit:

\[ \tau(A) = A + 2^{63} = A \wedge \mathtt{\#7fffffffffffffff}, \qquad \text{if } A < 0. \]

All accesses to negative addresses are privileged, for use by the operating system only. (Thus, for example, the trap addresses in rT and rTT should be negative, because they are addresses inside the operating system.) Moreover, all physical addresses \(\ge 2^{48}\) are intended for use by memory-mapped I/O devices; values read from or written to such locations are never placed in a cache.

• Nonnegative addresses belong to four segments, depending on whether the three leading bits are 000, 001, 010, or 011. These \(2^{61}\)-byte segments are traditionally used for a program's text, data, dynamic memory, and register stack, respectively, but such conventions are not mandatory. There are four mappings \(\tau_0\), \(\tau_1\), \(\tau_2\), and \(\tau_3\) of 61-bit addresses into 48-bit physical memory space, one for each segment:

\[ \tau(A) = \tau_{\lfloor A/2^{61} \rfloor}(A \bmod 2^{61}), \qquad \text{if } 0 \le A < 2^{63}. \]

In general, the machine is able to access smaller addresses of a segment more efficiently than larger addresses. Thus a programmer should let each segment grow upward from zero, trying to keep any of the 61-bit addresses from becoming larger than necessary, although arbitrary addresses are legal.
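The two rules above can be sketched in C. This is only an illustration: the helper names are hypothetical, and only the bit positions come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the virtual address is "negative" (sign bit set, privileged). */
static int is_negative_address(uint64_t a) { return (int)(a >> 63); }

/* Direct mapping for negative addresses: suppress the sign bit. */
static uint64_t direct_physical(uint64_t a) {
  assert(is_negative_address(a));
  return a & UINT64_C(0x7fffffffffffffff);
}

/* Segment number 0..3 of a nonnegative address (leading bits 0ii). */
static unsigned segment_of(uint64_t a) {
  assert(!is_negative_address(a));
  return (unsigned)(a >> 61);
}

/* The 61-bit address within the segment, A mod 2^61. */
static uint64_t segment_offset(uint64_t a) {
  return a & ((UINT64_C(1) << 61) - 1);
}
```

For instance, an address whose leading bits are 011 is reported as belonging to segment 3, traditionally the register stack.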

45. Now it's time for the technical details of virtual address translation. The mappings \(\tau_0\), \(\tau_1\), \(\tau_2\), and \(\tau_3\) are defined by the following rules.

(1) The first two bytes of rV are four nybbles called b1, b2, b3, b4; we also define b0 = 0. Segment i has at most \(1024^{\,b_{i+1}-b_i}\) pages. In particular, segment i must have at most one page when \(b_i = b_{i+1}\), and it must be entirely empty if \(b_i > b_{i+1}\).

(2) The next byte of rV, s, specifies the current page size, which is \(2^s\) bytes. We must have s ≥ 13 (hence at least 8K bytes per page). Values of s larger than, say, 20 or so are of use only in rather large programs that will reside in main memory for long periods of time, because memory protection and swapping are applied to entire pages. The maximum legal value of s is 48.

(3) The remaining five bytes of rV are a 27-bit root location r, a 10-bit address space number n, and a 3-bit function field f:

    rV field:       b1   b2   b3   b4    s    r    n    f
    width (bits):    4    4    4    4    8   27   10    3

Normally f = 0; if f = 1, virtual address translation will be done by software instead of hardware, and the b1, b2, b3, b4, and r fields of rV will be ignored by the hardware.


(Values of f > 1 are reserved for possible future use; if f > 1 when MMIX tries to translate an address, a memory-protection failure will occur.)

(4) Each page has an 8-byte page table entry (PTE), which looks like this:

    PTE field:       x     a      y     n    p
    width (bits):   16   48−s   s−13   10    3

Here x and y are ignored (thus they are usable for any purpose by the operating system); a is the physical address of byte 0 on the page; and n is the address space number (which must match the number in rV). The final three bits are the protection bits \(p_r\,p_w\,p_x\); the user needs \(p_r = 1\) to load from this page, \(p_w = 1\) to store on this page, and \(p_x = 1\) to execute instructions on this page. If n fails to match the number in rV, or if the appropriate protection bit is zero, a memory-protection fault occurs.

Page table entries should be writable only by the operating system. The 16 ignored bits of x imply that physical memory size is limited to \(2^{48}\) bytes (namely 256 large terabytes); that should be enough capacity for a while, if not for the entire new millennium.

(5) A given 61-bit address A belongs to page \(\lfloor A/2^s \rfloor\) of its segment, and

\[ \tau_i(A) = 2^s a + (A \bmod 2^s) \]

if a is the address in the PTE for page \(\lfloor A/2^s \rfloor\) of segment i.

(6) Suppose \(\lfloor A/2^s \rfloor = (a_4 a_3 a_2 a_1 a_0)_{1024}\) in the radix-1024 number system. In the common case \(a_4 = a_3 = a_2 = a_1 = 0\), the PTE is simply the octabyte \(m_8[2^{13}(r + b_i) + 8a_0]\); this rule defines the mapping for the first 1024 pages. The next million or so pages are accessed through an auxiliary page table pointer

    PTP field:       1    c    n    q
    width (bits):    1   50   10    3

in \(m_8[2^{13}(r + b_i + 1) + 8a_1]\); here the sign must be 1 and the n-field must match rV, but the q bits are ignored. The desired PTE for page \((a_1 a_0)_{1024}\) is then in \(m_8[2^{13}c + 8a_0]\). The next billion or so pages, namely the pages \((a_2 a_1 a_0)_{1024}\) with \(a_2 \ne 0\), are accessed similarly, through an auxiliary PTP at level two; and so on.
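Rules (3), (5), and (6) can be sketched in C as follows. This is a minimal illustration rather than a reference implementation: the struct and function names are hypothetical, and only the first-level case of rule (6) is covered.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of rV as laid out in rule (3); the struct is hypothetical. */
typedef struct {
  unsigned b[5];  /* b[0] = 0 by definition, then b1..b4 */
  unsigned s;     /* page size is 2^s bytes */
  uint32_t r;     /* 27-bit root location */
  unsigned n;     /* 10-bit address space number */
  unsigned f;     /* 3-bit function field */
} rv_fields;

static rv_fields unpack_rv(uint64_t rv) {
  rv_fields v;
  v.b[0] = 0;
  v.b[1] = (unsigned)(rv >> 60) & 0xf;
  v.b[2] = (unsigned)(rv >> 56) & 0xf;
  v.b[3] = (unsigned)(rv >> 52) & 0xf;
  v.b[4] = (unsigned)(rv >> 48) & 0xf;
  v.s = (unsigned)(rv >> 40) & 0xff;
  v.r = (uint32_t)(rv >> 13) & 0x7ffffff;
  v.n = (unsigned)(rv >> 3) & 0x3ff;
  v.f = (unsigned)rv & 7;
  return v;
}

/* Rule (6), common case: physical location 2^13 (r + b_i) + 8 a0 of the
   PTE for one of the first 1024 pages of segment i. */
static uint64_t first_level_pte_location(const rv_fields *v, unsigned i,
                                         unsigned a0) {
  assert(i < 4 && a0 < 1024);
  return ((uint64_t)(v->r + v->b[i]) << 13) + 8u * a0;
}

/* Rule (5): physical address 2^s a + (A mod 2^s), where a is the
   (48-s)-bit address field of the PTE. */
static uint64_t apply_pte(uint64_t pte, uint64_t addr61, unsigned s) {
  assert(s >= 13 && s <= 48);
  uint64_t low = (UINT64_C(1) << s) - 1;
  uint64_t a = (pte >> s) & ((UINT64_C(1) << (48 - s)) - 1);
  return (a << s) | (addr61 & low);
}
```

The sketch does not check the n field or the protection bits; a real translator would have to do both before using the result.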


Notice that if b3 = b4, there is just one page in segment 3, and its PTE appears all alone in physical location \(2^{13}(r + b_3)\). Otherwise the PTEs appear in 1024-octabyte blocks. We usually have 0 < b1 < b2 < b3 < b4, but the null case b1 = b2 = b3 = b4 = 0 is worthy of mention: In this special case there is only one page, and the segment bits of a virtual address are ignored; the other 61 − s bits of each virtual address must be zero.

If s = 13, b1 = 3, b2 = 2, b3 = 1, and b4 = 0, there are at most \(2^{30}\) pages of 8K bytes each, all belonging to segment 0. This is essentially the virtual memory setup in the Alpha 21064 computers with DIGITAL UNIX(TM).

I know these rules look extremely complicated, and I sincerely wish I could have found an alternative that would be both simple and efficient in practice. I tried various schemes based on hashing, but came to the conclusion that "trie" methods such as those described here are better for this application. Indeed, the page tables in most contemporary computers are based on very similar ideas, but with significantly smaller virtual addresses and without the shortcut for small page numbers. I tried also to find formats for rV and the page tables that would match byte boundaries in a more friendly way, but the corresponding page sizes did not work well. Fortunately these grungy details are almost always completely hidden from ordinary users.


46. Of course MMIX can't afford to perform a lengthy calculation of physical addresses every time it accesses memory. The machine therefore maintains a translation cache (TC), which contains the translations of recently accessed pages. (In fact, there usually are two such caches, one for instructions and one for data.) A TC holds a set of 64-bit translation keys

    key field:       0    i     v      0     n    0
    width (bits):    1    2   61−s   s−13   10    3

associated with 38-bit translations

    translation field:    a      0     p
    width (bits):       48−s   s−13    3

representing the relevant parts of the PTE for page v of segment i. Different processes typically have different values of n, and possibly also different values of s. The operating system needs a way to keep such caches up to date when pages are being allocated, moved, swapped, or recycled. The operating system also likes to know which pages have been recently used. The LDVTS instructions facilitate such operations:

• LDVTS $X,$Y,$Z|Z `load virtual translation status'.
The sum $Y+$Z or $Y+Z should have the form of a translation cache key as above, except that the rightmost three bits need not be zero. If this key is present in a TC, the rightmost three bits replace the current protection code p; however, if p is thereby set to zero, the key is removed from the TC. Register X is set to 0 if the key was not present in any translation cache, or to 1 if the key was present in the TC for instructions, or to 2 if the key was present in the TC for data, or to 3 if the key was present in both. This instruction is for the operating system only.
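The key layout shown above can be sketched in C. The function name is hypothetical; the sketch assumes the key is formed from a nonnegative virtual address by keeping the segment and page bits, zeroing the low s bits, and inserting n just above the three low-order bits.

```c
#include <assert.h>
#include <stdint.h>

/* Build a translation-cache key (0, i, v, 0, n, 0) from a virtual address. */
static uint64_t tc_key(uint64_t virt, unsigned s, unsigned n) {
  assert(!(virt >> 63) && s >= 13 && s <= 48 && n < 1024);
  uint64_t page_bits = virt & ~((UINT64_C(1) << s) - 1); /* i and v fields */
  return page_bits | ((uint64_t)n << 3);                 /* n field */
}
```

An operating system that wanted to change the protection code of a cached page with LDVTS would OR the new three-bit code into the low bits of such a key.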


47. We mentioned earlier that cheap versions of MMIX might calculate the physical addresses with software instead of hardware, using forced traps when the operating system needs to do page table calculations. Here is some code that could be used for such purposes; it defines the translation process precisely, given a nonnegative virtual address in register rYY. First we must unpack the fields of rV and compute the relevant base addresses for PTEs and PTPs:

        GET   virt,rYY
        GET   $7,rV           % $7=(virtual translation register)
        AND   $1,$7,#7        % $1=rightmost three bits
        BNZ   $1,Fail         % those bits should be zero
        SRU   $1,virt,61      % $1=i (segment number of virtual address)
        SLU   $1,$1,2
        NEG   $1,52,$1        % $1=52-4i
        SRU   $1,$7,$1
        SLU   $2,$1,4
        SETL  $0,#f000
        AND   $1,$1,$0        % $1=b[i]<<12
        AND   $2,$2,$0        % $2=b[i+1]<<12
        SLU   $3,$7,24
        SRU   $3,$3,37
        SLU   $3,$3,13        % $3=(r field of rV)
        ORH   $3,#8000        % make $3 a physical address
        2ADDU base,$1,$3      % base=address of first page table
        2ADDU limit,$2,$3     % limit=address after last page table
        SRU   s,$7,40
        AND   s,s,#ff         % s=(s field of rV)
        CMP   $0,s,13
        BN    $0,Fail         % s must be 13 or more
        CMP   $0,s,49
        BNN   $0,Fail         % s must be 48 or less
        SETH  mask,#8000
        ORL   mask,#1ff8      % mask=(sign bit and n field)
        ORH   $7,#8000        % set sign bit for PTP validation below
        ANDNH virt,#e000      % zero out the segment number
        SRU   $0,virt,s       % $0=a4a3a2a1a0 (page number of virt)
        ZSZ   $1,$0,1         % $1=[page number is zero]
        ADD   limit,limit,$1  % increase limit if page number is zero

The next part of the routine finds the "digits" of the page number \((a_4 a_3 a_2 a_1 a_0)_{1024}\), from right to left:

        OR    $5,base,0;    SRU  $1,$0,10;  PBZ $1,1F
        AND   $0,$0,#3ff;   INCL base,#2000
        OR    $5,base,0;    SRU  $2,$1,10;  PBZ $2,2F
        AND   $1,$1,#3ff;   INCL base,#2000
        OR    $5,base,0;    SRU  $3,$2,10;  PBZ $3,3F
        AND   $2,$2,#3ff;   INCL base,#2000
        OR    $5,base,0;    SRU  $4,$3,10;  PBZ $4,4F
        AND   $3,$3,#3ff;   INCL base,#2000


Then the process cascades back through PTPs.

        OR    $5,base,0
        8ADDU $6,$4,base;   LDO  base,$6,0
        XOR   $6,base,$7;   AND  $6,$6,mask;  BNZ $6,Fail
        ANDNL base,#1fff
    4H  8ADDU $6,$3,base;   LDO  base,$6,0
        XOR   $6,base,$7;   AND  $6,$6,mask;  BNZ $6,Fail
        ANDNL base,#1fff
    3H  8ADDU $6,$2,base;   LDO  base,$6,0
        XOR   $6,base,$7;   AND  $6,$6,mask;  BNZ $6,Fail
        ANDNL base,#1fff
    2H  8ADDU $6,$1,base;   LDO  base,$6,0
        XOR   $6,base,$7;   AND  $6,$6,mask;  BNZ $6,Fail

Finally we obtain the PTE and communicate it to the machine. If errors have been detected, we set the translation to zero; actually any translation with permission bits zero would have the same effect.

           ANDNL base,#1fff      % remove low 13 bits of PTP
    1H     8ADDU $6,$0,base
           LDO   base,$6,0       % base=PTE
           XOR   $6,base,$7      % compare with rV, leaving the PTE in base
           ANDN  $6,$6,#7
           SLU   $6,$6,51
           BNZ   $6,Fail         % branch if n doesn't match
           CMP   $6,$5,limit
           BN    $6,Ready        % did we run off the end of the page table?
    Fail   SETL  base,0          % errors lead to PTE of zero
    Ready  PUT   rZZ,base
           LDO   $255,IntMask    % load the desired setting of rK
           RESUME 1              % now the machine will digest the translation

All loads and stores in this program deal with negative virtual addresses. This effectively shuts off memory mapping and makes the page tables inaccessible to the user. The program assumes that the ropcode in rXX is 3 (which it is when a forced trap is triggered by the need for virtual translation).

The translation from virtual pages to physical pages need not actually follow the rules for PTPs and PTEs; any other mapping could be substituted by operating systems with special needs. But people usually want compatibility between different implementations whenever possible. The only parts of rV that MMIX really needs are the s field, which defines page sizes, and the n field, which keeps TC entries of one process from being confused with the TC entries of another.
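The digit-finding step of the routine above can be sketched in C. The function name is hypothetical, and the input is assumed to be the 61-bit address within a segment (segment bits already removed).

```c
#include <assert.h>
#include <stdint.h>

/* Split the page number floor(A / 2^s) into radix-1024 digits a[0..4].
   Returns the number of PTP levels that must be traversed, which is the
   index of the highest nonzero digit (0 when only a[0] can be nonzero). */
static int page_digits(uint64_t addr61, unsigned s, unsigned a[5]) {
  assert(s >= 13 && s <= 48 && addr61 < (UINT64_C(1) << 61));
  uint64_t page = addr61 >> s;
  int levels = 0;
  for (int i = 0; i < 5; i++) {
    a[i] = (unsigned)(page & 0x3ff);
    if (i > 0 && a[i] != 0) levels = i;
    page >>= 10;
  }
  return levels;
}
```

With s = 13 the page number has at most 48 bits, so five 10-bit digits always suffice.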


48. The complete instruction set. We have now described all of MMIX's special registers, except one: The special failure location register rF is set to a physical memory address when a parity error or other memory fault occurs. (The instruction leading to this error will probably be long gone before such a fault is detected; for example, the machine might be trying to write old data from a cache in order to make room for new data. Thus there is generally no connection between the current virtual program location rW and the physical location of a memory error. But knowledge of the latter location can still be useful for hardware repair, or when an operating system is booting up.)

49. One additional instruction proves to be useful.

• SWYM X,Y,Z `sympathize with your machinery'.
This command lubricates the disk drives, fans, magnetic tape drives, laser printers, scanners, and any other mechanical equipment hooked up to MMIX, if necessary. Fields X, Y, and Z are ignored.

The SWYM command was originally included in MMIX's repertoire because machines occasionally need grease to keep in shape, just as human beings occasionally need to swim or do some other kind of exercise in order to maintain good muscle tone. But in fact, SWYM has turned out to be a "no-op," an instruction that does nothing at all; the hypothetical manufacturers of our hypothetical machine have pointed out that modern computer equipment is already well oiled and sealed for permanent use. Even so, a no-op instruction provides a good way for software to send signals to the hardware, for such things as scheduling the way instructions are issued on superscalar superpipelined buzzword-compliant machines. Software programs can also use no-ops to communicate with other programs like symbolic debuggers.

When a forced trap computes the translation rZZ of a virtual address rYY, ropcode 3 of RESUME 1 will put (rYY, rZZ) into the TC for instructions if the opcode in rXX is SWYM; otherwise (rYY, rZZ) will be put into the TC for data.

50. The running time of MMIX programs depends to a great extent on changes in technology. MMIX is a mythical machine, but its mythical hardware exists in cheap, slow versions as well as in costly high-performance models. Details of running time usually depend on things like the amount of main memory available to implement virtual memory, as well as the sizes of caches and other buffers.

For practical purposes, the running time of an MMIX program can often be estimated satisfactorily by assigning a fixed cost to each operation, based on the approximate running time that would be obtained on a high-performance machine with lots of main memory; so that's what we will do. Each operation will be assumed to take an integer number of υ, where υ (pronounced "oops") is a unit that represents the clock cycle time in a pipelined implementation. The value of υ will probably decrease from year to year, but I'll keep calling it υ. The running time will also depend on the number of memory references or mems that a program uses; this is the number of load and store instructions. For example, each LDO (load octa) instruction will be assumed to cost μ + υ, where μ is the average cost of a memory reference. The total running time of a program might be reported as, say, 35μ + 1000υ, meaning 35 mems plus 1000 oops. The ratio μ/υ will probably increase with time, so mem-counting is likely to become increasingly important. [See the discussion of mems in The Stanford GraphBase (New York: ACM Press, 1994).]

Integer addition, subtraction, and comparison all take just 1υ. The same is true for SET, GET, PUT, SYNC, and SWYM instructions, as well as bitwise logical operations, shifts, relative jumps, comparisons, conditional assignments, and correctly predicted branches-not-taken or probable-branches-taken. Mispredicted branches or probable branches cost 3υ, and so do the POP and GO commands. Integer multiplication takes 10υ; integer division weighs in at 60υ. TRAP, TRIP, and RESUME cost 5υ each.

Most floating point operations have a nominal running time of 4υ, although the comparison operators FCMP, FEQL, and FUN need only 1υ. FDIV and FSQRT cost 40υ each. The actual running time of floating point computations will vary depending on the operands; for example, the machine might need one extra υ for each denormal input or output, and it might slow down greatly when trips are enabled. The FREM instruction might typically cost (3 + δ)υ, where δ is the amount by which the exponent of the first operand exceeds the exponent of the second (or zero, if this amount is negative). A floating point operation might take only 1υ if at least one of its operands is zero, infinity, or NaN. However, the fixed values stated at the beginning of this paragraph will be used for all seat-of-the-pants estimates of running time, since we want to keep the estimates as simple as possible without making them terribly out of line.

All load and store operations will be assumed to cost μ + υ, except that CSWAP costs 2μ + 2υ. (This applies to all OP codes that begin with #8, #9, #A, and #B, except #98–#9F and #B8–#BF. It's best to keep the rules simple, because μ is just an approximate device for estimating average memory cost.) SAVE and UNSAVE are charged 20μ + υ.

Of course we must remember that these numbers are very rough. We have not included the cost of fetching instructions from memory. Furthermore, an integer multiplication or division might have an effective cost of only 1υ, if the result is not needed while other numbers are being calculated. Only a detailed simulation can be expected to be truly realistic.
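The fixed-cost model described above is easy to tabulate. The following C sketch uses hypothetical operation classes and names; only the unit costs are taken from the text.

```c
#include <assert.h>

/* Running time is reported as mems*mu + oops*upsilon. */
typedef struct { long mems, oops; } cost;

/* Hypothetical grouping of operations by their nominal cost. */
typedef enum {
  OP_SIMPLE,       /* add, subtract, compare, shifts, logic, ...: 1 oop */
  OP_POP_GO_MISS,  /* POP, GO, mispredicted branches: 3 oops */
  OP_MUL,          /* integer multiplication: 10 oops */
  OP_DIV,          /* integer division: 60 oops */
  OP_TRAP,         /* TRAP, TRIP, RESUME: 5 oops */
  OP_FLOAT,        /* most floating point operations: 4 oops */
  OP_FDIV_FSQRT,   /* FDIV, FSQRT: 40 oops */
  OP_LOAD_STORE,   /* loads and stores: 1 mem + 1 oop */
  OP_CSWAP,        /* CSWAP: 2 mems + 2 oops */
  OP_SAVE_UNSAVE,  /* SAVE, UNSAVE: 20 mems + 1 oop */
  OP_CLASSES
} op_class;

static const cost unit_cost[OP_CLASSES] = {
  [OP_SIMPLE] = {0, 1},      [OP_POP_GO_MISS] = {0, 3},
  [OP_MUL] = {0, 10},        [OP_DIV] = {0, 60},
  [OP_TRAP] = {0, 5},        [OP_FLOAT] = {0, 4},
  [OP_FDIV_FSQRT] = {0, 40}, [OP_LOAD_STORE] = {1, 1},
  [OP_CSWAP] = {2, 2},       [OP_SAVE_UNSAVE] = {20, 1}
};

/* Add the cost of `count` operations of class c to a running total. */
static cost add_ops(cost total, op_class c, long count) {
  assert(c < OP_CLASSES && count >= 0);
  total.mems += unit_cost[c].mems * count;
  total.oops += unit_cost[c].oops * count;
  return total;
}
```

A program with 35 loads and 965 one-cycle operations would thus be charged 35 mems plus 1000 oops, matching the form of the example in the text.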


51. If you think that MMIX has plenty of operation codes, you are right; we have now described them all. Here is a chart that shows their numeric values. Each row has an upper line (second hexadecimal digit #0 to #7) and a lower line (#8 to #F); an entry marked [I] or [B] occupies two adjacent columns.

            #0        #1        #2        #3        #4        #5        #6        #7
    #0x     TRAP      FCMP      FUN       FEQL      FADD      FIX       FSUB      FIXU      #0x
            FLOT[I]             FLOTU[I]            SFLOT[I]            SFLOTU[I]
    #1x     FMUL      FCMPE     FUNE      FEQLE     FDIV      FSQRT     FREM      FINT      #1x
            MUL[I]              MULU[I]             DIV[I]              DIVU[I]
    #2x     ADD[I]              ADDU[I]             SUB[I]              SUBU[I]             #2x
            2ADDU[I]            4ADDU[I]            8ADDU[I]            16ADDU[I]
    #3x     CMP[I]              CMPU[I]             NEG[I]              NEGU[I]             #3x
            SL[I]               SLU[I]              SR[I]               SRU[I]
    #4x     BN[B]               BZ[B]               BP[B]               BOD[B]              #4x
            BNN[B]              BNZ[B]              BNP[B]              BEV[B]
    #5x     PBN[B]              PBZ[B]              PBP[B]              PBOD[B]             #5x
            PBNN[B]             PBNZ[B]             PBNP[B]             PBEV[B]
    #6x     CSN[I]              CSZ[I]              CSP[I]              CSOD[I]             #6x
            CSNN[I]             CSNZ[I]             CSNP[I]             CSEV[I]
    #7x     ZSN[I]              ZSZ[I]              ZSP[I]              ZSOD[I]             #7x
            ZSNN[I]             ZSNZ[I]             ZSNP[I]             ZSEV[I]
    #8x     LDB[I]              LDBU[I]             LDW[I]              LDWU[I]             #8x
            LDT[I]              LDTU[I]             LDO[I]              LDOU[I]
    #9x     LDSF[I]             LDHT[I]             CSWAP[I]            LDUNC[I]            #9x
            LDVTS[I]            PRELD[I]            PREGO[I]            GO[I]
    #Ax     STB[I]              STBU[I]             STW[I]              STWU[I]             #Ax
            STT[I]              STTU[I]             STO[I]              STOU[I]
    #Bx     STSF[I]             STHT[I]             STCO[I]             STUNC[I]            #Bx
            SYNCD[I]            PREST[I]            SYNCID[I]           PUSHGO[I]
    #Cx     OR[I]               ORN[I]              NOR[I]              XOR[I]              #Cx
            AND[I]              ANDN[I]             NAND[I]             NXOR[I]
    #Dx     BDIF[I]             WDIF[I]             TDIF[I]             ODIF[I]             #Dx
            MUX[I]              SADD[I]             MOR[I]              MXOR[I]
    #Ex     SETH      SETMH     SETML     SETL      INCH      INCMH     INCML     INCL      #Ex
            ORH       ORMH      ORML      ORL       ANDNH     ANDNMH    ANDNML    ANDNL
    #Fx     JMP[B]              PUSHJ[B]            GETA[B]             PUT[I]              #Fx
            POP       RESUME    SAVE      UNSAVE    SYNC      SWYM      GET       TRIP
            #8        #9        #A        #B        #C        #D        #E        #F


The notation `[I]' indicates an operation with an "immediate" variant in which the Z field denotes a constant instead of a register number. Similarly, `[B]' indicates an operation with a "backward" variant in which a relative address has a negative displacement. Simulators and other programs that need to present MMIX instructions in symbolic form will say that opcode #20 is ADD while opcode #21 is ADDI; they will say that #F2 is PUSHJ while #F3 is PUSHJB. But the MMIX assembler uses only the forms ADD and PUSHJ, not ADDI or PUSHJB.

To read this chart, use the hexadecimal digits at the top, bottom, left, and right. For example, operation code A9 in hexadecimal notation appears in the lower part of the #Ax row and in the #1/#9 column; it is STTI, `store tetrabyte immediate'.
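The chart-reading rule can be restated as a small C sketch. The type and function names are hypothetical; the sketch only computes where an opcode sits in the chart and whether it is the second member of a bracketed pair.

```c
#include <assert.h>

typedef struct {
  unsigned row;         /* first hexadecimal digit, 0x0..0xF */
  unsigned lower_half;  /* 1 if the opcode is on the lower line of its row */
  unsigned column;      /* 0..7, labeled #0/#8 .. #7/#F */
} chart_pos;

static chart_pos locate(unsigned opcode) {
  assert(opcode < 256);
  chart_pos p;
  p.row = opcode >> 4;
  p.lower_half = (opcode >> 3) & 1;
  p.column = opcode & 7;
  return p;
}

/* For an entry marked [I] or [B], the odd opcode of the pair is the
   immediate or backward variant (for example #21 is ADDI, #F3 is PUSHJB). */
static int is_second_of_pair(unsigned opcode) { return (int)(opcode & 1); }
```

For opcode #A9 this gives row #Ax, lower line, column #1/#9, as in the example above.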


MMIX-ARITH

1. Introduction. The subroutines below are used to simulate 64-bit MMIX arithmetic on an old-fashioned 32-bit computer, like the one the author had when he wrote MMIXAL and the first MMIX simulators in 1998 and 1999. All operations are fabricated from 32-bit arithmetic, including a full implementation of the IEEE floating point standard, assuming only that the C compiler has a 32-bit unsigned integer type.

Some day 64-bit machines will be commonplace and the awkward manipulations of the present program will look quite archaic. Interested readers who have such computers will be able to convert the code to a pure 64-bit form without difficulty, thereby obtaining much faster and simpler routines. Meanwhile, however, we can simulate the future and hope for continued progress.

This program module has a simple structure, intended to make it suitable for loading with MMIX simulators and assemblers.

    ⟨Stuff for C preprocessor 2⟩
    typedef enum { false, true } bool;
    ⟨Tetrabyte and octabyte type definitions 3⟩
    ⟨Other type definitions 36⟩
    ⟨Global variables 4⟩
    ⟨Subroutines 5⟩

2. Subroutines of this program are declared first with a prototype, as in ANSI C, then with an old-style C function definition. Here are some preprocessor commands that make this work correctly with both new-style and old-style compilers.

⟨Stuff for C preprocessor 2⟩ ≡
    #ifdef __STDC__
    #define ARGS(list) list
    #else
    #define ARGS(list) ()
    #endif

This code is used in section 1.

3. The definition of type tetra should be changed, if necessary, so that it represents an unsigned 32-bit integer.

⟨Tetrabyte and octabyte type definitions 3⟩ ≡
    typedef unsigned int tetra;
        /* for systems conforming to the LP-64 data model */
    typedef struct {
      tetra h, l;
    } octa; /* two tetrabytes make one octabyte */

This code is used in section 1.

4. #define sign_bit ((unsigned) 0x80000000)

⟨Global variables 4⟩ ≡
    octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
    octa neg_one = {-1, -1}; /* neg_one.h = neg_one.l = -1 */
    octa inf_octa = {0x7ff00000, 0}; /* floating point +infinity */
    octa standard_NaN = {0x7ff80000, 0}; /* floating point NaN(.5) */
    octa aux; /* auxiliary output of a subroutine */
    bool overflow; /* set by certain subroutines for signed arithmetic */

See also sections 9, 30, 32, 69, and 75.

This code is used in section 1.

D.E. Knuth: MMIXware, LNCS 1750, pp. 62-109, 1999. Springer-Verlag Berlin Heidelberg 1999

5. It's easy to add and subtract octabytes, if we aren't terribly worried about speed.

⟨Subroutines 5⟩ ≡
    octa oplus ARGS((octa, octa));
    octa oplus(y, z) /* compute y + z */
      octa y, z;
    { octa x;
      x.h = y.h + z.h; x.l = y.l + z.l;
      if (x.l < y.l) x.h++;
      return x;
    }

    octa ominus ARGS((octa, octa));
    octa ominus(y, z) /* compute y - z */
      octa y, z;
    { octa x;
      x.h = y.h - z.h; x.l = y.l - z.l;
      if (x.l > y.l) x.h--;
      return x;
    }

See also sections 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, and 93.

This code is used in section 1.

6. In the following subroutine, delta is a signed quantity that is assumed to fit in a signed tetrabyte.

⟨Subroutines 5⟩ +≡
    octa incr ARGS((octa, int));
    octa incr(y, delta) /* compute y + delta */
      octa y;
      int delta;
    { octa x;
      x.h = y.h; x.l = y.l + delta;
      if (delta >= 0) {
        if (x.l < y.l) x.h++;
      } else if (x.l > y.l) x.h--;
      return x;
    }


7. Left and right shifts are only a bit more difficult.

⟨Subroutines 5⟩ +≡
    octa shift_left ARGS((octa, int));
    octa shift_left(y, s) /* shift left by s bits, where 0 <= s <= 64 */
      octa y;
      int s;
    {
      while (s >= 32) y.h = y.l, y.l = 0, s -= 32;
      if (s) { register tetra yhl = y.h << s, ylh = y.l >> (32 - s);
        y.h = yhl + ylh; y.l <<= s;
      }
      return y;
    }

    octa shift_right ARGS((octa, int, int));
    octa shift_right(y, s, u) /* shift right, arithmetically if u = 0 */
      octa y;
      int s, u;
    {
      while (s >= 32) y.l = y.h, y.h = (u ? 0 : -(y.h >> 31)), s -= 32;
      if (s) { register tetra yhl = y.h << (32 - s), ylh = y.l >> s;
        y.h = (u ? 0 : (-(y.h >> 31)) << (32 - s)) + (y.h >> s); y.l = yhl + ylh;
      }
      return y;
    }


8. Multiplication. We need to multiply two unsigned 64-bit integers, obtaining an unsigned 128-bit product. It is easy to do this on a 32-bit machine by using Algorithm 4.3.1M of Seminumerical Algorithms, with \(b = 2^{16}\).

The following subroutine returns the lower half of the product, and puts the upper half into a global octabyte called aux.

⟨Subroutines 5⟩ +≡
    octa omult ARGS((octa, octa));
    octa omult(y, z)
      octa y, z;
    {
      register int i, j, k;
      tetra u[4], v[4], w[8];
      register tetra t;
      octa acc;
      ⟨Unpack the multiplier and multiplicand to u and v 10⟩;
      for (j = 0; j < 4; j++) w[j] = 0;
      for (j = 0; j < 4; j++)
        if (!v[j]) w[j + 4] = 0;
        else {
          for (i = k = 0; i < 4; i++) {
            t = u[i] * v[j] + w[i + j] + k;
            w[i + j] = t & 0xffff, k = t >> 16;
          }
          w[j + 4] = k;
        }
      ⟨Pack w into the outputs aux and acc 11⟩;
      return acc;
    }

9. ⟨Global variables 4⟩ +≡
    extern octa aux; /* secondary output of subroutines with multiple outputs */
    extern bool overflow;

10. ⟨Unpack the multiplier and multiplicand to u and v 10⟩ ≡
    u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
    v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;

This code is used in section 8.

11. ⟨Pack w into the outputs aux and acc 11⟩ ≡
    aux.h = (w[7] << 16) + w[6], aux.l = (w[5] << 16) + w[4];
    acc.h = (w[3] << 16) + w[2], acc.l = (w[1] << 16) + w[0];

This code is used in section 8.
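For readers with a 64-bit compiler, the same digit-by-digit multiplication can be sketched on `uint64_t` values. This is an illustrative restatement with hypothetical names, not a replacement for the routine above; it keeps the radix \(2^{16}\) so that every intermediate sum fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64 x 64 -> 128-bit product, computed with 16-bit digits. */
static void mul64(uint64_t y, uint64_t z, uint64_t *hi, uint64_t *lo) {
  uint32_t u[4], v[4], w[8] = {0};
  assert(hi && lo);
  for (int i = 0; i < 4; i++) {
    u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
    v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
  }
  for (int j = 0; j < 4; j++) {
    uint32_t k = 0; /* carry digit */
    for (int i = 0; i < 4; i++) {
      /* at most 0xffff*0xffff + 0xffff + 0xffff = 0xffffffff */
      uint32_t t = u[i] * v[j] + w[i + j] + k;
      w[i + j] = t & 0xffff;
      k = t >> 16;
    }
    w[j + 4] = k;
  }
  *lo = ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32) |
        ((uint64_t)w[1] << 16) | w[0];
  *hi = ((uint64_t)w[7] << 48) | ((uint64_t)w[6] << 32) |
        ((uint64_t)w[5] << 16) | w[4];
}
```

The two output pointers play the roles of aux (upper half) and the returned value (lower half) in omult.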


12. Signed multiplication has the same lower half product as unsigned multiplication. The signed upper half product is obtained with at most two further subtractions, after which the result has overflowed if and only if the upper half is unequal to 64 copies of the sign bit in the lower half.

⟨Subroutines 5⟩ +≡
    octa signed_omult ARGS((octa, octa));
    octa signed_omult(y, z)
      octa y, z;
    {
      octa acc;
      acc = omult(y, z);
      if (y.h & sign_bit) aux = ominus(aux, z);
      if (z.h & sign_bit) aux = ominus(aux, y);
      overflow = (aux.h != aux.l || (aux.h ^ (aux.h >> 1) ^ (acc.h & sign_bit)));
      return acc;
    }
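The correction and the overflow test can be sketched on 64-bit values. The names are hypothetical, and the unsigned upper half is assumed to have been computed already (for example by a routine like omult).

```c
#include <assert.h>
#include <stdint.h>

/* Turn the upper half of the unsigned product of two bit patterns into
   the upper half of their signed product. A negative operand was read
   as its value plus 2^64, which adds the other operand to the upper half. */
static uint64_t signed_upper(uint64_t hi, uint64_t y, uint64_t z) {
  if (y >> 63) hi -= z;
  if (z >> 63) hi -= y;
  return hi;
}

/* Overflow occurs iff the signed upper half is not 64 copies of the
   sign bit of the lower half. */
static int product_overflows(uint64_t signed_hi, uint64_t lo) {
  uint64_t sign_copies = (lo >> 63) ? UINT64_MAX : 0;
  return signed_hi != sign_copies;
}
```

For example, (−1) × 2 has unsigned upper half 1; subtracting z = 2 leaves all ones, which matches the sign of the lower half, so no overflow is reported.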


13. Division. Long division of an unsigned 128-bit integer by an unsigned 64-bit integer is, of course, one of the most challenging routines needed for MMIX arithmetic. The following program, based on Algorithm 4.3.1D of Seminumerical Algorithms, computes octabytes q and r such that \(2^{64}x + y = qz + r\) and \(0 \le r < z\), given octabytes x, y, and z, assuming that x < z. (If x ≥ z, it simply sets q = x and r = y.) The quotient q is returned by the subroutine; the remainder r is stored in aux.

⟨Subroutines 5⟩ +≡
    octa odiv ARGS((octa, octa, octa));
    octa odiv(x, y, z)
      octa x, y, z;
    {
      register int i, j, k, n, d;
      tetra u[8], v[4], q[4], mask, qhat, rhat, vh, vmh;
      register tetra t;
      octa acc;
      ⟨Check that x < z; otherwise give trivial answer 14⟩;
      ⟨Unpack the dividend and divisor to u and v 15⟩;
      ⟨Determine the number of significant places n in the divisor v 16⟩;
      ⟨Normalize the divisor 17⟩;
      for (j = 3; j >= 0; j--) ⟨Determine the quotient digit q[j] 20⟩;
      ⟨Unnormalize the remainder 18⟩;
      ⟨Pack q and u to acc and aux 19⟩;
      return acc;
    }

14. ⟨Check that x < z; otherwise give trivial answer 14⟩ ≡
    if (x.h > z.h || (x.h == z.h && x.l >= z.l)) {
      aux = y; return x;
    }

This code is used in section 13.

15. ⟨Unpack the dividend and divisor to u and v 15⟩ ≡
    u[7] = x.h >> 16, u[6] = x.h & 0xffff, u[5] = x.l >> 16, u[4] = x.l & 0xffff;
    u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
    v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;

This code is used in section 13.

16. ⟨Determine the number of significant places n in the divisor v 16⟩ ≡
    for (n = 4; v[n - 1] == 0; n--) ;

This code is used in section 13.


17. We shift u and v left by d places, where d is chosen to make \(2^{15} \le v_{n-1} < 2^{16}\).

⟨Normalize the divisor 17⟩ ≡
    vh = v[n - 1];
    for (d = 0; vh < 0x8000; d++, vh <<= 1) ;
    for (j = k = 0; j < n + 4; j++) {
      t = (u[j] << d) + k;
      u[j] = t & 0xffff, k = t >> 16;
    }
    for (j = k = 0; j < n; j++) {
      t = (v[j] << d) + k;
      v[j] = t & 0xffff, k = t >> 16;
    }
    vh = v[n - 1];
    vmh = (n > 1 ? v[n - 2] : 0);

This code is used in section 13.

18. ⟨Unnormalize the remainder 18⟩ ≡
    mask = (1 << d) - 1;
    for (j = 3; j >= n; j--) u[j] = 0;
    for (k = 0; j >= 0; j--) {
      t = (k << 16) + u[j];
      u[j] = t >> d, k = t & mask;
    }

This code is used in section 13.

19. ⟨Pack q and u to acc and aux 19⟩ ≡
    acc.h = (q[3] << 16) + q[2], acc.l = (q[1] << 16) + q[0];
    aux.h = (u[3] << 16) + u[2], aux.l = (u[1] << 16) + u[0];

This code is used in section 13.

20. ⟨Determine the quotient digit q[j] 20⟩ ≡
    {
      ⟨Find the trial quotient, qhat 21⟩;
      ⟨Subtract b^j qhat v from u 22⟩;
      ⟨If the result was negative, decrease qhat by 1 23⟩;
      q[j] = qhat;
    }

This code is used in section 13.

21. ⟨Find the trial quotient, qhat 21⟩ ≡
    t = (u[j + n] << 16) + u[j + n - 1];
    qhat = t / vh, rhat = t - vh * qhat;
    while (qhat == 0x10000 || qhat * vmh > (rhat << 16) + u[j + n - 2]) {
      qhat--, rhat += vh;
      if (rhat >= 0x10000) break;
    }

This code is used in section 20.


22. After this step, u[j + n] will either equal k or k − 1. The true value of u would be obtained by subtracting k from u[j + n]; but we don't have to fuss over u[j + n], because it won't be examined later.

⟨Subtract b^j qhat v from u 22⟩ ≡
    for (i = k = 0; i < n; i++) {
      t = u[i + j] + 0xffff0000 - k - qhat * v[i];
      u[i + j] = t & 0xffff, k = 0xffff - (t >> 16);
    }

This code is used in section 20.

23. The correction here occurs only rarely, but it can be necessary; for example, when dividing the number #7fff800100000000 by #800080020005.

⟨If the result was negative, decrease qhat by 1 23⟩ ≡
    if (u[j + n] != k) {
      qhat--;
      for (i = k = 0; i < n; i++) {
        t = u[i + j] + v[i] + k;
        u[i + j] = t & 0xffff, k = t >> 16;
      }
    }

This code is used in section 20.
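The contract of odiv, \(2^{64}x + y = qz + r\) with \(0 \le r < z\), can be checked against a much simpler method. The sketch below deliberately swaps in bit-at-a-time shift-and-subtract division instead of Algorithm 4.3.1D; it is slow but short enough to serve as a reference when testing. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Divide the 128-bit number 2^64*x + y by z, assuming x < z.
   Returns the quotient and stores the remainder in *rem. */
static uint64_t div128by64(uint64_t x, uint64_t y, uint64_t z,
                           uint64_t *rem) {
  assert(x < z && rem);
  uint64_t q = 0, r = x;
  for (int i = 63; i >= 0; i--) {
    uint64_t carry = r >> 63;          /* bit about to be shifted out of r */
    r = (r << 1) | ((y >> i) & 1);     /* bring down the next dividend bit */
    q <<= 1;
    if (carry || r >= z) {             /* true value of r is at least z */
      r -= z;                          /* wraps correctly when carry is set */
      q |= 1;
    }
  }
  *rem = r;
  return q;
}
```

Because r < z holds before every step, the shifted value is below 2z, so a single conditional subtraction per bit suffices.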


24. Signed division can be reduced to unsigned division in a tedious but straightforward manner. We assume that the divisor isn't zero.

⟨Subroutines 5⟩ +≡
  octa signed_odiv ARGS((octa, octa));
  octa signed_odiv(y, z)
    octa y, z;
  {
    octa yy, zz, q;
    register int sy, sz;

    if (y.h & sign_bit) sy = 2, yy = ominus(zero_octa, y);
    else sy = 0, yy = y;
    if (z.h & sign_bit) sz = 1, zz = ominus(zero_octa, z);
    else sz = 0, zz = z;
    q = odiv(zero_octa, yy, zz);
    overflow = false;
    switch (sy + sz) {
    case 2 + 1: aux = ominus(zero_octa, aux);
      if (q.h == sign_bit) overflow = true;
    case 0 + 0: return q;
    case 2 + 0: if (aux.h || aux.l) aux = ominus(zz, aux);
      goto negate_q;
    case 0 + 1: if (aux.h || aux.l) aux = ominus(aux, zz);
    negate_q: if (aux.h || aux.l) return ominus(neg_one, q);
      else return ominus(zero_octa, q);
    }
  }
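The case analysis of section 24 can be tried out on native 64-bit integers. The following is only a minimal sketch: the names sdiv_result and signed_div_sketch are hypothetical, the remainder is returned in a struct instead of the global aux, and the code assumes the usual two's complement conversion from uint64_t to int64_t.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of MMIX-ARITH. */
typedef struct {
    int64_t q;     /* floored quotient */
    int64_t r;     /* remainder; takes the sign of the divisor */
    int overflow;  /* set only for (-2^63) / (-1) */
} sdiv_result;

/* Signed division reduced to unsigned division, assuming z != 0. */
sdiv_result signed_div_sketch(int64_t y, int64_t z)
{
    uint64_t yy = y < 0 ? 0 - (uint64_t)y : (uint64_t)y;  /* |y| */
    uint64_t zz = z < 0 ? 0 - (uint64_t)z : (uint64_t)z;  /* |z| */
    uint64_t q = yy / zz, r = yy % zz;
    sdiv_result res;

    res.overflow = 0;
    if (y < 0 && z < 0) {            /* case 2+1: negate the remainder */
        r = 0 - r;
        res.overflow = (q == (uint64_t)1 << 63);
    } else if (y < 0 || z < 0) {     /* cases 2+0 and 0+1 */
        if (r != 0) {
            r = (y < 0) ? zz - r : r - zz;  /* remainder gets the sign of z */
            q = ~q;                  /* -1 - q: floor of a negative quotient */
        } else {
            q = 0 - q;
        }
    }
    res.q = (int64_t)q;
    res.r = (int64_t)r;
    return res;
}
```

For instance, dividing -7 by 2 in this sketch yields quotient -4 and remainder 1, so that y = qz + r with the remainder carrying the sign of the divisor.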


25. Bit fiddling. The bitwise operators of MMIX are fairly easy to implement directly, but three of them occur often enough to deserve packaging as subroutines.

⟨Subroutines 5⟩ +≡
  octa oand ARGS((octa, octa));
  octa oand(y, z)   /* compute y ∧ z */
    octa y, z;
  { octa x;
    x.h = y.h & z.h; x.l = y.l & z.l;
    return x;
  }

  octa oandn ARGS((octa, octa));
  octa oandn(y, z)   /* compute y ∧ ~z */
    octa y, z;
  { octa x;
    x.h = y.h & ~z.h; x.l = y.l & ~z.l;
    return x;
  }

  octa oxor ARGS((octa, octa));
  octa oxor(y, z)   /* compute y ⊕ z */
    octa y, z;
  { octa x;
    x.h = y.h ^ z.h; x.l = y.l ^ z.l;
    return x;
  }

26. Here's a fun way to count the number of bits in a tetrabyte. [This classical trick is called the "Gillies–Miller method for sideways addition" in The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second edition (Reading, Mass.: Addison–Wesley, 1957), 191–193.]

⟨Subroutines 5⟩ +≡
  int count_bits ARGS((tetra));
  int count_bits(x)
    tetra x;
  {
    register int xx = x;

    xx = (xx & 0x55555555) + ((xx >> 1) & 0x55555555);
    xx = (xx & 0x33333333) + ((xx >> 2) & 0x33333333);
    xx = (xx & 0x0f0f0f0f) + ((xx >> 4) & 0x0f0f0f0f);
    xx = (xx & 0x00ff00ff) + ((xx >> 8) & 0x00ff00ff);
    return (xx & 0x0000ffff) + (xx >> 16);
  }
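To see why the sideways addition of section 26 works, it may help to compare it with a bit-by-bit count. The sketch below restates the routine with fixed-width types; count_bits_sketch and count_bits_naive are hypothetical names used only for this comparison.

```c
#include <stdint.h>

/* Sideways addition: each step adds adjacent fields of doubled width. */
int count_bits_sketch(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);  /* 16 two-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 8 four-bit sums */
    x = (x & 0x0f0f0f0fu) + ((x >> 4) & 0x0f0f0f0fu);  /* 4 byte sums */
    x = (x & 0x00ff00ffu) + ((x >> 8) & 0x00ff00ffu);  /* 2 wyde sums */
    return (int)((x & 0x0000ffffu) + (x >> 16));
}

/* Naive reference: test each bit in turn. */
int count_bits_naive(uint32_t x)
{
    int n = 0;
    for (; x != 0; x >>= 1) n += (int)(x & 1u);
    return n;
}
```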

ARGS = macro (), §2. aux: octa, §4. false = 0, §1. h: tetra, §3. l: tetra, §3. neg_one: octa, §4. octa = struct, §3. odiv: octa (), §13. ominus: octa (), §5. overflow: bool, §4. sign_bit = macro, §4. tetra = unsigned int, §3. true = 1, §1. zero_octa: octa, §4.

27. To compute the nonnegative byte differences of two given tetrabytes, we can carry out the following 20-step branchless computation:

⟨Subroutines 5⟩ +≡
  tetra byte_diff ARGS((tetra, tetra));
  tetra byte_diff(y, z)
    tetra y, z;
  {
    register tetra d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
    register tetra m = d & 0x01000100;
    register tetra x = d & (m - (m >> 8));

    d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
    m = d & 0x01000100;
    return x + ((d & (m - (m >> 8))) << 8);
  }
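The branchless computation of section 27 can be checked against a straightforward loop. In this sketch (byte_diff_sketch and byte_diff_naive are hypothetical names), bit 8 of each 16-bit field of d records whether the byte of y was at least the byte of z, and m - (m >> 8) turns that flag into a byte-wide mask.

```c
#include <stdint.h>

/* Saturating bytewise difference max(0, y_b - z_b), two bytes at a time. */
uint32_t byte_diff_sketch(uint32_t y, uint32_t z)
{
    uint32_t d = (y & 0x00ff00ffu) + 0x01000100u - (z & 0x00ff00ffu);
    uint32_t m = d & 0x01000100u;          /* flag bits: y_b >= z_b */
    uint32_t x = d & (m - (m >> 8));       /* keep differences that are >= 0 */

    d = ((y >> 8) & 0x00ff00ffu) + 0x01000100u - ((z >> 8) & 0x00ff00ffu);
    m = d & 0x01000100u;
    return x + ((d & (m - (m >> 8))) << 8);
}

/* Naive reference: handle the four bytes one by one. */
uint32_t byte_diff_naive(uint32_t y, uint32_t z)
{
    uint32_t x = 0;
    int k;
    for (k = 0; k < 32; k += 8) {
        uint32_t a = (y >> k) & 0xff, b = (z >> k) & 0xff;
        if (a > b) x |= (a - b) << k;
    }
    return x;
}
```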

28. To compute the nonnegative wyde differences of two tetrabytes, another trick leads to a 15-step branchless computation. (Research problem: Can count_bits, byte_diff, or wyde_diff be done with fewer operations?)

⟨Subroutines 5⟩ +≡
  tetra wyde_diff ARGS((tetra, tetra));
  tetra wyde_diff(y, z)
    tetra y, z;
  {
    register tetra a = ((y >> 16) - (z >> 16)) & 0x10000;
    register tetra b = ((y & 0xffff) - (z & 0xffff)) & 0x10000;

    return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
  }
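A short trace shows what the mask b - a - (b >> 16) in section 28 does: a and b flag the wydes in which y is smaller than z, and the mask replaces exactly those wydes of z by the corresponding wydes of y, so that the final subtraction yields zero there. The function name wyde_diff_sketch below is hypothetical.

```c
#include <stdint.h>

/* Saturating wydewise difference max(0, y_w - z_w) for both 16-bit halves. */
uint32_t wyde_diff_sketch(uint32_t y, uint32_t z)
{
    uint32_t a = ((y >> 16) - (z >> 16)) & 0x10000u;      /* high wyde: y < z? */
    uint32_t b = ((y & 0xffffu) - (z & 0xffffu)) & 0x10000u;  /* low wyde: y < z? */
    /* mask is 0xffff0000, 0x0000ffff, 0xffffffff, or 0 as a, b, both, or neither is set */
    return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
}
```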

29. The last bitwise subroutine we need is the most interesting: It implements MMIX's MOR and MXOR operations.

⟨Subroutines 5⟩ +≡
  octa bool_mult ARGS((octa, octa, bool));
  octa bool_mult(y, z, xor)
    octa y, z;   /* the operands */
    bool xor;    /* do we do xor instead of or? */
  {
    octa o, x;
    register tetra a, b, c;
    register int k;

    for (k = 0, o = y, x = zero_octa; o.h || o.l; k++, o = shift_right(o, 8, 1))
      if (o.l & 0xff) {
        a = ((z.h >> k) & 0x01010101) * 0xff;
        b = ((z.l >> k) & 0x01010101) * 0xff;
        c = (o.l & 0xff) * 0x01010101;
        if (xor) x.h ^= a & c, x.l ^= b & c;
        else x.h |= a & c, x.l |= b & c;
      }
    return x;
  }
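The loop of section 29 can be restated on a single 64-bit word, which makes a few sanity checks easy. This is a sketch under the assumption that uint64_t is available; bool_mult_sketch is a hypothetical name. Byte j of the result combines those bytes k of y for which bit k of byte j of z is set, so a "diagonal" z reproduces y and an "antidiagonal" z reverses its bytes.

```c
#include <stdint.h>

/* Boolean matrix product in the style of MOR (use_xor == 0) or MXOR (use_xor != 0). */
uint64_t bool_mult_sketch(uint64_t y, uint64_t z, int use_xor)
{
    uint64_t x = 0;
    int k;
    for (k = 0; y != 0; k++, y >>= 8) {
        /* a: 0xff in each byte of z whose bit k is set */
        uint64_t a = ((z >> k) & 0x0101010101010101ULL) * 0xff;
        /* c: byte k of the original y, replicated into all eight bytes */
        uint64_t c = (y & 0xff) * 0x0101010101010101ULL;
        if (use_xor) x ^= a & c;
        else x |= a & c;
    }
    return x;
}
```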

ARGS = macro (), §2. bool: enum, §1. count_bits: int (), §26. h: tetra, §3. l: tetra, §3. octa = struct, §3. shift_right: octa (), §7. tetra = unsigned int, §3. zero_octa: octa, §4.

30. Floating point packing and unpacking. Standard IEEE floating binary numbers pack a sign, exponent, and fraction into a tetrabyte or octabyte. In this section we consider basic subroutines that convert between IEEE format and the separate unpacked components.

  #define ROUND_OFF 1
  #define ROUND_UP 2
  #define ROUND_DOWN 3
  #define ROUND_NEAR 4

⟨Global variables 4⟩ +≡
  int cur_round;   /* the current rounding mode */

31. The fpack routine takes an octabyte f, a raw exponent e, and a sign s, and packs them into the floating binary number that corresponds to ±2^(e-1076) f, using a given rounding mode. The value of f should satisfy 2^54 ≤ f ≤ 2^55.

Thus, for example, the floating binary number +1.0 = #3ff0000000000000 is obtained when f = 2^54, e = #3fe, and s = '+'. The raw exponent e is usually one less than the final exponent value; the leading bit of f is essentially added to the exponent. (This trick works nicely for denormal numbers, when e < 0, or in cases where the value of f is rounded upwards to 2^55.)

Exceptional events are noted by oring appropriate bits into the global variable exceptions. Special considerations apply to underflow, which is not fully specified by Section 7.4 of the IEEE standard: Implementations of the standard are free to choose between two definitions of "tininess" and two definitions of "accuracy loss." MMIX determines tininess after rounding, hence a result with e < 0 is not necessarily tiny; MMIX treats accuracy loss as equivalent to inexactness. Thus, a result underflows if and only if it is tiny and either (i) it is inexact or (ii) the underflow trap is enabled. The fpack routine sets U_BIT in exceptions if and only if the result is tiny, X_BIT if and only if the result is inexact.

  #define X_BIT (1 << 8)    /* floating inexact */
  #define Z_BIT (1 << 9)    /* floating division by zero */
  #define U_BIT (1 << 10)   /* floating underflow */
  #define O_BIT (1 << 11)   /* floating overflow */
  #define I_BIT (1 << 12)   /* floating invalid operation */
  #define W_BIT (1 << 13)   /* float-to-fix overflow */
  #define V_BIT (1 << 14)   /* integer overflow */
  #define D_BIT (1 << 15)   /* integer divide check */
  #define E_BIT (1 << 18)   /* external (dynamic) trap bit */

⟨Subroutines 5⟩ +≡
  octa fpack ARGS((octa, int, char, int));
  octa fpack(f, e, s, r)
    octa f;   /* the normalized fraction part */
    int e;    /* the raw exponent */
    char s;   /* the sign */
    int r;    /* the rounding mode */
  {
    octa o;

    if (e > 0x7fd) e = 0x7ff, o = zero_octa;
    else {
      if (e < 0) {
        if (e < -54) o.h = 0, o.l = 1;
        else { octa oo;
          o = shift_right(f, -e, 1);
          oo = shift_left(o, -e);
          if (oo.l != f.l || oo.h != f.h) o.l |= 1;   /* sticky bit */
        }
        e = 0;
      } else o = f;
    }
    ⟨Round and return the result 33⟩;
  }

32. ⟨Global variables 4⟩ +≡
  int exceptions;   /* bits possibly destined for rA */

33. Everything falls together so nicely here, it's almost too good to be true!

⟨Round and return the result 33⟩ ≡
  if (o.l & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o = incr(o, 3); break;
  case ROUND_UP: if (s != '-') o = incr(o, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: o = incr(o, o.l & 4 ? 2 : 1); break;
  }
  o = shift_right(o, 2, 1);
  o.h += e << 20;
  if (o.h >= 0x7ff00000) exceptions |= O_BIT + X_BIT;   /* overflow */
  else if (o.h < 0x100000) exceptions |= U_BIT;         /* tininess */
  if (s == '-') o.h |= sign_bit;
  return o;

This code is used in section 31.
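The packing trick of sections 31 and 33 (add the fraction, leading bit included, to the shifted raw exponent) can be exercised on a 64-bit word. The sketch below is a simplification under stated assumptions: it handles only the ROUND_NEAR mode, it does not record any exception bits, and fpack_sketch is a hypothetical name.

```c
#include <stdint.h>

/* Pack fraction f (2^54 <= f <= 2^55, two guard bits at the right),
   raw exponent e, and sign s, rounding to nearest with ties to even. */
uint64_t fpack_sketch(uint64_t f, int e, char s)
{
    uint64_t o;

    if (e > 0x7fd) { e = 0x7ff; o = 0; }        /* too large: becomes infinity */
    else if (e < 0) {                           /* denormal range */
        if (e < -54) o = 1;                     /* only a sticky bit survives */
        else {
            o = f >> -e;
            if ((o << -e) != f) o |= 1;         /* sticky bit */
        }
        e = 0;
    } else o = f;
    o += (o & 4) ? 2 : 1;                       /* round to nearest even */
    o >>= 2;                                    /* drop the two guard bits */
    o += (uint64_t)e << 52;                     /* leading bit bumps the exponent */
    if (s == '-') o |= (uint64_t)1 << 63;
    return o;
}
```

With f = 2^54, e = #3fe, and s = '+', this sketch produces #3ff0000000000000, the example given in section 31.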

ARGS = macro (), §2. h: tetra, §3. incr: octa (), §6. l: tetra, §3. octa = struct, §3. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. zero_octa: octa, §4.

34. Similarly, sfpack packs a short float, from inputs having the same conventions as fpack.

⟨Subroutines 5⟩ +≡
  tetra sfpack ARGS((octa, int, char, int));
  tetra sfpack(f, e, s, r)
    octa f;   /* the fraction part */
    int e;    /* the raw exponent */
    char s;   /* the sign */
    int r;    /* the rounding mode */
  {
    register tetra o;

    if (e > 0x47d) e = 0x47f, o = 0;
    else {
      o = shift_left(f, 3).h;
      if (f.l & 0x1fffffff) o |= 1;
      if (e < 0x380) {
        if (e < 0x380 - 25) o = 1;
        else { register tetra o0, oo;
          o0 = o;
          o = o >> (0x380 - e);
          oo = o << (0x380 - e);
          if (oo != o0) o |= 1;   /* sticky bit */
        }
        e = 0x380;
      }
    }
    ⟨Round and return the short result 35⟩;
  }

35. ⟨Round and return the short result 35⟩ ≡
  if (o & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o += 3; break;
  case ROUND_UP: if (s != '-') o += 3;
  case ROUND_OFF: break;
  case ROUND_NEAR: o += (o & 4 ? 2 : 1); break;
  }
  o = o >> 2;
  o += (e - 0x380) << 23;
  if (o >= 0x7f800000) exceptions |= O_BIT + X_BIT;   /* overflow */
  else if (o < 0x100000) exceptions |= U_BIT;         /* tininess */
  if (s == '-') o |= sign_bit;
  return o;

This code is used in section 34.

36. The funpack routine is, roughly speaking, the opposite of fpack. It takes a given floating point number x and separates out its fraction part f, exponent e, and sign s. It clears exceptions to zero. It returns the type of value found: zro, num, inf, or nan. When it returns num, it will have set f, e, and s to the values from which fpack would produce the original number x without exceptions.

  #define zero_exponent (-1000)   /* zero is assumed to have this exponent */

⟨Other type definitions 36⟩ ≡
  typedef enum {
    zro, num, inf, nan
  } ftype;

See also section 59.

This code is used in section 1.

37. ⟨Subroutines 5⟩ +≡
  ftype funpack ARGS((octa, octa *, int *, char *));
  ftype funpack(x, f, e, s)
    octa x;    /* the given floating point value */
    octa *f;   /* address where the fraction part should be stored */
    int *e;    /* address where the exponent part should be stored */
    char *s;   /* address where the sign should be stored */
  {
    register int ee;

    exceptions = 0;
    *s = (x.h & sign_bit ? '-' : '+');
    *f = shift_left(x, 2);
    f->h &= 0x3fffff;
    ee = (x.h >> 20) & 0x7ff;
    if (ee) {
      *e = ee - 1;
      f->h |= 0x400000;
      return (ee < 0x7ff ? num : f->h == 0x400000 && !f->l ? inf : nan);
    }
    if (!x.l && !f->h) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee; return num;
  }
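The unpacking conventions of sections 36 and 37 can be illustrated on a 64-bit word. This is a sketch only: the type names, the T_ prefixed constants, and funpack_sketch are hypothetical, and the result is returned in a struct rather than through pointers.

```c
#include <stdint.h>

typedef enum { T_ZRO, T_NUM, T_INF, T_NAN } ftype_sketch;

typedef struct {
    ftype_sketch t;   /* kind of value found */
    uint64_t f;       /* fraction with leading bit at 2^54 and two guard zeros */
    int e;            /* raw exponent (one less than the stored field) */
    char s;           /* '+' or '-' */
} unpacked_sketch;

unpacked_sketch funpack_sketch(uint64_t x)
{
    unpacked_sketch u;
    int ee = (int)((x >> 52) & 0x7ff);          /* stored exponent field */

    u.s = (x >> 63) ? '-' : '+';
    u.f = (x << 2) & 0x003fffffffffffffULL;     /* 52 fraction bits, shifted left 2 */
    u.e = 0;
    if (ee) {                                   /* normal, infinite, or NaN */
        u.e = ee - 1;
        u.f |= (uint64_t)1 << 54;               /* supply the hidden bit */
        u.t = ee < 0x7ff ? T_NUM
            : (u.f == (uint64_t)1 << 54 ? T_INF : T_NAN);
        return u;
    }
    if (u.f == 0) {                             /* plus or minus zero */
        u.e = -1000;                            /* the zero_exponent convention */
        u.t = T_ZRO;
        return u;
    }
    do { ee--; u.f <<= 1; } while (!(u.f & ((uint64_t)1 << 54)));  /* normalize a denormal */
    u.e = ee;
    u.t = T_NUM;
    return u;
}
```

For example, the smallest positive denormal (bit pattern 1) unpacks in this sketch to f = 2^54 with raw exponent -52, which agrees with the value 2^(e-1076) f = 2^(-1074).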

ARGS = macro (), §2. exceptions: int, §32. fpack: octa (), §31. h: tetra, §3. l: tetra, §3. O_BIT = macro, §31. octa = struct, §3. ROUND_DOWN = 3, §30. ROUND_NEAR = 4, §30. ROUND_OFF = 1, §30. ROUND_UP = 2, §30. shift_left: octa (), §7. sign_bit = macro, §4. tetra = unsigned int, §3. U_BIT = macro, §31. X_BIT = macro, §31.

38. ⟨Subroutines 5⟩ +≡
  ftype sfunpack ARGS((tetra, octa *, int *, char *));
  ftype sfunpack(x, f, e, s)
    tetra x;   /* the given floating point value */
    octa *f;   /* address where the fraction part should be stored */
    int *e;    /* address where the exponent part should be stored */
    char *s;   /* address where the sign should be stored */
  {
    register int ee;

    exceptions = 0;
    *s = (x & sign_bit ? '-' : '+');
    f->h = (x >> 1) & 0x3fffff; f->l = x << 31;
    ee = (x >> 23) & 0xff;
    if (ee) {
      *e = ee + 0x380 - 1;
      f->h |= 0x400000;
      return (ee < 0xff ? num : (x & 0x7fffffff) == 0x7f800000 ? inf : nan);
    }
    if (!(x & 0x7fffffff)) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee + 0x380; return num;
  }

39. Since MMIX downplays 32-bit operations, it uses sfpack and sfunpack only when loading and storing short floats, or when converting from fixed point to floating point.

⟨Subroutines 5⟩ +≡
  octa load_sf ARGS((tetra));
  octa load_sf(z)
    tetra z;   /* 32 bits to be loaded into a 64-bit register */
  {
    octa f, x; int e; char s; ftype t;

    t = sfunpack(z, &f, &e, &s);
    switch (t) {
    case zro: x = zero_octa; break;
    case num: return fpack(f, e, s, ROUND_OFF);
    case inf: x = inf_octa; break;
    case nan: x = shift_right(f, 2, 1); x.h |= 0x7ff00000; break;
    }
    if (s == '-') x.h |= sign_bit;
    return x;
  }
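For normal numbers, the combined effect of sfunpack and fpack in load_sf is to add #380 to the 8-bit exponent field and to move the 23 fraction bits 29 places to the left. The sketch below checks this against the compiler's own float-to-double conversion; it assumes that float and double are IEEE 754 single and double formats on the host, and the three function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widen the bit pattern of a normal short float (exponent field 1..254). */
uint64_t widen_normal_sf(uint32_t z)
{
    uint64_t sign = (uint64_t)(z >> 31) << 63;
    uint64_t ee = (z >> 23) & 0xff;        /* 8-bit exponent field */
    uint64_t frac = z & 0x7fffff;          /* 23 fraction bits */
    return sign | ((ee + 0x380) << 52) | (frac << 29);
}

/* Helpers that expose the host's bit patterns. */
uint32_t float_bits(float x)   { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
uint64_t double_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
```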


40. ⟨Subroutines 5⟩ +≡
  tetra store_sf ARGS((octa));
  tetra store_sf(x)
    octa x;   /* 64 bits to be loaded into a 32-bit word */
  {
    octa f; tetra z; int e; char s; ftype t;

    t = funpack(x, &f, &e, &s);
    switch (t) {
    case zro: z = 0; break;
    case num: return sfpack(f, e, s, cur_round);
    case inf: z = 0x7f800000; break;
    case nan: if (!(f.h & 0x200000)) {
        f.h |= 0x200000; exceptions |= I_BIT;   /* NaN was signaling */
      }
      z = 0x7f800000 | (f.h << 1) | (f.l >> 31); break;
    }
    if (s == '-') z |= sign_bit;
    return z;
  }

ARGS = macro (), §2. cur_round: int, §30. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. inf = 2, §36. inf_octa: octa, §4. l: tetra, §3. nan = 3, §36. num = 1, §36. octa = struct, §3. ROUND_OFF = 1, §30. sfpack: tetra (), §34. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. tetra = unsigned int, §3. zero_exponent = macro, §36. zero_octa: octa, §4. zro = 0, §36.

41. Floating multiplication and division. The hardest fixed point operations were multiplication and division; but these two operations are the easiest to implement in floating point arithmetic, once their fixed point counterparts are available.

⟨Subroutines 5⟩ +≡
  octa fmult ARGS((octa, octa));
  octa fmult(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;

    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+';   /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + zro: case 4 * zro + num: case 4 * num + zro: x = zero_octa; break;
    case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf: x = inf_octa; break;
    case 4 * zro + inf: case 4 * inf + zro: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Multiply nonzero numbers and return 43⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

42. ⟨The usual NaN cases 42⟩ ≡
  case 4 * nan + nan: if (!(y.h & 0x80000)) exceptions |= I_BIT;   /* y is signaling */
  case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
    return z;
  case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf:
    if (!(y.h & 0x80000)) exceptions |= I_BIT, y.h |= 0x80000;
    return y;

This code is used in sections 41, 44, 46, and 93.

43. ⟨Multiply nonzero numbers and return 43⟩ ≡
  xe = ye + ze - 0x3fd;   /* the raw exponent */
  x = omult(yf, shift_left(zf, 9));
  if (aux.h >= 0x400000) xf = aux;
  else xf = shift_left(aux, 1), xe--;
  if (x.h || x.l) xf.l |= 1;   /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 41.


44. ⟨Subroutines 5⟩ +≡
  octa fdivide ARGS((octa, octa));
  octa fdivide(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;

    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+';   /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + inf: case 4 * zro + num: case 4 * num + inf: x = zero_octa; break;
    case 4 * num + zro: exceptions |= Z_BIT;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; break;
    case 4 * zro + zro: case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Divide nonzero numbers and return 45⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

45. ⟨Divide nonzero numbers and return 45⟩ ≡
  xe = ye - ze + 0x3fd;   /* the raw exponent */
  xf = odiv(yf, zero_octa, shift_left(zf, 9));
  if (xf.h >= 0x800000) {
    aux.l |= xf.l & 1;
    xf = shift_right(xf, 1, 1);
    xe++;
  }
  if (aux.h || aux.l) xf.l |= 1;   /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 44.

ARGS = macro (), §2. aux: octa, §4. cur_round: int, §30. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. inf = 2, §36. inf_octa: octa, §4. l: tetra, §3. nan = 3, §36. num = 1, §36. octa = struct, §3. odiv: octa (), §13. omult: octa (), §8. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. standard_NaN: octa, §4. y: octa, §46. y: octa, §93. z: octa, §46. z: octa, §93. Z_BIT = macro, §31. zero_octa: octa, §4. zro = 0, §36.

46. Floating addition and subtraction. Now for the bread-and-butter operation, the sum of two floating point numbers. It is not terribly difficult, but many cases need to be handled carefully.

⟨Subroutines 5⟩ +≡
  octa fplus ARGS((octa, octa));
  octa fplus(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe, d;
    register char xs;

    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + num: return fpack(zf, ze, zs, ROUND_OFF); break;   /* may underflow */
    case 4 * num + zro: return fpack(yf, ye, ys, ROUND_OFF); break;   /* may underflow */
    case 4 * inf + inf: if (ys != zs) {
        exceptions |= I_BIT; x = standard_NaN; xs = zs; break;
      }
    case 4 * num + inf: case 4 * zro + inf: x = inf_octa; xs = zs; break;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; xs = ys; break;
    case 4 * num + num: if (y.h != (z.h ^ 0x80000000) || y.l != z.l)
        ⟨Add nonzero numbers and return 47⟩;
    case 4 * zro + zro: x = zero_octa;
      xs = (ys == zs ? ys : cur_round == ROUND_DOWN ? '-' : '+'); break;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

47. ⟨Add nonzero numbers and return 47⟩ ≡
  { octa o, oo;
    if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
      ⟨Exchange y with z 48⟩;
    d = ye - ze;
    xs = ys, xe = ye;
    if (d) ⟨Adjust for difference in exponents 49⟩;
    if (ys == zs) {
      xf = oplus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;   /* keep the sticky bit */
    } else {
      xf = ominus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
      else while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
    }
    return fpack(xf, xe, xs, cur_round);
  }

This code is used in section 46.

48. ⟨Exchange y with z 48⟩ ≡
  {
    o = yf, yf = zf, zf = o;
    d = ye, ye = ze, ze = d;
    d = ys, ys = zs, zs = d;
  }

This code is used in sections 47 and 51.

49. Proper rounding requires two bits to the right of the fraction delivered to fpack. The first is the true next bit of the result; the other is a "sticky" bit, which is nonzero if any further bits of the true result are nonzero. Sticky rounding to an integer takes x into the number ⌊x/2⌋ + ⌈x/2⌉.

Some subtleties need to be observed here, in order to prevent the sticky bit from being shifted left. If we did not shift yf left 1 before shifting zf to the right, an incorrect answer would be obtained in certain cases; for example, if yf = 2^54, zf = 2^54 + 2^53 - 1, d = 52.

⟨Adjust for difference in exponents 49⟩ ≡
  {
    if (d <= 2) zf = shift_right(zf, d, 1);   /* exact result */
    else if (d > 53) zf.h = 0, zf.l = 1;      /* tricky but OK */
    else {
      if (ys != zs) d--, xe--, yf = shift_left(yf, 1);
      o = zf;
      zf = shift_right(o, d, 1);
      oo = shift_left(zf, d);
      if (oo.l != o.l || oo.h != o.h) zf.l |= 1;   /* sticky bit: something was shifted out */
    }
  }

This code is used in section 47.
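The sticky-bit idea of section 49 can be isolated in a few lines. The sketch below works on a plain 64-bit word; sticky_shift_right and sticky_round_formula are hypothetical names, and the second function merely evaluates ⌊x/2⌋ + ⌈x/2⌉ for the real number x/2^d so that the two can be compared.

```c
#include <stdint.h>

/* Shift x right by d bits (0 <= d < 63), OR-ing any lost bits into bit 0. */
uint64_t sticky_shift_right(uint64_t x, int d)
{
    uint64_t o = x >> d;
    if ((o << d) != x) o |= 1;      /* shifting back does not restore x */
    return o;
}

/* floor(r/2) + ceil(r/2) for r = x / 2^d, computed with integers
   (assumes x is small enough that the addition cannot overflow). */
uint64_t sticky_round_formula(uint64_t x, int d)
{
    uint64_t unit = (uint64_t)1 << (d + 1);
    return (x >> (d + 1)) + ((x + unit - 1) >> (d + 1));
}
```

For x = 11 (binary 1011) and d = 2, both functions give 3: the exact quotient 2.75 lies strictly between 2 and 4, so the low bit is forced to 1.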

ARGS = macro (), §2. cur_round: int, §30. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. inf = 2, §36. inf_octa: octa, §4. l: tetra, §3. num = 1, §36. octa = struct, §3. ominus: octa (), §5. oplus: octa (), §5. ROUND_DOWN = 3, §30. ROUND_OFF = 1, §30. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. standard_NaN: octa, §4. zero_octa: octa, §4. zro = 0, §36.

50. The comparison of floating point numbers with respect to ε shares some of the characteristics of floating point addition/subtraction. In some ways it is simpler, and in other ways it is more difficult; we might as well deal with it now.

Subroutine fepscomp(y, z, e, s) returns 2 if y, z, or e is a NaN or e is negative. It returns 1 if s = 0 and y ≈ z (e) or if s ≠ 0 and y ∼ z (e), as defined in Section 4.2.2 of Seminumerical Algorithms; otherwise it returns 0.

⟨Subroutines 5⟩ +≡
  int fepscomp ARGS((octa, octa, octa, int));
  int fepscomp(y, z, e, s)
    octa y, z, e;   /* the operands */
    int s;          /* test similarity? */
  {
    octa yf, zf, ef, o, oo;
    int ye, ze, ee;
    char ys, zs, es;
    register int yt, zt, et, d;

    et = funpack(e, &ef, &ee, &es);
    if (es == '-') return 2;
    switch (et) {
    case nan: return 2;
    case inf: ee = 10000;
    case num: case zro: break;
    }
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    case 4 * nan + nan: case 4 * nan + inf: case 4 * nan + num: case 4 * nan + zro:
    case 4 * inf + nan: case 4 * num + nan: case 4 * zro + nan: return 2;
    case 4 * inf + inf: return (ys == zs || ee >= 1023);
    case 4 * inf + num: case 4 * inf + zro: case 4 * num + inf: case 4 * zro + inf:
      return (s && ee >= 1022);
    case 4 * zro + zro: return 1;
    case 4 * zro + num: case 4 * num + zro: if (!s) return 0;
    case 4 * num + num: break;
    }
    ⟨Compare two numbers with respect to epsilon and return 51⟩;
  }

51. The relation y ≈ z (ε) reduces to y ∼ z (ε/2^d), if d is the difference between the larger and smaller exponents of y and z.

⟨Compare two numbers with respect to epsilon and return 51⟩ ≡
  ⟨Undenormalize y and z, if they are denormal 52⟩;
  if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
    ⟨Exchange y with z 48⟩;
  if (ze == zero_exponent) ze = ye;
  d = ye - ze;
  if (!s) ee -= d;
  if (ee >= 1023) return 1;
  ⟨Compute the difference of fraction parts, o 53⟩;
  if (!o.h && !o.l) return 1;
  if (ee < 968) return 0;   /* if y ≠ z and ε < 2^(-54), y ≁ z */
  if (ee >= 1021) ef = shift_left(ef, ee - 1021);
  else ef = shift_right(ef, 1021 - ee, 1);
  return o.h < ef.h || (o.h == ef.h && o.l <= ef.l);

This code is used in section 50.

52. ⟨Undenormalize y and z, if they are denormal 52⟩ ≡
  if (ye < 0) yf = shift_left(y, 2), ye = 0;
  if (ze < 0) zf = shift_left(z, 2), ze = 0;

This code is used in section 51.

53. When d > 2, the difference of fraction parts might not fit exactly in an octabyte; in that case the numbers are not similar unless ε ≥ 3/8, and we replace the difference by the ceiling of the true result. When ε < 1/8, our program essentially replaces 2^55 ε by ⌊2^55 ε⌋. These truncations are not needed simultaneously. Therefore the logic is justified by the facts that, if n is an integer, we have x ≤ n if and only if ⌈x⌉ ≤ n; n ≤ x if and only if n ≤ ⌊x⌋. (Notice that the concept of "sticky bit" is not appropriate here.)

⟨Compute the difference of fraction parts, o 53⟩ ≡
  if (d > 54) o = zero_octa, oo = zf;
  else o = shift_right(zf, d, 1), oo = shift_left(o, d);
  if (oo.h != zf.h || oo.l != zf.l) {   /* truncated result, hence d > 2 */
    if (ee < 1020) return 0;            /* difference is too large for similarity */
    o = incr(o, ys == zs ? 0 : 1);      /* adjust for ceiling: subtracting ⌊zf/2^d⌋ already rounds the difference up */
  }
  o = (ys == zs ? ominus(yf, o) : oplus(yf, o));

This code is used in section 51.

ARGS = macro (), §2. funpack: ftype (), §37. h: tetra, §3. incr: octa (), §6. inf = 2, §36. l: tetra, §3. nan = 3, §36. num = 1, §36. octa = struct, §3. ominus: octa (), §5. oplus: octa (), §5. shift_left: octa (), §7. shift_right: octa (), §7. zero_exponent = macro, §36. zero_octa: octa, §4. zro = 0, §36.

54. Floating point output conversion. The print_float routine converts an octabyte to a floating decimal representation that will be input as precisely the same value.

⟨Subroutines 5⟩ +≡
  static void bignum_times_ten ARGS((bignum *));
  static void bignum_dec ARGS((bignum *, bignum *, tetra));
  static int bignum_compare ARGS((bignum *, bignum *));
  void print_float ARGS((octa));
  void print_float(x)
    octa x;
  {
    ⟨Local variables for print_float 56⟩;
    if (x.h & sign_bit) printf("-");
    ⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩;
    ⟨Store f and g as multiprecise integers 63⟩;
    ⟨Compute the significant digits s and decimal exponent e 64⟩;
    ⟨Print the significant digits with proper context 67⟩;
  }

55. One way to visualize the problem being solved here is to consider the vastly simpler case in which there are only 2-bit exponents and 2-bit fractions. Then the sixteen possible 4-bit combinations have the following interpretations:

  0000  [0 .. 0.125]
  0001  (0.125 .. 0.375)
  0010  [0.375 .. 0.625]
  0011  (0.625 .. 0.875)
  0100  [0.875 .. 1.125]
  0101  (1.125 .. 1.375)
  0110  [1.375 .. 1.625]
  0111  (1.625 .. 1.875)
  1000  [1.875 .. 2.25]
  1001  (2.25 .. 2.75)
  1010  [2.75 .. 3.25]
  1011  (3.25 .. 3.75)
  1100  [3.75 .. ∞]
  1101  NaN(0 .. 0.375)
  1110  NaN[0.375 .. 0.625]
  1111  NaN(0.625 .. 1)

Notice that the interval is closed, [f .. g], when the fraction part is even; it is open, (f .. g), when the fraction part is odd. The printed outputs for these sixteen values, if we actually were dealing with such short exponents and fractions, would be 0., .2, .5, .7, 1., 1.2, 1.5, 1.7, 2., 2.5, 3., 3.5, Inf, NaN.2, NaN, NaN.8, respectively.

⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ ≡
  f = shift_left(x, 1);
  e = f.h >> 21;
  f.h &= 0x1fffff;
  if (!f.h && !f.l) ⟨Handle the special case when the fraction part is zero 57⟩
  else {
    g = incr(f, 1);
    f = incr(f, -1);
    if (!e) e = 1;   /* denormal */
    else if (e == 0x7ff) {
      printf("NaN");
      if (g.h == 0x100000 && g.l == 1) return;   /* the "standard" NaN */
      e = 0x3ff;   /* extreme NaNs come out OK even without adjusting f or g */
    } else f.h |= 0x200000, g.h |= 0x200000;
  }

This code is used in section 54.
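The intervals listed in section 55 for the 4-bit minifloats can be recomputed as midpoints between neighboring representable values. The sketch below covers only the finite nonzero patterns 0001 through 1011; interval_sketch, mini_value, and mini_interval are hypothetical names, and pattern 1100 is treated as the finite value 4 purely so that it can bound pattern 1011 from above.

```c
/* Endpoints of the set of reals that round to a given minifloat. */
typedef struct { double lo, hi; } interval_sketch;

/* Value of a 4-bit pattern with a 2-bit exponent and a 2-bit fraction,
   meaningful here for bits in 0..12. */
double mini_value(unsigned bits)
{
    unsigned e = bits >> 2, f = bits & 3;
    if (e == 0) return f / 4.0;                       /* denormal */
    return (1.0 + f / 4.0) * (double)(1u << (e - 1)); /* normal */
}

/* Midpoints to the neighbors below and above, for bits in 1..11. */
interval_sketch mini_interval(unsigned bits)
{
    interval_sketch r;
    r.lo = (mini_value(bits - 1) + mini_value(bits)) / 2;
    r.hi = (mini_value(bits) + mini_value(bits + 1)) / 2;
    return r;
}
```

Pattern 1000, for example, comes out as 1.875 to 2.25: the interval reaches only half as far below the power of 2 as it does above.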

56. ⟨Local variables for print_float 56⟩ ≡
  octa f, g;           /* lower and upper bounds on the fraction part */
  register int e;      /* exponent part */
  register int j, k;   /* all purpose indices */

See also section 66.

This code is used in section 54.

57. The transition points between exponents correspond to powers of 2. At such points the interval extends only half as far to the left of that power of 2 as it does to the right. For example, in the 4-bit minifloat numbers considered above, case 1000 corresponds to the interval [1.875 .. 2.25].

⟨Handle the special case when the fraction part is zero 57⟩ ≡
  {
    if (!e) {
      printf("0."); return;
    }
    if (e == 0x7ff) {
      printf("Inf"); return;
    }
    e--;
    f.h = 0x3fffff, f.l = 0xffffffff;
    g.h = 0x400000, g.l = 2;
  }

This code is used in section 55.

ARGS = macro (), §2. bignum = struct, §59. h: tetra, §3. incr: octa (), §6. l: tetra, §3. octa = struct, §3. printf: int (), <stdio.h>. s: char [], §66. shift_left: octa (), §7. sign_bit = macro, §4. tetra = unsigned int, §3.

58. We want to find the "simplest" value in the interval corresponding to the given number, in the sense that it has fewest significant digits when expressed in decimal notation. Thus, for example, if the floating point number can be described by a relatively short string such as '.1' or '37e100', we want to discover that representation.

The basic idea is to generate the decimal representations of the two endpoints of the interval, outputting the leading digits where both endpoints agree, then making a final decision at the first place where they disagree.

The "simplest" value is not always unique. For example, in the case of 4-bit minifloat numbers we could represent the bit pattern 0001 as either .2 or .3, and we could represent 1001 in five equally short ways: 2.3 or 2.4 or 2.5 or 2.6 or 2.7. The algorithm below tries to choose the middle possibility in such cases.

[A solution to the analogous problem for fixed-point representations, without the additional complication of round-to-even, was used by the author in the program for TeX; see Beauty is Our Business (Springer, 1990), 233–242.]

Suppose we are given two fractions f and g, where 0 ≤ f < g < 1, and we want to compute the shortest decimal in the closed interval [f .. g]. If f = 0, we are done. Otherwise let 10f = d + f′ and 10g = e + g′, where 0 ≤ f′ < 1 and 0 ≤ g′ < 1. If d < e, we can terminate by outputting any of the digits d + 1, ..., e; otherwise we output the common digit d = e, and repeat the process on the fractions 0 ≤ f′ < g′ < 1. A similar procedure works with respect to the open interval (f .. g).
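The closed-interval procedure just described can be sketched with ordinary 64-bit integers when both endpoints are fractions with a common denominator 2^t. This is an illustration under stated assumptions, not the multiprecision program that follows: shortest_decimal_sketch is a hypothetical name, t must be at most 60 so that 10g cannot overflow, and the digit chosen when d < e is simply the middle of the range d+1, ..., e.

```c
#include <stdint.h>

/* Shortest decimal fraction in the closed interval [f/2^t .. g/2^t],
   assuming 0 <= f < g < 2^t and t <= 60.  Returns the digits that follow
   the decimal point, in a static buffer (empty when f = 0). */
const char *shortest_decimal_sketch(uint64_t f, uint64_t g, int t)
{
    static char buf[64];
    const uint64_t mask = ((uint64_t)1 << t) - 1;
    char *p = buf;

    while (f != 0 && p < buf + sizeof buf - 1) {
        uint64_t d = (10 * f) >> t;       /* integer part of 10f */
        uint64_t e = (10 * g) >> t;       /* integer part of 10g */
        f = (10 * f) & mask;              /* f' */
        g = (10 * g) & mask;              /* g' */
        if (d < e) {                      /* any digit in d+1..e finishes the job */
            *p++ = (char)('0' + (d + 1 + e) / 2);
            break;
        }
        *p++ = (char)('0' + d);           /* common digit d = e */
    }
    *p = '\0';
    return buf;
}
```

With f = 3/8 and g = 5/8, the first step gives d = 3 and e = 6, and the sketch stops after the single digit 5, matching the output .5 quoted earlier for the minifloat pattern 0010.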

59. The program below carries out the stated algorithm by using multiprecision arithmetic on 77-place integers with 28 bits each. This choice facilitates multiplication by 10, and allows us to deal with the whole range of floating binary numbers using fixed point arithmetic. We keep track of the leading and trailing digit positions so that trivial operations on zeros are avoided.

If f points to a bignum, its radix-2^28 digits are f->dat[0] through f->dat[76], from most significant to least significant. We assume that all digit positions are zero unless they lie in the subarray between indices f->a and f->b, inclusive. Furthermore, both f->dat[f->a] and f->dat[f->b] are nonzero, unless f->a = f->b = bignum_prec - 1.

The bignum data type can be used with any radix less than 2^32; we will use it later with radix 10^9. The dat array is made large enough to accommodate both applications.

  #define bignum_prec 157   /* would be 77 if we cared only about print_float */

⟨Other type definitions 36⟩ +≡
  typedef struct {
    int a;   /* index of the most significant digit */
    int b;   /* index of the least significant digit; must be ≥ a */
    tetra dat[bignum_prec];   /* the digits; undefined except between a and b */
  } bignum;


60. Here, for example, is how we go from f to 10f, assuming that overflow will not occur and that the radix is 2^28:

⟨Subroutines 5⟩ +≡
  static void bignum_times_ten(f)
    bignum *f;
  {
    register tetra *p, *q;
    register tetra x, carry;

    for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
      x = *p * 10 + carry;
      *p = x & 0xfffffff;
      carry = x >> 28;
    }
    *p = carry;
    if (carry) f->a--;
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
  }
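The carry propagation of section 60 can be tried on a small array. In the following sketch the type bignum_sketch, the precision PREC_SKETCH, and both function names are hypothetical; indices replace the pointers of the original, and times_ten_demo exists only to load a single digit, multiply once, and read the result back as an ordinary integer.

```c
#include <stdint.h>

#define PREC_SKETCH 8                 /* hypothetical small precision */

typedef struct {
    int a, b;                         /* most and least significant digit indices */
    uint32_t dat[PREC_SKETCH];        /* radix-2^28 digits, dat[0] most significant */
} bignum_sketch;

/* Multiply by 10 in place, assuming f->a >= 1 so the carry has somewhere to go. */
void times_ten_sketch(bignum_sketch *f)
{
    uint32_t carry = 0;
    int i;
    for (i = f->b; i >= f->a; i--) {
        uint32_t x = f->dat[i] * 10 + carry;   /* at most 10*(2^28 - 1) + 9 < 2^32 */
        f->dat[i] = x & 0xfffffff;
        carry = x >> 28;
    }
    f->dat[i] = carry;                /* here i == f->a - 1 */
    if (carry) f->a--;
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
}

/* Load v (0 < v < 2^28) as a one-digit number, multiply by 10, read back. */
uint64_t times_ten_demo(uint32_t v)
{
    bignum_sketch f = { PREC_SKETCH - 1, PREC_SKETCH - 1, {0} };
    uint64_t r = 0;
    int i;
    f.dat[PREC_SKETCH - 1] = v;
    times_ten_sketch(&f);
    for (i = f.a; i <= PREC_SKETCH - 1; i++) r = (r << 28) | f.dat[i];
    return r;
}
```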

61. And here is how we test whether f < g, f = g, or f > g, using any radix whatever:

⟨Subroutines 5⟩ +≡
  static int bignum_compare(f, g)
    bignum *f, *g;
  {
    register tetra *p, *pp, *q, *qq;

    if (f->a != g->a) return f->a > g->a ? -1 : 1;
    pp = &f->dat[f->b], qq = &g->dat[g->b];
    for (p = &f->dat[f->a], q = &g->dat[g->a]; p <= pp; p++, q++) {
      if (*p != *q) return *p < *q ? -1 : 1;
      if (q == qq) return p < pp;
    }
    return -1;
  }

f: octa, §56. print_float: void (), §54. tetra = unsigned int, §3.

62. The following subroutine subtracts g from f, assuming that f ≥ g > 0 and using a given radix.

⟨Subroutines 5⟩ +≡
  static void bignum_dec(f, g, r)
    bignum *f, *g;
    tetra r;   /* the radix */
  {
    register tetra *p, *q, *qq;
    register int x, borrow;

    while (g->b > f->b) f->dat[++f->b] = 0;
    qq = &g->dat[g->a];
    for (p = &f->dat[g->b], q = &g->dat[g->b], borrow = 0; q >= qq; p--, q--) {
      x = *p - *q - borrow;
      if (x >= 0) borrow = 0, *p = x;
      else borrow = 1, *p = x + r;
    }
    for ( ; borrow; p--)
      if (*p) borrow = 0, *p = *p - 1;
      else *p = r - 1;   /* borrowing from a zero digit leaves the largest digit */
    while (f->dat[f->a] == 0) {
      if (f->a == f->b) {   /* the result is zero */
        f->a = f->b = bignum_prec - 1, f->dat[bignum_prec - 1] = 0;
        return;
      }
      f->a++;
    }
    while (f->dat[f->b] == 0) f->b--;
  }

63. Armed with these subroutines, we are ready to solve the problem. The first task is to put the numbers into bignum form. If the exponent is e, the number destined for digit dat[k] will consist of the rightmost 28 bits of the given fraction after it has been shifted right c - e - 28k bits, for some constant c. We choose c so that, when e has its maximum value #7ff, the leading digit will go into position dat[1], and so that when the number to be printed is exactly 1 the integer part of g will also be exactly 1.

  #define magic_offset 2112   /* the constant c that makes it work */
  #define origin 37           /* the radix point follows dat[37] */

⟨Store f and g as multiprecise integers 63⟩ ≡
  k = (magic_offset - e) / 28;
  ff.dat[k - 1] = shift_right(f, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k - 1] = shift_right(g, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k] = shift_right(f, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k] = shift_right(g, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k + 1] = shift_left(f, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  gg.dat[k + 1] = shift_left(g, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  ff.a = (ff.dat[k - 1] ? k - 1 : k);
  ff.b = (ff.dat[k + 1] ? k + 1 : k);
  gg.a = (gg.dat[k - 1] ? k - 1 : k);
  gg.b = (gg.dat[k + 1] ? k + 1 : k);

This code is used in section 54.

64. If e is sufficiently small, the fractions f and g will be less than 1, and we can use the stated algorithm directly. Of course, if e is extremely small, a lot of leading zeros need to be lopped off; in the worst case, we may have to multiply f and g by 10 more than 300 times. But hey, we don't need to do that extremely often, and computers are pretty fast nowadays.

In the small-exponent case, the computation always terminates before f becomes zero, because the interval endpoints are fractions with denominator $2^t$ for some t > 50.

The invariant relations ff.dat[ff.a] ≠ 0 and gg.dat[gg.a] ≠ 0 are not maintained by the computation here, when ff.a = origin or gg.a = origin. But no harm is done, because bignum_compare is not used.

⟨Compute the significant digits s and decimal exponent e 64⟩ ≡
  if (e > 0x401) ⟨Compute the significant digits in the large-exponent case 65⟩
  else { /* if e <= 0x401 we have gg.a >= origin and gg.dat[origin] <= 8 */
    if (ff.a > origin) ff.dat[origin] = 0;
    for (e = 1, p = s; gg.a > origin || ff.dat[origin] == gg.dat[origin]; ) {
      if (gg.a > origin) e--;
      else *p++ = ff.dat[origin] + '0', ff.dat[origin] = 0, gg.dat[origin] = 0;
      bignum_times_ten(&ff);
      bignum_times_ten(&gg);
    }
    *p++ = ((ff.dat[origin] + 1 + gg.dat[origin]) >> 1) + '0'; /* the middle digit */
  }
  *p = '\0'; /* terminate the string s */

This code is used in section 54.

a: int, §59. b: int, §59. bignum = struct, §59. bignum_compare: static int (), §61. bignum_prec = 157, §59. bignum_times_ten: static void (), §60. dat: tetra [], §59. e: register int, §56. f: octa, §56. ff: bignum, §66. g: octa, §56. gg: bignum, §66. k: register int, §56. l: tetra, §3. p: register char *, §66. s: char [], §66. shift_left: octa (), §7. shift_right: octa (), §7. tetra = unsigned int, §3.


65. When e is large, we use the stated algorithm by considering f and g to be fractions whose denominator is a power of 10.

An interesting case arises when the number to be converted is #44ada56a4b0835bf, since the interval turns out to be

  (69999999999999991611392 .. 70000000000000000000000).

If this were a closed interval, we could simply give the answer 7e22; but the number 7e22 actually corresponds to #44ada56a4b0835c0 because of the round-to-even rule. Therefore the correct answer is, say, 6.9999999999999995e22. This example shows that we need a slightly different strategy in the case of open intervals; we cannot simply look at the first position in which the endpoints have different decimal digits. Therefore we change the invariant relation to $0 \le f < g \le 1$, when open intervals are involved, and we do not terminate the process when f = 0 or g = 1.

⟨Compute the significant digits in the large-exponent case 65⟩ ≡
{ register int open = x.l & 1;

  tt.dat[origin] = 10;
  tt.a = tt.b = origin;
  for (e = 1; bignum_compare(&gg, &tt) >= open; e++) bignum_times_ten(&tt);
  p = s;
  while (1) {
    bignum_times_ten(&ff);
    bignum_times_ten(&gg);
    for (j = '0'; bignum_compare(&ff, &tt) >= 0; j++)
      bignum_dec(&ff, &tt, 0x10000000), bignum_dec(&gg, &tt, 0x10000000);
    if (bignum_compare(&gg, &tt) >= open) break;
    *p++ = j;
    if (ff.a == bignum_prec - 1 && !open) goto done; /* f = 0 in a closed interval */
  }
  for (k = j; bignum_compare(&gg, &tt) >= open; k++)
    bignum_dec(&gg, &tt, 0x10000000);
  *p++ = (j + 1 + k) >> 1; /* the middle digit */
done: ;
}

This code is used in section 64.
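The 7e22 example can be checked against the host C library. The sketch below is not part of MMIX-ARITH; it assumes that double is IEEE 754 binary64 and that the library's strtod rounds correctly to nearest-even, and the helper name bits_of_decimal is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bit pattern of the double that the C library's strtod produces for s. */
static uint64_t bits_of_decimal(const char *s)
{
  double d = strtod(s, NULL);
  uint64_t u;

  assert(sizeof d == sizeof u);   /* binary64 assumed */
  memcpy(&u, &d, sizeof u);       /* reinterpret the bytes without aliasing trouble */
  return u;
}
```

Under those assumptions, bits_of_decimal("7e22") yields #44ada56a4b0835c0, while a string strictly inside the open interval, such as "6.9999999999999995e22", yields #44ada56a4b0835bf.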

66. The length of string s will be at most 17. For if f and g agree to 17 places, we have $g/f < 1 + 10^{-16}$; but the ratio g/f is always $\ge (1 + 2^{-52} + 2^{-53})/(1 + 2^{-52} - 2^{-53}) > 1 + 2 \times 10^{-16}$.

⟨Local variables for print_float 56⟩ +≡
  bignum ff, gg; /* fractions or numerators of fractions */
  bignum tt; /* power of ten (used as the denominator) */
  char s[18];
  register char *p;
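The 17-digit bound of section 66 has a familiar consequence that is easy to observe with the standard library. The following sketch is an illustration only; it assumes IEEE 754 binary64 doubles and correctly rounded snprintf and strtod, and the function name round_trips is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if printing d with n significant digits and reading the text
   back reproduces d exactly, 0 otherwise. */
static int round_trips(double d, int n)
{
  char text[64];

  assert(n >= 1 && n <= 40);
  snprintf(text, sizeof text, "%.*e", n - 1, d);  /* n significant digits */
  return strtod(text, NULL) == d;
}
```

Seventeen digits suffice for the values tried here, whereas 16 digits do not distinguish 0.30000000000000004 from 0.3.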


67. At this point the significant digits are in string s, and s[0] ≠ '0'. If we put a decimal point at the left of s, the result should be multiplied by $10^e$.

We prefer the output '300.' to the form '3e2', and we prefer '.03' to '3e-2'. In general, the output will use an explicit exponent only if the alternative would take more than 18 characters.

⟨Print the significant digits with proper context 67⟩ ≡
  if (e > 17 || e < (int) strlen(s) - 17)
    printf("%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0) printf(".%0*d%s", -e, 0, s);
  else if (strlen(s) >= e) printf("%.*s.%s", e, s, s + e);
  else printf("%s%0*d.", s, e - (int) strlen(s), 0);

This code is used in section 54.
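The four output cases of section 67 are easier to experiment with when they write into a buffer. The sketch below reuses the same format strings with snprintf; the function name format_digits and its buffer parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the value 0.s * 10^e into out (capacity n).
   s holds 1 to 17 significant digits with s[0] != '0'. */
static void format_digits(char *out, size_t n, const char *s, int e)
{
  int len = (int)strlen(s);

  assert(len >= 1 && len <= 17 && s[0] != '0');
  if (e > 17 || e < len - 17)                 /* too long otherwise: use an exponent */
    snprintf(out, n, "%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0)                             /* leading zeros after the point */
    snprintf(out, n, ".%0*d%s", -e, 0, s);
  else if (len >= e)                          /* the point falls inside or just after s */
    snprintf(out, n, "%.*s.%s", e, s, s + e);
  else                                        /* trailing zeros before the point */
    snprintf(out, n, "%s%0*d.", s, e - len, 0);
}
```

For instance, the digit string "3" with e = 3 gives '300.', and the same string with e = -1 gives '.03', matching the preferences stated above.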

a: int, §59. b: int, §59. bignum = struct, §59. bignum_compare: static int (), §61. bignum_dec: static void (), §62. bignum_prec = 157, §59. bignum_times_ten: static void (), §60. dat: tetra [], §59. e: register int, §56. j: register int, §56. k: register int, §56. l: tetra, §3. origin = 37, §63. print_float: void (), §54. printf: int (), <stdio.h>. strlen: int (), <string.h>. x: octa, §54.


68. Floating point input conversion. Going the other way, we want to be able to convert a given decimal number into its floating binary equivalent. The following syntax is supported:

  ⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  ⟨digit string⟩ → ⟨digit⟩ | ⟨digit string⟩⟨digit⟩
  ⟨decimal string⟩ → ⟨digit string⟩. | .⟨digit string⟩ | ⟨digit string⟩.⟨digit string⟩
  ⟨optional sign⟩ → ⟨empty⟩ | + | -
  ⟨exponent⟩ → e⟨optional sign⟩⟨digit string⟩
  ⟨optional exponent⟩ → ⟨empty⟩ | ⟨exponent⟩
  ⟨floating magnitude⟩ → ⟨digit string⟩⟨exponent⟩ | ⟨decimal string⟩⟨optional exponent⟩ | Inf | NaN | NaN.⟨digit string⟩
  ⟨floating constant⟩ → ⟨optional sign⟩⟨floating magnitude⟩
  ⟨decimal constant⟩ → ⟨optional sign⟩⟨digit string⟩

For example, '-3.' is the floating constant #c008000000000000; '1e3' and '1000' are both equivalent to #408f400000000000; 'NaN' and '+NaN.5' are both equivalent to #7ff8000000000000.

The scan_const routine looks at a given string and finds the longest initial substring that matches the syntax of either ⟨decimal constant⟩ or ⟨floating constant⟩. It puts the corresponding value into the global octabyte variable val; it also puts the position of the first unscanned character in the global pointer variable next_char. It returns 1 if a floating constant was found, 0 if a decimal constant was found, −1 if nothing was found. A decimal constant that doesn't fit in an octabyte is computed modulo $2^{64}$.

⟨Subroutines 5⟩ +≡
static void bignum_double ARGS((bignum *));
int scan_const ARGS((char *));
int scan_const(s)
  char *s;
{
  ⟨Local variables for scan_const 70⟩;

  val.h = val.l = 0;
  p = s;
  if (*p == '+' || *p == '-') sign = *p++; else sign = '+';
  if (strncmp(p, "NaN", 3) == 0) NaN = true, p += 3;
  else NaN = false;
  if ((isdigit(*p) && !NaN) || (*p == '.' && isdigit(*(p + 1))))
    ⟨Scan a number and return 73⟩;
  if (NaN) ⟨Return the standard NaN 71⟩;
  if (strncmp(p, "Inf", 3) == 0) ⟨Return infinity 72⟩;
no_const_found: next_char = s; return -1;
}

69. ⟨Global variables 4⟩ +≡
  octa val; /* value returned by scan_const */
  char *next_char; /* pointer returned by scan_const */


70. ⟨Local variables for scan_const 70⟩ ≡
  register char *p, *q; /* for string manipulations */
  register bool NaN; /* are we processing a NaN? */
  int sign; /* '+' or '-' */

See also sections 76 and 81.

This code is used in section 68.

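71. ⟨Return the standard NaN 71⟩ ≡
{
  next_char = p;
  val.h = 0x600000, exp = 0x3fe; /* fraction 1.5 with a defined exponent, so that section 84 packs #7ff8000000000000 */
  goto packit;
}

This code is used in section 68.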

72. ⟨Return infinity 72⟩ ≡
{
  next_char = p + 3;
  goto make_it_infinite;
}

This code is used in section 68.

ARGS = macro (), §2. bignum = struct, §59. bool: enum, §1. false = 0, §1. h: tetra, §3. isdigit: int (), <ctype.h>. l: tetra, §3. make_it_infinite: label, §79. octa = struct, §3. packit: label, §78. strncmp: int (), <string.h>. true = 1, §1.


73. We saw above that a string of at most 17 digits is enough to characterize a floating point number, for purposes of output. But a much longer buffer for digits is needed when we're doing input. For example, consider the borderline quantity $(1 + 2^{-53})/2^{1022}$; its decimal expansion, when written out exactly, is a number with more than 750 significant digits: 2.2250738585...8125e-308. If any one of those digits is increased, or if additional nonzero digits are added as in 2.2250738585...81250000001e-308, the rounded value is supposed to change from #0010000000000000 to #0010000000000001.

We assume here that the user prefers a perfectly correct answer to a speedy almost-correct one, so we implement the most general case.

⟨Scan a number and return 73⟩ ≡
{
  for (q = buf0, dec_pt = (char *) 0; isdigit(*p); p++) {
    val = oplus(val, shift_left(val, 2)); /* multiply by 5 */
    val = incr(shift_left(val, 1), *p - '0');
    if (q > buf0 || *p != '0')
      if (q < buf_max) *q++ = *p;
      else if (*(q - 1) == '0') *(q - 1) = *p;
  }
  if (NaN) *q++ = '1';
  if (*p == '.') ⟨Scan a fraction part 74⟩;
  next_char = p;
  if (*p == 'e') ⟨Scan an exponent 77⟩
  else exp = 0;
  if (dec_pt) ⟨Return a floating point constant 78⟩;
  if (sign == '-') val = ominus(zero_octa, val);
  return 0;
}

This code is used in section 68.
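The integer part of the scan multiplies by ten in two steps (by 5, then by 2) and lets the octabyte wrap around. A native 64-bit sketch of that idea follows; it stands in for the oplus, shift_left and incr calls with ordinary uint64_t arithmetic, and the name scan_decimal and its next parameter are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates the leading decimal digits of p modulo 2^64.
   If next is non-null, *next receives the first unscanned character. */
static uint64_t scan_decimal(const char *p, const char **next)
{
  uint64_t val = 0;

  assert(p != NULL);
  for (; isdigit((unsigned char)*p); p++) {
    val += val << 2;                            /* multiply by 5 */
    val = (val << 1) + (uint64_t)(*p - '0');    /* multiply by 2, then add the digit */
  }
  if (next != NULL) *next = p;
  return val;
}
```

Because unsigned arithmetic in C is defined modulo $2^{64}$ here, the digit string for $2^{64}$ itself, 18446744073709551616, scans to 0.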

74. ⟨Scan a fraction part 74⟩ ≡
{
  dec_pt = q;
  p++;
  for (zeros = 0; isdigit(*p); p++)
    if (*p == '0' && q == buf0) zeros++;
    else if (q < buf_max) *q++ = *p;
    else if (*(q - 1) == '0') *(q - 1) = *p;
}

This code is used in section 73.

75. The buffer needs room for eight digits of padding at the left, followed by up to $1022 + 53 - 307$ significant digits, followed by a "sticky" digit at position buf_max − 1, and eight more digits of padding.

#define buf0 (buf + 8)
#define buf_max (buf + 777)

⟨Global variables 4⟩ +≡
  static char buf[785] = "00000000"; /* where we put significant input digits */


76. ⟨Local variables for scan_const 70⟩ +≡
  register char *dec_pt; /* position of decimal point in buf */
  register int exp; /* scanned exponent; later used for raw binary exponent */
  register int zeros; /* leading zeros removed after decimal point */

77. Here we don't advance next_char and force a decimal point until we know that a syntactically correct exponent exists.

The code here will convert extra-large inputs like '9e+9999999999999999' into ∞ and extra-small inputs into zero. Strange inputs like '-00.0e9999999' must also be accommodated.

⟨Scan an exponent 77⟩ ≡
{ register char exp_sign;

  p++;
  if (*p == '+' || *p == '-') exp_sign = *p++; else exp_sign = '+';
  if (isdigit(*p)) {
    for (exp = *p++ - '0'; isdigit(*p); p++)
      if (exp < 1000) exp = 10 * exp + *p - '0';
    if (!dec_pt) dec_pt = q, zeros = 0;
    if (exp_sign == '-') exp = -exp;
    next_char = p;
  }
}

This code is used in section 73.
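The guard exp < 1000 in section 77 keeps an absurdly long exponent from overflowing an int while still consuming every digit. A stand-alone sketch of just that loop is shown here; the function name scan_exponent_digits is hypothetical, and sign handling is left out.

```c
#include <assert.h>
#include <ctype.h>

/* Scans the digits at p (at least one is required) and returns their value,
   except that accumulation stops once the value has reached 1000 or more. */
static int scan_exponent_digits(const char *p)
{
  int exp;

  assert(isdigit((unsigned char)*p));
  for (exp = *p++ - '0'; isdigit((unsigned char)*p); p++)
    if (exp < 1000) exp = 10 * exp + *p - '0';  /* later digits are skipped, not added */
  return exp;
}
```

The result never exceeds 9999, which is already far outside the range of binary64 exponents, so the later range checks still classify the input as infinite or zero.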

78. ⟨Return a floating point constant 78⟩ ≡
{
  ⟨Move the digits from buf to ff 79⟩;
  ⟨Determine the binary fraction and binary exponent 83⟩;
packit: ⟨Pack and round the answer 84⟩;
  return 1;
}

This code is used in section 73.

ff: bignum, §81. incr: octa (), §6. isdigit: int (), <ctype.h>. NaN: register bool, §70. next_char: char *, §69. ominus: octa (), §5. oplus: octa (), §5. p: register char *, §70. q: register char *, §70. scan_const: int (), §68. shift_left: octa (), §7. sign: int, §70. val: octa, §69. zero_octa: octa, §4.


79. Now we get ready to compute the binary fraction bits, by putting the scanned input digits into a multiprecision fixed-point accumulator ff that spans the full necessary range. After this step, the number that we want to convert to floating binary will appear in ff.dat[ff.a], ff.dat[ff.a + 1], ..., ff.dat[ff.b]. The radix-$10^9$ digit in ff[36 − k] is understood to be multiplied by $10^{9k}$, for $36 \ge k \ge -120$.

⟨Move the digits from buf to ff 79⟩ ≡
  x = buf + 341 + zeros - dec_pt - exp;
  if (q == buf0 || x >= 1413) {
  make_it_zero: exp = -99999; goto packit;
  }
  if (x < 0) {
  make_it_infinite: exp = 99999; goto packit;
  }
  ff.a = x / 9;
  for (p = q; p < q + 8; p++) *p = '0'; /* pad with trailing zeros */
  q = q - 1 - (q + 341 + zeros - dec_pt - exp) % 9; /* compute stopping place in buf */
  for (p = buf0 - x % 9, k = ff.a; p <= q && k <= 156; p += 9, k++)
    ⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩;
  ff.b = k - 1;
  for (x = 0; p <= q; p += 9)
    if (strncmp(p, "000000000", 9) != 0) x = 1;
  ff.dat[156] += x; /* nonzero digits that fall off the right are sticky */
  while (ff.dat[ff.b] == 0) ff.b--;

This code is used in section 78.

80. ⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩ ≡
{
  for (x = *p - '0', pp = p + 1; pp < p + 9; pp++) x = 10 * x + *pp - '0';
  ff.dat[k] = x;
}

This code is used in section 79.

81. ⟨Local variables for scan_const 70⟩ +≡
  register int k, x;
  register char *pp;
  bignum ff, tt;

82. Here's a subroutine that is dual to bignum_times_ten. It changes f to 2f, assuming that overflow will not occur and that the radix is $10^9$.

⟨Subroutines 5⟩ +≡
static void bignum_double(f)
  bignum *f;
{
  register tetra *p, *q;
  register int x, carry;

  for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
    x = *p + *p + carry;
    if (x >= 1000000000) carry = 1, *p = x - 1000000000;
    else carry = 0, *p = x;
  }
  *p = carry;
  if (carry) f->a--;
  if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
}

83. ⟨Determine the binary fraction and binary exponent 83⟩ ≡
  val = zero_octa;
  if (ff.a > 36) {
    for (exp = 0x3fe; ff.a > 36; exp--) bignum_double(&ff);
    for (k = 54; k; k--) {
      if (ff.dat[36]) {
        if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
        ff.dat[36] = 0;
        if (ff.b == 36) break; /* break if ff now zero */
      }
      bignum_double(&ff);
    }
  } else {
    tt.a = tt.b = 36, tt.dat[36] = 2;
    for (exp = 0x3fe; bignum_compare(&ff, &tt) >= 0; exp++) bignum_double(&tt);
    for (k = 54; k; k--) {
      bignum_double(&ff);
      if (bignum_compare(&ff, &tt) >= 0) {
        if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
        bignum_dec(&ff, &tt, 1000000000);
        if (ff.a == bignum_prec - 1) break; /* break if ff now zero */
      }
    }
  }
  if (k == 0) val.l |= 1; /* add sticky bit if ff nonzero */

This code is used in section 78.

a: int, §59. b: int, §59. bignum = struct, §59. bignum_compare: static int (), §61. bignum_dec: static void (), §62. bignum_prec = 157, §59. bignum_times_ten: static void (), §60. buf: static char [], §75. buf0 = macro, §75. dat: tetra [], §59. dec_pt: register char *, §76. exp: register int, §76. h: tetra, §3. l: tetra, §3. p: register char *, §70. packit: label, §78. q: register char *, §70. scan_const: int (), §68. strncmp: int (), <string.h>. tetra = unsigned int, §3. val: octa, §69. zero_octa: octa, §4. zeros: register int, §76.


84. We need to be careful that the input 'NaN.999999999999999999999' doesn't get rounded up; it is supposed to yield #7fffffffffffffff.

Although the input 'NaN.0' is illegal, strictly speaking, we silently convert it to #7ff0000000000001, a number that would be output as 'NaN.0000000000000002'.

⟨Pack and round the answer 84⟩ ≡
  val = fpack(val, exp, sign, ROUND_NEAR);
  if (NaN) {
    if ((val.h & 0x7fffffff) == 0x40000000) val.h |= 0x7fffffff, val.l = 0xffffffff;
    else if ((val.h & 0x7fffffff) == 0x3ff00000 && !val.l) val.h |= 0x40000000, val.l = 1;
    else val.h |= 0x40000000;
  }

This code is used in section 78.


85. Floating point remainders. In this section we implement the remainder of the floating point operations, one of which happens to be the operation of taking the remainder.

The easiest task remaining is to compare two floating point quantities. Routine fcomp returns −1 if y < z, 0 if y = z, +1 if y > z, and +2 if y and z are unordered.

⟨Subroutines 5⟩ +≡
int fcomp ARGS((octa, octa));
int fcomp(y, z)
  octa y, z;
{
  ftype yt, zt;
  int ye, ze;
  char ys, zs;
  octa yf, zf;
  register int x;

  yt = funpack(y, &yf, &ye, &ys);
  zt = funpack(z, &zf, &ze, &zs);
  switch (4 * yt + zt) {
  case 4 * nan + nan: case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
  case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf: return 2;
  case 4 * zro + zro: return 0;
  case 4 * zro + num: case 4 * num + zro: case 4 * zro + inf: case 4 * inf + zro:
  case 4 * num + num: case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf:
    if (ys != zs) x = 1;
    else if (y.h > z.h) x = 1;
    else if (y.h < z.h) x = -1;
    else if (y.l > z.l) x = 1;
    else if (y.l < z.l) x = -1;
    else return 0;
    break;
  }
  return (ys == '-' ? -x : x);
}
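The same sign-and-magnitude comparison can be sketched on 64-bit patterns instead of the two-tetrabyte octa struct. This is an illustrative model rather than the routine above: it assumes IEEE 754 binary64 bit layouts, and the names fcomp_bits, MAG_MASK and INF_BITS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAG_MASK UINT64_C(0x7fffffffffffffff)   /* everything except the sign bit */
#define INF_BITS UINT64_C(0x7ff0000000000000)   /* magnitude of infinity */

/* Returns -1, 0, +1 for y < z, y == z, y > z, and 2 if either is a NaN. */
static int fcomp_bits(uint64_t y, uint64_t z)
{
  uint64_t ym = y & MAG_MASK, zm = z & MAG_MASK;
  int ys = (int)(y >> 63), zs = (int)(z >> 63), x;

  assert(ys <= 1 && zs <= 1);
  if (ym > INF_BITS || zm > INF_BITS) return 2; /* unordered */
  if (ym == 0 && zm == 0) return 0;             /* +0 equals -0 */
  if (ys != zs) x = 1;                          /* the sign of y decides below */
  else if (ym > zm) x = 1;
  else if (ym < zm) x = -1;
  else return 0;
  return ys ? -x : x;                           /* negative y reverses the magnitude order */
}
```

Comparing magnitudes as integers works because, for nonnegative finite values and infinity, the binary64 encoding is monotone in the value.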

ARGS = macro (), §2. exp: register int, §76. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. inf = 2, §36. l: tetra, §3. NaN: register bool, §70. nan = 3, §36. num = 1, §36. octa = struct, §3. ROUND_NEAR = 4, §30. sign: int, §70. val: octa, §69. zro = 0, §36.


86. Several MMIX operations act on a single floating point number and accept an arbitrary rounding mode. For example, consider the operation of rounding to the nearest floating point integer:

⟨Subroutines 5⟩ +≡
octa fintegerize ARGS((octa, int));
octa fintegerize(z, r)
  octa z; /* the operand */
  int r; /* the rounding mode */
{
  ftype zt;
  int ze;
  char zs;
  octa xf, zf;

  zt = funpack(z, &zf, &ze, &zs);
  if (!r) r = cur_round;
  switch (zt) {
  case nan: if (!(z.h & 0x80000)) { exceptions |= I_BIT; z.h |= 0x80000; }
  case inf: case zro: return z;
  case num: ⟨Integerize and return 87⟩;
  }
}

87. ⟨Integerize and return 87⟩ ≡
  if (ze >= 1074) return fpack(zf, ze, zs, ROUND_OFF); /* already an integer */
  if (ze <= 1020) xf.h = 0, xf.l = 1;
  else { octa oo;
    xf = shift_right(zf, 1074 - ze, 1);
    oo = shift_left(xf, 1074 - ze);
    if (oo.l != zf.l || oo.h != zf.h) xf.l |= 1; /* sticky bit */
  }
  switch (r) {
  case ROUND_DOWN: if (zs == '-') xf = incr(xf, 3); break;
  case ROUND_UP: if (zs != '-') xf = incr(xf, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: xf = incr(xf, xf.l & 4 ? 2 : 1); break;
  }
  xf.l &= 0xfffffffc;
  if (ze >= 1022) return fpack(shift_left(xf, 1074 - ze), ze, zs, ROUND_OFF);
  if (xf.l) xf.h = 0x3ff00000, xf.l = 0;
  if (zs == '-') xf.h |= sign_bit;
  return xf;

This code is used in section 86.


88. To convert floating point to fixed point, we use fixit.

⟨Subroutines 5⟩ +≡
octa fixit ARGS((octa, int));
octa fixit(z, r)
  octa z; /* the operand */
  int r; /* the rounding mode */
{
  ftype zt;
  int ze;
  char zs;
  octa zf, o;

  zt = funpack(z, &zf, &ze, &zs);
  if (!r) r = cur_round;
  switch (zt) {
  case nan: case inf: exceptions |= I_BIT; return z;
  case zro: return zero_octa;
  case num: if (funpack(fintegerize(z, r), &zf, &ze, &zs) == zro) return zero_octa;
    if (ze <= 1076) o = shift_right(zf, 1076 - ze, 1);
    else {
      if (ze > 1085 || (ze == 1085 && (zf.h > 0x400000 ||
            (zf.h == 0x400000 && (zf.l || zs != '-'))))) exceptions |= W_BIT;
      if (ze >= 1140) return zero_octa;
      o = shift_left(zf, ze - 1076);
    }
    return (zs == '-' ? ominus(zero_octa, o) : o);
  }
}

ARGS = macro (), §2. cur_round: int, §30. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. incr: octa (), §6. inf = 2, §36. l: tetra, §3. nan = 3, §36. num = 1, §36. octa = struct, §3. ominus: octa (), §5. ROUND_DOWN = 3, §30. ROUND_NEAR = 4, §30. ROUND_OFF = 1, §30. ROUND_UP = 2, §30. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. W_BIT = macro, §31. zero_octa: octa, §4. zro = 0, §36.


89. Going the other way, we can specify not only a rounding mode but whether the given fixed point octabyte is signed or unsigned, and whether the result should be rounded to short precision.

⟨Subroutines 5⟩ +≡
octa floatit ARGS((octa, int, int, int));
octa floatit(z, r, u, p)
  octa z; /* octabyte to float */
  int r; /* rounding mode */
  int u; /* unsigned? */
  int p; /* short precision? */
{
  int e; char s;
  register int t;

  exceptions = 0;
  if (!z.h && !z.l) return zero_octa;
  if (!r) r = cur_round;
  if (!u && (z.h & sign_bit)) s = '-', z = ominus(zero_octa, z); else s = '+';
  e = 1076;
  while (z.h < 0x400000) e--, z = shift_left(z, 1);
  while (z.h >= 0x800000) {
    e++;
    t = z.l & 1;
    z = shift_right(z, 1, 1);
    z.l |= t;
  }
  if (p) ⟨Convert to short float 90⟩;
  return fpack(z, e, s, r);
}

90. ⟨Convert to short float 90⟩ ≡
{
  register int ex; register tetra t;

  t = sfpack(z, e, s, r);
  ex = exceptions;
  sfunpack(t, &z, &e, &s);
  exceptions = ex;
}

This code is used in section 89.

91. The square root operation is more interesting.

⟨Subroutines 5⟩ +≡
octa froot ARGS((octa, int));
octa froot(z, r)
  octa z; /* the operand */
  int r; /* the rounding mode */
{
  ftype zt;
  int ze;
  char zs;
  octa x, xf, rf, zf;
  register int xe, k;

  if (!r) r = cur_round;
  zt = funpack(z, &zf, &ze, &zs);
  if (zs == '-' && zt != zro) exceptions |= I_BIT, x = standard_NaN;
  else switch (zt) {
  case nan: if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
    return z;
  case inf: case zro: x = z; break;
  case num: ⟨Take the square root and return 92⟩;
  }
  if (zs == '-') x.h |= sign_bit;
  return x;
}

92. The square root can be found by an adaptation of the old pencil-and-paper method. If $n = \lfloor\sqrt{s}\rfloor$, where s is an integer, we have $s = n^2 + r$ where $0 \le r \le 2n$; this invariant can be maintained if we replace s by $4s + (0, 1, 2, 3)$ and n by $2n + (0, 1)$. The following code implements this idea with 2n in xf and r in rf. (It could easily be made to run about twice as fast.)

⟨Take the square root and return 92⟩ ≡
  xf.h = 0, xf.l = 2;
  xe = (ze + 0x3fe) >> 1;
  if (ze & 1) zf = shift_left(zf, 1);
  rf.h = 0, rf.l = (zf.h >> 22) - 1;
  for (k = 53; k; k--) {
    rf = shift_left(rf, 2); xf = shift_left(xf, 1);
    if (k >= 43) rf = incr(rf, (zf.h >> (2 * (k - 43))) & 3);
    else if (k >= 27) rf = incr(rf, (zf.l >> (2 * (k - 27))) & 3);
    if ((rf.l > xf.l && rf.h >= xf.h) || rf.h > xf.h) {
      xf.l++; rf = ominus(rf, xf); xf.l++;
    }
  }
  if (rf.h || rf.l) xf.l++; /* sticky bit */
  return fpack(xf, xe, '+', r);

This code is used in section 91.
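The invariant $s = n^2 + r$ with $0 \le r \le 2n$ can be watched at work on plain integers. The sketch below is an integer square root built on the same bring-down-two-bits step; it keeps n itself rather than 2n, and the function name isqrt64 and the rem output parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns floor(sqrt(s)). If rem is non-null, *rem receives s - n*n,
   which satisfies 0 <= *rem <= 2n. */
static uint32_t isqrt64(uint64_t s, uint64_t *rem)
{
  uint64_t n = 0, r = 0;
  int k;

  for (k = 31; k >= 0; k--) {
    r = (r << 2) | ((s >> (2 * k)) & 3);  /* s becomes 4s + (0,1,2,3) */
    n <<= 1;                              /* n becomes 2n */
    if (r > 2 * n) {                      /* r >= 2n + 1: the new root bit is 1 */
      r -= 2 * n + 1;
      n++;
    }
    assert(r <= 2 * n);                   /* the invariant from section 92 */
  }
  if (rem != NULL) *rem = r;
  return (uint32_t)n;
}
```

For s = 24 the result is n = 4 with r = 8, the extreme case r = 2n allowed by the invariant.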

ARGS = macro (), §2. cur_round: int, §30. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. incr: octa (), §6. inf = 2, §36. l: tetra, §3. nan = 3, §36. num = 1, §36. octa = struct, §3. ominus: octa (), §5. sfpack: tetra (), §34. sfunpack: ftype (), §38. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. standard_NaN: octa, §4. tetra = unsigned int, §3. zero_octa: octa, §4. zro = 0, §36.


93. And finally, the genuine floating point remainder. Subroutine fremstep either calculates y rem z or reduces y to a smaller number having the same remainder with respect to z. In the latter case the E_BIT is set in exceptions. A third parameter, delta, gives a decrease in exponent that is acceptable for incomplete results; if delta is sufficiently large, say 2500, the correct result will always be obtained in one step of fremstep.

⟨Subroutines 5⟩ +≡
octa fremstep ARGS((octa, octa, int));
octa fremstep(y, z, delta)
  octa y, z;
  int delta;
{
  ftype yt, zt;
  int ye, ze;
  char xs, ys, zs;
  octa x, xf, yf, zf;
  register int xe, thresh, odd;

  yt = funpack(y, &yf, &ye, &ys);
  zt = funpack(z, &zf, &ze, &zs);
  switch (4 * yt + zt) {
  ⟨The usual NaN cases 42⟩;
  case 4 * zro + zro: case 4 * num + zro: case 4 * inf + zro:
  case 4 * inf + num: case 4 * inf + inf: x = standard_NaN;
    exceptions |= I_BIT; break;
  case 4 * zro + num: case 4 * zro + inf: case 4 * num + inf: return y;
  case 4 * num + num: ⟨Remainderize nonzero numbers and return 94⟩;
  zero_out: x = zero_octa;
  }
  if (ys == '-') x.h |= sign_bit;
  return x;
}

94. If there's a huge difference in exponents and the remainder is nonzero, this computation will take a long time. One could compute $(2^n y)$ rem z much more quickly for large n by using O(log n) multiplications modulo z, but the floating remainder operation isn't important enough to justify such expensive hardware.

Results of floating remainder are always exact, so the rounding mode is immaterial.

⟨Remainderize nonzero numbers and return 94⟩ ≡
  odd = 0; /* will be 1 if we've subtracted an odd multiple of z from y */
  thresh = ye - delta;
  if (thresh < ze) thresh = ze;
  while (ye >= thresh)
    ⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩;
  if (ye >= ze) {
    exceptions |= E_BIT; return fpack(yf, ye, ys, ROUND_OFF);
  }
  if (ye < ze - 1) return fpack(yf, ye, ys, ROUND_OFF);
  yf = shift_right(yf, 1, 1);
try_complement: xf = ominus(zf, yf), xe = ze, xs = '+' + '-' - ys;
  if (xf.h > yf.h || (xf.h == yf.h && (xf.l > yf.l || (xf.l == yf.l && !odd))))
    xf = yf, xs = ys;
  while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
  return fpack(xf, xe, xs, ROUND_OFF);

This code is used in section 93.

95. Here we are careful not to change the sign of y, because a remainder of 0 is supposed to inherit the original sign of y.

⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ ≡
{
  if (yf.h == zf.h && yf.l == zf.l) goto zero_out;
  if (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l)) {
    if (ye == ze) goto try_complement;
    ye--, yf = shift_left(yf, 1);
  }
  yf = ominus(yf, zf);
  if (ye == ze) odd = 1;
  while (yf.h < 0x400000) ye--, yf = shift_left(yf, 1);
}

This code is used in section 94.
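The role of the odd flag and of try_complement is easier to see in an integer model of the remainder. The sketch below assumes the usual IEEE-style definition (y − nz with n the integer nearest to y/z, ties going to even n) for nonnegative integers; it is not the floating point routine, and the name int_rem is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* y rem z for y >= 0, z > 0: y - n*z where n is the integer nearest to y/z,
   and an exact tie is resolved in favor of even n. */
static int64_t int_rem(int64_t y, int64_t z)
{
  int64_t q, r;

  assert(y >= 0 && z > 0 && z < INT64_MAX / 2);
  q = y / z;                 /* truncated quotient */
  r = y % z;                 /* 0 <= r < z */
  /* Switch to the complement r - z when it is smaller in magnitude,
     or when the two tie and q is odd (so that q + 1 is even). */
  if (2 * r > z || (2 * r == z && (q & 1))) r -= z;
  return r;
}
```

Thus 5 rem 3 is −1, 6 rem 4 is −2 (the tie goes to the even multiple 2), and 10 rem 4 is +2; the tie test on q corresponds to the test on odd in section 94.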

ARGS = macro (), §2. E_BIT = macro, §31. exceptions: int, §32. fpack: octa (), §31. ftype = enum, §36. funpack: ftype (), §37. h: tetra, §3. I_BIT = macro, §31. inf = 2, §36. l: tetra, §3. num = 1, §36. octa = struct, §3. ominus: octa (), §5. ROUND_OFF = 1, §30. shift_left: octa (), §7. shift_right: octa (), §7. sign_bit = macro, §4. standard_NaN: octa, §4. zero_octa: octa, §4. zro = 0, §36.


96. Names of the sections.

⟨Add nonzero numbers and return 47⟩ Used in section 46.
⟨Adjust for difference in exponents 49⟩ Used in section 47.
⟨Check that x < z; otherwise give trivial answer 14⟩ Used in section 13.
⟨Compare two numbers with respect to epsilon and return 51⟩ Used in section 50.
⟨Compute the difference of fraction parts, o 53⟩ Used in section 51.
⟨Compute the significant digits in the large-exponent case 65⟩ Used in section 64.
⟨Compute the significant digits s and decimal exponent e 64⟩ Used in section 54.
⟨Convert to short float 90⟩ Used in section 89.
⟨Determine the binary fraction and binary exponent 83⟩ Used in section 78.
⟨Determine the number of significant places n in the divisor v 16⟩ Used in section 13.
⟨Determine the quotient digit q[j] 20⟩ Used in section 13.
⟨Divide nonzero numbers and return 45⟩ Used in section 44.
⟨Exchange y with z 48⟩ Used in sections 47 and 51.
⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ Used in section 54.
⟨Find the trial quotient, q 21⟩ Used in section 20.
⟨Global variables 4, 9, 30, 32, 69, 75⟩ Used in section 1.
⟨Handle the special case when the fraction part is zero 57⟩ Used in section 55.
⟨If the result was negative, decrease q by 1 23⟩ Used in section 20.
⟨Integerize and return 87⟩ Used in section 86.
⟨Local variables for print_float 56, 66⟩ Used in section 54.
⟨Local variables for scan_const 70, 76, 81⟩ Used in section 68.
⟨Move the digits from buf to ff 79⟩ Used in section 78.
⟨Multiply nonzero numbers and return 43⟩ Used in section 41.
⟨Normalize the divisor 17⟩ Used in section 13.
⟨Other type definitions 36, 59⟩ Used in section 1.
⟨Pack and round the answer 84⟩ Used in section 78.
⟨Pack q and u to acc and aux 19⟩ Used in section 13.
⟨Pack w into the outputs aux and acc 11⟩ Used in section 8.
⟨Print the significant digits with proper context 67⟩ Used in section 54.
⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩ Used in section 79.
⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ Used in section 94.
⟨Remainderize nonzero numbers and return 94⟩ Used in section 93.
⟨Return a floating point constant 78⟩ Used in section 73.
⟨Return infinity 72⟩ Used in section 68.
⟨Return the standard NaN 71⟩ Used in section 68.
⟨Round and return the result 33⟩ Used in section 31.
⟨Round and return the short result 35⟩ Used in section 34.
⟨Scan a fraction part 74⟩ Used in section 73.
⟨Scan a number and return 73⟩ Used in section 68.
⟨Scan an exponent 77⟩ Used in section 73.
⟨Store f and g as multiprecise integers 63⟩ Used in section 54.
⟨Stuff for C preprocessor 2⟩ Used in section 1.


⟨Subroutines 5, 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, 93⟩ Used in section 1.
⟨Subtract $b^j q v$ from u 22⟩ Used in section 20.
⟨Take the square root and return 92⟩ Used in section 91.
⟨Tetrabyte and octabyte type definitions 3⟩ Used in section 1.
⟨The usual NaN cases 42⟩ Used in sections 41, 44, 46, and 93.
⟨Undenormalize y and z, if they are denormal 52⟩ Used in section 51.
⟨Unnormalize the remainder 18⟩ Used in section 13.
⟨Unpack the dividend and divisor to u and v 15⟩ Used in section 13.
⟨Unpack the multiplier and multiplicand to u and v 10⟩ Used in section 8.


MMIX-CONFIG

1. Input format. Configuration files allow this simulator to adapt itself to infinitely many possible combinations of hardware features. The purpose of the present module is to read a configuration file, check it for validity, and set up the relevant data structures.

All data in a configuration file consists simply of tokens separated by one or more units of white space, where a "token" is any sequence of nonspace characters that doesn't contain a percent sign. Percent signs and anything following them on a line are ignored; this convention allows a user to include comments in the file. Here's a simple (but weird) example:

% Silly configuration

writebuffer 200

memaddresstime 100

Dcache associativity 4 lru

Dcache blocksize 1024

unit ODD 5555555555555555555555555555555555555555555555555555555555555555

unit EVEN aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

div 40 30 20 % three-stage divide

It means that (1) the write buffer has capacity for 200 octabytes; (2) the memory bus takes 100 cycles to process an address; (3) there's a D-cache, in which each set has 4 blocks and the replacement policy is least-recently-used; (4) each block in the D-cache has 1024 bytes; (5) there are two functional units, one for all the odd-numbered opcodes and one for all the rest; (6) the division instructions take three pipeline stages, spending 40 cycles in the first stage, 30 in the second, and 20 in the last; (7) all other parameters have default values.
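The token and comment conventions described in section 1 can be illustrated with a small sketch that works on an in-memory string instead of a file. The function name `next_token` and its interface are assumptions made for this illustration; they are not part of MMIXware, whose own reader appears in section 10.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy the next token of the text at *pos into out
   (capacity cap >= 1), skipping white space and %-comments.
   Returns 1 if a token was found, 0 at the end of the text. */
static int next_token(const char **pos, char *out, size_t cap)
{
    const char *p = *pos;
    size_t k = 0;

    for (;;) {
        while (*p && isspace((unsigned char)*p)) p++;  /* skip white space */
        if (*p != '%') break;
        while (*p && *p != '\n') p++;                  /* skip the comment */
    }
    if (!*p) { *pos = p; return 0; }                   /* nothing left */
    while (*p && !isspace((unsigned char)*p) && *p != '%') {
        if (k + 1 < cap) out[k++] = *p;                /* truncate if too long */
        p++;
    }
    out[k] = '\0';
    *pos = p;
    return 1;
}
```

Applied to the first two lines of the example above, such a scanner would deliver just the tokens `writebuffer` and `200`; the comment line contributes nothing.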

2. Four kinds of specifications can appear in a configuration file, according to the following syntax:

⟨specification⟩ → ⟨PV spec⟩ | ⟨cache spec⟩ | ⟨pipe spec⟩ | ⟨functional spec⟩
⟨PV spec⟩ → ⟨parameter⟩⟨decimal value⟩
⟨cache spec⟩ → ⟨cache name⟩⟨cache parameter⟩⟨decimal value⟩⟨policy⟩
⟨pipe spec⟩ → ⟨operation⟩⟨pipeline times⟩
⟨functional spec⟩ → unit ⟨name⟩⟨64 hexadecimal digits⟩

3. A ⟨PV spec⟩ simply assigns a given value to a given parameter. The possibilities for ⟨parameter⟩ are as follows:

• fetchbuffer (default 4), maximum instructions in the fetch buffer; must be ≥ 1.

• writebuffer (default 2), maximum octabytes in the write buffer; must be ≥ 1.

• reorderbuffer (default 5), maximum instructions issued but not committed; must be ≥ 1.

• renameregs (default 5), maximum partial results in the reorder buffer; must be ≥ 1.

D.E. Knuth: MMIXware, LNCS 1750, pp. 110-136, 1999. Springer-Verlag Berlin Heidelberg 1999

• memslots (default 2), maximum store instructions in the reorder buffer; must be ≥ 1.

• localregs (default 256), number of local registers in ring; must be 256, 512, or 1024.

• fetchmax (default 2), maximum instructions fetched per cycle; must be ≥ 1.

• dispatchmax (default 1), maximum instructions issued per cycle; must be ≥ 1.

• peekahead (default 1), maximum lookahead for jumps per cycle.

• commitmax (default 1), maximum instructions committed per cycle; must be ≥ 1.

• fremmax (default 1), maximum reductions in FREM computation per cycle; must be ≥ 1.

• denin (default 1), extra cycles taken if a floating point input is denormal.

• denout (default 1), extra cycles taken if a floating point result is denormal.

• writeholdingtime (default 0), minimum number of cycles for data to remain in the write buffer.

• memaddresstime (default 20), cycles to process memory address; must be ≥ 1.

• memreadtime (default 20), cycles to read one memory busload; must be ≥ 1.

• memwritetime (default 20), cycles to write one memory busload; must be ≥ 1.

• membusbytes (default 8), number of bytes per memory busload; must be a power of 2 that is 8 or more.

• branchpredictbits (default 0), number of bits in each branch prediction table entry; must be ≤ 8.

• branchaddressbits (default 0), number of bits in instruction address used to index the branch prediction table.

• branchhistorybits (default 0), number of bits in branch history used to index the branch prediction table.

• branchdualbits (default 0), number of bits of instruction-address-xor-branch-history used to index the branch prediction table.

• hardwarepagetable (default 1), is zero if page table calculations must be emulated by the operating system.

• disablesecurity (default 0), is 1 if the hot-seat security checks are turned off. This option is used only for testing purposes; it means that the `s' interrupt will not occur, and the `p' interrupt will be signaled only when going from a nonnegative location to a negative one.

• memchunksmax (default 1000), maximum number of 2^16-byte chunks of simulated memory; must be ≥ 1.

• hashprime (default 2009), prime number used to address simulated memory; must exceed memchunksmax, preferably by a factor of about 2.

The values of memchunksmax and hashprime affect only the speed of the simulator, not its results, unless a very huge program is being simulated. The stated defaults for memchunksmax and hashprime should be adequate for almost all applications.

4. A ⟨cache spec⟩ assigns a given value to a parameter affecting one of five possible caches:

⟨cache spec⟩ → ⟨cache name⟩⟨cache parameter⟩⟨decimal value⟩⟨policy⟩
⟨cache name⟩ → ITcache | DTcache | Icache | Dcache | Scache
⟨policy⟩ → ⟨empty⟩ | random | serial | pseudolru | lru

The possibilities for ⟨cache parameter⟩ are as follows:

• associativity (default 1), number of cache blocks per cache set; must be a power of 2. (A cache with associativity 1 is said to be "direct-mapped.")

• blocksize (default 8), number of bytes per cache block; must be a power of 2, at least equal to the granularity, and at most equal to 8192. The blocksize of ITcache and DTcache must be 8.

• setsize (default 1), number of sets of cache blocks; must be a power of 2. (A cache with set size 1 is said to be "fully associative.")

• granularity (default 8), number of bytes per "dirty bit," used to remember which items of data have changed since they were read from memory; must be a power of 2 and at least 8. The granularity must be 8 if writeallocate is 0.

• victimsize (default 0), number of cache blocks in the victim buffer, which holds blocks removed from the main cache sets; must be zero or a power of 2.

• writeback (default 0), is 1 in a "write-back" cache, which holds dirty data as long as possible; is 0 in a "write-through" cache, which cleans all data as soon as possible.

• writeallocate (default 0), is 1 in a "write-allocate" cache, which remembers all recently written data; is 0 in a "write-around" cache, which doesn't make space for newly written data that fails to hit an existing cache block.

• accesstime (default 1), number of cycles to query the cache; must be ≥ 1. (Hits in the S-cache actually require twice the accesstime, once to query the tag and once to transmit the data.)

• copyintime (default 1), number of cycles to move a cache block from its input buffer into the cache proper; must be ≥ 1.

• copyouttime (default 1), number of cycles to move a cache block from the cache proper to its output buffer; must be ≥ 1.

• ports (default 1), number of processes that can simultaneously query the cache; must be ≥ 1.

The ⟨policy⟩ parameter should be nonempty only on cache specifications for parameters associativity and victimsize. If no replacement policy is specified, random is the default. All four policies are equivalent when the associativity or victimsize is 1; pseudolru is equivalent to lru when the associativity or victimsize is 2.

The granularity, writeback, writeallocate, and copyouttime parameters affect the performance only of the D-cache and S-cache; the other three caches are read-only, so they never need to write their data.

The ports parameter affects the performance of the D-cache and DT-cache, and (if the PREGO command is used) the performance of the I-cache and IT-cache. The S-cache accommodates only one process at a time, regardless of the number of specified ports.

Only the translation caches (the IT-cache and DT-cache) are present by default. But if any specifications are given for, say, an I-cache, all of the unspecified I-cache parameters take their default values.

The existence of an S-cache (secondary cache) implies the existence of both I-cache and D-cache (primary caches for instructions and data). The block size of the secondary cache must not be less than the block size of the primary caches. The secondary cache must have the same granularity as the D-cache.
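The constraints on cache parameters stated in section 4 can be collected into a few predicates. The following is only a sketch under assumptions: the struct `cache_geom` and the function names are hypothetical, and the simulator's real cache structure (defined in MMIX-PIPE) holds much more. The capacity formula simply multiplies blocks per set, bytes per block, and number of sets, ignoring the victim buffer.

```c
#include <stdbool.h>

/* Hypothetical summary of one cache's geometry (not the simulator's struct). */
typedef struct {
    int associativity;  /* cache blocks per set */
    int blocksize;      /* bytes per cache block */
    int setsize;        /* number of sets */
    int granularity;    /* bytes per dirty bit */
} cache_geom;

static bool is_power_of_two(int n) { return n > 0 && (n & (n - 1)) == 0; }

/* Bytes of data held in the main cache sets. */
static long cache_capacity(const cache_geom *c)
{
    return (long)c->associativity * c->blocksize * c->setsize;
}

/* Per-cache rules: powers of 2, 8 <= granularity <= blocksize <= 8192. */
static bool geometry_ok(const cache_geom *c)
{
    return is_power_of_two(c->associativity)
        && is_power_of_two(c->blocksize)
        && is_power_of_two(c->setsize)
        && is_power_of_two(c->granularity)
        && c->granularity >= 8
        && c->blocksize >= c->granularity
        && c->blocksize <= 8192;
}

/* Cross-cache rules for a secondary cache s with primaries i and d. */
static bool scache_compatible(const cache_geom *s, const cache_geom *i,
                              const cache_geom *d)
{
    return s->blocksize >= i->blocksize
        && s->blocksize >= d->blocksize
        && s->granularity == d->granularity;
}
```

For the D-cache of the example in section 1 (associativity 4, blocksize 1024, default setsize 1) the capacity computed this way is 4096 bytes.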

5. A ⟨pipe spec⟩ governs the execution time of potentially slow operations.

⟨pipe spec⟩ → ⟨operation⟩⟨pipeline times⟩
⟨pipeline times⟩ → ⟨decimal value⟩ | ⟨pipeline times⟩⟨decimal value⟩

Here the ⟨operation⟩ is one of the following:

• mul0 through mul8 (default 10); the values for mulj refer to products in which the second operand is less than 2^{8j}, where j is as small as possible. Thus, for example, mul1 applies to nonzero one-byte multipliers.

• div (default 60); this applies to integer division, signed and unsigned.

• sh (default 1); this applies to left and right shifts, signed and unsigned.

• mux (default 1); the multiplex operator.

• sadd (default 1); the sideways addition operator.

• mor (default 1); the boolean matrix multiplication operators MOR and MXOR.

• fadd (default 4); floating point addition and subtraction.

• fmul (default 4); floating point multiplication.

• fdiv (default 40); floating point division.

• fsqrt (default 40); floating point square root.

• fint (default 4); floating point integerization.

• fix (default 2); conversion from floating to fixed, signed and unsigned.

• flot (default 2); conversion from fixed to floating, signed and unsigned.

• feps (default 4); floating comparison with respect to epsilon.

In each case one can specify a sequence of pipeline stages, with a positive number of cycles to be spent in each stage. For example, a specification like `fmul 3 1' would say that a functional unit that supports FMUL takes a total of four cycles to compute the floating point product in two stages; it can start working on a second product after three cycles have gone by.

If a floating point operation has a denormal input, denin is added to the time for the first stage. If a floating point operation has a denormal result, denout is added to the time for the last stage.
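The timing arithmetic of section 5 can be checked with a short sketch. The function names and the idea of passing the denormal penalties as explicit arguments (zero when they do not apply) are assumptions of this illustration, not the simulator's interface.

```c
#include <stddef.h>

/* Total cycles for one operation to pass through all n pipeline stages.
   denin_extra and denout_extra are the penalties that apply to this
   particular operation (0 if its input or result is not denormal). */
static int total_latency(const int *stages, size_t n,
                         int denin_extra, int denout_extra)
{
    int total = 0;
    for (size_t k = 0; k < n; k++) total += stages[k];
    return total + denin_extra + denout_extra;
}

/* Cycles spent in the first stage, after which the unit can accept
   another operation of the same kind. */
static int first_stage_time(const int *stages, int denin_extra)
{
    return stages[0] + denin_extra;
}
```

With the stages {3, 1} of `fmul 3 1' the total is 4 and the first stage frees up after 3 cycles, matching the text; with `div 40 30 20' the total is 90.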

6. The fourth and final kind of specification defines a functional unit:

⟨functional spec⟩ → unit ⟨name⟩⟨64 hexadecimal digits⟩

The symbolic name should be at most fifteen characters long. The 64 hexadecimal digits contain 256 bits, with `1' for each supported opcode; the most significant (leftmost) bit is for opcode 0 (TRAP), and the least significant bit is for opcode 255 (TRIP).

For example, we can define a load/store unit (which handles register/memory operations), a multiplication unit (which handles fixed and floating point multiplication), a boolean unit (which handles only bitwise operations), and a more general arithmetic-logical unit, as follows:

unit LSU 00000000000000000000000000000000fffffffcfffffffc0000000000000000

unit MUL 000080f000000000000000000000000000000000000000000000000000000000

unit BIT 000000000000000000000000000000000000000000000000ffff00ff00ff0000

unit ALU f0000000ffffffffffffffffffffffff0000000300000003ffffffffffffffff
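The bit numbering of these 64-digit strings can be made concrete with a sketch that packs the digits into eight 32-bit words (as section 25 does with the ops array) and then tests a single opcode. The names `parse_unit_bits` and `unit_supports` are hypothetical, and `uint32_t` stands in for the simulator's tetra type.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Pack 64 hex digits into eight words; ops[0] covers opcodes 0..31. */
static bool parse_unit_bits(const char *hex, uint32_t ops[8])
{
    uint32_t word = 0;
    if (strlen(hex) != 64) return false;
    for (int j = 0; j < 64; j++) {
        int d = hex_value(hex[j]);
        if (d < 0) return false;
        word = (word << 4) | (uint32_t)d;
        if ((j & 7) == 7) { ops[j >> 3] = word; word = 0; }
    }
    return true;
}

/* The leftmost bit of ops[0] is opcode 0; the rightmost bit of ops[7] is 255. */
static bool unit_supports(const uint32_t ops[8], unsigned opcode)
{
    return (ops[opcode >> 5] >> (31 - (opcode & 31))) & 1u;
}
```

Under this numbering the ODD unit of section 1 (all digits 5) supports opcodes 1, 3, ..., 255 and rejects opcode 0.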

The order in which units are specified is important, because MMIX's dispatcher will try to match each instruction with the first functional unit that supports its opcode. Therefore it is best to list more specialized units (like the BIT unit in this example) before more general ones; this lets the specialized units have first chance at the instructions they can handle.

There can be any number of functional units, having possibly identical specifications. One should, however, give each unit a unique name (e.g., ALU1 and ALU2 if there are two arithmetic-logical units), since these names are used in diagnostic messages. Opcodes that aren't supported by any specified unit will cause an emulation trap.
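The first-match rule can be sketched as a linear search over the units in the order they were specified. This is a simplified illustration, not the dispatcher of MMIX-PIPE: the struct `unit_desc` and the function `first_matching_unit` are hypothetical, and a real dispatcher presumably also has to consider whether a unit is currently busy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a functional unit: a name and 256 opcode bits. */
typedef struct {
    const char *name;
    uint32_t ops[8];   /* leftmost bit of ops[0] is opcode 0 */
} unit_desc;

/* Index of the first unit whose bit for opcode is set, or -1 if none
   (the case that the text says leads to an emulation trap). */
static int first_matching_unit(const unit_desc *units, size_t count,
                               unsigned opcode)
{
    for (size_t k = 0; k < count; k++)
        if ((units[k].ops[opcode >> 5] >> (31 - (opcode & 31))) & 1u)
            return (int)k;
    return -1;
}
```

If a general unit were listed ahead of a specialized one, the search would never reach the specialized unit for opcodes that both support, which is why the text recommends listing specialized units first.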

7. Full details about the significance of all these parameters can be found in the mmix-pipe module, which defines and discusses the data structures that need to be configured and initialized.

Of course the specifications in a configuration file needn't make any sense, nor need they be practically achievable. We could, for example, specify a unit that handles only the two opcodes NXOR and DIVUI; we could specify 1-cycle division but pipelined 100-cycle shifts, or 1-cycle memory access but 100-cycle cache access. We could create a thousand rename registers and issue a hundred instructions per cycle, etc. Some combinations of parameters are clearly ridiculous.

But there remain a huge number of possibilities of interest, especially as technology continues to evolve. By experimenting with configurations that are extreme by present-day standards, we can see how much might be gained if the corresponding hardware could be built economically.

8. Basic input/output. Let's get ready to program the MMIX_config subroutine by building some simple infrastructure. First we need some macros to print error messages.

#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define errprint3(f, a, b, c) fprintf(stderr, f, a, b, c)
#define panic(x) { x; errprint0("!\n"); exit(-1); }

9. And we need a place to look at the input.

#define BUF_SIZE 100 /* we don't need long lines */

⟨Global variables 9⟩ ≡
  FILE *config_file; /* input comes from here */
  char buffer[BUF_SIZE]; /* input lines go here */
  char token[BUF_SIZE]; /* and tokens are copied to here */
  char *buf_pointer = buffer; /* this is our current position */
  bool token_prescanned; /* does token contain the next token already? */

See also sections 15 and 28.

This code is used in section 38.

bool = enum, MMIX-PIPE §11.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
fprintf: int (), <stdio.h>.
MMIX_config: void (), §38.
stderr: FILE *, <stdio.h>.

10. The get_token routine copies the next token of input into the token buffer. After the input has ended, a final `end' is appended.

⟨Subroutines 10⟩ ≡
  static void get_token ARGS((void));
  static void get_token() /* set token to the next token of the configuration file */
  {
    register char *p, *q;
    if (token_prescanned) {
      token_prescanned = false; return;
    }
    while (1) { /* scan past white space */
      if (*buf_pointer == '\0' || *buf_pointer == '\n' || *buf_pointer == '%') {
        if (!fgets(buffer, BUF_SIZE, config_file)) {
          strcpy(token, "end"); return;
        }
        if (strlen(buffer) == BUF_SIZE - 1 && buffer[BUF_SIZE - 2] != '\n')
          panic(errprint1("config file line too long: `%s...'", buffer));
        buf_pointer = buffer;
      } else if (!isspace(*buf_pointer)) break;
      else buf_pointer++;
    }
    for (p = buf_pointer, q = token; !isspace(*p) && *p != '%'; p++, q++) *q = *p;
    buf_pointer = p; *q = '\0';
    return;
  }

See also sections 11, 16, 22, 23, 30, and 31.

This code is used in section 38.

11. The get_int routine is called when we wish to input a decimal value. It returns -1 if the next token isn't a valid decimal integer.

⟨Subroutines 10⟩ +≡
  static int get_int ARGS((void));
  static int get_int()
  {
    int v;
    get_token();
    if (sscanf(token, "%d", &v) != 1) return -1;
    return v;
  }
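Because `sscanf` with `%d` stops at the first character that cannot belong to an integer, a token such as `12abc` is read as 12 by the routine above, and a token such as `-5` yields a negative value that callers cannot tell apart from the error code. A stricter variant could be sketched with `strtol`; the name `parse_decimal` is hypothetical, and this is an alternative for illustration rather than the method MMIXware uses.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Return the value of token if it consists entirely of decimal digits
   and fits in an int; otherwise return -1. */
static int parse_decimal(const char *token)
{
    char *end;
    long v;

    if (*token < '0' || *token > '9') return -1;  /* empty, sign, or letter */
    errno = 0;
    v = strtol(token, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > INT_MAX) return -1;
    return (int)v;
}
```

A reader built on this variant would treat `12abc` as a non-number, so in a pipe spec it would end the list of stage times and then be reported as a specification that cannot start with that token.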

12. A simple data structure makes it fairly easy to deal with parameter/value specifications.

⟨Type definitions 12⟩ ≡
  typedef struct {
    char name[20]; /* symbolic name */
    int *v; /* internal name */
    int defval; /* default value */
    int minval, maxval; /* minimum and maximum legal values */
    bool power_of_two; /* must it be a power of two? */
  } pv_spec;

See also sections 13 and 14.

This code is used in section 38.

13. Cache parameters are a bit more difficult, but still not bad.

⟨Type definitions 12⟩ +≡
  typedef enum {
    assoc, blksz, setsz, gran, vctsz, wrb, wra, acctm, citm, cotm, prts
  } c_param;

  typedef struct {
    char name[20]; /* symbolic name */
    c_param v; /* internal code */
    int defval; /* default value */
    int minval, maxval; /* minimum and maximum legal values */
    bool power_of_two; /* must it be a power of two? */
  } cpv_spec;

14. Operation codes are the easiest of all.

⟨Type definitions 12⟩ +≡
  typedef struct {
    char name[8]; /* symbolic name */
    internal_opcode v; /* internal code */
    int defval; /* default value */
  } op_spec;

ARGS = macro (), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buf_pointer: char *, §9.
BUF_SIZE = 100, §9.
buffer: char [], §9.
config_file: FILE *, §9.
errprint1 = macro (), §8.
false = 0, MMIX-PIPE §11.
fgets: char *(), <stdio.h>.
internal_opcode = enum, MMIX-PIPE §49.
isspace: int (), <ctype.h>.
panic = macro (), §8.
sscanf: int (), <stdio.h>.
strcpy: char *(), <string.h>.
strlen: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.

15. Most of the parameters are external variables that are declared in the header file mmix-pipe.h; but some are private to this module. Here we define the main tables used below.

⟨Global variables 9⟩ +≡
  int fetch_buf_size, write_buf_size, reorder_buf_size, mem_bus_bytes, hardware_PT;
  int max_cycs = 60;
  pv_spec PV[] = {
    {"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},
    {"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},
    {"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},
    {"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},
    {"memslots", &max_mem_slots, 1, 1, INT_MAX, false},
    {"localregs", &lring_size, 256, 256, 1024, true},
    {"fetchmax", &fetch_max, 2, 1, INT_MAX, false},
    {"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},
    {"peekahead", &peekahead, 1, 0, INT_MAX, false},
    {"commitmax", &commit_max, 1, 1, INT_MAX, false},
    {"fremmax", &frem_max, 1, 1, INT_MAX, false},
    {"denin", &denin_penalty, 1, 0, INT_MAX, false},
    {"denout", &denout_penalty, 1, 0, INT_MAX, false},
    {"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},
    {"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},
    {"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},
    {"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},
    {"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},
    {"branchpredictbits", &bp_n, 0, 0, 8, false},
    {"branchaddressbits", &bp_a, 0, 0, 32, false},
    {"branchhistorybits", &bp_b, 0, 0, 32, false},
    {"branchdualbits", &bp_c, 0, 0, 32, false},
    {"hardwarepagetable", &hardware_PT, 1, 0, 1, false},
    {"disablesecurity", (int *) &security_disabled, 0, 0, 1, false},
    {"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},
    {"hashprime", &hash_prime, 2009, 2, INT_MAX, false}};

  cpv_spec CPV[] = {
    {"associativity", assoc, 1, 1, INT_MAX, true},
    {"blocksize", blksz, 8, 8, 8192, true},
    {"setsize", setsz, 1, 1, INT_MAX, true},
    {"granularity", gran, 8, 8, 8192, true},
    {"victimsize", vctsz, 0, 0, INT_MAX, true},
    {"writeback", wrb, 0, 0, 1, false},
    {"writeallocate", wra, 0, 0, 1, false},
    {"accesstime", acctm, 1, 1, INT_MAX, false},
    {"copyintime", citm, 1, 1, INT_MAX, false},
    {"copyouttime", cotm, 1, 1, INT_MAX, false},
    {"ports", prts, 1, 1, INT_MAX, false}};

  op_spec OP[] = {
    {"mul0", mul0, 10}, {"mul1", mul1, 10}, {"mul2", mul2, 10}, {"mul3", mul3, 10},
    {"mul4", mul4, 10}, {"mul5", mul5, 10}, {"mul6", mul6, 10}, {"mul7", mul7, 10},
    {"mul8", mul8, 10},
    {"div", div, 60}, {"sh", sh, 1}, {"mux", mux, 1}, {"sadd", sadd, 1}, {"mor", mor, 1},
    {"fadd", fadd, 4}, {"fmul", fmul, 4}, {"fdiv", fdiv, 40}, {"fsqrt", fsqrt, 40},
    {"fint", fint, 4}, {"fix", fix, 2}, {"flot", flot, 2}, {"feps", feps, 4}};

  int PV_size, CPV_size, OP_size; /* the number of entries in PV, CPV, OP */

acctm = 7, §13.
assoc = 0, §13.
blksz = 1, §13.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
citm = 8, §13.
commit_max: int, MMIX-PIPE §59.
cotm = 9, §13.
cpv_spec = struct, §13.
denin_penalty: int, MMIX-PIPE §349.
denout_penalty: int, MMIX-PIPE §349.
dispatch_max: int, MMIX-PIPE §59.
div = 9, MMIX-PIPE §49.
fadd = 14, MMIX-PIPE §49.
false = 0, MMIX-PIPE §11.
fdiv = 16, MMIX-PIPE §49.
feps = 21, MMIX-PIPE §49.
fetch_max: int, MMIX-PIPE §59.
fint = 18, MMIX-PIPE §49.
fix = 19, MMIX-PIPE §49.
flot = 20, MMIX-PIPE §49.
fmul = 15, MMIX-PIPE §49.
frem_max: int, MMIX-PIPE §349.
fsqrt = 17, MMIX-PIPE §49.
gran = 3, §13.
hash_prime: int, MMIX-PIPE §207.
holding_time: int, MMIX-PIPE §247.
INT_MAX = macro, <limits.h>.
lring_size: int, MMIX-PIPE §86.
max_mem_slots: int, MMIX-PIPE §86.
max_rename_regs: int, MMIX-PIPE §86.
mem_addr_time: int, MMIX-PIPE §214.
mem_chunks_max: int, MMIX-PIPE §207.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
mor = 13, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul1 = 1, MMIX-PIPE §49.
mul2 = 2, MMIX-PIPE §49.
mul3 = 3, MMIX-PIPE §49.
mul4 = 4, MMIX-PIPE §49.
mul5 = 5, MMIX-PIPE §49.
mul6 = 6, MMIX-PIPE §49.
mul7 = 7, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
mux = 11, MMIX-PIPE §49.
op_spec = struct, §14.
peekahead: int, MMIX-PIPE §59.
prts = 10, §13.
pv_spec = struct, §12.
sadd = 12, MMIX-PIPE §49.
security_disabled: bool, MMIX-PIPE §66.
setsz = 2, §13.
sh = 10, MMIX-PIPE §49.
true = 1, MMIX-PIPE §11.
vctsz = 4, §13.
wra = 6, §13.
wrb = 5, §13.

16. The new_cache routine creates a cache structure with default values. (These default values are "hard-wired" into the program, not actually read from the CPV table.)

⟨Subroutines 10⟩ +≡
  static cache *new_cache ARGS((char *));
  static cache *new_cache(name)
    char *name;
  {
    register cache *c = (cache *) calloc(1, sizeof(cache));
    if (!c) panic(errprint1("Can't allocate %s", name));
    c->aa = 1; /* default associativity, should equal CPV[0].defval */
    c->bb = 8; /* default blocksize */
    c->cc = 1; /* default setsize */
    c->gg = 8; /* default granularity */
    c->vv = 0; /* default victimsize */
    c->repl = random; /* default replacement policy */
    c->vrepl = random; /* default victim replacement policy */
    c->mode = 0; /* default mode is write-through and write-around */
    c->access_time = c->copy_in_time = c->copy_out_time = 1;
    c->filler.ctl = &(c->filler_ctl);
    c->filler_ctl.ptr_a = (void *) c;
    c->filler_ctl.go.o.l = 4;
    c->flusher.ctl = &(c->flusher_ctl);
    c->flusher_ctl.ptr_a = (void *) c;
    c->flusher_ctl.go.o.l = 4;
    c->ports = 1;
    c->name = name;
    return c;
  }

17. ⟨Initialize to defaults 17⟩ ≡
  PV_size = (sizeof PV) / sizeof(pv_spec);
  CPV_size = (sizeof CPV) / sizeof(cpv_spec);
  OP_size = (sizeof OP) / sizeof(op_spec);
  ITcache = new_cache("ITcache");
  DTcache = new_cache("DTcache");
  Icache = Dcache = Scache = NULL;
  for (j = 0; j < PV_size; j++) *(PV[j].v) = PV[j].defval;
  for (j = 0; j < OP_size; j++) {
    pipe_seq[OP[j].v][0] = OP[j].defval;
    pipe_seq[OP[j].v][1] = 0; /* one stage */
  }

This code is used in section 38.

18. Reading the specs. Before we're ready to process the configuration file, we need to count the number of functional units, so that we know how much space to allocate for them.

A special background unit is always provided, just to make sure that TRAP and TRIP instructions are handled by somebody.

⟨Count and allocate the functional units 18⟩ ≡
  funit_count = 0;
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "unit") == 0) {
      funit_count++;
      get_token(); get_token(); /* a unit might be named unit or end */
    }
  }
  funit = (func *) calloc(funit_count + 1, sizeof(func));
  if (!funit) panic(errprint0("Can't allocate the functional units"));
  strcpy(funit[funit_count].name, "%%");
  funit[funit_count].ops[0] = 0x80000000; /* TRAP */
  funit[funit_count].ops[7] = 0x1; /* TRIP */

This code is used in section 38.

aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
ARGS = macro (), MMIX-PIPE §6.
bb: int, MMIX-PIPE §167.
cache = struct, MMIX-PIPE §167.
calloc: void *(), <stdlib.h>.
cc: int, MMIX-PIPE §167.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
cpv_spec = struct, §13.
ctl: control *, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
defval: int, §14.
defval: int, §12.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
filler: coroutine, MMIX-PIPE §167.
filler_ctl: control, MMIX-PIPE §167.
flusher: coroutine, MMIX-PIPE §167.
flusher_ctl: control, MMIX-PIPE §167.
func = struct, MMIX-PIPE §76.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
go = 72, MMIX-PIPE §49.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
l: tetra, MMIX-PIPE §17.
mode: int, MMIX-PIPE §167.
name: char *, MMIX-PIPE §167.
name: char [], MMIX-PIPE §76.
o: octa, MMIX-PIPE §40.
OP: op_spec [], §15.
OP_size: int, §15.
op_spec = struct, §14.
ops: tetra [], MMIX-PIPE §76.
panic = macro (), §8.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
ptr_a: void *, MMIX-PIPE §44.
PV: pv_spec [], §15.
PV_size: int, §15.
pv_spec = struct, §12.
random = 0, MMIX-PIPE §164.
repl: replace_policy, MMIX-PIPE §167.
Scache: cache *, MMIX-PIPE §168.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
token: char [], §9.
v: internal_opcode, §14.
v: int *, §12.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.

19. Now we can read the specifications and obey them. This program doesn't bother to be very tolerant of errors, nor does it try to be very efficient.

Incidentally, the specifications don't have to be broken into individual lines in any meaningful way. We simply read them token by token.

⟨Record all the specs 19⟩ ≡
  rewind(config_file);
  funit_count = 0;
  token[0] = '\0';
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "end") == 0) break;
    ⟨If token is a parameter name, process a PV spec 20⟩;
    ⟨If token is a cache name, process a cache spec 21⟩;
    ⟨If token is an operation name, process a pipe spec 24⟩;
    if (strcmp(token, "unit") == 0) ⟨Process a functional spec 25⟩;
    panic(errprint1("Configuration syntax error: Specification can't start with `%s'", token));
  }

This code is used in section 38.

20. ⟨If token is a parameter name, process a PV spec 20⟩ ≡
  for (j = 0; j < PV_size; j++)
    if (strcmp(token, PV[j].name) == 0) {
      n = get_int();
      if (n < PV[j].minval)
        panic(errprint2("Configuration error: %s must be >= %d", PV[j].name, PV[j].minval));
      if (n > PV[j].maxval)
        panic(errprint2("Configuration error: %s must be <= %d", PV[j].name, PV[j].maxval));
      if (PV[j].power_of_two && (n & (n - 1)))
        panic(errprint1("Configuration error: %s must be a power of 2", PV[j].name));
      *(PV[j].v) = n;
      break;
    }
  if (j < PV_size) continue;

This code is used in section 19.
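The three checks in section 20 (lower bound, upper bound, power of two) can be exercised in isolation with a reduced version of the table. In this sketch the struct `range_spec`, the result codes, and the three-row `sample_table` are hypothetical simplifications; the rows copy their limits from the PV table of section 15.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int minval, maxval;   /* legal range, inclusive */
    bool power_of_two;    /* must the value be a power of two? */
} range_spec;

typedef enum {
    CHECK_OK, CHECK_UNKNOWN, CHECK_TOO_SMALL, CHECK_TOO_LARGE, CHECK_NOT_POW2
} check_result;

static const range_spec sample_table[] = {
    {"localregs", 256, 1024, true},
    {"membusbytes", 8, INT_MAX, true},
    {"branchpredictbits", 0, 8, false},
};
static const size_t sample_size = sizeof sample_table / sizeof sample_table[0];

/* Look name up in the table and apply the same tests as section 20. */
static check_result check_value(const range_spec *table, size_t size,
                                const char *name, int n)
{
    for (size_t j = 0; j < size; j++) {
        if (strcmp(name, table[j].name) != 0) continue;
        if (n < table[j].minval) return CHECK_TOO_SMALL;
        if (n > table[j].maxval) return CHECK_TOO_LARGE;
        /* n & (n - 1) is zero exactly when n has at most one bit set,
           so the value 0 also passes this test */
        if (table[j].power_of_two && (n & (n - 1)) != 0) return CHECK_NOT_POW2;
        return CHECK_OK;
    }
    return CHECK_UNKNOWN;
}
```

Since the bit test accepts zero, a table entry with minimum 0 and the power-of-two flag set (as victimsize has in the CPV table) admits "zero or a power of 2" without any special case.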

21. ⟨If token is a cache name, process a cache spec 21⟩ ≡
  if (strcmp(token, "ITcache") == 0) {
    pcs(ITcache); continue;
  } else if (strcmp(token, "DTcache") == 0) {
    pcs(DTcache); continue;
  } else if (strcmp(token, "Icache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    pcs(Icache); continue;
  } else if (strcmp(token, "Dcache") == 0) {
    if (!Dcache) Dcache = new_cache("Dcache");
    pcs(Dcache); continue;
  } else if (strcmp(token, "Scache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    if (!Dcache) Dcache = new_cache("Dcache");
    if (!Scache) Scache = new_cache("Scache");
    pcs(Scache); continue;
  }

This code is used in section 19.

22. ⟨Subroutines 10⟩ +≡
  static void ppol ARGS((replace_policy *));
  static void ppol(rr) /* subroutine to scan for a replacement policy */
    replace_policy *rr;
  {
    get_token();
    if (strcmp(token, "random") == 0) *rr = random;
    else if (strcmp(token, "serial") == 0) *rr = serial;
    else if (strcmp(token, "pseudolru") == 0) *rr = pseudo_lru;
    else if (strcmp(token, "lru") == 0) *rr = lru;
    else token_prescanned = true; /* oops, we should rescan that token */
  }

ARGS = macro (), MMIX-PIPE §6.
config_file: FILE *, §9.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
funit_count: int, MMIX-PIPE §77.
get_int: static int (), §11.
get_token: static void (), §10.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
lru = 3, MMIX-PIPE §164.
maxval: int, §12.
minval: int, §12.
n: register int, §38.
name: char [], §12.
new_cache: static cache *(), §16.
panic = macro (), §8.
pcs: static void (), §23.
power_of_two: bool, §12.
pseudo_lru = 2, MMIX-PIPE §164.
PV: pv_spec [], §15.
PV_size: int, §15.
random = 0, MMIX-PIPE §164.
replace_policy = enum, MMIX-PIPE §164.
rewind: void (), <stdio.h>.
Scache: cache *, MMIX-PIPE §168.
serial = 1, MMIX-PIPE §164.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: int *, §12.

23. ⟨Subroutines 10⟩ +≡
  static void pcs ARGS((cache *));
  static void pcs(c) /* subroutine to process a cache spec */
    cache *c;
  {
    register int j, n;
    get_token();
    for (j = 0; j < CPV_size; j++)
      if (strcmp(token, CPV[j].name) == 0) break;
    if (j == CPV_size)
      panic(errprint1("Configuration syntax error: `%s' isn't a cache parameter name", token));
    n = get_int();
    if (n < CPV[j].minval)
      panic(errprint2("Configuration error: %s must be >= %d", CPV[j].name, CPV[j].minval));
    if (n > CPV[j].maxval)
      panic(errprint2("Configuration error: %s must be <= %d", CPV[j].name, CPV[j].maxval));
    if (CPV[j].power_of_two && (n & (n - 1)))
      panic(errprint1("Configuration error: %s must be power of 2", CPV[j].name));
    switch (CPV[j].v) {
    case assoc: c->aa = n; ppol(&(c->repl)); break;
    case blksz: c->bb = n; break;
    case setsz: c->cc = n; break;
    case gran: c->gg = n; break;
    case vctsz: c->vv = n; ppol(&(c->vrepl)); break;
    case wrb: c->mode = (c->mode & ~WRITE_BACK) + n * WRITE_BACK; break;
    case wra: c->mode = (c->mode & ~WRITE_ALLOC) + n * WRITE_ALLOC; break;
    case acctm: if (n > max_cycs) max_cycs = n;
      c->access_time = n; break;
    case citm: if (n > max_cycs) max_cycs = n;
      c->copy_in_time = n; break;
    case cotm: if (n > max_cycs) max_cycs = n;
      c->copy_out_time = n; break;
    case prts: c->ports = n; break;
    }
  }
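The wrb and wra cases above update one flag of the mode word without disturbing the other: the old bit is cleared with a mask, and then n times the flag value is added, which is safe because n has already been checked to be 0 or 1. A standalone sketch, using the flag values WRITE_BACK = 1 and WRITE_ALLOC = 2 that the index cites from MMIX-PIPE and a hypothetical helper name `set_flag`:

```c
#define WRITE_BACK 1
#define WRITE_ALLOC 2

/* Return mode with the given flag bit forced to n (n must be 0 or 1). */
static int set_flag(int mode, int flag, int n)
{
    return (mode & ~flag) + n * flag;
}
```

Because the bit is cleared first, repeating the same specification in a configuration file leaves the mode unchanged rather than carrying into the neighboring bit.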

24. ⟨If token is an operation name, process a pipe spec 24⟩ ≡
  for (j = 0; j < OP_size; j++)
    if (strcmp(token, OP[j].name) == 0) {
      for (i = 0; ; i++) {
        n = get_int();
        if (n < 0) break;
        if (n == 0) panic(errprint0("Configuration error: Pipeline cycles must be positive"));
        if (n > 255)
          panic(errprint0("Configuration error: Pipeline cycles must be <= 255"));
        if (n > max_cycs) max_cycs = n;
        if (i >= pipe_limit)
          panic(errprint1("Configuration error: More than %d pipeline stages", pipe_limit));
        pipe_seq[OP[j].v][i] = n;
      }
      token_prescanned = true;
      break;
    }
  if (j < OP_size) continue;

This code is used in section 19.

aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
acctm = 7, §13.
ARGS = macro (), MMIX-PIPE §6.
assoc = 0, §13.
bb: int, MMIX-PIPE §167.
blksz = 1, §13.
cache = struct, MMIX-PIPE §167.
cc: int, MMIX-PIPE §167.
citm = 8, §13.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
cotm = 9, §13.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
get_int: static int (), §11.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
gran = 3, §13.
i: register int, §38.
j: register int, §38.
max_cycs: int, §15.
maxval: int, §13.
minval: int, §13.
mode: int, MMIX-PIPE §167.
n: register int, §38.
name: char [], §13.
name: char [], §14.
OP: op_spec [], §15.
OP_size: int, §15.
panic = macro (), §8.
pipe_limit = 90, MMIX-PIPE §136.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
power_of_two: bool, §13.
ppol: static void (), §22.
prts = 10, §13.
repl: replace_policy, MMIX-PIPE §167.
setsz = 2, §13.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: c_param, §13.
v: internal_opcode, §14.
vctsz = 4, §13.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
wra = 6, §13.
wrb = 5, §13.
WRITE_ALLOC = 2, MMIX-PIPE §166.
WRITE_BACK = 1, MMIX-PIPE §166.


25. ⟨Process a functional spec 25⟩ ≡
  {
    get_token();
    if (strlen(token) > 15)
      panic(errprint1("Configuration error: `%s' is more than 15 characters long", token));
    strcpy(funit[funit_count].name, token);
    get_token();
    if (strlen(token) != 64)
      panic(errprint1("Configuration error: unit %s doesn't have 64 hex digit specs",
          funit[funit_count].name));
    for (i = j = n = 0; j < 64; j++) {
      if (token[j] >= '0' && token[j] <= '9') n = (n << 4) + (token[j] - '0');
      else if (token[j] >= 'a' && token[j] <= 'f') n = (n << 4) + (token[j] - 'a' + 10);
      else if (token[j] >= 'A' && token[j] <= 'F') n = (n << 4) + (token[j] - 'A' + 10);
      else
        panic(errprint1("Configuration error: `%c' is not a hex digit", token[j]));
      if ((j & 0x7) == 0x7) funit[funit_count].ops[i++] = n, n = 0;
    }
    funit_count++;
    continue;
  }

This code is used in section 19.
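
The conversion in section 25 can be sketched as a standalone function that packs a 64-digit hexadecimal string into eight 32-bit words, most significant digit first. The name parse_unit_spec, the use of uint32_t, and the error return are assumptions made for illustration; MMIX-CONFIG itself works directly on token and funit[funit_count].ops and calls panic on a bad digit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0 on success, -1 if spec is not 64 hex digits. */
static int parse_unit_spec(const char *spec, uint32_t ops[8])
{
    uint32_t n = 0;
    int i = 0, j;

    if (strlen(spec) != 64) return -1;
    for (j = 0; j < 64; j++) {
        char ch = spec[j];
        if (ch >= '0' && ch <= '9') n = (n << 4) + (uint32_t)(ch - '0');
        else if (ch >= 'a' && ch <= 'f') n = (n << 4) + (uint32_t)(ch - 'a' + 10);
        else if (ch >= 'A' && ch <= 'F') n = (n << 4) + (uint32_t)(ch - 'A' + 10);
        else return -1;
        /* every eighth digit completes one 32-bit word */
        if ((j & 0x7) == 0x7) { ops[i++] = n; n = 0; }
    }
    return 0;
}
```

Each group of eight digits fills one word, so the first digit of the string lands in the top four bits of ops[0].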


26. Checking and allocating. The battle is only half over when we've absorbed all the data of the configuration file. We still must check for interactions between different quantities, and we must allocate space for cache blocks, coroutines, etc.

One of the most difficult tasks facing us is to determine the maximum number of pipeline stages needed by each functional unit. Let's tackle that first.

⟨Allocate coroutines in each functional unit 26⟩ ≡
  ⟨Build table of pipeline stages needed for each opcode 27⟩;
  for (j = 0; j <= funit_count; j++) {
    ⟨Determine the number of stages, n, needed by funit[j] 29⟩;
    funit[j].k = n;
    funit[j].co = (coroutine *) calloc(n, sizeof(coroutine));
    for (i = 0; i < n; i++) {
      funit[j].co[i].name = funit[j].name;
      funit[j].co[i].stage = i + 1;
    }
  }

This code is used in section 38.

27. ⟨Build table of pipeline stages needed for each opcode 27⟩ ≡
  for (j = div; j <= max_pipe_op; j++) int_stages[j] = strlen(pipe_seq[j]);
  for ( ; j <= max_real_command; j++) int_stages[j] = 1;
  for (j = mul0, n = 0; j <= mul8; j++)
    if (strlen(pipe_seq[j]) > n) n = strlen(pipe_seq[j]);
  int_stages[mul] = n;
  int_stages[ld] = int_stages[st] = int_stages[frem] = 2;
  for (j = 0; j < 256; j++) stages[j] = int_stages[int_op[j]];

This code is used in section 26.

calloc: void *(), <stdlib.h>.
co: coroutine *, MMIX-PIPE §76.
coroutine = struct, MMIX-PIPE §23.
div = 9, MMIX-PIPE §49.
errprint1 = macro (), §8.
frem = 25, MMIX-PIPE §49.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
get_token: static void (), §10.
i: register int, §38.
int_op: internal_opcode [], §28.
int_stages: int [], §28.
j: register int, §38.
k: int, MMIX-PIPE §76.
ld = 56, MMIX-PIPE §49.
max_pipe_op = feps, MMIX-PIPE §49.
max_real_command = trip, MMIX-PIPE §49.
mul = 26, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
n: register int, §38.
name: char [], MMIX-PIPE §76.
ops: tetra [], MMIX-PIPE §76.
panic = macro (), §8.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
st = 63, MMIX-PIPE §49.
stage: int, MMIX-PIPE §23.
stages: int [], §28.
strcpy: char *(), <string.h>.
strlen: int (), <string.h>.
token: char [], §9.


28. The int_op conversion table is similar to the internal_op array of the MMIX_pipe routine, but it replaces divu by div, fsub by fadd, etc.

⟨Global variables 9⟩ +≡
  internal_opcode int_op[256] = {
    trap, fcmp, funeq, funeq, fadd, fix, fadd, fix,
    flot, flot, flot, flot, flot, flot, flot, flot,
    fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
    mul, mul, mul, mul, div, div, div, div,
    add, add, addu, addu, sub, sub, subu, subu,
    addu, addu, addu, addu, addu, addu, addu, addu,
    cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
    sh, sh, sh, sh, sh, sh, sh, sh,
    br, br, br, br, br, br, br, br,
    br, br, br, br, br, br, br, br,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    cset, cset, cset, cset, cset, cset, cset, cset,
    cset, cset, cset, cset, cset, cset, cset, cset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, prego, prego, go, go,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, pushgo, pushgo,
    or, or, orn, orn, nor, nor, xor, xor,
    and, and, andn, andn, nand, nand, nxor, nxor,
    bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
    mux, mux, sadd, sadd, mor, mor, mor, mor,
    set, set, set, set, addu, addu, addu, addu,
    or, or, or, or, andn, andn, andn, andn,
    noop, noop, pushj, pushj, set, set, put, put,
    pop, resume, save, unsave, sync, noop, get, trip};
  int int_stages[max_real_command + 1];  /* stages as function of internal_opcode */
  int stages[256];  /* stages as function of mmix_opcode */

29. ⟨Determine the number of stages, n, needed by funit[j] 29⟩ ≡
  for (i = n = 0; i < 256; i++)
    if (((funit[j].ops[i >> 5] << (i & 0x1f)) & 0x80000000) && stages[i] > n)
      n = stages[i];
  if (n == 0)
    panic(errprint1("Configuration error: unit %s doesn't do anything", funit[j].name));

This code is used in section 26.
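
The bit test in section 29 treats the eight ops words as a 256-bit map whose leftmost bit corresponds to opcode 0. A minimal sketch of that test follows; the function names unit_handles and stages_needed and the uint32_t type are assumptions chosen for illustration, not names used by MMIX-CONFIG.

```c
#include <stdint.h>

/* Nonzero if the bit for opcode is set in the 256-bit map ops[0..7];
   opcode 0 is the most significant bit of ops[0]. */
static int unit_handles(const uint32_t ops[8], int opcode)
{
    return ((ops[opcode >> 5] << (opcode & 0x1f)) & 0x80000000u) != 0;
}

/* Largest stage count among the opcodes the unit handles;
   a result of 0 corresponds to the "doesn't do anything" error. */
static int stages_needed(const uint32_t ops[8], const int stages[256])
{
    int i, n = 0;
    for (i = 0; i < 256; i++)
        if (unit_handles(ops, i) && stages[i] > n) n = stages[i];
    return n;
}
```

Shifting the selected word left by the low five bits of the opcode moves the wanted bit into the sign position, where a single mask can test it.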


30. The next hardest thing on our agenda is to set up the cache structure fields that depend on the parameters. For example, although we have defined the parameter in the bb field (the block size), we also need to compute the b field (log of the block size), and we must create the cache blocks themselves.

⟨Subroutines 10⟩ +≡
  static int lg ARGS((int));
  static int lg(n)  /* compute binary logarithm */
    int n;
  {
    register int j, l;

    for (j = n, l = 0; j; j >>= 1) l++;
    return l - 1;
  }
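
To make the behavior of lg concrete, here is the same loop restated with an ANSI prototype under the assumed name lg_sketch. It counts how many right shifts empty the argument and subtracts one, so it is exact for powers of 2, rounds down otherwise, and returns -1 for an argument of 0.

```c
/* Same loop as lg, with an ANSI-style parameter list. */
static int lg_sketch(int n)
{
    int j, l;

    for (j = n, l = 0; j; j >>= 1) l++;  /* l = number of significant bits */
    return l - 1;
}
```

For a block size of 64 bytes this gives b = 6.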

add = 29, MMIX-PIPE §49.
addu = 30, MMIX-PIPE §49.
and = 37, MMIX-PIPE §49.
andn = 38, MMIX-PIPE §49.
ARGS = macro (), MMIX-PIPE §6.
b: int, MMIX-PIPE §167.
bb: int, MMIX-PIPE §167.
bdif = 48, MMIX-PIPE §49.
br = 69, MMIX-PIPE §49.
cmp = 46, MMIX-PIPE §49.
cmpu = 47, MMIX-PIPE §49.
cset = 53, MMIX-PIPE §49.
div = 9, MMIX-PIPE §49.
divu = 28, MMIX-PIPE §49.
errprint1 = macro (), §8.
fadd = 14, MMIX-PIPE §49.
fcmp = 22, MMIX-PIPE §49.
fdiv = 16, MMIX-PIPE §49.
feps = 21, MMIX-PIPE §49.
fint = 18, MMIX-PIPE §49.
fix = 19, MMIX-PIPE §49.
flot = 20, MMIX-PIPE §49.
fmul = 15, MMIX-PIPE §49.
frem = 25, MMIX-PIPE §49.
fsqrt = 17, MMIX-PIPE §49.
fsub = 24, MMIX-PIPE §49.
funeq = 23, MMIX-PIPE §49.
funit: func *, MMIX-PIPE §77.
get = 54, MMIX-PIPE §49.
go = 72, MMIX-PIPE §49.
i: register int, §38.
internal_op: internal_opcode [], MMIX-PIPE §51.
internal_opcode = enum, MMIX-PIPE §49.
j: register int, §38.
ld = 56, MMIX-PIPE §49.
max_real_command = trip, MMIX-PIPE §49.
mmix_opcode = enum, MMIX-PIPE §47.
mor = 13, MMIX-PIPE §49.
mul = 26, MMIX-PIPE §49.
mux = 11, MMIX-PIPE §49.
n: register int, §38.
name: char [], MMIX-PIPE §76.
nand = 39, MMIX-PIPE §49.
noop = 81, MMIX-PIPE §49.
nor = 36, MMIX-PIPE §49.
nxor = 41, MMIX-PIPE §49.
odif = 51, MMIX-PIPE §49.
ops: tetra [], MMIX-PIPE §76.
or = 34, MMIX-PIPE §49.
orn = 35, MMIX-PIPE §49.
panic = macro (), §8.
pbr = 70, MMIX-PIPE §49.
pop = 75, MMIX-PIPE §49.
prego = 73, MMIX-PIPE §49.
pushgo = 74, MMIX-PIPE §49.
pushj = 71, MMIX-PIPE §49.
put = 55, MMIX-PIPE §49.
resume = 76, MMIX-PIPE §49.
sadd = 12, MMIX-PIPE §49.
save = 77, MMIX-PIPE §49.
set: cacheset *, MMIX-PIPE §167.
sh = 10, MMIX-PIPE §49.
st = 63, MMIX-PIPE §49.
sub = 31, MMIX-PIPE §49.
subu = 32, MMIX-PIPE §49.
sync = 79, MMIX-PIPE §49.
tdif = 50, MMIX-PIPE §49.
trap = 82, MMIX-PIPE §49.
trip = 83, MMIX-PIPE §49.
unsave = 78, MMIX-PIPE §49.
wdif = 49, MMIX-PIPE §49.
xor = 40, MMIX-PIPE §49.
zset = 52, MMIX-PIPE §49.


31. ⟨Subroutines 10⟩ +≡
  static void alloc_cache ARGS((cache *, char *));
  static void alloc_cache(c, name)
    cache *c;
    char *name;
  {
    register int j, k;

    if (c->bb < c->gg)
      panic(errprint1("Configuration error: blocksize of %s is less than granularity", name));
    if (name[1] == 'T' && c->bb != 8)
      panic(errprint1("Configuration error: blocksize of %s must be 8", name));
    c->a = lg(c->aa);
    c->b = lg(c->bb);
    c->c = lg(c->cc);
    c->g = lg(c->gg);
    c->v = lg(c->vv);
    c->tagmask = -(1 << (c->b + c->c));
    if (c->a + c->b + c->c >= 32)
      panic(errprint1("Configuration error: %s has >= 4 gigabytes of data", name));
    if (c->gg != 8 && !(c->mode & WRITE_ALLOC))
      panic(errprint2("Configuration error: %s does write-around with granularity %d",
          name, c->gg));
    ⟨Allocate the cache sets for cache c 32⟩;
    if (c->vv) ⟨Allocate the victim cache for cache c 33⟩;
    c->inbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->inbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for inbuffer of %s", name));
    c->inbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->inbuf.data)
      panic(errprint1("Can't allocate data for inbuffer of %s", name));
    c->outbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->outbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for outbuffer of %s", name));
    c->outbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->outbuf.data)
      panic(errprint1("Can't allocate data for outbuffer of %s", name));
    if (name[0] != 'S') ⟨Allocate reader coroutines for cache c 34⟩;
  }
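
A small worked example may help with the derived fields b, c, and tagmask computed in alloc_cache. The struct cache_geom and the function make_geom below are hypothetical stand-ins, not part of MMIX-CONFIG; the sketch assumes that bb and cc are powers of 2 with b + c < 31, and that the mask is meant to keep the address bits above the block offset and set index.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three derived fields discussed here. */
typedef struct {
    int b, c;         /* log of block size, log of number of sets */
    int32_t tagmask;  /* ones above the low b + c bits */
} cache_geom;

static cache_geom make_geom(int bb, int cc)
{
    cache_geom g;
    int j;

    for (g.b = -1, j = bb; j; j >>= 1) g.b++;  /* same result as lg(bb) */
    for (g.c = -1, j = cc; j; j >>= 1) g.c++;  /* same result as lg(cc) */
    g.tagmask = -((int32_t)1 << (g.b + g.c));
    return g;
}
```

With 64-byte blocks and 256 sets, b = 6 and c = 8, so the mask clears the low 14 bits of whatever it is applied to.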

32. #define sign_bit 0x80000000

⟨Allocate the cache sets for cache c 32⟩ ≡
  c->set = (cacheset *) calloc(c->cc, sizeof(cacheset));
  if (!c->set) panic(errprint1("Can't allocate cache sets for %s", name));
  for (j = 0; j < c->cc; j++) {
    c->set[j] = (cacheblock *) calloc(c->aa, sizeof(cacheblock));
    if (!c->set[j])
      panic(errprint2("Can't allocate cache blocks for set %d of %s", j, name));
    for (k = 0; k < c->aa; k++) {
      c->set[j][k].tag.h = sign_bit;  /* invalid tag */
      c->set[j][k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->set[j][k].dirty)
        panic(errprint3("Can't allocate dirty bits for block %d of set %d of %s",
            k, j, name));
      c->set[j][k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->set[j][k].data)
        panic(errprint3("Can't allocate data for block %d of set %d of %s", k, j, name));
    }
  }

This code is used in section 31.

33. ⟨Allocate the victim cache for cache c 33⟩ ≡
  {
    c->victim = (cacheblock *) calloc(c->vv, sizeof(cacheblock));
    if (!c->victim)
      panic(errprint1("Can't allocate blocks for victim cache of %s", name));
    for (k = 0; k < c->vv; k++) {
      c->victim[k].tag.h = sign_bit;  /* invalid tag */
      c->victim[k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->victim[k].dirty)
        panic(errprint2("Can't allocate dirty bits for block %d of victim cache of %s",
            k, name));
      c->victim[k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->victim[k].data)
        panic(errprint2("Can't allocate data for block %d of victim cache of %s", k, name));
    }
  }

This code is used in section 31.

a: int, MMIX-PIPE §167.
aa: int, MMIX-PIPE §167.
ARGS = macro (), MMIX-PIPE §6.
b: int, MMIX-PIPE §167.
bb: int, MMIX-PIPE §167.
cache = struct, MMIX-PIPE §167.
calloc: void *(), <stdlib.h>.
cc: int, MMIX-PIPE §167.
data: octa *, MMIX-PIPE §167.
dirty: char *, MMIX-PIPE §167.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
errprint3 = macro (), §8.
g: int, MMIX-PIPE §167.
gg: int, MMIX-PIPE §167.
h: tetra, MMIX-PIPE §17.
inbuf: cacheblock, MMIX-PIPE §167.
lg: static int (), §30.
mode: int, MMIX-PIPE §167.
octa = struct, MMIX-PIPE §17.
outbuf: cacheblock, MMIX-PIPE §167.
panic = macro (), §8.
set: cacheset *, MMIX-PIPE §167.
tag: octa, MMIX-PIPE §167.
tagmask: int, MMIX-PIPE §167.
v: int, MMIX-PIPE §167.
victim: cacheset, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
WRITE_ALLOC = 2, MMIX-PIPE §166.


34. ⟨Allocate reader coroutines for cache c 34⟩ ≡
  {
    c->reader = (coroutine *) calloc(c->ports, sizeof(coroutine));
    if (!c->reader) panic(errprint1("Can't allocate readers for %s", name));
    for (j = 0; j < c->ports; j++) {
      c->reader[j].stage = vanish;
      c->reader[j].name = (name[0] == 'D' ? (name[1] == 'T' ? "DTreader" : "Dreader") :
          (name[1] == 'T' ? "ITreader" : "Ireader"));
    }
  }

This code is used in section 31.

35. ⟨Allocate the caches 35⟩ ≡
  alloc_cache(ITcache, "ITcache");
  ITcache->filler.name = "ITfiller"; ITcache->filler.stage = fill_from_virt;
  alloc_cache(DTcache, "DTcache");
  DTcache->filler.name = "DTfiller"; DTcache->filler.stage = fill_from_virt;
  if (Icache) {
    alloc_cache(Icache, "Icache");
    Icache->filler.name = "Ifiller"; Icache->filler.stage = fill_from_mem;
  }
  if (Dcache) {
    alloc_cache(Dcache, "Dcache");
    Dcache->filler.name = "Dfiller"; Dcache->filler.stage = fill_from_mem;
    Dcache->flusher.name = "Dflusher"; Dcache->flusher.stage = flush_to_mem;
  }
  if (Scache) {
    alloc_cache(Scache, "Scache");
    if (Scache->bb < Icache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Icache blocks"));
    if (Scache->bb < Dcache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Dcache blocks"));
    if (Scache->gg != Dcache->gg)
      panic(errprint0("Configuration error: Scache granularity differs from the Dcache"));
    Icache->filler.stage = fill_from_S;
    Dcache->filler.stage = fill_from_S; Dcache->flusher.stage = flush_to_S;
    Scache->filler.name = "Sfiller"; Scache->filler.stage = fill_from_mem;
    Scache->flusher.name = "Sflusher"; Scache->flusher.stage = flush_to_mem;
  }

This code is used in section 38.

36. Now we are nearly done. The only nontrivial task remaining is to allocate the ring of queues for coroutine scheduling; for this we need to determine the maximum waiting time that will occur between scheduler and schedulee.

⟨Allocate the scheduling queue 36⟩ ≡
  bus_words = mem_bus_bytes >> 3;
  j = (mem_read_time < mem_write_time ? mem_write_time : mem_read_time);
  n = 1;
  if (Scache && Scache->bb > n) n = Scache->bb;
  if (Icache && Icache->bb > n) n = Icache->bb;
  if (Dcache && Dcache->bb > n) n = Dcache->bb;
  n = mem_addr_time + ((int) (n + bus_words - 1) / bus_words) * j;
  if (n > max_cycs) max_cycs = n;  /* now max_cycs bounds the waiting time */
  ring_size = max_cycs + 1;
  ring = (coroutine *) calloc(ring_size, sizeof(coroutine));
  if (!ring) panic(errprint0("Can't allocate the scheduling ring"));
  {
    register coroutine *p;

    for (p = ring; p < ring + ring_size; p++) {
      p->name = "";  /* header nodes are nameless */
      p->stage = max_stage;
    }
  }

This code is used in section 38.
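
The arithmetic of section 36 can be traced with concrete numbers. The sketch below mirrors that computation in a standalone function; the struct sched_params and the name ring_size_for are hypothetical, and the sketch assumes mem_bus_bytes is at least 8 so that bus_words is nonzero.

```c
/* Hypothetical parameter bundle; field names follow the variables of section 36. */
typedef struct {
    int mem_addr_time, mem_read_time, mem_write_time, mem_bus_bytes;
    int max_block;  /* largest bb among the caches present, or 1 if none */
    int max_cycs;   /* largest delay recorded while reading the specs */
} sched_params;

static int ring_size_for(sched_params p)
{
    int bus_words = p.mem_bus_bytes >> 3;
    int j = p.mem_read_time < p.mem_write_time ? p.mem_write_time : p.mem_read_time;
    int n = p.mem_addr_time + ((p.max_block + bus_words - 1) / bus_words) * j;

    if (n > p.max_cycs) p.max_cycs = n;  /* the bound on the waiting time */
    return p.max_cycs + 1;
}
```

For example, with mem_addr_time = 4, read and write times of 5 and 7, a 16-byte bus, and a largest block size of 64, the bound is 4 + 32 * 7 = 228 cycles, so the ring gets 229 header nodes unless some other recorded delay is larger.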

alloc_cache: static void (), §31.
bb: int, MMIX-PIPE §167.
bus_words: int, MMIX-PIPE §214.
c: cache *, §31.
calloc: void *(), <stdlib.h>.
coroutine = struct, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
fill_from_mem = 95, MMIX-PIPE §129.
fill_from_S = 94, MMIX-PIPE §129.
fill_from_virt = 93, MMIX-PIPE §129.
filler: coroutine, MMIX-PIPE §167.
flush_to_mem = 97, MMIX-PIPE §129.
flush_to_S = 96, MMIX-PIPE §129.
flusher: coroutine, MMIX-PIPE §167.
gg: int, MMIX-PIPE §167.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §31.
j: register int, §38.
max_cycs: int, §15.
max_stage = 99, MMIX-PIPE §129.
mem_addr_time: int, MMIX-PIPE §214.
mem_bus_bytes: int, §15.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
n: register int, §38.
name: char *, §31.
name: char *, MMIX-PIPE §23.
panic = macro (), §8.
ports: int, MMIX-PIPE §167.
reader: coroutine *, MMIX-PIPE §167.
ring: coroutine *, MMIX-PIPE §29.
ring_size: int, MMIX-PIPE §29.
Scache: cache *, MMIX-PIPE §168.
stage: int, MMIX-PIPE §23.
vanish = 98, MMIX-PIPE §129.


37. ⟨Touch up last-minute trivia 37⟩ ≡
  if (hash_prime <= mem_chunks_max)
    panic(errprint0("Configuration error: hashprime must exceed memchunksmax"));
  mem_hash = (chunknode *) calloc(hash_prime + 1, sizeof(chunknode));
  if (!mem_hash) panic(errprint0("Can't allocate the hash table"));
  mem_hash[0].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0"));
  mem_hash[hash_prime].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk"));
  mem_chunks = 1;
  fetch_bot = (fetch *) calloc(fetch_buf_size + 1, sizeof(fetch));
  if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer"));
  fetch_top = fetch_bot + fetch_buf_size;
  reorder_bot = (control *) calloc(reorder_buf_size + 1, sizeof(control));
  if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer"));
  reorder_top = reorder_bot + reorder_buf_size;
  wbuf_bot = (write_node *) calloc(write_buf_size + 1, sizeof(write_node));
  if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer"));
  wbuf_top = wbuf_bot + write_buf_size;
  if (bp_n == 0) bp_table = NULL;
  else {  /* a branch prediction table is desired */
    if (bp_a + bp_b + bp_c >= 32)
      panic(errprint0("Configuration error: Branch table has >= 4 gigabytes of data"));
    bp_table = (char *) calloc(1 << (bp_a + bp_b + bp_c), sizeof(char));
    if (!bp_table) panic(errprint0("Can't allocate the branch table"));
  }
  l = (specnode *) calloc(lring_size, sizeof(specnode));
  if (!l) panic(errprint0("Can't allocate local registers"));
  j = bus_words;
  if (Icache && Icache->bb > j) j = Icache->bb;
  fetched = (octa *) calloc(j, sizeof(octa));
  if (!fetched) panic(errprint0("Can't allocate prefetch buffer"));
  dispatch_stat = (int *) calloc(dispatch_max + 1, sizeof(int));
  if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts"));
  no_hardware_PT = 1 - hardware_PT;

This code is used in section 38.


38. Putting it all together. Here then is the desired configuration subroutine.

  #include <stdio.h>   /* fopen, fgets, sscanf, rewind */
  #include <stdlib.h>  /* calloc, exit */
  #include <ctype.h>   /* isspace */
  #include <string.h>  /* strcpy, strlen, strcmp */
  #include <limits.h>  /* INT_MAX */
  #include "mmix-pipe.h"
  ⟨Type definitions 12⟩
  ⟨Global variables 9⟩
  ⟨Subroutines 10⟩
  void MMIX_config(filename)
    char *filename;
  {
    register int i, j, n;

    config_file = fopen(filename, "r");
    if (!config_file)
      panic(errprint1("Can't open configuration file %s", filename));
    ⟨Initialize to defaults 17⟩;
    ⟨Count and allocate the functional units 18⟩;
    ⟨Record all the specs 19⟩;
    ⟨Allocate coroutines in each functional unit 26⟩;
    ⟨Allocate the caches 35⟩;
    ⟨Allocate the scheduling queue 36⟩;
    ⟨Touch up last-minute trivia 37⟩;
  }

bb: int, MMIX-PIPE §167.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
bp_table: char *, MMIX-PIPE §150.
bus_words: int, MMIX-PIPE §214.
calloc: void *(), <stdlib.h>.
chunk: octa *, MMIX-PIPE §206.
chunknode = struct, MMIX-PIPE §206.
config_file: FILE *, §9.
control = struct, MMIX-PIPE §44.
dispatch_max: int, MMIX-PIPE §59.
dispatch_stat: int *, MMIX-PIPE §66.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
exit: void (), <stdlib.h>.
fetch = struct, MMIX-PIPE §68.
fetch_bot: fetch *, MMIX-PIPE §69.
fetch_buf_size: int, §15.
fetch_top: fetch *, MMIX-PIPE §69.
fetched: octa *, MMIX-PIPE §284.
fgets: char *(), <stdio.h>.
fopen: FILE *(), <stdio.h>.
hardware_PT: int, §15.
hash_prime: int, MMIX-PIPE §207.
Icache: cache *, MMIX-PIPE §168.
INT_MAX = macro, <limits.h>.
isspace: int (), <ctype.h>.
l: specnode *, MMIX-PIPE §86.
lring_size: int, MMIX-PIPE §86.
mem_chunks: int, MMIX-PIPE §207.
mem_chunks_max: int, MMIX-PIPE §207.
mem_hash: chunknode *, MMIX-PIPE §207.
no_hardware_PT: bool, MMIX-PIPE §242.
octa = struct, MMIX-PIPE §17.
panic = macro (), §8.
reorder_bot: control *, MMIX-PIPE §60.
reorder_buf_size: int, §15.
reorder_top: control *, MMIX-PIPE §60.
rewind: void (), <stdio.h>.
specnode = struct, MMIX-PIPE §40.
sscanf: int (), <stdio.h>.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
strlen: int (), <string.h>.
wbuf_bot: write_node *, MMIX-PIPE §247.
wbuf_top: write_node *, MMIX-PIPE §247.
write_buf_size: int, §15.
write_node = struct, MMIX-PIPE §246.


39. Names of the sections.

⟨Allocate coroutines in each functional unit 26⟩ Used in section 38.
⟨Allocate reader coroutines for cache c 34⟩ Used in section 31.
⟨Allocate the cache sets for cache c 32⟩ Used in section 31.
⟨Allocate the caches 35⟩ Used in section 38.
⟨Allocate the scheduling queue 36⟩ Used in section 38.
⟨Allocate the victim cache for cache c 33⟩ Used in section 31.
⟨Build table of pipeline stages needed for each opcode 27⟩ Used in section 26.
⟨Count and allocate the functional units 18⟩ Used in section 38.
⟨Determine the number of stages, n, needed by funit[j] 29⟩ Used in section 26.
⟨Global variables 9, 15, 28⟩ Used in section 38.
⟨If token is a cache name, process a cache spec 21⟩ Used in section 19.
⟨If token is a parameter name, process a PV spec 20⟩ Used in section 19.
⟨If token is an operation name, process a pipe spec 24⟩ Used in section 19.
⟨Initialize to defaults 17⟩ Used in section 38.
⟨Process a functional spec 25⟩ Used in section 19.
⟨Record all the specs 19⟩ Used in section 38.
⟨Subroutines 10, 11, 16, 22, 23, 30, 31⟩ Used in section 38.
⟨Touch up last-minute trivia 37⟩ Used in section 38.
⟨Type definitions 12, 13, 14⟩ Used in section 38.


MMIX-IO

1. Introduction. This program module contains brute-force implementations of the ten input/output primitives defined at the beginning of MMIX-SIM. The subroutines are grouped here as a separate package, because they are intended to be loaded with the pipeline simulator as well as with the simple simulator.

  ⟨Preprocessor macros 2⟩
  ⟨Type definitions 3⟩
  ⟨External subroutines 4⟩
  ⟨Global variables 6⟩
  ⟨Subroutines 7⟩

2. Of course we include standard C library routines, and we set things up to accommodate older versions of C.

⟨Preprocessor macros 2⟩ ≡
  #include <stdio.h>
  #include <stdlib.h>
  #ifdef __STDC__
  #define ARGS(list) list
  #else
  #define ARGS(list) ()
  #endif
  #ifndef FILENAME_MAX
  #define FILENAME_MAX 256
  #endif
  #ifndef SEEK_SET
  #define SEEK_SET 0
  #endif
  #ifndef SEEK_END
  #define SEEK_END 2
  #endif

This code is used in section 1.

3. The unsigned 32-bit type tetra must agree with its definition in the simulators.

⟨Type definitions 3⟩ ≡
  typedef unsigned int tetra;
  typedef struct {
    tetra h, l;
  } octa;  /* two tetrabytes makes one octabyte */

See also section 5.

This code is used in section 1.

4. Three basic subroutines are used to get strings from the simulated memory and to put strings into that memory. These subroutines are defined appropriately in each simulator. We also use a few subroutines and constants defined in MMIX-ARITH.

⟨External subroutines 4⟩ ≡
  extern char stdin_chr ARGS((void));
  extern int mmgetchars ARGS((char *buf, int size, octa addr, int stop));
  extern void mmputchars ARGS((unsigned char *buf, int size, octa addr));
  extern octa oplus ARGS((octa, octa));
  extern octa ominus ARGS((octa, octa));
  extern octa incr ARGS((octa, int));
  extern octa zero_octa;  /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;  /* neg_one.h = neg_one.l = -1 */

This code is used in section 1.

D.E. Knuth: MMIXware, LNCS 1750, pp. 138-147, 1999. Springer-Verlag Berlin Heidelberg 1999

5. Each possible handle has a file pointer and a current mode.

⟨Type definitions 3⟩ +≡
  typedef struct {
    FILE *fp;  /* file pointer */
    int mode;  /* [read OK] + 2[write OK] + 4[binary] + 8[readwrite] */
  } sim_file_info;

6. ⟨Global variables 6⟩ ≡
  sim_file_info sfile[256];

See also sections 9 and 24.

This code is used in section 1.

7. The first three handles are initially open.

⟨Subroutines 7⟩ ≡
  void mmix_io_init ARGS((void));
  void mmix_io_init()
  {
    sfile[0].fp = stdin; sfile[0].mode = 1;
    sfile[1].fp = stdout; sfile[1].mode = 2;
    sfile[2].fp = stderr; sfile[2].mode = 2;
  }

See also sections 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, and 23.

This code is used in section 1.

__STDC__, Standard C.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
incr: octa (), MMIX-ARITH §6.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mmputchars: int (), MMIX-SIM §117.
mmputchars: int (), MMIX-PIPE §384.
neg_one: octa, MMIX-ARITH §4.
ominus: octa (), MMIX-ARITH §5.
oplus: octa (), MMIX-ARITH §5.
SEEK_END = macro, <stdio.h>.
SEEK_SET = macro, <stdio.h>.
stderr: FILE *, <stdio.h>.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
stdout: FILE *, <stdio.h>.
zero_octa: octa, MMIX-ARITH §4.


8. The only tricky thing about these routines is that we want to protect the standard input, output, and error streams from being preempted.

⟨Subroutines 7⟩ +≡
  octa mmix_fopen ARGS((unsigned char, octa, octa));
  octa mmix_fopen(handle, name, mode)
    unsigned char handle;
    octa name, mode;
  {
    char name_buf[FILENAME_MAX];
    FILE *tmp;

    if (mode.h || mode.l > 4) goto abort;
    if (mmgetchars(name_buf, FILENAME_MAX, name, 0) == FILENAME_MAX) goto abort;
    if (sfile[handle].mode != 0 && handle > 2) fclose(sfile[handle].fp);
    sfile[handle].fp = fopen(name_buf, mode_string[mode.l]);
    if (!sfile[handle].fp) goto abort;
    sfile[handle].mode = mode_code[mode.l];
    return zero_octa;  /* success */
  abort: sfile[handle].mode = 0;
    return neg_one;  /* failure */
  }

9. ⟨Global variables 6⟩ +≡
  char *mode_string[] = {"r", "w", "rb", "wb", "w+b"};
  int mode_code[] = {0x1, 0x2, 0x5, 0x6, 0xf};
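
The mode codes are sums of the four flags listed in section 5. The sketch below names those flags and shows how a read/write handle gives up the opposite direction after each operation, following the masking statements found in the read and write routines of this module. The enum names, mode_code_sketch, after_read, and after_write are hypothetical and chosen only for illustration.

```c
/* Hypothetical names for the four mode bits of section 5. */
enum { READ_OK = 0x1, WRITE_OK = 0x2, BINARY = 0x4, READWRITE = 0x8 };

/* Same values as mode_code: "r", "w", "rb", "wb", "w+b". */
static const int mode_code_sketch[] = { 0x1, 0x2, 0x5, 0x6, 0xf };

/* A read on a read/write handle switches writing off. */
static int after_read(int mode)
{
    return (mode & READWRITE) ? (mode & ~WRITE_OK) : mode;
}

/* A write on a read/write handle switches reading off. */
static int after_write(int mode)
{
    return (mode & READWRITE) ? (mode & ~READ_OK) : mode;
}
```

Only the last mode, "w+b", starts with both READ_OK and WRITE_OK set, so it is the only one these two adjustments affect.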

10. If the simulator is being used interactively, we can avoid competition for stdin by substituting another file.

⟨Subroutines 7⟩ +≡
  void mmix_fake_stdin ARGS((FILE *));
  void mmix_fake_stdin(f)
    FILE *f;
  {
    sfile[0].fp = f;  /* f should be open in mode "r" */
  }

11. ⟨Subroutines 7⟩ +≡
  octa mmix_fclose ARGS((unsigned char));
  octa mmix_fclose(handle)
    unsigned char handle;
  {
    if (sfile[handle].mode == 0) return neg_one;
    if (handle > 2 && fclose(sfile[handle].fp) != 0) return neg_one;
    sfile[handle].mode = 0;
    return zero_octa;  /* success */
  }


12. ⟨Subroutines 7⟩ +≡
  octa mmix_fread ARGS((unsigned char, octa, octa));
  octa mmix_fread(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    register unsigned char *buf;
    register int n;
    octa o;

    o = neg_one;
    if (!(sfile[handle].mode & 0x1)) goto done;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    if (size.h) goto done;
    buf = (unsigned char *) calloc(size.l, sizeof(char));
    if (!buf) goto done;
    ⟨Read n <= size.l characters into buf 13⟩;
    mmputchars(buf, n, buffer);
    free(buf);
    o.h = 0, o.l = n;
  done: return ominus(o, size);
  }
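
Since o starts as neg_one and the routine returns ominus(o, size), the result is the number of bytes read minus the number requested on success, and -1 minus the request on failure. The following sketch restates that arithmetic with int64_t standing in for octa; the name fread_result is hypothetical.

```c
#include <stdint.h>

/* n_read is the count of bytes delivered, or -1 to mark a failure
   (the role played by neg_one in mmix_fread). */
static int64_t fread_result(int64_t n_read, int64_t size)
{
    return n_read - size;
}
```

A complete read of 8 bytes therefore yields 0, a short read of 5 out of 8 yields -3, and a failed request for 8 bytes yields -9.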

13. ⟨Read n <= size.l characters into buf 13⟩ ≡
  if (sfile[handle].fp == stdin) {
    register unsigned char *p;

    for (p = buf, n = size.l; p < buf + n; p++) *p = stdin_chr();
  }
  else {
    clearerr(sfile[handle].fp);
    n = fread(buf, 1, size.l, sfile[handle].fp);
    if (ferror(sfile[handle].fp)) goto done;
  }

This code is used in section 12.

ARGS = macro (), §2.
calloc: void *(), <stdlib.h>.
clearerr: void (), <stdio.h>.
fclose: int (), <stdio.h>.
ferror: int (), <stdio.h>.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fp: FILE *, §5.
fread: size_t (), <stdio.h>.
free: void (), <stdlib.h>.
h: tetra, §3.
l: tetra, §3.
mmgetchars: extern int (), §4.
mmputchars: extern int (), §4.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
ominus: octa (), MMIX-ARITH §5.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: extern char (), §4.
zero_octa: octa, MMIX-ARITH §4.


14. ⟨Subroutines 7⟩ +≡
  octa mmix_fgets ARGS((unsigned char, octa, octa));
  octa mmix_fgets(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof = 0;

    if (!(sfile[handle].mode & 0x1)) return neg_one;
    if (!size.l && !size.h) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 256 characters into buf 15⟩;
      mmputchars(buf, n + 1, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n && buf[n - 1] == '\n') || (!size.l && !size.h) || eof) return o;
      buffer = incr(buffer, n);
    }
  }

15. ⟨Read n < 256 characters into buf 15⟩ ≡
  s = 255;
  if (size.l < s && !size.h) s = size.l;
  if (sfile[handle].fp == stdin)
    for (p = buf, n = 0; n < s; ) {
      *p = stdin_chr();
      n++;
      if (*p++ == '\n') break;
    }
  else {
    if (!fgets(buf, s + 1, sfile[handle].fp)) return neg_one;
    eof = feof(sfile[handle].fp);
    for (p = buf, n = 0; n < s; ) {
      if (!*p && eof) break;
      n++;
      if (*p++ == '\n') break;
    }
  }
  *p = '\0';

This code is used in section 14.


16. The routines that deal with wyde characters might need to be changed on a system that is little-endian; the author wishes good luck to whoever has to do this. MMIX is always big-endian, but external files prepared on random operating systems might be backwards.

⟨Subroutines 7⟩ +≡
  octa mmix_fgetws ARGS((unsigned char, octa, octa));
  octa mmix_fgetws(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof;

    if (!(sfile[handle].mode & 0x1)) return neg_one;
    if (!size.l && !size.h) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    buffer.l &= -2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 128 wyde characters into buf 17⟩;
      mmputchars(buf, 2 * n + 2, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n && buf[2 * n - 1] == '\n' && buf[2 * n - 2] == 0) || (!size.l && !size.h) || eof)
        return o;
      buffer = incr(buffer, 2 * n);
    }
  }
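
The byte order issue mentioned in section 16 can be illustrated with a short sketch. A wyde is stored most significant byte first, so a newline wyde is the byte pair 0, '\n'; that is the pattern the end-of-line test above looks for. The helper names wyde_from_bytes and is_newline_wyde are hypothetical.

```c
/* Combine two bytes, most significant first, into a 16-bit wyde value.
   The result does not depend on the byte order of the host machine. */
static unsigned int wyde_from_bytes(unsigned char hi, unsigned char lo)
{
    return ((unsigned int)hi << 8) | lo;
}

/* The end-of-line test used for wyde input: high byte 0, low byte '\n'. */
static int is_newline_wyde(unsigned char hi, unsigned char lo)
{
    return hi == 0 && lo == '\n';
}
```

If a file was written with the two bytes of each wyde swapped, the pair arrives as '\n', 0 and the test fails, which is the kind of trouble the text warns about.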

ARGS = macro (), §2.
feof: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fp: FILE *, §5.
h: tetra, §3.
incr: octa (), MMIX-ARITH §6.
l: tetra, §3.
mmputchars: extern int (), §4.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: extern char (), §4.
zero_octa: octa, MMIX-ARITH §4.


17. ⟨Read n < 128 wyde characters into buf 17⟩ ≡
  s = 127;
  if (size.l < s && !size.h) s = size.l;
  if (sfile[handle].fp == stdin)
    for (p = buf, n = 0; n < s; ) {
      *p++ = stdin_chr(); *p++ = stdin_chr();
      n++;
      if (*(p - 1) == '\n' && *(p - 2) == 0) break;
    }
  else
    for (p = buf, n = 0; n < s; ) {
      if (fread(p, 1, 2, sfile[handle].fp) != 2) {
        eof = feof(sfile[handle].fp);
        if (!eof) return neg_one;
        break;
      }
      n++, p += 2;
      if (*(p - 1) == '\n' && *(p - 2) == 0) break;
    }
  *p = *(p + 1) = '\0';

This code is used in section 16.

18. ⟨Subroutines 7⟩ +≡
  octa mmix_fwrite ARGS((unsigned char, octa, octa));
  octa mmix_fwrite(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n;

    if (!(sfile[handle].mode & 0x2)) return ominus(zero_octa, size);
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
    while (1) {
      if (size.h || size.l >= 256) n = mmgetchars(buf, 256, buffer, -1);
      else n = mmgetchars(buf, size.l, buffer, -1);
      size = incr(size, -n);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return ominus(zero_octa, size);
      fflush(sfile[handle].fp);
      if (!size.l && !size.h) return zero_octa;
      buffer = incr(buffer, n);
    }
  }

19. ⟨Subroutines 7⟩ +≡
  octa mmix_fputs ARGS((unsigned char, octa));
  octa mmix_fputs(handle, string)
    unsigned char handle;
    octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (!(sfile[handle].mode & 0x2)) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
    while (1) {
      n = mmgetchars(buf, 256, string, 0);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
      o = incr(o, n);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }

20. ⟨Subroutines 7⟩ +≡
  octa mmix_fputws ARGS((unsigned char, octa));
  octa mmix_fputws(handle, string)
    unsigned char handle;
    octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (!(sfile[handle].mode & 0x2)) return neg_one;
    while (1) {
      n = mmgetchars(buf, 256, string, 1);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
      o = incr(o, n >> 1);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }

ARGS = macro ( ), §2. buf: char [ ], §16. eof: int, §16. feof: int ( ), <stdio.h>. fflush: int ( ), <stdio.h>. fp: FILE *, §5. fread: size_t ( ), <stdio.h>. fwrite: size_t ( ), <stdio.h>. h: tetra, §3. handle: unsigned char, §16. incr: octa ( ), MMIX-ARITH §6. l: tetra, §3. mmgetchars: extern int ( ), §4. mode: int, §5. n: register int, §16. neg_one: octa, MMIX-ARITH §4. octa = struct, §3. ominus: octa ( ), MMIX-ARITH §5. p: register char *, §16. s: register int, §16. sfile: sim_file_info [ ], §6. size: octa, §16. stdin: FILE *, <stdio.h>. stdin_chr: extern char ( ), §4. zero_octa: octa, MMIX-ARITH §4.


21. #define sign_bit ((unsigned) 0x80000000)

⟨Subroutines 7⟩ +≡
  octa mmix_fseek ARGS((unsigned char, octa));
  octa mmix_fseek(handle, offset)
    unsigned char handle;
    octa offset;
  {
    if (!(sfile[handle].mode & 0x4)) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode = 0xf;
    if (offset.h & sign_bit) {
      if (offset.h != 0xffffffff || !(offset.l & sign_bit)) return neg_one;
      if (fseek(sfile[handle].fp, (int) offset.l + 1, SEEK_END) != 0) return neg_one;
    } else {
      if (offset.h || (offset.l & sign_bit)) return neg_one;
      if (fseek(sfile[handle].fp, (int) offset.l, SEEK_SET) != 0) return neg_one;
    }
    return zero_octa;
  }
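The offset convention used by mmix_fseek is easy to misread in the flattened listing: a nonnegative offset is absolute, while a negative offset is taken relative to the end of the file, with −1 meaning the very end. The sketch below isolates that decoding. The function decode_offset is hypothetical and the octa and tetra types are redeclared locally (assuming a 32-bit unsigned int) so that the fragment stands alone.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int tetra;            /* assumed to be exactly 32 bits */
typedef struct { tetra h, l; } octa;   /* high and low halves */

#define sign_bit ((unsigned) 0x80000000)

/* Hypothetical helper: translate a 64-bit Fseek offset into fseek
   arguments. Returns 0 on success, or -1 if the offset does not fit
   in a signed 32-bit quantity. */
static int decode_offset(octa offset, long *pos, int *whence)
{
  if (offset.h & sign_bit) {           /* negative: relative to the end */
    if (offset.h != 0xffffffff || !(offset.l & sign_bit)) return -1;
    /* same value as (int) offset.l + 1 on two's complement machines */
    *pos = -(long) (tetra) ~offset.l;
    *whence = SEEK_END;
  } else {                             /* nonnegative: absolute position */
    if (offset.h || (offset.l & sign_bit)) return -1;
    *pos = (long) offset.l;
    *whence = SEEK_SET;
  }
  return 0;
}
```

The complement form avoids converting an out-of-range unsigned value to int, which is implementation-defined in C; on the machines the book has in mind, the two expressions should agree.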

22. ⟨Subroutines 7⟩ +≡
  octa mmix_ftell ARGS((unsigned char));
  octa mmix_ftell(handle)
    unsigned char handle;
  {
    register long x;
    octa o;
    if (!(sfile[handle].mode & 0x4)) return neg_one;
    x = ftell(sfile[handle].fp);
    if (x < 0) return neg_one;
    o.h = 0; o.l = x;
    return o;
  }

23. One last subroutine belongs here, just in case the user has modified the standard error handle.

⟨Subroutines 7⟩ +≡
  void print_trip_warning ARGS((int, octa));
  void print_trip_warning(n, loc)
    int n;
    octa loc;
  {
    if (sfile[2].mode & 0x2)
      fprintf(sfile[2].fp, "Warning: %s at location %08x%08x\n", trip_warning[n], loc.h, loc.l);
  }

24. ⟨Global variables 6⟩ +≡
  char *trip_warning[ ] = {
    "TRIP", "integer divide check", "integer overflow",
    "float-to-fix overflow", "invalid floating point operation",
    "floating point overflow", "floating point underflow",
    "floating point division by zero", "floating point inexact"};


25. Names of the sections.

⟨External subroutines 4⟩ Used in section 1.

⟨Global variables 6, 9, 24⟩ Used in section 1.

⟨Preprocessor macros 2⟩ Used in section 1.

⟨Read n < 128 wyde characters into buf 17⟩ Used in section 16.

⟨Read n < 256 characters into buf 15⟩ Used in section 14.

⟨Read n ≤ size.l characters into buf 13⟩ Used in section 12.

⟨Subroutines 7, 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23⟩ Used in section 1.

⟨Type definitions 3, 5⟩ Used in section 1.

ARGS = macro ( ), §2. fp: FILE *, §5. fprintf: int ( ), <stdio.h>. fseek: int ( ), <stdio.h>. ftell: long ( ), <stdio.h>. h: tetra, §3. l: tetra, §3. mode: int, §5. neg_one: octa, MMIX-ARITH §4. octa = struct, §3. SEEK_END = macro, <stdio.h>. SEEK_SET = macro, <stdio.h>. sfile: sim_file_info [ ], §6. zero_octa: octa, MMIX-ARITH §4.


MMIX-MEM

1. Memory-mapped input and output. This module supplies procedures for reading and writing MMIX memory addresses that exceed 48 bits. Such addresses are used by the operating system for input and output, so they require special treatment. At present only dummy versions of these routines are implemented. Users who need nontrivial versions of spec_read and/or spec_write should prepare their own and link them with the rest of the simulator.

#include <stdio.h>

#include "mmix-pipe.h" /* header file for all modules */
extern octa read_hex( ); /* found in the main program module */
static char buf[20];

2. If the interactive_read_bit of the verbose control is set, the user is supposed to supply values dynamically. Otherwise zero is read.

  octa spec_read ARGS((octa));
  octa spec_read(addr)
    octa addr;
  {
    octa val;
    if (verbose & interactive_read_bit) {
      printf("** Read from loc %08x%08x: ", addr.h, addr.l);
      fgets(buf, 20, stdin);
      val = read_hex(buf);
    }
    else val.l = val.h = 0;
    if (verbose & show_spec_bit)
      printf(" (spec_read %08x%08x from %08x%08x at time %d)\n", val.h, val.l, addr.h, addr.l, ticks.l);
    return val;
  }

3. The default spec_write just reports its arguments, without actually writing anything.

  void spec_write ARGS((octa, octa));
  void spec_write(addr, val)
    octa addr, val;
  {
    if (verbose & show_spec_bit)
      printf(" (spec_write %08x%08x to %08x%08x at time %d)\n", val.h, val.l, addr.h, addr.l, ticks.l);
  }
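The introduction to this module invites users to supply nontrivial versions of spec_read and spec_write. As one possible starting point, here is a minimal sketch of a pair of routines that remember what was written to each special address in a small table. Everything in it is an assumption made for illustration: the names my_spec_read and my_spec_write, the table size DEV_SLOTS, and the local redeclaration of octa (a real replacement would include mmix-pipe.h and use the simulator's own names).

```c
#include <assert.h>

typedef unsigned int tetra;            /* assumed to be 32 bits wide */
typedef struct { tetra h, l; } octa;

#define DEV_SLOTS 16                   /* hypothetical table size */

static struct { octa addr, val; int used; } dev[DEV_SLOTS];

static int same_octa(octa a, octa b)
{
  return a.h == b.h && a.l == b.l;
}

/* Remember val at addr; the write is silently dropped if the table is full. */
static void my_spec_write(octa addr, octa val)
{
  int i, free_slot = -1;
  for (i = 0; i < DEV_SLOTS; i++) {
    if (dev[i].used && same_octa(dev[i].addr, addr)) {
      dev[i].val = val;                /* overwrite an existing entry */
      return;
    }
    if (!dev[i].used && free_slot < 0) free_slot = i;
  }
  if (free_slot >= 0) {
    dev[free_slot].addr = addr;
    dev[free_slot].val = val;
    dev[free_slot].used = 1;
  }
}

/* Return the value last written at addr, or zero if none was written. */
static octa my_spec_read(octa addr)
{
  octa zero = {0, 0};
  int i;
  for (i = 0; i < DEV_SLOTS; i++)
    if (dev[i].used && same_octa(dev[i].addr, addr)) return dev[i].val;
  return zero;
}
```

Returning zero for an address that was never written mirrors the default spec_read, which reads zero when the interactive bit is off.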

D.E. Knuth: MMIXware, LNCS 1750, pp. 148-149, 1999. Springer-Verlag Berlin Heidelberg 1999


ARGS = macro ( ), MMIX-PIPE §6. fgets: char *( ), <stdio.h>. h: tetra, MMIX-PIPE §17. interactive_read_bit = 1 << 5, MMIX-PIPE §8. l: tetra, MMIX-PIPE §17. octa = struct, MMIX-PIPE §17. printf: int ( ), <stdio.h>. read_hex: octa ( ), MMMIX §17. show_spec_bit = 1 << 6, MMIX-PIPE §8. spec_write: ( ), §3. stdin: FILE *, <stdio.h>. ticks = macro, MMIX-PIPE §87. verbose: int, MMIX-PIPE §4.


MMIX-PIPE

1. Introduction. This program is the heart of the meta-simulator for the ultra-configurable MMIX pipeline: It defines the MMIX_run routine, which does most of the work. Another routine, MMIX_init, is also defined here, and so is a header file called mmix-pipe.h. The header file is used by the main routine and by other routines like MMIX_config, which are compiled separately.

Readers of this program should be familiar with the explanation of MMIX architecture as presented in the main program module for MMMIX.

A lot of subtle things can happen when instructions are executed in parallel. Therefore this simulator ranks among the most interesting and instructive programs in the author's experience. The author has tried his best to make everything correct ... but the chances for error are great. Anyone who discovers a bug is therefore urged to report it as soon as possible to [email protected]; then the program will be as useful as possible. Rewards will be paid to bug-finders! (Except for bugs in version 0.)

It sort of boggles the mind when one realizes that the present program might someday be translated by a C compiler for MMIX and used to simulate itself.

2. This high-performance prototype of MMIX achieves its efficiency by means of "pipelining," a technique of overlapping that is explained for the related DLX computer in Chapter 3 of Hennessy & Patterson's book Computer Architecture (second edition). Other techniques such as "dynamic scheduling" and "multiple issue," explained in Chapter 4 of that book, are used too.

One good way to visualize the procedure is to imagine that somebody has organized a high-tech car repair shop according to similar principles. There are eight independent functional units, which we can think of as eight groups of auto mechanics, each specializing in a particular task; each group has its own workspace with room to deal with one car at a time. Group F (the "fetch" group) is in charge of rounding up customers and getting them to enter the assembly-line garage in an orderly fashion. Group D (the "decode and dispatch" group) does the initial vehicle inspection and writes up an order that explains what kind of servicing is required. The vehicles go next to one of the four "execution" groups: Group X handles routine maintenance, while groups XF, XM, and XD are specialists in more complex tasks that tend to take longer. (The XF people are good at floating the points, while the XM and XD groups are experts in multilink suspensions and differentials.) When the relevant X group has finished its work, cars drive to M station, where they send or receive messages and possibly pay money to members of the "memory" group. Finally all necessary parts are installed by members of group W, the "write" group, and the car leaves the shop. Everything is tightly organized so that in most cases the cars move in synchronized fashion from station to station, at regular 100-nanocentury intervals.

In a similar way, most MMIX instructions can be handled in a five-stage pipeline, F–D–X–M–W, with X replaced by XF for floating-point addition or conversion, or by XM for multiplication, or by XD for division or square root. Each stage ideally takes one clock cycle, although XF, XM, and (especially) XD are slower. If the instructions enter in a suitable pattern, we might see one instruction being fetched, another being decoded, and up to four being executed, while another is accessing memory, and yet another is finishing up by writing new information into registers; all this is going on simultaneously during one clock cycle. Pipelining with eight separate stages might therefore make the machine run up to 8 times as fast as it could if each instruction were being dealt with individually and without overlap. (Well, perfect speedup turns out to be impossible, because of the shared M and W stages; the theory of knapsack programming, to be discussed in Section 7.7 of The Art of Computer Programming, tells us that the maximal achievable speedup is at most 8 − 1/p − 1/q − 1/r when XF, XM, and XD have delays bounded by p, q, and r cycles. But we can achieve a factor of more than 7 if we are very lucky.)

Consider, for example, the ADD instruction. This instruction enters the computer's processing unit in F stage, taking only one clock cycle if it is in the cache of instructions recently seen. Then the D stage recognizes the command as an ADD and acquires the current values of $Y and $Z; meanwhile, of course, another instruction is being fetched by F. On the next clock cycle, the X stage adds the values together. This prepares the way for the M stage to watch for overflow and to get ready for any exceptional action that might be needed with respect to the settings of special register rA. Finally, on the fifth clock cycle, the sum is either written into $X or the trip handler for integer overflow is invoked. Although this process has taken five clock cycles (that is, 5υ), the net increase in running time has been only 1υ.

Of course congestion can occur, inside a computer as in a repair shop. For example, auto parts might not be readily available; or a car might have to sit in D station while waiting to move to XM, thereby blocking somebody else from moving from F to D. Sometimes there won't necessarily be a steady stream of customers. In such cases the employees in some parts of the shop will occasionally be idle. But we assume that they always do their jobs as fast as possible, given the sequence of customers that they encounter. With a clever person setting up appointments (translation: with a clever programmer and/or compiler arranging MMIX instructions), the organization can often be expected to run at nearly peak capacity.

In fact, this program is designed for experiments with many kinds of pipelines, potentially using additional functional units (such as several independent X groups), and potentially fetching, dispatching, and executing several nonconflicting instructions simultaneously. Such complications make this program more difficult than a simple pipeline simulator would be, but they also make it a lot more instructive because we can get a better understanding of the issues involved if we are required to treat them in greater generality.

D.E. Knuth: MMIXware, LNCS 1750, pp. 150-331, 1999. Springer-Verlag Berlin Heidelberg 1999
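The arithmetic behind the ADD example ("five clock cycles, but a net increase of only one") can be checked with a few lines of C. This is only an idealized sketch: the function names are invented here, no stalls or cache misses are modeled, and speedup_bound merely evaluates the expression quoted in the text rather than deriving it.

```c
#include <assert.h>

/* Cycles needed for n instructions in an ideal k-stage pipeline:
   the first instruction takes k cycles, and each later one finishes
   one cycle after its predecessor. No stalls are modeled. */
static unsigned long pipelined_cycles(unsigned long n, unsigned long k)
{
  return n == 0 ? 0 : k + (n - 1);
}

/* The bound quoted in the text, 8 - 1/p - 1/q - 1/r, where p, q, r
   bound the delays of XF, XM, and XD. */
static double speedup_bound(double p, double q, double r)
{
  return 8.0 - 1.0 / p - 1.0 / q - 1.0 / r;
}
```

For instance, pipelined_cycles(1, 5) is 5 and pipelined_cycles(2, 5) is 6, which matches the statement that each additional ADD costs one more cycle. The delays 4, 10, and 60 used in the test are arbitrary illustrative values, not measurements of any MMIX configuration.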

MMIX_config: void ( ), MMIX-CONFIG §38. MMIX_init: void ( ), §10. MMIX_run: void ( ), §10.


3. Here's the overall structure of the present program module.

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

#include "abstime.h"

⟨Header definitions 6⟩
⟨Type definitions 11⟩
⟨Global variables 20⟩
⟨External variables 4⟩
⟨Internal prototypes 13⟩
⟨External prototypes 9⟩
⟨Subroutines 14⟩
⟨External routines 10⟩

4. The identifier Extern is used in MMIX-PIPE to declare variables that are accessed in other modules. Actually all appearances of 'Extern' are defined to be blank here, but 'Extern' will become 'extern' in the header file.

#define Extern /* blank for us, extern for them */
format Extern extern

⟨External variables 4⟩ ≡
  Extern int verbose; /* controls the level of diagnostic output */

See also sections 29, 59, 60, 66, 69, 77, 86, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, and 349.

This code is used in sections 3 and 5.

5. The header file repeats the basic definitions and declarations.

⟨mmix-pipe.h 5⟩ ≡
#define Extern extern
⟨Header definitions 6⟩
⟨Type definitions 11⟩
⟨External variables 4⟩
⟨External prototypes 9⟩

6. Subroutines of this program are declared first with a prototype, as in ANSI C, then with an old-style C function definition. The following preprocessor commands make this work correctly with both new-style and old-style compilers.

⟨Header definitions 6⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ( )
#endif

See also sections 7, 8, 52, 57, 87, 129, and 166.

This code is used in sections 3 and 5.
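The ARGS device of section 6 can be tried in isolation. In the sketch below the function add_two is hypothetical; it is declared through ARGS exactly as the simulator's routines are, but it is defined in the new style, since some current compilers no longer accept old-style definitions.

```c
#include <assert.h>

#ifdef __STDC__
#define ARGS(list) list   /* ANSI compilers see the full parameter list */
#else
#define ARGS(list) ()     /* pre-ANSI compilers see an empty list */
#endif

/* The double parentheses make the whole parameter list a single macro
   argument: this expands to "static int add_two (int, int);" when
   __STDC__ is defined. */
static int add_two ARGS((int, int));

static int add_two(int a, int b)   /* new-style definition, for brevity */
{
  return a + b;
}
```

The point of the doubled parentheses is that a macro invocation with a single pair would treat the comma as an argument separator.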


7. Some of the names that are natural for this program are in conflict with library names on at least one of the host computers in the author's tests. So we bypass the library names here.

⟨Header definitions 6⟩ +≡
#define random my_random
#define fsqrt my_fsqrt
#define div my_div

8. The amount of verbosity depends on the following bit codes.

⟨Header definitions 6⟩ +≡
#define issue_bit (1 << 0) /* show control blocks when issued, deissued, committed */
#define pipe_bit (1 << 1) /* show the pipeline and locks on every cycle */
#define coroutine_bit (1 << 2) /* show the coroutines when started on every cycle */
#define schedule_bit (1 << 3) /* show the coroutines when scheduled */
#define uninit_mem_bit (1 << 4) /* complain when reading from an uninitialized chunk of memory */
#define interactive_read_bit (1 << 5) /* prompt user when reading from I/O location */
#define show_spec_bit (1 << 6) /* display special read/write transactions as they happen */
#define show_pred_bit (1 << 7) /* display branch prediction details */
#define show_wholecache_bit (1 << 8) /* display cache blocks even when their key tag is invalid */

9. The MMIX_init( ) routine should be called exactly once, after MMIX_config( ) has done its work but before the simulator starts to execute any programs. Then MMIX_run can be called as often as the user likes.

⟨External prototypes 9⟩ ≡
  Extern void MMIX_init ARGS((void));
  Extern void MMIX_run ARGS((int cycs, octa breakpoint));

See also sections 38, 161, 175, 178, 180, 209, 212, and 252.

This code is used in sections 3 and 5.
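The bit codes of section 8 are meant to be combined with bitwise OR and tested with bitwise AND. The following sketch repeats four of the definitions and adds one hypothetical helper, wants_cycle_header, which performs the same kind of mask test that MMIX_run uses before printing its "*** Cycle" line.

```c
#include <assert.h>

#define issue_bit     (1 << 0)
#define pipe_bit      (1 << 1)
#define coroutine_bit (1 << 2)
#define schedule_bit  (1 << 3)
#define show_spec_bit (1 << 6)

/* Hypothetical helper: nonzero if any of the four per-cycle
   diagnostic bits is set in the given verbose value. */
static int wants_cycle_header(int verbose)
{
  return (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit)) != 0;
}
```

A caller might set verbose = pipe_bit | show_spec_bit to watch the pipeline and the special reads and writes at the same time.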

__STDC__, Standard C. MMIX_config: void ( ), MMIX-CONFIG §38. MMIX_init: void ( ), §10. MMIX_run: void ( ), §10. octa = struct, §17.


10. ⟨External routines 10⟩ ≡
  void MMIX_init( )
  {
    register int i, j;
    ⟨Initialize everything 22⟩;
  }

  void MMIX_run(cycs, breakpoint)
    int cycs;
    octa breakpoint;
  {
    ⟨Local variables 12⟩;
    while (cycs) {
      if (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit))
        printf("*** Cycle %d\n", ticks.l);
      ⟨Perform one machine cycle 64⟩;
      if (verbose & pipe_bit) {
        print_pipe( ); print_locks( );
      }
      if (breakpoint_hit || halted) {
        if (breakpoint_hit)
          printf("Breakpoint instruction fetched at time %d\n", ticks.l - 1);
        if (halted) printf("Halted at time %d\n", ticks.l - 1);
        break;
      }
      cycs--;
    }
  cease: ;
  }

See also sections 39, 162, 176, 179, 181, 210, 213, and 253.

This code is used in section 3.

11. ⟨Type definitions 11⟩ ≡
  typedef enum {
    false, true, wow
  } bool; /* slightly extended booleans */

See also sections 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, and 371.

This code is used in sections 3 and 5.

12. ⟨Local variables 12⟩ ≡
  register int i, j, m;
  bool breakpoint_hit = false;
  bool halted = false;

See also sections 124 and 258.

This code is used in section 10.


13. Error messages that abort this program are called panic messages. The macro called confusion will never be needed unless this program is internally inconsistent.

#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define panic(x) { errprint0("Panic: "); x; errprint0("!\n"); expire( ); }
#define confusion(m) errprint1("This can't happen: %s", m)

⟨Internal prototypes 13⟩ ≡
  static void expire ARGS((void));

See also sections 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182,

184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, and 377.

This code is used in section 3.

14. ⟨Subroutines 14⟩ ≡
  static void expire( ) /* the last gasp before dying */
  {
    if (ticks.h) errprint2("(Clock time is %dH+%d.)\n", ticks.h, ticks.l);
    else errprint1("(Clock time is %d.)\n", ticks.l);
    exit(-2);
  }

See also sections 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174, 183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, and 387.

This code is used in section 3.

15. The data structures of this program are not precisely equivalent to logical gates that could be implemented directly in silicon; we will use data structures and algorithms appropriate to the C programming language. For example, we'll use pointers and arrays, instead of buses and ports and latches. However, the net effect of our data structures and algorithms is intended to be equivalent to the net effect of a silicon implementation. The methods used below are essentially equivalent to those used in real machines today, except that diagnostic facilities are added so that we can readily watch what is happening.

Each functional unit in the MMIX pipeline is programmed here as a coroutine in C. At every clock cycle, we will call on each active coroutine to do one phase of its operation; in terms of the repair-station analogy described in the main program, this corresponds to getting each group of auto mechanics to do one unit of operation on a car. The coroutines are performed sequentially, although a real pipeline would have them act in parallel. We will not "cheat" by letting one coroutine access a value early in its cycle that another one computes late in its cycle, unless computer hardware could "cheat" in an equivalent way.

ARGS = macro, §6. coroutine_bit = 1 << 2, §8. exit: void ( ), <stdlib.h>. fprintf: int ( ), <stdio.h>. h: tetra, §17. issue_bit = 1 << 0, §8. l: tetra, §17. octa = struct, §17. pipe_bit = 1 << 1, §8. print_locks: void ( ), §39. print_pipe: void ( ), §253. printf: int ( ), <stdio.h>. schedule_bit = 1 << 3, §8. stderr: FILE *, <stdio.h>. ticks = macro, §87. verbose: int, §4.


16. Low-level routines. Where should we begin? It is tempting to start with a global view of the simulator and then to break it down into component parts. But that task is too daunting, because there are so many unknowns about what basic ingredients ought to be combined when we construct the larger components. So let us look first at the primitive operations on which the superstructure will be built. Once we have created some infrastructure, we'll be able to proceed with confidence to the larger tasks ahead.

17. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1998–1999) was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the assembler and for the non-pipelined simulator. The definition of type tetra should be changed, if necessary, to conform with the definitions found there.

⟨Type definitions 11⟩ +≡
  typedef unsigned int tetra; /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;
  } octa; /* two tetrabytes makes one octabyte */

18. ⟨Internal prototypes 13⟩ +≡
  static void print_octa ARGS((octa));

19. ⟨Subroutines 14⟩ +≡
  static void print_octa(o)
    octa o;
  {
    if (o.h) printf("%x%08x", o.h, o.l); else printf("%x", o.l);
  }

20. ⟨Global variables 20⟩ ≡
  extern octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one; /* neg_one.h = neg_one.l = −1 */
  extern octa aux; /* auxiliary output of a subroutine */
  extern bool overflow; /* set by certain subroutines for signed arithmetic */
  extern int exceptions; /* bits set by floating point operations */
  extern int cur_round; /* the current rounding mode */

See also sections 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235, 238, 248, 285, 303, 305, 315, 374, 376, and 388.

This code is used in section 3.

21. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Multiplication returns the high half of a product in the global variable aux; division returns the remainder in aux.

⟨Subroutines 14⟩ +≡
  extern octa oplus ARGS((octa y, octa z)); /* unsigned y + z */
  extern octa ominus ARGS((octa y, octa z)); /* unsigned y − z */
  extern octa incr ARGS((octa y, int delta)); /* unsigned y + δ (δ is signed) */
  extern octa oand ARGS((octa y, octa z)); /* y ∧ z */
  extern octa oandn ARGS((octa y, octa z)); /* y ∧ ~z */
  extern octa shift_left ARGS((octa y, int s)); /* y << s, 0 ≤ s ≤ 64 */
  extern octa shift_right ARGS((octa y, int s, int uns)); /* y >> s, signed if !uns */
  extern octa omult ARGS((octa y, octa z)); /* unsigned (aux, x) = y × z */
  extern octa signed_omult ARGS((octa y, octa z)); /* signed x = y × z, setting overflow */
  extern octa odiv ARGS((octa x, octa y, octa z)); /* unsigned (x, y)/z; aux = (x, y) mod z */
  extern octa signed_odiv ARGS((octa y, octa z)); /* signed y/z, when z ≠ 0; aux = y mod z */
  extern int count_bits ARGS((tetra z)); /* x = ν(z) */
  extern tetra byte_diff ARGS((tetra y, tetra z)); /* half of BDIF */
  extern tetra wyde_diff ARGS((tetra y, tetra z)); /* half of WDIF */
  extern octa bool_mult ARGS((octa y, octa z, bool xor)); /* MOR or MXOR */
  extern octa load_sf ARGS((tetra z)); /* load short float */
  extern tetra store_sf ARGS((octa x)); /* store short float */
  extern octa fplus ARGS((octa y, octa z)); /* floating point x = y ⊕ z */
  extern octa fmult ARGS((octa y, octa z)); /* floating point x = y ⊗ z */
  extern octa fdivide ARGS((octa y, octa z)); /* floating point x = y ⊘ z */
  extern octa froot ARGS((octa, int)); /* floating point x = √z */
  extern octa fremstep ARGS((octa y, octa z, int delta)); /* floating point x rem z = y rem z */
  extern octa fintegerize ARGS((octa z, int mode)); /* floating point x = round(z) */
  extern int fcomp ARGS((octa y, octa z)); /* −1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
  extern int fepscomp ARGS((octa y, octa z, octa eps, int sim)); /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
  extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt)); /* fix to float */
  extern octa fixit ARGS((octa z, int mode)); /* float to fix */
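To make the two-halves representation of section 17 concrete, here is a minimal sketch of unsigned 64-bit addition on a pair of 32-bit words. It is written for illustration only and is not claimed to be the code of oplus in MMIX-ARITH; the name add_octa is invented, and the fragment assumes that unsigned int is exactly 32 bits wide, which is the same assumption the simulator checks at startup.

```c
#include <assert.h>

typedef unsigned int tetra;            /* assumed to be exactly 32 bits */
typedef struct { tetra h, l; } octa;   /* high and low halves */

/* Sketch of unsigned addition modulo 2^64 on two 32-bit halves. */
static octa add_octa(octa y, octa z)
{
  octa x;
  x.h = y.h + z.h;
  x.l = y.l + z.l;          /* wraps around modulo 2^32 */
  if (x.l < y.l) x.h++;     /* a wrapped low half means a carry */
  return x;
}
```

The carry test relies on unsigned wraparound: the low sum is smaller than one of its operands exactly when the true sum exceeded 32 bits.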

ARGS = macro, §6. aux: octa, MMIX-ARITH §4. bool = enum, §11. bool_mult: octa ( ), MMIX-ARITH §29. byte_diff: tetra ( ), MMIX-ARITH §27. count_bits: int ( ), MMIX-ARITH §26. cur_round: int, MMIX-ARITH §30. exceptions: int, MMIX-ARITH §32. fcomp: int ( ), MMIX-ARITH §84. fdivide: octa ( ), MMIX-ARITH §44. fepscomp: int ( ), MMIX-ARITH §50. fintegerize: octa ( ), MMIX-ARITH §86. fixit: octa ( ), MMIX-ARITH §88. floatit: octa ( ), MMIX-ARITH §89. fmult: octa ( ), MMIX-ARITH §41. fplus: octa ( ), MMIX-ARITH §46. fremstep: octa ( ), MMIX-ARITH §93. froot: octa ( ), MMIX-ARITH §912. incr: octa ( ), MMIX-ARITH §6. load_sf: octa ( ), MMIX-ARITH §39. neg_one: octa, MMIX-ARITH §4. oand: octa ( ), MMIX-ARITH §25. oandn: octa ( ), MMIX-ARITH §25. odiv: octa ( ), MMIX-ARITH §13. ominus: octa ( ), MMIX-ARITH §5. omult: octa ( ), MMIX-ARITH §8. oplus: octa ( ), MMIX-ARITH §5. overflow: bool, MMIX-ARITH §4. printf: int ( ), <stdio.h>. shift_left: octa ( ), MMIX-ARITH §7. shift_right: octa ( ), MMIX-ARITH §7. signed_odiv: octa ( ), MMIX-ARITH §24. signed_omult: octa ( ), MMIX-ARITH §12. store_sf: tetra ( ), MMIX-ARITH §40. wyde_diff: tetra ( ), MMIX-ARITH §28. zero_octa: octa, MMIX-ARITH §4.


22. We had better check that our 32-bit assumption holds.

⟨Initialize everything 22⟩ ≡
  if (shift_left(neg_one, 1).h != 0xffffffff)
    panic(errprint0("Incorrect implementation of type tetra"));

See also sections 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, and 286.

This code is used in section 10.


23. Coroutines. As stated earlier, this program can be regarded as a system of interacting coroutines. Coroutines (sometimes called threads) are more or less independent processes that share and pass data and control back and forth. They correspond to the individual workers in an organization.

We don't need the full power of recursive coroutines, in which new threads are spawned dynamically and have independent stacks for computation; we are, after all, simulating a fixed piece of hardware. The total number of coroutines we deal with is established once and for all by the MMIX_config routine, and each coroutine has a fixed amount of local data.

The simulation operates one clock tick at a time, by executing all coroutines scheduled for time t before advancing to time t + 1. The coroutines at time t may decide to become dormant or they may reschedule themselves and/or other coroutines for future times.

Each coroutine has a symbolic name for diagnostic purposes (e.g., ALU1); a nonnegative stage number (e.g., 2 for the second stage of a pipeline); a pointer to the next coroutine scheduled at the same time (or NULL if the coroutine is unscheduled); a pointer to a lock variable (or NULL if no lock is currently relevant); and a reference to a control block containing the data to be processed.

⟨Type definitions 11⟩ +≡
  typedef struct coroutine_struct {
    char *name; /* symbolic identification of a coroutine */
    int stage; /* its rank */
    struct coroutine_struct *next; /* its successor */
    struct coroutine_struct **lockloc; /* what it might be locking */
    struct control_struct *ctl; /* its data */
  } coroutine;

24. ⟨Internal prototypes 13⟩ +≡
  static void print_coroutine_id ARGS((coroutine *));
  static void errprint_coroutine_id ARGS((coroutine *));

ARGS = macro, §6. control_struct: struct, §44. errprint0 = macro ( ), §13. h: tetra, §17. MMIX_config: void ( ), MMIX-CONFIG §38. neg_one: octa, MMIX-ARITH §4. panic = macro ( ), §13. shift_left: octa ( ), MMIX-ARITH §7.


25. ⟨Subroutines 14⟩ +≡
  static void print_coroutine_id(c)
    coroutine *c;
  {
    if (c) printf("%s:%d", c->name, c->stage);
    else printf("??");
  }

  static void errprint_coroutine_id(c)
    coroutine *c;
  {
    if (c) errprint2("%s:%d", c->name, c->stage);
    else errprint0("??");
  }

26. Coroutine control is masterminded by a ring of queues, one each for times t, t + 1, ..., t + ring_size − 1, when t is the current clock time.

All scheduling is first-come-first-served, except that coroutines with higher stage numbers have priority. We want to process the later stages of a pipeline first, in this sequential implementation, for the same reason that a car must drive from M station into W station before another car can enter M station.

Each queue is a circular list of coroutine nodes, linked together by their next fields. A list head h with stage = max_stage comes at the end and the beginning of the queue. (All stage numbers of legitimate coroutines are less than max_stage.) The queued items are h->next, h->next->next, etc., from back to front, and we have c->stage ≤ c->next->stage unless c = h.

Initially all queues are empty.

⟨Initialize everything 22⟩ +≡
  { register coroutine *p;
    for (p = ring; p < ring + ring_size; p++) p->next = p;
  }

27. To schedule a coroutine c with positive delay d < ring_size, we call the function schedule(c, d, s). (The s parameter is used only if scheduling is being logged; it does not affect the computation, but we will generally set s to the state at which the scheduled coroutine will begin.)

⟨Internal prototypes 13⟩ +≡
  static void schedule ARGS((coroutine *, int, int));

28. ⟨Subroutines 14⟩ +≡
  static void schedule(c, d, s)
    coroutine *c;
    int d, s;
  {
    register int tt = (cur_time + d) % ring_size;
    register coroutine *p = &ring[tt]; /* start at the list head */
    if (d <= 0 || d >= ring_size) /* do a sanity check */
      panic(confusion("Scheduling "); errprint_coroutine_id(c);
        errprint1(" with delay %d", d));
    while (p->next->stage < c->stage) p = p->next;
    c->next = p->next;
    p->next = c;
    if (verbose & schedule_bit) {
      printf(" scheduling "); print_coroutine_id(c);
      printf(" at time %d, state %d\n", ticks.l + d, s);
    }
  }
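The ordering rule of section 26 is easiest to see on a tiny example. The sketch below is a cut-down, self-contained model of the ring of queues: the node type, RING_SIZE, init_ring, and schedule_node are names invented for this illustration (the simulator itself uses coroutine, ring_size, and schedule, and panics on a bad delay instead of returning an error code), and logging is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8     /* hypothetical; the simulator takes ring_size from MMIX_config */
#define MAX_STAGE 99    /* stage number reserved for list heads */

typedef struct node {
  const char *name;
  int stage;
  struct node *next;
} node;

static node ring[RING_SIZE];   /* one list head per future time slot */
static int cur_time;           /* current clock time modulo RING_SIZE */

/* Make every queue empty: each head points to itself. */
static void init_ring(void)
{
  int t;
  for (t = 0; t < RING_SIZE; t++) {
    ring[t].name = "head";
    ring[t].stage = MAX_STAGE;
    ring[t].next = &ring[t];
  }
}

/* Insert c into the queue for time cur_time + d, keeping stage numbers
   nondecreasing from head->next onward. Returns -1 for a bad delay. */
static int schedule_node(node *c, int d)
{
  node *p;
  if (d <= 0 || d >= RING_SIZE) return -1;
  p = &ring[(cur_time + d) % RING_SIZE];
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
  return 0;
}
```

If nodes A (stage 1), B (stage 3), and C (stage 1) are scheduled in that order for the same time, the list reads head, C, A, B, head. Read from back to front, that puts B at the front (highest stage first) and A ahead of C (first come, first served among equal stages).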

29. ⟨External variables 4⟩ +≡
  Extern int ring_size; /* set by MMIX_config, must be sufficiently large */
  Extern coroutine *ring;
  Extern int cur_time;

30. The all-important ctl field of a coroutine, which contains the data being manipulated, will be explained below. One of its key components is the state field, which helps to specify the next actions the coroutine will perform. When we schedule a coroutine for a new task, we often want it to begin in state 0.

⟨Internal prototypes 13⟩ +≡
  static void startup ARGS((coroutine *, int));

31. ⟨Subroutines 14⟩ +≡
  static void startup(c, d)
    coroutine *c;
    int d;
  {
    c->ctl->state = 0;
    schedule(c, d, 0);
  }

ARGS = macro, §6. confusion = macro ( ), §13. coroutine = struct, §23. ctl: control *, §23. errprint0 = macro ( ), §13. errprint1 = macro ( ), §13. errprint2 = macro ( ), §13. Extern = macro, §4. l: tetra, §17. max_stage = 99, §129. MMIX_config: void ( ), MMIX-CONFIG §38. name: char *, §23. next: coroutine *, §23. panic = macro ( ), §13. printf: int ( ), <stdio.h>. schedule_bit = 1 << 3, §8. stage: int, §23. state: int, §44. ticks = macro, §87. verbose: int, §4.

Page 169: MMIXware - A RISC Computer for the Third Millennium - Knuth

MMIX-PIPE: COROUTINES 162

32. The following routine removes a coroutine from whatever queue it's in. The case c->next = c is also permitted; such a self-loop can occur when a coroutine goes to sleep and expects to be awakened (that is, scheduled) by another coroutine. Sleeping coroutines have important data in their ctl field; they are therefore quite different from unscheduled or "unemployed" coroutines, which have c->next = NULL. An unemployed coroutine is not assumed to have any valid data in its ctl field.

⟨Internal prototypes 13⟩ +≡
  static void unschedule ARGS((coroutine *));

33. ⟨Subroutines 14⟩ +≡
  static void unschedule(c)
    coroutine *c;
  { register coroutine *p;
    if (c->next) {
      for (p = c; p->next != c; p = p->next) ;
      p->next = c->next;
      c->next = NULL;
      if (verbose & schedule_bit) {
        printf(" unscheduling "); print_coroutine_id(c); printf("\n");
      }
    }
  }

34. When it is time to process all coroutines that have queued up for a particular time t, we empty the queue called ring[t] and link its items in the opposite order (from front to back). The following subroutine uses the well known algorithm discussed in exercise 2.2.3–7 of The Art of Computer Programming.

⟨Internal prototypes 13⟩ +≡
  static coroutine *queuelist ARGS((int));

35. ⟨Subroutines 14⟩ +≡
  static coroutine *queuelist(t)
    int t;
  { register coroutine *p, *q = &sentinel, *r;
    for (p = ring[t].next; p != &ring[t]; p = r) {
      r = p->next;
      p->next = q;
      q = p;
    }
    ring[t].next = &ring[t];
    sentinel.next = q;
    return q;
  }

36. ⟨Global variables 20⟩ +≡
  coroutine sentinel; /* dummy coroutine at origin of circular list */
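The in-place list reversal used by queuelist can be seen more easily on a reduced record type. This is a sketch under assumptions: node and reverse_queue are hypothetical names, and the list head is passed by pointer instead of being looked up as ring[t].

```c
#include <assert.h>
#include <stddef.h>

typedef struct node_struct {
  int stage;
  struct node_struct *next;
} node;

static node sentinel; /* plays the role of the dummy coroutine */

/* Empty the circular list headed by *head and return its items linked
   in the opposite order; the last item points to &sentinel. */
static node *reverse_queue(node *head) {
  node *p, *q = &sentinel, *r;
  for (p = head->next; p != head; p = r) {
    r = p->next;  /* remember the rest of the old list */
    p->next = q;  /* point this item back at its former predecessor */
    q = p;
  }
  head->next = head;  /* the queue is now empty */
  sentinel.next = q;  /* sentinel and the reversed items form a circle */
  return q;
}
```

After the call, following next links from the sentinel visits the items in the reverse of their queued order and then returns to the sentinel, which is why the sentinel is described as the origin of a circular list.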


37. Coroutines often start working on tasks that are speculative, in the sense that we want certain results to be ready if they prove to be useful; we understand that speculative computations might not actually be needed. Therefore a coroutine might need to be aborted before it has finished its work.

All coroutines must be written in such a way that important data structures remain intact even when the coroutine is abruptly terminated. In particular, we need to be sure that "locks" on shared resources are restored to an unlocked state when a coroutine holding the lock is aborted.

A lockvar variable is NULL when it is unlocked; otherwise it points to the coroutine responsible for unlocking it.

#define set_lock(c, l) { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

⟨Type definitions 11⟩ +≡
  typedef coroutine *lockvar;
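The point of recording the lock's address in lockloc is that an aborted coroutine can have its lock released without knowing which lock it was. A minimal sketch follows; the reduced coroutine record and the helper abort_coroutine are assumptions made for illustration, not the simulator's actual abort logic.

```c
#include <assert.h>
#include <stddef.h>

struct coroutine_struct;
typedef struct coroutine_struct *lockvar;

typedef struct coroutine_struct {
  const char *name;
  lockvar *lockloc; /* address of the lock this coroutine holds, or NULL */
} coroutine;

#define set_lock(c, l)     { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

/* Hypothetical helper: abort c, restoring any lock it holds to the
   unlocked state by following the back pointer lockloc. */
static void abort_coroutine(coroutine *c) {
  if (c->lockloc) {
    *(c->lockloc) = NULL;
    c->lockloc = NULL;
  }
}
```

With this arrangement a coroutine holds at most one lock at a time, since lockloc has room for only one address.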

38. ⟨External prototypes 9⟩ +≡
  Extern void print_locks ARGS((void));

39. ⟨External routines 10⟩ +≡
  void print_locks()
  {
    print_cache_locks(ITcache);
    print_cache_locks(DTcache);
    print_cache_locks(Icache);
    print_cache_locks(Dcache);
    print_cache_locks(Scache);
    if (mem_lock) printf("mem locked by %s:%d\n", mem_lock->name, mem_lock->stage);
    if (dispatch_lock)
      printf("dispatch locked by %s:%d\n", dispatch_lock->name, dispatch_lock->stage);
    if (wbuf_lock) printf("head of write buffer locked by %s:%d\n",
        wbuf_lock->name, wbuf_lock->stage);
    if (clean_lock)
      printf("cleaner locked by %s:%d\n", clean_lock->name, clean_lock->stage);
    if (speed_lock) printf("write buffer flush locked by %s:%d\n",
        speed_lock->name, speed_lock->stage);
  }


40. Many of the quantities we deal with are speculative values that might not yet have been certified as part of the "real" calculation; in fact, they might not yet have been calculated.

A spec consists of a 64-bit quantity o and a pointer p to a specnode. The value o is meaningful only if the pointer p is NULL; otherwise p points to a source of further information.

A specnode is a 64-bit quantity o together with links to other specnodes that are above it or below it in a doubly linked list. An additional known bit tells whether the o field has been calculated. There also is a 64-bit addr field, to identify the list and give further information. A specnode list keeps track of speculative values related to a specific register or to all of main memory; we will discuss such lists in detail later.

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa o;
    struct specnode_struct *p;
  } spec;
  typedef struct specnode_struct {
    octa o;
    bool known;
    octa addr;
    struct specnode_struct *up, *down;
  } specnode;

41. ⟨Global variables 20⟩ +≡
  spec zero_spec; /* zero_spec.o.h = zero_spec.o.l = 0 and zero_spec.p = NULL */

42. ⟨Internal prototypes 13⟩ +≡
  static void print_spec ARGS((spec));

43. ⟨Subroutines 14⟩ +≡
  static void print_spec(s)
    spec s;
  {
    if (!s.p) print_octa(s.o);
    else {
      printf(">"); print_specnode_id(s.p->addr);
    }
  }

  static void print_specnode(s)
    specnode s;
  {
    if (s.known) { print_octa(s.o); printf("!"); }
    else if (s.o.h || s.o.l) { print_octa(s.o); printf("?"); }
    else printf("?");
    print_specnode_id(s.addr);
  }
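The rule of section 40, that the o field of a spec is meaningful only when p is NULL, suggests how a consumer might turn a pending operand into a definite one. The sketch below is an assumption about usage, not code from the simulator: spec_resolve is a hypothetical helper, octa is replaced by uint64_t, and the addr, up, and down fields are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct specnode_struct {
  uint64_t o;
  bool known;
} specnode; /* addr, up, and down are omitted in this sketch */

typedef struct {
  uint64_t o;
  specnode *p;
} spec;

/* Hypothetical helper: try to make s definite.
   Returns true when s->o is meaningful afterwards. */
static bool spec_resolve(spec *s) {
  if (s->p == NULL) return true;   /* o was already definite */
  if (!s->p->known) return false;  /* source not computed yet; must wait */
  s->o = s->p->o;                  /* copy the computed value */
  s->p = NULL;                     /* and drop the dependency */
  return true;
}
```

A caller that receives false would simply try again on a later clock cycle, once the producing instruction has set known.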


44. The analog of an automobile in our simulator is a block of data called control, which represents all the relevant facts about an MMIX instruction. We can think of it as the work order attached to a car's windshield. Each group of employees updates the work order as the car moves through the shop.

A control record contains the original location of an instruction, and its four bytes OP X Y Z. An instruction has up to four inputs, which are spec records called y, z, b, and ra; it also has up to three outputs, which are specnode records called x, a, and rl. (We usually don't mention the special input ra or the special output rl, which refer to MMIX's internal registers rA and rL.) For example, the main inputs to a DIVU command are $Y, $Z, and rD; the outputs are the quotient $X and the remainder rR. The inputs to a STO command are $Y, $Z, and $X; there is one "output," and the field x.addr will be set to the physical address of the memory location corresponding to virtual address $Y + $Z.

Each control block also points to the coroutine that owns it, if any. And it has various other fields that contain other tidbits of information; for example, we have already mentioned the state field, which often governs a coroutine's actions. The i field, which contains an internal operation code number, is generally used together with state to switch between alternative computational steps. If, for example, the op field is SUB or SUBI or NEG or NEGI, the internal opcode i will be simply sub. We shall define all the fields of control records now and discuss them later.

An actual hardware implementation of MMIX wouldn't need all the information we are putting into a control block. Some of that information would typically be latched between stages of a pipeline; other portions would probably appear in so-called "rename registers." We simulate rename registers only indirectly, by counting how many registers of that kind would be in use if we were mimicking low-level hardware details more precisely. The go field is a specnode for convenience in programming, although we use only its known and o subfields. It generally contains the address of the subsequent instruction.

⟨Type definitions 11⟩ +≡
  ⟨Declare mmix_opcode and internal_opcode 47⟩
  typedef struct control_struct {
    octa loc; /* virtual address where an instruction originated */
    mmix_opcode op; unsigned char xx, yy, zz; /* the original instruction bytes */
    spec y, z, b, ra; /* inputs */
    specnode x, a, go, rl; /* outputs */
    coroutine *owner; /* a coroutine whose ctl this is */
    internal_opcode i; /* internal opcode */
    int state; /* internal mindset */
    bool usage; /* should rU be increased? */
    bool need_b; /* should we stall until b.p == NULL? */
    bool need_ra; /* should we stall until ra.p == NULL? */
    bool ren_x; /* does x correspond to a rename register? */
    bool mem_x; /* does x correspond to a memory write? */
    bool ren_a; /* does a correspond to a rename register? */
    bool set_l; /* does rl correspond to a new value of rL? */
    bool interim; /* does this instruction need to be reissued on interrupt? */
    unsigned int arith_exc; /* arithmetic exceptions for event bits of rA */
    unsigned int hist; /* history bits for use in branch prediction */
    int denin, denout; /* execution time penalties for denormal handling */
    octa cur_O, cur_S; /* speculative rO and rS before this instruction */
    unsigned int interrupt; /* does this instruction generate an interrupt? */
    void *ptr_a, *ptr_b, *ptr_c; /* generic pointers for miscellaneous use */
  } control;


45. ⟨Internal prototypes 13⟩ +≡
  static void print_control_block ARGS((control *));

46. ⟨Subroutines 14⟩ +≡
  static void print_control_block(c)
    control *c;
  {
    octa default_go;
    if (c->loc.h || c->loc.l || c->op || c->xx || c->yy || c->zz || c->owner) {
      print_octa(c->loc);
      printf(": %02x%02x%02x%02x(%s)", c->op, c->xx, c->yy, c->zz,
        internal_op_name[c->i]);
    }
    if (c->usage) printf("*");
    if (c->interim) printf("+");
    if (c->y.o.h || c->y.o.l || c->y.p) { printf(" y="); print_spec(c->y); }
    if (c->z.o.h || c->z.o.l || c->z.p) { printf(" z="); print_spec(c->z); }
    if (c->b.o.h || c->b.o.l || c->b.p || c->need_b) {
      printf(" b="); print_spec(c->b);
      if (c->need_b) printf("*");
    }
    if (c->need_ra) { printf(" rA="); print_spec(c->ra); }
    if (c->ren_x || c->mem_x) { printf(" x="); print_specnode(c->x); }
    else if (c->x.o.h || c->x.o.l) {
      printf(" x="); print_octa(c->x.o); printf("%c", c->x.known ? '!' : '?');
    }
    if (c->ren_a) { printf(" a="); print_specnode(c->a); }
    if (c->set_l) { printf(" rL="); print_specnode(c->rl); }
    if (c->interrupt) { printf(" int="); print_bits(c->interrupt); }
    if (c->arith_exc) printf(" exc="); print_bits(c->arith_exc << 8);
    default_go = incr(c->loc, 4);
    if (c->go.o.l != default_go.l || c->go.o.h != default_go.h) {
      printf(" ->"); print_octa(c->go.o);
    }
    if (verbose & show_pred_bit) printf(" hist=%x", c->hist);
    printf(" state=%d", c->state);
  }


47. Lists. Here is a (boring) list of all the MMIX opcodes, in order.

⟨Declare mmix_opcode and internal_opcode 47⟩ ≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;

See also section 49.

This code is used in section 44.


48. ⟨Global variables 20⟩ +≡
  char *opcode_name[] = {
    "TRAP", "FCMP", "FUN", "FEQL", "FADD", "FIX", "FSUB", "FIXU",
    "FLOT", "FLOTI", "FLOTU", "FLOTUI", "SFLOT", "SFLOTI", "SFLOTU", "SFLOTUI",
    "FMUL", "FCMPE", "FUNE", "FEQLE", "FDIV", "FSQRT", "FREM", "FINT",
    "MUL", "MULI", "MULU", "MULUI", "DIV", "DIVI", "DIVU", "DIVUI",
    "ADD", "ADDI", "ADDU", "ADDUI", "SUB", "SUBI", "SUBU", "SUBUI",
    "2ADDU", "2ADDUI", "4ADDU", "4ADDUI", "8ADDU", "8ADDUI", "16ADDU", "16ADDUI",
    "CMP", "CMPI", "CMPU", "CMPUI", "NEG", "NEGI", "NEGU", "NEGUI",
    "SL", "SLI", "SLU", "SLUI", "SR", "SRI", "SRU", "SRUI",
    "BN", "BNB", "BZ", "BZB", "BP", "BPB", "BOD", "BODB",
    "BNN", "BNNB", "BNZ", "BNZB", "BNP", "BNPB", "BEV", "BEVB",
    "PBN", "PBNB", "PBZ", "PBZB", "PBP", "PBPB", "PBOD", "PBODB",
    "PBNN", "PBNNB", "PBNZ", "PBNZB", "PBNP", "PBNPB", "PBEV", "PBEVB",
    "CSN", "CSNI", "CSZ", "CSZI", "CSP", "CSPI", "CSOD", "CSODI",
    "CSNN", "CSNNI", "CSNZ", "CSNZI", "CSNP", "CSNPI", "CSEV", "CSEVI",
    "ZSN", "ZSNI", "ZSZ", "ZSZI", "ZSP", "ZSPI", "ZSOD", "ZSODI",
    "ZSNN", "ZSNNI", "ZSNZ", "ZSNZI", "ZSNP", "ZSNPI", "ZSEV", "ZSEVI",
    "LDB", "LDBI", "LDBU", "LDBUI", "LDW", "LDWI", "LDWU", "LDWUI",
    "LDT", "LDTI", "LDTU", "LDTUI", "LDO", "LDOI", "LDOU", "LDOUI",
    "LDSF", "LDSFI", "LDHT", "LDHTI", "CSWAP", "CSWAPI", "LDUNC", "LDUNCI",
    "LDVTS", "LDVTSI", "PRELD", "PRELDI", "PREGO", "PREGOI", "GO", "GOI",
    "STB", "STBI", "STBU", "STBUI", "STW", "STWI", "STWU", "STWUI",
    "STT", "STTI", "STTU", "STTUI", "STO", "STOI", "STOU", "STOUI",
    "STSF", "STSFI", "STHT", "STHTI", "STCO", "STCOI", "STUNC", "STUNCI",
    "SYNCD", "SYNCDI", "PREST", "PRESTI", "SYNCID", "SYNCIDI", "PUSHGO", "PUSHGOI",
    "OR", "ORI", "ORN", "ORNI", "NOR", "NORI", "XOR", "XORI",
    "AND", "ANDI", "ANDN", "ANDNI", "NAND", "NANDI", "NXOR", "NXORI",
    "BDIF", "BDIFI", "WDIF", "WDIFI", "TDIF", "TDIFI", "ODIF", "ODIFI",
    "MUX", "MUXI", "SADD", "SADDI", "MOR", "MORI", "MXOR", "MXORI",
    "SETH", "SETMH", "SETML", "SETL", "INCH", "INCMH", "INCML", "INCL",
    "ORH", "ORMH", "ORML", "ORL", "ANDNH", "ANDNMH", "ANDNML", "ANDNL",
    "JMP", "JMPB", "PUSHJ", "PUSHJB", "GETA", "GETAB", "PUT", "PUTI",
    "POP", "RESUME", "SAVE", "UNSAVE", "SYNC", "SWYM", "GET", "TRIP"};


49. And here is a (likewise boring) list of all the internal opcodes. The smallest numbers, less than or equal to max_pipe_op, correspond to operations for which arbitrary pipeline delays can be configured with MMIX_config. The largest numbers, greater than max_real_command, correspond to internally generated operations that have no official OP code; for example, there are internal operations to shift the γ pointer in the register stack, and to compute page table entries.

⟨Declare mmix_opcode and internal_opcode 47⟩ +≡
#define max_pipe_op feps
#define max_real_command trip

  typedef enum {
    mul0, /* multiplication by zero */
    mul1, mul2, mul3, mul4, mul5, mul6, mul7, mul8,
      /* multiplication by 1–8, 9–16, ..., 57–64 bits */
    div, /* DIV[U][I] */
    sh, /* S[L,R][U][I] */
    mux, /* MUX[I] */
    sadd, /* SADD[I] */
    mor, /* M[X]OR[I] */
    fadd, /* FADD, FSUB */
    fmul, /* FMUL */
    fdiv, /* FDIV */
    fsqrt, /* FSQRT */
    fint, /* FINT */
    fix, /* FIX[U] */
    flot, /* [S]FLOT[U][I] */
    feps, /* FCMPE, FUNE, FEQLE */
    fcmp, /* FCMP */
    funeq, /* FUN, FEQL */
    fsub, /* FSUB */
    frem, /* FREM */
    mul, /* MUL[I] */
    mulu, /* MULU[I] */
    divu, /* DIVU[I] */
    add, /* ADD[I] */
    addu, /* [2,4,8,16,]ADDU[I], INC[M][H,L] */
    sub, /* SUB[I], NEG[I] */
    subu, /* SUBU[I], NEGU[I] */
    set, /* SET[M][H,L], GETA[B] */
    or, /* OR[I], OR[M][H,L] */
    orn, /* ORN[I] */
    nor, /* NOR[I] */
    and, /* AND[I] */
    andn, /* ANDN[I], ANDN[M][H,L] */
    nand, /* NAND[I] */
    xor, /* XOR[I] */
    nxor, /* NXOR[I] */
    shlu, /* SLU[I] */
    shru, /* SRU[I] */
    shl, /* SL[I] */
    shr, /* SR[I] */
    cmp, /* CMP[I] */
    cmpu, /* CMPU[I] */
    bdif, /* BDIF[I] */
    wdif, /* WDIF[I] */
    tdif, /* TDIF[I] */
    odif, /* ODIF[I] */
    zset, /* ZS[N][N,Z,P][I], ZSEV[I], ZSOD[I] */
    cset, /* CS[N][N,Z,P][I], CSEV[I], CSOD[I] */
    get, /* GET */
    put, /* PUT[I] */
    ld, /* LD[B,W,T,O][U][I], LDHT[I], LDSF[I] */
    ldptp, /* load page table pointer */
    ldpte, /* load page table entry */
    ldunc, /* LDUNC[I] */
    ldvts, /* LDVTS[I] */
    preld, /* PRELD[I] */
    prest, /* PREST[I] */
    st, /* STO[U][I], STCO[I], STUNC[I] */
    syncd, /* SYNCD[I] */
    syncid, /* SYNCID[I] */
    pst, /* ST[B,W,T][U][I], STHT[I] */
    stunc, /* STUNC[I], in write buffer */
    cswap, /* CSWAP[I] */
    br, /* B[N][N,Z,P][B] */
    pbr, /* PB[N][N,Z,P][B] */
    pushj, /* PUSHJ[B] */
    go, /* GO[I] */
    prego, /* PREGO[I] */
    pushgo, /* PUSHGO[I] */
    pop, /* POP */
    resume, /* RESUME */
    save, /* SAVE */
    unsave, /* UNSAVE */
    sync, /* SYNC */
    jmp, /* JMP[B] */
    noop, /* SWYM */
    trap, /* TRAP */
    trip, /* TRIP */
    incgamma, /* increase γ pointer */
    decgamma, /* decrease γ pointer */
    incrl, /* increase rL and β */
    sav, /* intermediate stage of SAVE */
    unsav, /* intermediate stage of UNSAVE */
    resum /* intermediate stage of RESUME */
  } internal_opcode;


50. ⟨Global variables 20⟩ +≡
  char *internal_op_name[] = {
    "mul0", "mul1", "mul2", "mul3", "mul4", "mul5", "mul6",
    "mul7", "mul8", "div", "sh", "mux", "sadd", "mor", "fadd", "fmul", "fdiv",
    "fsqrt", "fint", "fix", "flot", "feps", "fcmp", "funeq", "fsub", "frem", "mul",
    "mulu", "divu", "add", "addu", "sub", "subu", "set", "or", "orn", "nor", "and",
    "andn", "nand", "xor", "nxor", "shlu", "shru", "shl", "shr", "cmp", "cmpu",
    "bdif", "wdif", "tdif", "odif", "zset", "cset", "get", "put", "ld", "ldptp",
    "ldpte", "ldunc", "ldvts", "preld", "prest", "st", "syncd", "syncid", "pst",
    "stunc", "cswap", "br", "pbr", "pushj", "go", "prego", "pushgo", "pop", "resume",
    "save", "unsave", "sync", "jmp", "noop", "trap", "trip", "incgamma", "decgamma",
    "incrl", "sav", "unsav", "resum"};

51. We need a table to convert the external opcodes to internal ones.

⟨Global variables 20⟩ +≡
  internal_opcode internal_op[256] = {
    trap, fcmp, funeq, funeq, fadd, fix, fsub, fix,
    flot, flot, flot, flot, flot, flot, flot, flot,
    fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
    mul, mul, mulu, mulu, div, div, divu, divu,
    add, add, addu, addu, sub, sub, subu, subu,
    addu, addu, addu, addu, addu, addu, addu, addu,
    cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
    shl, shl, shlu, shlu, shr, shr, shru, shru,
    br, br, br, br, br, br, br, br,
    br, br, br, br, br, br, br, br,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    cset, cset, cset, cset, cset, cset, cset, cset,
    cset, cset, cset, cset, cset, cset, cset, cset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, cswap, cswap, ldunc, ldunc,
    ldvts, ldvts, preld, preld, prego, prego, go, go,
    pst, pst, pst, pst, pst, pst, pst, pst,
    pst, pst, pst, pst, st, st, st, st,
    pst, pst, pst, pst, st, st, st, st,
    syncd, syncd, prest, prest, syncid, syncid, pushgo, pushgo,
    or, or, orn, orn, nor, nor, xor, xor,
    and, and, andn, andn, nand, nand, nxor, nxor,
    bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
    mux, mux, sadd, sadd, mor, mor, mor, mor,
    set, set, set, set, addu, addu, addu, addu,
    or, or, or, or, andn, andn, andn, andn,
    jmp, jmp, pushj, pushj, set, set, put, put,
    pop, resume, save, unsave, sync, noop, get, trip};


52. While we're into boring lists, we might as well define all the special register numbers, together with an inverse table for use in diagnostic outputs. These codes have been designed so that special registers 0–7 are unencumbered, 8–11 can't be PUT by anybody, 12–18 can't be PUT by the user. Pipeline delays might occur when GET is applied to special registers 21–31 or when PUT is applied to special registers 15–20. The SAVE and UNSAVE commands store and restore special registers 0–6 and 23–27.

⟨Header definitions 6⟩ +≡
#define rA 21 /* arithmetic status register */
#define rB 0 /* bootstrap register (trip) */
#define rC 8 /* cycle counter */
#define rD 1 /* dividend register */
#define rE 2 /* epsilon register */
#define rF 22 /* failure location register */
#define rG 19 /* global threshold register */
#define rH 3 /* himult register */
#define rI 12 /* interval counter */
#define rJ 4 /* return-jump register */
#define rK 15 /* interrupt mask register */
#define rL 20 /* local threshold register */
#define rM 5 /* multiplex mask register */
#define rN 9 /* serial number */
#define rO 10 /* register stack offset */
#define rP 23 /* prediction register */
#define rQ 16 /* interrupt request register */
#define rR 6 /* remainder register */
#define rS 11 /* register stack pointer */
#define rT 13 /* trap address register */
#define rU 17 /* usage counter */
#define rV 18 /* virtual translation register */
#define rW 24 /* where-interrupted register (trip) */
#define rX 25 /* execution register (trip) */
#define rY 26 /* Y operand (trip) */
#define rZ 27 /* Z operand (trip) */
#define rBB 7 /* bootstrap register (trap) */
#define rTT 14 /* dynamic trap address register */
#define rWW 28 /* where-interrupted register (trap) */
#define rXX 29 /* execution register (trap) */
#define rYY 30 /* Y operand (trap) */
#define rZZ 31 /* Z operand (trap) */

53. ⟨Global variables 20⟩ +≡
  char *special_name[32] = {
    "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
    "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

54. Here are the bit codes that affect trips and traps. The first eight cases also apply to the upper half of rQ; the next eight apply to rA.

#define P_BIT (1 << 0) /* instruction in privileged location */
#define S_BIT (1 << 1) /* security violation */
#define B_BIT (1 << 2) /* instruction breaks the rules */
#define K_BIT (1 << 3) /* instruction for kernel only */
#define N_BIT (1 << 4) /* virtual translation bypassed */
#define PX_BIT (1 << 5) /* permission lacking to execute from page */
#define PW_BIT (1 << 6) /* permission lacking to write on page */
#define PR_BIT (1 << 7) /* permission lacking to read from page */
#define PROT_OFFSET 5 /* distance from PR_BIT to protection code position */
#define X_BIT (1 << 8) /* floating inexact */
#define Z_BIT (1 << 9) /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define H_BIT (1 << 16) /* trip handler bit */
#define F_BIT (1 << 17) /* forced trap bit */
#define E_BIT (1 << 18) /* external (dynamic) trap bit */

⟨Global variables 20⟩ +≡
  char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

55. ⟨Internal prototypes 13⟩ +≡
  static void print_bits ARGS((int));

56. ⟨Subroutines 14⟩ +≡
  static void print_bits(x)
    int x;
  {
    register int b, j;
    for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
      if (x & b) printf("%c", bit_code_map[j]);
  }

57. The lower half of rQ holds external interrupts of highest priority. Most of them are implementation-dependent, but a few are defined in general.

⟨Header definitions 6⟩ +≡
#define POWER_FAILURE (1 << 0) /* try to shut down calmly and quickly */
#define PARITY_ERROR (1 << 1) /* try to save the file systems */
#define NONEXISTENT_MEMORY (1 << 2) /* a memory address can't be used */
#define REBOOT_SIGNAL (1 << 4) /* it's time to start over */
#define INTERVAL_TIMEOUT (1 << 7) /* the timer register, rI, has reached zero */
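The scan performed by print_bits in section 56 walks from E_BIT downward and stops as soon as no lower bits of x remain. To make that behavior checkable, here is a small variant that collects the code letters in a string instead of printing them; bits_to_string is a hypothetical name used only in this sketch.

```c
#include <assert.h>
#include <string.h>

#define E_BIT (1 << 18) /* highest bit code, as defined in section 54 */

static const char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

/* Write the code letters for the bits set in x into buf, which needs
   room for 20 chars, scanning from E_BIT downward like print_bits. */
static void bits_to_string(int x, char *buf) {
  int b, j, n = 0;
  for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
    if (x & b) buf[n++] = bit_code_map[j];
  buf[n] = '\0';
}
```

The test x & (b + b - 1) masks x with all bits from b downward, so the loop ends early once everything that remains is zero.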


58. Dynamic speculation. Now that we understand some basic low-level structures, we're ready to look at the larger picture.

This simulator is based on the idea of "dynamic scheduling with register renaming," as introduced in the 1960s by R. M. Tomasulo [IBM Journal of Research and Development 11 (1967), 25–33]. Moreover, the dynamic scheduling method is extended here to "speculative execution," as implemented in several processors of the 1990s and described in section 4.6 of Hennessy and Patterson's Computer Architecture, second edition (1995). The essential idea is to keep track of the pipeline contents by recording all dependencies between unfinished computations in a queue called the reorder buffer. An entry in the reorder buffer might, for example, correspond to an instruction that adds together two numbers whose values are still being computed; those numbers have been allocated space in earlier positions of the reorder buffer. The addition will take place as soon as both of its operands are known, but the sum won't be written immediately into the destination register. It will stay in the reorder buffer until reaching the hot seat at the front of the queue. Finally, the addition leaves the hot seat and is said to be committed.

Some instructions in the reorder buffer may in fact be executed only on speculation, meaning that they won't really be called for unless a prior branch instruction has the predicted outcome. Indeed, we can say that all instructions not yet in the hot seat are being executed speculatively, because an external interrupt might occur at any time and change the entire course of computation. Organizing the pipeline as a reorder buffer allows us to look ahead and keep busy computing values that have a good chance of being needed later, instead of waiting for slow instructions or slow memory references to be completed.

The reorder buffer is in fact a queue of control records, conceptually forming part of a circle of such records inside the simulator, corresponding to all instructions that have been dispatched or issued but not yet committed, in strict program order.

The best way to get an understanding of speculative execution is perhaps to imagine that the reorder buffer is large enough to hold hundreds of instructions in various stages of execution, and to think of an implementation of MMIX that has dozens of functional units (more than would ever actually be built into a chip). Then one can readily visualize the kinds of control structures and checks that must be made to ensure correct execution. Without such a broad viewpoint, a programmer or hardware designer will be inclined to think only of the simple cases and to devise algorithms that lack the proper generality. Thus we have a somewhat paradoxical situation in which a difficult general problem turns out to be easier to solve than its simpler special cases, because it enforces clarity of thinking.

Instructions that have completed execution and have not yet been committed are analogous to cars that have gone through our hypothetical repair shop and are waiting for their owners to pick them up. However, all analogies break down, and the world of automobiles does not have a natural counterpart for the notion of speculative execution. That notion corresponds roughly to situations in which people are led to believe that their cars need a new piece of equipment, but they suddenly change their mind once they see the price tag, and they insist on having the equipment removed even after it has been partially or completely installed.


Speculatively executed instructions might make no sense: They might divide by zero or refer to protected memory areas, etc. Such anomalies are not considered catastrophic or even exceptional until the instruction reaches the hot seat.

The person who designs a computer with speculative execution is an optimist, who has faith that the vast majority of the machine's predictions will come true. The person who designs a reliable implementation of such a computer is a pessimist, who understands that all predictions might come to naught. The pessimist does, however, take pains to optimize the cases that do turn out well.
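The queue behavior described in this section (issue in program order, complete in any order, commit only from the hot seat) can be illustrated with a toy circular buffer. This is a sketch under stated assumptions and much simpler than the simulator's circle of control records: the type reorder_buffer, the capacity ROB_SIZE, and the functions rob_issue, rob_complete, rob_commit, and rob_deissue are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 4 /* assumed capacity, chosen only for this sketch */

typedef struct {
  long result;
  bool known; /* has the result been computed yet? */
} rob_entry;

typedef struct {
  rob_entry slot[ROB_SIZE];
  int head;  /* index of the hot seat */
  int count; /* number of issued but uncommitted entries */
} reorder_buffer;

/* Issue a new instruction in program order; returns its slot,
   or -1 if the buffer is full. */
static int rob_issue(reorder_buffer *r) {
  if (r->count == ROB_SIZE) return -1;
  int k = (r->head + r->count) % ROB_SIZE;
  r->slot[k].known = false;
  r->count++;
  return k;
}

/* Record a result; completions may arrive in any order. */
static void rob_complete(reorder_buffer *r, int k, long value) {
  r->slot[k].result = value;
  r->slot[k].known = true;
}

/* Commit the entry in the hot seat if its result is known.
   Returns true and stores the value in *out on success. */
static bool rob_commit(reorder_buffer *r, long *out) {
  if (r->count == 0 || !r->slot[r->head].known) return false;
  *out = r->slot[r->head].result;
  r->head = (r->head + 1) % ROB_SIZE;
  r->count--;
  return true;
}

/* Discard the n most recently issued entries, as when a
   prediction turns out to be wrong. */
static void rob_deissue(reorder_buffer *r, int n) {
  assert(n >= 0 && n <= r->count);
  r->count -= n;
}
```

In this toy model a later instruction that finishes first simply waits; its result becomes visible only after every earlier instruction has been committed, and entries that were issued on a wrong guess disappear without leaving any trace.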


59. Let's consider what happens to a single instruction, say ADD $1,$2,$3, as it travels through the pipeline in a normal situation. The first time this instruction is encountered, it is placed into the I-cache (that is, the instruction cache), so that we won't have to access memory when we need to perform it again. We will assume for simplicity in this discussion that each I-cache access takes one clock cycle, although other possibilities are allowed by MMIX_config.

Suppose the simulated machine fetches the example ADD instruction at time 1000. Fetching is done by a coroutine whose stage number is 0. A cache block typically contains 8 or 16 instructions. The fetch unit of our machine is able to fetch up to fetch_max instructions on each clock cycle and place them in the fetch buffer, provided that there is room in the buffer and that all the instructions belong to the same cache block.

The dispatch unit of our simulator is able to issue up to dispatch_max instructions on each clock cycle and move them from the fetch buffer to the reorder buffer, provided that functional units are available for those instructions and there is room in the reorder buffer. A functional unit that handles ADD is usually called an ALU (arithmetic logic unit), and our simulated machine might have several of them. If they aren't all stalled in stage 1 of their pipelines, and if the reorder buffer isn't full, and if the machine isn't in the process of deissuing instructions that were mispredicted, and if fewer than dispatch_max instructions are ahead of the ADD in the fetch buffer, and if all such prior instructions can be issued without using up all the free ALUs, our ADD instruction will be issued at time 1001. (In fact, all of these conditions are usually true.)

We assume that L > 3, so that $1, $2, and $3 are local registers. For simplicity we'll assume in fact that the register stack is empty, so that the ADD instruction is supposed to set l[1] ← l[2] + l[3]. The operands l[2] and l[3] might not be known at time 1001; they are spec values, which might point to specnode entries in the reorder buffer for previous instructions whose destinations are l[2] and l[3]. The dispatcher fills the next available control block of the reorder buffer with information for the ADD, containing appropriate spec values corresponding to l[2] and l[3] in its y and z fields. The x field of this control block will be inserted into a doubly linked list of specnode records, corresponding to l[1] and to all instructions in the reorder buffer that have l[1] as a destination. The boolean value x.known will be set to false, meaning that this speculative value still needs to be computed. Subsequent instructions that need l[1] as a source will point to x, if they are issued before the sum x.o has been computed. Double linking is used in the specnode list because the ADD instruction might be cancelled before it is finally committed; thus deletions might occur at either end of the list for l[1].

At time 1002, the ALU handling the ADD will stall if its inputs y and z are not both known (namely if y.p != NULL or z.p != NULL). In fact, it will also stall if its third input rA is not known; the current speculative value of rA, except for its event bits, is represented in the ra field of the control block, and we must have ra.p == NULL. In such a case the ALU will look to see if the spec values pointed to by y.p and/or z.p and/or ra.p become defined on this clock cycle, and it will update its own input values accordingly.


But let's assume that y, z, and ra are already known at time 1002. Then x.o will be set to y.o + z.o and x.known will become true. This will make the result destined for l[1] available to be used in other commands at time 1003.

If no overflow occurs when adding y.o to z.o, the interrupt and arith_exc fields of the control block for ADD are set to zero. But when overflow does occur (shudder), there are two cases, based on the V-enable bit of rA, which is found in field b.o of the control block. If this bit is 0, the V-bit of the arith_exc field in the control block is set to 1; the arith_exc field will be ORed into rA when the ADD instruction is eventually committed. But if the V-enable bit is 1, the trip handler should be called, interrupting the normal sequence. In such a case, the interrupt field of the control block is set to specify a trip, and the fetcher and dispatcher are told to forget what they have been doing; all instructions following the ADD in the reorder buffer must now be deissued. The virtual starting address of the overflow trip handler, namely location 32, is hastily passed to the fetch routine, and instructions will be fetched from that location as soon as possible. (Of course the overflow and the trip handler are still speculative until the ADD instruction is committed. Other exceptional conditions might cause the ADD itself to be terminated before it gets to the hot seat. But the pipeline keeps charging ahead, always trying to guess the most probable outcome.)

The commission unit of this simulator is able to commit and/or deissue up to commit_max instructions on each clock cycle. With luck, fewer than commit_max instructions will be ahead of our ADD instruction at time 1003, and they will all be completed normally. Then l[1] can be set to x.o, and the event bits of rA can be updated from arith_exc, and the ADD command can pass through the hot seat and out of the reorder buffer.

⟨External variables 4⟩ +≡
  Extern int fetch_max, dispatch_max, peekahead, commit_max;
      /* limits on instructions that can be handled per clock cycle */
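The two overflow cases described in section 59 can be illustrated with a small standalone sketch. It is not the simulator's code: the type `add_result`, the function `add_with_overflow`, and the constant `SKETCH_V_BIT` (including its bit position) are hypothetical names chosen for this illustration, and the operands are passed as unsigned 64-bit values so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_V_BIT 0x40u /* hypothetical position of the overflow event bit */

typedef struct {
  uint64_t sum;        /* wrapped two's complement sum, like x.o */
  unsigned arith_exc;  /* event bits to be ORed into rA at commit time */
  bool trip;           /* true if the overflow trip handler must be called */
} add_result;

/* Add two octabytes as signed numbers; v_enabled models the V-enable bit. */
static add_result add_with_overflow(uint64_t y, uint64_t z, bool v_enabled) {
  add_result r = {0, 0, false};
  r.sum = y + z; /* unsigned addition wraps around without undefined behavior */
  /* Signed overflow: y and z have the same sign, but the sum's sign differs. */
  bool overflow = ((~(y ^ z) & (y ^ r.sum)) >> 63) != 0;
  if (overflow) {
    if (v_enabled) r.trip = true;       /* interrupt the normal sequence */
    else r.arith_exc |= SKETCH_V_BIT;   /* just record the event */
  }
  assert(!(r.trip && r.arith_exc));     /* the two cases are mutually exclusive */
  return r;
}
```

In this sketch either outcome remains tentative, exactly as in the text: a caller would act on `trip` or `arith_exc` only when the instruction is committed.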

arith_exc: unsigned int, §44. b: spec, §44. Extern = macro, §4. false = 0, §11. interrupt: unsigned int, §44. known: bool, §40. MMIX_config: void (), MMIX-CONFIG §38. o: octa, §40. p: specnode *, §40. ra: spec, §44. stage: int, §23. true = 1, §11. x: specnode, §44. y: spec, §44. z: spec, §44.


60. The instruction currently occupying the hot seat is the only issued-but-not-yet-committed instruction that is guaranteed to be truly essential to the machine's computation. All other instructions in the reorder buffer are being executed on speculation; if they prove to be needed, well and good, but we might want to jettison them all if, say, an external interrupt occurs.

Thus all instructions that change the global state in complicated ways (like LDVTS, which changes the virtual address translation caches) are performed only when they reach the hot seat. Fortunately the vast majority of instructions are sufficiently simple that we can deal with them more efficiently while other computations are taking place.

In this implementation the reorder buffer is simply housed in an array of control records. The first array element is reorder_bot, and the last is reorder_top. Variable hot points to the control block in the hot seat, and hot - 1 to its predecessor, etc. Variable cool points to the next control block that will be filled in the reorder buffer. If hot == cool the reorder buffer is empty; otherwise it contains the control records hot, hot - 1, ..., cool + 1, except of course that we wrap around from reorder_bot to reorder_top when moving down in the buffer.

⟨External variables 4⟩ +≡
  Extern control *reorder_bot, *reorder_top;
      /* least and greatest entries in the ring containing the reorder buffer */
  Extern control *hot, *cool; /* front and rear of the reorder buffer */
  Extern control *old_hot; /* value of hot at beginning of cycle */
  Extern int deissues; /* the number of instructions that need to be deissued */
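The downward-moving ring of section 60 can be sketched independently of the simulator. Everything named here (`control_sketch`, `ring_sketch`, and the `ring_*` functions) is hypothetical and exists only for illustration; in particular, treating the ring as full when one free slot remains is an assumption of this sketch, made so that hot == cool can unambiguously mean "empty."

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } control_sketch; /* stand-in for a control block */

typedef struct {
  control_sketch *bot, *top;  /* least and greatest entries of the ring */
  control_sketch *hot, *cool; /* oldest entry, and next entry to be filled */
} ring_sketch;

/* Move one position down, wrapping from bot around to top. */
static control_sketch *ring_down(const ring_sketch *r, control_sketch *p) {
  return p == r->bot ? r->top : p - 1;
}

static void ring_init(ring_sketch *r, control_sketch *slots, size_t n) {
  assert(n >= 2);
  r->bot = slots; r->top = slots + n - 1;
  r->hot = r->cool = r->top; /* empty, as in section 61 */
}

static int ring_empty(const ring_sketch *r) { return r->hot == r->cool; }

static int ring_full(const ring_sketch *r) {
  return ring_down(r, r->cool) == r->hot;
}

/* Count the entries hot, hot-1, ..., cool+1. */
static size_t ring_count(const ring_sketch *r) {
  size_t n = 0;
  for (control_sketch *p = r->hot; p != r->cool; p = ring_down(r, p)) n++;
  return n;
}

/* Claim the cool block for a newly issued instruction; NULL if full. */
static control_sketch *ring_issue(ring_sketch *r) {
  if (ring_full(r)) return NULL;
  control_sketch *p = r->cool;
  r->cool = ring_down(r, r->cool);
  return p;
}

/* Retire the instruction in the hot seat; NULL if the ring is empty. */
static control_sketch *ring_commit(ring_sketch *r) {
  if (ring_empty(r)) return NULL;
  control_sketch *p = r->hot;
  r->hot = ring_down(r, r->hot);
  return p;
}
```

The loop in `ring_count` has the same shape as the traversal used by print_reorder_buffer in section 63.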

61. ⟨Initialize everything 22⟩ +≡
  hot = cool = reorder_top; deissues = 0;

62. ⟨Internal prototypes 13⟩ +≡
  static void print_reorder_buffer ARGS((void));

63. ⟨Subroutines 14⟩ +≡
  static void print_reorder_buffer()
  {
    printf("Reorder buffer");
    if (hot == cool) printf(" (empty)\n");
    else { register control *p;
      if (deissues) printf(" (%d to be deissued)", deissues);
      if (doing_interrupt) printf(" (interrupt state %d)", doing_interrupt);
      printf(":\n");
      for (p = hot; p != cool; p = (p == reorder_bot ? reorder_top : p - 1)) {
        print_control_block(p);
        if (p->owner) {
          printf(" "); print_coroutine_id(p->owner);
        }
        printf("\n");
      }
    }
    printf(" %d available rename register%s, %d memory slot%s\n", rename_regs,
        rename_regs != 1 ? "s" : "", mem_slots, mem_slots != 1 ? "s" : "");
  }


64. Here is an overview of what happens on each clock cycle.

⟨Perform one machine cycle 64⟩ ≡
  {
    ⟨Check for external interrupt 314⟩;
    dispatch_count = 0;
    old_hot = hot; /* remember the hot seat position at beginning of cycle */
    old_tail = tail; /* remember the fetch buffer contents at beginning of cycle */
    suppress_dispatch = (deissues || dispatch_lock);
    if (doing_interrupt) ⟨Perform one cycle of the interrupt preparations 318⟩
    else ⟨Commit and/or deissue up to commit_max instructions 67⟩;
    ⟨Execute all coroutines scheduled for the current time 125⟩;
    if (!suppress_dispatch) ⟨Dispatch one cycle's worth of instructions 74⟩;
    ticks = incr(ticks, 1); /* and the beat moves on */
    dispatch_stat[dispatch_count]++;
  }
This code is used in section 10.

65. ⟨Global variables 20⟩ +≡
  int dispatch_count; /* how many dispatched on this cycle */
  bool suppress_dispatch; /* should dispatching be bypassed? */
  int doing_interrupt; /* how many cycles of interrupt preparations remain */
  lockvar dispatch_lock; /* lock to prevent instruction issues */

66. ⟨External variables 4⟩ +≡
  Extern int *dispatch_stat; /* how often did we dispatch 0, 1, ... instructions? */
  Extern bool security_disabled; /* omit security checks for testing purposes? */

ARGS = macro, §6. bool = enum, §11. commit_max: int, §59. control = struct, §44. Extern = macro, §4. incr: octa (), MMIX-ARITH §6. lockvar = coroutine *, §37. mem_slots: int, §86. old_tail: fetch *, §70. owner: coroutine *, §44. print_control_block: static void (), §46. print_coroutine_id: static void (), §25. printf: int (), <stdio.h>. rename_regs: int, §86. tail: fetch *, §69. ticks = macro, §87.


67. ⟨Commit and/or deissue up to commit_max instructions 67⟩ ≡
  {
    for (m = commit_max; m > 0 && deissues > 0; m--)
      ⟨Deissue the coolest instruction 145⟩;
    for ( ; m > 0; m--) {
      if (hot == cool) break; /* reorder buffer is empty */
      if (!security_disabled) ⟨Check for security violation, break if so 149⟩;
      if (hot->owner) break; /* hot seat instruction isn't finished */
      ⟨Commit the hottest instruction, or break if it's not ready 146⟩;
      i = hot->i;
      if (hot == reorder_bot) hot = reorder_top;
      else hot--;
      if (i == resum) break; /* allow the resumed instruction to see the new rK */
    }
  }

This code is used in section 64.


68. The dispatch stage. It would be nice to present the parts of this simulator by dealing with the fetching, dispatching, executing, and committing stages in that order. After all, instructions are first fetched, then dispatched, then executed, and finally committed. However, the fetch stage depends heavily on difficult questions of memory management that are best deferred until we have looked at the simpler parts of simulation. Therefore we will take our initial plunge into the details of this program by looking first at the dispatch phase, assuming that instructions have somehow appeared magically in the fetch buffer.

The fetch buffer, like the circular priority queue of all coroutines and the circular queue used for the reorder buffer, lives in an array that is best regarded as a ring of elements. The elements are structures of type fetch, which have five fields: A 32-bit inst, which is an MMIX instruction; a 64-bit loc, which is the virtual address of that instruction; an interrupt field, which is nonzero if, for example, the protection bits in the relevant page table entry for this address do not permit execution access; a boolean noted field, which becomes true after the dispatch unit has peeked at the instruction to see whether it is a jump or probable branch; and a hist field, which records the recent branch history. (The least significant bits of hist correspond to the most recent branches.)

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa loc; /* virtual address of instruction */
    tetra inst; /* the instruction itself */
    unsigned int interrupt; /* bit codes that might cause interruption */
    bool noted; /* have we peeked at this instruction? */
    unsigned int hist; /* if we peeked, this was the peek_hist */
  } fetch;

69. The oldest and youngest entries in the fetch buffer are pointed to by head and tail, just as the oldest and youngest entries in the reorder buffer are called hot and cool. The fetch coroutine will be adding entries at the tail position, which starts at old_tail when a cycle begins, in parallel with the actions simulated by the dispatcher. Therefore the dispatcher is allowed to look only at instructions in head, head - 1, ..., old_tail + 1, although a few more recently fetched instructions will usually be present in the fetch buffer by the time this part of the program is executed.

⟨External variables 4⟩ +≡
  Extern fetch *fetch_bot, *fetch_top;
      /* least and greatest entries in the ring containing the fetch buffer */
  Extern fetch *head, *tail; /* front and rear of the fetch buffer */

bool = enum, §11. commit_max: int, §59. cool: control *, §60. deissues: int, §60. Extern = macro, §4. hot: control *, §60. i: internal_opcode, §44. i: register int, §12. m: register int, §12. octa = struct, §17. old_tail: fetch *, §70. owner: coroutine *, §44. peek_hist: unsigned int, §99. reorder_bot: control *, §60. reorder_top: control *, §60. resum = 89, §49. security_disabled: bool, §66. tetra = unsigned int, §17. true = 1, §11.


70. ⟨Global variables 20⟩ +≡
  fetch *old_tail; /* rear of the fetch buffer available on the current cycle */

71. #define UNKNOWN_SPEC ((specnode *) 1)
⟨Initialize everything 22⟩ +≡
  head = tail = fetch_top;
  inst_ptr.p = UNKNOWN_SPEC;

72. ⟨Internal prototypes 13⟩ +≡
  static void print_fetch_buffer ARGS((void));

73. ⟨Subroutines 14⟩ +≡
  static void print_fetch_buffer()
  {
    printf("Fetch buffer");
    if (head == tail) printf(" (empty)\n");
    else { register fetch *p;
      if (resuming) printf(" (resumption state %d)", resuming);
      printf(":\n");
      for (p = head; p != tail; p = (p == fetch_bot ? fetch_top : p - 1)) {
        print_octa(p->loc);
        printf(": %08x(%s)", p->inst, opcode_name[p->inst >> 24]);
        if (p->interrupt) print_bits(p->interrupt);
        if (p->noted) printf("*");
        printf("\n");
      }
    }
    printf("Instruction pointer is ");
    if (inst_ptr.p == NULL) print_octa(inst_ptr.o);
    else {
      printf("waiting for ");
      if (inst_ptr.p == UNKNOWN_SPEC) printf("dispatch");
      else if (inst_ptr.p->addr.h == -1)
        print_coroutine_id(((control *) inst_ptr.p->up)->owner);
      else print_specnode_id(inst_ptr.p->addr);
    }
    printf("\n");
  }

74. The best way to understand the dispatching process is once again to "think big," by imagining a huge fetch buffer and the potential ability to issue dozens of instructions per cycle, although the actual numbers are typically quite small.

If the fetch buffer is not empty after dispatch_max instructions have been dispatched, the dispatcher also looks at up to peekahead further instructions to see if they are jumps or other commands that change the flow of control. Much of this action would happen in parallel on a real machine, but our simulator works sequentially.

In the following program, true_head records the head of the fetch buffer as instructions are actually dispatched, while head refers to the position currently being examined (possibly peeking into the future).


If the fetch buffer is empty at the beginning of the current clock cycle, a "dispatch bypass" allows the dispatcher to issue the first instruction that enters the fetch buffer on this cycle. Otherwise the dispatcher is restricted to previously fetched instructions.

⟨Dispatch one cycle's worth of instructions 74⟩ ≡
  { register fetch *true_head, *new_head;
    true_head = head;
    if (head == old_tail && head != tail)
      old_tail = (head == fetch_bot ? fetch_top : head - 1);
    peek_hist = cool_hist;
    for (j = 0; j < dispatch_max + peekahead; j++)
      ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩;
    head = true_head;
  }

This code is used in section 64.

addr: octa, §40. ARGS = macro, §6. control = struct, §44. cool_hist: unsigned int, §99. dispatch_max: int, §59. fetch = struct, §68. fetch_bot: fetch *, §69. fetch_top: fetch *, §69. h: tetra, §17. head: fetch *, §69. inst: tetra, §68. inst_ptr: spec, §284. interrupt: unsigned int, §68. j: register int, §12. loc: octa, §68. noted: bool, §68. o: octa, §40. opcode_name: char *[], §48. owner: coroutine *, §44. p: specnode *, §40. peek_hist: unsigned int, §99. peekahead: int, §59. print_bits: static void (), §56. print_coroutine_id: static void (), §25. print_octa: static void (), §19. print_specnode_id: static void (), §91. printf: int (), <stdio.h>. resuming: int, §78. specnode = struct, §40. tail: fetch *, §69. up: specnode *, §40.


75. ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩ ≡
  {
    register mmix_opcode op;
    register int yz, f;
    register bool freeze_dispatch = false;
    register func *u = NULL;
    if (head == old_tail) break; /* fetch buffer empty */
    if (head == fetch_bot) new_head = fetch_top; else new_head = head - 1;
    op = head->inst >> 24; yz = head->inst & 0xffff;
    ⟨Determine the flags, f, and the internal opcode, i 80⟩;
    ⟨Install default fields in the cool block 100⟩;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 84⟩;
    if (head->noted) peek_hist = head->hist;
    else ⟨Redirect the fetch if control changes at this inst 85⟩;
    if (j >= dispatch_max || dispatch_lock || nullifying) {
      head = new_head; continue; /* can't dispatch, but can peek ahead */
    }
    if (cool == reorder_bot) new_cool = reorder_top; else new_cool = cool - 1;
    ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩;
    ⟨Assign a functional unit if available, otherwise goto stall 82⟩;
    ⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩;
    if ((op & 0xe0) == 0x40) ⟨Record the result of branch prediction 152⟩;
    ⟨Issue the cool instruction 81⟩;
    cool = new_cool; cool_O = new_O; cool_S = new_S;
    cool_hist = peek_hist; continue;
  stall: ⟨Undo data structures set prematurely in the cool block and break 123⟩;
  }

This code is used in section 74.

76. An instruction can be dispatched only if a functional unit is available to handle it. A functional unit consists of a 256-bit vector that specifies a subset of MMIX's opcodes, and an array of coroutines for the pipeline stages. There are k coroutines in the array, where k is the maximum number of stages needed by any of the opcodes supported.

⟨Type definitions 11⟩ +≡
  typedef struct func_struct {
    char name[16]; /* symbolic designation */
    tetra ops[8]; /* big-endian bitmap for the opcodes supported */
    int k; /* number of pipeline stages */
    coroutine *co; /* pointer to the first of k consecutive coroutines */
  } func;

77. ⟨External variables 4⟩ +≡
  Extern func *funit; /* pointer to array of functional units */
  Extern int funit_count; /* the number of functional units */


78. It is convenient to have a 256-bit vector of all the supported opcodes, because we need to shut off a lot of special actions when an opcode is not supported.

⟨Global variables 20⟩ +≡
  control *new_cool; /* the reorder position following cool */
  int resuming; /* set nonzero if resuming an interrupted instruction */
  tetra support[8]; /* big-endian bitmap for all opcodes supported */

79. ⟨Initialize everything 22⟩ +≡
  { register func *u;
    for (u = funit; u <= funit + funit_count; u++)
      for (i = 0; i < 8; i++) support[i] |= u->ops[i];
  }

80. #define sign_bit ((unsigned) 0x80000000)

⟨Determine the flags, f, and the internal opcode, i 80⟩ ≡
  if (!(support[op >> 5] & (sign_bit >> (op & 31)))) {
      /* oops, this opcode isn't supported by any function unit */
    f = flags[TRAP], i = trap;
  } else f = flags[op], i = internal_op[op];
  if (i == trip && (head->loc.h & sign_bit)) f = 0, i = noop;

This code is used in section 75.
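The big-endian bitmap test of section 80 can be tried in isolation. The following is a minimal sketch under stated assumptions: `bitmap_set`, `bitmap_test`, `bitmap_union`, and `SIGN_BIT_SKETCH` are hypothetical helper names, not part of MMIX-PIPE; only the indexing scheme (word op >> 5, bit sign_bit >> (op & 31)) is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SIGN_BIT_SKETCH ((uint32_t)0x80000000)

/* Mark opcode op (0..255) as supported; opcode 0 is the leftmost bit of ops[0]. */
static void bitmap_set(uint32_t ops[8], unsigned op) {
  assert(op < 256);
  ops[op >> 5] |= SIGN_BIT_SKETCH >> (op & 31);
}

/* Nonzero if opcode op is present in the bitmap. */
static int bitmap_test(const uint32_t ops[8], unsigned op) {
  assert(op < 256);
  return (ops[op >> 5] & (SIGN_BIT_SKETCH >> (op & 31))) != 0;
}

/* Accumulate one unit's opcodes into the global support vector (cf. section 79). */
static void bitmap_union(uint32_t support[8], const uint32_t ops[8]) {
  for (int i = 0; i < 8; i++) support[i] |= ops[i];
}
```

With this layout, printing the eight words in hexadecimal from ops[0] to ops[7] lists the opcodes in increasing order from left to right, which is presumably why the bitmap is described as big-endian.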

bool = enum, §11. control = struct, §44. cool: control *, §60. cool_hist: unsigned int, §99. cool_O: octa, §98. cool_S: octa, §98. coroutine = struct, §23. dispatch_lock: lockvar, §65. dispatch_max: int, §59. Extern = macro, §4. false = 0, §11. fetch_bot: fetch *, §69. fetch_top: fetch *, §69. flags: unsigned char [], §83. h: tetra, §17. head: fetch *, §69. hist: unsigned int, §68. i: register int, §12. i: register int, §10. inst: tetra, §68. internal_op: internal_opcode [], §51. j: register int, §12. loc: octa, §68. mmix_opcode = enum, §47. new_head: register fetch *, §74. new_O: octa, §99. new_S: octa, §99. noop = 81, §49. noted: bool, §68. nullifying: bool, §315. old_tail: fetch *, §70. peek_hist: unsigned int, §99. rel_addr_bit = #40, §83. reorder_bot: control *, §60. reorder_top: control *, §60. tetra = unsigned int, §17. trap = 82, §49. TRAP = #00, §47. trip = 83, §49.


81. ⟨Issue the cool instruction 81⟩ ≡
  if (cool->interim) {
    cool->usage = false;
    if (cool->op == SAVE) ⟨Get ready for the next step of SAVE 341⟩
    else if (cool->op == UNSAVE) ⟨Get ready for the next step of UNSAVE 335⟩
    else if (cool->i == preld || cool->i == prest)
      ⟨Get ready for the next step of PRELD or PREST 228⟩
    else if (cool->i == prego) ⟨Get ready for the next step of PREGO 229⟩
  }
  else if (cool->i <= max_real_command) {
    if ((flags[cool->op] & ctl_change_bit) || cool->i == pbr)
      if (inst_ptr.p == NULL && (inst_ptr.o.h & sign_bit) && !(cool->loc.h & sign_bit)
          && cool->i != trap)
        cool->interrupt |= P_BIT; /* jumping from nonnegative to negative */
    true_head = head = new_head; /* delete instruction from fetch buffer */
    resuming = 0;
  }
  if (freeze_dispatch) set_lock(u->co, dispatch_lock);
  cool->owner = u->co; u->co->ctl = cool;
  startup(u->co, 1); /* schedule execution of the new inst */
  if (verbose & issue_bit) {
    printf("Issuing "); print_control_block(cool);
    printf(" "); print_coroutine_id(u->co); printf("\n");
  }
  dispatch_count++;

This code is used in section 75.

82. We assign the first functional unit that supports op and is totally unoccupied, if possible; otherwise we assign the first functional unit that supports op and has stage 1 unoccupied.

⟨Assign a functional unit if available, otherwise goto stall 82⟩ ≡
  { register int t = op >> 5, b = sign_bit >> (op & 31);
    if (cool->i == trap && op != TRAP) { /* opcode needs to be emulated */
      u = funit + funit_count; /* this unit supports just TRIP and TRAP */
      goto unit_found;
    }
    for (u = funit; u <= funit + funit_count; u++)
      if (u->ops[t] & b) {
        for (i = 0; i < u->k; i++)
          if (u->co[i].next) goto unit_busy;
        goto unit_found;
      unit_busy: ;
      }
    for (u = funit; u < funit + funit_count; u++)
      if ((u->ops[t] & b) && (u->co->next == NULL)) goto unit_found;
    goto stall; /* all units for this op are busy */
  }
  unit_found:

This code is used in section 75.
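The two-pass selection policy of section 82 can be sketched without coroutines. This is an illustration under stated assumptions, not the simulator's code: `unit_sketch` and `choose_unit` are hypothetical, a plain `busy` array stands in for the `next` fields of the stage coroutines, every unit is assumed to have at least one stage, and the special unit that handles only TRIP and TRAP is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the func structure of section 76. */
typedef struct {
  uint32_t ops[8];  /* big-endian bitmap of supported opcodes */
  int k;            /* number of pipeline stages (assumed >= 1) */
  const int *busy;  /* busy[i] is nonzero when stage i+1 is occupied */
} unit_sketch;

/* Return the index of the chosen unit, or -1 if the instruction must stall. */
static int choose_unit(const unit_sketch *units, int count, unsigned op) {
  assert(op < 256 && count >= 0);
  unsigned t = op >> 5;
  uint32_t b = (uint32_t)0x80000000 >> (op & 31);
  /* First pass: a supporting unit whose stages are all unoccupied. */
  for (int u = 0; u < count; u++) {
    if (!(units[u].ops[t] & b)) continue;
    int idle = 1;
    for (int i = 0; i < units[u].k; i++)
      if (units[u].busy[i]) { idle = 0; break; }
    if (idle) return u;
  }
  /* Second pass: a supporting unit whose first stage is unoccupied. */
  for (int u = 0; u < count; u++)
    if ((units[u].ops[t] & b) && !units[u].busy[0]) return u;
  return -1;
}
```

Preferring a completely idle unit spreads work across units before any pipeline is shared; only when none is idle does the sketch fall back to a unit that can at least accept a new instruction in its first stage.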


co: coroutine *, §76. cool: control *, §60. ctl: control *, §23. ctl_change_bit = #80, §83. dispatch_count: int, §65. dispatch_lock: lockvar, §65. false = 0, §11. flags: unsigned char [], §83. freeze_dispatch: register bool, §75. funit: func *, §77. funit_count: int, §77. h: tetra, §17. head: fetch *, §69. i: internal_opcode, §44. i: register int, §12. inst_ptr: spec, §284. interim: bool, §44. interrupt: unsigned int, §44. issue_bit = 1 << 0, §8. k: int, §76. loc: octa, §44. max_real_command = trip, §49. new_head: register fetch *, §74. next: coroutine *, §23. o: octa, §40. op: mmix_opcode, §44. op: register mmix_opcode, §75. ops: tetra [], §76. owner: coroutine *, §44. p: specnode *, §40. P_BIT = 1 << 0, §54. pbr = 70, §49. prego = 73, §49. preld = 61, §49. prest = 62, §49. print_control_block: static void (), §46. print_coroutine_id: static void (), §25. printf: int (), <stdio.h>. resuming: int, §78. SAVE = #fa, §47. set_lock = macro (), §37. sign_bit = macro, §80. stall: label, §75. startup: static void (), §31. trap = 82, §49. TRAP = #00, §47. true_head: register fetch *, §74. u: register func *, §75. UNSAVE = #fb, §47. usage: bool, §44. verbose: int, §4.


83. The flags table records special properties of each operation code in binary notation: #1 means Z is an immediate value, #2 means rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means YZ is part of a relative address, #80 means the control changes at this point.

#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define ctl_change_bit 0x80

⟨Global variables 20⟩ +≡
  unsigned char flags[256] = {
    0x8a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26, /* TRAP, ... */
    0x26, 0x25, 0x26, 0x25, 0x26, 0x25, 0x26, 0x25, /* FLOT, ... */
    0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26, /* FMUL, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* MUL, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ADD, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* 2ADDU, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x26, 0x25, 0x26, 0x25, /* CMP, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* SL, ... */
    0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* BN, ... */
    0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* BNN, ... */
    0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* PBN, ... */
    0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* PBNN, ... */
    0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, /* CSN, ... */
    0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, /* CSNN, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ZSN, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ZSNN, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* LDB, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* LDT, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x1a, 0x19, 0x2a, 0x29, /* LDSF, ... */
    0x2a, 0x29, 0x0a, 0x09, 0x0a, 0x09, 0xaa, 0xa9, /* LDVTS, ... */
    0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, /* STB, ... */
    0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, /* STT, ... */
    0x1a, 0x19, 0x1a, 0x19, 0x0a, 0x09, 0x1a, 0x19, /* STSF, ... */
    0x0a, 0x09, 0x0a, 0x09, 0x0a, 0x09, 0x8a, 0x89, /* SYNCD, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* OR, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* AND, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* BDIF, ... */
    0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* MUX, ... */
    0x20, 0x20, 0x20, 0x20, 0x30, 0x30, 0x30, 0x30, /* SETH, ... */
    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, /* ORH, ... */
    0xc0, 0xc0, 0xc0, 0xc0, 0x60, 0x60, 0x02, 0x01, /* JMP, ... */
    0x80, 0x80, 0x00, 0x02, 0x01, 0x00, 0x20, 0x8a}; /* POP, ... */
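To see how an entry of the flags table is read, here is a minimal decoding sketch. The bit meanings come from the description in section 83; the names `SK_*`, `flag_info`, and `decode_flags` are hypothetical and serve only this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the eight flag bits listed in section 83. */
#define SK_Z_IMMED    0x01 /* Z is an immediate value */
#define SK_Z_SOURCE   0x02 /* rZ is a source operand */
#define SK_Y_IMMED    0x04 /* Y is an immediate value */
#define SK_Y_SOURCE   0x08 /* rY is a source operand */
#define SK_X_SOURCE   0x10 /* rX is a source operand */
#define SK_X_DEST     0x20 /* rX is a destination */
#define SK_REL_ADDR   0x40 /* YZ is part of a relative address */
#define SK_CTL_CHANGE 0x80 /* the control changes at this point */

typedef struct {
  int sources;          /* how many register operands are read */
  bool writes_x;        /* does the instruction write rX? */
  bool relative;        /* does it carry a relative address? */
  bool changes_control; /* does control change unconditionally here? */
} flag_info;

static flag_info decode_flags(unsigned char f) {
  flag_info info;
  /* A field cannot be both an immediate and a register source. */
  assert(!((f & SK_Z_IMMED) && (f & SK_Z_SOURCE)));
  assert(!((f & SK_Y_IMMED) && (f & SK_Y_SOURCE)));
  info.sources = !!(f & SK_Z_SOURCE) + !!(f & SK_Y_SOURCE) + !!(f & SK_X_SOURCE);
  info.writes_x = (f & SK_X_DEST) != 0;
  info.relative = (f & SK_REL_ADDR) != 0;
  info.changes_control = (f & SK_CTL_CHANGE) != 0;
  return info;
}
```

For example, the entry 0x2a that heads the ADD row decodes to two register sources (rY and rZ) plus a destination rX, while the 0x50 shared by the conditional branches decodes to one register source (rX) and a relative address.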


84. ⟨Convert relative address to absolute address 84⟩ ≡
  {
    if (i == jmp) yz = head->inst & 0xffffff;
    if (op & 1) yz -= (i == jmp ? 0x1000000 : 0x10000);
    cool->y.o = incr(head->loc, 4); cool->y.p = NULL;
    cool->z.o = incr(head->loc, yz << 2); cool->z.p = NULL;
  }
This code is used in section 75.
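The address arithmetic of section 84 can be checked with a small standalone function. This is a sketch, assuming 64-bit integers in place of the octa type and incr; `branch_target` is a hypothetical name. It multiplies by 4 rather than shifting, because shifting a negative signed value left is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compute the absolute target of a relative branch or jump located at loc.
   is_jmp selects the 24-bit offset of JMP; otherwise the 16-bit YZ field is used. */
static uint64_t branch_target(uint64_t loc, uint32_t inst, bool is_jmp) {
  unsigned op = inst >> 24;
  int64_t yz = inst & (is_jmp ? 0xffffff : 0xffff);
  if (op & 1) yz -= is_jmp ? 0x1000000 : 0x10000; /* odd opcode: backward form */
  assert(yz >= -0x1000000 && yz < 0x1000000);
  /* The offset counts tetrabytes; the conversion to uint64_t wraps modulo 2^64. */
  return loc + (uint64_t)(yz * 4);
}
```

The fall-through address that section 84 stores in the y field is simply loc + 4, so a branch carries both possible successors once this conversion has been done.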

85. The location of the next instruction to be fetched is in a spec variable called inst_ptr. A slightly tricky optimization of the POP instruction is made in the common case that the speculative value of rJ is known.

⟨Redirect the fetch if control changes at this inst 85⟩ ≡
  { register int predicted = 0;
    if ((op & 0xe0) == 0x40) ⟨Predict a branch outcome 151⟩;
    head->noted = true;
    head->hist = peek_hist;
    if (predicted || (f & ctl_change_bit) || (i == syncid && !(cool->loc.h & sign_bit))) {
      old_tail = tail = new_head; /* discard all remaining fetches */
      ⟨Restart the fetch coroutine 287⟩;
      switch (i) {
      case jmp: case br: case pbr: case pushj: inst_ptr = cool->z; break;
      case pop: if (g[rJ].up->known && j < dispatch_max && !dispatch_lock && !nullifying) {
          inst_ptr.o = incr(g[rJ].up->o, yz << 2); inst_ptr.p = NULL; break;
        } /* otherwise fall through, will wait on cool->go */
      case go: case pushgo: case trap: case resume: case syncid:
        inst_ptr.p = UNKNOWN_SPEC; break;
      case trip: inst_ptr = zero_spec; break;
      }
    }
  }

This code is used in section 75.

br = 69, §49. cool: control *, §60. dispatch_lock: lockvar, §65. dispatch_max: int, §59. f: register int, §75. g: specnode [], §86. go = 72, §49. go: specnode, §44. h: tetra, §17. head: fetch *, §69. hist: unsigned int, §68. i: register int, §12. incr: octa (), MMIX-ARITH §6. inst: tetra, §68. inst_ptr: spec, §284. j: register int, §12. jmp = 80, §49. known: bool, §40. loc: octa, §68. loc: octa, §44. new_head: register fetch *, §74. noted: bool, §68. nullifying: bool, §315. o: octa, §40. old_tail: fetch *, §70. op: register mmix_opcode, §75. p: specnode *, §40. pbr = 70, §49. peek_hist: unsigned int, §99. pop = 75, §49. pushgo = 74, §49. pushj = 71, §49. resume = 76, §49. rJ = 4, §52. sign_bit = macro, §80. syncid = 65, §49. tail: fetch *, §69. trap = 82, §49. trip = 83, §49. true = 1, §11. UNKNOWN_SPEC = macro, §71. up: specnode *, §40. y: spec, §44. yz: register int, §75. z: spec, §44. zero_spec: spec, §41.


86. At any given time the simulated machine is in two main states, the "hot state" corresponding to instructions that have been committed and the "cool state" corresponding to all the speculative changes currently being considered. The dispatcher works with cool instructions and puts them into the reorder buffer, where they gradually get warmer and warmer. Intermediate instructions, between hot and cool, have intermediate temperatures.

A machine register like l[101] or g[250] is represented by a specnode whose o field is the current hot value of the register. If the up and down fields of this specnode point to the node itself, the hot and cool values of the register are identical. Otherwise up and down are pointers to the coolest and hottest ends of a doubly linked list of specnodes, representing intermediate speculative values (sometimes called "rename registers"). The rename registers are implemented as the x or a specnodes inside control blocks, for speculative instructions that use this register as a destination. Speculative instructions that use the register as a source operand point to the next-hottest specnode on the list, until the value becomes known. The doubly linked list of specnodes is an input-restricted deque: A node is inserted at the cool end when the dispatcher issues an instruction with this register as destination; a node is removed from the cool end if an instruction needs to be deissued; a node is removed from the hot end when an instruction is committed.

The special registers rA, rB, ... occupy the same array as the global registers g[32], g[33], .... For example, rB is internally the same as g[0], because rB = 0.

⟨External variables 4⟩ +≡
  Extern specnode g[256]; /* global registers and special registers */
  Extern specnode *l; /* the ring of local registers */
  Extern int lring_size;
      /* the number of on-chip local registers (must be a power of 2) */
  Extern int max_rename_regs, max_mem_slots; /* capacity of reorder buffer */
  Extern int rename_regs, mem_slots; /* currently unused capacity */
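The input-restricted deque of section 86 can be sketched as a circular doubly linked list in which the register's own node acts as the list header. This is a simplified illustration, not the simulator's specnode code: `node_sketch` and the four functions are hypothetical, and the exact linkage (a new node's up pointer leads to the next-hottest value, the header's up leads to the coolest node and its down to the hottest) is an assumption consistent with the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct node_sketch {
  uint64_t o;                     /* the value, meaningful when known */
  bool known;                     /* has the value been computed? */
  struct node_sketch *up, *down;  /* neighbors in the circular list */
} node_sketch;

/* A register with no speculative values points to itself in both directions. */
static void reg_init(node_sketch *r, uint64_t v) {
  r->o = v; r->known = true; r->up = r->down = r;
}

/* Insert t at the cool end when an instruction with destination r is issued. */
static void install_cool(node_sketch *r, node_sketch *t) {
  t->known = false;
  t->up = r->up;     /* the previously coolest value is now next-hottest */
  t->up->down = t;
  r->up = t;
  t->down = r;
}

static void unlink_node(node_sketch *t) {
  t->up->down = t->down;
  t->down->up = t->up;
  t->up = t->down = t;
}

/* Remove the coolest node when its instruction is deissued. */
static void deissue_cool(node_sketch *r) {
  assert(r->up != r);
  unlink_node(r->up);
}

/* Remove the hottest node when its instruction is committed. */
static void commit_hot(node_sketch *r) {
  node_sketch *t = r->down;
  assert(t != r && t->known);
  r->o = t->o; /* the speculative value becomes the hot value */
  unlink_node(t);
}
```

Insertion happens only at the cool end, while removal can happen at either end, which is exactly what makes the structure an input-restricted deque and why double linking is needed.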

87. ⟨Header definitions 6⟩ +≡
#define ticks g[rC].o /* the internal clock */

88. ⟨Global variables 20⟩ +≡
  int lring_mask; /* for calculations modulo lring_size */

89. The addr fields in the specnode lists for registers are used to identify that register in diagnostic messages. Such addresses are negative; memory addresses are positive.

All registers are initially zero except rG, which is initially 255, and rN, which has a constant value identifying the time of compilation. (The macro ABSTIME is defined externally in the file abstime.h, which should have just been created by ABSTIME; ABSTIME is a trivial program that computes the value of the standard library function time(NULL). We assume that this number, which is the number of seconds in the "UNIX epoch," is less than 2^32. Beware: Our assumption will fail in February of 2106.)

#define VERSION 1 /* version of the MMIX architecture that we support */
#define SUBVERSION 0 /* secondary byte of version number */
#define SUBSUBVERSION 0 /* further qualification to version number */


⟨Initialize everything 22⟩ +≡
  rename_regs = max_rename_regs;
  mem_slots = max_mem_slots;
  lring_mask = lring_size - 1;
  for (j = 0; j < 256; j++) {
    g[j].addr.h = sign_bit, g[j].addr.l = j, g[j].known = true;
    g[j].up = g[j].down = &g[j];
  }
  g[rG].o.l = 255;
  g[rN].o.h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
  g[rN].o.l = ABSTIME; /* see comment and warning above */
  for (j = 0; j < lring_size; j++) {
    l[j].addr.h = sign_bit, l[j].addr.l = 256 + j, l[j].known = true;
    l[j].up = l[j].down = &l[j];
  }

90. ⟨Internal prototypes 13⟩ +≡
  static void print_specnode_id ARGS((octa));

91. ⟨Subroutines 14⟩ +≡
  static void print_specnode_id(a)
    octa a;
  {
    if (a.h == sign_bit) {
      if (a.l < 32) printf(special_name[a.l]);
      else if (a.l < 256) printf("g[%d]", a.l);
      else printf("l[%d]", a.l - 256);
    } else if (a.h != -1) {
      printf("m["); print_octa(a); printf("]");
    }
  }
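The address convention of §89 and the decoding done by print_specnode_id can be tried out in isolation. The following is only a sketch under stated assumptions: SIGN_BIT is assumed to be 0x80000000 (the real sign_bit macro is defined in §80), the label "special[k]" is a placeholder for special_name[k], the 16-digit hexadecimal form of a memory address is a guess at what print_octa prints, and describe_addr is a hypothetical name that does not occur in MMIX-PIPE.

```c
#include <stdio.h>
#include <stdint.h>

#define SIGN_BIT 0x80000000u /* assumed value of sign_bit (see section 80) */

/* Classify a specnode address in the manner of print_specnode_id,
   writing a label into buf instead of printing it. */
static void describe_addr(uint32_t h, uint32_t l, char *buf, size_t n)
{
  if (h == SIGN_BIT) {                 /* "negative" address: a register */
    if (l < 32) snprintf(buf, n, "special[%u]", (unsigned) l);
    else if (l < 256) snprintf(buf, n, "g[%u]", (unsigned) l);
    else snprintf(buf, n, "l[%u]", (unsigned) (l - 256));
  } else if (h != 0xffffffffu) {       /* a memory address */
    snprintf(buf, n, "m[%08x%08x]", (unsigned) h, (unsigned) l);
  } else {
    buf[0] = '\0';                     /* all 1s: nothing is printed */
  }
}
```

With these assumptions, an address whose high half is SIGN_BIT and whose low half is 256 + 101 is labeled l[101], matching the initialization l[j].addr.l = 256 + j above.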

a: specnode, §44. ABSTIME = macro, abstime.h. addr: octa, §40. ARGS = macro, §6. cool: control *, §60. down: specnode *, §40. Extern = macro, §4. h: tetra, §17. hot: control *, §60. j: register int, §10. known: bool, §40. l: tetra, §17. o: octa, §40. octa = struct, §17. print_octa: static void ( ), §19. printf: int ( ), <stdio.h>. rB = 0, §52. rC = 8, §52. rG = 19, §52. rN = 9, §52. sign_bit = macro, §80. special_name: char *[ ], §53. specnode = struct, §40. time: time_t ( ), <time.h>. true = 1, §11. up: specnode *, §40. x: specnode, §44.


92. The specval subroutine produces a spec corresponding to the currently coolest value of a given local or global register.

⟨Internal prototypes 13⟩ +≡
  static spec specval ARGS((specnode *));

93. ⟨Subroutines 14⟩ +≡
  static spec specval(r)
    specnode *r;
  {
    spec res;
    if (r->up->known) res.o = r->up->o, res.p = NULL;
    else res.p = r->up;
    return res;
  }

94. The spec_install subroutine introduces a new speculative value at the cool end of a given doubly linked list.

⟨Internal prototypes 13⟩ +≡
  static void spec_install ARGS((specnode *, specnode *));

95. ⟨Subroutines 14⟩ +≡
  static void spec_install(r, t) /* insert t into list r */
    specnode *r, *t;
  {
    t->up = r->up;
    t->up->down = t;
    r->up = t;
    t->down = r;
    t->addr = r->addr;
  }

96. Conversely, spec_rem takes such a value out.

⟨Internal prototypes 13⟩ +≡
  static void spec_rem ARGS((specnode *));

97. ⟨Subroutines 14⟩ +≡
  static void spec_rem(t) /* remove t from its list */
    specnode *t;
  {
    register specnode *u = t->up, *d = t->down;
    u->down = d; d->up = u;
  }
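The input-restricted deque of §86 can be exercised with a stripped-down node type. What follows is a minimal sketch, not the simulator's code: node carries a single int in place of the o, known, and addr fields of a real specnode, and the names list_init, install, remove_node, and commit_hottest are invented for this illustration. The pointer manipulation in install and remove_node mirrors spec_install and spec_rem.

```c
#include <stddef.h>

/* Simplified stand-in for specnode (assumption: one int payload). */
typedef struct node {
  int val;
  struct node *up, *down; /* in the header: up = coolest, down = hottest */
} node;

/* An empty list is a header that points to itself in both directions. */
static void list_init(node *r) { r->up = r->down = r; }

/* Same steps as spec_install: t becomes the coolest entry of list r. */
static void install(node *r, node *t)
{
  t->up = r->up;
  t->up->down = t;
  r->up = t;
  t->down = r;
}

/* Same steps as spec_rem: unlink t, whichever end it occupies. */
static void remove_node(node *t)
{
  node *u = t->up, *d = t->down;
  u->down = d;
  d->up = u;
}

/* Commit: the hottest value becomes the register's hot value. */
static void commit_hottest(node *r)
{
  node *t = r->down;
  if (t != r) { r->val = t->val; remove_node(t); }
}
```

Issuing two instructions with the same destination installs two nodes; deissuing removes the most recently installed one from the cool end, while committing removes the oldest one from the hot end. When the last node is gone the header again points to itself, which is the "hot and cool values are identical" state.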


98. Some special registers are so central to MMIX's operation, they are carried along with each control block in the reorder buffer instead of being treated as source and destination registers of each instruction. For example, the register stack pointers rO and rS are treated in this way. The normal specnodes for rO and rS, namely g[rO] and g[rS], are not actually used; the cool values are called cool_O and cool_S. (Actually cool_O and cool_S correspond to the register values divided by 8, since rO and rS are always multiples of 8.)

The arithmetic status register, rA, is also treated specially. Its event bits are kept up to date only at the "hot" end, by accumulating values of arith_exc; an instruction to GET the value of rA will be executed only in the hot seat. The other bits of rA, which are needed to control trip handlers and floating point rounding, are treated in the normal way.

⟨External variables 4⟩ +≡
  Extern octa cool_O, cool_S; /* values of rO, rS before the cool instruction */

99. ⟨Global variables 20⟩ +≡
  int cool_L, cool_G; /* values of rL and rG before the cool instruction */
  unsigned int cool_hist, peek_hist; /* history bits for branch prediction */
  octa new_O, new_S; /* values of rO, rS after cool */

addr: octa, §40. ARGS = macro, §6. arith_exc: unsigned int, §44. cool: control *, §60. down: specnode *, §40. Extern = macro, §4. g: specnode [ ], §86. known: bool, §40. o: octa, §40. octa = struct, §17. p: specnode *, §40. rO = 10, §52. rS = 11, §52. spec = struct, §40. specnode = struct, §40. up: specnode *, §40.


100. ⟨Install default fields in the cool block 100⟩ ≡
  cool->op = op; cool->i = i;
  cool->xx = (head->inst >> 16) & 0xff;
  cool->yy = (head->inst >> 8) & 0xff;
  cool->zz = (head->inst) & 0xff;
  cool->loc = head->loc;
  cool->y = cool->z = cool->b = cool->ra = zero_spec;
  cool->x.o = cool->a.o = cool->rl.o = zero_octa;
  cool->x.known = false; cool->x.up = NULL;
  cool->a.known = false; cool->a.up = NULL;
  cool->rl.known = true; cool->rl.up = NULL;
  cool->need_b = cool->need_ra = cool->ren_x = cool->mem_x = cool->ren_a = cool->set_l = false;
  cool->arith_exc = cool->denin = cool->denout = 0;
  if ((head->loc.h & sign_bit) && !(g[rU].o.h & 0x8000)) cool->usage = false;
  else cool->usage = ((op & (g[rU].o.h >> 16)) == g[rU].o.h >> 24 ? true : false);
  new_O = cool->cur_O = cool_O; new_S = cool->cur_S = cool_S;
  cool->interrupt = head->interrupt;
  cool->hist = peek_hist;
  cool->go.o = incr(cool->loc, 4);
  cool->go.known = false, cool->go.addr.h = -1, cool->go.up = (specnode *) cool;
  cool->interim = false;

This code is used in section 75.
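The field extraction and the rU test in §100 are plain shifts and masks, so they are easy to check by hand. The sketch below is illustrative only: the type fields and the functions split_inst and usage_counted are hypothetical names, and the negative-address exception governed by the 0x8000 bit of rU is deliberately left out.

```c
#include <stdint.h>

typedef struct { uint8_t op, xx, yy, zz; } fields; /* illustrative type */

/* Split a 32-bit instruction word into its four bytes. */
static fields split_inst(uint32_t inst)
{
  fields f;
  f.op = (uint8_t) (inst >> 24);
  f.xx = (inst >> 16) & 0xff;
  f.yy = (inst >> 8) & 0xff;
  f.zz = inst & 0xff;
  return f;
}

/* The usage test of section 100 applied to the high tetrabyte of rU:
   its top byte is a pattern and its next byte is a mask; an opcode is
   counted when (op & mask) equals the pattern. */
static int usage_counted(uint8_t op, uint32_t rU_high)
{
  return (op & (rU_high >> 16)) == (rU_high >> 24);
}
```

For example, the word 0x20010203 splits into op = 0x20, xx = 1, yy = 2, zz = 3. A pattern of 0x20 with mask 0xff counts only opcode 0x20, while an all-zero rU counts every opcode.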

101. ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩ ≡
  if (new_cool == hot) goto stall; /* reorder buffer is full */
  ⟨Make sure cool_L and cool_G are up to date 102⟩;
  ⟨Install the operand fields of the cool block 103⟩;
  if (f & X_is_dest_bit) ⟨Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110⟩;
  switch (i) {
  ⟨Special cases of instruction dispatch 117⟩
  default: break;
  }
dispatch_done:

This code is used in section 75.

102. The UNSAVE operation begins by loading register rG from memory. We don't really need to know the value of rG until twelve other registers have been unsaved, so we aren't fussy about it here.

⟨Make sure cool_L and cool_G are up to date 102⟩ ≡
  if (!g[rL].up->known) goto stall;
  cool_L = g[rL].up->o.l;
  if (!g[rG].up->known && !(op == UNSAVE && cool->xx == 1)) goto stall;
  cool_G = g[rG].up->o.l;

This code is used in section 101.


103. ⟨Install the operand fields of the cool block 103⟩ ≡
  if (resuming) ⟨Insert special operands when resuming an interrupted operation 324⟩
  else {
    if (f & 0x10) ⟨Set cool->b from register X 106⟩
    if (third_operand[op] && (cool->i != trap))
      ⟨Set cool->b and/or cool->ra from special register 108⟩;
    if (f & 0x1) cool->z.o.l = cool->zz;
    else if (f & 0x2) ⟨Set cool->z from register Z 104⟩
    else if ((op & 0xf0) == 0xe0) ⟨Set cool->z as an immediate wyde 109⟩;
    if (f & 0x4) cool->y.o.l = cool->yy;
    else if (f & 0x8) ⟨Set cool->y from register Y 105⟩
  }

This code is used in section 101.

104. ⟨Set cool->z from register Z 104⟩ ≡
  {
    if (cool->zz >= cool_G) cool->z = specval(&g[cool->zz]);
    else if (cool->zz < cool_L) cool->z = specval(&l[(cool_O.l + cool->zz) & lring_mask]);
  }

This code is used in section 103.

105. ⟨Set cool->y from register Y 105⟩ ≡
  {
    if (cool->yy >= cool_G) cool->y = specval(&g[cool->yy]);
    else if (cool->yy < cool_L) cool->y = specval(&l[(cool_O.l + cool->yy) & lring_mask]);
  }

This code is used in section 103.
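Sections 104 and 105 apply the same three-way rule to a register number: it names a global register if it is at least G, a local register if it is less than L, and otherwise a marginal register, for which the operand is left at its default of zero. A minimal sketch of that rule follows; reg_kind, reg_ref, and locate are hypothetical names, and O stands for cool_O.l, that is, rO divided by 8.

```c
typedef enum { REG_GLOBAL, REG_LOCAL, REG_MARGINAL } reg_kind;

typedef struct {
  reg_kind kind;
  unsigned index; /* index into g[] or into the local ring l[] */
} reg_ref;

/* Map register number n to its storage, given the current L, G, O,
   and lring_mask = lring_size - 1 (lring_size is a power of 2). */
static reg_ref locate(unsigned n, unsigned L, unsigned G,
                      unsigned O, unsigned lring_mask)
{
  reg_ref r;
  if (n >= G) { r.kind = REG_GLOBAL; r.index = n; }
  else if (n < L) { r.kind = REG_LOCAL; r.index = (O + n) & lring_mask; }
  else { r.kind = REG_MARGINAL; r.index = 0; } /* reads as zero */
  return r;
}
```

With a ring of 256 registers, O = 250, L = 10, and G = 255, local register 7 lives in ring slot (250 + 7) & 255 = 1, so the ring wraps around transparently.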

a: specnode, §44. addr: octa, §40. arith_exc: unsigned int, §44. b: spec, §44. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. cool_S: octa, §98. cur_O: octa, §44. cur_S: octa, §44. denin: int, §44. denout: int, §44. f: register int, §75. false = 0, §11. g: specnode [ ], §86. go: specnode, §44. h: tetra, §17. head: fetch *, §69. hist: unsigned int, §44. hot: control *, §60. i: internal_opcode, §44. i: register int, §12. incr: octa ( ), MMIX-ARITH §6. inst: tetra, §68. interim: bool, §44. interrupt: unsigned int, §68. interrupt: unsigned int, §44. known: bool, §40. l: tetra, §17. l: specnode *, §86. loc: octa, §44. loc: octa, §68. lring_mask: int, §88. mem_x: bool, §44. need_b: bool, §44. need_ra: bool, §44. new_cool: control *, §78. new_O: octa, §99. new_S: octa, §99. o: octa, §40. op: register mmix_opcode, §75. op: mmix_opcode, §44. peek_hist: unsigned int, §99. ra: spec, §44. ren_a: bool, §44. ren_x: bool, §44. resuming: int, §78. rG = 19, §52. rl: specnode, §44. rL = 20, §52. rU = 17, §52. set_l: bool, §44. sign_bit = macro, §80. specnode = struct, §40. specval: static spec ( ), §93. stall: label, §75. third_operand: unsigned char [ ], §107. trap = 82, §49. true = 1, §11. UNSAVE = 0xfb, §47. up: specnode *, §40. usage: bool, §44. x: specnode, §44. X_is_dest_bit = 0x20, §83. xx: unsigned char, §44. y: spec, §44. yy: unsigned char, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4. zero_spec: spec, §41. zz: unsigned char, §44.


106. ⟨Set cool->b from register X 106⟩ ≡
  {
    if (cool->xx >= cool_G) cool->b = specval(&g[cool->xx]);
    else if (cool->xx < cool_L) cool->b = specval(&l[(cool_O.l + cool->xx) & lring_mask]);
    if (f & rel_addr_bit) cool->need_b = true; /* br, pbr */
  }

This code is used in section 103.

107. If an operation requires a special register as third operand, that register is listed in the third_operand table.

⟨Global variables 20⟩ +≡
  unsigned char third_operand[256] = {
    0, rA, 0, 0, rA, rA, rA, rA, /* TRAP, ... */
    rA, rA, rA, rA, rA, rA, rA, rA, /* FLOT, ... */
    rA, rE, rE, rE, rA, rA, rA, rA, /* FMUL, ... */
    rA, rA, 0, 0, rA, rA, rD, rD, /* MUL, ... */
    rA, rA, 0, 0, rA, rA, 0, 0, /* ADD, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* 2ADDU, ... */
    0, 0, 0, 0, rA, rA, 0, 0, /* CMP, ... */
    rA, rA, 0, 0, 0, 0, 0, 0, /* SL, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* BN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* BNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* PBN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* PBNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* CSN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* CSNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* ZSN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* ZSNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* LDB, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* LDT, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* LDSF, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* LDVTS, ... */
    rA, rA, 0, 0, rA, rA, 0, 0, /* STB, ... */
    rA, rA, 0, 0, 0, 0, 0, 0, /* STT, ... */
    rA, rA, 0, 0, 0, 0, 0, 0, /* STSF, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* SYNCD, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* OR, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* AND, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* BDIF, ... */
    rM, rM, 0, 0, 0, 0, 0, 0, /* MUX, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* SETH, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* ORH, ... */
    0, 0, 0, 0, 0, 0, 0, 0, /* JMP, ... */
    rJ, 0, 0, 0, 0, 0, 0, 255}; /* POP, ... */


108. The cool->b field is busy in operations like STB or STSF, which need rA. So we use cool->ra instead, when rA is needed.

⟨Set cool->b and/or cool->ra from special register 108⟩ ≡
  {
    if (third_operand[op] == rA || third_operand[op] == rE)
      cool->need_ra = true, cool->ra = specval(&g[rA]);
    if (third_operand[op] != rA)
      cool->need_b = true, cool->b = specval(&g[third_operand[op]]);
  }

This code is used in section 103.

109. ⟨Set cool->z as an immediate wyde 109⟩ ≡
  {
    switch (op & 3) {
    case 0: cool->z.o.h = yz << 16; break;
    case 1: cool->z.o.h = yz; break;
    case 2: cool->z.o.l = yz << 16; break;
    case 3: cool->z.o.l = yz; break;
    }
    if (i != set) { /* register X should also be the Y operand */
      cool->y = cool->b;
      cool->b = zero_spec;
    }
  }

This code is used in section 103.
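The switch in §109 positions a 16-bit immediate in one of the four wydes of an octabyte, selected by the two low bits of the opcode. A small sketch of just that placement follows; octa_t and wyde_operand are hypothetical names, with octa_t modeled on the two-tetrabyte octa of §17.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_t; /* high and low tetrabytes */

/* Place the 16-bit immediate yz in the wyde selected by op & 3:
   0 = high, 1 = medium high, 2 = medium low, 3 = low. */
static octa_t wyde_operand(unsigned op, uint32_t yz)
{
  octa_t z = {0, 0};
  switch (op & 3) {
  case 0: z.h = yz << 16; break;
  case 1: z.h = yz; break;
  case 2: z.l = yz << 16; break;
  case 3: z.l = yz; break;
  }
  return z;
}
```

For opcode 0xe0 (SETH) and yz = 0x1234 the result is h = 0x12340000, l = 0; the three opcodes that follow move the same wyde down one position each.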

b: spec, §44. br = 69, §49. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. f: register int, §75. g: specnode [ ], §86. h: tetra, §17. i: register int, §12. l: specnode *, §86. l: tetra, §17. lring_mask: int, §88. need_b: bool, §44. need_ra: bool, §44. o: octa, §40. op: register mmix_opcode, §75. pbr = 70, §49. rA = 21, §52. ra: spec, §44. rD = 1, §52. rE = 2, §52. rel_addr_bit = 0x40, §83. rJ = 4, §52. rM = 5, §52. set = 33, §49. specval: static spec ( ), §93. true = 1, §11. xx: unsigned char, §44. y: spec, §44. yz: register int, §75. z: spec, §44. zero_spec: spec, §41.


110. ⟨Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110⟩ ≡
  {
    if (cool->xx >= cool_G) cool->ren_x = true, spec_install(&g[cool->xx], &cool->x);
    else if (cool->xx < cool_L)
      cool->ren_x = true, spec_install(&l[(cool_O.l + cool->xx) & lring_mask], &cool->x);
    else { /* we need to increase L before issuing head->inst */
    increase_L: if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
        ⟨Insert an instruction to advance gamma 113⟩
      else ⟨Insert an instruction to advance beta and L 112⟩;
    }
  }

This code is used in section 101.
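The test at increase_L decides whether the ring slot that a new local register would occupy still holds a stacked register that has not yet been written to memory. The sketch below isolates that test. It is an interpretation rather than the simulator's code: must_advance_gamma is a hypothetical name, and S and O stand for cool_S.l and cool_O.l, the octabyte indices rS/8 and rO/8.

```c
/* Locals occupy ring slots O, O+1, ..., O+L-1 (mod ring size), and the
   oldest stacked register still in the ring sits at slot S. The next
   local would go to slot O+L; if that slot coincides with S, and L is
   nonzero, the octabyte at S must be stored first (incgamma). */
static int must_advance_gamma(unsigned S, unsigned O, unsigned L,
                              unsigned lring_mask)
{
  return ((S - O) & lring_mask) == L && L != 0;
}
```

With a ring of eight slots (lring_mask = 7), S = 2, O = 5, and L = 5, the locals fill slots 5, 6, 7, 0, 1 and the next free slot would be 2, which equals S, so gamma has to advance before L can grow. Unsigned subtraction makes the wraparound of S - O harmless once the mask is applied.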

111. ⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩ ≡
  if (rename_regs < cool->ren_x + cool->ren_a) goto stall;
  if (cool->mem_x)
    if (mem_slots) mem_slots--; else goto stall;
  rename_regs -= cool->ren_x + cool->ren_a;

This code is used in section 75.

112. The incrl instruction advances β and rL by 1 at a time when we know that β ≠ γ, in the ring of local registers.

⟨Insert an instruction to advance beta and L 112⟩ ≡
  {
    cool->i = incrl;
    spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
    cool->need_b = cool->need_ra = false;
    cool->y = cool->z = zero_spec;
    cool->x.known = true; /* cool->x.o = zero_octa */
    spec_install(&g[rL], &cool->rl);
    cool->rl.o.l = cool_L + 1;
    cool->ren_x = cool->set_l = true;
    op = SETH; /* this instruction to be handled by the simplest units */
    cool->interim = true;
    goto dispatch_done;
  }

This code is used in section 110.

113. The incgamma instruction advances γ and rS by storing an octabyte from the local register ring to virtual memory location cool_S << 3.

⟨Insert an instruction to advance gamma 113⟩ ≡
  {
    cool->need_b = cool->need_ra = false;
    cool->i = incgamma;
    new_S = incr(cool_S, 1);
    cool->b = specval(&l[cool_S.l & lring_mask]);
    cool->y.p = NULL, cool->y.o = shift_left(cool_S, 3);
    cool->z = zero_spec;


    cool->mem_x = true, spec_install(&mem, &cool->x);
    op = STOU; /* this instruction needs to be handled by load/store unit */
    cool->interim = true;
    goto dispatch_done;
  }

This code is used in sections 110, 119, and 337.

114. The decgamma instruction decreases γ and rS by loading an octabyte from virtual memory location (cool_S - 1) << 3 into the local register ring.

⟨Insert an instruction to decrease gamma 114⟩ ≡
  {
    cool->i = decgamma;
    new_S = incr(cool_S, -1);
    cool->z = cool->b = zero_spec;
    cool->need_b = false;
    cool->y.p = NULL, cool->y.o = shift_left(new_S, 3);
    cool->ren_x = true, spec_install(&l[new_S.l & lring_mask], &cool->x);
    op = LDOU; /* this instruction needs to be handled by load/store unit */
    cool->interim = true;
    cool->ptr_a = (void *) mem.up;
    goto dispatch_done;
  }

This code is used in section 120.

b: spec, §44. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. cool_S: octa, §98. decgamma = 85, §49. dispatch_done: label, §101. false = 0, §11. g: specnode [ ], §86. head: fetch *, §69. i: internal_opcode, §44. incgamma = 84, §49. incr: octa ( ), MMIX-ARITH §6. incrl = 86, §49. inst: tetra, §68. interim: bool, §44. known: bool, §40. l: specnode *, §86. l: tetra, §17. LDOU = 0x8e, §47. lring_mask: int, §88. mem: specnode, §115. mem_slots: int, §86. mem_x: bool, §44. need_b: bool, §44. need_ra: bool, §44. new_S: octa, §99. o: octa, §40. op: register mmix_opcode, §75. p: specnode *, §40. ptr_a: void *, §44. ren_a: bool, §44. ren_x: bool, §44. rename_regs: int, §86. rl: specnode, §44. rL = 20, §52. set_l: bool, §44. SETH = 0xe0, §47. shift_left: octa ( ), MMIX-ARITH §7. spec_install: static void ( ), §95. specval: static spec ( ), §93. stall: label, §75. STOU = 0xae, §47. true = 1, §11. up: specnode *, §40. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4. zero_spec: spec, §41.


115. Storing into memory requires a doubly linked data list of specnodes like the lists we use for local and global registers. In this case the head of the list is called mem, and the addr fields are physical addresses in memory.

⟨External variables 4⟩ +≡
  Extern specnode mem;

116. The addr field of a memory specnode is all 1s until the physical address has been computed.

⟨Initialize everything 22⟩ +≡
  mem.addr.h = mem.addr.l = -1;
  mem.up = mem.down = &mem;

117. The CSWAP operation is treated as a partial store, with $X as a secondary output. Partial store (pst) commands read an octabyte from memory before they write it.

⟨Special cases of instruction dispatch 117⟩ ≡
  case cswap: cool->ren_a = true;
    spec_install(cool->xx >= cool_G ? &g[cool->xx] : &l[(cool_O.l + cool->xx) & lring_mask], &cool->a);
    cool->i = pst;
  case st: if ((op & 0xfe) == STCO) cool->b.o.l = cool->xx;
  case pst: cool->mem_x = true, spec_install(&mem, &cool->x); break;
  case ld: case ldunc: cool->ptr_a = (void *) mem.up; break;

See also sections 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, and 355.

This code is used in section 101.

118. When new data is PUT into special registers 15–20 (namely rK, rQ, rU, rV, rG, or rL) it can affect many things. Therefore we stop issuing further instructions until such PUTs are committed. Moreover, we will see later that such drastic PUTs defer execution until they reach the hot seat.

⟨Special cases of instruction dispatch 117⟩ +≡
  case put: if (cool->yy != 0 || cool->xx >= 32) goto illegal_inst;
    if (cool->xx >= 8) {
      if (cool->xx <= 11) goto illegal_inst;
      if (cool->xx <= 18 && !(cool->loc.h & sign_bit)) goto privileged_inst;
    }
    if (cool->xx >= 15 && cool->xx <= 20) freeze_dispatch = true;
    cool->ren_x = true, spec_install(&g[cool->xx], &cool->x); break;
  case get: if (cool->yy || cool->zz >= 32) goto illegal_inst;
    if (cool->zz == rO) cool->z.o = shift_left(cool_O, 3);
    else if (cool->zz == rS) cool->z.o = shift_left(cool_S, 3);
    else cool->z = specval(&g[cool->zz]); break;
  illegal_inst: cool->interrupt |= B_BIT; goto noop_inst;
  case ldvts: if (cool->loc.h & sign_bit) break;
  privileged_inst: cool->interrupt |= K_BIT;
  noop_inst: cool->i = noop; break;
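The chain of tests applied to PUT in §118 can be restated as a pure function, which makes the register ranges easy to check against the special register codes (rB = 0, rO = 10, rS = 11, rU = 17, rG = 19, rL = 20, rA = 21). This is a sketch only: put_result and classify_put are hypothetical names, and negative_loc stands for the condition cool->loc.h & sign_bit, that is, an instruction at a negative address.

```c
typedef enum { PUT_OK, PUT_OK_FREEZE, PUT_ILLEGAL, PUT_PRIVILEGED } put_result;

/* Classify PUT xx,yy following the order of the tests in section 118. */
static put_result classify_put(unsigned xx, unsigned yy, int negative_loc)
{
  if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
  if (xx >= 8) {
    if (xx <= 11) return PUT_ILLEGAL;      /* codes 8..11 cannot be PUT */
    if (xx <= 18 && !negative_loc) return PUT_PRIVILEGED; /* codes 12..18 */
  }
  if (xx >= 15 && xx <= 20) return PUT_OK_FREEZE; /* dispatch is frozen */
  return PUT_OK;
}
```

A PUT to rU (code 17) from a nonnegative location is privileged, while the same instruction at a negative location is accepted and freezes dispatch; a PUT to rA (code 21) is accepted without freezing.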


119. A PUSHGO instruction with X ≥ L causes L to increase momentarily by 1, even if L = G. But the value of L will be decreased before the PUSHGO is complete, so it will never actually exceed G. Moreover, we needn't insert an incrl command.

⟨Special cases of instruction dispatch 117⟩ +≡
  case pushgo: inst_ptr.p = &cool->go;
  case pushj:
    { register int x = cool->xx;
      if (x >= cool_L) {
        if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
          ⟨Insert an instruction to advance gamma 113⟩
        x = cool_L; cool_L++;
      }
      cool->ren_x = true, spec_install(&l[(cool_O.l + x) & lring_mask], &cool->x);
      cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = x;
      cool->ren_a = true, spec_install(&g[rJ], &cool->a);
      cool->a.known = true, cool->a.o = incr(cool->loc, 4);
      cool->set_l = true, spec_install(&g[rL], &cool->rl);
      cool->rl.o.l = cool_L - x - 1;
      new_O = incr(cool_O, x + 1);
    } break;
  case syncid:
    if (cool->loc.h & sign_bit) break;
  case go: inst_ptr.p = &cool->go; break;

a: specnode, §44. addr: octa, §40. b: spec, §44. B_BIT = 1 << 2, §54. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. cool_S: octa, §98. cswap = 68, §49. down: specnode *, §40. Extern = macro, §4. freeze_dispatch: register bool, §75. g: specnode [ ], §86. get = 54, §49. go = 72, §49. go: specnode, §44. h: tetra, §17. i: internal_opcode, §44. incr: octa ( ), MMIX-ARITH §6. incrl = 86, §49. inst_ptr: spec, §284. interrupt: unsigned int, §44. K_BIT = 1 << 3, §54. known: bool, §40. l: tetra, §17. l: specnode *, §86. ld = 56, §49. ldunc = 59, §49. ldvts = 60, §49. loc: octa, §44. lring_mask: int, §88. mem_x: bool, §44. new_O: octa, §99. noop = 81, §49. o: octa, §40. op: register mmix_opcode, §75. p: specnode *, §40. pst = 66, §49. ptr_a: void *, §44. pushgo = 74, §49. pushj = 71, §49. put = 55, §49. ren_a: bool, §44. ren_x: bool, §44. rJ = 4, §52. rl: specnode, §44. rL = 20, §52. rO = 10, §52. rS = 11, §52. set_l: bool, §44. shift_left: octa ( ), MMIX-ARITH §7. sign_bit = macro, §80. spec_install: static void ( ), §95. specnode = struct, §40. specval: static spec ( ), §93. st = 63, §49. STCO = 0xb4, §47. syncid = 65, §49. true = 1, §11. up: specnode *, §40. x: specnode, §44. xx: unsigned char, §44. yy: unsigned char, §44. z: spec, §44. zz: unsigned char, §44.


120. We need to know the topmost "hidden" element of the register stack when a POP instruction is dispatched. This element is usually present in the local register ring, unless γ = α.

Once it is known, let x be its least significant byte. We will be decreasing rO by x + 1, so we may have to decrease γ repeatedly in order to maintain the condition rS ≤ rO.

⟨Special cases of instruction dispatch 117⟩ +≡
  case pop: if (cool->xx && cool_L >= cool->xx)
      cool->y = specval(&l[(cool_O.l + cool->xx - 1) & lring_mask]);
  pop_unsave: if (cool_S.l == cool_O.l) ⟨Insert an instruction to decrease gamma 114⟩;
    { register tetra x;
      register int new_L;
      register specnode *p = l[(cool_O.l - 1) & lring_mask].up;
      if (p->known) x = (p->o.l) & 0xff; else goto stall;
      if ((tetra) (cool_O.l - cool_S.l) <= x) ⟨Insert an instruction to decrease gamma 114⟩;
      new_O = incr(cool_O, -x - 1);
      if (cool->i == pop) new_L = x + (cool->xx <= cool_L ? cool->xx : cool_L + 1);
      else new_L = x;
      if (new_L > cool_G) new_L = cool_G;
      if (x < new_L)
        cool->ren_x = true, spec_install(&l[(cool_O.l - 1) & lring_mask], &cool->x);
      cool->set_l = true, spec_install(&g[rL], &cool->rl);
      cool->rl.o.l = new_L;
      if (cool->i == pop) {
        cool->z.o.l = yz << 2;
        if (inst_ptr.p == UNKNOWN_SPEC && new_head == tail) inst_ptr.p = &cool->go;
      }
      break;
    }

121. ⟨Special cases of instruction dispatch 117⟩ +≡
  case mulu: cool->ren_a = true, spec_install(&g[rH], &cool->a); break;
  case div: case divu: cool->ren_a = true, spec_install(&g[rR], &cool->a); break;
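The new value of rL computed for POP in §120 combines three quantities: the byte x recovered from the topmost hidden register, the X field of the instruction, and the current L, with G as a ceiling. The arithmetic can be isolated as follows; pop_new_L is a hypothetical helper, and only the pop branch of the code (not the pop_unsave path, where new_L is simply x) is modeled.

```c
/* New rL after POP X: the x locals that the caller had below the hole,
   plus the values being returned (X of them, or L + 1 when X exceeds L),
   clamped so that L never exceeds G. */
static unsigned pop_new_L(unsigned x, unsigned X, unsigned L, unsigned G)
{
  unsigned new_L = x + (X <= L ? X : L + 1);
  if (new_L > G) new_L = G;
  return new_L;
}
```

For instance, with x = 3, X = 2, L = 5 the new L is 5; with X = 7 and L = 5 the sum uses L + 1 = 6 and gives 9; and with x = 250, X = 10, L = 10, G = 255 the sum 260 is clamped to 255.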

122. It's tempting to say that we could avoid taking up space in the reorder buffer when no operation needs to be done. A JMP instruction qualifies as a no-op in this sense, because the change of control occurs before the execution stage. However, even a no-op might have to be counted in the usage register rU, so it might get into the execution stage for that reason. A no-op can also cause a protection interrupt, if it appears in a negative location. Even more importantly, a program might get into a loop that consists entirely of jumps and no-ops; then we wouldn't be able to interrupt it, because the interruption mechanism needs to find the current location in the reorder buffer! At least one functional unit therefore needs to provide explicit support for JMP, JMPB, and SWYM.

The SWYM instruction with F_BIT set is a special case: This is a request from the fetch coroutine for an update to the IT-cache, when the page table method isn't implemented in hardware.


⟨Special cases of instruction dispatch 117⟩ +≡
  case noop: if (cool->interrupt & F_BIT) {
      cool->go.o = cool->y.o = cool->loc;
      inst_ptr = specval(&g[rT]);
    }
    break;

123. ⟨Undo data structures set prematurely in the cool block and break 123⟩ ≡
  if (cool->ren_x || cool->mem_x) spec_rem(&cool->x);
  if (cool->ren_a) spec_rem(&cool->a);
  if (cool->set_l) spec_rem(&cool->rl);
  if (inst_ptr.p == &cool->go) inst_ptr.p = UNKNOWN_SPEC;
  break;

This code is used in section 75.

a: specnode, §44. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. cool_S: octa, §98. div = 9, §49. divu = 28, §49. F_BIT = 1 << 17, §54. g: specnode [ ], §86. go: specnode, §44. i: internal_opcode, §44. incr: octa ( ), MMIX-ARITH §6. inst_ptr: spec, §284. interrupt: unsigned int, §44. known: bool, §40. l: specnode *, §86. l: tetra, §17. loc: octa, §44. lring_mask: int, §88. mem_x: bool, §44. mulu = 27, §49. new_head: register fetch *, §74. new_O: octa, §99. noop = 81, §49. o: octa, §40. p: specnode *, §40. pop = 75, §49. ren_a: bool, §44. ren_x: bool, §44. rH = 3, §52. rl: specnode, §44. rL = 20, §52. rR = 6, §52. rT = 13, §52. set_l: bool, §44. spec_install: static void ( ), §95. spec_rem: static void ( ), §97. specnode = struct, §40. specval: static spec ( ), §93. stall: label, §75. tail: fetch *, §69. tetra = unsigned int, §17. true = 1, §11. UNKNOWN_SPEC = macro, §71. up: specnode *, §40. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. yz: register int, §75. z: spec, §44.


124. The execution stages. MMIX's raison d'être is its ability to execute instructions. So now we want to simulate the behavior of its functional units.

Each coroutine scheduled for action at the current tick of the clock has a stage number corresponding to a particular subset of the MMIX hardware. For example, the coroutines with stage = 2 are the second stages in the pipelines of the functional units. A coroutine with stage = 0 works in the fetch unit. Several artificially large stage numbers are used to control special coroutines that do things like write data from buffers into memory.

In this program the current coroutine of interest is called self; hence self->stage is the current stage number of interest. Another key variable, self->ctl, is called data; this is the control block being operated on by the current coroutine. We typically are simulating an operation in which data->x is being computed as a function of data->y and data->z. The data record has many fields, as described earlier when we defined control structures; for example, data->owner is the same as self, during the execution stage, if it is nonnull.

This part of the simulator is written as if each functional unit is able to handle all 256 operations. In practice, of course, a functional unit tends to be much more specialized; the actual specialization is governed by the dispatcher, which issues an instruction only to a functional unit that supports it. Once an instruction has been dispatched, however, we can simulate it most easily if we imagine that its functional unit is universal.

Coroutines with higher stage numbers are processed first. The three most important variables that govern a coroutine's behavior, once self->stage is given, are the external operation code data->op, the internal operation code data->i, and the value of data->state. We typically have data->state = 0 when a coroutine is first fired up.

⟨Local variables 12⟩ +≡
  register coroutine *self; /* the current coroutine being executed */
  register control *data; /* the control block of the current coroutine */

125. When a coroutine has done all it wants to on a single cycle, it says goto done. It will not be scheduled to do any further work unless the schedule routine has been called since it began execution. The wait macro is a convenient way to say "Please schedule me to resume again at the current data->state" after a specified time; for example, wait(1) will restart a coroutine on the next clock tick.

  #define wait(t) { schedule(self, t, data->state); goto done; }
  #define pass_after(t) schedule(self + 1, t, data->state)
  #define sleep { self->next = self; goto done; } /* wait forever */
  #define awaken(c, t) schedule(c, t, c->ctl->state)


⟨Execute all coroutines scheduled for the current time 125⟩ ≡
  cur_time++; if (cur_time == ring_size) cur_time = 0;
  for (self = queuelist(cur_time); self != &sentinel; self = sentinel.next) {
    sentinel.next = self->next; self->next = NULL; /* unschedule this coroutine */
    data = self->ctl;
    if (verbose & coroutine_bit) {
      printf(" running "); print_coroutine_id(self); printf(" ");
      print_control_block(data); printf("\n");
    }
    switch (self->stage) {
    case 0: ⟨Simulate an action of the fetch coroutine 288⟩;
    case 1: ⟨Simulate the first stage of an execution pipeline 130⟩;
    default: ⟨Simulate later stages of an execution pipeline 135⟩;
    ⟨Cases for control of special coroutines 126⟩;
    }
  terminate: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  done: ;
  }

This code is used in section 64.

126. A special coroutine whose stage number is vanish simply goes away at its scheduled time.

⟨Cases for control of special coroutines 126⟩ ≡
  case vanish: goto terminate;

See also sections 215, 217, 222, 224, 232, 237, and 257.

This code is used in section 125.

127. ⟨Global variables 20⟩ +≡
  coroutine mem_locker; /* trivial coroutine that vanishes */
  coroutine Dlocker; /* another */
  control vanish_ctl; /* such coroutines share a common control block */

control = struct, §44. coroutine = struct, §23. coroutine_bit = 1 << 2, §8. ctl: control *, §23. cur_time: int, §29. i: internal_opcode, §44. lockloc: coroutine **, §23. next: coroutine *, §23. op: mmix_opcode, §44. owner: coroutine *, §44. print_control_block: static void ( ), §46. print_coroutine_id: static void ( ), §25. printf: int ( ), <stdio.h>. queuelist: static coroutine *( ), §35. ring_size: int, §29. schedule: static void ( ), §28. sentinel: coroutine, §36. stage: int, §23. state: int, §44. vanish = 98, §129. verbose: int, §4. x: specnode, §44. y: spec, §44. z: spec, §44.


128. ⟨Initialize everything 22⟩ +≡
  mem_locker.name = "Locker";
  mem_locker.ctl = &vanish_ctl;
  mem_locker.stage = vanish;
  Dlocker.name = "Dlocker";
  Dlocker.ctl = &vanish_ctl;
  Dlocker.stage = vanish;
  vanish_ctl.go.o.l = 4;
  for (j = 0; j < DTcache->ports; j++) DTcache->reader[j].ctl = &vanish_ctl;
  if (Dcache)
    for (j = 0; j < Dcache->ports; j++) Dcache->reader[j].ctl = &vanish_ctl;
  for (j = 0; j < ITcache->ports; j++) ITcache->reader[j].ctl = &vanish_ctl;
  if (Icache)
    for (j = 0; j < Icache->ports; j++) Icache->reader[j].ctl = &vanish_ctl;

129. Here is a list of the stage numbers for special coroutines to be defined below.

⟨Header definitions 6⟩ +≡
  #define max_stage 99 /* exceeds all stage numbers */
  #define vanish 98 /* special coroutine that just goes away */
  #define flush_to_mem 97 /* coroutine for flushing from a cache to memory */
  #define flush_to_S 96 /* coroutine for flushing from a cache to the S-cache */
  #define fill_from_mem 95 /* coroutine for filling a cache from memory */
  #define fill_from_S 94 /* coroutine for filling a cache from the S-cache */
  #define fill_from_virt 93 /* coroutine for filling a translation cache */
  #define write_from_wbuf 92 /* coroutine for emptying the write buffer */
  #define cleanup 91 /* coroutine for cleaning the caches */

130. At the very beginning of stage 1, a functional unit will stall if necessary until its operands are available. As soon as the operands are all present, the state is set nonzero and execution proper begins.

⟨Simulate the first stage of an execution pipeline 130⟩ ≡
  switch1: switch (data->state) {
    case 0: ⟨Wait for input data if necessary; set state = 1 if it's there 131⟩;
    case 1: ⟨Begin execution of an operation 132⟩;
    case 2: ⟨Pass data to the next stage of the pipeline 134⟩;
    case 3: ⟨Finish execution of an operation 144⟩;
    ⟨Special cases for states in the first stage 266⟩;
  }

This code is used in section 125.


131. If some of our input data has been computed by another coroutine on the current cycle, we grab it now but wait for the next cycle. (An actual machine wouldn't have latched the data until then.)

⟨Wait for input data if necessary; set state = 1 if it's there 131⟩ ≡
  j = 0;
  if (data->y.p) {
    j++;
    if (data->y.p->known) data->y.o = data->y.p->o, data->y.p = NULL;
    else j += 10;
  }
  if (data->z.p) {
    j++;
    if (data->z.p->known) data->z.o = data->z.p->o, data->z.p = NULL;
    else j += 10;
  }
  if (data->b.p) {
    if (data->need_b) j++;
    if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
    else if (data->need_b) j += 10;
  }
  if (data->ra.p) {
    if (data->need_ra) j++;
    if (data->ra.p->known) data->ra.o = data->ra.p->o, data->ra.p = NULL;
    else if (data->need_ra) j += 10;
  }
  if (j < 10) data->state = 1;
  if (j) wait(1); /* otherwise we fall through to case 1 */

This code is used in section 130.


132. Simple register-to-register instructions like ADD are assumed to take just one cycle, but others like FADD almost certainly require more time. This simulator can be configured so that FADD might take, say, four pipeline stages of one cycle each (1 + 1 + 1 + 1), or two pipeline stages of two cycles each (2 + 2), or a single unpipelined stage lasting four cycles (4), etc. In any case the simulator computes the results now, for simplicity, placing them in data->x and possibly also in data->a and/or data->interrupt. The results will not be officially made known until the proper time.

⟨Begin execution of an operation 132⟩ ≡
  switch (data->i) {
    ⟨Cases to compute the results of register-to-register operation 137⟩;
    ⟨Cases to compute the virtual address of a memory operation 265⟩;
    ⟨Cases for stage 1 execution 155⟩;
  }
  ⟨Set things up so that the results become known when they should 133⟩;

This code is used in section 130.

133. If the internal opcode data->i is max_pipe_op or less, a special pipeline sequence like 1 + 1 + 1 + 1 or 2 + 2 or 15 + 10, etc., has been configured. Otherwise we assume that the pipeline sequence is simply 1.

Suppose the pipeline sequence is t_1 + t_2 + ... + t_k. Each t_j is positive and less than 256, so we represent the sequence as a string pipe_seq[data->i] of unsigned "characters," terminated by 0. Given such a string, we want to do the following: Wait (t_1 - 1) cycles and pass data to stage 2; wait t_2 cycles and pass data to stage 3; ...; wait t_(k-1) cycles and pass data to stage k; wait t_k cycles and make the results known. The value of denin is added to t_1; the value of denout is added to t_k.

⟨Set things up so that the results become known when they should 133⟩ ≡
  data->state = 3;
  if (data->i <= max_pipe_op) { register unsigned char *s = pipe_seq[data->i];
    j = s[0] + data->denin;
    if (s[1]) data->state = 2; /* more than one stage */
    else j += data->denout;
    if (j > 1) wait(j - 1);
  }
  goto switch1;

This code is used in section 132.
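The timing rule of section 133 can be tried out in isolation. The following is only a sketch under stated assumptions: the helper names first_stage_wait and nominal_cycles are hypothetical and do not occur in MMIX-PIPE, and the second helper merely adds up the stage times together with the denin and denout penalties.

```c
#include <stddef.h>

/* Hypothetical helper: cycles that stage 1 waits before it either passes
   the data on or makes the results known.  It mirrors the rule "wait
   (t_1 - 1) cycles", with denin added to t_1 and denout added only when
   the sequence has a single stage. */
int first_stage_wait(const unsigned char *s, int denin, int denout) {
    int j = s[0] + denin;
    if (s[1] == 0) j += denout;   /* stage 1 is also the last stage */
    return j > 1 ? j - 1 : 0;
}

/* Hypothetical helper: sum of all stage times plus both penalties. */
int nominal_cycles(const unsigned char *s, int denin, int denout) {
    int total = denin + denout;
    for (size_t k = 0; s[k] != 0; k++) total += s[k];
    return total;
}
```

With the zero-terminated sequence {2, 2, 0} the first stage waits one cycle before passing its data along, whereas the unpipelined sequence {4, 0} waits three cycles in stage 1.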

134. When we're in stage j, the coroutine for stage j + 1 of the same functional unit is self + 1.

⟨Pass data to the next stage of the pipeline 134⟩ ≡
pass_data: if ((self + 1)->next) wait(1); /* stall if the next stage is occupied */
  { register unsigned char *s = pipe_seq[data->i];
    j = s[self->stage];
    if (s[self->stage + 1] == 0) j += data->denout, data->state = 3;
        /* the next stage is the last */
    pass_after(j);
  }
passit: (self + 1)->ctl = data;
  data->owner = self + 1;
  goto done;

This code is used in section 130.

135. ⟨Simulate later stages of an execution pipeline 135⟩ ≡
switch2: if (data->b.p && data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
  switch (data->state) {
    case 0: panic(confusion("switch2"));
    case 1: ⟨Begin execution of a stage-two operation 351⟩;
    case 2: goto pass_data;
    case 3: goto fin_ex;
    ⟨Special cases for states in later stages 272⟩;
  }

This code is used in section 125.

136. The default pipeline times use only one stage; they can be overridden by MMIX_config. The total number of stages supported by this simulator is limited to 90, since it must never interfere with the stage numbers for special coroutines defined below. (The author doesn't feel guilty about making this restriction.)

⟨External variables 4⟩ +≡
#define pipe_limit 90
Extern unsigned char pipe_seq[max_pipe_op + 1][pipe_limit + 1];

137. The simplest of all register-to-register operations is set, which occurs for commands like SETH as well as for commands like GETA. (We might as well start with the easy cases and work our way up.)

⟨Cases to compute the results of register-to-register operation 137⟩ ≡
case set: data->x.o = data->z.o; break;

See also sections 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, and 350.

This code is used in section 132.


138. Here are the basic boolean operations, which account for 24 of MMIX's 256 opcodes.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case or: data->x.o.h = data->y.o.h | data->z.o.h;
  data->x.o.l = data->y.o.l | data->z.o.l;
  break;
case orn: data->x.o.h = data->y.o.h | ~data->z.o.h;
  data->x.o.l = data->y.o.l | ~data->z.o.l;
  break;
case nor: data->x.o.h = ~(data->y.o.h | data->z.o.h);
  data->x.o.l = ~(data->y.o.l | data->z.o.l);
  break;
case and: data->x.o.h = data->y.o.h & data->z.o.h;
  data->x.o.l = data->y.o.l & data->z.o.l;
  break;
case andn: data->x.o.h = data->y.o.h & ~data->z.o.h;
  data->x.o.l = data->y.o.l & ~data->z.o.l;
  break;
case nand: data->x.o.h = ~(data->y.o.h & data->z.o.h);
  data->x.o.l = ~(data->y.o.l & data->z.o.l);
  break;
case xor: data->x.o.h = data->y.o.h ^ data->z.o.h;
  data->x.o.l = data->y.o.l ^ data->z.o.l;
  break;
case nxor: data->x.o.h = data->y.o.h ^ ~data->z.o.h;
  data->x.o.l = data->y.o.l ^ ~data->z.o.l;
  break;

139. The implementation of ADDU is only slightly more difficult. It would be trivial except for the fact that internal opcode addu is used not only for the ADDU[I] and INC[M][H,L] operations, in which we simply want to add data->y.o to data->z.o, but also for operations like 4ADDU.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case addu: data->x.o = oplus((data->op & 0xf8) == 0x28 ?
      shift_left(data->y.o, 1 + ((data->op >> 1) & 0x3)) : data->y.o, data->z.o);
  break;
case subu: data->x.o = ominus(data->y.o, data->z.o); break;

140. Signed addition and subtraction produce the same results as their unsigned counterparts, but overflow must also be detected. Overflow occurs when adding y to z if and only if y and z have the same sign but their sum has a different sign. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case add: data->x.o = oplus(data->y.o, data->z.o);
  if (((data->y.o.h ^ data->z.o.h) & sign_bit) == 0 &&
      ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0) data->interrupt |= V_BIT;
  break;
case sub: data->x.o = ominus(data->y.o, data->z.o);
  if (((data->x.o.h ^ data->z.o.h) & sign_bit) == 0 &&
      ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0) data->interrupt |= V_BIT;
  break;
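The two overflow tests can be checked on ordinary 64-bit integers. This is a minimal sketch, assuming a single uint64_t in place of the simulator's octa (which keeps two 32-bit halves and tests only the high one); the names add_overflows and sub_overflows are hypothetical.

```c
#include <stdint.h>

#define SIGN_BIT64 (UINT64_C(1) << 63)

/* x = y + z overflows iff y and z have the same sign
   but x has a different sign from y. */
int add_overflows(uint64_t y, uint64_t z) {
    uint64_t x = y + z;                       /* wraps modulo 2^64 */
    return ((y ^ z) & SIGN_BIT64) == 0 && ((y ^ x) & SIGN_BIT64) != 0;
}

/* x = y - z overflows iff the addition y = x + z overflows,
   so the same test is applied with x and z as the addends. */
int sub_overflows(uint64_t y, uint64_t z) {
    uint64_t x = y - z;                       /* wraps modulo 2^64 */
    return ((x ^ z) & SIGN_BIT64) == 0 && ((y ^ x) & SIGN_BIT64) != 0;
}
```

Working on unsigned values keeps the wraparound well defined in C; only the sign bits are ever inspected.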

141. The shift commands might take more than one cycle, or they might even be pipelined, if the default value of pipe_seq[sh] is changed. But we compute shifts all at once here, because other parts of the simulator will take care of the pipeline timing. (Notice that shlu is changed to sh, for this reason. Similar changes to the internal opcodes are made for other operators below.)

#define shift_amt (data->z.o.h || data->z.o.l >= 64 ? 64 : data->z.o.l)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case shlu: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh; break;
case shl: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh;
  { octa tmpo;
    tmpo = shift_right(data->x.o, shift_amt, 0);
    if (tmpo.h != data->y.o.h || tmpo.l != data->y.o.l) data->interrupt |= V_BIT;
  } break;
case shru: data->x.o = shift_right(data->y.o, shift_amt, 1); data->i = sh; break;
case shr: data->x.o = shift_right(data->y.o, shift_amt, 0); data->i = sh; break;
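The overflow test for signed left shifts (shift left, shift back arithmetically, compare with the original) can be sketched as follows. The sketch assumes plain uint64_t values instead of octa, and the helpers clamp_shift, shl64, sar64, and shl_overflows are hypothetical names, not routines of MMIX-ARITH. Shift counts of 64 are handled explicitly because C leaves such shifts undefined.

```c
#include <stdint.h>

/* Like shift_amt: every count of 64 or more behaves as 64. */
unsigned clamp_shift(uint64_t z) { return z >= 64 ? 64u : (unsigned)z; }

/* Logical left shift that tolerates a count of 64. */
uint64_t shl64(uint64_t y, unsigned s) { return s >= 64 ? 0 : y << s; }

/* Arithmetic right shift, written without relying on signed shifts. */
uint64_t sar64(uint64_t y, unsigned s) {
    uint64_t fill = (y >> 63) ? ~UINT64_C(0) : 0;  /* copies of the sign bit */
    if (s >= 64) return fill;
    if (s == 0) return y;
    return (y >> s) | (fill << (64 - s));
}

/* A signed left shift overflows iff shifting back fails to restore y. */
int shl_overflows(uint64_t y, uint64_t z) {
    unsigned s = clamp_shift(z);
    return sar64(shl64(y, s), s) != y;
}
```

For instance, shifting 1 left by 63 places overflows, while shifting the all-ones pattern (the value -1) left by 63 places does not, since the most negative number is representable.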

142. The MUX operation has three operands, namely data->y, data->z, and data->b; the third operand is the current (speculative) value of rM, the special mask register. Otherwise MUX is unexceptional.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case mux: data->x.o.h = (data->y.o.h & data->b.o.h) + (data->z.o.h & ~data->b.o.h);
  data->x.o.l = (data->y.o.l & data->b.o.l) + (data->z.o.l & ~data->b.o.l);
  break;


143. Comparisons are a breeze.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case cmp: if ((data->y.o.h & sign_bit) > (data->z.o.h & sign_bit)) goto cmp_neg;
  if ((data->y.o.h & sign_bit) < (data->z.o.h & sign_bit)) goto cmp_pos;
case cmpu: if (data->y.o.h < data->z.o.h) goto cmp_neg;
  if (data->y.o.h > data->z.o.h) goto cmp_pos;
  if (data->y.o.l < data->z.o.l) goto cmp_neg;
  if (data->y.o.l > data->z.o.l) goto cmp_pos;
cmp_zero: break; /* data->x is zero */
cmp_pos: data->x.o.l = 1; break; /* data->x.o.h is zero */
cmp_neg: data->x.o = neg_one; break;
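The fall-through from cmp into cmpu works because two numbers of the same sign compare the same way whether they are read as signed or as unsigned. A small sketch of that idea follows; octa_t is a hypothetical stand-in for the simulator's octa, and the functions return -1, 0, or +1 instead of storing into data->x.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_t;  /* hypothetical stand-in for octa */
#define SIGN_BIT32 0x80000000u

/* Unsigned comparison: high halves first, then low halves. */
int cmp_unsigned(octa_t y, octa_t z) {
    if (y.h != z.h) return y.h < z.h ? -1 : 1;
    if (y.l != z.l) return y.l < z.l ? -1 : 1;
    return 0;
}

/* Signed comparison: settle different signs first, then fall back
   on the unsigned comparison when the signs agree. */
int cmp_signed(octa_t y, octa_t z) {
    uint32_t ys = y.h & SIGN_BIT32, zs = z.h & SIGN_BIT32;
    if (ys > zs) return -1;   /* y is negative, z is not */
    if (ys < zs) return 1;    /* z is negative, y is not */
    return cmp_unsigned(y, z);
}
```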

144. The other operations will be deferred until later, now that we understand the basic ideas. But one more piece of code ought to be written before we move on, because it completes the execution stage for the simple cases already considered.

The ren_x and ren_a fields tell us whether the x and/or a fields contain valid information that should become officially known.

⟨Finish execution of an operation 144⟩ ≡
fin_ex: if (data->ren_x) data->x.known = true;
  else if (data->mem_x) data->x.known = true, data->x.addr.l &= -8;
  if (data->ren_a) data->a.known = true;
  if (data->loc.h & sign_bit) data->ra.o.l = 0;
      /* no trips enabled for the operating system */
  if (data->interrupt & 0xffff) ⟨Handle interrupt at end of execution stage 307⟩;
die: data->owner = NULL; goto terminate; /* this coroutine now fades away */

This code is used in section 130.


145. The commission/deissue stage. Control blocks leave the reorder buffer either at the hot end (when they're committed) or at the cool end (when they're deissued). We hope most of them are committed, but from time to time our speculation is incorrect and we must deissue a sequence of instructions that prove to be unwanted. Deissuing must take priority over committing, because the dispatcher cannot do anything until the machine's cool state has stabilized.

Deissuing changes the cool state by undoing the most recently issued instructions, in reverse order. Committing changes the hot state by doing the least recently issued instructions, in their original order. Both operations are similar, so we assume that they take the same time; at most commit_max instructions are deissued and/or committed on each clock cycle.

⟨Deissue the coolest instruction 145⟩ ≡
  {
    cool = (cool == reorder_top ? reorder_bot : cool + 1);
    if (verbose & issue_bit) {
      printf("Deissuing "); print_control_block(cool);
      if (cool->owner) { printf(" "); print_coroutine_id(cool->owner); }
      printf("\n");
    }
    if (cool->ren_x) rename_regs++, spec_rem(&cool->x);
    if (cool->ren_a) rename_regs++, spec_rem(&cool->a);
    if (cool->mem_x) mem_slots++, spec_rem(&cool->x);
    if (cool->set_l) spec_rem(&cool->rl);
    if (cool->owner) {
      if (cool->owner->lockloc)
        *(cool->owner->lockloc) = NULL, cool->owner->lockloc = NULL;
      if (cool->owner->next) unschedule(cool->owner);
    }
    cool_O = cool->cur_O; cool_S = cool->cur_S;
    deissues--;
  }

This code is used in section 67.


146. ⟨Commit the hottest instruction, or break if it's not ready 146⟩ ≡
  {
    if (nullifying) ⟨Nullify the hottest instruction 147⟩
    else {
      if (hot->i == get && hot->zz == rQ) new_Q = oandn(g[rQ].o, hot->x.o);
      else if (hot->i == put && hot->xx == rQ)
        hot->x.o.h |= new_Q.h, hot->x.o.l |= new_Q.l;
      if (hot->mem_x) ⟨Commit to memory if possible, otherwise break 256⟩;
      if (verbose & issue_bit) {
        printf("Committing "); print_control_block(hot); printf("\n");
      }
      if (hot->ren_x) rename_regs++, hot->x.up->o = hot->x.o, spec_rem(&(hot->x));
      if (hot->ren_a) rename_regs++, hot->a.up->o = hot->a.o, spec_rem(&(hot->a));
      if (hot->set_l) hot->rl.up->o = hot->rl.o, spec_rem(&(hot->rl));
      if (hot->arith_exc) g[rA].o.l |= hot->arith_exc;
      if (hot->usage) {
        g[rU].o.l++; if (g[rU].o.l == 0) {
          g[rU].o.h++; if ((g[rU].o.h & 0x7fff) == 0) g[rU].o.h -= 0x8000;
        }
      }
    }
    if (hot->interrupt >= H_BIT) ⟨Begin an interruption and break 317⟩;
  }

This code is used in section 67.

147. A load or store instruction is "nullified" if it is about to be captured by a trap interrupt. In such cases it will be the only item in the reorder buffer; thus nullifying is sort of a cross between deissuing and committing. (It is important to have stopped dispatching when nullification is necessary, because instructions such as incgamma and decgamma change rS, and we need to change it back when an unexpected interruption occurs.)

⟨Nullify the hottest instruction 147⟩ ≡
  {
    if (verbose & issue_bit) {
      printf("Nullifying "); print_control_block(hot); printf("\n");
    }
    if (hot->ren_x) rename_regs++, spec_rem(&hot->x);
    if (hot->ren_a) rename_regs++, spec_rem(&hot->a);
    if (hot->mem_x) mem_slots++, spec_rem(&hot->x);
    if (hot->set_l) spec_rem(&hot->rl);
    cool_O = hot->cur_O; cool_S = hot->cur_S;
    nullifying = false;
  }

This code is used in section 146.

148. Interrupt bits in rQ might be lost if they are set between a GET and a PUT. Therefore we don't allow PUT to zero out bits that have become 1 since the most recently committed GET.

⟨Global variables 20⟩ +≡
octa new_Q; /* when rQ increases in any bit position, so should this */
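The bookkeeping around new_Q can be illustrated with a simplified model. This is only a sketch: it assumes a single uint64_t for rQ instead of an octa, ignores speculation entirely, and uses hypothetical names (q_state, raise_bits, commit_get, commit_put) that do not appear in the simulator.

```c
#include <stdint.h>

typedef struct {
    uint64_t rQ;     /* committed value of the interrupt request register */
    uint64_t new_Q;  /* bits that became 1 since the last committed GET */
} q_state;

/* An interrupt source turns bits on; both words must record them. */
void raise_bits(q_state *s, uint64_t bits) {
    s->rQ |= bits;
    s->new_Q |= bits;
}

/* Commit a GET from rQ; 'seen' is the value the program received. */
void commit_get(q_state *s, uint64_t seen) {
    s->new_Q = s->rQ & ~seen;   /* bits of rQ that the program has not seen */
}

/* Commit a PUT to rQ; bits raised since the GET survive the store. */
void commit_put(q_state *s, uint64_t value) {
    s->rQ = value | s->new_Q;
}
```

Suppose the program reads rQ = 5, an interrupt then raises bit 1, and the program writes back 4 in order to clear bit 0. The committed result in this model is 6 rather than 4, so the newly raised bit is not lost.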


149. An instruction will not be committed immediately if it violates the basic security rule of MMIX: An instruction in a nonnegative location should not be performed unless all eight of the internal interrupts have been enabled in the interrupt mask register rK. Conversely, an instruction in a negative location should not be performed if the P_BIT is enabled in rK.

Such instructions take one extra cycle before they are committed. The nonnegative-location case turns on the S_BIT of both rK and rQ, leading to an immediate interrupt (unless the current instruction is trap, put, or resume).

⟨Check for security violation, break if so 149⟩ ≡
  {
    if (hot->loc.h & sign_bit) {
      if ((g[rK].o.h & P_BIT) && !(hot->interrupt & P_BIT)) {
        hot->interrupt |= P_BIT;
        g[rQ].o.h |= P_BIT;
        new_Q.h |= P_BIT;
        if (verbose & issue_bit) {
          printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
        }
        break;
      }
    } else if ((g[rK].o.h & 0xff) != 0xff && !(hot->interrupt & S_BIT)) {
      hot->interrupt |= S_BIT;
      g[rQ].o.h |= S_BIT;
      new_Q.h |= S_BIT;
      g[rK].o.h |= S_BIT;
      if (verbose & issue_bit) {
        printf(" setting rQ="); print_octa(g[rQ].o);
        printf(", rK="); print_octa(g[rK].o); printf("\n");
      }
      break;
    }
  }

This code is used in section 67.


150. Branch prediction. An MMIX programmer distinguishes statically between "branches" and "probable branches," but many modern computers attempt to do better by implementing dynamic branch prediction. (See, for example, section 4.3 of Hennessy and Patterson's Computer Architecture, second edition.) Experience has shown that dynamic branch prediction can significantly improve the performance of speculative execution, by reducing the number of instructions that need to be deissued.

This simulator has an optional bp_table containing 2^(a+b+c) entries of n bits each, where n is between 1 and 8. Usually n is 1 or 2 in practice, but 8 bits are allocated per entry for convenience in this program. The bp_table is consulted and updated on every branch instruction (every B or PB instruction, but not JMP), for advice on past history of similar situations. It is indexed by the a least significant bits of the address of the instruction, the b most recent bits of global branch history, and the next c bits of both address and history (exclusive-ored).

A bp_table entry begins at zero and is regarded as a signed n-bit number. If it is nonnegative, we will follow the prediction in the instruction, namely to predict a branch taken only in the PB case. If it is negative, we will predict the opposite of the instruction's recommendation. The n-bit number is increased (if possible) if the instruction's prediction was correct, decreased (if possible) if the instruction's prediction was incorrect.

(Incidentally, a large value of n is not necessarily a good idea. For example, if n = 8 the machine might need 128 steps to recognize that a branch taken the first 150 times is not taken the next 150 times. And if we modify the update criteria to avoid this problem, we obtain a scheme that is rarely better than a simple scheme with smaller n.)

The values a, b, c, and n in this discussion are called bp_a, bp_b, bp_c, and bp_n in the program.

⟨External variables 4⟩ +≡
Extern int bp_a, bp_b, bp_c, bp_n; /* parameters for branch prediction */
Extern char *bp_table; /* either NULL or an array of 2^(a+b+c) items */
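The behavior of a single table entry, viewed as a saturating signed n-bit counter, can be sketched in isolation. The helper names below (bp_up, bp_down, bp_follows_hint, bp_signed_value) are hypothetical, and the sketch passes n explicitly instead of using precomputed masks; the arithmetic follows the update rule stated above.

```c
/* An entry is an n-bit two's complement number kept in the low n bits of
   an int; the bit 1 << (n - 1) is its sign bit. */

/* Increase, but stay put at the largest positive value. */
int bp_up(int h, int n) {
    int nmask = (1 << n) - 1, npower = 1 << (n - 1);
    int up = (h + 1) & nmask;
    return up == npower ? h : up;
}

/* Decrease, but stay put at the most negative value. */
int bp_down(int h, int n) {
    int nmask = (1 << n) - 1, npower = 1 << (n - 1);
    return h == npower ? h : ((h - 1) & nmask);
}

/* Nonnegative entries mean "trust the instruction's own hint". */
int bp_follows_hint(int h, int n) { return (h & (1 << (n - 1))) == 0; }

/* Decode the n-bit pattern into an ordinary signed int for display. */
int bp_signed_value(int h, int n) {
    int npower = 1 << (n - 1);
    return h - ((h & npower) << 1);
}
```

With n = 8, an entry pushed up 150 times saturates at 127, and it then takes 128 consecutive decrements before bp_follows_hint turns false, which is the slow reaction described in the parenthetical remark.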


151. Branch prediction is made when we are either about to issue an instruction or peeking ahead. We look at the bp_table, but we don't want to update it yet.

⟨Predict a branch outcome 151⟩ ≡
  {
    predicted = op & 0x10; /* start with the instruction's recommendation */
    if (bp_table) { register int h;
      m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
      m = ((cool_hist & bp_bcmask) << bp_b) ^ (m >> 2);
      h = bp_table[m];
      if (h & bp_npower) predicted ^= 0x10;
    }
    if (predicted) peek_hist = (peek_hist << 1) + 1;
    else peek_hist <<= 1;
  }

This code is used in section 85.

152. We update the bp_table when an instruction is issued. And we store the opposite table value in cool->x.o.l, just in case our prediction turns out to be wrong.

⟨Record the result of branch prediction 152⟩ ≡
  if (bp_table) { register int reversed, h, h_up, h_down;
    reversed = op & 0x10;
    if (peek_hist & 1) reversed ^= 0x10;
    m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
    m = ((cool_hist & bp_bcmask) << bp_b) ^ (m >> 2);
    h = bp_table[m];
    h_up = (h + 1) & bp_nmask; if (h_up == bp_npower) h_up = h;
    if (h == bp_npower) h_down = h; else h_down = (h - 1) & bp_nmask;
    if (reversed) {
      bp_table[m] = h_down, cool->x.o.l = h_up;
      cool->i = pbr + br - cool->i; /* reverse the sense */
      bp_rev_stat++;
    } else {
      bp_table[m] = h_up, cool->x.o.l = h_down; /* go with the flow */
      bp_ok_stat++;
    }
    if (verbose & show_pred_bit) {
      printf(" predicting "); print_octa(cool->loc);
      printf(" %s; bp[%x]=%d\n", reversed ? "NG" : "OK", m,
             bp_table[m] - ((bp_table[m] & bp_npower) << 1));
    }
    cool->x.o.h = m;
  }

This code is used in section 75.


153. The calculations in the previous sections need several precomputed constants, depending on the parameters a, b, c, and n.

⟨Initialize everything 22⟩ +≡
  bp_amask = ((1 << bp_a) - 1) << 2; /* least a bits of instruction address */
  bp_cmask = ((1 << bp_c) - 1) << (bp_a + 2); /* the next c address bits */
  bp_bcmask = (1 << (bp_b + bp_c)) - 1; /* least b + c bits of history info */
  bp_nmask = (1 << bp_n) - 1; /* least significant n bits */
  bp_npower = 1 << (bp_n - 1); /* 2^(n-1), the n-bit code for -1 */
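A worked instance may make these constants more concrete. The sketch below gathers them into a hypothetical struct bp_masks filled in by a hypothetical function bp_make_masks; the simulator itself keeps them in separate global variables. The address masks start at bit 2 because MMIX instructions occupy four bytes, so the two lowest address bits of an instruction carry no information.

```c
typedef struct {
    int amask;   /* least a bits of the instruction address (above bit 1) */
    int cmask;   /* the next c address bits */
    int bcmask;  /* least b + c bits of the branch history */
    int nmask;   /* least significant n bits of a table entry */
    int npower;  /* 1 << (n - 1), the sign bit of an n-bit entry */
} bp_masks;

bp_masks bp_make_masks(int a, int b, int c, int n) {
    bp_masks m;
    m.amask = ((1 << a) - 1) << 2;
    m.cmask = ((1 << c) - 1) << (a + 2);
    m.bcmask = (1 << (b + c)) - 1;
    m.nmask = (1 << n) - 1;
    m.npower = 1 << (n - 1);
    return m;
}
```

For a = 3, b = 2, c = 1, n = 2 this gives amask = 0x1c, cmask = 0x20, bcmask = 7, nmask = 3, and npower = 2; the table then has 2^(3+2+1) = 64 entries.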

154. ⟨Global variables 20⟩ +≡
int bp_amask, bp_cmask, bp_bcmask, bp_nmask, bp_npower;
int bp_rev_stat, bp_ok_stat; /* how often we overrode and agreed */
int bp_bad_stat, bp_good_stat; /* how often we failed and succeeded */

155. After a branch or probable branch instruction has been issued and the value of the relevant register has been computed in the reorder buffer as data->b.o, we're ready to determine if the prediction was correct or not.

⟨Cases for stage 1 execution 155⟩ ≡
case br: case pbr: j = register_truth(data->b.o, data->op);
  if (j) data->go.o = data->z.o; else data->go.o = data->y.o;
  if (j == (data->i == pbr)) bp_good_stat++;
  else { /* oops, misprediction */
    bp_bad_stat++;
    ⟨Recover from incorrect branch prediction 160⟩;
  }
  goto fin_ex;

See also sections 313, 325, 327, 328, 329, 331, and 356.

This code is used in section 132.


156. The register_truth subroutine is used by B, PB, CS, and ZS commands to decide whether an octabyte satisfies the conditions of the opcode, data->op.

⟨Internal prototypes 13⟩ +≡
static int register_truth ARGS((octa, mmix_opcode));

157. ⟨Subroutines 14⟩ +≡
static int register_truth(o, op)
  octa o;
  mmix_opcode op;
{ register int b;
  switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break; /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break; /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break; /* positive? */
    case 3: b = o.l & 0x1; break; /* odd? */
  }
  if (op & 0x8) return b ^ 1;
  else return b;
}

158. The issued_between subroutine determines how many speculative instructions were issued between a given control block in the reorder buffer and the current cool pointer, when cc = cool.

⟨Internal prototypes 13⟩ +≡
static int issued_between ARGS((control *, control *));

159. ⟨Subroutines 14⟩ +≡
static int issued_between(c, cc)
  control *c, *cc;
{
  if (c > cc) return c - 1 - cc;
  return (c - reorder_bot) + (reorder_top - cc);
}

160. If more than one functional unit is able to process branch instructions and if two of them simultaneously discover misprediction, or if misprediction is detected by one unit just as another unit is generating an interrupt, we assume that an arbitration takes place so that only the hottest one actually deissues the cooler instructions.

Changes to the bp_table aren't undone when they were made on speculation in an instruction being deissued; nor do we worry about cases where the same bp_table entry is being updated by two or more active coroutines. After all, the bp_table is just a heuristic, not part of the real computation. We correct the bp_table only if we discover that a prediction was wrong, so that we will be less likely to make the same mistake later.

⟨Recover from incorrect branch prediction 160⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0; /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  inst_ptr.o = data->go.o, inst_ptr.p = NULL;
  if (!(data->loc.h & sign_bit)) {
    if (inst_ptr.o.h & sign_bit) data->interrupt |= P_BIT;
    else data->interrupt &= ~P_BIT;
  }
  if (bp_table) {
    bp_table[data->x.o.h] = data->x.o.l; /* this is what we should have stored */
    if (verbose & show_pred_bit) {
      printf(" mispredicted "); print_octa(data->loc);
      printf("; bp[%x]=%d\n", data->x.o.h,
             data->x.o.l - ((data->x.o.l & bp_npower) << 1));
    }
  }
  cool_hist = (j ? (data->hist << 1) + 1 : data->hist << 1);

This code is used in section 155.
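The count computed by issued_between is easy to check on a small circular buffer. The sketch below uses array indices instead of pointers; the name issued_between_idx and the convention that slots run from 0 (playing the role of reorder_bot) up to top are assumptions of this illustration. Since deissuing moves cool upward, issuing is taken to move it downward, wrapping from slot 0 to slot top.

```c
/* c is the slot of an issued instruction; cc is the slot that the next
   issued instruction would occupy.  The instructions issued after c sit
   in the slots strictly between them, going downward from c with
   wraparound. */
int issued_between_idx(int c, int cc, int top) {
    if (c > cc) return c - 1 - cc;   /* no wraparound in between */
    return c + (top - cc);           /* (c - bot) + (top - cc), with bot = 0 */
}
```

With eight slots (top = 7), an instruction in slot 5 with the cool position at 2 has two younger instructions behind it, in slots 4 and 3; an instruction in slot 1 with the cool position at 6 also has two, in slots 0 and 7.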

161. ⟨External prototypes 9⟩ +≡
Extern void print_stats ARGS((void));

162. ⟨External routines 10⟩ +≡
void print_stats()
{
  register int j;
  if (bp_table)
    printf("Predictions: %d in agreement, %d in opposition; %d good, %d bad\n",
           bp_ok_stat, bp_rev_stat, bp_good_stat, bp_bad_stat);
  else printf("Predictions: %d good, %d bad\n", bp_good_stat, bp_bad_stat);
  printf("Instructions issued per cycle:\n");
  for (j = 0; j <= dispatch_max; j++) printf(" %d %d\n", j, dispatch_stat[j]);
}


163. Cache memory. It's time now to consider MMIX's MMU, the memory management unit. This part of the machine deals with the critical problem of getting data to and from the computational units. In a RISC architecture all interaction between main memory and the computer registers is specified by load and store instructions; thus memory accesses are much easier to deal with than they would be on a machine with more complex kinds of interaction. But memory management is still difficult, if we want to do it well, because main memory typically operates at a much slower speed than the registers do. High-speed implementations of MMIX introduce intermediate "caches" of storage in order to keep the most important data accessible, and cache maintenance can be complicated when all the details are taken into account. (See, for example, Chapter 5 of Hennessy and Patterson's Computer Architecture, second edition.)

This simulator can be configured to have up to three auxiliary caches between registers and memory: An I-cache for instructions, a D-cache for data, and an S-cache for both instructions and data. The S-cache, also called a secondary cache, is supported only if both I-cache and D-cache are present. Arbitrary access times for each cache can be specified independently; we might assume, for example, that data items in the I-cache or D-cache can be sent to a register in one or two clock cycles, but the access time for the S-cache might be, say, 5 cycles, and main memory might require 20 cycles or more. Our speculative pipeline can have many functional units handling load and store instructions, but only one load or store instruction can be updating the D-cache or S-cache or main memory at a time. (However, the D-cache can have several read ports; furthermore, data might be passing between the S-cache and memory while other data is passing between the reorder buffer and the D-cache.)

Besides the optional I-cache, D-cache, and S-cache, there are required caches called the IT-cache and DT-cache, for translation of virtual addresses to physical addresses. A translation cache is often called a "table lookaside buffer" or TLB; but we call it a cache since it is implemented in nearly the same way as an I-cache.

164. Consider a cache that has blocks of $2^b$ bytes each and associativity $2^a$; here $b \ge 3$ and $a \ge 0$. The I-cache, D-cache, and S-cache are addressed by 48-bit physical addresses, as if they were part of main memory; but the IT and DT caches are addressed by 64-bit keys, obtained from a virtual address by blanking out the lower $s$ bits and inserting the value of $n$, where the page size $s$ and the process number $n$ are found in rV. We will consider all caches to be addressed by 64-bit keys, so that both cases are handled with the same basic methods.

Given a 64-bit key, we ignore the low-order $b$ bits and use the next $c$ bits to address the cache set; then the remaining $64-b-c$ bits should match one of $2^a$ tags in that set. The case $a=0$ corresponds to a so-called direct-mapped cache; the case $c=0$ corresponds to a so-called fully associative cache. With $2^c$ sets of $2^a$ blocks each, and $2^b$ bytes per block, the cache contains $2^{a+b+c}$ bytes of data, in addition to the space needed for tags. Translation caches have $b=3$ and they also usually have $c=0$.

If a tag matches the specified bits, we "hit" in the cache and can use and/or update the data found there. Otherwise we "miss," and we probably want to replace one of the cache blocks by the block containing the item sought. The item chosen for replacement is called a victim. The choice of victim is forced when the cache is direct-mapped, but four strategies for victim selection are available when we must choose from among $2^a$ entries for $a>0$:

• "Random" selection chooses the victim by extracting the least significant $a$ bits of the clock.

• "Serial" selection chooses $0, 1, \ldots, 2^a-1, 0, 1, \ldots, 2^a-1, 0, \ldots$ on successive trials.

• "LRU (Least Recently Used)" selection chooses the victim that ranks last if items are ranked inversely to the time that has elapsed since their previous use.

• "Pseudo-LRU" selection chooses the victim by a rough approximation to LRU that is simpler to implement in hardware. It requires a bit table $r_1 \ldots r_a$. Whenever we use an item with binary address $(i_1 \ldots i_a)_2$ in the set, we adjust the bit table as follows:

$$r_1 \gets 1-i_1,\quad r_{1i_1} \gets 1-i_2,\quad \ldots,\quad r_{1i_1\ldots i_{a-1}} \gets 1-i_a;$$

here the subscripts on $r$ are binary numbers. (For example, when $a=3$, the use of element $(010)_2$ sets $r_1 \gets 1$, $r_{10} \gets 0$, $r_{101} \gets 1$, where $r_{101}$ means the same as $r_5$.) To select a victim, we start with $l \gets 1$ and then repeatedly set $l \gets 2l + r_l$, $a$ times; then we choose element $l - 2^a$. When $a=1$, this scheme is equivalent to LRU. When $a=2$, this scheme was implemented in the Intel 80486 chip.

⟨Type definitions 11⟩ +≡
    typedef enum {
      random, serial, pseudo_lru, lru
    } replace_policy;
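The pseudo-LRU rule is easy to misread in subscript form, so here is a minimal stand-alone sketch of it in C. The names plru_note_use and plru_victim are hypothetical (they do not occur in MMIX-PIPE), and the table r is assumed to be an int array indexed from 1, with r[0] unused.

```c
#include <assert.h>

/* Hypothetical helper names; r[] plays the role of the bit table from the
   text, indexed from 1 (r[0] is unused). aa = 2^a is the associativity. */

/* Record a use of element elem (0 <= elem < aa) in the bit table. */
void plru_note_use(int *r, int aa, int elem)
{
    int j = 1, m;
    assert(aa >= 1 && elem >= 0 && elem < aa);
    for (m = aa >> 1; m; m >>= 1) {
        int bit = (elem & m) ? 1 : 0; /* next address bit, most significant first */
        r[j] = 1 - bit;               /* point this node away from the used element */
        j = 2 * j + bit;              /* descend to the child selected by the bit */
    }
}

/* Select a victim: start with l = 1, set l = 2l + r[l] a times, return l - aa. */
int plru_victim(const int *r, int aa)
{
    int l = 1, m;
    for (m = aa >> 1; m; m >>= 1)
        l = 2 * l + r[l];
    return l - aa;
}
```

With aa = 8 and an all-zero table, a use of element 2, which is $(010)_2$, sets r[1], r[2], and r[5] exactly as in the example above; the victim chosen afterwards lies in the other half of the set.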

165. A cache might also include a "victim" area, which contains the last $2^v$ victim blocks removed from the main cache area. The victim area can be searched in parallel with the specified cache set, thereby increasing the chance of a hit without making the search go slower. Each of the four replacement policies can be used also in the victim cache.


166. A cache also has a granularity $2^g$, where $b \ge g \ge 3$. This means that we maintain, for each cache block, a set of $2^{b-g}$ "dirty bits," which identify the $2^g$-byte groups that have possibly changed since they were last read from memory. Thus if $g=b$, an entire cache block is either dirty or clean; if $g=3$, the dirtiness of each octabyte is maintained separately.

Two policies are available when new data is written into all or part of a cache block. We can write-through, meaning that we send all new data to memory immediately and never mark anything dirty; or we can write-back, meaning that we update the memory from the cache only when absolutely necessary. Furthermore we can write-allocate, meaning that we keep the new data in the cache, even if the cache block being written has to be fetched first because of a miss; or we can write-around, meaning that we keep the new data only if it was part of an existing cache block.

(In this discussion, "memory" is shorthand for "the next level of the memory hierarchy"; if there is an S-cache, the I-cache and D-cache write new data to the S-cache, not directly to memory. The I-cache, IT-cache, and DT-cache are read-only, so they do not need the facilities discussed in this section. Moreover, the D-cache and S-cache can be assumed to have the same granularity.)
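To make the granularity bookkeeping concrete, the following sketch keeps one char per dirty "bit" and marks the granules touched by a write. It is only an illustration under stated assumptions: the function names are hypothetical, and a write is described here by a byte offset and length within one block, which is not how the simulator itself passes data around.

```c
#include <assert.h>

/* Hypothetical sketch: one char per dirty flag.
   b = lg of block size in bytes, g = lg of granule size, with b >= g >= 3. */

/* Number of dirty flags kept for one cache block: 2^(b-g). */
int dirty_flag_count(int b, int g)
{
    assert(b >= g && g >= 3);
    return 1 << (b - g);
}

/* Mark as dirty every granule touched by a write of nbytes bytes
   starting at byte offset off within the block. */
void mark_dirty(char *dirty, int b, int g, int off, int nbytes)
{
    int first, last, j;
    assert(nbytes > 0 && off >= 0 && off + nbytes <= (1 << b));
    first = off >> g;                /* granule holding the first byte */
    last = (off + nbytes - 1) >> g;  /* granule holding the last byte */
    for (j = first; j <= last; j++)
        dirty[j] = 1;
}

/* A block is dirty if any of its granules is dirty. */
int block_is_dirty(const char *dirty, int b, int g)
{
    int j, n = dirty_flag_count(b, g);
    for (j = 0; j < n; j++)
        if (dirty[j]) return 1;
    return 0;
}
```

With b = 6 and g = 3 there are eight flags per block; an unaligned 8-byte write at offset 12 dirties granules 1 and 2. With g = b the same write would dirty the single flag for the whole block.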

⟨Header definitions 6⟩ +≡
    #define WRITE_BACK 1 /* use this if not write-through */
    #define WRITE_ALLOC 2 /* use this if not write-around */

167. We have seen that many flavors of cache can be simulated. They are represented by cache structures, containing arrays of cacheset structures that contain arrays of cacheblock structures for the individual blocks. We use a full byte to store each dirty bit, and we use full integer words to store rank fields for LRU processing, etc.; memory economy is less important than simplicity in this simulator.

⟨Type definitions 11⟩ +≡
    typedef struct {
      octa tag; /* bits of key not included in the cache block address */
      char *dirty; /* array of 2^(b-g) dirty bits, one per granule */
      octa *data; /* array of 2^(b-3) octabytes, the data in a cache block */
      int rank; /* auxiliary information for non-random policies */
    } cacheblock;
    typedef cacheblock *cacheset; /* array of 2^a or 2^v blocks */
    typedef struct {
      int a, b, c, g, v;
        /* lg of associativity, blocksize, setsize, granularity, and victimsize */
      int aa, bb, cc, gg, vv;
        /* associativity, blocksize, setsize, granularity, and victimsize (all powers of 2) */
      int tagmask; /* -2^(b+c) */
      replace_policy repl, vrepl; /* how to choose victims and victim-victims */
      int mode; /* optional WRITE_BACK and/or WRITE_ALLOC */
      int access_time; /* cycles to know if there's a hit */
      int copy_in_time; /* cycles to copy a new block into the cache */
      int copy_out_time; /* cycles to copy an old block from the cache */
      cacheset *set; /* array of 2^c sets of arrays of cache blocks */
      cacheset victim; /* the victim cache, if present */
      coroutine filler; /* a coroutine for copying new blocks into the cache */
      control filler_ctl; /* its control block */
      coroutine flusher; /* a coroutine for writing dirty old data from the cache */
      control flusher_ctl; /* its control block */
      cacheblock inbuf; /* filling comes from here */
      cacheblock outbuf; /* flushing goes to here */
      lockvar lock; /* nonzero when the cache is being changed significantly */
      lockvar fill_lock; /* nonzero when filler should pass data back */
      int ports; /* how many coroutines can be reading the cache? */
      coroutine *reader;
        /* array of coroutines that might be reading simultaneously */
      char *name; /* "Icache", for example */
    } cache;

168. ⟨External variables 4⟩ +≡
    Extern cache *Icache, *Dcache, *Scache, *ITcache, *DTcache;

169. Now we are ready to define some basic subroutines for cache maintenance. Let's begin with a trivial routine that tests if a given cache block is dirty.

⟨Internal prototypes 13⟩ +≡
    static bool is_dirty ARGS((cache *, cacheblock *));

170. ⟨Subroutines 14⟩ +≡
    static bool is_dirty(c, p)
      cache *c; /* the cache containing it */
      cacheblock *p; /* a cache block */
    {
      register int j;
      register char *d = p->dirty;
      for (j = 0; j < c->bb; d++, j += c->gg)
        if (*d) return true;
      return false;
    }


171. For diagnostic purposes we might want to display an entire cache block.

⟨Internal prototypes 13⟩ +≡
    static void print_cache_block ARGS((cacheblock, cache *));

172. ⟨Subroutines 14⟩ +≡
    static void print_cache_block(p, c)
      cacheblock p;
      cache *c;
    { register int i, j, b = c->bb >> 3, g = c->gg >> 3;
      printf("%08x%08x: ", p.tag.h, p.tag.l);
      for (i = j = 0; j < b; j++, i += ((j & (g - 1)) ? 0 : 1))
        printf("%08x%08x%c", p.data[j].h, p.data[j].l, p.dirty[i] ? '*' : ' ');
      printf(" (%d)\n", p.rank);
    }

173. ⟨Internal prototypes 13⟩ +≡
    static void print_cache_locks ARGS((cache *));

174. ⟨Subroutines 14⟩ +≡
    static void print_cache_locks(c)
      cache *c;
    {
      if (c) {
        if (c->lock) printf("%s locked by %s:%d\n", c->name, c->lock->name, c->lock->stage);
        if (c->fill_lock)
          printf("%sfill locked by %s:%d\n", c->name, c->fill_lock->name, c->fill_lock->stage);
      }
    }

175. The print_cache routine prints the entire contents of a cache. This can be a huge amount of data, but it can be very useful when debugging. Fortunately, the task of debugging favors the use of small caches, since interesting cases arise more often when a cache is fairly small.

⟨External prototypes 9⟩ +≡
    Extern void print_cache ARGS((cache *, bool));

176. ⟨External routines 10⟩ +≡
    void print_cache(c, dirty_only)
      cache *c;
      bool dirty_only;
    {
      if (c) { register int i, j;
        printf("%s of %s:", dirty_only ? "Dirty blocks" : "Contents", c->name);
        if (c->filler.next) {
          printf(" (filling ");
          print_octa(c->name[1] == 'T' ? c->filler_ctl.y.o : c->filler_ctl.z.o);
          printf(")");
        }
        if (c->flusher.next) {
          printf(" (flushing ");
          print_octa(c->outbuf.tag);
          printf(")");
        }
        printf("\n");
        ⟨Print all of c's cache blocks 177⟩;
      }
    }

177. We don't print the cache blocks that have an invalid tag, unless requested to be verbose.

⟨Print all of c's cache blocks 177⟩ ≡
    for (i = 0; i < c->cc; i++)
      for (j = 0; j < c->aa; j++)
        if ((!(c->set[i][j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
            (!dirty_only || is_dirty(c, &c->set[i][j]))) {
          printf("[%d][%d] ", i, j);
          print_cache_block(c->set[i][j], c);
        }
    for (j = 0; j < c->vv; j++)
      if ((!(c->victim[j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
          (!dirty_only || is_dirty(c, &c->victim[j]))) {
        printf("V[%d] ", j);
        print_cache_block(c->victim[j], c);
      }

This code is used in section 176.


178. The clean_block routine simply initializes a given cache block.

⟨External prototypes 9⟩ +≡
    Extern void clean_block ARGS((cache *, cacheblock *));

179. ⟨External routines 10⟩ +≡
    void clean_block(c, p)
      cache *c;
      cacheblock *p;
    {
      register int j;
      p->tag.h = sign_bit; p->tag.l = 0;
      for (j = 0; j < c->bb >> 3; j++) p->data[j] = zero_octa;
      for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
    }

180. The zap_cache routine invalidates all tags of a given cache, effectively restoring it to its initial condition.

⟨External prototypes 9⟩ +≡
    Extern void zap_cache ARGS((cache *));

181. We clear the dirty entries here, just to be tidy, although they could actually be left in arbitrary condition when the tags are invalid.

⟨External routines 10⟩ +≡
    void zap_cache(c)
      cache *c;
    {
      register int i, j;
      for (i = 0; i < c->cc; i++)
        for (j = 0; j < c->aa; j++) {
          clean_block(c, &(c->set[i][j]));
        }
      for (j = 0; j < c->vv; j++) {
        clean_block(c, &(c->victim[j]));
      }
    }

182. The get_reader subroutine finds the index of an available reader coroutine for a given cache, or returns a negative value if no readers are available.

⟨Internal prototypes 13⟩ +≡
    static int get_reader ARGS((cache *));

183. ⟨Subroutines 14⟩ +≡
    static int get_reader(c)
      cache *c;
    { register int j;
      for (j = 0; j < c->ports; j++)
        if (c->reader[j].next == NULL) return j;
      return -1;
    }


184. The subroutine copy_block(c, p, cc, pp) copies the dirty items from block p of cache c into block pp of cache cc, assuming that the destination cache has a sufficiently large block size. (In other words, we assume that cc->b ≥ c->b.) We also assume that both blocks have compatible tags, and that both caches have the same granularity.

⟨Internal prototypes 13⟩ +≡
    static void copy_block ARGS((cache *, cacheblock *, cache *, cacheblock *));

185. ⟨Subroutines 14⟩ +≡
    static void copy_block(c, p, cc, pp)
      cache *c, *cc;
      cacheblock *p, *pp;
    {
      register int j, jj, i, ii, lim;
      register int off = p->tag.l & (cc->bb - 1);
      if (c->g != cc->g || p->tag.h != pp->tag.h || p->tag.l - off != pp->tag.l)
        panic(confusion("copy block"));
      for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++)
        if (p->dirty[j]) {
          pp->dirty[jj] = true;
          for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3); i < lim;
               i++, ii++) pp->data[ii] = p->data[i];
        }
    }


186. The choose_victim subroutine selects the victim to be replaced when we need to change a cache set. We need only one bit of the rank fields to implement the r table when policy = pseudo_lru, and we don't need rank at all when policy = random. Of course we use an a-bit counter to implement policy = serial. In the other case, policy = lru, we need an a-bit rank field; the least recently used entry has rank 0, and the most recently used entry has rank $2^a - 1$ = aa − 1.

⟨Internal prototypes 13⟩ +≡
    static cacheblock *choose_victim ARGS((cacheset, int, replace_policy));

187. ⟨Subroutines 14⟩ +≡
    static cacheblock *choose_victim(s, aa, policy)
      cacheset s;
      int aa; /* setsize */
      replace_policy policy;
    {
      register cacheblock *p;
      register int l, m;
      switch (policy) {
      case random: return &s[ticks.l & (aa - 1)];
      case serial: l = s[0].rank; s[0].rank = (l + 1) & (aa - 1); return &s[l];
      case lru:
        for (p = s; p < s + aa; p++)
          if (p->rank == 0) return p;
        panic(confusion("lru victim")); /* what happened? nobody has rank zero */
      case pseudo_lru:
        for (l = 1, m = aa >> 1; m; m >>= 1) l = l + l + s[l].rank;
        return &s[l - aa];
      }
    }

188. The note_usage subroutine updates the rank entries to record the fact that a particular block in a cache set is now being used.

⟨Internal prototypes 13⟩ +≡
    static void note_usage ARGS((cacheblock *, cacheset, int, replace_policy));

189. ⟨Subroutines 14⟩ +≡
    static void note_usage(l, s, aa, policy)
      cacheblock *l; /* a cache block that's probably worth preserving */
      cacheset s; /* the set that contains l */
      int aa; /* setsize */
      replace_policy policy;
    {
      register cacheblock *p;
      register int j, m, r;
      if (aa == 1 || policy <= serial) return;
      if (policy == lru) {
        r = l->rank;
        for (p = s; p < s + aa; p++)
          if (p->rank > r) p->rank--;
        l->rank = aa - 1;
      }
      else { /* policy == pseudo_lru */
        r = l - s;
        for (j = 1, m = aa >> 1; m; m >>= 1)
          if (r & m) s[j].rank = 0, j = j + j + 1;
          else s[j].rank = 1, j = j + j;
      }
      return;
    }

190. The demote_usage subroutine is sort of the opposite of note_usage; it changes the rank of a given block to least recently used.

⟨Internal prototypes 13⟩ +≡
    static void demote_usage ARGS((cacheblock *, cacheset, int, replace_policy));

191. ⟨Subroutines 14⟩ +≡
    static void demote_usage(l, s, aa, policy)
      cacheblock *l; /* a cache block we probably don't need */
      cacheset s; /* the set that contains l */
      int aa; /* setsize */
      replace_policy policy;
    {
      register cacheblock *p;
      register int j, m, r;
      if (aa == 1 || policy <= serial) return;
      if (policy == lru) {
        r = l->rank;
        for (p = s; p < s + aa; p++)
          if (p->rank < r) p->rank++;
        l->rank = 0;
      }
      else { /* policy == pseudo_lru */
        r = l - s;
        for (j = 1, m = aa >> 1; m; m >>= 1)
          if (r & m) s[j].rank = 1, j = j + j + 1;
          else s[j].rank = 0, j = j + j;
      }
      return;
    }
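The LRU branches of note_usage and demote_usage keep the ranks of a set equal to a permutation of 0, 1, ..., aa − 1. The sketch below isolates that invariant in a stand-alone model; the function names are hypothetical, the ranks live in a plain int array rather than in cacheblock structures, and the array is assumed to start out as a permutation.

```c
#include <assert.h>

/* Hypothetical stand-alone model: rank[k] is the LRU rank of entry k in a
   set of aa entries; rank 0 = least recently used, aa-1 = most recent. */

/* Promote entry used to most recently used (the lru case of note_usage). */
void lru_note(int *rank, int aa, int used)
{
    int k, r = rank[used];
    for (k = 0; k < aa; k++)
        if (rank[k] > r) rank[k]--; /* close the gap left by the promoted entry */
    rank[used] = aa - 1;
}

/* Demote entry unused to least recently used (the lru case of demote_usage). */
void lru_demote(int *rank, int aa, int unused)
{
    int k, r = rank[unused];
    for (k = 0; k < aa; k++)
        if (rank[k] < r) rank[k]++; /* everything older moves up one place */
    rank[unused] = 0;
}

/* The victim is the unique entry of rank zero; -1 signals a corrupted table. */
int lru_victim(const int *rank, int aa)
{
    int k;
    for (k = 0; k < aa; k++)
        if (rank[k] == 0) return k;
    return -1;
}
```

Because each update shifts only the ranks on one side of the affected entry, exactly one entry has rank zero at all times, which is why choose_victim treats a missing rank-zero entry as a confusion.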


192. The cache_search routine looks for a given key α in a given cache, and returns a cache block if there's a hit; otherwise it returns NULL. If the search hits, the set in which the block was found is stored in global variable hit_set. Notice that we need to check more bits of the tag when we search in the victim area.

    #define cache_addr(c, alf) c->set[(alf.l & ~(c->tagmask)) >> c->b]

⟨Internal prototypes 13⟩ +≡
    static cacheblock *cache_search ARGS((cache *, octa));

193. ⟨Subroutines 14⟩ +≡
    static cacheblock *cache_search(c, alf)
      cache *c; /* the cache to be searched */
      octa alf; /* the key */
    {
      register cacheset s;
      register cacheblock *p;
      s = cache_addr(c, alf); /* the set corresponding to alf */
      for (p = s; p < s + c->aa; p++)
        if (((p->tag.l ^ alf.l) & c->tagmask) == 0 && p->tag.h == alf.h) goto hit;
      s = c->victim;
      if (!s) return NULL; /* cache miss, and no victim area */
      for (p = s; p < s + c->vv; p++)
        if (((p->tag.l ^ alf.l) & (-c->bb)) == 0 && p->tag.h == alf.h) goto hit;
      return NULL; /* double miss */
    hit: hit_set = s; return p;
    }

194. ⟨Global variables 20⟩ +≡
    cacheset hit_set;
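The bit manipulation behind cache_addr and the two tag comparisons can be tried in isolation. The following sketch works only on the low 32 bits of a key, as cache_addr does; the helper names are hypothetical, and the masks are built from b and c directly instead of being read from a cache structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers working on the low 32 bits of a key;
   b = lg of block size, c = lg of number of sets. */

/* tagmask = -2^(b+c): ones in every bit position above the set index. */
uint32_t make_tagmask(int b, int c)
{
    assert(b >= 3 && c >= 0 && b + c < 32);
    return ~(((uint32_t)1 << (b + c)) - 1);
}

/* Index of the set selected by a key: drop the tag bits, then the b offset bits. */
uint32_t set_index(uint32_t key_lo, int b, int c)
{
    return (key_lo & ~make_tagmask(b, c)) >> b;
}

/* In the main area, a block matches if it agrees with the key in all tag bits. */
int tags_match(uint32_t tag_lo, uint32_t key_lo, int b, int c)
{
    return ((tag_lo ^ key_lo) & make_tagmask(b, c)) == 0;
}

/* In the victim area the set-index bits must match too: the mask is -2^b. */
int victim_tags_match(uint32_t tag_lo, uint32_t key_lo, int b)
{
    uint32_t mask = ~(((uint32_t)1 << b) - 1);
    return ((tag_lo ^ key_lo) & mask) == 0;
}
```

With 32-byte blocks (b = 5) and 16 sets (c = 4), the keys 0x1234 and 0x1200 fall into different sets but have equal tag bits, so they match under the main-area mask and fail under the stricter victim-area mask. That is the sense in which more bits must be checked in the victim area.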

195. If p = cache_search(c, alf) hits and if we call use_and_fix(c, p) immediately afterwards, cache c is updated to record the usage of key alf. A hit in the victim area moves the cache block to the main area, unless the filler routine of cache c is active. A pointer to the (possibly moved) cache block is returned.

⟨Internal prototypes 13⟩ +≡
    static cacheblock *use_and_fix ARGS((cache *, cacheblock *));

196. ⟨Subroutines 14⟩ +≡
    static cacheblock *use_and_fix(c, p)
      cache *c;
      cacheblock *p;
    {
      if (hit_set != c->victim) note_usage(p, hit_set, c->aa, c->repl);
      else {
        note_usage(p, hit_set, c->vv, c->vrepl); /* found in victim cache */
        if (!c->filler.next) {
          register cacheset s = cache_addr(c, p->tag);
          register cacheblock *q = choose_victim(s, c->aa, c->repl);
          note_usage(q, s, c->aa, c->repl);
          ⟨Swap cache blocks p and q 197⟩;
          return q;
        }
      }
      return p;
    }

197. We can simply permute the pointers inside the cacheblock structures of a cache, instead of copying the data, if we are careful not to let any of those pointers escape into other data structures.

⟨Swap cache blocks p and q 197⟩ ≡
    {
      octa t;
      register char *d = p->dirty;
      register octa *dd = p->data;
      t = p->tag; p->tag = q->tag; q->tag = t;
      p->dirty = q->dirty; q->dirty = d;
      p->data = q->data; q->data = dd;
    }

This code is used in sections 196 and 205.

198. The demote_and_fix routine is analogous to use_and_fix, except that we don't want to promote the data we found.

⟨Internal prototypes 13⟩ +≡
    static cacheblock *demote_and_fix ARGS((cache *, cacheblock *));

199. ⟨Subroutines 14⟩ +≡
    static cacheblock *demote_and_fix(c, p)
      cache *c;
      cacheblock *p;
    {
      if (hit_set != c->victim) demote_usage(p, hit_set, c->aa, c->repl);
      else demote_usage(p, hit_set, c->vv, c->vrepl);
      return p;
    }


200. The subroutine load_cache(c, p) is called at a moment when c->lock has been set and c->inbuf has been filled with clean data to be placed in the cache block p.

⟨Internal prototypes 13⟩ +≡
    static void load_cache ARGS((cache *, cacheblock *));

201. ⟨Subroutines 14⟩ +≡
    static void load_cache(c, p)
      cache *c; cacheblock *p;
    {
      register int i;
      register octa *d;
      for (i = 0; i < c->bb >> c->g; i++) p->dirty[i] = false;
      d = p->data; p->data = c->inbuf.data; c->inbuf.data = d;
      p->tag = c->inbuf.tag;
      hit_set = cache_addr(c, p->tag); use_and_fix(c, p); /* p not moved */
    }

202. The subroutine flush_cache(c, p, keep) is called at a "quiet" moment when c->flusher.next = NULL. It puts cache block p into c->outbuf and fires up the c->flusher coroutine, which will take care of sending the data to lower levels of the memory hierarchy. Cache block p is also marked clean.

⟨Internal prototypes 13⟩ +≡
    static void flush_cache ARGS((cache *, cacheblock *, bool));

203. ⟨Subroutines 14⟩ +≡
    static void flush_cache(c, p, keep)
      cache *c;
      cacheblock *p; /* a block inside cache c */
      bool keep; /* should we preserve the data in p? */
    {
      register octa *d;
      register char *dd;
      register int j;
      c->outbuf.tag = p->tag;
      if (keep) for (j = 0; j < c->bb >> 3; j++) c->outbuf.data[j] = p->data[j];
      else d = c->outbuf.data, c->outbuf.data = p->data, p->data = d;
      dd = c->outbuf.dirty, c->outbuf.dirty = p->dirty, p->dirty = dd;
      for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
      startup(&c->flusher, c->copy_out_time); /* will not be aborted */
    }

204. The alloc_slot routine is called when we wish to put new information into a cache after a cache miss. It returns a pointer to a cache block in the main area where the new information should be put. The tag of that cache block is invalidated; the calling routine should take care of filling it and giving it a valid tag in due time. The cache's filler routine should not be active when alloc_slot is called.

Inserting new information might also require writing old information into the next level of the memory hierarchy, if the block being replaced is dirty. This routine returns NULL in such cases if the cache is flushing a previously discarded block. Otherwise it schedules the flusher coroutine.

This routine returns NULL also if the given key happens to be in the cache. Such cases are rare, but the following scenario shows that they aren't impossible: Suppose the DT-cache access time is 5, the D-cache access time is 1, and two processes simultaneously look for the same physical address. One process hits in DT-cache but misses in D-cache, waiting 5 cycles before trying alloc_slot in the D-cache; meanwhile the other process missed in D-cache but didn't need to use the DT-cache, so it might have updated the D-cache.

A key value is never negative. Therefore we can invalidate the tag in the chosen slot by forcing it to be negative.

⟨Internal prototypes 13⟩ +≡
    static cacheblock *alloc_slot ARGS((cache *, octa));

205. ⟨Subroutines 14⟩ +≡
    static cacheblock *alloc_slot(c, alf)
      cache *c;
      octa alf; /* key that probably isn't in the cache */
    {
      register cacheset s;
      register cacheblock *p, *q;
      register int j;
      if (cache_search(c, alf)) return NULL;
      s = cache_addr(c, alf); /* the set corresponding to alf */
      if (c->victim) p = choose_victim(c->victim, c->vv, c->vrepl);
      else p = choose_victim(s, c->aa, c->repl);
      if (is_dirty(c, p)) {
        if (c->flusher.next) return NULL;
        flush_cache(c, p, false);
      }
      if (c->victim) {
        q = choose_victim(s, c->aa, c->repl);
        ⟨Swap cache blocks p and q 197⟩;
        q->tag.h |= sign_bit; /* invalidate the tag */
        return q;
      }
      p->tag.h |= sign_bit; return p;
    }
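The invalidation trick used by alloc_slot and clean_block can be seen in a few lines. This is only a sketch: octa_t and SIGN_BIT are hypothetical stand-ins for the simulator's octa type and sign_bit macro, and SIGN_BIT is assumed to be the top bit of the high tetrabyte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the simulator's types: an octabyte is held as
   two 32-bit halves, and SIGN_BIT is assumed to be the top bit of the high half. */
typedef struct { uint32_t h, l; } octa_t;
#define SIGN_BIT 0x80000000u

/* A tag is valid exactly when it is nonnegative as a 64-bit quantity. */
int tag_is_valid(octa_t tag)
{
    return (tag.h & SIGN_BIT) == 0;
}

/* Invalidate a tag in place by forcing it to be negative. */
void invalidate_tag(octa_t *tag)
{
    tag->h |= SIGN_BIT;
}

/* An invalid tag can never equal a real key, since keys are nonnegative. */
int tag_equals_key(octa_t tag, octa_t key)
{
    assert(tag_is_valid(key));
    return tag.h == key.h && tag.l == key.l;
}
```

No separate "valid" flag is needed: the comparison of high halves that a search makes anyway is enough to reject an invalidated block, and the remaining bits of the old tag stay intact.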


MMIX-PIPE: SIMULATED MEMORY 240

206. Simulated memory. How should we deal with the potentially gigantic memory of MMIX? We can't simply declare an array m that has $2^{48}$ bytes. (Indeed, up to $2^{63}$ bytes are needed, if we consider also the physical addresses $\ge 2^{48}$ that are reserved for memory-mapped input/output.)

We could regard memory as a special kind of cache, in which every access is required to hit. For example, such an "M-cache" could be fully associative, with $2^a$ blocks each having a different tag; simulation could proceed until more than $2^a - 1$ tags are required. But then the predefined value of $a$ might well be so large that the sequential search of our cache_search routine would be too slow.

Instead, we will allocate memory in chunks of $2^{16}$ bytes at a time, as needed, and we will use hashing to search for the relevant chunk whenever a physical address is given. If the address is $2^{48}$ or greater, special routines called spec_read and spec_write, supplied by the user, will be called upon to do the reading or writing. Otherwise the 48-bit address consists of a 32-bit chunk address and a 16-bit chunk offset.

Chunk addresses that are not used take no space in this simulator. But if, say, 1000 such patterns occur, the simulator will dynamically allocate approximately 65MB for the portions of main memory that are used. Parameter mem_chunks_max specifies the largest number of different chunk addresses that are supported. This parameter does not constrain the range of simulated physical addresses, which cover the entire 256 large-terabyte range permitted by MMIX.

⟨Type definitions 11⟩ +≡
    typedef struct {
      tetra tag; /* 32-bit chunk address */
      octa *chunk; /* either NULL or an array of 2^13 octabytes */
    } chunknode;

207. The parameter hash_prime should be a prime number larger than the parameter mem_chunks_max, preferably more than twice as large but not much bigger than that. The default values mem_chunks_max = 1000 and hash_prime = 2009 are set by MMIX_config unless the user specifies otherwise.

⟨External variables 4⟩ +≡
    Extern int mem_chunks; /* this many chunks are allocated so far */
    Extern int mem_chunks_max; /* up to this many different chunks per run */
    Extern int hash_prime; /* larger than mem_chunks_max, but not enormous */
    Extern chunknode *mem_hash; /* the simulated main memory */

208. The separately compiled procedures spec_read() and spec_write() have the same calling conventions as the general procedures mem_read() and mem_write().

⟨Subroutines 14⟩ +≡
    extern octa spec_read ARGS((octa addr)); /* for memory mapped I/O */
    extern void spec_write ARGS((octa addr, octa val)); /* likewise */


209. If the program tries to read from a chunk that hasn't been allocated, the value zero is returned, optionally with a comment to the user.

Chunk address 0 is always allocated first. Then we can assume that a matching chunk tag implies a nonnull chunk pointer.

This routine sets last_h to the chunk found, so that we can rapidly read other words that we know must belong to the same chunk. For this purpose it is convenient to let mem_hash[hash_prime] be a chunk full of zeros, representing uninitialized memory.

⟨External prototypes 9⟩ +≡
    Extern octa mem_read ARGS((octa addr));

210. ⟨External routines 10⟩ +≡
    octa mem_read(addr)
      octa addr;
    {
      register tetra off, key;
      register int h;
      if (addr.h >= (1 << 16)) return spec_read(addr);
      off = (addr.l & 0xffff) >> 3;
      key = (addr.l & 0xffff0000) + addr.h;
      for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
        if (mem_hash[h].chunk == NULL) {
          if (verbose & uninit_mem_bit)
            errprint2("uninitialized memory read at %08x%08x", addr.h, addr.l);
          h = hash_prime; break; /* zero will be returned */
        }
        if (h == 0) h = hash_prime;
      }
      last_h = h;
      return mem_hash[h].chunk[off];
    }

211. ⟨External variables 4⟩ +≡
    Extern int last_h; /* the hash index that was most recently correct */
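The chunk-hashing idea can be tried outside the simulator with a miniature table. The sketch below is not the simulator's code: the table size SKETCH_PRIME, the uint64_t representation of an octabyte, and all function names are assumptions made for illustration. Unlike mem_read, it tests the chunk pointer before the tag, so it needs neither the convention that chunk 0 is allocated first nor the extra all-zero slot at index hash_prime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature of the chunked memory: a tiny table size and
   simplified error handling, chosen only for illustration. */
#define SKETCH_PRIME 11       /* table size; a prime, as the text recommends */
#define CHUNK_OCTAS (1 << 13) /* 2^13 octabytes = 2^16 bytes per chunk */

typedef struct { uint32_t tag; uint64_t *chunk; } node_t;
static node_t table[SKETCH_PRIME]; /* zero-initialized: every slot starts empty */

/* Split a 48-bit address given as hi (bits 47..32) and lo (bits 31..0). */
uint32_t chunk_key(uint32_t hi, uint32_t lo) { return (lo & 0xffff0000u) + hi; }
uint32_t chunk_off(uint32_t lo) { return (lo & 0xffffu) >> 3; }

/* Probe downward from key % prime, wrapping from slot 0 to the top slot.
   Returns the slot holding key, or the first empty slot encountered.
   The caller must keep the table from filling up completely. */
int probe(uint32_t key)
{
    int h = (int)(key % SKETCH_PRIME);
    while (table[h].chunk != NULL && table[h].tag != key)
        h = (h == 0) ? SKETCH_PRIME - 1 : h - 1;
    return h;
}

/* Reading an unallocated chunk yields zero. */
uint64_t sketch_read(uint32_t hi, uint32_t lo)
{
    int h = probe(chunk_key(hi, lo));
    return table[h].chunk ? table[h].chunk[chunk_off(lo)] : 0;
}

/* Writing allocates the chunk on first touch; returns 0 on success. */
int sketch_write(uint32_t hi, uint32_t lo, uint64_t val)
{
    int h = probe(chunk_key(hi, lo));
    if (table[h].chunk == NULL) {
        table[h].chunk = (uint64_t *)calloc(CHUNK_OCTAS, sizeof(uint64_t));
        if (table[h].chunk == NULL) return -1;
        table[h].tag = chunk_key(hi, lo);
    }
    table[h].chunk[chunk_off(lo)] = val;
    return 0;
}
```

Keeping the number of allocated chunks well below the table size, as the mem_chunks_max and hash_prime parameters do, guarantees that the probe sequence always reaches either the wanted chunk or an empty slot.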


212. ⟨External prototypes 9⟩ +≡
    Extern void mem_write ARGS((octa addr, octa val));

213. ⟨External routines 10⟩ +≡
    void mem_write(addr, val)
      octa addr, val;
    {
      register tetra off, key;
      register int h;
      if (addr.h >= (1 << 16)) { spec_write(addr, val); return; }
      off = (addr.l & 0xffff) >> 3;
      key = (addr.l & 0xffff0000) + addr.h;
      for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
        if (mem_hash[h].chunk == NULL) {
          if (++mem_chunks > mem_chunks_max)
            panic(errprint1("More than %d memory chunks are needed",
                mem_chunks_max));
          mem_hash[h].chunk = (octa *) calloc(1 << 13, sizeof(octa));
          if (mem_hash[h].chunk == NULL)
            panic(errprint1("I can't allocate memory chunk number %d", mem_chunks));
          mem_hash[h].tag = key;
          break;
        }
        if (h == 0) h = hash_prime;
      }
      last_h = h;
      mem_hash[h].chunk[off] = val;
    }

214. The memory is characterized by several parameters, depending on the characteristics of the memory bus being simulated. Let bus_words be the number of octabytes read or written simultaneously (usually bus_words is 1 or 2; it must be a power of 2). The number of clock cycles needed to read or write c · bus_words octabytes that all belong to the same cache block is assumed to be mem_addr_time + c · mem_read_time or mem_addr_time + c · mem_write_time, respectively.

⟨External variables 4⟩ +≡
    Extern int mem_addr_time; /* cycles to transmit an address on memory bus */
    Extern int bus_words; /* width of memory bus, in octabytes */
    Extern int mem_read_time; /* cycles to read from main memory */
    Extern int mem_write_time; /* cycles to write to main memory */
    Extern lockvar mem_lock; /* is nonnull when the bus is busy */
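As a quick check of the timing formula, here is a small hypothetical helper that computes the cycle count for a transfer within one cache block. The name bus_cycles and its parameter list are assumptions made for illustration; rounding up when the octabyte count is not a multiple of bus_words is likewise an assumption, since the text states the formula only for exact multiples.

```c
#include <assert.h>

/* Hypothetical helper: cycles to move octas octabytes of one cache block
   over a bus that carries bus_words octabytes per transfer (a power of 2). */
int bus_cycles(int octas, int bus_words, int addr_time, int xfer_time)
{
    int c;
    assert(octas > 0 && bus_words > 0 && (bus_words & (bus_words - 1)) == 0);
    c = (octas + bus_words - 1) / bus_words; /* number of bus transfers, rounded up */
    return addr_time + c * xfer_time;
}
```

For example, with an address time of 4 cycles and a transfer time of 10 cycles, eight octabytes need 44 cycles on a two-octabyte bus and 84 cycles on a one-octabyte bus.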

215. One of the principal ways to write memory is to invoke a flush_to_mem coroutine, which is the Scache->flusher if there is an S-cache, or the Dcache->flusher if there is a D-cache but no S-cache.

When such a coroutine is started, its data->ptr_a will be Scache or Dcache. The data to be written will just have been copied to the cache's outbuf.


⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_mem: {
  register cache *c = (cache *) data->ptr_a;
  switch (data->state) {
  case 0: if (mem_lock) wait(1);
    data->state = 1;
  case 1: set_lock(self, mem_lock);
    data->state = 2;
    ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩;
  case 2: goto terminate; /* this frees mem_lock and c->outbuf */
  }
}

216. ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩ ≡
{
  register int off, last_off, count, first, ii;
  register int del = c->gg >> 3; /* octabytes per granule */
  octa addr;

  addr = c->outbuf.tag; off = (addr.l & 0xffff) >> 3;
  for (i = j = 0, first = 1, count = 0; j < c->bb >> c->g; j++) {
    ii = i + del;
    if (!c->outbuf.dirty[j]) i = ii, off += del, addr.l += del << 3;
    else while (i < ii) {
      if (first) {
        count++; last_off = off; first = 0;
        mem_write(addr, c->outbuf.data[i]);
      } else {
        if ((off ^ last_off) & (-bus_words)) count++;
        last_off = off;
        mem_hash[last_h].chunk[off] = c->outbuf.data[i];
      }
      i++; off++; addr.l += 8;
    }
  }
  wait(mem_addr_time + count * mem_write_time);
}

This code is used in section 215.
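The counting rule in §216 charges one bus transfer each time the octabyte offset leaves the aligned group of bus_words octabytes that held the previous dirty octabyte. The following is a minimal sketch of that rule in isolation; the function count_transfers and its array argument are hypothetical, introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not part of MMIX-PIPE: count the bus transfers
   needed to write the octabytes at offsets off[0..n-1] (ascending),
   where bus_words is a power of 2. */
static int count_transfers(const int *off, int n, int bus_words)
{
    assert(bus_words > 0 && (bus_words & (bus_words - 1)) == 0);
    int count = 0, last_off = 0;
    for (int k = 0; k < n; k++) {
        /* -bus_words masks off the position inside an aligned group, so the
           test is nonzero exactly when the two offsets lie in different groups */
        if (k == 0 || ((off[k] ^ last_off) & -bus_words)) count++;
        last_off = off[k];
    }
    return count;
}
```

With bus_words = 2, offsets 0, 1, 2, 3 need two transfers, while offsets 1, 2 also need two because they straddle a group boundary.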

ARGS = macro, §6. bb: int, §167. cache = struct, §167. calloc: void *( ), <stdlib.h>. chunk: octa *, §206. data: register control *, §124. data: octa *, §167. Dcache: cache *, §168. dirty: char *, §167. errprint1 = macro ( ), §13. Extern = macro, §4. flush_to_mem = 97, §129. flusher: coroutine, §167. g: int, §167. gg: int, §167. h: tetra, §17. hash_prime: int, §207. i: register int, §12. j: register int, §12. l: tetra, §17. last_h: int, §211. lockvar = coroutine *, §37. mem_chunks: int, §207. mem_chunks_max: int, §207. mem_hash: chunknode *, §207. octa = struct, §17. outbuf: cacheblock, §167. panic = macro ( ), §13. ptr_a: void *, §44. Scache: cache *, §168. self: register coroutine *, §124. set_lock = macro ( ), §37. spec_write: extern void ( ), §208. state: int, §44. tag: tetra, §206. tag: octa, §167. terminate: label, §125. tetra = unsigned int, §17. wait = macro ( ), §125.


217. Cache transfers. We have seen that the Dcache->flusher sends data directly to the main memory if there is no S-cache. But if both D-cache and S-cache exist, the Dcache->flusher is a more complicated coroutine of type flush_to_S. In this case we need to deal with the fact that the S-cache blocks might be larger than the D-cache blocks; furthermore, the S-cache might have a write-around and/or write-through policy, etc. But one simplifying fact does help us: We know that the flusher coroutine will not be aborted until it has run to completion.

Some machines, such as the Alpha 21164, have an additional cache between the S-cache and memory, called the B-cache (the "backup cache"). A B-cache could be simulated by extending the logic used here; but such extensions of the present program are left to the interested reader.

⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_S: {
  register cache *c = (cache *) data->ptr_a;
  register int block_diff = Scache->bb - c->bb;

  p = (cacheblock *) data->ptr_b;
  switch (data->state) {
  case 0: if (Scache->lock) wait(1);
    data->state = 1;
  case 1: set_lock(self, Scache->lock);
    data->ptr_b = (void *) cache_search(Scache, c->outbuf.tag);
    if (data->ptr_b) data->state = 4;
    else if (Scache->mode & WRITE_ALLOC) data->state = (block_diff ? 2 : 3);
    else data->state = 6;
    wait(Scache->access_time);
  case 2: ⟨Fill Scache->inbuf with clean memory data 219⟩;
  case 3: ⟨Allocate a slot p in the S-cache 218⟩;
    if (block_diff) ⟨Copy Scache->inbuf to slot p 220⟩;
  case 4: copy_block(c, &(c->outbuf), Scache, p);
    hit_set = cache_addr(Scache, c->outbuf.tag); use_and_fix(Scache, p);
      /* p not moved */
    data->state = 5; wait(Scache->copy_in_time);
  case 5: if ((Scache->mode & WRITE_BACK) == 0) { /* write-through */
      if (Scache->flusher.next) wait(1);
      flush_cache(Scache, p, true);
    }
    goto terminate;
  case 6: ⟨Handle write-around when flushing to the S-cache 221⟩;
  }
}

218. ⟨Allocate a slot p in the S-cache 218⟩ ≡
if (Scache->filler.next) wait(1); /* perhaps an unnecessary precaution? */
p = alloc_slot(Scache, c->outbuf.tag);
if (!p) wait(1);
data->ptr_b = (void *) p;
p->tag = c->outbuf.tag; p->tag.l = c->outbuf.tag.l & (-Scache->bb);

This code is used in section 217.


219. We only need to read block_diff bytes, but it's easier to read them all and to charge only for reading the ones we needed.

⟨Fill Scache->inbuf with clean memory data 219⟩ ≡
{
  register int off = (c->outbuf.tag.l & 0xffff) >> 3;
  register int count = block_diff >> 3;
  register int delay;

  if (mem_lock) wait(1);
  for (j = 0; j < Scache->bb >> 3; j++)
    if (j == 0) Scache->inbuf.data[j] = mem_read(c->outbuf.tag);
    else Scache->inbuf.data[j] = mem_hash[last_h].chunk[j + off];
  set_lock(&mem_locker, mem_lock);
  delay = mem_addr_time + (int) ((count + bus_words - 1) / (bus_words)) * mem_read_time;
  startup(&mem_locker, delay);
  data->state = 3; wait(delay);
}

This code is used in section 217.

220. ⟨Copy Scache->inbuf to slot p 220⟩ ≡
{
  register octa *d = p->data;

  p->data = Scache->inbuf.data; Scache->inbuf.data = d;
}

This code is used in section 217.

access_time: int, §167. alloc_slot: static cacheblock *( ), §205. bb: int, §167. bus_words: int, §214. cache = struct, §167. cache_addr = macro ( ), §192. cache_search: static cacheblock *( ), §193. cacheblock = struct, §167. chunk: octa *, §206. copy_block: static void ( ), §185. copy_in_time: int, §167. data: register control *, §124. data: octa *, §167. Dcache: cache *, §168. filler: coroutine, §167. flush_cache: static void ( ), §203. flush_to_S = 96, §129. flusher: coroutine, §167. hit_set: cacheset, §194. inbuf: cacheblock, §167. j: register int, §12. l: tetra, §17. last_h: int, §211. lock: lockvar, §167. mem_addr_time: int, §214. mem_hash: chunknode *, §207. mem_lock: lockvar, §214. mem_locker: coroutine, §127. mem_read: octa ( ), §210. mem_read_time: int, §214. mode: int, §167. next: coroutine *, §23. octa = struct, §17. outbuf: cacheblock, §167. p: register cacheblock *, §258. ptr_a: void *, §44. ptr_b: void *, §44. Scache: cache *, §168. self: register coroutine *, §124. set_lock = macro ( ), §37. startup: static void ( ), §31. state: int, §44. tag: octa, §167. terminate: label, §125. true = 1, §11. use_and_fix: static cacheblock *( ), §196. wait = macro ( ), §125. WRITE_ALLOC = 2, §166. WRITE_BACK = 1, §166.


221. Here we assume that the granularity is 8.

⟨Handle write-around when flushing to the S-cache 221⟩ ≡
if (Scache->flusher.next) wait(1);
Scache->outbuf.tag.h = c->outbuf.tag.h;
Scache->outbuf.tag.l = c->outbuf.tag.l & (-Scache->bb);
for (j = 0; j < Scache->bb >> Scache->g; j++) Scache->outbuf.dirty[j] = false;
copy_block(c, &(c->outbuf), Scache, &(Scache->outbuf));
startup(&Scache->flusher, Scache->copy_out_time);
goto terminate;

This code is used in section 217.

222. The S-cache gets new data from memory by invoking a fill_from_mem coroutine; the I-cache or D-cache may also invoke a fill_from_mem coroutine, if there is no S-cache. When such a coroutine is invoked, it holds mem_lock, and its caller has gone to sleep. A physical memory address is given in data->z.o, and data->ptr_a specifies either Icache or Dcache. Furthermore, data->ptr_b specifies a block within that cache, determined by the alloc_slot routine. The coroutine simulates reading the contents of the specified memory location, places the result in the x.o field of its caller's control block, and wakes up the caller. It proceeds to fill the cache's inbuf and, ultimately, the specified cache block, before waking the caller again.

Let c = data->ptr_a. The caller is then c->fill_lock, if this variable is nonnull. However, the caller might not wish to be awoken or to receive the data (for example, if it has been aborted). In such cases c->fill_lock will be NULL; the filling action continues without the wakeup calls. If c = Scache, the S-cache will be locked and the caller will not have been aborted.

⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_mem: {
  register cache *c = (cache *) data->ptr_a;
  register coroutine *cc = c->fill_lock;

  switch (data->state) {
  case 0: data->x.o = mem_read(data->z.o);
    if (cc) {
      cc->ctl->x.o = data->x.o;
      awaken(cc, mem_read_time);
    }
    data->state = 1;
    ⟨Read data into c->inbuf and wait for the bus 223⟩;
  case 1: release_lock(self, mem_lock); data->state = 2;
  case 2: if (c != Scache) {
      if (c->lock) wait(1);
      set_lock(self, c->lock);
    }
    if (cc) awaken(cc, c->copy_in_time); /* the second wakeup call */
    load_cache(c, (cacheblock *) data->ptr_b);
    data->state = 3; wait(c->copy_in_time);
  case 3: goto terminate;
  }
}


223. If c's cache size is no larger than the memory bus, we wait an extra cycle, so that there will be two wakeup calls.

⟨Read data into c->inbuf and wait for the bus 223⟩ ≡
{
  register int count, off;

  c->inbuf.tag = data->z.o; c->inbuf.tag.l &= -c->bb;
  count = c->bb >> 3, off = (c->inbuf.tag.l & 0xffff) >> 3;
  for (i = 0; i < count; i++, off++) c->inbuf.data[i] = mem_hash[last_h].chunk[off];
  if (count <= bus_words) wait(1 + mem_read_time)
  else wait((int) (count / bus_words) * mem_read_time);
}

This code is used in section 222.
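The waiting rule of §223 can be restated as a small pure function. This is only a sketch: the name fill_wait and its parameters are hypothetical, with block_bytes standing in for c->bb.

```c
#include <assert.h>

/* Hypothetical helper, not part of MMIX-PIPE: cycles that fill_from_mem
   waits for the bus while reading one cache block of block_bytes bytes. */
static int fill_wait(int block_bytes, int bus_words, int mem_read_time)
{
    assert(block_bytes >= 8 && bus_words > 0);
    int count = block_bytes >> 3;          /* octabytes per cache block */
    if (count <= bus_words)
        return 1 + mem_read_time;          /* extra cycle, so that two wakeups occur */
    return (count / bus_words) * mem_read_time;
}
```

A 64-byte block on a two-octabyte bus therefore waits four read times, while a block that fits in one bus transfer waits one read time plus one cycle.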

alloc_slot: static cacheblock *( ), §205. awaken = macro ( ), §125. bb: int, §167. bus_words: int, §214. c: register cache *, §217. cache = struct, §167. cacheblock = struct, §167. chunk: octa *, §206. copy_block: static void ( ), §185. copy_in_time: int, §167. copy_out_time: int, §167. coroutine = struct, §23. ctl: control *, §23. data: register control *, §124. data: octa *, §167. Dcache: cache *, §168. dirty: char *, §167. false = 0, §11. fill_from_mem = 95, §129. fill_lock: lockvar, §167. flusher: coroutine, §167. g: int, §167. h: tetra, §17. i: register int, §12. Icache: cache *, §168. inbuf: cacheblock, §167. j: register int, §12. l: tetra, §17. last_h: int, §211. load_cache: static void ( ), §201. lock: lockvar, §167. mem_hash: chunknode *, §207. mem_lock: lockvar, §214. mem_read: octa ( ), §210. mem_read_time: int, §214. next: coroutine *, §23. o: octa, §40. outbuf: cacheblock, §167. ptr_a: void *, §44. ptr_b: void *, §44. release_lock = macro ( ), §37. Scache: cache *, §168. self: register coroutine *, §124. set_lock = macro ( ), §37. startup: static void ( ), §31. state: int, §44. tag: octa, §167. terminate: label, §125. wait = macro ( ), §125. x: specnode, §44. z: spec, §44.


224. The fill_from_S coroutine has the same conventions as fill_from_mem, except that the data comes directly from the S-cache if it is present there. This is the filler coroutine for the I-cache and D-cache if an S-cache is present.

⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_S: {
  register cache *c = (cache *) data->ptr_a;
  register coroutine *cc = c->fill_lock;

  p = (cacheblock *) data->ptr_c;
  switch (data->state) {
  case 0: p = cache_search(Scache, data->z.o);
    if (p) goto S_non_miss;
    data->state = 1;
  case 1: ⟨Start the S-cache filler 225⟩;
    data->state = 2; sleep;
  case 2: if (cc) {
      cc->ctl->x.o = data->x.o; /* this data has been supplied by Scache->filler */
      awaken(cc, Scache->access_time); /* we propagate it back */
    }
    data->state = 3; sleep; /* when we awake, the S-cache will have our data */
  S_non_miss: if (cc) {
      cc->ctl->x.o = p->data[(data->z.o.l & (Scache->bb - 1)) >> 3];
      awaken(cc, Scache->access_time);
    }
  case 3: ⟨Copy data from p into c->inbuf 226⟩;
    data->state = 4; wait(Scache->access_time);
  case 4: if (c->lock) wait(1);
    set_lock(self, c->lock);
    Scache->lock = NULL; /* we had been holding that lock */
    load_cache(c, (cacheblock *) data->ptr_b);
    data->state = 5; wait(c->copy_in_time);
  case 5: if (cc) awaken(cc, 1); /* second wakeup call */
    goto terminate;
  }
}


225. We are already holding the Scache->lock, but we're about to take on the Scache->fill_lock too (with the understanding that one is "stronger" than the other). For a short time the Scache->lock will point to us but we will point to Scache->fill_lock; this will not cause difficulty, because the present coroutine is not abortable.

⟨Start the S-cache filler 225⟩ ≡
if (Scache->filler.next || mem_lock) wait(1);
p = alloc_slot(Scache, data->z.o);
if (!p) wait(1);
set_lock(&Scache->filler, mem_lock);
set_lock(self, Scache->fill_lock);
data->ptr_c = Scache->filler_ctl.ptr_b = (void *) p;
Scache->filler_ctl.z.o = data->z.o;
startup(&Scache->filler, mem_addr_time);

This code is used in section 224.

226. The S-cache blocks might be wider than the blocks of the I-cache or D-cache, so the copying in this step isn't quite trivial.

⟨Copy data from p into c->inbuf 226⟩ ≡
{
  register int off;

  c->inbuf.tag = data->z.o; c->inbuf.tag.l &= -c->bb;
  for (j = 0, off = (c->inbuf.tag.l & (Scache->bb - 1)) >> 3; j < c->bb >> 3; j++, off++)
    c->inbuf.data[j] = p->data[off];
  release_lock(self, Scache->fill_lock);
  set_lock(self, Scache->lock);
}

This code is used in section 224.
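The offset arithmetic of §226 may be easier to follow on plain 64-bit integers. The sketch below is an illustration only: copy_subblock and its parameters are hypothetical, with bb and sbb standing in for c->bb and Scache->bb (both powers of 2, in bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMIX-PIPE: copy the small-cache block
   that contains addr out of a larger S-cache block. */
static void copy_subblock(uint64_t *dst, const uint64_t *sblock,
                          uint64_t addr, uint32_t bb, uint32_t sbb)
{
    assert(bb >= 8 && bb <= sbb);
    uint64_t tag = addr & ~(uint64_t)(bb - 1);         /* align to the small block */
    uint32_t off = (uint32_t)(tag & (sbb - 1)) >> 3;   /* first octabyte inside the S-block */
    for (uint32_t j = 0; j < bb >> 3; j++, off++)
        dst[j] = sblock[off];
}
```

With 32-byte blocks inside a 128-byte S-cache block, address 0x1068 selects octabytes 12 through 15 of the S-cache block.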

access_time: int, §167. alloc_slot: static cacheblock *( ), §205. awaken = macro ( ), §125. bb: int, §167. cache = struct, §167. cache_search: static cacheblock *( ), §193. cacheblock = struct, §167. copy_in_time: int, §167. coroutine = struct, §23. ctl: control *, §23. data: octa *, §167. data: register control *, §124. fill_from_mem = 95, §129. fill_from_S = 94, §129. fill_lock: lockvar, §167. filler: coroutine, §167. filler_ctl: control, §167. inbuf: cacheblock, §167. j: register int, §12. l: tetra, §17. load_cache: static void ( ), §201. lock: lockvar, §167. mem_addr_time: int, §214. mem_lock: lockvar, §214. next: coroutine *, §23. o: octa, §40. p: register cacheblock *, §258. ptr_a: void *, §44. ptr_b: void *, §44. ptr_c: void *, §44. release_lock = macro ( ), §37. Scache: cache *, §168. self: register coroutine *, §124. set_lock = macro ( ), §37. sleep = macro, §125. startup: static void ( ), §31. state: int, §44. tag: octa, §167. terminate: label, §125. wait = macro ( ), §125. x: specnode, §44. z: spec, §44.


227. The instruction PRELD X,$Y,$Z generates ⌊X/2^b⌋ commands if there are 2^b bytes per block in the D-cache. These commands will try to preload blocks $Y+$Z, $Y+$Z+2^b, ..., into the cache if it is not too busy.

Similar considerations apply to the instructions PREGO X,$Y,$Z and PREST X,$Y,$Z.

⟨Special cases of instruction dispatch 117⟩ +≡
case preld: case prest: if (!Dcache) goto noop_inst;
  if (cool->xx >= Dcache->bb) cool->interim = true;
  cool->ptr_a = (void *) mem.up; break;
case prego: if (!Icache) goto noop_inst;
  if (cool->xx >= Icache->bb) cool->interim = true;
  cool->ptr_a = (void *) mem.up; break;

228. If the block size is 64, a command like PREST 200,$Y,$Z is actually issued as four commands PREST 200,$Y,$Z; PREST 191,$Y,$Z; PREST 127,$Y,$Z; PREST 63,$Y,$Z. An interruption will then be able to resume properly. In the pipeline, the instruction PREST 200,$Y,$Z is considered to affect bytes $Y+$Z+192 through $Y+$Z+200, or fewer bytes if $Y+$Z is not a multiple of 64. (Remember that these instructions are only hints; we act on them only if it is reasonably convenient to do so.)

⟨Get ready for the next step of PRELD or PREST 228⟩ ≡
head->inst = (head->inst & ~((Dcache->bb - 1) << 16)) - 0x10000;

This code is used in section 81.

229. ⟨Get ready for the next step of PREGO 229⟩ ≡
head->inst = (head->inst & ~((Icache->bb - 1) << 16)) - 0x10000;

This code is used in section 81.
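The sequence of X fields listed in §228 follows from the update rule applied to the X byte alone. Here is a minimal sketch, assuming the X field is handled as a plain int rather than as bits 16 to 23 of head->inst; the helpers next_x and split_x are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: X field of the command that follows one with field x,
   for a cache whose block size bb is a power of 2. */
static int next_x(int x, int bb)
{
    assert(bb > 0 && (bb & (bb - 1)) == 0 && x >= bb);
    return (x & ~(bb - 1)) - 1;   /* round down to a block boundary, then step back one byte */
}

/* Hypothetical helper: store the X fields of all issued commands in xs[]
   and return how many there are (at most max). */
static int split_x(int x, int bb, int xs[], int max)
{
    int n = 0;
    while (n < max) {
        xs[n++] = x;
        if (x < bb) break;        /* interim would not be set: last command */
        x = next_x(x, bb);
    }
    return n;
}
```

Calling split_x(200, 64, xs, 8) fills xs with 200, 191, 127, 63, matching the four commands named in §228.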

230. Another coroutine, called cleanup, is occasionally called into action to remove dirty data from the D-cache and S-cache. If it is invoked by starting in state 0, with its i field set to sync, it will clean everything. It can also be invoked in state 4, with its i field set to syncd and with a physical address in its z.o field; then it simply makes sure that no D-cache or S-cache blocks associated with that address are dirty.

Field x.o.h should be set to zero if items are expected to remain in the cache after being cleaned; otherwise field x.o.h should be set to sign_bit.

The coroutine that invokes cleanup should hold clean_lock. If that coroutine dies, because of an interruption, the cleanup coroutine will terminate prematurely.

We assume that the D-cache and S-cache have some sort of way to identify their first dirty block, if any, in access_time cycles.

⟨Global variables 20⟩ +≡
coroutine clean_co;
control clean_ctl;
lockvar clean_lock;

231. ⟨Initialize everything 22⟩ +≡
clean_co.ctl = &clean_ctl;
clean_co.name = "Clean";
clean_co.stage = cleanup;
clean_ctl.go.o.l = 4;


232. ⟨Cases for control of special coroutines 126⟩ +≡
case cleanup: p = (cacheblock *) data->ptr_b;
  switch (data->state) {
    ⟨Cases 0 through 4, for the D-cache 233⟩;
    ⟨Cases 5 through 9, for the S-cache 234⟩;
  case 10: goto terminate;
  }

access_time: int, §167. bb: int, §167. cacheblock = struct, §167. cleanup = 91, §129. control = struct, §44. cool: control *, §60. coroutine = struct, §23. ctl: control *, §23. data: register control *, §124. Dcache: cache *, §168. go: specnode, §44. h: tetra, §17. head: fetch *, §69. i: register int, §12. Icache: cache *, §168. inst: tetra, §68. interim: bool, §44. l: tetra, §17. lockvar = coroutine *, §37. mem: specnode, §115. name: char *, §23. noop_inst: label, §118. o: octa, §40. p: register cacheblock *, §258. prego = 73, §49. preld = 61, §49. prest = 62, §49. ptr_a: void *, §44. ptr_b: void *, §44. sign_bit = macro, §80. stage: int, §23. state: int, §44. sync = 79, §49. syncd = 64, §49. terminate: label, §125. true = 1, §11. up: specnode *, §40. x: specnode, §44. xx: unsigned char, §44. z: spec, §44.


233. ⟨Cases 0 through 4, for the D-cache 233⟩ ≡
case 0: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  i = j = 0;
Dclean_loop: p = (i < Dcache->cc ? &(Dcache->set[i][j]) : &(Dcache->victim[j]));
  if (p->tag.h & sign_bit) goto Dclean_inc;
  if (!is_dirty(Dcache, p)) {
    p->tag.h |= data->x.o.h; goto Dclean_inc;
  }
  data->y.o.h = i, data->y.o.l = j;
Dclean: data->state = 1; data->ptr_b = (void *) p; wait(Dcache->access_time);
case 1: if (Dcache->flusher.next) wait(1);
  flush_cache(Dcache, p, data->x.o.h == 0);
  p->tag.h |= data->x.o.h;
  release_lock(self, Dcache->lock);
  data->state = 2; wait(Dcache->copy_out_time);
case 2: if (!clean_lock) goto done; /* premature termination */
  if (Dcache->flusher.next) wait(1);
  if (data->i != sync) goto Sprep;
  data->state = 3;
case 3: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  i = data->y.o.h, j = data->y.o.l;
Dclean_inc: j++;
  if (i < Dcache->cc && j == Dcache->aa) j = 0, i++;
  if (i == Dcache->cc && j == Dcache->vv) {
    data->state = 5; wait(Dcache->access_time);
  }
  goto Dclean_loop;
case 4: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  p = cache_search(Dcache, data->z.o);
  if (p) {
    demote_and_fix(Dcache, p);
    if (is_dirty(Dcache, p)) goto Dclean;
  }
  data->state = 9; wait(Dcache->access_time);

This code is used in section 232.
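The Dclean_loop and Dclean_inc labels of §233 walk through every block of every cache set and then through the victim blocks. The following sketch isolates just that index stepping; blocks_visited is a hypothetical function, and cc, aa, vv stand for the number of sets, blocks per set, and victim blocks.

```c
#include <assert.h>

/* Hypothetical helper, not part of MMIX-PIPE: number of blocks examined by
   the traversal, using the same stepping rule as Dclean_inc. */
static int blocks_visited(int cc, int aa, int vv)
{
    assert(cc > 0 && aa > 0 && vv >= 0);
    int i = 0, j = 0, visited = 0;
    for (;;) {
        visited++;                       /* set[i][j], or victim[j] when i == cc */
        j++;
        if (i < cc && j == aa) j = 0, i++;
        if (i == cc && j == vv) break;   /* all sets and victims examined */
    }
    return visited;
}
```

The result is cc * aa + vv, so a cache with two sets of two blocks and one victim block has five blocks to examine.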

234. ⟨Cases 5 through 9, for the S-cache 234⟩ ≡
case 5: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (!Scache) goto done;
  if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  i = j = 0;
Sclean_loop: p = (i < Scache->cc ? &(Scache->set[i][j]) : &(Scache->victim[j]));
  if (p->tag.h & sign_bit) goto Sclean_inc;
  if (!is_dirty(Scache, p)) {
    p->tag.h |= data->x.o.h; goto Sclean_inc;
  }
  data->y.o.h = i, data->y.o.l = j;
Sclean: data->state = 6; data->ptr_b = (void *) p; wait(Scache->access_time);
case 6: if (Scache->flusher.next) wait(1);
  flush_cache(Scache, p, data->x.o.h == 0);
  p->tag.h |= data->x.o.h;
  release_lock(self, Scache->lock);
  data->state = 7; wait(Scache->copy_out_time);
case 7: if (!clean_lock) goto done; /* premature termination */
  if (Scache->flusher.next) wait(1);
  if (data->i != sync) goto done;
  data->state = 8;
case 8: if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  i = data->y.o.h, j = data->y.o.l;
Sclean_inc: j++;
  if (i < Scache->cc && j == Scache->aa) j = 0, i++;
  if (i == Scache->cc && j == Scache->vv) {
    data->state = 10; wait(Scache->access_time);
  }
  goto Sclean_loop;
Sprep: data->state = 9;
case 9: if (self->lockloc) release_lock(self, Dcache->lock);
  if (!Scache) goto done;
  if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  p = cache_search(Scache, data->z.o);
  if (p) {
    demote_and_fix(Scache, p);
    if (is_dirty(Scache, p)) goto Sclean;
  }
  data->state = 10; wait(Scache->access_time);

This code is used in section 232.

aa: int, §167. access_time: int, §167. cache_search: static cacheblock *( ), §193. cc: int, §167. clean_lock: lockvar, §230. copy_out_time: int, §167. data: register control *, §124. Dcache: cache *, §168. demote_and_fix: static cacheblock *( ), §199. done: label, §125. flush_cache: static void ( ), §203. flusher: coroutine, §167. get_reader: static int ( ), §183. h: tetra, §17. i: internal_opcode, §44. i: register int, §12. is_dirty: static bool ( ), §170. j: register int, §12. l: tetra, §17. lock: lockvar, §167. lockloc: coroutine **, §23. next: coroutine *, §23. o: octa, §40. p: register cacheblock *, §258. ptr_b: void *, §44. reader: coroutine *, §167. release_lock = macro ( ), §37. Scache: cache *, §168. self: register coroutine *, §124. set: cacheset *, §167. set_lock = macro ( ), §37. sign_bit = macro, §80. startup: static void ( ), §31. state: int, §44. sync = 79, §49. tag: octa, §167. victim: cacheset, §167. vv: int, §167. wait = macro ( ), §125. x: specnode, §44. y: spec, §44. z: spec, §44.


235. Virtual address translation. Special arrays of coroutines and control blocks come into play when we need to implement MMIX's rather complicated page table mechanism for virtual address translation. In effect, we have up to ten control blocks outside of the reorder buffer that are capable of executing instructions just as if they were part of that buffer. The "opcodes" of these non-abortable instructions are special internal operations called ldptp and ldpte, for loading page table pointers and page table entries.

Suppose, for example, that we need to translate a virtual address for the DT-cache in which the virtual page address (a_4 a_3 a_2 a_1 a_0)_1024 of segment i has a_4 = a_3 = 0 and a_2 ≠ 0. Then the rules say that we should first find a page table pointer p_2 in physical location 2^13(r + b_i + 2) + 8a_2, then another page table pointer p_1 in location p_2 + 8a_1, and finally the page table entry p_0 in location p_1 + 8a_0. The simulator achieves this by setting up three coroutines c_0, c_1, c_2 whose control blocks correspond to the pseudo-instructions

  LDPTP x,[2^63 + 2^13(r + b_i + 2)],8a_2
  LDPTP x,x,8a_1
  LDPTE x,x,8a_0

where x is a hidden internal register and the other quantities are immediate values. Slight changes to the normal functionality of LDO give us the actions needed to implement LDPTP and LDPTE. Coroutine c_j corresponds to the instruction that involves a_j and computes p_j; when c_0 has computed its value p_0, we know how to translate the original virtual address.

The LDPTP and LDPTE commands return zero if their y operand is zero or if the page table does not properly match rV.

#define LDPTP PREGO /* internally this won't cause confusion */
#define LDPTE GO

⟨Global variables 20⟩ +≡
control IPTctl[5], DPTctl[5];   /* control blocks for I and D page translation */
coroutine IPTco[10], DPTco[10]; /* each coroutine is a two-stage pipeline */
char *IPTname[5] = {"IPT0", "IPT1", "IPT2", "IPT3", "IPT4"};
char *DPTname[5] = {"DPT0", "DPT1", "DPT2", "DPT3", "DPT4"};


236. ⟨Initialize everything 22⟩ +≡
for (j = 0; j < 5; j++) {
  DPTco[2 * j].ctl = &DPTctl[j]; IPTco[2 * j].ctl = &IPTctl[j];
  if (j > 0) DPTctl[j].op = IPTctl[j].op = LDPTP,
    DPTctl[j].i = IPTctl[j].i = ldptp;
  else DPTctl[0].op = IPTctl[0].op = LDPTE,
    DPTctl[0].i = IPTctl[0].i = ldpte;
  IPTctl[j].loc = DPTctl[j].loc = neg_one;
  IPTctl[j].go.o = DPTctl[j].go.o = incr(neg_one, 4);
  IPTctl[j].ptr_a = DPTctl[j].ptr_a = (void *) &mem;
  IPTctl[j].ren_x = DPTctl[j].ren_x = true;
  IPTctl[j].x.addr.h = DPTctl[j].x.addr.h = -1;
  IPTco[2 * j].stage = DPTco[2 * j].stage = 1;
  IPTco[2 * j + 1].stage = DPTco[2 * j + 1].stage = 2;
  IPTco[2 * j].name = IPTco[2 * j + 1].name = IPTname[j];
  DPTco[2 * j].name = DPTco[2 * j + 1].name = DPTname[j];
}
ITcache->filler_ctl.ptr_c = (void *) &IPTco[0];
DTcache->filler_ctl.ptr_c = (void *) &DPTco[0];

addr: octa, §40. control = struct, §44. coroutine = struct, §23. ctl: control *, §23. DTcache: cache *, §168. filler_ctl: control, §167. GO = #9e, §47. go: specnode, §44. h: tetra, §17. i: internal_opcode, §44. incr: octa ( ), MMIX-ARITH §6. ITcache: cache *, §168. j: register int, §10. ldpte = 58, §49. ldptp = 57, §49. loc: octa, §44. mem: specnode, §115. name: char *, §23. neg_one: octa, MMIX-ARITH §4. o: octa, §40. op: mmix_opcode, §44. PREGO = #9c, §47. ptr_a: void *, §44. ptr_c: void *, §44. ren_x: bool, §44. stage: int, §23. true = 1, §11. x: specnode, §44.


237. Page table calculations are invoked by a coroutine of type fill_from_virt, which is used to fill the IT-cache or DT-cache. The calling conventions of fill_from_virt are analogous to those of fill_from_mem or fill_from_S: A virtual address is supplied in data->y.o, and data->ptr_a points to a cache (ITcache or DTcache), while data->ptr_b is a block in that cache. We wake up the caller, who holds the cache's fill_lock, as soon as the translation of the given address has been calculated, unless the caller has been aborted. (No second wakeup call is necessary.)

⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_virt: {
  register cache *c = (cache *) data->ptr_a;
  register coroutine *cc = c->fill_lock;
  register coroutine *co = (coroutine *) data->ptr_c; /* &IPTco[0] or &DPTco[0] */
  octa aaaaa;

  switch (data->state) {
  case 0: ⟨Start up auxiliary coroutines to compute the page table entry 243⟩;
    data->state = 1;
  case 1: if (data->b.p) {
      if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
      else wait(1);
    }
    ⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩;
    data->state = 2;
  case 2: if (c->lock) wait(1);
    set_lock(self, c->lock);
    load_cache(c, (cacheblock *) data->ptr_b);
    data->state = 3; wait(c->copy_in_time);
  case 3: data->b.o = zero_octa; goto terminate;
  }
}

238. The current contents of rV, the special virtual translation register, are kept unpacked in several global variables page_r, page_s, etc., for convenience. Whenever rV changes, we recompute all these variables.

⟨Global variables 20⟩ +≡
int page_n;           /* the 10-bit n field of rV, times 8 */
int page_r;           /* the 27-bit r field of rV */
int page_s;           /* the 8-bit s field of rV */
int page_b[5];        /* the 4-bit b fields of rV; page_b[0] = 0 */
octa page_mask;       /* the least significant s bits */
bool page_bad = true; /* does rV violate the rules? */

239. ⟨Update the page variables 239⟩ ≡
{
  octa rv;

  rv = data->z.o;
  page_bad = (rv.l & 7 ? true : false);
  page_n = rv.l & 0x1ff8;
  rv = shift_right(rv, 13, 1);
  page_r = rv.l & 0x7ffffff;
  rv = shift_right(rv, 27, 1);
  page_s = rv.l & 0xff;
  if (page_s < 13 || page_s > 48) page_bad = true;
  else if (page_s < 32) page_mask.h = 0, page_mask.l = (1 << page_s) - 1;
  else page_mask.h = (1 << (page_s - 32)) - 1, page_mask.l = 0xffffffff;
  page_b[4] = (rv.l >> 8) & 0xf;
  page_b[3] = (rv.l >> 12) & 0xf;
  page_b[2] = (rv.l >> 16) & 0xf;
  page_b[1] = (rv.l >> 20) & 0xf;
}

This code is used in section 329.
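The field extraction of §239 can be restated on a single 64-bit word instead of the two-tetrabyte octa type. The sketch below assumes the layout implied by the shifts above (b_1 to b_4 in the top 16 bits, then s, r, n, and three low bits that must be zero); the struct rv_fields and the function unpack_rv are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int n8;         /* the n field times 8, like page_n */
    int r, s;       /* like page_r and page_s */
    int b[5];       /* like page_b; b[0] is always 0 */
    uint64_t mask;  /* low s bits set, like page_mask (0 if s is out of range) */
} rv_fields;

/* Hypothetical helper: unpack rV into *f; returns nonzero when rV breaks
   the rules checked in §239 (nonzero low bits, or s outside 13..48). */
static int unpack_rv(uint64_t rv, rv_fields *f)
{
    assert(f != NULL);
    int bad = (rv & 7) != 0;
    f->n8 = (int)(rv & 0x1ff8);
    f->r = (int)((rv >> 13) & 0x7ffffff);
    f->s = (int)((rv >> 40) & 0xff);
    f->b[0] = 0;
    f->b[4] = (int)((rv >> 48) & 0xf);
    f->b[3] = (int)((rv >> 52) & 0xf);
    f->b[2] = (int)((rv >> 56) & 0xf);
    f->b[1] = (int)((rv >> 60) & 0xf);
    f->mask = 0;
    if (f->s < 13 || f->s > 48) bad = 1;
    else f->mask = ((uint64_t)1 << f->s) - 1;
    return bad;
}
```

Using one 64-bit mask avoids the separate page_s < 32 case that the octa representation needs.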

240. Here's how we compute a tag of the IT-cache or DT-cache from a virtual address, and how we compute a physical address from a translation found in the cache.

#define trans_key(addr) incr(oandn(addr, page_mask), page_n)

⟨Internal prototypes 13⟩ +≡
static octa phys_addr ARGS((octa, octa));

241. ⟨Subroutines 14⟩ +≡
static octa phys_addr(virt, trans)
  octa virt, trans;
{
  octa t;

  t = trans; t.l &= -8; /* zero out the protection bits */
  return oplus(t, oand(virt, page_mask));
}

242. Cheap (and slow) versions of MMIX leave the page table calculations to software. If the global variable no_hardware_PT is set true, fill_from_virt begins its actions in state 1, not state 0. (See the RESUME_TRANS operation.)

⟨External variables 4⟩ +≡
Extern bool no_hardware_PT;
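The trans_key macro of §240 and the phys_addr routine of §241 reduce to simple mask arithmetic when written with 64-bit integers. This is a sketch under that assumption; trans_key64 and phys_addr64 are hypothetical names, and page_mask and page_n are passed as arguments instead of being globals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counterpart of trans_key: clear the offset within the
   page and add the address-space number field (already multiplied by 8). */
static uint64_t trans_key64(uint64_t virt, uint64_t page_mask, uint64_t page_n)
{
    return (virt & ~page_mask) + page_n;
}

/* Hypothetical 64-bit counterpart of phys_addr: drop the three protection
   bits of the translation and add the offset within the page. */
static uint64_t phys_addr64(uint64_t virt, uint64_t trans, uint64_t page_mask)
{
    assert((page_mask & (page_mask + 1)) == 0);  /* mask must have the form 2^s - 1 */
    return (trans & ~(uint64_t)7) + (virt & page_mask);
}
```

With s = 13 the mask is 0x1fff, so virtual address 0x12345 keeps offset 0x345 inside whatever physical page the translation names.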

ARGS = macro, §6. b: spec, §44. b, MMIX-DOC §45. bool = enum, §11. cache = struct, §167. cacheblock = struct, §167. copy_in_time: int, §167. coroutine = struct, §23. data: register control *, §124. DPTco: coroutine [ ], §235. DTcache: cache *, §168. Extern = macro, §4. false = 0, §11. fill_from_mem = 95, §129. fill_from_S = 94, §129. fill_from_virt = 93, §129. fill_lock: lockvar, §167. h: tetra, §17. inbuf: cacheblock, §167. incr: octa ( ), MMIX-ARITH §6. IPTco: coroutine [ ], §235. ITcache: cache *, §168. known: bool, §40. l: tetra, §17. load_cache: static void ( ), §201. lock: lockvar, §167. n, MMIX-DOC §45. o: octa, §40. oand: octa ( ), MMIX-ARITH §25. oandn: octa ( ), MMIX-ARITH §25. octa = struct, §17. oplus: octa ( ), MMIX-ARITH §5. p: specnode *, §40. ptr_a: void *, §44. ptr_b: void *, §44. ptr_c: void *, §44. r, MMIX-DOC §45. RESUME_TRANS = 3, §320. s, MMIX-DOC §45. self: register coroutine *, §124. set_lock = macro ( ), §37. shift_right: octa ( ), MMIX-ARITH §7. state: int, §44. terminate: label, §125. true = 1, §11. wait = macro ( ), §125. y: spec, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4.


243. Note: The operating system is supposed to ensure that changes to the page table entries do not appear in the pipeline when a translation cache is being updated. The internal LDPTP and LDPTE instructions use only the "hot state" of the memory system.

⟨Start up auxiliary coroutines to compute the page table entry 243⟩ ≡
aaaaa = data->y.o;
i = aaaaa.h >> 29;       /* the segment number */
aaaaa.h &= 0x1fffffff;   /* the address within segment i */
aaaaa = shift_right(aaaaa, page_s, 1); /* the page address */
for (j = 0; aaaaa.l != 0 || aaaaa.h != 0; j++) {
  co[2 * j].ctl->z.o.h = 0; co[2 * j].ctl->z.o.l = (aaaaa.l & 0x3ff) << 3;
  aaaaa = shift_right(aaaaa, 10, 1);
}
if (page_b[i + 1] < page_b[i] + j) /* address too large */
  ; /* nothing needs to be done, since data->b.o is zero */
else {
  if (j == 0) j = 1, co[0].ctl->z.o = zero_octa;
  ⟨Issue j pseudo-instructions to compute a page table entry 244⟩;
}

This code is used in section 237.

244. The first stage of coroutine c_j is co[2*j]. It will pass the jth control block to the second stage, co[2*j+1], which will load page table information from memory (or hopefully from the D-cache).

⟨Issue j pseudo-instructions to compute a page table entry 244⟩ ≡
j--;
aaaaa.l = page_r + page_b[i] + j;
co[2 * j].ctl->y.p = NULL;
co[2 * j].ctl->y.o = shift_left(aaaaa, 13);
co[2 * j].ctl->y.o.h += sign_bit;
for ( ; ; j--) {
  co[2 * j].ctl->x.o = zero_octa; co[2 * j].ctl->x.known = false;
  co[2 * j].ctl->owner = &co[2 * j];
  startup(&co[2 * j], 1);
  if (j == 0) break;
  co[2 * (j - 1)].ctl->y.p = &co[2 * j].ctl->x;
}
data->b.p = &co[0].ctl->x;

This code is used in section 243.


245. At this point the translation of the given virtual address data->y.o is the octabyte data->b.o. Its least significant three bits are the protection code p = p_r p_w p_x; its page address field is scaled by 2^s. It is entirely zero, including the protection bits, if there was a page table failure.

⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩ ≡
    c->inbuf.tag = trans_key(data->y.o);
    c->inbuf.data[0] = data->b.o;
    if (cc) {
        cc->ctl->z.o = data->b.o;
        awaken(cc, 1);
    }
This code is used in section 237.

aaaaa: octa, §237. awaken = macro ( ), §125. b: spec, §44. c: register cache *, §237. cc: register coroutine *, §237. co: register coroutine *, §237. ctl: control *, §23. data: register control *, §124. data: octa *, §167. false = 0, §11. h: tetra, §17. i: register int, §12. inbuf: cacheblock, §167. j: register int, §12. known: bool, §40. l: tetra, §17. o: octa, §40. owner: coroutine *, §44. p: specnode *, §40. page_b: int [ ], §238. page_r: int, §238. page_s: int, §238. shift_left: octa ( ), MMIX-ARITH §7. shift_right: octa ( ), MMIX-ARITH §7. sign_bit = macro, §80. startup: static void ( ), §31. tag: octa, §167. trans_key = macro ( ), §240. x: specnode, §44. y: spec, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4.


246. The write buffer. The dispatcher has arranged things so that speculative stores into memory are recorded in a doubly linked list leading upward from mem. When such instructions finally are committed, they enter the "write buffer," which holds octabytes that are ready to be written into designated physical memory addresses (or into the D-cache and/or S-cache). The "hot state" of the computation is reflected not only by the registers and caches but also by the instructions that are pending in the write buffer.

⟨Type definitions 11⟩ +≡
    typedef struct {
        octa o;               /* data to be stored */
        octa addr;            /* its physical address */
        tetra stamp;          /* when last committed (mod 2^32) */
        internal_opcode i;    /* is this write special? */
    } write_node;

247. We represent the buffer in the usual way as a circular list, with elements write_tail + 1, write_tail + 2, ..., write_head.

The data will sit at least holding_time cycles before it leaves the write buffer. This speeds things up when different fields of the same octabyte are being stored by different instructions.

⟨External variables 4⟩ +≡
    Extern write_node *wbuf_bot, *wbuf_top;      /* least and greatest write buffer nodes */
    Extern write_node *write_head, *write_tail;  /* front and rear of the write buffer */
    Extern lockvar wbuf_lock;      /* is the data in write_head being written? */
    Extern int holding_time;       /* minimum holding time */
    Extern lockvar speed_lock;     /* should we ignore holding_time? */

248. ⟨Global variables 20⟩ +≡
    coroutine write_co;     /* coroutine that empties the write buffer */
    control write_ctl;      /* its control block */

249. ⟨Initialize everything 22⟩ +≡
    write_co.ctl = &write_ctl;
    write_co.name = "Write";
    write_co.stage = write_from_wbuf;
    write_ctl.ptr_a = (void *) &mem;
    write_ctl.go.o.l = 4;
    startup(&write_co, 1);
    write_head = write_tail = wbuf_top;
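The circular-list discipline of sections 247 and 249 (head and tail both start at wbuf_top, new items are stored at write_tail, and both pointers move downward with wraparound) can be sketched on its own. This is a simplified illustration, not the simulator's code: the node type wnode, the capacity WBUF_SIZE, and the functions wbuf_prev, wbuf_empty, wbuf_put and wbuf_get are hypothetical names, and stamps, opcodes and locks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define WBUF_SIZE 4                       /* hypothetical capacity */

typedef struct { uint64_t o, addr; } wnode;   /* simplified write_node */

static wnode wbuf[WBUF_SIZE];
static wnode *wbuf_bot = wbuf, *wbuf_top = wbuf + WBUF_SIZE - 1;
static wnode *write_head = wbuf + WBUF_SIZE - 1;   /* front of the buffer */
static wnode *write_tail = wbuf + WBUF_SIZE - 1;   /* rear of the buffer */

/* Step one position downward, wrapping from wbuf_bot to wbuf_top. */
static wnode *wbuf_prev(wnode *p) { return p == wbuf_bot ? wbuf_top : p - 1; }

/* The buffer holds write_tail+1, ..., write_head; it is empty when they meet. */
int wbuf_empty(void) { return write_head == write_tail; }

/* Append at the rear; returns 0 if the buffer is full. */
int wbuf_put(uint64_t addr, uint64_t o)
{
    wnode *p = wbuf_prev(write_tail);
    if (p == write_head) return 0;        /* full: one slot always stays unused */
    write_tail->addr = addr;
    write_tail->o = o;
    write_tail = p;
    return 1;
}

/* Remove the oldest item, the one at write_head; returns 0 if empty. */
int wbuf_get(uint64_t *addr, uint64_t *o)
{
    if (wbuf_empty()) return 0;
    *addr = write_head->addr;
    *o = write_head->o;
    write_head = wbuf_prev(write_head);
    return 1;
}
```

With this convention a buffer of WBUF_SIZE nodes holds at most WBUF_SIZE - 1 items, which is what lets the equality test distinguish "empty" from "full".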

250. ⟨Internal prototypes 13⟩ +≡
    static void print_write_buffer ARGS((void));

251. ⟨Subroutines 14⟩ +≡
    static void print_write_buffer()
    {
        printf("Write buffer");
        if (write_head == write_tail) printf(" (empty)\n");
        else {
            register write_node *p;
            printf(":\n");
            for (p = write_head; p != write_tail; p = (p == wbuf_bot ? wbuf_top : p - 1)) {
                printf("m["); print_octa(p->addr); printf("]="); print_octa(p->o);
                if (p->i == stunc) printf(" unc");
                else if (p->i == sync) printf(" sync");
                printf(" (age %d)\n", ticks.l - p->stamp);
            }
        }
    }

252. The entire present state of the pipeline computation can be visualized by printing first the write buffer, then the reorder buffer, then the fetch buffer. This shows the progression of results from oldest to youngest, from sizzling hot to ice cold.

⟨External prototypes 9⟩ +≡
    Extern void print_pipe ARGS((void));

253. ⟨External routines 10⟩ +≡
    void print_pipe()
    {
        print_write_buffer();
        print_reorder_buffer();
        print_fetch_buffer();
    }

ARGS = macro, §6. control = struct, §44. coroutine = struct, §23. ctl: control *, §23. Extern = macro, §4. go: specnode, §44. internal_opcode = enum, §49. l: tetra, §17. lockvar = coroutine *, §37. mem: specnode, §115. name: char *, §23. o: octa, §40. octa = struct, §17. print_fetch_buffer: static void ( ), §73. print_octa: static void ( ), §19. print_reorder_buffer: static void ( ), §63. printf: int ( ), <stdio.h>. ptr_a: void *, §44. stage: int, §23. startup: static void ( ), §31. stunc = 67, §49. sync = 79, §49. tetra = unsigned int, §17. ticks = macro, §87. write_from_wbuf = 92, §129.


254. The write_search routine looks to see if any instructions ahead of a given place in the mem list of the reorder buffer are storing into a given physical address, or if there's a pending instruction in the write buffer for that address. If so, it returns a pointer to the value to be written. If not, it returns NULL. If the answer is currently unknown, because at least one possibly relevant physical address has not yet been computed, the subroutine returns the special code value DUNNO.

The search starts at the x.up field of a control block for a store instruction, otherwise at the ptr_a field of the control block.

The i field in the write buffer is usually st or pst, inherited from a store or partial store command. It may also be sync (from SYNC 1 or SYNC 3) or stunc (from STUNC).

    #define DUNNO ((octa *) 1)    /* an impossible non-NULL pointer */

⟨Internal prototypes 13⟩ +≡
    static octa *write_search ARGS((control *, octa));

255. ⟨Subroutines 14⟩ +≡
    static octa *write_search(ctl, addr)
        control *ctl;
        octa addr;
    {
        register specnode *p = (ctl->mem_x ? ctl->x.up : (specnode *) ctl->ptr_a);
        register write_node *q = write_tail;

        addr.l &= -8;
        for ( ; p != &mem; p = p->up) {
            if (p->addr.h == -1) return DUNNO;
            if ((p->addr.l & -8) == addr.l && p->addr.h == addr.h)
                return (p->known ? &(p->o) : DUNNO);
        }
        for ( ; ; ) {
            if (q == write_head) return NULL;
            if (q == wbuf_top) q = wbuf_bot; else q++;
            if (q->addr.l == addr.l && q->addr.h == addr.h) return &(q->o);
        }
    }
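The three-way answer of write_search (a pointer, NULL, or DUNNO) is a small idiom worth seeing by itself. The sketch below covers only the first loop, over speculative stores, and is not the simulator's code: the pending structure, the ADDR_UNKNOWN marker and the function pending_search are hypothetical simplifications (MMIX-PIPE marks an uncomputed address with addr.h == -1 and uses octa values).

```c
#include <stddef.h>
#include <stdint.h>

#define DUNNO ((uint64_t *)1)        /* an impossible non-null pointer */
#define ADDR_UNKNOWN UINT64_MAX      /* hypothetical "address not yet computed" */

typedef struct pending {
    uint64_t addr;                   /* physical address, or ADDR_UNKNOWN */
    uint64_t value;                  /* the octabyte to be stored */
    int known;                       /* nonzero once value has been computed */
    struct pending *older;           /* next older pending store, or NULL */
} pending;

/* Search from the youngest pending store toward the oldest for a store
   to the octabyte containing addr. */
uint64_t *pending_search(pending *p, uint64_t addr)
{
    addr &= ~(uint64_t)7;                           /* align to an octabyte */
    for (; p != NULL; p = p->older) {
        if (p->addr == ADDR_UNKNOWN) return DUNNO;  /* might match; can't tell yet */
        if ((p->addr & ~(uint64_t)7) == addr)
            return p->known ? &p->value : DUNNO;
    }
    return NULL;                                    /* nothing pending here */
}
```

A caller has to test for DUNNO before dereferencing the result, as sections 268, 270 and 271 do.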

256. When we're committing new data to memory, we can update an existing item in the write buffer if it has the same physical address, unless that item is already in the process of being written out. Increasing the value of holding_time will increase the chance that this economy is possible, but it will also increase the number of buffered items when writes are to different locations.

A store instruction that sets any of the eight interrupt bits rwxnkbsp will not affect memory, even if it doesn't cause an interrupt.

When "store" is followed by "store uncached" at the same address, or vice versa, we believe the most recent hint.

⟨Commit to memory if possible, otherwise break 256⟩ ≡
    {
        register write_node *q = write_tail;

        if (hot->interrupt & (F_BIT + 0xff)) goto done_with_write;
        if (hot->i != sync)
            for ( ; ; ) {
                if (q == write_head) break;
                if (q == wbuf_top) q = wbuf_bot; else q++;
                if (q->i == sync) break;
                if (q->addr.l == hot->x.addr.l && q->addr.h == hot->x.addr.h
                        && (q != write_head || !wbuf_lock)) goto addr_found;
            }
        {
            register write_node *p = (write_tail == wbuf_bot ? wbuf_top : write_tail - 1);

            if (p == write_head) break;    /* the write buffer is full */
            q = write_tail; write_tail = p;
            q->addr = hot->x.addr;
        }
    addr_found: q->o = hot->x.o;
        q->stamp = ticks.l;
        q->i = hot->i;
    done_with_write: spec_rem(&(hot->x));
        mem_slots++;
    }
This code is used in section 146.

addr: octa, §246. addr: octa, §40. ARGS = macro, §6. control = struct, §44. F_BIT = 1 << 17, §54. h: tetra, §17. holding_time: int, §247. hot: control *, §60. i: internal_opcode, §246. i: internal_opcode, §44. interrupt: unsigned int, §44. known: bool, §40. l: tetra, §17. mem: specnode, §115. mem_slots: int, §86. mem_x: bool, §44. o: octa, §246. o: octa, §40. octa = struct, §17. pst = 66, §49. ptr_a: void *, §44. spec_rem: static void ( ), §97. specnode = struct, §40. st = 63, §49. stamp: tetra, §246. stunc = 67, §49. sync = 79, §49. ticks = macro, §87. up: specnode *, §40. wbuf_bot: write_node *, §247. wbuf_lock: lockvar, §247. wbuf_top: write_node *, §247. write_head: write_node *, §247. write_node = struct, §246. write_tail: write_node *, §247. x: specnode, §44.


257. A special coroutine whose duty is to empty the write buffer is always active. It holds the wbuf_lock while it is writing the contents of write_head. It holds Dcache->fill_lock while waiting for the D-cache to fill a block.

⟨Cases for control of special coroutines 126⟩ +≡
    case write_from_wbuf: p = (cacheblock *) data->ptr_b;
        switch (data->state) {
        case 4: ⟨Forward the new data past the D-cache if it is write-through 263⟩;
            data->state = 5;
        case 5: if (write_head == wbuf_bot) write_head = wbuf_top; else write_head--;
        write_restart: data->state = 0;
        case 0: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
            if (write_head == write_tail) wait(1);      /* write buffer is empty */
            if (write_head->i == sync) ⟨Ignore the item in write_head 264⟩;
            if (ticks.l - write_head->stamp < holding_time && !speed_lock)
                wait(1);                                /* data too raw */
            if (!Dcache || (write_head->addr.h & 0xffff0000))
                goto mem_direct;                        /* not cached */
            if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);   /* D-cache busy */
            startup(&Dcache->reader[j], Dcache->access_time);
            ⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩;
            data->state = ((Dcache->mode & WRITE_ALLOC) && write_head->i != stunc ? 1 : 3);
            wait(Dcache->access_time);
        case 1: ⟨Try to put the contents of location write_head->addr into the D-cache 261⟩;
            data->state = 2; sleep;
        case 2: data->state = 0; sleep;     /* wake up when the D-cache has the block */
        case 3: ⟨Handle write-around when writing to the D-cache 259⟩;
        mem_direct: ⟨Write directly from write_head to memory 260⟩;
        }
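The "data too raw" test in state 0 relies on unsigned arithmetic: stamp is kept only mod 2^32, so the age ticks.l - stamp stays correct even after the low half of the clock wraps around. A minimal sketch, assuming 32-bit unsigned values as in the tetra type; the function name too_raw is hypothetical.

```c
#include <stdint.h>

/* Nonzero if an item stamped at time `stamp` has not yet spent
   holding_time cycles in the buffer and the holding time is not being
   ignored. The subtraction is performed mod 2^32, so it remains
   correct when ticks_l has wrapped past zero since the stamp. */
int too_raw(uint32_t ticks_l, uint32_t stamp, int holding_time, int speed_lock)
{
    return (uint32_t)(ticks_l - stamp) < (uint32_t)holding_time && !speed_lock;
}
```

For example, with stamp = 0xfffffffe and ticks_l = 2 the computed age is 4 cycles, not a huge negative number.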

258. ⟨Local variables 12⟩ +≡
    register cacheblock *p, *q;

259. The granularity is guaranteed to be 8 in write-around mode (see MMIX_config). Although an uncached store will not be stored in the D-cache (unless it hits in the D-cache), it will go into a secondary cache.

⟨Handle write-around when writing to the D-cache 259⟩ ≡
    if (Dcache->flusher.next) wait(1);
    Dcache->outbuf.tag.h = write_head->addr.h;
    Dcache->outbuf.tag.l = write_head->addr.l & (-Dcache->bb);
    for (j = 0; j < Dcache->bb >> Dcache->g; j++) Dcache->outbuf.dirty[j] = false;
    Dcache->outbuf.data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
    Dcache->outbuf.dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
    set_lock(self, wbuf_lock);
    startup(&Dcache->flusher, Dcache->copy_out_time);
    data->state = 5; wait(Dcache->copy_out_time);
This code is used in section 257.
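The three index expressions in section 259 recur throughout the cache code, so a worked example may help. The sketch below assumes that bb is the block size in bytes (a power of 2) and that g is the base-2 logarithm of the granularity, which is how the expressions above use them; the function names are hypothetical.

```c
#include <stdint.h>

/* Low half of the block tag: the address with its in-block offset cleared. */
uint32_t block_tag_low(uint32_t addr_l, uint32_t bb)
{
    return addr_l & (0u - bb);
}

/* Index of the addressed octabyte within its block. */
uint32_t octa_index(uint32_t addr_l, uint32_t bb)
{
    return (addr_l & (bb - 1)) >> 3;
}

/* Index of the dirty flag that covers the addressed byte. */
uint32_t dirty_index(uint32_t addr_l, uint32_t bb, int g)
{
    return (addr_l & (bb - 1)) >> g;
}
```

With 64-byte blocks and the address 0x12345678, the tag is 0x12345640 and the octabyte index is 7. When the granularity is 8 (g = 3), as write-around mode guarantees, the dirty index coincides with the octabyte index.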


260. ⟨Write directly from write_head to memory 260⟩ ≡
    if (mem_lock) wait(1);
    set_lock(self, wbuf_lock);
    set_lock(&mem_locker, mem_lock);     /* a coroutine of type vanish */
    startup(&mem_locker, mem_addr_time + mem_write_time);
    mem_write(write_head->addr, write_head->o);
    data->state = 5; wait(mem_addr_time + mem_write_time);
This code is used in section 257.

261. A subtlety needs to be mentioned here: While we're trying to update the D-cache, another instruction might be filling the same cache block (although not because of the same physical address). Therefore we goto write_restart here instead of saying wait(1).

⟨Try to put the contents of location write_head->addr into the D-cache 261⟩ ≡
    if (Dcache->filler.next) goto write_restart;
    if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto write_restart;
    p = alloc_slot(Dcache, write_head->addr);
    if (!p) goto write_restart;
    if (Scache) set_lock(&Dcache->filler, Scache->lock)
    else set_lock(&Dcache->filler, mem_lock);
    set_lock(self, Dcache->fill_lock);
    data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) p;
    Dcache->filler_ctl.z.o = write_head->addr;
    startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);
This code is used in section 257.

access_time: int, §167. addr: octa, §246. alloc_slot: static cacheblock *( ), §205. bb: int, §167. cacheblock = struct, §167. copy_out_time: int, §167. data: register control *, §124. data: octa *, §167. Dcache: cache *, §168. dirty: char *, §167. false = 0, §11. fill_lock: lockvar, §167. filler: coroutine, §167. filler_ctl: control, §167. flusher: coroutine, §167. g: int, §167. get_reader: static int ( ), §183. h: tetra, §17. holding_time: int, §247. i: internal_opcode, §246. j: register int, §12. l: tetra, §17. lock: lockvar, §167. lockloc: coroutine **, §23. mem_addr_time: int, §214. mem_lock: lockvar, §214. mem_locker: coroutine, §127. mem_write: void ( ), §213. mem_write_time: int, §214. MMIX_config: void ( ), MMIX-CONFIG §38. mode: int, §167. next: coroutine *, §23. o: octa, §246. o: octa, §40. outbuf: cacheblock, §167. p: register write_node *, §256. ptr_b: void *, §44. reader: coroutine *, §167. Scache: cache *, §168. self: register coroutine *, §124. set_lock = macro ( ), §37. sleep = macro, §125. speed_lock: lockvar, §247. stamp: tetra, §246. startup: static void ( ), §31. state: int, §44. stunc = 67, §49. sync = 79, §49. tag: octa, §167. ticks = macro, §87. true = 1, §11. vanish = 98, §129. wait = macro ( ), §125. wbuf_bot: write_node *, §247. wbuf_lock: lockvar, §247. wbuf_top: write_node *, §247. WRITE_ALLOC = 2, §166. write_from_wbuf = 92, §129. write_head: write_node *, §247. write_tail: write_node *, §247. z: spec, §44.


262. Here it is assumed that Dcache->access_time is enough to search the D-cache and update one octabyte in case of a hit. The D-cache is not locked, since other coroutines that might be simultaneously reading the D-cache are not going to use the octabyte that changes. Perhaps the simulator is being too lenient here.

⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩ ≡
    p = cache_search(Dcache, write_head->addr);
    if (p) {
        p = use_and_fix(Dcache, p);
        set_lock(self, wbuf_lock);
        data->ptr_b = (void *) p;
        p->data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
        p->dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
        data->state = 4; wait(Dcache->access_time);
    }
This code is used in section 257.

263. ⟨Forward the new data past the D-cache if it is write-through 263⟩ ≡
    if ((Dcache->mode & WRITE_BACK) == 0) {     /* write-through */
        if (Dcache->flusher.next) wait(1);
        flush_cache(Dcache, p, true);
    }
This code is used in section 257.

264. ⟨Ignore the item in write_head 264⟩ ≡
    {
        set_lock(self, wbuf_lock);
        data->state = 5;
        wait(1);
    }
This code is used in section 257.


access_time: int, §167. addr: octa, §246. bb: int, §167. cache_search: static cacheblock *( ), §193. data: octa *, §167. data: register control *, §124. Dcache: cache *, §168. dirty: char *, §167. flush_cache: static void ( ), §203. flusher: coroutine, §167. g: int, §167. l: tetra, §17. mode: int, §167. next: coroutine *, §23. o: octa, §246. p: register cacheblock *, §258. ptr_b: void *, §44. self: register coroutine *, §124. set_lock = macro ( ), §37. state: int, §44. true = 1, §11. use_and_fix: static cacheblock *( ), §196. wait = macro ( ), §125. wbuf_lock: lockvar, §247. WRITE_BACK = 1, §166. write_head: write_node *, §247.


265. Loading and storing. A RISC machine is often said to have a "load/store architecture," perhaps because loading and storing are among the most difficult things a RISC machine is called upon to do.

We want memory accesses to be efficient, so we try to access the D-cache at the same time as we are translating a virtual address via the DT-cache. Usually we hit in both caches, but numerous cases must be dealt with when we miss. Is there an elegant way to handle all the contingencies? Alas, the author of this program was unable to think of anything better than to throw lots of code at the problem, knowing full well that such a spaghetti-like approach is fraught with possibilities for error.

Instructions like LDO x, y, z operate in two pipeline stages. The first stage computes the virtual address y + z, waiting if necessary until y and z are both known; then it starts to access the necessary caches. In the second stage we ascertain the corresponding physical address and hopefully find the data in the cache (or in the speculative mem list or the write buffer).

An instruction like STB x, y, z shares some of the computation of LDO x, y, z, because only one byte is being stored but the other seven bytes must be found in the cache. In this case, however, x is treated as an input, and mem is the output. The second stage of a store command can begin even though x is not known during the first stage.

Here's what we do at the beginning of stage 1.

    #define ld_st_launch 7    /* state when load/store command has its memory address */

⟨Cases to compute the virtual address of a memory operation 265⟩ ≡
    case preld: case prest: case prego:
        data->z.o = incr(data->z.o, data->xx & -(data->i == prego ? Icache : Dcache)->bb);
            /* (I hope the adder is fast enough) */
    case ld: case ldunc: case ldvts: case st: case pst: case syncd: case syncid:
    start_ld_st: data->y.o = oplus(data->y.o, data->z.o);
        data->state = ld_st_launch; goto switch1;
    case ldptp: case ldpte: if (data->y.o.h) goto start_ld_st;
        data->x.o = zero_octa; data->x.known = true; goto die;    /* page table fault */
This code is used in section 132.

266.
    #define PRW_BITS (data->i < st ? PR_BIT : data->i == pst ? PR_BIT + PW_BIT : \
        (data->i == syncid && (data->loc.h & sign_bit)) ? 0 : PW_BIT)

⟨Special cases for states in the first stage 266⟩ ≡
    case ld_st_launch: if ((self + 1)->next) wait(1);    /* second stage must be clear */
        ⟨Handle special cases for operations like prego and ldvts 289⟩;
        if (data->y.o.h & sign_bit) ⟨Do load/store stage 1 with known physical address 271⟩;
        if (page_bad) {
            if (data->i == st || (data->i < preld && data->i > syncid))
                data->interrupt |= PRW_BITS;
            goto fin_ex;
        }
        if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
        startup(&DTcache->reader[j], DTcache->access_time);
        ⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩;
        pass_after(DTcache->access_time); goto passit;
See also sections 310, 326, 360, and 363.
This code is used in section 130.
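The PRW_BITS macro of section 266 packs a small decision table into one expression. Written out as a function it reads as follows. This is a sketch for illustration only: the function prw_bits and its parameter loc_is_negative are hypothetical, while the opcode numbers and bit values are the ones listed in the mini-indexes (§49 and §54).

```c
enum { ld = 56, preld = 61, prest = 62, st = 63,
       syncd = 64, syncid = 65, pst = 66, stunc = 67 };

#define PR_BIT (1 << 7)     /* read permission needed */
#define PW_BIT (1 << 6)     /* write permission needed */

/* Permission bits that an internal opcode requires of a page.
   loc_is_negative stands for (data->loc.h & sign_bit) in the macro. */
int prw_bits(int op, int loc_is_negative)
{
    if (op < st) return PR_BIT;                      /* the load family and preloads */
    if (op == pst) return PR_BIT + PW_BIT;           /* partial store: read-modify-write */
    if (op == syncid && loc_is_negative) return 0;   /* SYNCID issued from a negative location */
    return PW_BIT;                                   /* st, syncd, stunc, other syncid */
}
```

A partial store needs both permissions because the bytes it does not overwrite have to be read first, as section 265 explains for STB.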

267. When stage 2 of a load/store command begins, the state will depend on what transpired in stage 1. For example, data->state will be DT_miss if the virtual address key can't be found in the DT-cache; then stage 2 will have to compute the physical address the hard way.

The data->state will be DT_hit if the physical address is known via the DT-cache, but the data may or may not be in the D-cache. The data->state will be hit_and_miss if the DT-cache hits and the D-cache doesn't. And data->state will be ld_ready if data->x.o is the desired octabyte (for example, if both caches hit).

    #define DT_miss 10        /* second stage state when DT-cache doesn't hold the key */
    #define DT_hit 11         /* second stage state when physical address is known */
    #define hit_and_miss 12   /* second stage state when D-cache misses */
    #define ld_ready 13       /* second stage state when data has been read */
    #define st_ready 14       /* second stage state when data needn't be read */
    #define prest_win 15      /* second stage state when we can fill a block with zeroes */

⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ ≡
    p = cache_search(DTcache, trans_key(data->y.o));
    if (!Dcache || Dcache->lock || (j = get_reader(Dcache)) < 0 ||
            (data->i >= st && data->i <= syncid))
        ⟨Do load/store stage 1 without D-cache lookup 270⟩;
    startup(&Dcache->reader[j], Dcache->access_time);
    if (p) ⟨Do a simultaneous lookup in the D-cache 268⟩
    else data->state = DT_miss;
This code is used in section 266.

access_time: int, §167. bb: int, §167. cache_search: static cacheblock *( ), §193. data: register control *, §124. Dcache: cache *, §168. die: label, §144. DTcache: cache *, §168. fin_ex: label, §144. get_reader: static int ( ), §183. h: tetra, §17. i: internal_opcode, §44. Icache: cache *, §168. incr: octa ( ), MMIX-ARITH §6. interrupt: unsigned int, §44. j: register int, §12. known: bool, §40. ld = 56, §49. ldpte = 58, §49. ldptp = 57, §49. ldunc = 59, §49. ldvts = 60, §49. loc: octa, §44. lock: lockvar, §167. mem: specnode, §115. next: coroutine *, §23. o: octa, §40. oplus: octa ( ), MMIX-ARITH §5. p: register cacheblock *, §258. page_bad: bool, §238. pass_after = macro ( ), §125. passit: label, §134. PR_BIT = 1 << 7, §54. prego = 73, §49. preld = 61, §49. prest = 62, §49. pst = 66, §49. PW_BIT = 1 << 6, §54. reader: coroutine *, §167. self: register coroutine *, §124. sign_bit = macro, §80. st = 63, §49. startup: static void ( ), §31. state: int, §44. switch1: label, §130. syncd = 64, §49. syncid = 65, §49. trans_key = macro ( ), §240. true = 1, §11. wait = macro ( ), §125. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4.


268. We assume that it is possible to look up a virtual address in the DT-cache at the same time as we look for a corresponding physical address in the D-cache, provided that the lower b + c bits of the two addresses are the same. (They will always be the same if b + c ≤ page_s; otherwise the operating system can try to make them the same by "page coloring" whenever possible.) If both caches hit, the physical address is known in max(DTcache->access_time, Dcache->access_time) cycles.

If the lower b + c bits of the virtual and physical addresses differ, the machine will not know this until the DT-cache has hit. Therefore we simulate the operation of accessing the D-cache, but we go to DT_hit instead of to hit_and_miss because the D-cache will experience a spurious miss.

    #define max(x, y) ((x) < (y) ? (y) : (x))

⟨Do a simultaneous lookup in the D-cache 268⟩ ≡
    {
        octa *m;

        ⟨Update DT-cache usage and check the protection bits 269⟩;
        data->z.o = phys_addr(data->y.o, p->data[0]);
        m = write_search(data, data->z.o);
        if (m == DUNNO) data->state = DT_hit;
        else if (m) data->x.o = *m, data->state = ld_ready;
        else if (Dcache->b + Dcache->c > page_s &&
                ((data->y.o.l ^ data->z.o.l) &
                 ((Dcache->bb << Dcache->c) - (1 << page_s))))
            data->state = DT_hit;           /* spurious D-cache lookup */
        else {
            q = cache_search(Dcache, data->z.o);
            if (q) {
                if (data->i == ldunc) q = demote_and_fix(Dcache, q);
                else q = use_and_fix(Dcache, q);
                data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
                data->state = ld_ready;
            } else data->state = hit_and_miss;
        }
        pass_after(max(DTcache->access_time, Dcache->access_time));
        goto passit;
    }
This code is used in section 267.
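The "spurious lookup" test compares only those address bits that select a cache set but lie above the page offset, namely bits page_s through b + c - 1. A standalone sketch, assuming b and c are the base-2 logarithms of the block size and of the number of sets, and that b + c is less than 32; the function name spurious_dcache_lookup is hypothetical.

```c
#include <stdint.h>

/* Nonzero if the virtual and physical addresses disagree in the bits
   that index the D-cache, so that a lookup started with the virtual
   address would have probed the wrong set. */
int spurious_dcache_lookup(uint32_t va_l, uint32_t pa_l,
                           int b, int c, int page_s)
{
    uint32_t mask;

    if (b + c <= page_s) return 0;      /* index bits lie inside the page offset */
    /* 2^(b+c) - 2^page_s has ones exactly in bit positions page_s .. b+c-1 */
    mask = ((uint32_t)1 << (b + c)) - ((uint32_t)1 << page_s);
    return ((va_l ^ pa_l) & mask) != 0;
}
```

With 64-byte blocks, 512 sets and 8 KB pages (b = 6, c = 9, page_s = 13), bits 13 and 14 must agree; "page coloring" means the operating system picks physical pages so that they do.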


269. The protection bits p_r p_w p_x in a translation cache are shifted five positions right (PROT_OFFSET = 5) from the interrupt codes PR_BIT, PW_BIT, PX_BIT. If the data is protected, we abort the load/store operation immediately; this protects the privacy of other users.

⟨Update DT-cache usage and check the protection bits 269⟩ ≡
    p = use_and_fix(DTcache, p);
    j = PRW_BITS;
    if (((p->data[0].l << PROT_OFFSET) & j) != j) {
        if (data->i == syncd || data->i == syncid) goto sync_check;
        if (data->i != preld && data->i != prest)
            data->interrupt |= j & ~(p->data[0].l << PROT_OFFSET);
        goto fin_ex;
    }
This code is used in sections 268, 270, and 272.
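The protection test in section 269 aligns the three low bits of a translation entry with the interrupt codes by a left shift and then checks that every required bit is granted. The following sketch isolates that arithmetic, using the values PROT_OFFSET = 5, PR_BIT = 1 << 7, PW_BIT = 1 << 6 and PX_BIT = 1 << 5 listed in the mini-index (§54); the function name protection_violation is hypothetical.

```c
#define PROT_OFFSET 5
#define PR_BIT (1 << 7)
#define PW_BIT (1 << 6)
#define PX_BIT (1 << 5)

/* pte_low: low 32 bits of a translation entry, whose three least
   significant bits are the protection code p_r p_w p_x.
   needed: the permission bits an instruction requires (see PRW_BITS).
   Returns 0 if access is permitted, otherwise the interrupt bits to raise. */
unsigned protection_violation(unsigned pte_low, unsigned needed)
{
    unsigned granted = pte_low << PROT_OFFSET;   /* p_r lands on PR_BIT, and so on */

    if ((granted & needed) == needed) return 0;
    return needed & ~granted;                    /* only the missing permissions */
}
```

Bits of the page address that are shifted above bit 7 are harmless, because the result is always masked by needed.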

270. ⟨Do load/store stage 1 without D-cache lookup 270⟩ ≡
    {
        octa *m;

        if (p) {
            ⟨Update DT-cache usage and check the protection bits 269⟩;
            data->z.o = phys_addr(data->y.o, p->data[0]);
            if (data->i >= st && data->i <= syncid) data->state = st_ready;
            else {
                m = write_search(data, data->z.o);
                if (m && m != DUNNO) data->x.o = *m, data->state = ld_ready;
                else data->state = DT_hit;
            }
        } else data->state = DT_miss;
        pass_after(DTcache->access_time); goto passit;
    }
This code is used in section 267.

access_time: int, §167. b: int, §167. bb: int, §167. c: int, §167. cache_search: static cacheblock *( ), §193. data: octa *, §167. data: register control *, §124. Dcache: cache *, §168. demote_and_fix: static cacheblock *( ), §199. DT_hit = 11, §267. DT_miss = 10, §267. DTcache: cache *, §168. DUNNO = macro, §254. fin_ex: label, §144. hit_and_miss = 12, §267. i: internal_opcode, §44. interrupt: unsigned int, §44. j: register int, §12. l: tetra, §17. ld_ready = 13, §267. ldunc = 59, §49. o: octa, §40. octa = struct, §17. p: register cacheblock *, §258. page_s: int, §238. pass_after = macro ( ), §125. passit: label, §134. phys_addr: static octa ( ), §241. PR_BIT = 1 << 7, §54. preld = 61, §49. prest = 62, §49. PROT_OFFSET = 5, §54. PRW_BITS = macro, §266. PW_BIT = 1 << 6, §54. PX_BIT = 1 << 5, §54. q: register cacheblock *, §258. st = 63, §49. st_ready = 14, §267. state: int, §44. sync_check: label, §370. syncd = 64, §49. syncid = 65, §49. use_and_fix: static cacheblock *( ), §196. write_search: static octa *( ), §255. x: specnode, §44. y: spec, §44. z: spec, §44.


271. ⟨Do load/store stage 1 with known physical address 271⟩ ≡
    {
        octa *m;

        if (!(data->loc.h & sign_bit)) {
            if (data->i == syncd || data->i == syncid) goto sync_check;
            if (data->i != preld && data->i != prest) data->interrupt |= N_BIT;
            goto fin_ex;
        }
        data->z.o = data->y.o; data->z.o.h -= sign_bit;
        if (data->i >= st && data->i <= syncid) {
            data->state = st_ready; pass_after(1); goto passit;
        }
        m = write_search(data, data->z.o);
        if (m) {
            if (m == DUNNO) data->state = DT_hit;
            else data->x.o = *m, data->state = ld_ready;
        } else if ((data->z.o.h & 0xffff0000) || !Dcache) {
            if (mem_lock) wait(1);
            set_lock(&mem_locker, mem_lock);
            data->x.o = mem_read(data->z.o);
            data->state = ld_ready;
            startup(&mem_locker, mem_addr_time + mem_read_time);
            pass_after(mem_addr_time + mem_read_time); goto passit;
        }
        if (Dcache->lock || (j = get_reader(Dcache)) < 0) {
            data->state = DT_hit; pass_after(1); goto passit;
        }
        startup(&Dcache->reader[j], Dcache->access_time);
        q = cache_search(Dcache, data->z.o);
        if (q) {
            if (data->i == ldunc) q = demote_and_fix(Dcache, q);
            else q = use_and_fix(Dcache, q);
            data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
            data->state = ld_ready;
        } else data->state = hit_and_miss;
        pass_after(Dcache->access_time); goto passit;
    }
This code is used in section 266.
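Section 271 handles addresses whose sign bit is set: such an address needs no translation, the physical address is obtained by removing the sign bit, and a physical address with any of its top 16 bits set bypasses the D-cache. The sketch below restates those three tests on plain 64-bit integers; the function names are hypothetical, and the privilege check on data->loc is left out.

```c
#include <stdint.h>

#define SIGN_BIT 0x8000000000000000ULL

/* Nonzero if va is a "negative" address, which already names a physical location. */
int is_physical(uint64_t va) { return (va & SIGN_BIT) != 0; }

/* Physical address for a negative virtual address (assumes is_physical(va)). */
uint64_t to_physical(uint64_t va) { return va - SIGN_BIT; }

/* Nonzero if the access goes straight to memory, corresponding to the
   test (data->z.o.h & 0xffff0000) on the high tetrabyte. */
int bypasses_cache(uint64_t pa) { return (pa >> 48) != 0; }
```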


access_time: int, §167. bb: int, §167. cache_search: static cacheblock *( ), §193. data: octa *, §167. data: register control *, §124. Dcache: cache *, §168. demote_and_fix: static cacheblock *( ), §199. DT_hit = 11, §267. DUNNO = macro, §254. fin_ex: label, §144. get_reader: static int ( ), §183. h: tetra, §17. hit_and_miss = 12, §267. i: internal_opcode, §44. interrupt: unsigned int, §44. j: register int, §12. l: tetra, §17. ld_ready = 13, §267. ldunc = 59, §49. loc: octa, §44. lock: lockvar, §167. mem_addr_time: int, §214. mem_lock: lockvar, §214. mem_locker: coroutine, §127. mem_read: octa ( ), §210. mem_read_time: int, §214. N_BIT = 1 << 4, §54. o: octa, §40. octa = struct, §17. pass_after = macro ( ), §125. passit: label, §134. preld = 61, §49. prest = 62, §49. q: register cacheblock *, §258. reader: coroutine *, §167. set_lock = macro ( ), §37. sign_bit = macro, §80. st = 63, §49. st_ready = 14, §267. startup: static void ( ), §31. state: int, §44. sync_check: label, §370. syncd = 64, §49. syncid = 65, §49. use_and_fix: static cacheblock *( ), §196. wait = macro ( ), §125. write_search: static octa *( ), §255. x: specnode, §44. y: spec, §44. z: spec, §44.


272. The program for the second stage is, likewise, rather long-winded, yet quite similar to the cache manipulations we have already seen several times.

Several instructions might be trying to fill the DT-cache for the same page. (A similar situation faced us in the write_from_wbuf coroutine.) The second stage therefore needs to do some translation cache searching just as the first stage did. In this stage, however, we don't go all out for speed, because DT-cache misses are rare.

    #define DT_retry 8    /* second stage state when DT-cache should be searched again */
    #define got_DT 9      /* second stage state when DT-cache entry has been computed */

⟨Special cases for states in later stages 272⟩ ≡
    square_one: data->state = DT_retry;
    case DT_retry: if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
        startup(&DTcache->reader[j], DTcache->access_time);
        p = cache_search(DTcache, trans_key(data->y.o));
        if (p) {
            ⟨Update DT-cache usage and check the protection bits 269⟩;
            data->z.o = phys_addr(data->y.o, p->data[0]);
            if (data->i >= st && data->i <= syncid) data->state = st_ready;
            else data->state = DT_hit;
        } else data->state = DT_miss;
        wait(DTcache->access_time);
    case DT_miss: if (DTcache->filler.next)
            if (data->i == preld || data->i == prest) goto fin_ex; else goto square_one;
        if (no_hardware_PT)
            if (data->i == preld || data->i == prest) goto fin_ex; else goto emulate_virt;
        p = alloc_slot(DTcache, trans_key(data->y.o));
        if (!p) goto square_one;
        data->ptr_b = DTcache->filler_ctl.ptr_b = (void *) p;
        DTcache->filler_ctl.y.o = data->y.o;
        set_lock(self, DTcache->fill_lock);
        startup(&DTcache->filler, 1);
        data->state = got_DT;
        if (data->i == preld || data->i == prest) goto fin_ex; else sleep;
    case got_DT: release_lock(self, DTcache->fill_lock);
        j = PRW_BITS;
        if (((data->z.o.l << PROT_OFFSET) & j) != j) {
            if (data->i == syncd || data->i == syncid) goto sync_check;
            data->interrupt |= j & ~(data->z.o.l << PROT_OFFSET);
            goto fin_ex;
        }
        data->z.o = phys_addr(data->y.o, data->z.o);
        if (data->i >= st && data->i <= syncid) goto finish_store;
            /* otherwise we fall through to ld_retry below */
See also sections 273, 276, 279, 280, 299, 311, 354, 364, and 370.
This code is used in section 135.


273. The second stage might also want to fill the D-cache (and perhaps the S-cache) as we get the data.

Several load instructions might be trying to fill the same cache block. So we should go back and look in the D-cache again if we miss and cannot allocate a slot immediately.

A PRELD or PREST instruction, which is just a "hint," doesn't do anything more if the caches are already busy.

⟨Special cases for states in later stages 272⟩ +≡
    ld_retry: data->state = DT_hit;
    case DT_hit: if (data->i == preld || data->i == prest) goto fin_ex;
        ⟨Check for a hit in pending writes 278⟩;
        if ((data->z.o.h & 0xffff0000) || !Dcache)
            ⟨Do load/store stage 2 without D-cache lookup 277⟩;
        if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
        startup(&Dcache->reader[j], Dcache->access_time);
        q = cache_search(Dcache, data->z.o);
        if (q) {
            if (data->i == ldunc) q = demote_and_fix(Dcache, q);
            else q = use_and_fix(Dcache, q);
            data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
            data->state = ld_ready;
        } else data->state = hit_and_miss;
        wait(Dcache->access_time);
    case hit_and_miss: if (data->i == ldunc) goto avoid_D;
        ⟨Try to get the contents of location data->z.o in the D-cache 274⟩;

access_time: int, §167. alloc_slot: static cacheblock *( ), §205. avoid_D: label, §277. bb: int, §167. cache_search: static cacheblock *( ), §193. data: octa *, §167. data: register control *, §124. Dcache: cache *, §168. demote_and_fix: static cacheblock *( ), §199. DT_hit = 11, §267. DT_miss = 10, §267. DTcache: cache *, §168. emulate_virt: label, §310. fill_lock: lockvar, §167. filler: coroutine, §167. filler_ctl: control, §167. fin_ex: label, §144. finish_store: label, §280. get_reader: static int ( ), §183. h: tetra, §17. hit_and_miss = 12, §267. i: internal_opcode, §44. interrupt: unsigned int, §44. j: register int, §12. l: tetra, §17. ld_ready = 13, §267. ldunc = 59, §49. lock: lockvar, §167. next: coroutine *, §23. no_hardware_PT: bool, §242. o: octa, §40. p: register cacheblock *, §258. phys_addr: static octa ( ), §241. preld = 61, §49. prest = 62, §49. PROT_OFFSET = 5, §54. PRW_BITS = macro, §266. ptr_b: void *, §44. q: register cacheblock *, §258. reader: coroutine *, §167. release_lock = macro ( ), §37. self: register coroutine *, §124. set_lock = macro ( ), §37. sleep = macro, §125. st = 63, §49. st_ready = 14, §267. startup: static void ( ), §31. state: int, §44. sync_check: label, §370. syncd = 64, §49. syncid = 65, §49. trans_key = macro ( ), §240. use_and_fix: static cacheblock *( ), §196. wait = macro ( ), §125. write_from_wbuf = 92, §129. x: specnode, §44. y: spec, §44. z: spec, §44.


274. ⟨Try to get the contents of location data->z.o in the D-cache 274⟩ ≡
  ⟨Check for prest with a fully spanned cache block 275⟩;
  if (Dcache->filler.next) goto ld_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto ld_retry;
  q = alloc_slot(Dcache, data->z.o);
  if (!q) goto ld_retry;
  if (Scache) set_lock(&Dcache->filler, Scache->lock)
  else set_lock(&Dcache->filler, mem_lock);
  set_lock(self, Dcache->fill_lock);
  data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) q;
  Dcache->filler_ctl.z.o = data->z.o;
  startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = ld_ready;
  if (data->i == preld || data->i == prest) goto fin_ex; else sleep;

This code is used in section 273.

275. If a prest instruction makes it to the hot seat, we have been assured by the user of PREST that the current values of bytes in virtual addresses data->y.o - (data->xx & -Dcache->bb) through data->y.o + (data->xx & (Dcache->bb - 1)) are irrelevant. Hence we can pretend that we know they are zero. This is advantageous if it saves us from filling a cache block from the S-cache or from memory.

⟨Check for prest with a fully spanned cache block 275⟩ ≡
  if (data->i == prest &&
      (data->xx >= Dcache->bb || ((data->y.o.l & (Dcache->bb - 1)) == 0)) &&
      ((data->y.o.l + (data->xx & (Dcache->bb - 1)) + 1) ^ data->y.o.l) >= Dcache->bb)
    goto prest_span;

This code is used in section 274.
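The two-part test of section 275 can be tried in isolation. The following is a minimal sketch, not part of MMIX-PIPE: the function name `prest_spans_block` and its flat integer parameters are assumptions made for illustration, standing in for data->y.o.l, data->xx, and Dcache->bb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMIX-PIPE: returns true when the test
   of section 275 would jump to prest_span.
   y_lo stands for data->y.o.l, xx for data->xx, bb for Dcache->bb. */
bool prest_spans_block(uint32_t y_lo, uint8_t xx, uint32_t bb)
{
    assert(bb != 0 && (bb & (bb - 1)) == 0);   /* block size is a power of two */

    /* Clause 1: the count reaches a whole block, or the address is
       aligned on a block boundary. */
    bool long_or_aligned = xx >= bb || (y_lo & (bb - 1)) == 0;

    /* Clause 2: adding the in-block part of the count, plus one, changes
       a bit at or above the block-size bit. */
    uint32_t end = y_lo + (xx & (bb - 1)) + 1;
    bool reaches_boundary = (end ^ y_lo) >= bb;

    return long_or_aligned && reaches_boundary;
}
```

With 64-byte blocks, an aligned address with xx = 63 passes the test, while xx = 62 falls one byte short and does not.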

276. ⟨Special cases for states in later stages 272⟩ +≡
prest_span: data->state = prest_win;
case prest_win: if (data != old_hot || Dlocker.next) wait(1);
  if (Dcache->lock) goto fin_ex;
  q = alloc_slot(Dcache, data->z.o);  /* OK if Dcache->filler is busy */
  if (q) {
    clean_block(Dcache, q);
    q->tag = data->z.o; q->tag.l &= -Dcache->bb;
    set_lock(&Dlocker, Dcache->lock);
    startup(&Dlocker, Dcache->copy_in_time);
  }
  goto fin_ex;

277. ⟨Do load/store stage 2 without D-cache lookup 277⟩ ≡
  {
  avoid_D: if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    data->x.o = mem_read(data->z.o);
    data->state = ld_ready; wait(mem_addr_time + mem_read_time);
  }

This code is used in section 273.


278. ⟨Check for a hit in pending writes 278⟩ ≡
  {
    octa *m = write_search(data, data->z.o);

    if (m == DUNNO) wait(1);
    if (m) {
      data->x.o = *m;
      data->state = ld_ready;
      wait(1);
    }
  }

This code is used in section 273.



279. The requested octabyte will arrive sooner or later in data->x.o. Then a load instruction is almost done, except that we might need to massage the input a little bit.

⟨Special cases for states in later stages 272⟩ +≡
case ld_ready: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (data->i >= st) goto finish_store;
  switch (data->op >> 1) {
  case LDB >> 1: case LDBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_ld;
  case LDW >> 1: case LDWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_ld;
  case LDT >> 1: case LDTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
  fin_ld: data->x.o = shift_right(shift_left(data->x.o, j), i, data->op & 0x2);
  default: goto fin_ex;
  case LDHT >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
    data->x.o.l = 0; goto fin_ex;
  case LDSF >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
    if ((data->x.o.h & 0x7f800000) == 0 && (data->x.o.h & 0x7fffff)) {
      data->x.o = load_sf(data->x.o.h);
      data->state = 3; wait(denin_penalty);
    }
    else data->x.o = load_sf(data->x.o.h); goto fin_ex;
  case LDPTP >> 1:
    if ((data->x.o.h & sign_bit) == 0 || (data->x.o.l & 0x1ff8) != page_n)
      data->x.o = zero_octa;
    else data->x.o.l &= -(1 << 13);
    goto fin_ex;
  case LDPTE >> 1: if ((data->x.o.l & 0x1ff8) != page_n) data->x.o = zero_octa;
    else data->x.o = incr(oandn(data->x.o, page_mask), data->x.o.l & 0x7);
    data->x.o.h &= 0xffff; goto fin_ex;
  case UNSAVE >> 1: ⟨Handle an internal UNSAVE when it's time to load 336⟩;
  }
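The fin_ld step of section 279 isolates a byte, wyde, or tetrabyte by shifting left to discard the bytes in front of the field and then shifting right to discard the bytes behind it. Here is a minimal sketch of that idea on a plain 64-bit integer. The name `extract_field` and its parameters are assumptions for illustration; the simulator itself works on its octa structure through shift_left and shift_right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick a 1-, 2-, or 4-byte field out of the octabyte
   that contains address addr. MMIX is big-endian, so offset 0 within the
   octabyte is the most significant byte. */
uint64_t extract_field(uint64_t octa, uint64_t addr, unsigned size, bool is_unsigned)
{
    assert(size == 1 || size == 2 || size == 4);

    unsigned j = (unsigned)(addr & (8 - size)) << 3;  /* left shift: drop bytes before the field */
    unsigned i = 64 - 8 * size;                        /* right shift: 56, 48, or 32 */

    uint64_t v = (octa << j) >> i;                     /* logical shift, zero-filled */

    /* Sign-extend by hand, since right-shifting a negative signed value
       is implementation-defined in C. */
    if (!is_unsigned && ((v >> (8 * size - 1)) & 1))
        v |= ~UINT64_C(0) << (8 * size);
    return v;
}
```

The masks 8 - size give 7, 6, and 4, matching the constants used for the LDB, LDW, and LDT cases.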

280. ⟨Special cases for states in later stages 272⟩ +≡
finish_store: data->state = st_ready;
case st_ready: switch (data->i) {
  case st: case pst: ⟨Finish a store command 281⟩;
  case syncd: data->b.o.l = (Dcache ? Dcache->bb : 8192); goto do_syncd;
  case syncid: data->b.o.l = (Icache ? Icache->bb : 8192);
    if (Dcache && Dcache->bb < data->b.o.l) data->b.o.l = Dcache->bb;
    goto do_syncid;
  }

281. Store instructions have an extra complication, because some of them need to check for overflow.

⟨Finish a store command 281⟩ ≡
  data->x.addr = data->z.o;
  if (data->b.p) wait(1);
  switch (data->op >> 1) {
  case STUNC >> 1: data->i = stunc;
  default: data->x.o = data->b.o; goto fin_ex;
  case STSF >> 1: data->b.o.h = store_sf(data->b.o);
    data->interrupt |= exceptions;
    if ((data->b.o.h & 0x7f800000) == 0 && (data->b.o.h & 0x7fffff)) {
      if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
      else data->x.o.h = data->b.o.h;
      data->state = 3; wait(denout_penalty);
    }
  case STHT >> 1: if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
    else data->x.o.h = data->b.o.h;
    goto fin_ex;
  case STB >> 1: case STBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_st;
  case STW >> 1: case STWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_st;
  case STT >> 1: case STTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
  fin_st: ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic exceptions if signed 282⟩;
    goto fin_ex;
  case CSWAP >> 1: ⟨Finish a CSWAP 283⟩;
  case SAVE >> 1: ⟨Handle an internal SAVE when it's time to store 342⟩;
  }

This code is used in section 280.



282. ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic exceptions if signed 282⟩ ≡
  {
    octa mask;

    if (!(data->op & 2)) { octa before, after;
      before = data->b.o; after = shift_right(shift_left(data->b.o, i), i, 0);
      if (before.l != after.l || before.h != after.h) data->interrupt |= V_BIT;
    }
    mask = shift_right(shift_left(neg_one, i), j, 1);
    data->b.o = shift_right(shift_left(data->b.o, i), j, 1);
    data->x.o.h ^= mask.h & (data->x.o.h ^ data->b.o.h);
    data->x.o.l ^= mask.l & (data->x.o.l ^ data->b.o.l);
  }

This code is used in section 281.
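The merge in section 282 relies on the identity x ^ (mask & (x ^ b)), which keeps the bits of x outside the mask and takes the bits of b inside it. The sketch below shows the same two steps, the merge and the signed overflow test, on plain 64-bit integers. The names `insert_field` and `signed_store_overflows` are assumptions for illustration and do not appear in MMIX-PIPE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: replace the size-byte field at address addr inside
   the octabyte old with the low-order bytes of value. */
uint64_t insert_field(uint64_t old, uint64_t value, uint64_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    unsigned j = (unsigned)(addr & (8 - size)) << 3;  /* bit offset of the field from the left */
    unsigned i = 64 - 8 * size;                        /* 56, 48, or 32 */

    uint64_t mask = (~UINT64_C(0) << i) >> j;          /* ones exactly over the field */
    uint64_t b    = (value << i) >> j;                 /* field moved into place */
    return old ^ (mask & (old ^ b));                   /* old outside the mask, b inside */
}

/* Hypothetical helper: true if value does not fit in a signed field of
   size bytes, i.e. sign-extending the truncated field changes the value. */
bool signed_store_overflows(uint64_t value, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    unsigned bits = 8 * size;
    uint64_t low = value & ((UINT64_C(1) << bits) - 1);   /* truncated field */
    if ((low >> (bits - 1)) & 1)
        low |= ~UINT64_C(0) << bits;                       /* sign-extend */
    return low != value;
}
```

The overflow helper corresponds to the comparison of before and after, which the listing performs only when bit 1 of the opcode is clear (the signed store variants).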

283. The CSWAP operation has four inputs ($X, $Y, $Z, rP) as well as three outputs ($X, M8[A], rP). To keep from exceeding the capacity of the control blocks in our pipeline, we wait until this instruction reaches the hot seat, thereby allowing us nonspeculative access to rP.

⟨Finish a CSWAP 283⟩ ≡
  if (data != old_hot) wait(1);
  if (data->x.o.h == g[rP].o.h && data->x.o.l == g[rP].o.l) {
    data->a.o.l = 1;  /* data->a.o.h is zero */
    data->x.o = data->b.o;
  } else {
    g[rP].o = data->x.o;  /* data->a.o is zero */
    if (verbose & issue_bit) {
      printf(" setting rP="); print_octa(g[rP].o); printf("\n");
    }
  }
  data->i = cswap;  /* cosmetic change, affects the trace output only */
  goto fin_ex;

This code is used in section 281.
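Stripped of pipeline bookkeeping, the architectural effect of section 283 is an ordinary compare-and-swap against rP. The following is a rough model under that reading. The function `cswap_model` and its pointer parameters are assumptions for illustration: mem stands for M8[A], rP for the prediction register, and new_value for the old contents of $X.

```c
#include <stdint.h>

/* Hypothetical model of the architectural effect only. The return value
   is what ends up in $X: 1 on success, 0 on failure. */
uint64_t cswap_model(uint64_t *mem, uint64_t *rP, uint64_t new_value)
{
    if (*mem == *rP) {
        *mem = new_value;   /* prediction was right: store the new value */
        return 1;
    }
    *rP = *mem;             /* prediction was wrong: remember what memory holds */
    return 0;
}
```

In the listing, data->x.o plays both roles: it arrives holding the loaded octabyte and leaves holding the value to be stored, while data->a.o carries the 0 or 1 result.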


284. The fetch stage. Now that we've mastered the most difficult memory operations, we can relax and apply our knowledge to the slightly simpler task of filling the fetch buffer. Fetching is like loading/storing, except that we use the I-cache instead of the D-cache. It's slightly simpler because the I-cache is read-only. Further simplifications would be possible if there were no PREGO instruction, because there is only one fetch unit. However, we want to implement PREGO with reasonable efficiency, in order to see if that instruction is worthwhile; so we include the complications of simultaneous I-cache and IT-cache readers, which we have already implemented for the D-cache and DT-cache.

The fetch coroutine is always present, as the one and only coroutine with stage number zero.

In normal circumstances, the fetch coroutine accesses a cache block containing the instruction whose virtual address is given by inst_ptr (the instruction pointer), and transfers up to fetch_max instructions from that block to the fetch buffer. Complications arise if the instruction isn't in the cache, or if we can't translate the virtual address because of a miss in the IT-cache. Moreover, inst_ptr is a spec variable whose value might not even be known; if inst_ptr.p is nonnull, we don't know what to fetch.

⟨External variables 4⟩ +≡
  Extern spec inst_ptr;  /* the instruction pointer (aka program counter) */
  Extern octa *fetched;  /* buffer for incoming instructions */

285. The fetch coroutine usually begins a cycle in state fetch_ready, with the most recently fetched octabytes in positions fetch_lo, fetch_lo + 1, ..., fetch_hi - 1 of a buffer called fetched. Once that buffer has been exhausted, the coroutine reverts to state 0; with luck, the buffer might have more data by the time the next cycle rolls around.

⟨Global variables 20⟩ +≡
  int fetch_lo, fetch_hi;  /* the active region of that buffer */
  coroutine fetch_co;
  control fetch_ctl;



286. ⟨Initialize everything 22⟩ +≡
  fetch_co.ctl = &fetch_ctl;
  fetch_co.name = "Fetch";
  fetch_ctl.go.o.l = 4;
  startup(&fetch_co, 1);

287. ⟨Restart the fetch coroutine 287⟩ ≡
  if (fetch_co.lockloc) *(fetch_co.lockloc) = NULL, fetch_co.lockloc = NULL;
  unschedule(&fetch_co);
  startup(&fetch_co, 1);

This code is used in sections 85, 160, 308, 309, and 316.

288. Some of the actions here are done not only by the fetcher but also by the first and second stages of a prego operation.

#define wait_or_pass(t) if (data->i == prego) { pass_after(t); goto passit; } else wait(t)

⟨Simulate an action of the fetch coroutine 288⟩ ≡
switch0: switch (data->state) {
  new_fetch: data->state = 0;
  case 0: ⟨Wait, if necessary, until the instruction pointer is known 290⟩;
    data->y.o = inst_ptr.o;
    data->state = 1; data->interrupt = 0; data->x.o = data->z.o = zero_octa;
  case 1:
  start_fetch: if (data->y.o.h & sign_bit) ⟨Begin fetch with known physical address 296⟩;
    if (page_bad) goto bad_fetch;
    if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    ⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩;
    wait_or_pass(ITcache->access_time);
  ⟨Other cases for the fetch coroutine 298⟩
  }

This code is used in section 125.

289. ⟨Handle special cases for operations like prego and ldvts 289⟩ ≡
  if (data->i == prego) goto start_fetch;

See also section 352.

This code is used in section 266.

290. ⟨Wait, if necessary, until the instruction pointer is known 290⟩ ≡
  if (inst_ptr.p) {
    if (inst_ptr.p != UNKNOWN_SPEC && inst_ptr.p->known)
      inst_ptr.o = inst_ptr.p->o, inst_ptr.p = NULL;
    wait(1);
  }

This code is used in section 288.


291.
#define got_IT 19  /* state when IT-cache entry has been computed */
#define IT_miss 20  /* state when IT-cache doesn't hold the key */
#define IT_hit 21  /* state when physical instruction address is known */
#define Ihit_and_miss 22  /* state when I-cache misses */
#define fetch_ready 23  /* state when instructions have been read */
#define got_one 24  /* state when a "preview" octabyte is ready */

⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ ≡
  p = cache_search(ITcache, trans_key(data->y.o));
  if (!Icache || Icache->lock || (j = get_reader(Icache)) < 0)
    ⟨Begin fetch without I-cache lookup 295⟩;
  startup(&Icache->reader[j], Icache->access_time);
  if (p) ⟨Do a simultaneous lookup in the I-cache 292⟩
  else data->state = IT_miss;

This code is used in section 288.



292. We assume that it is possible to look up a virtual address in the IT-cache at the same time as we look for a corresponding physical address in the I-cache, provided that the lower b + c bits of the two addresses are the same. (See the remarks about "page coloring," when we made similar assumptions about the DT-cache and D-cache.)

⟨Do a simultaneous lookup in the I-cache 292⟩ ≡
  {
    ⟨Update IT-cache usage and check the protection bits 293⟩;
    data->z.o = phys_addr(data->y.o, p->data[0]);
    if (Icache->b + Icache->c > page_s &&
        ((data->y.o.l ^ data->z.o.l) & ((Icache->bb << Icache->c) - (1 << page_s))))
      data->state = IT_hit;  /* spurious I-cache lookup */
    else {
      q = cache_search(Icache, data->z.o);
      if (q) {
        q = use_and_fix(Icache, q);
        ⟨Copy the data from block q to fetched 294⟩;
        data->state = fetch_ready;
      } else data->state = Ihit_and_miss;
    }
    wait_or_pass(max(ITcache->access_time, Icache->access_time));
  }

This code is used in section 291.
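The "spurious lookup" condition of section 292 asks whether the virtual and physical addresses disagree in any bit that lies above the page offset but still within the low b + c bits. A minimal sketch of that test follows. The name `lookup_is_spurious` and the flat parameters are assumptions for illustration: b and c stand for Icache->b and Icache->c, so that 1 << (b + c) equals Icache->bb << Icache->c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a lookup started with the virtual address
   y_lo cannot be trusted once the physical address z_lo is known. */
bool lookup_is_spurious(uint32_t y_lo, uint32_t z_lo, int b, int c, int page_s)
{
    if (b + c <= page_s)
        return false;   /* all of the low b+c bits lie inside the page offset */

    /* Ones in bit positions page_s through b+c-1. */
    uint32_t mask = (UINT32_C(1) << (b + c)) - (UINT32_C(1) << page_s);
    return ((y_lo ^ z_lo) & mask) != 0;
}
```

For example, with b = 6, c = 8, and 8192-byte pages, only bit 13 matters, so two addresses that differ just in bit 14 still allow the simultaneous lookup.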

293. ⟨Update IT-cache usage and check the protection bits 293⟩ ≡
  p = use_and_fix(ITcache, p);
  if (!(p->data[0].l & (PX_BIT >> PROT_OFFSET))) goto bad_fetch;

This code is used in sections 292 and 295.

294. At this point inst_ptr.o equals data->y.o.

⟨Copy the data from block q to fetched 294⟩ ≡
  if (data->i != prego) {
    for (j = 0; j < Icache->bb; j++) fetched[j] = q->data[j];
    fetch_lo = (inst_ptr.o.l & (Icache->bb - 1)) >> 3;
    fetch_hi = Icache->bb >> 3;
  }

This code is used in sections 292 and 296.

295. ⟨Begin fetch without I-cache lookup 295⟩ ≡
  {
    if (p) {
      ⟨Update IT-cache usage and check the protection bits 293⟩;
      data->z.o = phys_addr(data->y.o, p->data[0]);
      data->state = IT_hit;
    } else data->state = IT_miss;
    wait_or_pass(ITcache->access_time);
  }

This code is used in section 291.


296. ⟨Begin fetch with known physical address 296⟩ ≡
  {
    if (data->i == prego && !(data->loc.h & sign_bit)) goto fin_ex;
    data->z.o = data->y.o; data->z.o.h -= sign_bit;
  known_phys: if (data->z.o.h & 0xffff0000) goto bad_fetch;
    if (!Icache) ⟨Read from memory into fetched 297⟩;
    if (Icache->lock || (j = get_reader(Icache)) < 0) {
      data->state = IT_hit; wait_or_pass(1);
    }
    startup(&Icache->reader[j], Icache->access_time);
    q = cache_search(Icache, data->z.o);
    if (q) {
      q = use_and_fix(Icache, q);
      ⟨Copy the data from block q to fetched 294⟩;
      data->state = fetch_ready;
    } else data->state = Ihit_and_miss;
    wait_or_pass(Icache->access_time);
  }

This code is used in section 288.



297. ⟨Read from memory into fetched 297⟩ ≡
  { octa addr;

    addr = data->z.o;
    if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    addr.l &= -(bus_words << 3);
    fetched[0] = mem_read(addr);
    for (j = 1; j < bus_words; j++)
      fetched[j] = mem_hash[last_h].chunk[((addr.l & 0xffff) >> 3) + j];
    fetch_lo = (data->z.o.l >> 3) & (bus_words - 1); fetch_hi = bus_words;
    data->state = fetch_ready;
    wait(mem_addr_time + mem_read_time);
  }

This code is used in section 296.

298. ⟨Other cases for the fetch coroutine 298⟩ ≡
case IT_miss: if (ITcache->filler.next)
    if (data->i == prego) goto fin_ex; else wait(1);
  if (no_hardware_PT) ⟨Insert dummy instruction for page table emulation 302⟩;
  p = alloc_slot(ITcache, trans_key(data->y.o));
  if (!p)  /* hey, it was present after all */
    if (data->i == prego) goto fin_ex; else goto new_fetch;
  data->ptr_b = ITcache->filler_ctl.ptr_b = (void *) p;
  ITcache->filler_ctl.y.o = data->y.o;
  set_lock(self, ITcache->fill_lock);
  startup(&ITcache->filler, 1);
  data->state = got_IT;
  if (data->i == prego) goto fin_ex; else sleep;
case got_IT: release_lock(self, ITcache->fill_lock);
  if (!(data->z.o.l & (PX_BIT >> PROT_OFFSET))) goto bad_fetch;
  data->z.o = phys_addr(data->y.o, data->z.o);
fetch_retry: data->state = IT_hit;
case IT_hit: if (data->i == prego) goto fin_ex; else goto known_phys;
case Ihit_and_miss: ⟨Try to get the contents of location data->z.o in the I-cache 300⟩;

See also section 301.

This code is used in section 288.

299. ⟨Special cases for states in later stages 272⟩ +≡
case IT_miss: case Ihit_and_miss: case IT_hit: case fetch_ready: goto switch0;


300. ⟨Try to get the contents of location data->z.o in the I-cache 300⟩ ≡
  if (Icache->filler.next) goto fetch_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto fetch_retry;
  q = alloc_slot(Icache, data->z.o);
  if (!q) goto fetch_retry;
  if (Scache) set_lock(&Icache->filler, Scache->lock)
  else set_lock(&Icache->filler, mem_lock);
  set_lock(self, Icache->fill_lock);
  data->ptr_b = Icache->filler_ctl.ptr_b = (void *) q;
  Icache->filler_ctl.z.o = data->z.o;
  startup(&Icache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = got_one;
  if (data->i == prego) goto fin_ex; else sleep;

This code is used in section 298.



301. The I-cache filler will wake us up with the octabyte we want, before it has filled the entire cache block. In that case we can fetch one or two instructions before the rest of the block has been loaded.

⟨Other cases for the fetch coroutine 298⟩ +≡
bad_fetch: if (data->i == prego) goto fin_ex;
  data->interrupt |= PX_BIT;
swym_one: fetched[0].h = fetched[0].l = SWYM << 24;
  goto fetch_one;
case got_one: fetched[0] = data->x.o;  /* a "preview" of the new cache data */
fetch_one: fetch_lo = 0; fetch_hi = 1;
  data->state = fetch_ready;
case fetch_ready: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (data->i == prego) goto fin_ex;
  for (j = 0; j < fetch_max; j++) {
    register fetch *new_tail;

    if (tail == fetch_bot) new_tail = fetch_top;
    else new_tail = tail - 1;
    if (new_tail == head) break;  /* fetch buffer is full */
    ⟨Install a new instruction into the tail position 304⟩;
    tail = new_tail;
    if (sleepy) {
      sleepy = false; sleep;
    }
    inst_ptr.o = incr(inst_ptr.o, 4);
    if (fetch_lo == fetch_hi) goto new_fetch;
  }
  wait(1);

302. ⟨Insert dummy instruction for page table emulation 302⟩ ≡
  {
    if (cache_search(ITcache, trans_key(inst_ptr.o))) goto new_fetch;
    data->interrupt |= F_BIT;
    sleepy = true;
    goto swym_one;
  }

This code is used in section 298.

303. ⟨Global variables 20⟩ +≡
  bool sleepy;  /* have we just emitted the page table emulation call? */


304. At this point we check for egregiously invalid instructions. (Sometimes the dispatcher will actually allow such instructions to occupy the fetch buffer, for internally generated commands.)

⟨Install a new instruction into the tail position 304⟩ ≡
  tail->loc = inst_ptr.o;
  if (inst_ptr.o.l & 4) tail->inst = fetched[fetch_lo++].l;
  else tail->inst = fetched[fetch_lo].h;
  tail->interrupt = data->interrupt;
  i = tail->inst >> 24;
  if (i >= RESUME && i <= SYNC && (tail->inst & bad_inst_mask[i - RESUME]))
    tail->interrupt |= B_BIT;
  tail->noted = false;
  if (inst_ptr.o.l == breakpoint.l && inst_ptr.o.h == breakpoint.h) breakpoint_hit = true;

This code is used in section 301.
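Section 304 takes two instructions from each fetched octabyte: the high tetrabyte when the address is a multiple of 8, then the low tetrabyte, after which fetch_lo moves on. The sketch below isolates that stepping pattern on plain integers. The function `next_instruction` and its parameters are assumptions for illustration, and it folds in the inst_ptr increment that the listing performs separately in section 301.

```c
#include <stdint.h>

/* Hypothetical helper: return the 32-bit instruction at address *pc from a
   buffer of octabytes, then advance *pc by 4. *lo indexes the current
   octabyte and moves on only after its low half has been used. */
uint32_t next_instruction(const uint64_t *fetched, int *lo, uint64_t *pc)
{
    uint32_t inst;
    if (*pc & 4)
        inst = (uint32_t)fetched[(*lo)++];        /* low tetrabyte, then next octabyte */
    else
        inst = (uint32_t)(fetched[*lo] >> 32);    /* high tetrabyte, index unchanged */
    *pc += 4;
    return inst;
}
```

Calling it repeatedly walks through the buffer in address order, which is why the fetch loop can stop as soon as fetch_lo reaches fetch_hi.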

305. The commands RESUME, SAVE, UNSAVE, and SYNC should not have nonzero bits in the positions defined here.

⟨Global variables 20⟩ +≡
  int bad_inst_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};
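The mask table of section 305 can be exercised on its own. This is a small sketch of the check made in section 304; the function name `has_forbidden_bits` is an assumption for illustration, while the opcode values 0xf9 through 0xfc are the MMIX codes for RESUME, SAVE, UNSAVE, and SYNC.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESUME 0xf9u   /* MMIX opcode values; SAVE is 0xfa, UNSAVE is 0xfb */
#define SYNC   0xfcu

static const uint32_t bad_inst_mask[4] = { 0xfffffe, 0xffff, 0xffff00, 0xfffff8 };

/* Hypothetical helper: true if inst is a RESUME, SAVE, UNSAVE, or SYNC
   with a nonzero bit in a position that the mask table forbids. */
bool has_forbidden_bits(uint32_t inst)
{
    uint32_t op = inst >> 24;
    return op >= RESUME && op <= SYNC && (inst & bad_inst_mask[op - RESUME]) != 0;
}
```

Reading the masks: RESUME may use only its lowest bit, SAVE only its X byte, UNSAVE only its X and Z bytes, and SYNC only its three lowest bits.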



306. Interrupts. The scariest thing about the design of a pipelined machine is the existence of interrupts, which disrupt the smooth flow of a computation in ways that are difficult to anticipate. Fortunately, however, the discipline of a reorder buffer, which forces instructions to be committed in order, allows us to deal with interrupts in a fairly natural way. Our solution to the problems of dynamic scheduling and speculative execution therefore solves the interrupt problem as well.

MMIX has three kinds of interrupts, which show up as bit codes in the interrupt field when an instruction is ready to be committed: H_BIT invokes a trip handler, for TRIP instructions and arithmetic exceptions; F_BIT invokes a forced-trap handler, for TRAP instructions and unimplemented instructions that need to be emulated in software; E_BIT invokes a dynamic-trap handler, for external interrupts like I/O signals or for internal interrupts caused by improper instructions. In all three cases, the pipeline control has already been redirected to fetch new instructions starting at the correct handler address by the time an interrupted instruction is ready to be committed.

307. Most instructions come to the following part of the program, if they have finished execution with any 1s among the eight trip bits or the eight trap bits.

If the trip bits aren't all zero, we want to update the event bits of rA, or perform an enabled trip handler, or both. If the trap bits are nonzero, we need to hold onto them until we get to the hot seat, when they will be joined with the bits of rQ and probably cause an interrupt. A load or store instruction with nonzero trap bits will be nullified, not committed.

Underflow that is exact and not enabled is ignored, in accordance with the IEEE standard conventions. (This applies also to underflow triggered by RESUME_SET.)

#define is_load_store(i) (i >= ld && i <= cswap)

⟨Handle interrupt at end of execution stage 307⟩ ≡
  {
    if ((data->interrupt & 0xff) && is_load_store(data->i)) goto state_5;
    j = data->interrupt & 0xff00;
    data->interrupt -= j;
    if ((j & (U_BIT + X_BIT)) == U_BIT && !(data->ra.o.l & U_BIT)) j &= ~U_BIT;
    data->arith_exc = (j & ~data->ra.o.l) >> 8;
    if (j & data->ra.o.l) ⟨Prepare for exceptional trip handler 308⟩;
    if (data->interrupt & 0xff) goto state_5;
  }

This code is used in section 144.


308. Since execution is speculative, an exceptional condition might not be part of the "real" computation. Indeed, the present coroutine might have already been deissued.

⟨Prepare for exceptional trip handler 308⟩ ≡
  {
    i = issued_between(data, cool);
    if (i < deissues) goto die;
    deissues = i;
    old_tail = tail = head; resuming = 0;  /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    cool_hist = data->hist;
    for (i = j & data->ra.o.l, m = 16; !(i & D_BIT); i <<= 1, m += 16) ;
    data->go.o.h = 0, data->go.o.l = m;
    inst_ptr.o = data->go.o, inst_ptr.p = NULL;
    data->interrupt |= H_BIT;
    goto state_4;
  }

This code is used in section 307.
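Taken together, sections 307 and 308 sort the eight trip bits into events that are merely recorded in rA and, if any raised bit is enabled, the address of the trip handler to run. The following sketch reproduces only that arithmetic, without any pipeline bookkeeping. The struct `trip_outcome` and the function `classify_trips` are assumptions for illustration; the bit positions follow the definitions X_BIT = 1 << 8, U_BIT = 1 << 10, and D_BIT = 1 << 15 used by the simulator.

```c
#include <stdint.h>

#define X_BIT (1u << 8)     /* floating inexact */
#define U_BIT (1u << 10)    /* floating underflow */
#define D_BIT (1u << 15)    /* integer divide check */

typedef struct {
    uint32_t arith_exc;   /* event bits to be recorded in rA */
    uint32_t handler;     /* trip handler address, or 0 if no trip is taken */
} trip_outcome;

/* Hypothetical helper: interrupt carries the trip bits in positions 15..8,
   and ra is the low tetrabyte of rA, whose bits 15..8 are the enables. */
trip_outcome classify_trips(uint32_t interrupt, uint32_t ra)
{
    trip_outcome out = { 0, 0 };
    uint32_t j = interrupt & 0xff00;

    /* Exact underflow that is not enabled is ignored. */
    if ((j & (U_BIT | X_BIT)) == U_BIT && !(ra & U_BIT))
        j &= ~U_BIT;

    out.arith_exc = (j & ~ra) >> 8;   /* trips that are not enabled become event bits */

    uint32_t enabled = j & ra;
    if (enabled) {
        /* Scan from D_BIT downward; handlers sit 16 bytes apart. */
        uint32_t m = 16;
        while (!(enabled & D_BIT)) {
            enabled <<= 1;
            m += 16;
        }
        out.handler = m;
    }
    return out;
}
```

The leftmost enabled bit wins, so an enabled divide check leads to address 16 and an enabled inexact exception, the rightmost of the eight, leads to address 128.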

309. ⟨Prepare to emulate the page translation 309⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;  /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  cool_hist = data->hist;
  inst_ptr.p = UNKNOWN_SPEC;
  data->interrupt |= F_BIT;

This code is used in section 310.



310. We need to stop dispatching when calling a trip handler from within the reorder buffer, lest we issue an instruction that uses g[255] or rB as an operand.

⟨Special cases for states in the first stage 266⟩ +≡
  emulate_virt: ⟨Prepare to emulate the page translation 309⟩;
  state_4: data->state = 4;
  case 4: if (dispatch_lock) wait(1);
    set_lock(self, dispatch_lock);
  state_5: data->state = 5;
  case 5: if (data != old_hot) wait(1);
    if ((data->interrupt & F_BIT) && data->i != trap) {
      inst_ptr.o = g[rT].o, inst_ptr.p = NULL;
      if (is_load_store(data->i)) nullifying = true;
    }
    if (data->interrupt & 0xff) {
      g[rQ].o.h |= data->interrupt & 0xff;
      new_Q.h |= data->interrupt & 0xff;
      if (verbose & issue_bit) {
        printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
      }
    }
    goto die;

311. The instructions of the previous section appear in the switch for coroutine stage 1 only. We need to use them also in later stages.

⟨Special cases for states in later stages 272⟩ +≡
  case 4: goto state_4;
  case 5: goto state_5;

312. ⟨Special cases of instruction dispatch 117⟩ +≡
  case trap: if ((flags[op] & X_is_dest_bit) && cool->xx < cool_G && cool->xx >= cool_L)
      goto increase_L;
    if (!g[rT].up->known) goto stall;
    inst_ptr = specval(&g[rT]);  /* traps and emulated ops */
    cool->x.o = inst_ptr.o;
    cool->need_b = true, cool->b = specval(&g[255]);
  case trip: cool->ren_x = true, spec_install(&g[255], &cool->x);
    cool->x.known = true;
    if (i == trip) cool->x.o = cool->go.o = zero_octa;
    cool->ren_a = true, spec_install(&g[i == trap ? rBB : rB], &cool->a); break;

313. ⟨Cases for stage 1 execution 155⟩ +≡
  case trap: data->interrupt |= F_BIT; data->a.o = data->b.o; goto fin_ex;
  case trip: data->interrupt |= H_BIT; data->a.o = data->b.o; goto fin_ex;


314. The following check is performed at the beginning of every cycle. An instruction in the hot seat can be externally interrupted only if it is ready to be committed and not already marked for tripping or trapping.

⟨Check for external interrupt 314⟩ ≡
  g[rI].o = incr(g[rI].o, -1);
  if (g[rI].o.l == 0 && g[rI].o.h == 0) {
    g[rQ].o.l |= INTERVAL_TIMEOUT, new_Q.l |= INTERVAL_TIMEOUT;
    if (verbose & issue_bit) {
      printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
    }
  }
  trying_to_interrupt = false;
  if (((g[rQ].o.h & g[rK].o.h) || (g[rQ].o.l & g[rK].o.l)) && cool != hot &&
      !(hot->interrupt & (E_BIT + F_BIT + H_BIT)) && !doing_interrupt &&
      !(hot->i == resum)) {
    if (hot->owner) trying_to_interrupt = true;
    else {
      hot->interrupt |= E_BIT;
      ⟨Deissue all but the hottest command 316⟩;
      inst_ptr.o = g[rTT].o; inst_ptr.p = NULL;
    }
  }
This code is used in section 64.
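The register arithmetic of this check can be isolated from the pipeline bookkeeping. The sketch below models only the countdown of rI and the rQ & rK test, using a single 64-bit integer where the simulator uses an octa with h and l halves; the type special_regs and the function tick_and_test are hypothetical names, and the pipeline conditions (cool != hot, doing_interrupt, and so on) are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTERVAL_TIMEOUT (1u << 7) /* rQ bit posted when rI reaches zero (section 57) */

typedef struct {
    uint64_t rI, rQ, rK; /* interval counter, interrupt requests, interrupt mask */
} special_regs;

/* One clock tick: count rI down, post a timeout request when it reaches zero,
   and report whether any requested interrupt is currently enabled in rK.
   Hypothetical helper, not part of MMIX-PIPE. */
static bool tick_and_test(special_regs *s)
{
    s->rI -= 1; /* wraps modulo 2^64, like incr(g[rI].o, -1) */
    if (s->rI == 0)
        s->rQ |= INTERVAL_TIMEOUT;
    return (s->rQ & s->rK) != 0;
}
```

Note that a request is recorded in rQ even when rK masks it; the request then simply waits until the mask permits it.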

315. ⟨Global variables 20⟩ +≡
  bool trying_to_interrupt;  /* encouraging interruptible operations to pause */
  bool nullifying;  /* stopping dispatch to nullify a load/store command */

a: specnode, §44. b: spec, §44. bool = enum, §11. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. data: register control *, §124. die: label, §144. dispatch_lock: lockvar, §65. doing_interrupt: int, §65. E_BIT = 1 << 18, §54. F_BIT = 1 << 17, §54. false = 0, §11. fin_ex: label, §144. flags: unsigned char [], §83. g: specnode [], §86. go: specnode, §44. h: tetra, §17. H_BIT = 1 << 16, §54. hot: control *, §60. i: internal_opcode, §44. i: register int, §12. incr: octa (), MMIX-ARITH §6. increase_L: label, §110. inst_ptr: spec, §284. interrupt: unsigned int, §44. INTERVAL_TIMEOUT = 1 << 7, §57. is_load_store = macro (), §307. issue_bit = 1 << 0, §8. known: bool, §40. l: tetra, §17. need_b: bool, §44. new_Q: octa, §148. o: octa, §40. old_hot: control *, §60. op: register mmix_opcode, §75. owner: coroutine *, §44. p: specnode *, §40. print_octa: static void (), §19. printf: int (), <stdio.h>. rB = 0, §52. rBB = 7, §52. ren_a: bool, §44. ren_x: bool, §44. resum = 89, §49. rI = 12, §52. rK = 15, §52. rQ = 16, §52. rT = 13, §52. rTT = 14, §52. self: register coroutine *, §124. set_lock = macro (), §37. spec_install: static void (), §95. specval: static spec (), §93. stall: label, §75. state: int, §44. trap = 82, §49. trip = 83, §49. true = 1, §11. up: specnode *, §40. verbose: int, §4. wait = macro (), §125. x: specnode, §44. X_is_dest_bit = 0x20, §83. xx: unsigned char, §44. zero_octa: octa, MMIX-ARITH §4.


316. It's possible that the command in the hot seat has been deissued, but only if the simulator has done so at the user's request. Otherwise the test 'i >= deissues' here will always succeed.

The value of cool_hist becomes flaky here. We could try to keep it strictly up to date, but the unpredictable nature of external interrupts suggests that we are better off leaving it alone. (It's only a heuristic for branch prediction, and a sufficiently strong prediction will survive one-time glitches due to interrupts.)

⟨Deissue all but the hottest command 316⟩ ≡
  i = issued_between(hot, cool);
  if (i >= deissues) {
    deissues = i;
    tail = head; resuming = 0;  /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    if (is_load_store(hot->i)) nullifying = true;
  }
This code is used in section 314.

317. Even though an interrupted instruction has officially been either "committed" or "nullified," it stays in the hot seat for two or three extra cycles, while we save enough of the machine state to resume the computation later.

⟨Begin an interruption and break 317⟩ ≡
  {
    if (!(hot->interrupt & H_BIT)) g[rK].o = zero_octa;  /* trap */
    if (((hot->interrupt & H_BIT) && hot->i != trip) ||
        ((hot->interrupt & F_BIT) && hot->i != trap) ||
        (hot->interrupt & E_BIT)) doing_interrupt = 3, suppress_dispatch = true;
    else doing_interrupt = 2;  /* trip or trap started by dispatcher */
    break;
  }
This code is used in section 146.

318. If a memory failure occurs, we should set rF here, either in case 2 or case 1. The simulator doesn't do anything with rF at present.

⟨Perform one cycle of the interrupt preparations 318⟩ ≡
  switch (doing_interrupt--) {
  case 3: ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩; break;
  case 2: ⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩; break;
  case 1: ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩;
    if (hot == reorder_bot) hot = reorder_top; else hot--;
    break;
  }
This code is used in section 64.


319. ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ ≡
  j = hot->interrupt & H_BIT;
  g[j ? rB : rBB].o = g[255].o;
  if (j) g[255].o = zero_octa;
  else g[255].o = g[hot->interrupt & F_BIT ? rT : rTT].o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rB="); print_octa(g[rB].o);
      printf(", $255=0\n");
    } else {
      printf(" setting rBB="); print_octa(g[rBB].o);
      printf(", $255="); print_octa(g[255].o); printf("\n");
    }
  }
This code is used in section 318.

cool: control *, §60. cool_hist: unsigned int, §99. deissues: int, §60. doing_interrupt: int, §65. E_BIT = 1 << 18, §54. F_BIT = 1 << 17, §54. g: specnode [], §86. H_BIT = 1 << 16, §54. head: fetch *, §69. hot: control *, §60. i: internal_opcode, §44. i: register int, §12. interrupt: unsigned int, §44. is_load_store = macro (), §307. issue_bit = 1 << 0, §8. issued_between: static int (), §159. j: register int, §12. nullifying: bool, §315. o: octa, §40. print_octa: static void (), §19. printf: int (), <stdio.h>. rB = 0, §52. rBB = 7, §52. reorder_bot: control *, §60. reorder_top: control *, §60. resuming: int, §78. rK = 15, §52. rT = 13, §52. rTT = 14, §52. suppress_dispatch: bool, §65. tail: fetch *, §69. trap = 82, §49. trip = 83, §49. true = 1, §11. verbose: int, §4. zero_octa: octa, MMIX-ARITH §4.


320. Here's where we manufacture the "ropcodes" for resumption.

#define RESUME_AGAIN 0  /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1  /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2  /* set r[X] to rZ */
#define RESUME_TRANS 3  /* install (rY, rZ) into IT-cache or DT-cache, then RESUME_AGAIN */
#define pack_bytes(a, b, c, d) ((((((unsigned) (a) << 8) + (b)) << 8) + (c)) << 8) + (d)

⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ ≡
  j = pack_bytes(hot->op, hot->xx, hot->yy, hot->zz);
  if (hot->interrupt & H_BIT) {  /* trip */
    g[rW].o = incr(hot->loc, 4);
    g[rX].o.h = sign_bit, g[rX].o.l = j;
    if (verbose & issue_bit) {
      printf(" setting rW="); print_octa(g[rW].o);
      printf(", rX="); print_octa(g[rX].o); printf("\n");
    }
  } else {  /* trap */
    g[rWW].o = hot->go.o;
    g[rXX].o.l = j;
    if (hot->interrupt & F_BIT) {  /* forced */
      if (hot->i != trap) j = RESUME_TRANS;  /* emulate page translation */
      else if (hot->op == TRAP) j = 0x80;  /* TRAP */
      else if (flags[internal_op[hot->op]] & X_is_dest_bit) j = RESUME_SET;  /* emulation */
      else j = 0x80;  /* emulation when r[X] is not a destination */
    } else {  /* dynamic */
      if (hot->interim)
        j = (hot->i == frem || hot->i == syncd || hot->i == syncid ? RESUME_CONT : RESUME_AGAIN);
      else if (is_load_store(hot->i)) j = RESUME_AGAIN;
      else j = 0x80;  /* normal external interruption */
    }
    g[rXX].o.h = (j << 24) + (hot->interrupt & 0xff);
    if (verbose & issue_bit) {
      printf(" setting rWW="); print_octa(g[rWW].o);
      printf(", rXX="); print_octa(g[rXX].o); printf("\n");
    }
  }
This code is used in section 318.
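To make the layout of rXX concrete, here is a small standalone sketch of the two packing steps used above. The function form of pack_bytes and the helper rxx_high are illustrative rewrites (the program itself uses the macro shown in this section); the fixed-width types are an assumption of the sketch.

```c
#include <stdint.h>

/* Pack the four instruction bytes OP, X, Y, Z into one tetrabyte,
   equivalent in effect to the pack_bytes macro of section 320. */
static uint32_t pack_bytes(uint8_t op, uint8_t x, uint8_t y, uint8_t z)
{
    return ((uint32_t)op << 24) | ((uint32_t)x << 16) | ((uint32_t)y << 8) | z;
}

/* High half of rXX: the ropcode (or 0x80) in the top byte and the low eight
   interrupt bits in the bottom byte. Hypothetical helper name. */
static uint32_t rxx_high(uint32_t ropcode, uint32_t interrupt)
{
    return (ropcode << 24) + (interrupt & 0xff);
}
```

For example, a ropcode of 0x80 sets the sign bit of rXX, which is what later tells RESUME that no instruction needs to be inserted.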


321. ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ ≡
  j = hot->interrupt & H_BIT;
  if ((hot->interrupt & F_BIT) && hot->op == SWYM) g[rYY].o = hot->go.o;
  else g[j ? rY : rYY].o = hot->y.o;
  if (hot->i == st || hot->i == pst) g[j ? rZ : rZZ].o = hot->x.o;
  else g[j ? rZ : rZZ].o = hot->z.o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rY="); print_octa(g[rY].o);
      printf(", rZ="); print_octa(g[rZ].o); printf("\n");
    } else {
      printf(" setting rYY="); print_octa(g[rYY].o);
      printf(", rZZ="); print_octa(g[rZZ].o); printf("\n");
    }
  }
This code is used in section 318.

F_BIT = 1 << 17, §54. flags: unsigned char [], §83. frem = 25, §49. g: specnode [], §86. go: specnode, §44. h: tetra, §17. H_BIT = 1 << 16, §54. hot: control *, §60. i: internal_opcode, §44. incr: octa (), MMIX-ARITH §6. interim: bool, §44. internal_op: internal_opcode [], §51. interrupt: unsigned int, §44. is_load_store = macro (), §307. issue_bit = 1 << 0, §8. j: register int, §12. l: tetra, §17. loc: octa, §44. o: octa, §40. op: mmix_opcode, §44. print_octa: static void (), §19. printf: int (), <stdio.h>. pst = 66, §49. rW = 24, §52. rWW = 28, §52. rX = 25, §52. rXX = 29, §52. rY = 26, §52. rYY = 30, §52. rZ = 27, §52. rZZ = 31, §52. sign_bit = macro, §80. st = 63, §49. SWYM = 0xfd, §47. syncd = 64, §49. syncid = 65, §49. TRAP = 0x00, §47. trap = 82, §49. verbose: int, §4. x: specnode, §44. X_is_dest_bit = 0x20, §83. xx: unsigned char, §44. y: spec, §44. yy: unsigned char, §44. z: spec, §44. zz: unsigned char, §44.


322. Whew; we've successfully interrupted the computation. The remaining task is to restart it again, as transparently as possible.

The RESUME instruction waits for the pipeline to drain, because it has to do such drastic things. For example, an interrupt may be occurring at this very moment, changing the registers needed for resumption.

⟨Special cases of instruction dispatch 117⟩ +≡
  case resume: if (cool != old_hot) goto stall;
    inst_ptr = specval(&g[cool->zz ? rWW : rW]);
    if (!(cool->loc.h & sign_bit)) {
      if (cool->zz) cool->interrupt |= K_BIT;
      else if (inst_ptr.o.h & sign_bit) cool->interrupt |= P_BIT;
    }
    if (cool->interrupt) {
      inst_ptr.o = incr(cool->loc, 4); cool->i = noop;
    } else {
      cool->go.o = inst_ptr.o;
      if (cool->zz) {
        ⟨Magically do an I/O operation, if cool->loc is rT 372⟩;
        cool->ren_a = true, spec_install(&g[rK], &cool->a);
        cool->a.known = true, cool->a.o = g[255].o;
        cool->ren_x = true, spec_install(&g[255], &cool->x);
        cool->x.known = true, cool->x.o = g[rBB].o;
      }
      cool->b = specval(&g[cool->zz ? rXX : rX]);
      if (!(cool->b.o.h & sign_bit)) ⟨Resume an interrupted operation 323⟩;
    } break;

323. Here we set cool->i = resum, since we want to issue another instruction after the RESUME itself.

The restrictions on inserted instructions are designed to insure that those instructions will be the very next ones issued. (If, for example, an incgamma instruction were necessary, it might cause a page fault and we'd lose the operand values for RESUME_SET or RESUME_CONT.)

A subtle point arises here: If RESUME_TRANS is being used to compute the page translation of virtual address zero, we don't want to execute the dummy SWYM instruction from virtual address -4! So we avoid the SWYM altogether.

⟨Resume an interrupted operation 323⟩ ≡
  {
    cool->xx = cool->b.o.h >> 24, cool->i = resum;
    head->loc = incr(inst_ptr.o, -4);
    switch (cool->xx) {
    case RESUME_SET: cool->b.o.l = (SETH << 24) + (cool->b.o.l & 0xff0000);
      head->interrupt |= cool->b.o.h & 0xff00;
      resuming = 2;
    case RESUME_CONT: resuming += 1 + cool->zz;
      if (((cool->b.o.l >> 24) & 0xfa) != 0xb8) {  /* not syncd or syncid */
        m = cool->b.o.l >> 28;
        if ((1 << m) & 0x8f30) goto bad_resume;
        m = (cool->b.o.l >> 16) & 0xff;
        if (m >= cool_L && m < cool_G) goto bad_resume;
      }
    case RESUME_AGAIN: resume_again: head->inst = cool->b.o.l;
      m = head->inst >> 24;
      if (m == RESUME) goto bad_resume;  /* avoid uninterruptible loop */
      if (!cool->zz && m > RESUME && m <= SYNC && (head->inst & bad_inst_mask[m - RESUME]))
        head->interrupt |= B_BIT;
      head->noted = false; break;
    case RESUME_TRANS: if (cool->zz) {
        cool->y = specval(&g[rYY]), cool->z = specval(&g[rZZ]);
        if ((cool->b.o.l >> 24) != SWYM) goto resume_again;
        cool->i = resume; break;  /* see "subtle point" above */
      }
    default: bad_resume: cool->interrupt |= B_BIT, cool->i = noop;
      resuming = 0; break;
    }
  }
This code is used in section 322.
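The test in the RESUME_CONT case packs a whole table of forbidden opcode groups into the constant 0x8f30: bit k of that mask is set when opcodes whose leading hex digit is k may not be inserted. The following sketch restates the test as a predicate; the function name resume_cont_allows and the parameter names L and G (standing for cool_L and cool_G) are assumptions made for illustration, and the description of the opcode groups in the comment is a reading of the mask rather than a statement from the program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 4, 5, 8..11 and 15: leading hex digits 4, 5 (branches),
   8..b (loads and stores), and f (jumps and system instructions). */
#define FORBIDDEN_NIBBLES 0x8f30u

/* Return true if the instruction word in rX passes the RESUME_CONT checks
   of section 323. Hypothetical helper, not part of MMIX-PIPE. */
static bool resume_cont_allows(uint32_t inst, unsigned L, unsigned G)
{
    unsigned op = inst >> 24;
    if ((op & 0xfa) == 0xb8) return true;         /* the syncd/syncid opcodes are exempt */
    if ((1u << (inst >> 28)) & FORBIDDEN_NIBBLES) return false;
    unsigned x = (inst >> 16) & 0xff;
    return !(x >= L && x < G);                    /* X must not be a marginal register */
}
```

Under these assumptions an ADD (opcode 0x20) with a local X field is accepted, while a load such as opcode 0x8c is rejected because bit 8 of the mask is set.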

a: specnode, §44. b: spec, §44. B_BIT = 1 << 2, §54. bad_inst_mask: int [], §305. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. false = 0, §11. g: specnode [], §86. go: specnode, §44. h: tetra, §17. head: fetch *, §69. i: internal_opcode, §44. incgamma = 84, §49. incr: octa (), MMIX-ARITH §6. inst: tetra, §68. inst_ptr: spec, §284. interrupt: unsigned int, §44. interrupt: unsigned int, §68. K_BIT = 1 << 3, §54. known: bool, §40. l: tetra, §17. loc: octa, §44. loc: octa, §68. m: register int, §12. noop = 81, §49. noted: bool, §68. o: octa, §40. old_hot: control *, §60. P_BIT = 1 << 0, §54. rBB = 7, §52. ren_a: bool, §44. ren_x: bool, §44. resum = 89, §49. resume = 76, §49. RESUME = 0xf9, §47. RESUME_AGAIN = 0, §320. RESUME_CONT = 1, §320. RESUME_SET = 2, §320. RESUME_TRANS = 3, §320. resuming: int, §78. rK = 15, §52. rW = 24, §52. rWW = 28, §52. rX = 25, §52. rXX = 29, §52. rYY = 30, §52. rZZ = 31, §52. SETH = 0xe0, §47. sign_bit = macro, §80. spec_install: static void (), §95. specval: static spec (), §93. stall: label, §75. SWYM = 0xfd, §47. SYNC = 0xfc, §47. syncd = 64, §49. syncid = 65, §49. true = 1, §11. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. z: spec, §44. zz: unsigned char, §44.


324. ⟨Insert special operands when resuming an interrupted operation 324⟩ ≡
  {
    if (resuming & 1) {
      cool->y = specval(&g[rY]);
      cool->z = specval(&g[rZ]);
    } else {
      cool->y = specval(&g[rYY]);
      cool->z = specval(&g[rZZ]);
    }
    if (resuming >= 3) {  /* RESUME_SET */
      cool->need_ra = true, cool->ra = specval(&g[rA]);
    }
    cool->usage = false;
  }
This code is used in section 103.

325. #define do_resume_trans 17  /* state for performing RESUME_TRANS actions */

⟨Cases for stage 1 execution 155⟩ +≡
  case resume: case resum: if (data->xx != RESUME_TRANS) goto fin_ex;
    data->ptr_a = (void *) ((data->b.o.l >> 24) == SWYM ? ITcache : DTcache);
    data->state = do_resume_trans;
    data->z.o = incr(oandn(data->z.o, page_mask), data->z.o.l & 7);
    data->z.o.h &= 0xffff;
    goto resume_trans;

326. ⟨Special cases for states in the first stage 266⟩ +≡
  case do_resume_trans:
  resume_trans: {
      register cache *c = (cache *) data->ptr_a;
      if (c->lock) wait(1);
      if (c->filler.next) wait(1);
      p = alloc_slot(c, trans_key(data->y.o));
      if (p) {
        c->filler_ctl.ptr_b = (void *) p;
        c->filler_ctl.y.o = data->y.o;
        c->filler_ctl.b.o = data->z.o;
        c->filler_ctl.state = 1;
        schedule(&c->filler, c->access_time, 1);
      }
      goto fin_ex;
    }


327. Administrative operations.

The internal instructions that handle the register stack simply reduce to things we already know how to do. (Well, the internal instructions for saving and unsaving do sometimes lead to special cases, based on data->op; for the most part, though, the necessary mechanisms are already present.)

⟨Cases for stage 1 execution 155⟩ +≡
  case noop: if (data->interrupt & F_BIT) goto emulate_virt;
  case jmp: case pushj: case incrl: case unsave: goto fin_ex;
  case sav: if (!(data->mem_x)) goto fin_ex;
  case incgamma: case save: data->i = st;
    goto switch1;
  case decgamma: case unsav: data->i = ld;
    goto switch1;

328. We can GET special registers ≥ 21 (that is, rA, rF, rP, rW–rZ, or rWW–rZZ) only in the hot seat, because those registers are implicit outputs of many instructions. The same applies to rK, since it is changed by TRAP and by emulated instructions.

⟨Cases for stage 1 execution 155⟩ +≡
  case get: if (data->zz >= 21 || data->zz == rK) {
      if (data != old_hot) wait(1);
      data->z.o = g[data->zz].o;
    }
    data->x.o = data->z.o; goto fin_ex;

access_time: int, §167. alloc_slot: static cacheblock *(), §205. b: spec, §44. cache = struct, §167. cool: control *, §60. data: register control *, §124. decgamma = 85, §49. DTcache: cache *, §168. emulate_virt: label, §310. F_BIT = 1 << 17, §54. false = 0, §11. filler: coroutine, §167. filler_ctl: control, §167. fin_ex: label, §144. g: specnode [], §86. get = 54, §49. h: tetra, §17. i: internal_opcode, §44. incgamma = 84, §49. incr: octa (), MMIX-ARITH §6. incrl = 86, §49. interrupt: unsigned int, §44. ITcache: cache *, §168. jmp = 80, §49. l: tetra, §17. ld = 56, §49. lock: lockvar, §167. mem_x: bool, §44. need_ra: bool, §44. next: coroutine *, §23. noop = 81, §49. o: octa, §40. oandn: octa (), MMIX-ARITH §25. old_hot: control *, §60. op: mmix_opcode, §44. p: register cacheblock *, §258. page_mask: octa, §238. ptr_a: void *, §44. ptr_b: void *, §44. pushj = 71, §49. rA = 21, §52. ra: spec, §44. resum = 89, §49. resume = 76, §49. RESUME_SET = 2, §320. RESUME_TRANS = 3, §320. resuming: int, §78. rK = 15, §52. rY = 26, §52. rYY = 30, §52. rZ = 27, §52. rZZ = 31, §52. sav = 87, §49. save = 77, §49. schedule: static void (), §28. specval: static spec (), §93. st = 63, §49. state: int, §44. switch1: label, §130. SWYM = 0xfd, §47. trans_key = macro (), §240. true = 1, §11. unsav = 88, §49. unsave = 78, §49. usage: bool, §44. wait = macro (), §125. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. z: spec, §44. zz: unsigned char, §44.


329. A PUT is, similarly, delayed in the cases that hold dispatch_lock. This program does not restrict the 1 bits that might be PUT into rQ, although the contents of that register can have drastic implications.

⟨Cases for stage 1 execution 155⟩ +≡
  case put: if (data->xx >= 15 && data->xx <= 20) {
      if (data != old_hot) wait(1);
      switch (data->xx) {
      case rV: ⟨Update the page variables 239⟩; break;
      case rQ: new_Q.h |= data->z.o.h & ~g[rQ].o.h; new_Q.l |= data->z.o.l & ~g[rQ].o.l;
        data->z.o.l |= new_Q.l; data->z.o.h |= new_Q.h; break;
      case rL: if (data->z.o.h != 0) data->z.o.h = 0, data->z.o.l = g[rL].o.l;
        else if (data->z.o.l > g[rL].o.l) data->z.o.l = g[rL].o.l;
      default: break;
      case rG: ⟨Update rG 330⟩; break;
      }
    } else if (data->xx == rA && (data->z.o.h != 0 || data->z.o.l >= 0x40000))
      data->interrupt |= B_BIT;
    data->x.o = data->z.o; goto fin_ex;
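The rQ case deserves a closer look, because a PUT may clear request bits but must not lose requests recorded in new_Q. The sketch below models the two assignments with 64-bit integers; the struct q_state and the function put_rQ are hypothetical names, and the sketch assumes that the value computed here is what eventually lands in rQ when the PUT is committed.

```c
#include <stdint.h>

typedef struct {
    uint64_t rQ;    /* interrupt request register */
    uint64_t new_Q; /* request bits recorded in new_Q (section 148) */
} q_state;

/* Model of PUT rQ,z: bits that z newly turns on are added to new_Q, and
   every bit of new_Q is forced into the value that replaces rQ.
   Hypothetical helper, not part of MMIX-PIPE. */
static void put_rQ(q_state *s, uint64_t z)
{
    s->new_Q |= z & ~s->rQ;
    s->rQ = z | s->new_Q;
}
```

For instance, if rQ = 0x5 with new_Q = 0x4, then PUT rQ,2 leaves rQ = 0x6: the old bit 0 is cleared as requested, bit 1 is added, and the bit 2 request recorded in new_Q survives.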

330. When rG decreases, we assume that up to commit_max marginal registers can be zeroed during each clock cycle. (Remember that we're currently in the hot seat, and holding dispatch_lock.)

⟨Update rG 330⟩ ≡
  if (data->z.o.h != 0 || data->z.o.l >= 256 || data->z.o.l < g[rL].o.l || data->z.o.l < 32)
    data->interrupt |= B_BIT;
  else if (data->z.o.l < g[rG].o.l) {
    data->interim = true;  /* potentially interruptible */
    for (j = 0; j < commit_max; j++) {
      g[rG].o.l--;
      g[g[rG].o.l].o = zero_octa;
      if (data->z.o.l == g[rG].o.l) break;
    }
    if (j == commit_max) {
      if (!trying_to_interrupt) wait(1);
    } else data->interim = false;
  }
This code is used in section 329.

331. Computed jumps put the desired destination address into the go field.

⟨Cases for stage 1 execution 155⟩ +≡
  case go: data->x.o = data->go.o; goto add_go;
  case pop: data->x.o = data->y.o;
    data->y.o = data->b.o;  /* move rJ to y field */
  case pushgo: add_go: data->go.o = oplus(data->y.o, data->z.o);
    if ((data->go.o.h & sign_bit) && !(data->loc.h & sign_bit)) data->interrupt |= P_BIT;
    data->go.known = true; goto fin_ex;


332. The instruction UNSAVE z generates a sequence of internal instructions that accomplish the actual unsaving. This sequence is controlled by the instruction currently in the fetch buffer, which changes its X and Y fields until all global registers have been loaded. The first instructions of the sequence are UNSAVE 0,0,z; UNSAVE 1,rZ,z-8; UNSAVE 1,rY,z-16; ...; UNSAVE 1,rB,z-96; UNSAVE 2,255,z-104; UNSAVE 2,254,z-112; etc. If an interrupt occurs before these instructions have all been committed, the execution register will contain enough information to restart the process.

After the global registers have all been loaded, UNSAVE continues by acting rather like POP. An interrupt occurring during this last stage will find rS < rO; a context switch might then take us back to restoring the local registers again. But no information will be lost, even though the register from which we began unsaving has long since been replaced.

⟨Special cases of instruction dispatch 117⟩ +≡
  case unsave: if (cool->interrupt & B_BIT) cool->i = noop;
    else {
      cool->interim = true;
      op = LDOU;  /* this instruction needs to be handled by load/store unit */
      cool->i = unsav;
      switch (cool->xx) {
      case 0: if (cool->z.p) goto stall;
        ⟨Set up the first phase of unsaving 334⟩; break;
      case 1: case 2: ⟨Generate an instruction to unsave g[yy] 333⟩; break;
      case 3: cool->i = unsave, cool->interim = false, op = UNSAVE;
        goto pop_unsave;
      default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
      }
    }
    break;  /* this takes us to dispatch_done */

b: spec, §44. B_BIT = 1 << 2, §54. commit_max: int, §59. cool: control *, §60. data: register control *, §124. dispatch_done: label, §101. dispatch_lock: lockvar, §65. false = 0, §11. fin_ex: label, §144. g: specnode [], §86. go = 72, §49. go: specnode, §44. h: tetra, §17. i: internal_opcode, §44. interim: bool, §44. interrupt: unsigned int, §44. j: register int, §12. known: bool, §40. l: tetra, §17. LDOU = 0x8e, §47. loc: octa, §44. new_Q: octa, §148. noop = 81, §49. o: octa, §40. old_hot: control *, §60. op: mmix_opcode, §44. oplus: octa (), MMIX-ARITH §5. p: specnode *, §40. P_BIT = 1 << 0, §54. pop = 75, §49. pop_unsave: label, §120. pushgo = 74, §49. put = 55, §49. rA = 21, §52. rG = 19, §52. rL = 20, §52. rQ = 16, §52. rV = 18, §52. sign_bit = macro, §80. stall: label, §75. true = 1, §11. trying_to_interrupt: bool, §315. unsav = 88, §49. UNSAVE = 0xfb, §47. unsave = 78, §49. wait = macro (), §125. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. yy: unsigned char, §44. z: spec, §44. zero_octa: octa, MMIX-ARITH §4.


333. ⟨Generate an instruction to unsave g[yy] 333⟩ ≡
  cool->ren_x = true, spec_install(&g[cool->yy], &cool->x);
  new_O = new_S = incr(cool_O, -1);
  cool->z.o = shift_left(new_O, 3);
  cool->ptr_a = (void *) mem.up;
This code is used in section 332.

334. ⟨Set up the first phase of unsaving 334⟩ ≡
  cool->ren_x = true, spec_install(&g[rG], &cool->x);
  cool->ren_a = true, spec_install(&g[rA], &cool->a);
  new_O = new_S = shift_right(cool->z.o, 3, 1);
  cool->set_l = true, spec_install(&g[rL], &cool->rl);
  cool->ptr_a = (void *) mem.up;
This code is used in section 332.

335. ⟨Get ready for the next step of UNSAVE 335⟩ ≡
  switch (cool->xx) {
  case 0: head->inst = pack_bytes(UNSAVE, 1, rZ, 0); break;
  case 1: if (cool->yy == rP) head->inst = pack_bytes(UNSAVE, 1, rR, 0);
    else if (cool->yy == 0) head->inst = pack_bytes(UNSAVE, 2, 255, 0);
    else head->inst = pack_bytes(UNSAVE, 1, cool->yy - 1, 0); break;
  case 2: if (cool->yy == cool_G) head->inst = pack_bytes(UNSAVE, 3, 0, 0);
    else head->inst = pack_bytes(UNSAVE, 2, cool->yy - 1, 0); break;
  }
This code is used in section 81.
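The switch above is a small state machine on the (X, Y) fields of the instruction in the fetch buffer. A standalone sketch makes the order of register loads easy to trace; the function names next_unsave_step and count_unsave_loads are hypothetical, G stands for cool_G, and the register numbers are those of section 52.

```c
#include <stdint.h>

enum { rR = 6, rP = 23, rZ = 27 }; /* special register numbers from section 52 */

/* Advance the (X,Y) fields of an internal UNSAVE to the next step, following
   the switch in section 335. Hypothetical helper, not part of MMIX-PIPE. */
static void next_unsave_step(unsigned *x, unsigned *y, unsigned G)
{
    switch (*x) {
    case 0: *x = 1; *y = rZ; break;             /* start with special register rZ */
    case 1:
        if (*y == rP) *y = rR;                  /* jump from rP (23) down to rR (6) */
        else if (*y == 0) { *x = 2; *y = 255; } /* rB done: go on to $255 */
        else *y -= 1;
        break;
    case 2:
        if (*y == G) { *x = 3; *y = 0; }        /* all globals loaded: POP-like phase */
        else *y -= 1;
        break;
    }
}

/* Count the register loads (phases 1 and 2) that follow the initial step. */
static unsigned count_unsave_loads(unsigned G)
{
    unsigned x = 0, y = 0, n = 0;
    for (;;) {
        next_unsave_step(&x, &y, G);
        if (x == 3) return n;
        n++;
    }
}
```

Phase 1 always visits twelve special registers (rZ down to rP, then rR down to rB), which agrees with the offsets z-8 through z-96 quoted in section 332; phase 2 then visits the 256 - G global registers.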

336. ⟨Handle an internal UNSAVE when it's time to load 336⟩ ≡
  if (data->xx == 0) {
    data->a.o = data->x.o; data->a.o.h &= 0xffffff;  /* unsaved rA */
    data->x.o.l = data->x.o.h >> 24; data->x.o.h = 0;  /* unsaved rG */
    if (data->a.o.h || (data->a.o.l & 0xfffc0000)) {
      data->a.o.h = 0, data->a.o.l &= 0x3ffff; data->interrupt |= B_BIT;
    }
    if (data->x.o.l < 32) {
      data->x.o.l = 32; data->interrupt |= B_BIT;
    }
  }
  goto fin_ex;
This code is used in section 279.

337. Of course SAVE is handled essentially like UNSAVE, but backwards.

⟨Special cases of instruction dispatch 117⟩ +≡
  case save: if (cool->xx < cool_G) cool->interrupt |= B_BIT;
    if (cool->interrupt & B_BIT) cool->i = noop;
    else if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
      ⟨Insert an instruction to advance gamma 113⟩
    else {
      cool->interim = true;
      cool->i = sav;
      switch (cool->zz) {
      case 0: ⟨Set up the first phase of saving 338⟩; break;
      case 1: if (cool_O.l != cool_S.l) ⟨Insert an instruction to advance gamma 113⟩
        cool->zz = 2; cool->yy = cool_G;
      case 2: case 3: ⟨Generate an instruction to save g[yy] 339⟩; break;
      default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
      }
    }
    break;

338. If an interrupt occurs during the first phase, say between two incgamma instructions, the value cool->zz = 1 will get things restarted properly. (Indeed, if context is saved and unsaved during the interrupt, many incgamma instructions may no longer be necessary.)

⟨Set up the first phase of saving 338⟩ ≡
  cool->zz = 1;
  cool->ren_x = true, spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
  cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = cool_L;
  cool->set_l = true, spec_install(&g[rL], &cool->rl);
  new_O = incr(cool_O, cool_L + 1);
This code is used in section 337.

339. ⟨Generate an instruction to save g[yy] 339⟩ ≡
  op = STOU;  /* this instruction needs to be handled by load/store unit */
  cool->mem_x = true, spec_install(&mem, &cool->x);
  cool->z.o = shift_left(cool_O, 3);
  new_O = new_S = incr(cool_O, 1);
  if (cool->zz == 3 && cool->yy > rZ) ⟨Do the final SAVE 340⟩
  else cool->b = specval(&g[cool->yy]);
This code is used in section 337.

a: specnode, §44. b: spec, §44. B_BIT = 1 << 2, §54. cool: control *, §60. cool_G: int, §99. cool_L: int, §99. cool_O: octa, §98. cool_S: octa, §98. data: register control *, §124. false = 0, §11. fin_ex: label, §144. g: specnode [], §86. h: tetra, §17. head: fetch *, §69. i: internal_opcode, §44. incgamma = 84, §49. incr: octa (), MMIX-ARITH §6. inst: tetra, §68. interim: bool, §44. interrupt: unsigned int, §44. known: bool, §40. l: tetra, §17. l: specnode *, §86. lring_mask: int, §88. mem: specnode, §115. mem_x: bool, §44. new_O: octa, §99. new_S: octa, §99. noop = 81, §49. o: octa, §40. op: mmix_opcode, §44. pack_bytes = macro (), §320. ptr_a: void *, §44. rA = 21, §52. ren_a: bool, §44. ren_x: bool, §44. rG = 19, §52. rl: specnode, §44. rL = 20, §52. rP = 23, §52. rR = 6, §52. rZ = 27, §52. sav = 87, §49. save = 77, §49. set_l: bool, §44. shift_left: octa (), MMIX-ARITH §7. shift_right: octa (), MMIX-ARITH §7. spec_install: static void (), §95. specval: static spec (), §93. STOU = 0xae, §47. true = 1, §11. UNSAVE = 0xfb, §47. up: specnode *, §40. x: specnode, §44. xx: unsigned char, §44. yy: unsigned char, §44. z: spec, §44. zz: unsigned char, §44.


340. The final SAVE instruction not only stores rG and rA, it also places the final address in global register X.

⟨Do the final SAVE 340⟩ ≡
  {
    cool->i = save;
    cool->interim = false;
    cool->ren_a = true, spec_install(&g[cool->xx], &cool->a);
  }
This code is used in section 339.

341. ⟨Get ready for the next step of SAVE 341⟩ ≡
  switch (cool->zz) {
  case 1: head->inst = pack_bytes(SAVE, cool->xx, 0, 1); break;
  case 2: if (cool->yy == 255) head->inst = pack_bytes(SAVE, cool->xx, 0, 3);
    else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 2); break;
  case 3: if (cool->yy == rR) head->inst = pack_bytes(SAVE, cool->xx, rP, 3);
    else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 3); break;
  }
This code is used in section 81.

342. ⟨Handle an internal SAVE when it's time to store 342⟩ ≡
  {
    if (data->interim) data->x.o = data->b.o;
    else {
      if (data != old_hot) wait(1);  /* we need the hottest value of rA */
      data->x.o.h = g[rG].o.l << 24;
      data->x.o.l = g[rA].o.l;
      data->a.o = data->y.o;
    }
    goto fin_ex;
  }
This code is used in section 281.


343. More register-to-register ops. Now that we've finished most of the hard stuff, we can relax and fill in the holes that we left in the all-register parts of the execution stages.

First let's complete the fixed point arithmetic operations, by dispensing with multiplication and division.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case mulu: data->x.o = omult(data->y.o, data->z.o);
    data->a.o = aux;
    goto quantify_mul;
  case mul: data->x.o = signed_omult(data->y.o, data->z.o);
    if (overflow) data->interrupt |= V_BIT;
  quantify_mul: aux = data->z.o;
    for (j = mul0; aux.l || aux.h; j++) aux = shift_right(aux, 8, 1);
    data->i = j; break;  /* j is mul0 or mul1 or ... or mul8 */
  case divu: data->x.o = odiv(data->b.o, data->y.o, data->z.o);
    data->a.o = aux; data->i = div; break;
  case div: if (data->z.o.l == 0 && data->z.o.h == 0) {
      data->interrupt |= D_BIT; data->a.o = data->y.o;
      data->i = set;  /* divide by zero needn't wait in the pipeline */
    } else {
      data->x.o = signed_odiv(data->y.o, data->z.o);
      if (overflow) data->interrupt |= V_BIT;
      data->a.o = aux;
    } break;
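The loop at quantify_mul classifies a multiplication by the number of significant bytes in the operand z, so that the simulated latency can depend on the size of the multiplier. A minimal sketch of that byte count, using a plain 64-bit integer instead of an octa, might look as follows; the function name multiplier_bytes is hypothetical.

```c
#include <stdint.h>

/* Number of bytes needed to hold z (0 for z == 0, at most 8); the simulator
   adds this count to mul0 to select one of mul0, mul1, ..., mul8.
   Hypothetical helper, not part of MMIX-PIPE. */
static int multiplier_bytes(uint64_t z)
{
    int n = 0;
    while (z != 0) {
        z >>= 8; /* unsigned shift, like shift_right(aux, 8, 1) */
        n++;
    }
    return n;
}
```

A multiplier of 0xff thus counts as one byte and 0x100 as two, which is why small constants can be given a shorter pipeline time than full 64-bit operands.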

a: specnode, §44. aux: octa, MMIX-ARITH §4. b: spec, §44. cool: control *, §60. D_BIT = 1 << 15, §54. data: register control *, §124. div = 9, §49. divu = 28, §49. false = 0, §11. fin_ex: label, §144. g: specnode [], §86. h: tetra, §17. head: fetch *, §69. i: internal_opcode, §44. inst: tetra, §68. interim: bool, §44. interrupt: unsigned int, §44. j: register int, §12. l: tetra, §17. mul = 26, §49. mul0 = 0, §49. mul1 = 1, §49. mul8 = 8, §49. mulu = 27, §49. o: octa, §40. odiv: octa (), MMIX-ARITH §13. old_hot: control *, §60. omult: octa (), MMIX-ARITH §8. overflow: bool, MMIX-ARITH §4. pack_bytes = macro (), §320. rA = 21, §52. ren_a: bool, §44. rG = 19, §52. rP = 23, §52. rR = 6, §52. save = 77, §49. SAVE = 0xfa, §47. set = 33, §49. shift_right: octa (), MMIX-ARITH §7. signed_odiv: octa (), MMIX-ARITH §24. signed_omult: octa (), MMIX-ARITH §12. spec_install: static void (), §95. true = 1, §11. V_BIT = 1 << 14, §54. wait = macro (), §125. x: specnode, §44. xx: unsigned char, §44. y: spec, §44. yy: unsigned char, §44. z: spec, §44. zz: unsigned char, §44.


344. Next let's polish off the bitwise and bytewise operations.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case sadd: data->x.o.l = count_bits(data->y.o.h & ~data->z.o.h)
                         + count_bits(data->y.o.l & ~data->z.o.l);
    break;
  case mor: data->x.o = bool_mult(data->y.o, data->z.o, data->op & 0x2); break;
  case bdif: data->x.o.h = byte_diff(data->y.o.h, data->z.o.h);
    data->x.o.l = byte_diff(data->y.o.l, data->z.o.l); break;
  case wdif: data->x.o.h = wyde_diff(data->y.o.h, data->z.o.h);
    data->x.o.l = wyde_diff(data->y.o.l, data->z.o.l); break;
  case tdif: if (data->y.o.h > data->z.o.h) data->x.o.h = data->y.o.h - data->z.o.h;
  tdif_l: if (data->y.o.l > data->z.o.l) data->x.o.l = data->y.o.l - data->z.o.l;
    break;
  case odif: if (data->y.o.h > data->z.o.h) data->x.o = ominus(data->y.o, data->z.o);
    else if (data->y.o.h == data->z.o.h) goto tdif_l;
    break;
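The difference operations above are saturating: each field of the result is y minus z when y is larger, and zero otherwise, while sadd counts the bit positions where y has a 1 and z has a 0. A minimal sketch of both ideas follows. The function names are hypothetical, and the loop-based code is only an illustration; it is not claimed to match how the MMIX-ARITH routines byte_diff and count_bits are actually written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: bytewise saturating difference of two tetras. */
uint32_t byte_diff_sketch(uint32_t y, uint32_t z)
{
  uint32_t x = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    uint32_t yb = (y >> shift) & 0xff;
    uint32_t zb = (z >> shift) & 0xff;
    if (yb > zb) x |= (yb - zb) << shift;  /* otherwise the byte stays 0 */
  }
  return x;
}

/* Hypothetical sketch: number of positions where y has 1 and z has 0. */
int sadd_sketch(uint64_t y, uint64_t z)
{
  uint64_t bits = y & ~z;
  int count = 0;
  while (bits) {
    bits &= bits - 1;  /* clear the lowest set bit */
    count++;
  }
  return count;
}
```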

345. The conditional set (CS) instructions are, rather surprisingly, more difficult to implement than the zero set (ZS) instructions, although the ZS instructions do more. The reason is that dynamic instruction dependencies are more complicated with CS. Consider, for example, the instructions

LDO x,a,b; FDIV y,c,d; CSZ y,x,0; INCL y,1.

If the value of x is zero, the INCL instruction need not wait for the division to be completed. (We do not, however, abort the division in such a case; it might invoke a trip handler, or change the inexact bit, etc. Our policy is to treat common cases efficiently and to treat all cases correctly, but not to treat all cases with maximum efficiency.)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case zset: if (register_truth(data->y.o, data->op)) data->x.o = data->z.o;
    /* otherwise data->x.o is already zero */
    goto fin_ex;
  case cset: if (register_truth(data->y.o, data->op))
      data->x.o = data->z.o, data->b.p = NULL;
    else if (data->b.p == NULL) data->x.o = data->b.o;
    else {
      data->state = 0; data->need_b = true; goto switch1;
    } break;
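The asymmetry between the two instruction families can be seen in a small model. In this sketch (the function names are hypothetical, and only the "if zero" condition is modeled) the zero-set result never depends on the old contents of the destination, whereas the conditional-set result does whenever the condition fails. That extra operand is what the b field carries in the cset case above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ZSZ x,y,z: the old value of x is irrelevant. */
uint64_t zero_set_if_zero(uint64_t y, uint64_t z)
{
  return y == 0 ? z : 0;
}

/* Hypothetical model of CSZ x,y,z: when the condition fails, the
   result is the previous value of x, so that value is a third input. */
uint64_t cond_set_if_zero(uint64_t old_x, uint64_t y, uint64_t z)
{
  return y == 0 ? z : old_x;
}
```

In the example from the text, CSZ y,x,0 with x equal to zero yields 0 regardless of what FDIV eventually puts in y, which is why the dependency on the division can be dropped in that case.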

346. Floating point computations are mostly handled by the routines in MMIX-ARITH, which record anomalous events in the global variable exceptions. But we consider the operation trivial if an input is infinite or NaN; and we may need to increase the execution time when denormals are present.

#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4
#define is_denormal(x) ((x.h & 0x7ff00000) == 0 && ((x.h & 0xfffff) || x.l))
#define is_trivial(x) ((x.h & 0x7ff00000) == 0x7ff00000)
#define set_round cur_round = (data->ra.o.l < 0x10000 ? ROUND_NEAR : data->ra.o.l >> 16)
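The three macros can be tried out in isolation. The sketch below restates them as functions on the high and low tetras of a 64-bit floating point number; the function names are hypothetical, and the rounding function assumes that its argument is the low tetra of the rounding register copy that the macro calls data->ra.

```c
#include <assert.h>
#include <stdint.h>

/* Exponent field all zeros and fraction nonzero: a denormal number. */
int is_denormal_sketch(uint32_t h, uint32_t l)
{
  return (h & 0x7ff00000) == 0 && ((h & 0xfffff) || l);
}

/* Exponent field all ones: infinity or NaN, treated as trivial. */
int is_trivial_sketch(uint32_t h, uint32_t l)
{
  (void)l;  /* the low tetra does not affect the exponent test */
  return (h & 0x7ff00000) == 0x7ff00000;
}

/* Rounding mode as computed by set_round: values below 0x10000 mean
   ROUND_NEAR (4); otherwise the mode is the value shifted right 16. */
int rounding_mode_sketch(uint32_t ra_low)
{
  return ra_low < 0x10000 ? 4 : (int)(ra_low >> 16);
}
```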


⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case fadd: set_round; data->x.o = fplus(data->y.o, data->z.o);
  fin_bflot: if (is_denormal(data->y.o)) data->denin = denin_penalty;
  fin_uflot: if (is_denormal(data->x.o)) data->denout = denout_penalty;
  fin_flot: if (is_denormal(data->z.o)) data->denin = denin_penalty;
    data->interrupt |= exceptions;
    if (is_trivial(data->y.o) || is_trivial(data->z.o)) goto fin_ex;
    if (data->i == fsqrt && (data->z.o.h & sign_bit)) goto fin_ex;
    break;
  case fsub: data->a.o = data->z.o;
    if (fcomp(data->z.o, zero_octa) != 2) data->a.o.h ^= sign_bit;
    set_round; data->x.o = fplus(data->y.o, data->a.o);
    data->i = fadd;  /* use pipeline times for addition */
    goto fin_bflot;
  case fmul: set_round;
    data->x.o = fmult(data->y.o, data->z.o); goto fin_bflot;
  case fdiv: set_round;
    data->x.o = fdivide(data->y.o, data->z.o); goto fin_bflot;
  case fsqrt: data->x.o = froot(data->z.o, data->y.o.l); goto fin_uflot;
  case fint: data->x.o = fintegerize(data->z.o, data->y.o.l); goto fin_uflot;
  case fix: data->x.o = fixit(data->z.o, data->y.o.l);
    if (data->op & 0x2) exceptions &= ~W_BIT;  /* unsigned case doesn't overflow */
    goto fin_flot;
  case flot: data->x.o = floatit(data->z.o, data->y.o.l, data->op & 0x2, data->op & 0x4);
    data->interrupt |= exceptions; break;

347. ⟨Special cases of instruction dispatch 117⟩ +≡
  case fsqrt: case fint: case fix: case flot:
    if (cool->y.o.l > 4) goto illegal_inst;
    break;


348. ⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case feps: j = fepscomp(data->y.o, data->z.o, data->b.o, data->op != FEQLE);
    if (j == 2) data->i = fcmp;
    else if (is_denormal(data->y.o) || is_denormal(data->z.o))
      data->denin = denin_penalty;
    switch (data->op) {
    case FUNE: if (j == 2) goto cmp_pos; else goto cmp_zero;
    case FEQLE: goto cmp_fin;
    case FCMPE: if (j) goto cmp_zero_or_invalid;
    }
  case fcmp: j = fcomp(data->y.o, data->z.o);
    if (j < 0) goto cmp_neg;
  cmp_fin: if (j == 1) goto cmp_pos;
  cmp_zero_or_invalid: if (j == 2) data->interrupt |= I_BIT;
    goto cmp_zero;
  case funeq: if (fcomp(data->y.o, data->z.o) == (data->op == FUN ? 2 : 0)) goto cmp_pos;
    else goto cmp_zero;

349. ⟨External variables 4⟩ +≡
  Extern int frem_max;
  Extern int denin_penalty, denout_penalty;

350. The floating point remainder operation is especially interesting because it can be interrupted when it's in the hot seat.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case frem: if (is_trivial(data->y.o) || is_trivial(data->z.o)) {
      data->x.o = fremstep(data->y.o, data->z.o, 2500); goto fin_ex;
    }
    if ((self + 1)->next) wait(1);
    data->interim = true;
    j = 1;
    if (is_denormal(data->y.o) || is_denormal(data->z.o)) j += denin_penalty;
    pass_after(j);
    goto passit;

351. ⟨Begin execution of a stage-two operation 351⟩ ≡
  j = 1;
  if (data->i == frem) {
    data->x.o = fremstep(data->y.o, data->z.o, frem_max);
    if (exceptions & E_BIT) {
      data->y.o = data->x.o;
      if (trying_to_interrupt && data == old_hot) goto fin_ex;
    } else {
      data->state = 3;
      data->interim = false;
      data->interrupt |= exceptions;
      if (is_denormal(data->x.o)) j += denout_penalty;
    }
    wait(j);
  }

This code is used in section 135.


352. System operations. Finally we need to implement some operations for the operating system; then the hardware simulation will be done!

A LDVTS instruction is delayed until it reaches the hot seat, because it changes the IT and DT caches. The operating system should use SYNC after LDVTS if the effects are needed immediately; the system is also responsible for ensuring that the page table permission bits agree with the LDVTS permission bits when the latter are nonzero. (Also, if write permission is taken away from a page, the operating system must have previously used SYNCD to write out any dirty bytes that might have been cached from that page; SYNCD will be inoperative after write permission goes away.)

⟨Handle special cases for operations like prego and ldvts 289⟩ +≡
  if (data->i == ldvts) ⟨Do stage 1 of LDVTS 353⟩;

353. ⟨Do stage 1 of LDVTS 353⟩ ≡
  {
    if (data != old_hot) wait(1);
    if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    data->z.o.h = 0, data->z.o.l = data->y.o.l & 0x7;
    p = cache_search(DTcache, data->y.o);  /* N.B.: Not trans_key(data->y.o) */
    if (p) {
      data->x.o.l = 2;
      if (data->z.o.l) {
        p = use_and_fix(DTcache, p);
        p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
      } else {
        p = demote_and_fix(DTcache, p);
        p->tag.h |= sign_bit;  /* invalidate the tag */
      }
    }
    pass_after(DTcache->access_time); goto passit;
  }

This code is used in section 352.

354. ⟨Special cases for states in later stages 272⟩ +≡
  case ld_st_launch: if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    p = cache_search(ITcache, data->y.o);  /* N.B.: Not trans_key(data->y.o) */
    if (p) {
      data->x.o.l |= 1;
      if (data->z.o.l) {
        p = use_and_fix(ITcache, p);
        p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
      } else {
        p = demote_and_fix(ITcache, p);
        p->tag.h |= sign_bit;  /* invalidate the tag */
      }
    }
    data->state = 3; wait(ITcache->access_time);
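Both LDVTS stages apply the same update to a matching translation cache entry: the three low-order bits of the entry are replaced by the three low-order bits of the operand. A minimal sketch of that arithmetic follows, with hypothetical function names. The second function models the result value under the assumption that x starts at zero: a DT-cache hit sets 2 and an IT-cache hit ORs in 1.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the low three bits of a translation entry. The simulator
   writes this as (entry & -8) + bits; the form below is equivalent. */
uint32_t with_permission_bits(uint32_t entry_low, uint32_t y_low)
{
  uint32_t bits = y_low & 0x7;
  return (entry_low & ~(uint32_t)7) + bits;
}

/* Hypothetical model of the value left in x (assuming x starts at 0). */
unsigned ldvts_result(int dt_hit, int it_hit)
{
  return (dt_hit ? 2u : 0u) | (it_hit ? 1u : 0u);
}
```

When the three bits are all zero the simulator does not rewrite the entry; it invalidates the tag instead, which this sketch does not model.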


355. The SYNC operation interacts with the pipeline in interesting ways. SYNC 0 and SYNC 4 are the simplest; they just lock the dispatch and wait until they get to the hot seat, after which the pipeline has drained. SYNC 1 and SYNC 3 put a "barrier" into the write buffer so that subsequent store instructions will not merge with previous stores. SYNC 2 and SYNC 3 lock the dispatch until all previous load instructions have left the pipeline. SYNC 5, SYNC 6, and SYNC 7 remove things from caches once they get to the hot seat.

⟨Special cases of instruction dispatch 117⟩ +≡
  case sync: if (cool->zz > 3) {
      if (!(cool->loc.h & sign_bit)) goto privileged_inst;
      if (cool->zz == 4) freeze_dispatch = true;
    } else {
      if (cool->zz != 1) freeze_dispatch = true;
      if (cool->zz & 1) cool->mem_x = true, spec_install(&mem, &cool->x);
    } break;
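The SYNC variants described above can be tabulated as a set of properties per operand value. The sketch below is only a restatement of the prose, with one fact taken from the dispatch code (variants above 3 are refused at nonnegative addresses). The type and function names are hypothetical, and the restriction to operands 0 through 7 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  bool privileged;          /* zz > 3: needs a negative location */
  bool drains_pipeline;     /* SYNC 0 and SYNC 4 */
  bool store_barrier;       /* SYNC 1 and SYNC 3 */
  bool waits_for_loads;     /* SYNC 2 and SYNC 3 */
  bool removes_from_caches; /* SYNC 5, SYNC 6, SYNC 7 */
} sync_traits;

sync_traits classify_sync(unsigned zz)
{
  sync_traits t = { false, false, false, false, false };
  assert(zz <= 7);  /* assumption: only these operands are modeled */
  t.privileged = zz > 3;
  t.drains_pipeline = (zz == 0 || zz == 4);
  t.store_barrier = (zz == 1 || zz == 3);
  t.waits_for_loads = (zz == 2 || zz == 3);
  t.removes_from_caches = (zz >= 5);
  return t;
}
```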

356. ⟨Cases for stage 1 execution 155⟩ +≡
  case sync: switch (data->zz) {
    case 0: case 4: if (data != old_hot) wait(1);
      halted = (data->zz != 0); goto fin_ex;
    case 2: case 3: ⟨Wait if there's an unfinished load ahead of us 357⟩;
      release_lock(self, dispatch_lock);
    case 1: data->x.addr = zero_octa; goto fin_ex;
    case 5: if (data != old_hot) wait(1);
      ⟨Clean the data caches 361⟩;
    case 6: if (data != old_hot) wait(1);
      ⟨Zap the translation caches 358⟩;
    case 7: if (data != old_hot) wait(1);
      ⟨Zap the instruction and data caches 359⟩;
    }


357. ⟨Wait if there's an unfinished load ahead of us 357⟩ ≡
  {
    register control *cc;
    for (cc = data; cc != hot; ) {
      cc = (cc == reorder_top ? reorder_bot : cc + 1);
      if (cc->owner && (cc->i == ld || cc->i == ldunc || cc->i == pst)) wait(1);
    }
  }

This code is used in section 356.
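The loop in section 357 steps through a circular array: it advances by one slot and wraps from the top of the array back to the bottom, stopping when it reaches the hot seat. A minimal sketch of that traversal on integer indices follows; the names next_slot and slots_ahead are hypothetical.

```c
#include <assert.h>

/* Advance one position in the circular range [bot..top]. */
int next_slot(int cc, int bot, int top)
{
  return cc == top ? bot : cc + 1;
}

/* Count the slots visited when walking from `from` (exclusive)
   to `to` (inclusive), wrapping around as needed. */
int slots_ahead(int from, int to, int bot, int top)
{
  int n = 0;
  assert(bot <= from && from <= top && bot <= to && to <= top);
  for (int cc = from; cc != to; ) {
    cc = next_slot(cc, bot, top);
    n++;
  }
  return n;
}
```

For example, with eight slots numbered 0 to 7, walking from slot 6 to slot 1 visits slots 7, 0, and 1.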

358. Perhaps the delay should be longer here.

⟨Zap the translation caches 358⟩ ≡
  if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache->reader[j], DTcache->access_time);
  set_lock(self, DTcache->lock);
  zap_cache(DTcache);
  data->state = 10; wait(DTcache->access_time);

This code is used in section 356.

359. ⟨Zap the instruction and data caches 359⟩ ≡
  if (!Icache) {
    data->state = 11; goto switch1;
  }
  if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache->reader[j], Icache->access_time);
  set_lock(self, Icache->lock);
  zap_cache(Icache);
  data->state = 11; wait(Icache->access_time);

This code is used in section 356.

360. ⟨Special cases for states in the first stage 266⟩ +≡
  case 10: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    set_lock(self, ITcache->lock);
    zap_cache(ITcache);
    data->state = 3; wait(ITcache->access_time);
  case 11: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (wbuf_lock) wait(1);
    write_head = write_tail, write_ctl.state = 0;  /* zap the write buffer */
    if (!Dcache) {
      data->state = 12; goto switch1;
    }
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    set_lock(self, Dcache->lock);
    zap_cache(Dcache);
    data->state = 12; wait(Dcache->access_time);
  case 12: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (!Scache) goto fin_ex;
    if (Scache->lock) wait(1);
    set_lock(self, Scache->lock);
    zap_cache(Scache);
    data->state = 3; wait(Scache->access_time);

361. ⟨Clean the data caches 361⟩ ≡
  if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  ⟨Wait till write buffer is empty 362⟩;
  if (clean_co.next || clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = sync; clean_ctl.state = 0; clean_ctl.x.o.h = 0;
  startup(&clean_co, 1);
  data->state = 13;
  data->interim = true;
  wait(1);

This code is used in section 356.

362. ⟨Wait till write buffer is empty 362⟩ ≡
  if (write_head != write_tail) {
    if (!speed_lock) set_lock(self, speed_lock);
    wait(1);
  }

This code is used in sections 361 and 364.

363. The cleanup process might take a huge amount of time, so we must allow it to be interrupted. (Servicing the interruption might, of course, put more stuff into the cache.)

⟨Special cases for states in the first stage 266⟩ +≡
  case 13: if (!clean_co.next) {
      data->interim = false; goto fin_ex;  /* it's done! */
    }
    if (trying_to_interrupt) goto fin_ex;  /* accept an interruption */
    wait(1);


364. Now we consider SYNCD and SYNCID. When control comes to this part of the program, data->y.o is a virtual address and data->z.o is the corresponding physical address; data->xx + 1 is the number of bytes we are supposed to be syncing; data->b.o.l is the number of bytes we can handle at once (either Icache->bb or Dcache->bb or 8192).

We need a more elaborate scheme to implement SYNCD and SYNCID than we have used for the "hint" instructions PRELD, PREGO, and PREST, because SYNCD and SYNCID are not merely hints. They cannot be converted into a sequence of cache-block-size commands at dispatch time, because we cannot be sure that the starting virtual address will be aligned with the beginning of a cache block. We need to realize that the bytes specified by SYNCD or SYNCID might cross a virtual page boundary, possibly with different protection bits on each page. We need to allow for interrupts. And we also need to keep the fetch buffer empty until a user's SYNCID has completely brought the memory up to date.

⟨Special cases for states in later stages 272⟩ +≡
  do_syncid: data->state = 30;
  case 30: if (data != old_hot) wait(1);
    if (!Icache) {
      data->state = (data->loc.h & sign_bit ? 31 : 33); goto switch2;
    }
    ⟨Clean the I-cache block for data->z.o, if any 365⟩;
    data->state = (data->loc.h & sign_bit ? 31 : 33); wait(Icache->access_time);
  case 31: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    ⟨Wait till write buffer is empty 362⟩;
    if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
    if (!Dcache) goto next_sync;
    ⟨Clean the D-cache block for data->z.o, if any 366⟩;
    data->state = 32; wait(Dcache->access_time);
  case 32: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (!Scache) goto next_sync;
    ⟨Clean the S-cache block for data->z.o, if any 367⟩;
    data->state = 35; wait(Scache->access_time);
  do_syncd: data->state = 33;
  case 33: if (data != old_hot) wait(1);
    if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    ⟨Wait till write buffer is empty 362⟩;
    if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
    if (!Dcache)
      if (data->i == syncd) goto fin_ex; else goto next_sync;
    ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩;
    data->state = 34;
  case 34: if (!clean_co.next) goto next_sync;
    if (trying_to_interrupt && data->interim && data == old_hot) {
      data->z.o = zero_octa;  /* anticipate RESUME_CONT */
      goto fin_ex;  /* accept an interruption */
    }
    wait(1);
  next_sync: data->state = 35;
  case 35: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->interim) ⟨Continue this command on the next cache block 369⟩;
    data->go.known = true;
    goto fin_ex;

365. ⟨Clean the I-cache block for data->z.o, if any 365⟩ ≡
  if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache->reader[j], Icache->access_time);
  set_lock(self, Icache->lock);
  p = cache_search(Icache, data->z.o);
  if (p) {
    demote_and_fix(Icache, p);
    clean_block(Icache, p);
  }

This code is used in section 364.

366. ⟨Clean the D-cache block for data->z.o, if any 366⟩ ≡
  if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  p = cache_search(Dcache, data->z.o);
  if (p) {
    demote_and_fix(Dcache, p);
    clean_block(Dcache, p);
  }

This code is used in section 364.

367. ⟨Clean the S-cache block for data->z.o, if any 367⟩ ≡
  if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  p = cache_search(Scache, data->z.o);
  if (p) {
    demote_and_fix(Scache, p);
    clean_block(Scache, p);
  }

This code is used in section 364.


368. ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩ ≡
  if (clean_co.next || clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = syncd;
  clean_ctl.state = 4;
  clean_ctl.x.o.h = data->loc.h & sign_bit;
  clean_ctl.z.o = data->z.o;
  schedule(&clean_co, 1, 4);

This code is used in section 364.

369. We use the fact that cache block sizes are divisors of 8192.

⟨Continue this command on the next cache block 369⟩ ≡
  {
    data->interim = false;
    data->xx -= ((data->b.o.l - 1) & ~data->y.o.l) + 1;
    data->y.o = incr(data->y.o, data->b.o.l);
    data->y.o.l &= -data->b.o.l;
    data->z.o.l = (data->z.o.l & -8192) + (data->y.o.l & 8191);
    if ((data->y.o.l & 8191) == 0) goto square_one;
      /* maybe crossed a page boundary */
    if (data->i == syncd) goto do_syncd; else goto do_syncid;
  }

This code is used in section 364.
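The address arithmetic in section 369 can be checked with a small worked example. In the sketch below (all names are hypothetical) y is the low part of the virtual address, b is the block size in bytes, and xx + 1 is the number of bytes still to be synced, following the convention stated in section 364. The block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint32_t y;   /* low tetra of the current virtual address */
  uint32_t xx;  /* number of remaining bytes, minus one */
} sync_range;

/* Bytes from y to the end of its block: ((b - 1) & ~y) + 1. */
uint32_t bytes_to_block_end(uint32_t y, uint32_t b)
{
  assert(b != 0 && (b & (b - 1)) == 0);  /* power of two */
  return ((b - 1) & ~y) + 1;
}

/* Does the range extend beyond the current block? (the interim test) */
int continues_past_block(sync_range r, uint32_t b)
{
  return ((b - 1) & ~r.y) < r.xx;
}

/* Move to the first byte of the next block and shrink the count. */
sync_range advance_block(sync_range r, uint32_t b)
{
  r.xx -= bytes_to_block_end(r.y, b);
  r.y = (r.y + b) & ~(b - 1);  /* same as adding b, then masking with -b */
  return r;
}
```

With y = 0x1234, b = 64 and 100 bytes requested (xx = 99), the current block has 12 bytes left, so the next step starts at 0x1240 with 88 bytes remaining (xx = 87).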

370. If the first page lacks proper protection, we still must try the second, in the rare case that a page boundary is spanned.

⟨Special cases for states in later stages 272⟩ +≡
  sync_check: if ((data->y.o.l ^ (data->y.o.l + data->xx)) >= 8192) {
      data->xx -= (8191 & ~data->y.o.l) + 1;
      data->y.o = incr(data->y.o, 8192);
      data->y.o.l &= -8192;
      goto square_one;
    }
    goto fin_ex;
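The test at sync_check relies on a bit trick: the first and last byte addresses of the range lie on different 8192-byte pages exactly when their exclusive-or has a bit set at position 13 or above. A minimal sketch with hypothetical names follows; it assumes, as in the simulator, that xx is at most 255, so at most one boundary can be crossed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if bytes y .. y+xx are not all on the same 8192-byte page. */
bool spans_page_boundary(uint32_t y, uint32_t xx)
{
  return (y ^ (y + xx)) >= 8192;
}

/* Number of bytes from y to the end of its page: (8191 & ~y) + 1. */
uint32_t bytes_left_on_page(uint32_t y)
{
  return (8191 & ~y) + 1;
}
```

For instance, three bytes starting at address 8190 (xx = 2) span a boundary, and two of them lie on the first page.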


371. Input and output. We're done implementing the hardware, but there's still a small matter of software remaining, because we sometimes want to pretend that a real operating system is present without actually having one loaded. This simulator therefore implements a special feature: If RESUME 1 is issued in location rT, the ten special I/O traps of MMIX-SIM are performed instantaneously behind the scenes.

Of course all claims of accurate simulation go out the door when this feature is used.

#define max_sys_call Ftell

⟨Type definitions 11⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;


372. ⟨Magically do an I/O operation, if cool->loc is rT 372⟩ ≡
  if (cool->loc.l == g[rT].o.l && cool->loc.h == g[rT].o.h) {
    register unsigned char yy, zz;
    octa ma, mb;
    if (g[rXX].o.l & 0xffff0000) goto magic_done;
    yy = g[rXX].o.l >> 8, zz = g[rXX].o.l & 0xff;
    if (yy > max_sys_call) goto magic_done;
    ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩;
    switch (yy) {
    case Halt: ⟨Either halt or print warning 373⟩; break;
    case Fopen: g[rBB].o = mmix_fopen(zz, mb, ma); break;
    case Fclose: g[rBB].o = mmix_fclose(zz); break;
    case Fread: g[rBB].o = mmix_fread(zz, mb, ma); break;
    case Fgets: g[rBB].o = mmix_fgets(zz, mb, ma); break;
    case Fgetws: g[rBB].o = mmix_fgetws(zz, mb, ma); break;
    case Fwrite: g[rBB].o = mmix_fwrite(zz, mb, ma); break;
    case Fputs: g[rBB].o = mmix_fputs(zz, g[rBB].o); break;
    case Fputws: g[rBB].o = mmix_fputws(zz, g[rBB].o); break;
    case Fseek: g[rBB].o = mmix_fseek(zz, g[rBB].o); break;
    case Ftell: g[rBB].o = mmix_ftell(zz); break;
    }
  magic_done: g[255].o = neg_one;  /* this will enable interrupts */
  }

This code is used in section 322.
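The decoding step at the top of section 372 can be isolated as follows. This sketch assumes that the low tetra of rXX holds the trapped instruction word, so that a nonzero upper half means the word is not one of the special traps; the function name decode_magic_trap and the enum type name are hypothetical, while the enum constants follow the order given in section 371.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
  Halt, Fopen, Fclose, Fread, Fgets, Fgetws,
  Fwrite, Fputs, Fputws, Fseek, Ftell
} sys_call_sketch;

/* Returns the system call number (the Y field) and stores the Z field
   in *zz, or returns -1 if the word is not one of the special traps. */
int decode_magic_trap(uint32_t inst, unsigned char *zz)
{
  unsigned char yy;
  if (inst & 0xffff0000) return -1;  /* upper two bytes must be zero */
  yy = (unsigned char)(inst >> 8);
  if (yy > Ftell) return -1;         /* beyond max_sys_call */
  *zz = (unsigned char)(inst & 0xff);
  return yy;
}
```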

373. ⟨Either halt or print warning 373⟩ ≡
  if (!zz) halted = true;
  else if (zz == 1) {
    octa trap_loc;
    trap_loc = incr(g[rWW].o, -4);
    if (!(trap_loc.h || trap_loc.l >= 0x90))
      print_trip_warning(trap_loc.l >> 4, incr(g[rW].o, -4));
  }

This code is used in section 372.

374. ⟨Global variables 20⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};

375. The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

376. ⟨Global variables 20⟩ +≡
  extern octa mmix_fopen ARGS((unsigned char, octa, octa));
  extern octa mmix_fclose ARGS((unsigned char));
  extern octa mmix_fread ARGS((unsigned char, octa, octa));
  extern octa mmix_fgets ARGS((unsigned char, octa, octa));
  extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
  extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
  extern octa mmix_fputs ARGS((unsigned char, octa));
  extern octa mmix_fputws ARGS((unsigned char, octa));
  extern octa mmix_fseek ARGS((unsigned char, octa));
  extern octa mmix_ftell ARGS((unsigned char));
  extern void print_trip_warning ARGS((int, octa));

377. ⟨Internal prototypes 13⟩ +≡
  int mmgetchars ARGS((char *, int, octa, int));
  int mmputchars ARGS((unsigned char *, int, octa));
  char stdin_chr ARGS((void));
  octa magic_read ARGS((octa));
  void magic_write ARGS((octa, octa));


378. We need to cut through all the complications of buffers and caches in order to do magical I/O. The magic_read routine finds the current octabyte in a given physical address by looking at the write buffer, D-cache, S-cache, and memory until finding it.

⟨Subroutines 14⟩ +≡
  octa magic_read(addr)
    octa addr;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q == write_head) break;
      if (q == wbuf_top) q = wbuf_bot; else q++;
      if (q->addr.l == addr.l && q->addr.h == addr.h) return q->o;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) return p->data[(addr.l & (Dcache->bb - 1)) >> 3];
      if (Scache) {
        p = cache_search(Scache, addr);
        if (p) return p->data[(addr.l & (Scache->bb - 1)) >> 3];
      }
    }
    return mem_read(addr);
  }

379. The magic_write routine changes the octabyte in a given physical address by changing it wherever it appears in a buffer or cache. Any "dirty" or "least recently used" status remains unchanged. (Yes, this is magic.)
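Both magic routines locate an octabyte inside a cache block with the same expression: the address is reduced modulo the block size, then divided by eight. A minimal sketch of that index computation follows; the function name is hypothetical, and the block size is assumed to be a power of two of at least eight bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the octabyte containing addr_low within a block of
   block_bytes bytes: (addr & (bb - 1)) >> 3. */
unsigned octabyte_index(uint32_t addr_low, unsigned block_bytes)
{
  assert(block_bytes >= 8 && (block_bytes & (block_bytes - 1)) == 0);
  return (addr_low & (block_bytes - 1)) >> 3;
}
```

With 64-byte blocks, address 0x1234 falls at offset 52 of its block and therefore in octabyte 6.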

⟨Subroutines 14⟩ +≡
  void magic_write(addr, val)
    octa addr, val;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q == write_head) break;
      if (q == wbuf_top) q = wbuf_bot; else q++;
      if (q->addr.l == addr.l && q->addr.h == addr.h) q->o = val;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) p->data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (((Dcache->inbuf.tag.l ^ addr.l) & Dcache->tagmask) == 0 &&
          Dcache->inbuf.tag.h == addr.h)
        Dcache->inbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (((Dcache->outbuf.tag.l ^ addr.l) & Dcache->tagmask) == 0 &&
          Dcache->outbuf.tag.h == addr.h)
        Dcache->outbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (Scache) {
        p = cache_search(Scache, addr);
        if (p) p->data[(addr.l & (Scache->bb - 1)) >> 3] = val;
        if (((Scache->inbuf.tag.l ^ addr.l) & Scache->tagmask) == 0 &&
            Scache->inbuf.tag.h == addr.h)
          Scache->inbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
        if (((Scache->outbuf.tag.l ^ addr.l) & Scache->tagmask) == 0 &&
            Scache->outbuf.tag.h == addr.h)
          Scache->outbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
      }
    }
    mem_write(addr, val);
  }

380. The conventions of our imaginary operating system require us to apply the trivial memory mapping in which segment i appears in a 2^32-byte page of physical addresses starting at 2^32·i.

⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ ≡
  if (arg_count[yy] == 3) {
    octa arg_loc;
    arg_loc = g[rBB].o;
    if (arg_loc.h & 0x9fffffff) mb = zero_octa;
    else arg_loc.h >>= 29, mb = magic_read(arg_loc);
    arg_loc = incr(g[rBB].o, 8);
    if (arg_loc.h & 0x9fffffff) ma = zero_octa;
    else arg_loc.h >>= 29, ma = magic_read(arg_loc);
  }

This code is used in section 372.
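The trivial mapping used here can be sketched on 64-bit integers. The mask 0x9fffffff applied to the high tetra rejects addresses that are negative or whose offset within the segment is 2^32 or more; shifting the high tetra right by 29 then leaves the segment number i, which becomes the high tetra of the physical address. The function name trivial_map is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and stores the physical address in *phys if virt is a
   nonnegative address within the first 2^32 bytes of its segment;
   returns 0 otherwise. */
int trivial_map(uint64_t virt, uint64_t *phys)
{
  uint32_t h = (uint32_t)(virt >> 32);
  uint32_t l = (uint32_t)virt;
  if (h & 0x9fffffff) return 0;              /* off the page */
  *phys = ((uint64_t)(h >> 29) << 32) | l;   /* segment i starts at 2^32 * i */
  return 1;
}
```

For example, virtual address 0x2000000000000010 lies in segment 1 and maps to physical address 0x100000010.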


381. The subroutine mmgetchars (buf ; size ; addr ; stop) reads characters starting ataddress addr in the simulated memory and stores them in buf , continuing until sizecharacters have been read or some other stopping criterion has been met. If stop < 0there is no other criterion; if stop = 0 a null character will also terminate the process;otherwise addr is even, and two consecutive null bytes starting at an even address willterminate the process. The number of bytes read and stored, exclusive of terminatingnulls, is returned.

⟨Subroutines 14⟩ +≡
  int mmgetchars(buf, size, addr, stop)
    char *buf;
    int size;
    octa addr;
    int stop;
  {
    register char *p;
    register int m;
    octa a, x;
    if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
      fprintf(stderr, "Attempt to get characters from off the page!\n");
      return 0;
    }
    for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
      x = magic_read(a);
      if ((a.l & 0x7) || m > size - 8) ⟨Read and store one byte; return if done 382⟩
      else ⟨Read and store up to eight bytes; return if done 383⟩
    }
    return size;
  }

382. ⟨Read and store one byte; return if done 382⟩ ≡
  {
    if (a.l & 0x4) *p = (x.l >> (8 * ((~a.l) & 0x3))) & 0xff;
    else *p = (x.h >> (8 * ((~a.l) & 0x3))) & 0xff;
    if (!*p && stop >= 0) {
      if (stop == 0) return m;
      if ((a.l & 0x1) && *(p - 1) == '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }
This code is used in section 381.

383. ⟨Read and store up to eight bytes; return if done 383⟩ ≡
  {
    *p = x.h >> 24;
    if (!*p && (stop == 0 || (stop > 0 && x.h < 0x10000))) return m;
    *(p + 1) = (x.h >> 16) & 0xff;
    if (!*(p + 1) && stop == 0) return m + 1;
    *(p + 2) = (x.h >> 8) & 0xff;
    if (!*(p + 2) && (stop == 0 || (stop > 0 && (x.h & 0xffff) == 0))) return m + 2;
    *(p + 3) = x.h & 0xff;
    if (!*(p + 3) && stop == 0) return m + 3;
    *(p + 4) = x.l >> 24;
    if (!*(p + 4) && (stop == 0 || (stop > 0 && x.l < 0x10000))) return m + 4;
    *(p + 5) = (x.l >> 16) & 0xff;
    if (!*(p + 5) && stop == 0) return m + 5;
    *(p + 6) = (x.l >> 8) & 0xff;
    if (!*(p + 6) && (stop == 0 || (stop > 0 && (x.l & 0xffff) == 0))) return m + 6;
    *(p + 7) = x.l & 0xff;
    if (!*(p + 7) && stop == 0) return m + 7;
    p += 8, m += 8, a = incr(a, 8);
  }
This code is used in section 381.
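The three stopping rules of mmgetchars are easier to see on a flat byte array, one byte at a time. The following is a simplified sketch under stated assumptions: the name get_chars_sketch, the array mem, and the integer index addr are hypothetical stand-ins for the simulated memory and its octa addresses, and the eight-bytes-at-a-time fast path of section 383 is omitted.

```c
/* Sketch of the stopping rules of mmgetchars on a flat byte array.
   mem and the index addr are stand-ins for simulated memory (assumptions). */
static int get_chars_sketch(char *buf, int size,
                            const unsigned char *mem, unsigned addr, int stop)
{
    int m;

    for (m = 0; m < size; m++) {
        buf[m] = (char)mem[addr + m];
        if (buf[m] == '\0' && stop >= 0) {
            if (stop == 0) return m;        /* one null byte ends the read */
            /* stop > 0: a null wyde is two null bytes, the first at an even address */
            if (((addr + m) & 1) && m > 0 && buf[m - 1] == '\0') return m - 1;
        }
    }
    return size;                            /* size characters were read */
}
```

In every case the returned count excludes the terminating null byte or null wyde, which is the convention stated in section 381.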

384. The subroutine mmputchars(buf, size, addr) puts size characters into the simulated memory starting at address addr.

⟨Subroutines 14⟩ +≡
  int mmputchars(buf, size, addr)
    unsigned char *buf;
    int size;
    octa addr;
  {
    register unsigned char *p;
    register int m;
    octa a, x;
    if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
      fprintf(stderr, "Attempt to put characters off the page!\n");
      return 0;
    }
    for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
      if ((a.l & 0x7) || m > size - 8) ⟨Load and write one byte 385⟩
      else ⟨Load and write eight bytes 386⟩;
    }
  }

385. ⟨Load and write one byte 385⟩ ≡
  {
    register int s = 8 * ((~a.l) & 0x3);
    x = magic_read(a);
    if (a.l & 0x4) x.l ^= (((x.l >> s) ^ *p) & 0xff) << s;
    else x.h ^= (((x.h >> s) ^ *p) & 0xff) << s;
    magic_write(a, x);
    p++, m++, a = incr(a, 1);
  }
This code is used in section 384.
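Section 385 replaces one byte of a tetrabyte without masking it out first: it XORs the word with the difference between the old byte and the new one. The following standalone sketch isolates that step; the name put_byte and the use of uint32_t for a tetra are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store byte b at big-endian position pos (0..3)
   of a tetrabyte, using the XOR technique of section 385. */
static uint32_t put_byte(uint32_t word, unsigned pos, unsigned char b)
{
    unsigned s = 8 * (~pos & 3);    /* shift amount: 24, 16, 8, 0 for pos 0..3 */

    /* XOR with (old byte ^ new byte) flips exactly the bits that differ */
    word ^= (((word >> s) ^ b) & 0xffu) << s;
    return word;
}
```

Because MMIX is big-endian, position 0 is the most significant byte, which is why the shift is computed from the complement of the low address bits.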

fprintf: int ( ), <stdio.h>. h: tetra, §17. incr: octa ( ), MMIX-ARITH §6. l: tetra, §17. magic_read: octa ( ), §378. magic_write: void ( ), §379. octa = struct, §17. stderr: FILE *, <stdio.h>.


386. ⟨Load and write eight bytes 386⟩ ≡
  {
    x.h = (*p << 24) + (*(p + 1) << 16) + (*(p + 2) << 8) + *(p + 3);
    x.l = (*(p + 4) << 24) + (*(p + 5) << 16) + (*(p + 6) << 8) + *(p + 7);
    magic_write(a, x);
    p += 8, m += 8, a = incr(a, 8);
  }
This code is used in section 384.

387. When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an fgets operation works much better than fread when we prompt for new input. But there is a slight complication, because fgets might read a null character before coming to a newline character. We cannot deduce the number of characters read by fgets simply by looking at strlen(stdin_buf).

⟨Subroutines 14⟩ +≡
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start == stdin_buf_end) {
      printf("StdIn> "); fflush(stdout);
      fgets(stdin_buf, 256, stdin);
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++)
        if (*p == '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

388. ⟨Global variables 20⟩ +≡
  char stdin_buf[256];     /* standard input to the simulated program */
  char *stdin_buf_start;   /* current position in that buffer */
  char *stdin_buf_end;     /* current end of that buffer */
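The reason section 387 scans for a newline instead of calling strlen can be shown on a buffer that contains an embedded null. This is a small sketch only; the name line_length and the cap parameter are hypothetical, and the loop bound mirrors the limit stdin_buf + 254 used above for a 256-byte buffer.

```c
#include <stddef.h>

/* Hypothetical helper: number of characters in the line that fgets stored
   in buf, found by scanning for the newline rather than for a null byte.
   cap is the capacity of buf (256 in section 388). */
static size_t line_length(const char *buf, size_t cap)
{
    size_t i;

    for (i = 0; i + 2 < cap; i++)
        if (buf[i] == '\n') break;
    return i + 1;               /* count includes the newline itself */
}
```

For a buffer holding the six characters a, b, null, c, d, newline, this scan reports 6, whereas strlen would stop at the embedded null and report 2.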


389. Names of the sections.

⟨Allocate a slot p in the S-cache 218⟩ Used in section 217.
⟨Assign a functional unit if available, otherwise goto stall 82⟩ Used in section 75.
⟨Begin an interruption and break 317⟩ Used in section 146.
⟨Begin execution of a stage-two operation 351⟩ Used in section 135.
⟨Begin execution of an operation 132⟩ Used in section 130.
⟨Begin fetch with known physical address 296⟩ Used in section 288.
⟨Begin fetch without I-cache lookup 295⟩ Used in section 291.
⟨Cases 0 through 4, for the D-cache 233⟩ Used in section 232.
⟨Cases 5 through 9, for the S-cache 234⟩ Used in section 232.
⟨Cases for control of special coroutines 126, 215, 217, 222, 224, 232, 237, 257⟩ Used in section 125.
⟨Cases for stage 1 execution 155, 313, 325, 327, 328, 329, 331, 356⟩ Used in section 132.
⟨Cases to compute the results of register-to-register operation 137, 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, 350⟩ Used in section 132.
⟨Cases to compute the virtual address of a memory operation 265⟩ Used in section 132.
⟨Check for a hit in pending writes 278⟩ Used in section 273.
⟨Check for external interrupt 314⟩ Used in section 64.
⟨Check for prest with a fully spanned cache block 275⟩ Used in section 274.
⟨Check for security violation, break if so 149⟩ Used in section 67.
⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩ Used in section 75.
⟨Clean the D-cache block for data->z.o, if any 366⟩ Used in section 364.
⟨Clean the data caches 361⟩ Used in section 356.
⟨Clean the I-cache block for data->z.o, if any 365⟩ Used in section 364.
⟨Clean the S-cache block for data->z.o, if any 367⟩ Used in section 364.
⟨Commit and/or deissue up to commit_max instructions 67⟩ Used in section 64.
⟨Commit the hottest instruction, or break if it's not ready 146⟩ Used in section 67.
⟨Commit to memory if possible, otherwise break 256⟩ Used in section 146.
⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩ Used in section 237.
⟨Continue this command on the next cache block 369⟩ Used in section 364.
⟨Convert relative address to absolute address 84⟩ Used in section 75.
⟨Copy data from p into c->inbuf 226⟩ Used in section 224.
⟨Copy Scache->inbuf to slot p 220⟩ Used in section 217.
⟨Copy the data from block q to fetched 294⟩ Used in sections 292 and 296.
⟨Declare mmix_opcode and internal_opcode 47, 49⟩ Used in section 44.
⟨Deissue all but the hottest command 316⟩ Used in section 314.
⟨Deissue the coolest instruction 145⟩ Used in section 67.

a: octa, §384. fflush: int ( ), <stdio.h>. fgets: char *( ), <stdio.h>. fread: size_t ( ), <stdio.h>. h: tetra, §17. incr: octa ( ), MMIX-ARITH §6. l: tetra, §17. m: register int, §384. magic_write: void ( ), §379. p: register unsigned char *, §384. printf: int ( ), <stdio.h>. stdin: FILE *, <stdio.h>. stdout: FILE *, <stdio.h>. strlen: int ( ), <string.h>. x: octa, §384.


⟨Determine the flags, f, and the internal opcode, i 80⟩ Used in section 75.
⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩ Used in section 75.
⟨Dispatch one cycle's worth of instructions 74⟩ Used in section 64.
⟨Do a simultaneous lookup in the D-cache 268⟩ Used in section 267.
⟨Do a simultaneous lookup in the I-cache 292⟩ Used in section 291.
⟨Do load/store stage 1 without D-cache lookup 270⟩ Used in section 267.
⟨Do load/store stage 1 with known physical address 271⟩ Used in section 266.
⟨Do load/store stage 2 without D-cache lookup 277⟩ Used in section 273.
⟨Do stage 1 of LDVTS 353⟩ Used in section 352.
⟨Do the final SAVE 340⟩ Used in section 339.
⟨Either halt or print warning 373⟩ Used in section 372.
⟨Execute all coroutines scheduled for the current time 125⟩ Used in section 64.
⟨External prototypes 9, 38, 161, 175, 178, 180, 209, 212, 252⟩ Used in sections 3 and 5.
⟨External routines 10, 39, 162, 176, 179, 181, 210, 213, 253⟩ Used in section 3.
⟨External variables 4, 29, 59, 60, 66, 69, 77, 86, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, 349⟩ Used in sections 3 and 5.
⟨Fill Scache->inbuf with clean memory data 219⟩ Used in section 217.
⟨Finish a CSWAP 283⟩ Used in section 281.
⟨Finish a store command 281⟩ Used in section 280.
⟨Finish execution of an operation 144⟩ Used in section 130.
⟨Forward the new data past the D-cache if it is write-through 263⟩ Used in section 257.
⟨Generate an instruction to save g[yy] 339⟩ Used in section 337.
⟨Generate an instruction to unsave g[yy] 333⟩ Used in section 332.
⟨Get ready for the next step of PREGO 229⟩ Used in section 81.
⟨Get ready for the next step of PRELD or PREST 228⟩ Used in section 81.
⟨Get ready for the next step of SAVE 341⟩ Used in section 81.
⟨Get ready for the next step of UNSAVE 335⟩ Used in section 81.
⟨Global variables 20, 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235, 238, 248, 285, 303, 305, 315, 374, 376, 388⟩ Used in section 3.
⟨Handle an internal SAVE when it's time to store 342⟩ Used in section 281.
⟨Handle an internal UNSAVE when it's time to load 336⟩ Used in section 279.
⟨Handle interrupt at end of execution stage 307⟩ Used in section 144.
⟨Handle special cases for operations like prego and ldvts 289, 352⟩ Used in section 266.
⟨Handle write-around when flushing to the S-cache 221⟩ Used in section 217.
⟨Handle write-around when writing to the D-cache 259⟩ Used in section 257.
⟨Header definitions 6, 7, 8, 52, 57, 87, 129, 166⟩ Used in sections 3 and 5.
⟨Ignore the item in write_head 264⟩ Used in section 257.
⟨Initialize everything 22, 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, 286⟩ Used in section 10.
⟨Insert an instruction to advance beta and L 112⟩ Used in section 110.
⟨Insert an instruction to advance gamma 113⟩ Used in sections 110, 119, and 337.
⟨Insert an instruction to decrease gamma 114⟩ Used in section 120.


⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic exceptions if signed 282⟩ Used in section 281.
⟨Insert dummy instruction for page table emulation 302⟩ Used in section 298.
⟨Insert special operands when resuming an interrupted operation 324⟩ Used in section 103.
⟨Install a new instruction into the tail position 304⟩ Used in section 301.
⟨Install default fields in the cool block 100⟩ Used in section 75.
⟨Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110⟩ Used in section 101.
⟨Install the operand fields of the cool block 103⟩ Used in section 101.
⟨Internal prototypes 13, 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182, 184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, 377⟩ Used in section 3.
⟨Issue j pseudo-instructions to compute a page table entry 244⟩ Used in section 243.
⟨Issue the cool instruction 81⟩ Used in section 75.
⟨Load and write eight bytes 386⟩ Used in section 384.
⟨Load and write one byte 385⟩ Used in section 384.
⟨Local variables 12, 124, 258⟩ Used in section 10.
⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩ Used in section 74.
⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ Used in section 266.
⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ Used in section 288.
⟨Magically do an I/O operation, if cool->loc is rT 372⟩ Used in section 322.
⟨Make sure cool_L and cool_G are up to date 102⟩ Used in section 101.
⟨Nullify the hottest instruction 147⟩ Used in section 146.
⟨Other cases for the fetch coroutine 298, 301⟩ Used in section 288.
⟨Pass data to the next stage of the pipeline 134⟩ Used in section 130.
⟨Perform one cycle of the interrupt preparations 318⟩ Used in section 64.
⟨Perform one machine cycle 64⟩ Used in section 10.
⟨Predict a branch outcome 151⟩ Used in section 85.
⟨Prepare for exceptional trip handler 308⟩ Used in section 307.
⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ Used in section 372.
⟨Prepare to emulate the page translation 309⟩ Used in section 310.
⟨Print all of c's cache blocks 177⟩ Used in section 176.
⟨Read and store one byte; return if done 382⟩ Used in section 381.
⟨Read and store up to eight bytes; return if done 383⟩ Used in section 381.
⟨Read data into c->inbuf and wait for the bus 223⟩ Used in section 222.
⟨Read from memory into fetched 297⟩ Used in section 296.
⟨Record the result of branch prediction 152⟩ Used in section 75.
⟨Recover from incorrect branch prediction 160⟩ Used in section 155.
⟨Redirect the fetch if control changes at this inst 85⟩ Used in section 75.
⟨Restart the fetch coroutine 287⟩ Used in sections 85, 160, 308, 309, and 316.
⟨Resume an interrupted operation 323⟩ Used in section 322.


⟨Set cool->b and/or cool->ra from special register 108⟩ Used in section 103.
⟨Set cool->b from register X 106⟩ Used in section 103.
⟨Set cool->y from register Y 105⟩ Used in section 103.
⟨Set cool->z as an immediate wyde 109⟩ Used in section 103.
⟨Set cool->z from register Z 104⟩ Used in section 103.
⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ Used in section 318.
⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ Used in section 318.
⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ Used in section 318.
⟨Set things up so that the results become known when they should 133⟩ Used in section 132.
⟨Set up the first phase of saving 338⟩ Used in section 337.
⟨Set up the first phase of unsaving 334⟩ Used in section 332.
⟨Simulate an action of the fetch coroutine 288⟩ Used in section 125.
⟨Simulate later stages of an execution pipeline 135⟩ Used in section 125.
⟨Simulate the first stage of an execution pipeline 130⟩ Used in section 125.
⟨Special cases for states in later stages 272, 273, 276, 279, 280, 299, 311, 354, 364, 370⟩ Used in section 135.
⟨Special cases for states in the first stage 266, 310, 326, 360, 363⟩ Used in section 130.
⟨Special cases of instruction dispatch 117, 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, 355⟩ Used in section 101.
⟨Start the S-cache filler 225⟩ Used in section 224.
⟨Start up auxiliary coroutines to compute the page table entry 243⟩ Used in section 237.
⟨Subroutines 14, 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174, 183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, 387⟩ Used in section 3.
⟨Swap cache blocks p and q 197⟩ Used in sections 196 and 205.
⟨Try to get the contents of location data->z.o in the D-cache 274⟩ Used in section 273.
⟨Try to get the contents of location data->z.o in the I-cache 300⟩ Used in section 298.
⟨Try to put the contents of location write_head->addr into the D-cache 261⟩ Used in section 257.
⟨Type definitions 11, 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, 371⟩ Used in sections 3 and 5.
⟨Undo data structures set prematurely in the cool block and break 123⟩ Used in section 75.
⟨Update DT-cache usage and check the protection bits 269⟩ Used in sections 268, 270, and 272.
⟨Update IT-cache usage and check the protection bits 293⟩ Used in sections 292 and 295.
⟨Update rG 330⟩ Used in section 329.
⟨Update the page variables 239⟩ Used in section 329.
⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩ Used in section 364.
⟨Wait for input data if necessary; set state = 1 if it's there 131⟩ Used in section 130.
⟨Wait if there's an unfinished load ahead of us 357⟩ Used in section 356.


⟨Wait till write buffer is empty 362⟩ Used in sections 361 and 364.
⟨Wait, if necessary, until the instruction pointer is known 290⟩ Used in section 288.
⟨Write directly from write_head to memory 260⟩ Used in section 257.
⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩ Used in section 257.
⟨Write the dirty data of c->outbuf and wait for the bus 216⟩ Used in section 215.
⟨Zap the instruction and data caches 359⟩ Used in section 356.
⟨Zap the translation caches 358⟩ Used in section 356.
⟨mmix-pipe.h 5⟩


MMIX-SIM

1. Introduction. This program simulates a simplified version of the MMIX computer. Its main goal is to help people create and test MMIX programs for The Art of Computer Programming and related publications. It provides only a rudimentary terminal-oriented interface, but it has enough infrastructure to support a cool graphical user interface, which could be added by a motivated reader. (Hint, hint.)

MMIX is simplified in the following ways:

• There is no pipeline, and there are no caches. Thus, commands like SYNC and SYNCD and PREGO do nothing.

• Simulation applies only to user programs, not to an operating system kernel. Thus, all addresses must be nonnegative; "privileged" commands such as PUT rK,z or RESUME 1 or LDVTS x,y,z are not allowed; instructions should be executed only from addresses in segment 0 (addresses less than #2000000000000000). Certain special registers remain constant: rF = 0, rK = #ffffffffffffffff, rQ = 0; rT = #8000000500000000, rTT = #8000000600000000, rV = #369c200400000000.

• No trap interrupts are implemented, except for a few special cases of TRAP that provide rudimentary input-output.

• All instructions take a fixed amount of time, given by the rough estimates stated in the MMIX documentation. For example, MUL takes 10υ, LDB takes μ + υ; all times are expressed in terms of μ and υ, "mems" and "oops." The clock register rC increases by 2^32 for each μ and 1 for each υ. But the interval counter rI decreases by 1 for each instruction, and the usage counter rU increases by 1 for each instruction.
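The clock arithmetic in the last item can be sketched as follows. This is only an illustration of the stated rule (2^32 per mem, 1 per oop); the struct cost and the function advance_clock are hypothetical names, not identifiers from MMIX-SIM.

```c
#include <stdint.h>

/* Hypothetical cost record: mems and oops charged to one instruction */
struct cost { uint32_t mems, oops; };

/* rC gains 2^32 for each mem and 1 for each oop */
static uint64_t advance_clock(uint64_t rC, struct cost c)
{
    return rC + ((uint64_t)c.mems << 32) + c.oops;
}
```

With this packing, the high tetrabyte of rC counts mems and the low tetrabyte counts oops, so a MUL (10 oops) followed by an LDB (one mem plus one oop) leaves rC equal to 2^32 + 11.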

2. To run this simulator, assuming UNIX conventions, you say `mmix ⟨options⟩ progfile args...', where progfile is an output of the MMIXAL assembler, args... is a sequence of optional command-line arguments passed to the simulated program, and ⟨options⟩ is any subset of the following:

• -t<n> Trace each instruction the first n times it is executed. (The notation <n> in this option, and in several other options and interactive commands below, stands for a decimal integer.)

• -x<x> Trace each instruction that raises an arithmetic exception belonging to the given bit pattern. (The notation <x> in this option, and in several other commands below, stands for a hexadecimal integer.) The exception bits are DVWIOUZX as they appear in rA, namely #80 for D (integer divide check), #40 for V (integer overflow), ..., #01 for X (floating inexact). The option -x by itself is equivalent to -xff, tracing all eight exceptions.

• -r Trace details of the register stack. This option shows all the "hidden" loads and stores that occur when octabytes are written from the ring of local registers into memory, or read from memory into that ring. It also shows the full details of SAVE and UNSAVE operations.

• -l<n> List the source line corresponding to each traced instruction, filling gaps of length n or less. For example, if one instruction came from line 10 of the source file and the next instruction to be traced came from line 12, line 11 would be shown also, provided that n ≥ 1. If <n> is omitted it is assumed to be 3.

• -s Show statistics of running time with each traced instruction.

• -P Show the program profile (that is, the frequency counts of each instruction that was executed) when the simulation ends.

• -L<n> List the source lines corresponding to each instruction that appears in the program profile, filling gaps of length n or less. This option implies -P.

• -v Be verbose: Turn on all options. (More precisely, the -v option is shorthand for -t9999999999 -r -s -l10 -L10.)

• -q Be quiet: Cancel all previously specified options.

• -i Go into interactive mode before starting the simulation.

• -I Go into interactive mode when the simulated program halts or pauses for a breakpoint.

• -b<n> Set the buffer size of source lines to max(72, n).

• -c<n> Set the capacity of the local register ring to max(256, n); this number must be a power of 2.

• -f<filename> Use the named file for standard input to the simulated program.

• -D<filename> Prepare the named file for use by other simulators, instead of actually doing a simulation.

• -? Print the "Usage" message, which summarizes the command-line options.

The author recommends -t2 -l -L for initial offline debugging.

While the program is being simulated, an interrupt signal (usually control-C) will cause the simulator to break and go into interactive mode after tracing the current instruction, even if -i and -I were not specified on the command line.

running times, MMIX §50.

D.E. Knuth: MMIXware, LNCS 1750, pp. 332-421, 1999. Springer-Verlag Berlin Heidelberg 1999


3. In interactive mode, the user is prompted `mmix>' and a variety of commands can be typed online. Any command-line option can be given in response to such a prompt (including the `-' that begins the option), and the following operations are also available:

• Simply typing ⟨return⟩ or n⟨return⟩ to the mmix> prompt causes one MMIX instruction to be executed and traced; then the user is prompted again.

• c continues simulation until the program halts or reaches a breakpoint. (Actually the command is `c⟨return⟩', but we won't bother to mention the ⟨return⟩ in the following description.)

• q quits (terminates the simulation), after printing the profile (if it was requested) and the final statistics.

• s prints out the current statistics (the clock times and the current instruction location). We have already discussed the -s option on the command line, which causes these statistics to be printed automatically; but a lot of statistics can fill up a lot of file space, so users may prefer to see the statistics only on demand.

• l<n><t>, g<n><t>, $<n><t>, rA<t>, rB<t>, ..., rZZ<t>, and M<x><t> will show the current value of a local register, global register, dynamically numbered register, special register, or memory location. Here <t> specifies the type of value to be displayed; if <t> is `!', the value will be given in decimal notation; if <t> is `.' it will be given in floating point notation; if <t> is `#' it will be given in hexadecimal, and if <t> is `"' it will be given as a string of eight one-byte characters. Just typing <t> by itself will repeat the most recently shown value, perhaps in another format; for example, the command `l10#' will show local register 10 in hexadecimal notation, then the command `!' will show it in decimal and `.' will show it as a floating point number. If <t> is empty, the previous type will be repeated; the default type is decimal. Register rA is equivalent to g22, according to the numbering used in GET and PUT commands.

The `<t>' in any of these commands can also have the form `=<value>', where the value is a decimal or floating point or hexadecimal or string constant. (The syntax rules for floating point constants appear in MMIX-ARITH. A string constant is treated as in the BYTE command of MMIXAL, but padded at the left with zeros if fewer than eight characters are specified.) This assigns a new value before displaying it. For example, `l10=.1e3' sets local register 10 equal to 100; `g250="ABCD",#a' sets global register 250 equal to #000000414243440a; `M1000=-Inf' sets M8[#1000] = #fff0000000000000, the representation of −∞. Special registers other than rI cannot be set to values disallowed by PUT. Marginal registers cannot be set to nonzero values.

The command `rI=250' sets the interval counter to 250; this will cause a break in simulation after 250 instructions have been executed.

• +<n><t> shows the next n octabytes following the one most recently shown, in format <t>. For example, after `l10#' a subsequent `+30' will show l11, l12, ..., l40 in hexadecimal notation. After `g200=3' a subsequent `+30' will set g201, g202, ..., g230 equal to 3, but a subsequent `+30!' would merely display g201 through g230 in decimal notation. Memory addresses will advance by 8 instead of by 1. If <n> is empty, the default value n = 1 is used.

• @<x> sets the address of the next tetrabyte to be simulated, sort of like a GO command.

• t<x> says that the instruction in tetrabyte location x should always be traced, regardless of its frequency count.

• u<x> undoes the effect of t<x>.

• b[rwx]<x> sets breakpoints at tetrabyte x; here [rwx] stands for any subset of the letters r, w, and/or x, meaning to break when the tetrabyte is read, written, and/or executed. For example, `bx1000' causes a break in the simulation just after the tetrabyte in #1000 is executed; `b1000' undoes this breakpoint; `brwx1000' causes a break just after any simulated instruction loads, stores, or appears in tetrabyte number #1000.

• T, D, P, S changes the "current segment" to either Text_Segment, Data_Segment, Pool_Segment, or Stack_Segment, respectively, namely to #0, #2000000000000000, #4000000000000000, or #6000000000000000. The current segment, initially #0, is added to all memory addresses in M, @, t, u, and b commands.

• B lists all current breakpoints and tracepoints.

• i<filename> reads a sequence of interactive commands from the specified file, one command per line, ignoring blank lines. This feature can be used to set many breakpoints or to display a number of key registers, etc. Included lines that begin with % or i are ignored; therefore an included file cannot include another file. Included lines that begin with a blank space are reproduced in the standard output, otherwise ignored.

• h (help) reminds the user of the available interactive commands.
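The rule that a string constant is padded at the left with zeros explains why `g250="ABCD",#a' yields #000000414243440a. The following sketch packs the bytes of such a constant into an octabyte; the name pack_octa is hypothetical, and the parsing of the constant itself is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: right-justify up to eight bytes in an octabyte,
   so that shorter constants are padded with zeros at the left. */
static uint64_t pack_octa(const unsigned char *bytes, size_t n)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n && i < 8; i++)
        v = (v << 8) | bytes[i];    /* earlier bytes move toward the left */
    return v;
}
```

Applied to the five bytes 'A', 'B', 'C', 'D', #0a, this produces the value quoted in the example above.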


4. Rudimentary I/O. Input and output are provided by the following ten primitive system calls:

• Fopen(handle, name, mode). Here handle is a one-byte integer, name is a string, and mode is one of the values TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite. An Fopen call associates handle with the external file called name and prepares to do input and/or output on that file. It returns 0 if the file was opened successfully; otherwise returns the value -1. If mode is TextWrite or BinaryWrite, any previous contents of the named file are discarded. If mode is TextRead or TextWrite, the file consists of "lines" terminated by "newline" characters, and it is said to be a text file; otherwise the file consists of uninterpreted bytes, and it is said to be a binary file.

Text files and binary files are essentially equivalent in cases where this simulator is hosted by an operating system derived from UNIX; in such cases files can be written as text and read as binary or vice versa. But with other operating systems, text files and binary files often have quite different representations, and certain characters with byte codes less than ' ' are forbidden in text. Within any MMIX program, the newline character has byte code #0a = 10.

At the beginning of a program three handles have already been opened: The "standard input" file StdIn (handle 0) has mode TextRead, the "standard output" file StdOut (handle 1) has mode TextWrite, and the "standard error" file StdErr (handle 2) also has mode TextWrite. When this simulator is being run interactively, lines of standard input should be typed following a prompt that says `StdIn> '; the standard output and standard error files of the simulated program are intermixed with the output of the simulator itself.

The input/output operations supported by this simulator can perhaps be understood most easily with reference to the standard library <stdio> that comes with the C language, because the conventions of C have been explained in hundreds of books. If we declare an array FILE *file[256] and set file[0] = stdin, file[1] = stdout, and file[2] = stderr, then the simulated system call Fopen(handle, name, mode) is essentially equivalent to the C expression

  (file[handle] ? (file[handle] = freopen(name, mode_string[mode], file[handle])) :
      (file[handle] = fopen(name, mode_string[mode]))) ? 0 : -1

if we predefine the values mode_string[TextRead] = "r", mode_string[TextWrite] = "w", mode_string[BinaryRead] = "rb", mode_string[BinaryWrite] = "wb", and mode_string[BinaryReadWrite] = "wb+".

• Fclose(handle). If the given file handle has been opened, it is closed, no longer associated with any file. Again the result is 0 if successful, or -1 if the file was already closed or unclosable. The C equivalent is

  fclose(file[handle]) ? -1 : 0

with the additional side effect of setting file[handle] = NULL.
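The two C equivalents given so far can be collected into a small sketch. The names sim_fopen, sim_fclose, and init_handles are hypothetical, the numeric values 0 through 4 for the mode constants are assumed, and the explicit test for an already closed handle is added here because fclose must not be passed a null pointer in real C.

```c
#include <stdio.h>

/* Mode constants; the numeric values are an assumption of this sketch */
enum { TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite };

static const char *mode_string[] = { "r", "w", "rb", "wb", "wb+" };
static FILE *file[256];         /* file[h] is NULL when handle h is closed */

/* Handles 0, 1, and 2 are already open when a program starts */
static void init_handles(void)
{
    file[0] = stdin; file[1] = stdout; file[2] = stderr;
}

/* Returns 0 if the file was opened, -1 otherwise */
static int sim_fopen(unsigned char handle, const char *name, int mode)
{
    if (file[handle])
        file[handle] = freopen(name, mode_string[mode], file[handle]);
    else
        file[handle] = fopen(name, mode_string[mode]);
    return file[handle] ? 0 : -1;
}

/* Returns 0 if the handle was closed, -1 if it was not open or unclosable */
static int sim_fclose(unsigned char handle)
{
    int r;

    if (!file[handle]) return -1;       /* already closed */
    r = fclose(file[handle]) ? -1 : 0;
    file[handle] = NULL;                /* the handle is free again */
    return r;
}
```

A caller would invoke init_handles() once and then use handles 3 through 255 for its own files; reopening an occupied handle goes through freopen, as in the expression above.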

• Fread(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. The next size characters are read into MMIX's memory starting at address buffer. If an error occurs, the value -1 - size is returned; otherwise, if the end of file does not intervene, 0 is returned; otherwise the negative value n - size is returned, where n is the number of characters successfully read and stored. The statement

  fread(buffer, 1, size, file[handle]) - size

has the equivalent effect in C, in the absence of file errors.

• Fgets(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. Characters are read into MMIX's memory starting at address buffer, until either size - 1 characters have been read and stored or a newline character has been read and stored; the next byte in memory is then set to zero. If an error or end of file occurs before reading is complete, the memory contents are undefined and the value -1 is returned; otherwise the number of characters successfully read and stored is returned. The equivalent in C is

  fgets(buffer, size, file[handle]) ? strlen(buffer) : -1

if we assume that no null characters were read in; null characters may, however, precede a newline, and they are counted just like other characters.

• Fgetws(handle, buffer, size). This command is the same as Fgets, except that it applies to wyde characters instead of one-byte characters. Up to size - 1 wyde characters are read; a wyde newline is #000a. The C version, using conventions of the ISO multibyte string extension (MSE), is approximately

  fgetws(buffer, size, file[handle]) ? wcslen(buffer) : -1

where buffer now has type wchar_t *.

• Fwrite(handle, buffer, size). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. The next size characters are written from MMIX's memory starting at address buffer. If no error occurs, 0 is returned; otherwise the negative value n - size is returned, where n is the number of characters successfully written. The statement

  fwrite(buffer, 1, size, file[handle]) - size

together with fflush(file[handle]) has the equivalent effect in C.

• Fputs(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. One-byte characters are written from MMIX's memory to the file, starting at address string, up to but not including the first byte equal to zero. The number of bytes written is returned, or -1 on error. The C version is

  fputs(string, file[handle]) >= 0 ? strlen(string) : -1,

together with fflush(file[handle]).


• Fputws(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. Wyde characters are written from MMIX's memory to the file, starting at address string, up to but not including the first wyde equal to zero. The number of wydes written is returned, or −1 on error. The C+MSE version is

    fputws(string, file[handle]) >= 0 ? wcslen(string) : -1

together with fflush(file[handle]), where string now has type wchar_t *.

• Fseek(handle, offset). The file handle should have been opened with one of the modes BinaryRead, BinaryWrite, or BinaryReadWrite. This operation causes the next input or output operation to begin at offset bytes from the beginning of the file, if offset ≥ 0, or at −offset − 1 bytes before the end of the file, if offset < 0. (For example, offset = 0 "rewinds" the file to its very beginning; offset = −1 moves forward all the way to the end.) The result is 0 if successful, or −1 if the stated positioning could not be done. The C version is

    fseek(file[handle], offset < 0 ? offset + 1 : offset,
          offset < 0 ? SEEK_END : SEEK_SET) ? -1 : 0.

If a file in mode BinaryReadWrite is used for both reading and writing, an Fseek command must be given when switching from input to output or from output to input.

• Ftell(handle). The file handle should have been opened with mode BinaryRead, BinaryWrite, or BinaryReadWrite. This operation returns the current file position, measured in bytes from the beginning, or −1 if an error has occurred. In this case the C function

    ftell(file[handle])

has exactly the same meaning.
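The offset convention of Fseek can be checked in isolation. The following sketch only computes the two arguments that the C version of Fseek passes to fseek; the type seek_args and the function mmix_seek_args are hypothetical names introduced for illustration and are not part of MMIXware.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_END */

/* Hypothetical helper type: the (offset, whence) pair handed to fseek. */
typedef struct {
    long offset;
    int whence;
} seek_args;

/* Translate an MMIX Fseek offset into fseek arguments:
   offset >= 0 counts from the beginning of the file;
   offset < 0 means -offset-1 bytes before the end. */
seek_args mmix_seek_args(long offset)
{
    seek_args a;
    if (offset < 0) {
        a.offset = offset + 1;   /* -1 -> 0 (end of file), -2 -> -1, ... */
        a.whence = SEEK_END;
    } else {
        a.offset = offset;
        a.whence = SEEK_SET;
    }
    /* An end-relative position never points past the end. */
    assert(a.whence == SEEK_SET || a.offset <= 0);
    return a;
}
```

With this mapping, mmix_seek_args(0) rewinds and mmix_seek_args(-1) positions at the end of the file, matching the two examples given for Fseek.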

Although these ten operations are quite primitive, they provide the necessary functionality for extremely complex input/output behavior. For example, every function in the stdio library of C, with the exception of the two administrative operations remove and rename, can be implemented as a subroutine in terms of the six basic operations Fopen, Fclose, Fread, Fwrite, Fseek, and Ftell.

Notice that the MMIX function calls are much more consistent than those in the C library. The first argument is always a handle; the second, if present, is always an address; the third, if present, is always a size. The result returned is always nonnegative if the operation was successful, negative if an anomaly arose. These common features make the functions reasonably easy to remember.
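The "nonnegative on success, negative on anomaly" rule can be illustrated with the Fread result. The sketch below is only an approximation built on the C library; fread_result is a hypothetical name, and the simulator's own implementation may differ in detail.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the Fread result convention (hypothetical helper, not MMIXware code):
   0 if all size bytes were read, n - size if end of file intervened after n bytes,
   and -1 - size if a file error occurred. */
long fread_result(void *buffer, long size, FILE *f)
{
    long n;
    assert(buffer != NULL && f != NULL && size >= 0);
    n = (long) fread(buffer, 1, (size_t) size, f);
    if (ferror(f)) return -1 - size;   /* file error */
    return n - size;                   /* 0 when complete, negative when short */
}
```

A caller can recover the number of bytes actually read from a short read by adding size back to the result, which is what the sample program of section 8 does with INCL t,Buf_Size.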

5. The ten input/output operations of the previous section are invoked by TRAP commands with X = 0, Y = Fopen or Fclose or ... or Ftell, and Z = Handle. If there are two arguments, the second argument is placed in $255. If there are three arguments, the address of the second is placed in $255; the second argument is M[$255]₈ and the third argument is M[$255 + 8]₈. The returned value will be in $255 when the system call is finished. (See the example below.)


6. The user program starts at symbolic location Main. At this time the global registers are initialized according to the GREG statements in the MMIXAL program, and $255 is set to the numeric equivalent of Main. Local register $0 is initially set to the number of command-line arguments; and local register $1 points to the first such argument, which is always a pointer to the program name. Each command-line argument is a pointer to a string; the last such pointer is M[$0 ≪ 3 + $1]₈, and M[$0 ≪ 3 + $1 + 8]₈ is zero. (Register $1 will point to an octabyte in Pool_Segment, and the command-line strings will be in that segment too.) Location M[Pool_Segment] will be the address of the first unused octabyte of the pool segment.

Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR are initially zero, and rL = 2.

A subroutine library loaded with the user program might need to initialize itself. If an instruction has been loaded into tetrabyte M[#90]₄, the simulator actually begins execution at #90 instead of at Main; in this case $255 holds the location of Main. (The routine at #90 can pass control to Main without increasing rL, if it starts with the slightly tricky sequence

    PUT rW,$255; PUT rB,$255; SETML $255,#F700; PUT rX,$255

and eventually says RESUME; this RESUME command will restore $255 and rB. But the user program should not really count on the fact that rL is initially 2.)

7. The main program ends when MMIX executes the system call TRAP 0, which is often symbolically written 'TRAP 0,Halt,0' to make its intention clear. The contents of $255 at that time are considered to be the value "returned" by the main program, as in the exit statement of C; a nonzero value indicates an anomalous exit. All open files are closed when the program ends.

exit : void ( ), <stdlib.h>.


8. Here, for example, is a complete program that copies a text file to the standard output, given the name of the file to be copied. It includes all necessary error checking.

* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT
t        IS   $255
argc     IS   $0
argv     IS   $1
s        IS   $2
Buf_Size IS   1000
         LOC  Data_Segment
Buffer   LOC  @+Buf_Size
         GREG @
Arg0     OCTA 0,TextRead
Arg1     OCTA Buffer,Buf_Size
         LOC  #200                main(argc,argv) {
Main     CMP  t,argc,2            if (argc==2) goto openit
         PBZ  t,OpenIt
         GETA t,1F                fputs("Usage: ",stderr)
         TRAP 0,Fputs,StdErr
         LDOU t,argv,0            fputs(argv[0],stderr)
         TRAP 0,Fputs,StdErr
         GETA t,2F                fputs(" filename\n",stderr)
Quit     TRAP 0,Fputs,StdErr
         NEG  t,0,1               quit: exit(-1)
         TRAP 0,Halt,0
1H       BYTE "Usage: ",0
         LOC  (@+3)&-4            align to tetrabyte
2H       BYTE " filename",#a,0
OpenIt   LDOU s,argv,8            openit: s=argv[1]
         STOU s,Arg0
         LDA  t,Arg0              fopen(argv[1],"r",file[3])
         TRAP 0,Fopen,3
         PBNN t,CopyIt            if (no error) goto copyit
         GETA t,1F                fputs("Can't open file ",stderr)
         TRAP 0,Fputs,StdErr
         SET  t,s                 fputs(argv[1],stderr)
         TRAP 0,Fputs,StdErr
         GETA t,2F                fputs("!\n",stderr)
         JMP  Quit                goto quit
1H       BYTE "Can't open file ",0
         LOC  (@+3)&-4            align to tetrabyte
2H       BYTE "!",#a,0
CopyIt   LDA  t,Arg1              copyit:
         TRAP 0,Fread,3           items=fread(buffer,1,buf_size,file[3])
         BN   t,EndIt             if (items < buf_size) goto endit
         LDA  t,Arg1              items=fwrite(buffer,1,buf_size,stdout)
         TRAP 0,Fwrite,StdOut
         PBNN t,CopyIt            if (items >= buf_size) goto copyit
Trouble  GETA t,1F                trouble: fputs("Trouble w...!",stderr)
         JMP  Quit                goto quit
1H       BYTE "Trouble writing StdOut!",#a,0
EndIt    INCL t,Buf_Size
         BN   t,ReadErr           if (ferror(file[3])) goto readerr
         STO  t,Arg1+8
         LDA  t,Arg1              n=fwrite(buffer,1,items,stdout)
         TRAP 0,Fwrite,StdOut
         BN   t,Trouble           if (n < items) goto trouble
         TRAP 0,Halt,0            exit(0)
ReadErr  GETA t,1F                readerr: fputs("Trouble r...!",stderr)
         JMP  Quit                goto quit }
1H       BYTE "Trouble reading!",#a,0


9. Basics. To get started, we define a type that provides semantic sugar.

⟨Type declarations 9⟩ ≡
  typedef enum { false, true } bool;

See also sections 10, 16, 38, 39, 54, 55, 59, 64, and 135.
This code is used in section 141.

10. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1999) was limited in that way. It uses subroutines from the MMIX-ARITH module, assuming only that type tetra represents unsigned 32-bit integers. The definition of tetra given here should be changed, if necessary, to agree with the definition in that module.

⟨Type declarations 9⟩ +≡
  typedef unsigned int tetra;  /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;
  } octa;  /* two tetrabytes makes one octabyte */
  typedef unsigned char byte;  /* a monobyte */
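To make the two-tetra representation concrete, here is a minimal sketch of 64-bit unsigned addition on an octa. The function name octa_add is hypothetical; the routine that MMIX-SIM actually calls is oplus from MMIX-ARITH, whose source is not shown in this section, so this should be read as an illustration of the idea rather than as that routine.

```c
#include <assert.h>

typedef unsigned int tetra;          /* assumed to be exactly 32 bits, as in section 10 */
typedef struct { tetra h, l; } octa; /* high and low halves of an octabyte */

/* Sketch: add two octabytes modulo 2^64 using only 32-bit arithmetic. */
octa octa_add(octa y, octa z)
{
    octa x;
    assert((tetra) -1 == 0xffffffffu);  /* the scheme relies on 32-bit wraparound */
    x.l = y.l + z.l;                    /* wraps modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l);      /* add the carry out of the low half */
    return x;
}
```

The comparison x.l < y.l detects the carry: an unsigned sum is smaller than one of its operands exactly when it wrapped around.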

11. We declare subroutines twice, once with a prototype and once with the old-style C conventions. The following hack makes this work with new compilers as well as the old standbys.

⟨Preprocessor macros 11⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

See also sections 43 and 46.
This code is used in section 141.

12. ⟨Subroutines 12⟩ ≡
  void print_hex ARGS((octa));
  void print_hex(o)
    octa o;
  {
    if (o.h) printf("%x%08x", o.h, o.l);
    else printf("%x", o.l);
  }

See also sections 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148, 154, 160, 162, 165, and 166.
This code is used in section 141.


__STDC__, Standard C. printf : int ( ), <stdio.h>.


13. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

⟨Subroutines 12⟩ +≡
  extern octa zero_octa;  /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;  /* neg_one.h = neg_one.l = −1 */
  extern octa aux, val;  /* auxiliary data */
  extern bool overflow;  /* flag set by signed multiplication and division */
  extern int exceptions;  /* bits set by floating point operations */
  extern int cur_round;  /* the current rounding mode */
  extern char *next_char;  /* where a scanned constant ended */
  extern octa oplus ARGS((octa y, octa z));  /* unsigned y + z */
  extern octa ominus ARGS((octa y, octa z));  /* unsigned y − z */
  extern octa incr ARGS((octa y, int delta));  /* unsigned y + δ (δ is signed) */
  extern octa oand ARGS((octa y, octa z));  /* y ∧ z */
  extern octa shift_left ARGS((octa y, int s));  /* y ≪ s, 0 ≤ s ≤ 64 */
  extern octa shift_right ARGS((octa y, int s, int uns));  /* y ≫ s, signed if ¬uns */
  extern octa omult ARGS((octa y, octa z));  /* unsigned (aux, x) = y × z */
  extern octa signed_omult ARGS((octa y, octa z));  /* signed x = y × z */
  extern octa odiv ARGS((octa x, octa y, octa z));  /* unsigned (x, y)/z; aux = (x, y) mod z */
  extern octa signed_odiv ARGS((octa y, octa z));  /* signed x = y/z */
  extern int count_bits ARGS((tetra z));  /* x = ν(z) */
  extern tetra byte_diff ARGS((tetra y, tetra z));  /* half of BDIF */
  extern tetra wyde_diff ARGS((tetra y, tetra z));  /* half of WDIF */
  extern octa bool_mult ARGS((octa y, octa z, bool xor));  /* MOR or MXOR */
  extern octa load_sf ARGS((tetra z));  /* load short float */
  extern tetra store_sf ARGS((octa x));  /* store short float */
  extern octa fplus ARGS((octa y, octa z));  /* floating point x = y ⊕ z */
  extern octa fmult ARGS((octa y, octa z));  /* floating point x = y ⊗ z */
  extern octa fdivide ARGS((octa y, octa z));  /* floating point x = y ⊘ z */
  extern octa froot ARGS((octa, int));  /* floating point x = √z */
  extern octa fremstep ARGS((octa y, octa z, int delta));  /* floating point x rem z = y rem z */
  extern octa fintegerize ARGS((octa z, int mode));  /* floating point x = round(z) */
  extern int fcomp ARGS((octa y, octa z));  /* −1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
  extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));  /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
  extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt));  /* fix to float */
  extern octa fixit ARGS((octa z, int mode));  /* float to fix */
  extern void print_float ARGS((octa z));  /* print octabyte as floating decimal */
  extern int scan_const ARGS((char *buf));  /* val = floating or integer constant; returns the type */


14. Here's a quick check to see if arithmetic is in trouble.

#define panic(m) { fprintf(stderr, "Panic: %s!\n", m); exit(-2); }

⟨Initialize everything 14⟩ ≡
  if (shift_left(neg_one, 1).h != 0xffffffff)
    panic("Incorrect implementation of type tetra");

See also sections 18, 24, 32, 41, 77, and 147.
This code is used in section 141.

ARGS = macro ( ), §11.  aux: octa, MMIX-ARITH §4.  bool = enum, §9.
bool_mult: octa ( ), MMIX-ARITH §29.  byte_diff: tetra ( ), MMIX-ARITH §27.
count_bits: int ( ), MMIX-ARITH §26.  cur_round: int, MMIX-ARITH §30.
exceptions: int, MMIX-ARITH §32.  exit: void ( ), <stdlib.h>.
fcomp: int ( ), MMIX-ARITH §84.  fdivide: octa ( ), MMIX-ARITH §44.
fepscomp: int ( ), MMIX-ARITH §50.  fintegerize: octa ( ), MMIX-ARITH §86.
fixit: octa ( ), MMIX-ARITH §88.  floatit: octa ( ), MMIX-ARITH §89.
fmult: octa ( ), MMIX-ARITH §41.  fplus: octa ( ), MMIX-ARITH §46.
fprintf: int ( ), <stdio.h>.  fremstep: octa ( ), MMIX-ARITH §93.
froot: octa ( ), MMIX-ARITH §912.  h: tetra, §10.
incr: octa ( ), MMIX-ARITH §6.  l: tetra, §10.
load_sf: octa ( ), MMIX-ARITH §39.  neg_one: octa, MMIX-ARITH §4.
next_char: char *, MMIX-ARITH §69.  oand: octa ( ), MMIX-ARITH §25.
octa = struct, §10.  odiv: octa ( ), MMIX-ARITH §13.
ominus: octa ( ), MMIX-ARITH §5.  omult: octa ( ), MMIX-ARITH §8.
oplus: octa ( ), MMIX-ARITH §5.  overflow: bool, MMIX-ARITH §4.
print_float: void ( ), MMIX-ARITH §54.  scan_const: int ( ), MMIX-ARITH §68.
shift_left: octa ( ), MMIX-ARITH §7.  shift_right: octa ( ), MMIX-ARITH §7.
signed_odiv: octa ( ), MMIX-ARITH §24.  signed_omult: octa ( ), MMIX-ARITH §12.
stderr: FILE *, <stdio.h>.  store_sf: tetra ( ), MMIX-ARITH §40.
tetra = unsigned int, §10.  val: octa, MMIX-ARITH §69.
wyde_diff: tetra ( ), MMIX-ARITH §28.  zero_octa: octa, MMIX-ARITH §4.


15. Binary-to-decimal conversion is used when we want to see an octabyte as a signed integer. The identity ⌊(an + b)/10⌋ = ⌊a/10⌋n + ⌊((a mod 10)n + b)/10⌋ is helpful here.

#define sign_bit ((unsigned) 0x80000000)

⟨Subroutines 12⟩ +≡
  void print_int ARGS((octa));
  void print_int(o)
    octa o;
  {
    register tetra hi = o.h, lo = o.l, r, t;
    register int j;
    char dig[20];
    if (lo == 0 && hi == 0) printf("0");
    else {
      if (hi & sign_bit) {
        printf("-");
        if (lo == 0) hi = -hi;
        else lo = -lo, hi = ~hi;
      }
      for (j = 0; hi; j++) {  /* 64-bit division by 10 */
        r = ((hi % 10) << 16) + (lo >> 16);
        hi = hi / 10;
        t = ((r % 10) << 16) + (lo & 0xffff);
        lo = ((r / 10) << 16) + (t / 10);
        dig[j] = t % 10;
      }
      for ( ; lo; j++) {
        dig[j] = lo % 10;
        lo = lo / 10;
      }
      for (j--; j >= 0; j--) printf("%c", dig[j] + '0');
    }
  }


16. Simulated memory. Chunks of simulated memory, 2K bytes each, are kept in a tree structure organized as a treap, following ideas of Vuillemin, Aragon, and Seidel [Communications of the ACM 23 (1980), 229–239; IEEE Symp. on Foundations of Computer Science 30 (1989), 540–546]. Each node of the treap has two keys: One, called loc, is the base address of 512 simulated tetrabytes; it follows the conventions of an ordinary binary search tree, with all locations in the left subtree less than the loc of a node and all locations in the right subtree greater than that loc. The other, called stamp, can be thought of as the time the node was inserted into the tree; all subnodes of a given node have a larger stamp. By assigning time stamps at random, we maintain a tree structure that almost always is fairly well balanced.

Each simulated tetrabyte has an associated frequency count and source file reference.

⟨Type declarations 9⟩ +≡
  typedef struct {
    tetra tet;  /* the tetrabyte of simulated memory */
    tetra freq;  /* the number of times it was obeyed as an instruction */
    unsigned char bkpt;  /* breakpoint information for this tetrabyte */
    unsigned char file_no;  /* source file number, if known */
    unsigned short line_no;  /* source line number, if known */
  } mem_tetra;

  typedef struct mem_node_struct {
    octa loc;  /* location of the first of 512 simulated tetrabytes */
    tetra stamp;  /* time stamp for treap balancing */
    struct mem_node_struct *left, *right;  /* pointers to subtrees */
    mem_tetra dat[512];  /* the chunk of simulated tetrabytes */
  } mem_node;

17. The stamp value is actually only pseudorandom, based on the idea of Fibonacci hashing [see Sorting and Searching, Section 6.4]. This is good enough for our purposes, and it guarantees that no two stamps will be identical.

⟨Subroutines 12⟩ +≡
  mem_node *new_mem ARGS((void));
  mem_node *new_mem()
  {
    register mem_node *p;
    p = (mem_node *) calloc(1, sizeof(mem_node));
    if (!p) panic("Can't allocate any more memory");
    p->stamp = priority;
    priority += 0x9e3779b9;  /* ⌊2^32 (φ − 1)⌋ */
    return p;
  }
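The stamp sequence is easy to reproduce on its own. The following sketch uses the starting value 314159265 that section 19 gives for priority; next_stamp is a hypothetical name, and uint32_t stands in for tetra.

```c
#include <assert.h>
#include <stdint.h>

/* Pseudorandom time stamps by repeated addition of an odd constant modulo 2^32.
   Because the increment is odd, the sequence visits every 32-bit value once
   before repeating, so stamps stay distinct for the first 2^32 nodes. */
static uint32_t priority = 314159265u;   /* initial value from section 19 */

uint32_t next_stamp(void)
{
    uint32_t stamp = priority;
    assert((0x9e3779b9u & 1u) == 1u);    /* an odd increment gives the full period */
    priority += 0x9e3779b9u;             /* unsigned arithmetic wraps modulo 2^32 */
    return stamp;
}
```

The first three stamps are 314159265, 2968595034, and 1328063507; they are neither increasing nor decreasing, which is what lets them play the role of random priorities in the treap.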

ARGS = macro ( ), §11.  calloc: void *( ), <stdlib.h>.  h: tetra, §10.
l: tetra, §10.  octa = struct, §10.  panic = macro ( ), §14.
printf: int ( ), <stdio.h>.  priority: tetra, §19.  tetra = unsigned int, §10.


18. Initially we start with a chunk for the pool segment, since the simulator will be putting command-line information there before it runs the program.

⟨Initialize everything 14⟩ +≡
  mem_root = new_mem();
  mem_root->loc.h = 0x40000000;
  last_mem = mem_root;

19. ⟨Global variables 19⟩ ≡
  tetra priority = 314159265;  /* pseudorandom time stamp counter */
  mem_node *mem_root;  /* root of the treap */
  mem_node *last_mem;  /* the memory node most recently read or written */

See also sections 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, and 151.
This code is used in section 141.

20. The mem_find routine finds a given tetrabyte in the simulated memory, inserting a new node into the treap if necessary.

⟨Subroutines 12⟩ +≡
  mem_tetra *mem_find ARGS((octa));
  mem_tetra *mem_find(addr)
    octa addr;
  {
    octa key;
    register int offset;
    register mem_node *p = last_mem;
    key.h = addr.h;
    key.l = addr.l & 0xfffff800;
    offset = addr.l & 0x7fc;
    if (p->loc.l != key.l || p->loc.h != key.h)
      ⟨Search for key in the treap, setting last_mem and p to its location 21⟩;
    return &p->dat[offset >> 2];
  }
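The two masks in mem_find split the low half of an address into a chunk base and a position inside the chunk. A small sketch of that arithmetic follows; chunk_base and tetra_index are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* A chunk holds 512 tetrabytes = 2048 bytes, so the low 11 bits of an address
   select a byte within the chunk and the remaining bits identify the chunk. */
uint32_t chunk_base(uint32_t addr_l)
{
    return addr_l & 0xfffff800u;
}

/* Index of the tetrabyte within its chunk: clear the two low bits
   (byte within tetrabyte), keep bits 2..10, and divide by 4. */
unsigned tetra_index(uint32_t addr_l)
{
    unsigned index = (addr_l & 0x7fcu) >> 2;
    assert(index < 512);
    return index;
}
```

For example, the low half 0x1234 lies in the chunk based at 0x1000, at tetrabyte index 0x8d.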


21. ⟨Search for key in the treap, setting last_mem and p to its location 21⟩ ≡
  {
    register mem_node **q;
    for (p = mem_root; p; ) {
      if (key.l == p->loc.l && key.h == p->loc.h) goto found;
      if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) p = p->left;
      else p = p->right;
    }
    for (p = mem_root, q = &mem_root; p && p->stamp < priority; p = *q) {
      if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) q = &p->left;
      else q = &p->right;
    }
    *q = new_mem();
    (*q)->loc = key;
    ⟨Fix up the subtrees of *q 22⟩;
    p = *q;
  found: last_mem = p;
  }

This code is used in section 20.

22. At this point we want to split the binary search tree p into two parts based on the given key, forming the left and right subtrees of the new node q. The effect will be as if key had been inserted before all of p's nodes.

⟨Fix up the subtrees of *q 22⟩ ≡
  {
    register mem_node **l = &(*q)->left, **r = &(*q)->right;
    while (p) {
      if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h)
        *r = p, r = &p->left, p = *r;
      else *l = p, l = &p->right, p = *l;
    }
    *l = *r = NULL;
  }

This code is used in section 21.
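The descent of section 21 and the split of section 22 can be tried out on a stripped-down treap with plain integer keys. This is a sketch under simplifying assumptions: keys are unsigned integers rather than octabyte addresses, the caller never inserts a key twice (the simulator searches first), and the names node, treap_insert, and treap_ok are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
    unsigned key;
    uint32_t stamp;
    struct node *left, *right;
} node;

static uint32_t priority = 314159265u;  /* stamp for the next new node */

/* Insert a key that is not yet present; returns the new node, or NULL if out of memory. */
node *treap_insert(node **root, unsigned key)
{
    node **q = root, *p = *root, *n, **l, **r;
    assert(root != NULL);
    while (p && p->stamp < priority) {   /* pass nodes that are "older" than the new one */
        q = key < p->key ? &p->left : &p->right;
        p = *q;
    }
    n = calloc(1, sizeof *n);
    if (!n) return NULL;
    n->key = key;
    n->stamp = priority;
    priority += 0x9e3779b9u;
    *q = n;                              /* the new node takes over the position of p */
    l = &n->left;
    r = &n->right;
    while (p) {                          /* split p's subtree by key */
        if (key < p->key) { *r = p; r = &p->left; p = *r; }
        else { *l = p; l = &p->right; p = *l; }
    }
    *l = *r = NULL;
    return n;
}

/* Return 1 if t is a search tree on keys in (lo, hi) whose stamps increase downward. */
int treap_ok(const node *t, long lo, long hi, uint32_t parent_stamp)
{
    if (!t) return 1;
    if ((long) t->key <= lo || (long) t->key >= hi) return 0;
    if (t->stamp <= parent_stamp) return 0;
    return treap_ok(t->left, lo, (long) t->key, t->stamp)
        && treap_ok(t->right, (long) t->key, hi, t->stamp);
}
```

The split loop never compares stamps: every node below p is already younger than the new node, so only the search-tree order has to be repaired.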

ARGS = macro ( ), §11.  dat: mem_tetra [ ], §16.  h: tetra, §10.  l: tetra, §10.
left: mem_node *, §16.  loc: octa, §16.  mem_node = struct, §16.
mem_tetra = struct, §16.  new_mem: mem_node *( ), §17.  octa = struct, §10.
right: mem_node *, §16.  stamp: tetra, §16.  tetra = unsigned int, §10.


23. Loading an object file. To get the user's program into memory, we read in an MMIX object, using modifications of the routines in the utility program MMOtype. Complete details of mmo format appear in the program for MMIXAL; a reader who hopes to understand this section ought to at least skim that documentation. Here we need to define only the basic constants used for interpretation.

#define mm 0x98  /* the escape code of mmo format */
#define lop_quote 0x0  /* the quotation lopcode */
#define lop_loc 0x1  /* the location lopcode */
#define lop_skip 0x2  /* the skip lopcode */
#define lop_fixo 0x3  /* the octabyte-fix lopcode */
#define lop_fixr 0x4  /* the relative-fix lopcode */
#define lop_fixrx 0x5  /* extended relative-fix lopcode */
#define lop_file 0x6  /* the file name lopcode */
#define lop_line 0x7  /* the file position lopcode */
#define lop_spec 0x8  /* the special hook lopcode */
#define lop_pre 0x9  /* the preamble lopcode */
#define lop_post 0xa  /* the postamble lopcode */
#define lop_stab 0xb  /* the symbol table lopcode */
#define lop_end 0xc  /* the end-it-all lopcode */

24. We do not load the symbol table. (A more ambitious simulator could implement MMIXAL-style expressions for interactive debugging, but such enhancements are left to the interested reader.)

⟨Initialize everything 14⟩ +≡
  mmo_file = fopen(mmo_file_name, "rb");
  if (!mmo_file) {
    register char *alt_name = (char *) calloc(strlen(mmo_file_name) + 5, sizeof(char));
    sprintf(alt_name, "%s.mmo", mmo_file_name);
    mmo_file = fopen(alt_name, "rb");
    if (!mmo_file) {
      fprintf(stderr, "Can't open the object file %s or %s!\n", mmo_file_name, alt_name);
      exit(-3);
    }
    free(alt_name);
  }
  byte_count = 0;

25. ⟨Global variables 19⟩ +≡
  FILE *mmo_file;  /* the input file */
  int postamble;  /* have we encountered lop_post? */
  int byte_count;  /* index of the next-to-be-read byte */
  byte buf[4];  /* the most recently read bytes */
  int yzbytes;  /* the two least significant bytes */
  int delta;  /* difference for relative fixup */
  tetra tet;  /* buf bytes packed big-endianwise */


26. The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this program is supposed to work also on computers that are little-endian. Therefore we read four successive bytes and pack them into a tetrabyte, instead of reading a single tetrabyte.

#define mmo_err { fprintf(stderr, "Bad object file! (Try running MMOtype.)\n"); exit(-4); }

⟨Subroutines 12⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
    if (fread(buf, 1, 4, mmo_file) != 4) mmo_err;
    yzbytes = (buf[2] << 8) + buf[3];
    tet = (((buf[0] << 8) + buf[1]) << 16) + yzbytes;
  }

27. ⟨Subroutines 12⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
    register byte b;
    if (!byte_count) read_tet();
    b = buf[byte_count];
    byte_count = (byte_count + 1) & 3;
    return b;
  }

28. ⟨Load the preamble 28⟩ ≡
  read_tet();  /* read the first tetrabyte of input */
  if (buf[0] != mm || buf[1] != lop_pre) mmo_err;
  if (ybyte != 1) mmo_err;
  if (zbyte == 0) obj_time = 0xffffffff;
  else {
    j = zbyte - 1;
    read_tet(); obj_time = tet;  /* file creation time */
    for ( ; j > 0; j--) read_tet();
  }

This code is used in section 32.

ARGS = macro ( ), §11.  byte = unsigned char, §10.  calloc: void *( ), <stdlib.h>.
exit: void ( ), <stdlib.h>.  FILE, <stdio.h>.  fopen: FILE *( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.  fread: size_t ( ), <stdio.h>.  free: void ( ), <stdlib.h>.
j: register int, §62.  mmo_file_name = macro, §142.  obj_time: tetra, §31.
sprintf: int ( ), <stdio.h>.  stderr: FILE *, <stdio.h>.  strlen: int ( ), <string.h>.
tetra = unsigned int, §10.  ybyte = macro, §33.  zbyte = macro, §33.


29. ⟨Load the next item 29⟩ ≡
  {
    read_tet();
  loop: if (buf[0] == mm)
      switch (buf[1]) {
      case lop_quote: if (yzbytes != 1) mmo_err;
        read_tet(); break;
      ⟨Cases for lopcodes in the main loop 33⟩
      case lop_post: postamble = 1;
        if (ybyte || zbyte < 32) mmo_err;
        continue;
      default: mmo_err;
      }
    ⟨Load tet as a normal item 30⟩;
  }

This code is used in section 32.

30. In a normal situation, the newly read tetrabyte is simply supposed to be loaded into the current location. We load not only the current location but also the current file position, if cur_line is nonzero and cur_loc belongs to segment 0.

#define mmo_load(loc, val) ll = mem_find(loc), ll->tet ^= val

⟨Load tet as a normal item 30⟩ ≡
  {
    mmo_load(cur_loc, tet);
    if (cur_line) {
      ll->file_no = cur_file;
      ll->line_no = cur_line;
      cur_line++;
    }
    cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
  }

This code is used in section 29.

31. ⟨Global variables 19⟩ +≡
  octa cur_loc;  /* the current location */
  int cur_file = -1;  /* the most recently selected file number */
  int cur_line;  /* the current position in cur_file, if nonzero */
  octa tmp;  /* an octabyte of temporary interest */
  tetra obj_time;  /* when the object file was created */

32. ⟨Initialize everything 14⟩ +≡
  cur_loc.h = cur_loc.l = 0;
  cur_file = -1;
  cur_line = 0;
  ⟨Load the preamble 28⟩;
  do ⟨Load the next item 29⟩ while (!postamble);
  ⟨Load the postamble 37⟩;
  fclose(mmo_file);
  cur_line = 0;


33. We have already implemented lop_quote, which falls through to the normal case after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

#define ybyte buf[2]  /* the next-to-least significant byte */
#define zbyte buf[3]  /* the least significant byte */

⟨Cases for lopcodes in the main loop 33⟩ ≡
  case lop_loc: if (zbyte == 2) {
      read_tet(); cur_loc.h = (ybyte << 24) + tet;
    } else if (zbyte == 1) cur_loc.h = ybyte << 24;
    else mmo_err;
    read_tet(); cur_loc.l = tet;
    continue;
  case lop_skip: cur_loc = incr(cur_loc, yzbytes); continue;

See also sections 34, 35, and 36.
This code is used in section 29.

34. Fixups load information out of order, when future references have been resolved. The current file name and line number are not considered relevant.

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_fixo: if (zbyte == 2) {
      read_tet(); tmp.h = (ybyte << 24) + tet;
    } else if (zbyte == 1) tmp.h = ybyte << 24;
    else mmo_err;
    read_tet(); tmp.l = tet;
    mmo_load(tmp, cur_loc.h);
    mmo_load(incr(tmp, 4), cur_loc.l);
    continue;
  case lop_fixr: delta = yzbytes;
    goto fixr;
  case lop_fixrx: j = yzbytes; if (j != 16 && j != 24) mmo_err;
    read_tet();
    delta = tet;
    if (delta & 0xfe000000) mmo_err;
  fixr: tmp = incr(cur_loc, -(delta >= 0x1000000 ? (delta & 0xffffff) - (1 << j) : delta) << 2);
    mmo_load(tmp, delta);
    continue;
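The address arithmetic at label fixr is compact, so a worked sketch may help. The function below computes the signed byte offset that is added to cur_loc; fixup_offset is a hypothetical name, and the multiplication by 4 replaces the left shift of a possibly negative value used in the original expression.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from cur_loc to the instruction being fixed up.
   j is 16 or 24, the width of the relative-address field; bit 24 of delta
   (value 0x1000000) marks a displacement that must be taken as negative. */
long fixup_offset(uint32_t delta, int j)
{
    long d;
    assert(j == 16 || j == 24);
    assert((delta & 0xfe000000u) == 0);   /* same sanity check as lop_fixrx */
    if (delta >= 0x1000000u)
        d = (long) (delta & 0xffffffu) - (1L << j);
    else
        d = (long) delta;
    return -d * 4;                        /* tetrabytes to bytes, direction reversed */
}
```

A small positive delta therefore refers to an instruction a few tetrabytes before cur_loc, which is the usual case of a forward reference being resolved.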

buf: byte [ ], §25.  delta: int, §25.  fclose: int ( ), <stdio.h>.
file_no: unsigned char, §16.  h: tetra, §10.  incr: octa ( ), MMIX-ARITH §6.
j: register int, §62.  l: tetra, §10.  line_no: unsigned short, §16.
ll: register mem_tetra *, §62.  lop_fixo = 0x3, §23.  lop_fixr = 0x4, §23.
lop_fixrx = 0x5, §23.  lop_loc = 0x1, §23.  lop_post = 0xa, §23.
lop_quote = 0x0, §23.  lop_skip = 0x2, §23.  mem_find: mem_tetra *( ), §20.
mm = 0x98, §23.  mmo_err = macro, §26.  mmo_file: FILE *, §25.
octa = struct, §10.  postamble: int, §25.  read_tet: void ( ), §26.
tet: tetra, §25.  tet: tetra, §16.  tetra = unsigned int, §10.  yzbytes: int, §25.


35. The space for file names isn't allocated until we are sure we need it.

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_file: if (file_info[ybyte].name) {
      if (zbyte) mmo_err;
      cur_file = ybyte;
    } else {
      if (!zbyte) mmo_err;
      file_info[ybyte].name = (char *) calloc(4 * zbyte + 1, 1);
      if (!file_info[ybyte].name) {
        fprintf(stderr, "No room to store the file name!\n"); exit(-5);
      }
      cur_file = ybyte;
      for (j = zbyte, p = file_info[ybyte].name; j > 0; j--, p += 4) {
        read_tet();
        *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
      }
    }
    cur_line = 0; continue;
  case lop_line: if (cur_file < 0) mmo_err;
    cur_line = yzbytes; continue;

36. Special bytes are ignored (at least for now).

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_spec: while (1) {
      read_tet();
      if (buf[0] == mm) {
        if (buf[1] != lop_quote || yzbytes != 1) goto loop;  /* end of special data */
        read_tet();
      }
    }

37. Since a chunk of memory holds 512 tetrabytes, the ll pointer in the following loop stays in the same chunk (namely, the first chunk of segment 3, also known as Stack_Segment).

⟨Load the postamble 37⟩ ≡
  aux.h = 0x60000000; aux.l = 0x18;
  ll = mem_find(aux);
  (ll - 1)->tet = 2;  /* this will ultimately set rL = 2 */
  (ll - 5)->tet = argc;  /* and $0 = argc */
  (ll - 4)->tet = 0x40000000;
  (ll - 3)->tet = 0x8;  /* and $1 = Pool_Segment + 8 */
  G = zbyte; L = 0;
  for (j = G + G; j < 256 + 256; j++, ll++, aux.l += 4) read_tet(), ll->tet = tet;
  inst_ptr.h = (ll - 2)->tet, inst_ptr.l = (ll - 1)->tet;  /* Main */
  (ll + 2 * 12)->tet = G << 24;
  g[255] = incr(aux, 12 * 8);  /* we will UNSAVE from here, to get going */

This code is used in section 32.


38. Loading and printing source lines. The loaded program generally contains cross references to the lines of symbolic source files, so that the context of each instruction can be understood. The following sections of this program make such information available when it is desired.

Source file data is kept in a file_node structure:

⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name;  /* name of source file */
    int line_count;  /* number of lines in the file */
    long *map;  /* pointer to map of file positions */
  } file_node;

39. In partial preparation for the day when source files are in Unicode, we define a type Char for the source characters.

⟨Type declarations 9⟩ +≡
  typedef char Char;  /* bytes that will become wydes some day */

40. ⟨Global variables 19⟩ +≡
  file_node file_info[256];  /* data about each source file */
  int buf_size;  /* size of buffer for source lines */
  Char *buffer;

41. As in MMIXAL, we prefer source lines of length 72 characters or less, but the user is allowed to increase the limit. (Longer lines will silently be truncated to the buffer size when the simulator lists them.)

⟨Initialize everything 14⟩ +≡
  if (buf_size < 72) buf_size = 72;
  buffer = (Char *) calloc(buf_size + 1, sizeof(Char));

argc: int, §141.  aux: octa, MMIX-ARITH §4.  buf: byte [ ], §25.
calloc: void *( ), <stdlib.h>.  cur_file: int, §31.  cur_line: int, §31.
exit: void ( ), <stdlib.h>.  fprintf: int ( ), <stdio.h>.  g: octa [ ], §76.
G: register int, §75.  h: tetra, §10.  incr: octa ( ), MMIX-ARITH §6.
inst_ptr: octa, §61.  j: register int, §62.  l: tetra, §10.
L: register int, §75.  ll: register mem_tetra *, §62.  loop: label, §29.
lop_file = 0x6, §23.  lop_line = 0x7, §23.  lop_quote = 0x0, §23.
lop_spec = 0x8, §23.  mem_find: mem_tetra *( ), §20.  mm = 0x98, §23.
mmo_err = macro, §26.  p: register char *, §62.  read_tet: void ( ), §26.
rL = 20, §55.  stderr: FILE *, <stdio.h>.  tet: tetra, §25.
UNSAVE = 0xfb, §54.  ybyte = macro, §33.  yzbytes: int, §25.  zbyte = macro, §33.


42. The first time we are called upon to list a line from a given source file, we make a map of starting locations for each line. Source files should contain at most 65535 lines. We assume that they contain no null characters.

⟨Subroutines 12⟩ +≡
  void make_map ARGS((void));
  void make_map()
  {
    long map[65536];
    register int k, l;
    register long *p;
    ⟨Check if the source file has been modified 44⟩;
    for (l = 1; l < 65536 && !feof(src_file); l++) {
      map[l] = ftell(src_file);
    loop: if (!fgets(buffer, buf_size, src_file)) break;
      if (buffer[strlen(buffer) - 1] != '\n') goto loop;
    }
    file_info[cur_file].line_count = l;
    file_info[cur_file].map = p = (long *) calloc(l, sizeof(long));
    if (!p) panic("No room for a source-line map");
    for (k = 1; k < l; k++) p[k] = map[k];
  }

43. We want to warn the user if the source file has changed since the object file was written. The standard C library doesn't provide the information we need; so we use the UNIX system function stat, in hopes that other operating systems provide a similar way to do the job.

⟨Preprocessor macros 11⟩ +≡
  #include <sys/types.h>
  #include <sys/stat.h>

44. ⟨Check if the source file has been modified 44⟩ ≡
  {
    struct stat stat_buf;
    if (stat(file_info[cur_file].name, &stat_buf) >= 0)
      if ((tetra) stat_buf.st_mtime > obj_time)
        fprintf(stderr,
          "Warning: File %s was modified; it may not match the program!\n",
          file_info[cur_file].name);
  }

This code is used in section 42.
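The line-map idea of section 42 can be tried in isolation. The following is only a sketch under stated assumptions: `build_line_map`, `fetch_line`, and `MAX_LINES` are hypothetical names that do not appear in MMIX-SIM, the read buffer is deliberately tiny so that long lines need several `fgets` calls, and, as a design choice of this sketch, a final line that lacks a newline is still counted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 65536 /* same bound as the map array in section 42 */

/* Record in (*map_out)[k] the file offset where line k (1-based) starts.
   Returns one more than the number of lines found, or 0 if memory runs out. */
int build_line_map(FILE *f, long **map_out)
{
  char chunk[16]; /* small on purpose: long lines take several reads */
  long *map;
  int l, got_any;
  size_t n;
  assert(f != NULL && map_out != NULL);
  map = calloc(MAX_LINES, sizeof(long));
  if (!map) return 0;
  rewind(f);
  for (l = 1; l < MAX_LINES; l++) {
    map[l] = ftell(f);
    got_any = 0;
    while (fgets(chunk, sizeof chunk, f)) { /* consume one whole line */
      got_any = 1;
      n = strlen(chunk);
      if (n > 0 && chunk[n - 1] == '\n') break;
    }
    if (!got_any) break; /* nothing was left, so line l does not exist */
  }
  *map_out = map;
  return l;
}

/* Seek to line k and read it into out; returns 1 on success, 0 otherwise. */
int fetch_line(FILE *f, const long *map, int count, int k, char *out, int out_size)
{
  if (k < 1 || k >= count) return 0;
  if (fseek(f, map[k], SEEK_SET) != 0) return 0;
  return fgets(out, out_size, f) != NULL;
}
```

Once the map exists, any line can be fetched with a single `fseek`, which is why the listing routine can jump backward and forward in the source file without rereading it.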

45. Source lines are listed by the print_line routine, preceded by 12 characters containing the line number. If a file error occurs, nothing is printed, not even an error message; the absence of listed data is itself a message.

⟨Subroutines 12⟩ +≡
  void print_line ARGS((int));
  void print_line(k)
    int k;
  {
    char buf[11];
    if (k >= file_info[cur_file].line_count) return;
    if (fseek(src_file, file_info[cur_file].map[k], SEEK_SET) != 0) return;
    if (!fgets(buffer, buf_size, src_file)) return;
    sprintf(buf, "%d:    ", k);
    printf("line %.6s %s", buf, buffer);
    if (buffer[strlen(buffer) - 1] != '\n') printf("\n");
    line_shown = true;
  }

46. ⟨Preprocessor macros 11⟩ +≡
  #ifndef SEEK_SET
  #define SEEK_SET 0 /* Set file pointer to "offset" */
  #endif

47. The show_line routine is called when we want to output line cur_line of source file number cur_file, assuming that cur_line != 0. Its job is primarily to maintain continuity, by opening or reopening the src_file if the source file changes, and by connecting the previously output lines to the new one. Sometimes no output is necessary, because the desired line has already been printed.

⟨Subroutines 12⟩ +≡
  void show_line ARGS((void));
  void show_line()
  {
    register int k;
    if (shown_file != cur_file) ⟨Prepare to list lines from a new source file 49⟩
    else if (shown_line == cur_line) return; /* already shown */
    if (cur_line > shown_line + gap + 1 || cur_line < shown_line) {
      if (shown_line > 0)
        if (cur_line < shown_line) printf("--------\n"); /* indicate upward move */
        else printf(" ...\n"); /* indicate the gap */
      print_line(cur_line);
    } else for (k = shown_line + 1; k <= cur_line; k++) print_line(k);
    shown_line = cur_line;
  }
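The continuity rule in show_line is easy to misread, so here is a small model of it. This is a sketch, not part of MMIX-SIM: `plan_listing` is a hypothetical function that writes a description of what would be listed into a string instead of printing source text, using "-- " for the upward-move marker and "... " for the gap marker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append text to out, truncating rather than overrunning the buffer. */
static void add_text(char *out, size_t size, const char *text)
{
  size_t n = strlen(out);
  if (n + 1 < size) snprintf(out + n, size - n, "%s", text);
}

/* Append a line number followed by a space. */
static void add_line(char *out, size_t size, int k)
{
  char num[16];
  snprintf(num, sizeof num, "%d ", k);
  add_text(out, size, num);
}

/* Model of show_line's decision for one file: describe in out which lines
   would be listed, and return the new value of shown_line. */
int plan_listing(int shown_line, int cur_line, int gap, char *out, size_t size)
{
  int k;
  assert(out != NULL && size > 0);
  out[0] = '\0';
  if (shown_line == cur_line) return shown_line; /* already shown */
  if (cur_line > shown_line + gap + 1 || cur_line < shown_line) {
    if (shown_line > 0)
      add_text(out, size, cur_line < shown_line ? "-- " : "... ");
    add_line(out, size, cur_line); /* only the requested line */
  } else
    for (k = shown_line + 1; k <= cur_line; k++) add_line(out, size, k);
  return cur_line;
}
```

With gap = 3 and line 10 already shown, a request for line 12 lists lines 11 and 12, a request for line 14 lists 11 through 14, and a request for line 20 lists only the gap marker and line 20.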

ARGS = macro (), §11. buf_size: int, §40. buffer: Char *, §40. calloc: void *(), <stdlib.h>. cur_file: int, §31. cur_line: int, §31. feof: int (), <stdio.h>. fgets: char *(), <stdio.h>. file_info: file_node[], §40. fprintf: int (), <stdio.h>. fseek: int (), <stdio.h>. ftell: long (), <stdio.h>. gap: int, §48. line_count: int, §38. line_shown: bool, §48. map: long *, §38. name: char *, §38. obj_time: tetra, §31. panic = macro (), §14. printf: int (), <stdio.h>. SEEK_SET = macro, <stdio.h>. shown_file: int, §48. shown_line: int, §48. sprintf: int (), <stdio.h>. src_file: FILE *, §48. st_mtime: time_t, <sys/stat.h>. stat: int (), <sys/stat.h>. stderr: FILE *, <stdio.h>. strlen: int (), <string.h>. tetra = unsigned int, §10. true = 1, §9.


48. ⟨Global variables 19⟩ +≡
  FILE *src_file; /* the currently open source file */
  int shown_file = -1; /* index of the most recently listed file */
  int shown_line; /* the line most recently listed in shown_file */
  int gap; /* minimum gap between consecutively listed source lines */
  bool line_shown; /* did we list anything recently? */
  bool showing_source; /* are we listing source lines? */
  int profile_gap; /* the gap when printing final frequencies */
  bool profile_showing_source; /* showing source within final frequencies */

49. ⟨Prepare to list lines from a new source file 49⟩ ≡
  {
    if (!src_file) src_file = fopen(file_info[cur_file].name, "r");
    else freopen(file_info[cur_file].name, "r", src_file);
    if (!src_file) {
      fprintf(stderr, "Warning: I can't open file %s; source listing omitted.\n",
        file_info[cur_file].name);
      showing_source = false;
      return;
    }
    printf("\"%s\"\n", file_info[cur_file].name);
    shown_file = cur_file;
    shown_line = 0;
    if (!file_info[cur_file].map) make_map();
  }

This code is used in section 47.

50. Here is a simple application of show_line. It is a recursive routine that prints the frequency counts of all instructions that occur in a given subtree of the simulated memory and that were executed at least once. The subtree is traversed in symmetric order; therefore the frequencies appear in increasing order of the instruction locations.

⟨Subroutines 12⟩ +≡
  void print_freqs ARGS((mem_node *));
  void print_freqs(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) print_freqs(p->left);
    for (j = 0; j < 512; j++)
      if (p->dat[j].freq) ⟨Print frequency data for location p->loc + 4 * j 51⟩;
    if (p->right) print_freqs(p->right);
  }


51. An ellipsis (...) is printed between frequency data for nonconsecutive instructions, unless source line information intervenes.

⟨Print frequency data for location p->loc + 4 * j 51⟩ ≡
  {
    cur_loc = incr(p->loc, 4 * j);
    if (showing_source && p->dat[j].line_no) {
      cur_file = p->dat[j].file_no; cur_line = p->dat[j].line_no;
      line_shown = false;
      show_line();
      if (line_shown) goto loc_implied;
    }
    if (cur_loc.l != implied_loc.l || cur_loc.h != implied_loc.h)
      if (profile_started) printf(" 0. ...\n");
  loc_implied: printf("%10d. %08x%08x: %08x (%s)\n", p->dat[j].freq, cur_loc.h, cur_loc.l,
        p->dat[j].tet, info[p->dat[j].tet >> 24].name);
    implied_loc = incr(cur_loc, 4); profile_started = true;
  }

This code is used in section 50.

52. ⟨Global variables 19⟩ +≡
  octa implied_loc; /* location following the last shown frequency data */
  bool profile_started; /* have we printed at least one frequency count? */

53. ⟨Print all the frequency counts 53⟩ ≡
  {
    printf("\nProgram profile:\n");
    shown_file = cur_file = -1; shown_line = cur_line = 0;
    gap = profile_gap;
    showing_source = profile_showing_source;
    implied_loc = neg_one;
    print_freqs(mem_root);
  }

This code is used in section 141.

ARGS = macro (), §11. bool = enum, §9. cur_file: int, §31. cur_line: int, §31. dat: mem_tetra[], §16. false = 0, §9. FILE, <stdio.h>. file_info: file_node[], §40. file_no: unsigned char, §16. fopen: FILE *(), <stdio.h>. fprintf: int (), <stdio.h>. freopen: FILE *(), <stdio.h>. freq: tetra, §16. h: tetra, §10. incr: octa (), MMIX-ARITH §6. info: op_info[], §65. l: tetra, §10. left: mem_node *, §16. line_no: unsigned short, §16. loc: octa, §16. make_map: void (), §42. map: long *, §38. mem_node = struct, §16. mem_root: mem_node *, §19. name: char *, §38. neg_one: octa, MMIX-ARITH §4. octa = struct, §10. printf: int (), <stdio.h>. right: mem_node *, §16. show_line: void (), §47. stderr: FILE *, <stdio.h>. tet: tetra, §16. true = 1, §9.


54. Lists. This simulator needs to deal with 256 different opcodes, so we might as well enumerate them now.

⟨Type declarations 9⟩ +≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;

55. We also need to enumerate the special names for special registers.

⟨Type declarations 9⟩ +≡
  typedef enum {
    rB, rD, rE, rH, rJ, rM, rR, rBB, rC, rN, rO, rS, rI, rT, rTT, rK, rQ, rU, rV, rG, rL,
    rA, rF, rP, rW, rX, rY, rZ, rWW, rXX, rYY, rZZ
  } special_reg;

56. ⟨Global variables 19⟩ +≡
  char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
    "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};


57. Here are the bit codes for arithmetic exceptions. These codes, except H_BIT, are defined also in MMIX-ARITH.

  #define X_BIT (1 << 8) /* floating inexact */
  #define Z_BIT (1 << 9) /* floating division by zero */
  #define U_BIT (1 << 10) /* floating underflow */
  #define O_BIT (1 << 11) /* floating overflow */
  #define I_BIT (1 << 12) /* floating invalid operation */
  #define W_BIT (1 << 13) /* float-to-fix overflow */
  #define V_BIT (1 << 14) /* integer overflow */
  #define D_BIT (1 << 15) /* integer divide check */
  #define H_BIT (1 << 16) /* trip */

58. The bkpt field associated with each tetrabyte of memory has bits associated with forced tracing and/or breaking for reading, writing, and/or execution.

  #define trace_bit (1 << 3)
  #define read_bit (1 << 2)
  #define write_bit (1 << 1)
  #define exec_bit (1 << 0)
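As a quick illustration of how the exception codes of section 57 combine, the sketch below turns a set of exception bits into the letters that name them. The function `exc_letters` is hypothetical and is not part of the simulator; the letter order D, V, W, I, O, U, Z, X simply follows the macro names from the highest bit down to the lowest.

```c
#include <assert.h>
#include <string.h>

#define X_BIT (1 << 8)  /* floating inexact */
#define Z_BIT (1 << 9)  /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define H_BIT (1 << 16) /* trip */

/* Write into out the letters of all arithmetic exceptions present in exc,
   from D_BIT down to X_BIT. H_BIT has no letter and is ignored. */
void exc_letters(int exc, char out[9])
{
  static const char letters[] = "DVWIOUZX";
  int i, n = 0;
  assert(out != NULL);
  for (i = 0; i < 8; i++)
    if (exc & (D_BIT >> i)) out[n++] = letters[i];
  out[n] = '\0';
}
```

Several exceptions can be raised by one instruction, which is why each has its own bit rather than a numeric code.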

59. To complete our lists of lists, we enumerate the rudimentary operating system calls that are built in to MMIXAL.

  #define max_sys_call Ftell

⟨Type declarations 9⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;

bkpt: unsigned char, §16.


60. The main loop. Now let's plunge in to the guts of the simulator, the master switch that controls most of the action.

⟨Perform one instruction 60⟩ ≡
  {
    if (resuming) loc = incr(inst_ptr, -4), inst = g[rX].l;
    else ⟨Fetch the next instruction 63⟩;
    op = inst >> 24; xx = (inst >> 16) & 0xff; yy = (inst >> 8) & 0xff; zz = inst & 0xff;
    f = info[op].flags; yz = inst & 0xffff;
    x = y = z = a = b = zero_octa; exc = 0; old_L = L;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 70⟩;
    ⟨Install operand fields 71⟩;
    if (f & X_is_dest_bit)
      ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩;
    w = oplus(y, z);
    if (loc.h >= 0x20000000) goto privileged_inst;
    switch (op) {
      ⟨Cases for individual MMIX instructions 84⟩;
    }
    ⟨Check for trip interrupt 122⟩;
    ⟨Update the clocks 127⟩;
    ⟨Trace the current instruction, if requested 128⟩;
    if (resuming && op != RESUME) resuming = false;
  }

This code is used in section 141.

61. Operands x and a are usually destinations (results), computed from the source operands y, z, and/or b.

⟨Global variables 19⟩ +≡
  octa w, x, y, z, a, b, ma, mb; /* operands */
  octa *x_ptr; /* destination */
  octa loc; /* location of the current instruction */
  octa inst_ptr; /* location of the next instruction */
  tetra inst; /* the current instruction */
  int old_L; /* value of L before the current instruction */
  int exc; /* exceptions raised by the current instruction */
  int tracing_exceptions; /* exception bits that cause tracing */
  int rop; /* ropcode of a resumed instruction */
  int round_mode; /* the style of floating point rounding just used */
  bool resuming; /* are we resuming an interrupted instruction? */
  bool halted; /* did the program come to a halt? */
  bool breakpoint; /* should we pause after the current instruction? */
  bool tracing; /* should we trace the current instruction? */
  bool stack_tracing; /* should we trace details of the register stack? */
  bool interacting; /* are we in interactive mode? */
  bool interact_after_break; /* should we go into interactive mode? */
  bool tripping; /* are we about to go to a trip handler? */
  bool good; /* did the last branch instruction guess correctly? */
  tetra trace_threshold; /* each instruction should be traced this many times */


62. ⟨Local registers 62⟩ ≡
  register mmix_opcode op; /* operation code of the current instruction */
  register int xx, yy, zz, yz; /* operand fields of the current instruction */
  register tetra f; /* properties of the current op */
  register int i, j, k; /* miscellaneous indices */
  register mem_tetra *ll; /* current place in the simulated memory */
  register char *p; /* current place in a string */

See also section 75.

This code is used in section 141.

63. ⟨Fetch the next instruction 63⟩ ≡
  {
    loc = inst_ptr;
    ll = mem_find(loc);
    inst = ll->tet;
    cur_file = ll->file_no;
    cur_line = ll->line_no;
    ll->freq++;
    if (ll->bkpt & exec_bit) breakpoint = true;
    tracing = breakpoint || (ll->bkpt & trace_bit) || (ll->freq <= trace_threshold);
    inst_ptr = incr(inst_ptr, 4);
  }

This code is used in section 60.

64. Much of the simulation is table-driven, based on a static data structure called the op_info for each operation code.

⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name; /* symbolic name of an opcode */
    unsigned char flags; /* its instruction format */
    unsigned char third_operand; /* its special register input */
    unsigned char mems; /* how many μ it costs */
    unsigned char oops; /* how many υ it costs */
    char *trace_format; /* how it appears when traced */
  } op_info;

bkpt: unsigned char, §16. bool = enum, §9. cur_file: int, §31. cur_line: int, §31. exec_bit = macro, §58. false = 0, §9. file_no: unsigned char, §16. freq: tetra, §16. g: octa[], §76. h: tetra, §10. incr: octa (), MMIX-ARITH §6. info: op_info[], §65. l: tetra, §10. L: register int, §75. line_no: unsigned short, §16. mem_find: mem_tetra *(), §20. mem_tetra = struct, §16. mmix_opcode = enum, §54. octa = struct, §10. oplus: octa (), MMIX-ARITH §5. privileged_inst: label, §107. rel_addr_bit = 0x40, §65. RESUME = 0xf9, §54. rX = 25, §55. tet: tetra, §16. tetra = unsigned int, §10. trace_bit = macro, §58. true = 1, §9. X_is_dest_bit = 0x20, §65. zero_octa: octa, MMIX-ARITH §4.


65. For example, the flags field of info[op] tells us how to obtain the operands from the X, Y, and Z fields of the current instruction. Each entry records special properties of an operation code, in binary notation: 0x1 means Z is an immediate value, 0x2 means rZ is a source operand, 0x4 means Y is an immediate value, 0x8 means rY is a source operand, 0x10 means rX is a source operand, 0x20 means rX is a destination, 0x40 means YZ is part of a relative address, 0x80 means a push or pop or unsave instruction.

The trace_format field will be explained later.

  #define Z_is_immed_bit 0x1
  #define Z_is_source_bit 0x2
  #define Y_is_immed_bit 0x4
  #define Y_is_source_bit 0x8
  #define X_is_source_bit 0x10
  #define X_is_dest_bit 0x20
  #define rel_addr_bit 0x40
  #define push_pop_bit 0x80

⟨Global variables 19⟩ +≡
  op_info info[256] = {
    ⟨Info for arithmetic commands 66⟩,
    ⟨Info for branch commands 67⟩,
    ⟨Info for load/store commands 68⟩,
    ⟨Info for logical and control commands 69⟩};
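To make the flag encoding concrete, the sketch below spells out which properties a given flags byte records. The function `describe_flags` and the short labels it prints are hypothetical and serve only this illustration; the bit values are those defined in section 65.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define Z_is_immed_bit 0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit 0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define push_pop_bit 0x80

static const struct {
  unsigned bit;
  const char *label; /* labels invented for this sketch */
} flag_labels[] = {
  {X_is_source_bit, "Xsrc"}, {X_is_dest_bit, "Xdest"},
  {Y_is_source_bit, "Yreg"}, {Y_is_immed_bit, "Yimm"},
  {Z_is_source_bit, "Zreg"}, {Z_is_immed_bit, "Zimm"},
  {rel_addr_bit, "rel"}, {push_pop_bit, "pushpop"}
};

/* Write a space-separated list of the properties present in f. */
void describe_flags(unsigned f, char *out, size_t size)
{
  size_t i, n;
  assert(out != NULL && size > 0);
  out[0] = '\0';
  for (i = 0; i < sizeof flag_labels / sizeof flag_labels[0]; i++)
    if (f & flag_labels[i].bit) {
      n = strlen(out);
      snprintf(out + n, size - n, "%s%s", n ? " " : "", flag_labels[i].label);
    }
}
```

The value 0x2a, used by most three-register instructions, therefore means that X is a destination and that Y and Z both name source registers, while 0x29 differs only in treating Z as an immediate value.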

66. ⟨Info for arithmetic commands 66⟩ ≡
  {"TRAP", 0x0a, 255, 0, 5, "%r"},
  {"FCMP", 0x2a, 0, 0, 1, "%l = %.y cmp %.z = %x"},
  {"FUN", 0x2a, 0, 0, 1, "%l = [%.y(||)%.z] = %x"},
  {"FEQL", 0x2a, 0, 0, 1, "%l = [%.y(==)%.z] = %x"},
  {"FADD", 0x2a, 0, 0, 4, "%l = %.y %(+%) %.z = %.x"},
  {"FIX", 0x26, 0, 0, 4, "%l = %(fix%) %.z = %x"},
  {"FSUB", 0x2a, 0, 0, 4, "%l = %.y %(-%) %.z = %.x"},
  {"FIXU", 0x26, 0, 0, 4, "%l = %(fix%) %.z = %#x"},
  {"FLOT", 0x26, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"FLOTI", 0x25, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"FLOTU", 0x26, 0, 0, 4, "%l = %(flot%) %#z = %.x"},
  {"FLOTUI", 0x25, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"SFLOT", 0x26, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"SFLOTI", 0x25, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"SFLOTU", 0x26, 0, 0, 4, "%l = %(sflot%) %#z = %.x"},
  {"SFLOTUI", 0x25, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"FMUL", 0x2a, 0, 0, 4, "%l = %.y %(*%) %.z = %.x"},
  {"FCMPE", 0x2a, rE, 0, 4, "%l = %.y cmp %.z (%.b)) = %x"},
  {"FUNE", 0x2a, rE, 0, 1, "%l = [%.y(||)%.z (%.b)] = %x"},
  {"FEQLE", 0x2a, rE, 0, 4, "%l = [%.y(==)%.z (%.b)] = %x"},
  {"FDIV", 0x2a, 0, 0, 40, "%l = %.y %(/%) %.z = %.x"},
  {"FSQRT", 0x26, 0, 0, 40, "%l = %(sqrt%) %.z = %.x"},
  {"FREM", 0x2a, 0, 0, 4, "%l = %.y %(rem%) %.z = %.x"},
  {"FINT", 0x26, 0, 0, 4, "%l = %(int%) %.z = %.x"},
  {"MUL", 0x2a, 0, 0, 10, "%l = %y * %z = %x"},
  {"MULI", 0x29, 0, 0, 10, "%l = %y * %z = %x"},
  {"MULU", 0x2a, 0, 0, 10, "%l = %#y * %#z = %#x, rH=%#a"},
  {"MULUI", 0x29, 0, 0, 10, "%l = %#y * %z = %#x, rH=%#a"},
  {"DIV", 0x2a, 0, 0, 60, "%l = %y / %z = %x, rR=%a"},
  {"DIVI", 0x29, 0, 0, 60, "%l = %y / %z = %x, rR=%a"},
  {"DIVU", 0x2a, rD, 0, 60, "%l = %#b%0y / %#z = %#x, rR=%#a"},
  {"DIVUI", 0x29, rD, 0, 60, "%l = %#b%0y / %z = %#x, rR=%#a"},
  {"ADD", 0x2a, 0, 0, 1, "%l = %y + %z = %x"},
  {"ADDI", 0x29, 0, 0, 1, "%l = %y + %z = %x"},
  {"ADDU", 0x2a, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"ADDUI", 0x29, 0, 0, 1, "%l = %#y + %z = %#x"},
  {"SUB", 0x2a, 0, 0, 1, "%l = %y - %z = %x"},
  {"SUBI", 0x29, 0, 0, 1, "%l = %y - %z = %x"},
  {"SUBU", 0x2a, 0, 0, 1, "%l = %#y - %#z = %#x"},
  {"SUBUI", 0x29, 0, 0, 1, "%l = %#y - %z = %#x"},
  {"2ADDU", 0x2a, 0, 0, 1, "%l = %#y <<1+ %#z = %#x"},
  {"2ADDUI", 0x29, 0, 0, 1, "%l = %#y <<1+ %z = %#x"},
  {"4ADDU", 0x2a, 0, 0, 1, "%l = %#y <<2+ %#z = %#x"},
  {"4ADDUI", 0x29, 0, 0, 1, "%l = %#y <<2+ %z = %#x"},
  {"8ADDU", 0x2a, 0, 0, 1, "%l = %#y <<3+ %#z = %#x"},
  {"8ADDUI", 0x29, 0, 0, 1, "%l = %#y <<3+ %z = %#x"},
  {"16ADDU", 0x2a, 0, 0, 1, "%l = %#y <<4+ %#z = %#x"},
  {"16ADDUI", 0x29, 0, 0, 1, "%l = %#y <<4+ %z = %#x"},
  {"CMP", 0x2a, 0, 0, 1, "%l = %y cmp %z = %x"},
  {"CMPI", 0x29, 0, 0, 1, "%l = %y cmp %z = %x"},
  {"CMPU", 0x2a, 0, 0, 1, "%l = %#y cmp %#z = %x"},
  {"CMPUI", 0x29, 0, 0, 1, "%l = %#y cmp %z = %x"},
  {"NEG", 0x26, 0, 0, 1, "%l = %y - %z = %x"},
  {"NEGI", 0x25, 0, 0, 1, "%l = %y - %z = %x"},
  {"NEGU", 0x26, 0, 0, 1, "%l = %y - %#z = %#x"},
  {"NEGUI", 0x25, 0, 0, 1, "%l = %y - %z = %#x"},
  {"SL", 0x2a, 0, 0, 1, "%l = %y << %#z = %x"},
  {"SLI", 0x29, 0, 0, 1, "%l = %y << %z = %x"},
  {"SLU", 0x2a, 0, 0, 1, "%l = %#y << %#z = %#x"},
  {"SLUI", 0x29, 0, 0, 1, "%l = %#y << %z = %#x"},
  {"SR", 0x2a, 0, 0, 1, "%l = %y >> %#z = %x"},
  {"SRI", 0x29, 0, 0, 1, "%l = %y >> %z = %x"},
  {"SRU", 0x2a, 0, 0, 1, "%l = %#y >> %#z = %#x"},
  {"SRUI", 0x29, 0, 0, 1, "%l = %#y >> %z = %#x"}

This code is used in section 65.

flags: unsigned char, §64. op: register mmix_opcode, §62. op_info = struct, §64. rD = 1, §55. rE = 2, §55. trace_format: char *, §64.


67. ⟨Info for branch commands 67⟩ ≡
  {"BN", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"BNB", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"BZ", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"BZB", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"BP", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"BPB", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"BOD", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"BODB", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"BNN", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"BNNB", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"BNZ", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"BNZB", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"BNP", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"BNPB", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"BEV", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"BEVB", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"PBN", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"PBNB", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"PBZ", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"PBZB", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"PBP", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"PBPB", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"PBOD", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"PBODB", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"PBNN", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"PBNNB", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"PBNZ", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"PBNZB", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"PBNP", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"PBNPB", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"PBEV", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"PBEVB", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"CSN", 0x3a, 0, 0, 1, "%l = %y<0? %z: %b = %x"},
  {"CSNI", 0x39, 0, 0, 1, "%l = %y<0? %z: %b = %x"},
  {"CSZ", 0x3a, 0, 0, 1, "%l = %y==0? %z: %b = %x"},
  {"CSZI", 0x39, 0, 0, 1, "%l = %y==0? %z: %b = %x"},
  {"CSP", 0x3a, 0, 0, 1, "%l = %y>0? %z: %b = %x"},
  {"CSPI", 0x39, 0, 0, 1, "%l = %y>0? %z: %b = %x"},
  {"CSOD", 0x3a, 0, 0, 1, "%l = %y odd? %z: %b = %x"},
  {"CSODI", 0x39, 0, 0, 1, "%l = %y odd? %z: %b = %x"},
  {"CSNN", 0x3a, 0, 0, 1, "%l = %y>=0? %z: %b = %x"},
  {"CSNNI", 0x39, 0, 0, 1, "%l = %y>=0? %z: %b = %x"},
  {"CSNZ", 0x3a, 0, 0, 1, "%l = %y!=0? %z: %b = %x"},
  {"CSNZI", 0x39, 0, 0, 1, "%l = %y!=0? %z: %b = %x"},
  {"CSNP", 0x3a, 0, 0, 1, "%l = %y<=0? %z: %b = %x"},
  {"CSNPI", 0x39, 0, 0, 1, "%l = %y<=0? %z: %b = %x"},
  {"CSEV", 0x3a, 0, 0, 1, "%l = %y even? %z: %b = %x"},
  {"CSEVI", 0x39, 0, 0, 1, "%l = %y even? %z: %b = %x"},
  {"ZSN", 0x2a, 0, 0, 1, "%l = %y<0? %z: 0 = %x"},
  {"ZSNI", 0x29, 0, 0, 1, "%l = %y<0? %z: 0 = %x"},
  {"ZSZ", 0x2a, 0, 0, 1, "%l = %y==0? %z: 0 = %x"},
  {"ZSZI", 0x29, 0, 0, 1, "%l = %y==0? %z: 0 = %x"},
  {"ZSP", 0x2a, 0, 0, 1, "%l = %y>0? %z: 0 = %x"},
  {"ZSPI", 0x29, 0, 0, 1, "%l = %y>0? %z: 0 = %x"},
  {"ZSOD", 0x2a, 0, 0, 1, "%l = %y odd? %z: 0 = %x"},
  {"ZSODI", 0x29, 0, 0, 1, "%l = %y odd? %z: 0 = %x"},
  {"ZSNN", 0x2a, 0, 0, 1, "%l = %y>=0? %z: 0 = %x"},
  {"ZSNNI", 0x29, 0, 0, 1, "%l = %y>=0? %z: 0 = %x"},
  {"ZSNZ", 0x2a, 0, 0, 1, "%l = %y!=0? %z: 0 = %x"},
  {"ZSNZI", 0x29, 0, 0, 1, "%l = %y!=0? %z: 0 = %x"},
  {"ZSNP", 0x2a, 0, 0, 1, "%l = %y<=0? %z: 0 = %x"},
  {"ZSNPI", 0x29, 0, 0, 1, "%l = %y<=0? %z: 0 = %x"},
  {"ZSEV", 0x2a, 0, 0, 1, "%l = %y even? %z: 0 = %x"},
  {"ZSEVI", 0x29, 0, 0, 1, "%l = %y even? %z: 0 = %x"}

This code is used in section 65.


68. ⟨Info for load/store commands 68⟩ ≡
  {"LDB", 0x2a, 0, 1, 1, "%l = M1[%#y+%#z] = %x"},
  {"LDBI", 0x29, 0, 1, 1, "%l = M1[%#y%?+] = %x"},
  {"LDBU", 0x2a, 0, 1, 1, "%l = M1[%#y+%#z] = %#x"},
  {"LDBUI", 0x29, 0, 1, 1, "%l = M1[%#y%?+] = %#x"},
  {"LDW", 0x2a, 0, 1, 1, "%l = M2[%#y+%#z] = %x"},
  {"LDWI", 0x29, 0, 1, 1, "%l = M2[%#y%?+] = %x"},
  {"LDWU", 0x2a, 0, 1, 1, "%l = M2[%#y+%#z] = %#x"},
  {"LDWUI", 0x29, 0, 1, 1, "%l = M2[%#y%?+] = %#x"},
  {"LDT", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z] = %x"},
  {"LDTI", 0x29, 0, 1, 1, "%l = M4[%#y%?+] = %x"},
  {"LDTU", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z] = %#x"},
  {"LDTUI", 0x29, 0, 1, 1, "%l = M4[%#y%?+] = %#x"},
  {"LDO", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %x"},
  {"LDOI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %x"},
  {"LDOU", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %#x"},
  {"LDOUI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %#x"},
  {"LDSF", 0x2a, 0, 1, 1, "%l = (M4[%#y+%#z]) = %.x"},
  {"LDSFI", 0x29, 0, 1, 1, "%l = (M4[%#y%?+]) = %.x"},
  {"LDHT", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z]<<32 = %#x"},
  {"LDHTI", 0x29, 0, 1, 1, "%l = M4[%#y%?+]<<32 = %#x"},
  {"CSWAP", 0x3a, 0, 2, 2, "%l = [M8[%#y+%#z]==%a] = %x, %r"},
  {"CSWAPI", 0x39, 0, 2, 2, "%l = [M8[%#y%?+]==%a] = %x, %r"},
  {"LDUNC", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %#x"},
  {"LDUNCI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %#x"},
  {"LDVTS", 0x2a, 0, 0, 1, ""},
  {"LDVTSI", 0x29, 0, 0, 1, ""},
  {"PRELD", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PRELDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PREGO", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PREGOI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"GO", 0x2a, 0, 0, 3, "%l = %#x, -> %#y+%#z"},
  {"GOI", 0x29, 0, 0, 3, "%l = %#x, -> %#y%?+"},
  {"STB", 0x1a, 0, 1, 1, "M1[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STBI", 0x19, 0, 1, 1, "M1[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STBU", 0x1a, 0, 1, 1, "M1[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STBUI", 0x19, 0, 1, 1, "M1[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STW", 0x1a, 0, 1, 1, "M2[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STWI", 0x19, 0, 1, 1, "M2[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STWU", 0x1a, 0, 1, 1, "M2[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STWUI", 0x19, 0, 1, 1, "M2[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STT", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STTI", 0x19, 0, 1, 1, "M4[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STTU", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STTUI", 0x19, 0, 1, 1, "M4[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STO", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %b"},
  {"STOI", 0x19, 0, 1, 1, "M8[%#y%?+] = %b"},
  {"STOU", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %#b"},
  {"STOUI", 0x19, 0, 1, 1, "M8[%#y%?+] = %#b"},
  {"STSF", 0x1a, 0, 1, 1, "%(M4[%#y+%#z]%) = %.b, M8[%#w]=%#a"},
  {"STSFI", 0x19, 0, 1, 1, "%(M4[%#y%?+]%) = %.b, M8[%#w]=%#a"},
  {"STHT", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %#b>>32, M8[%#w]=%#a"},
  {"STHTI", 0x19, 0, 1, 1, "M4[%#y%?+] = %#b>>32, M8[%#w]=%#a"},
  {"STCO", 0x0a, 0, 1, 1, "M8[%#y+%#z] = %b"},
  {"STCOI", 0x09, 0, 1, 1, "M8[%#y%?+] = %b"},
  {"STUNC", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %#b"},
  {"STUNCI", 0x19, 0, 1, 1, "M8[%#y%?+] = %#b"},
  {"SYNCD", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"SYNCDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PREST", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PRESTI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"SYNCID", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"SYNCIDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PUSHGO", 0x8a, 0, 0, 3, "%lrO=%#b, rL=%a, rJ=%#x, -> %#y+%#z"},
  {"PUSHGOI", 0x89, 0, 0, 3, "%lrO=%#b, rL=%a, rJ=%#x, -> %#y%?+"}

This code is used in section 65.


69. ⟨Info for logical and control commands 69⟩ ≡
  {"OR", 0x2a, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORI", 0x29, 0, 0, 1, "%l = %#y | %z = %#x"},
  {"ORN", 0x2a, 0, 0, 1, "%l = %#y |~ %#z = %#x"},
  {"ORNI", 0x29, 0, 0, 1, "%l = %#y |~ %z = %#x"},
  {"NOR", 0x2a, 0, 0, 1, "%l = %#y ~| %#z = %#x"},
  {"NORI", 0x29, 0, 0, 1, "%l = %#y ~| %z = %#x"},
  {"XOR", 0x2a, 0, 0, 1, "%l = %#y ^ %#z = %#x"},
  {"XORI", 0x29, 0, 0, 1, "%l = %#y ^ %z = %#x"},
  {"AND", 0x2a, 0, 0, 1, "%l = %#y & %#z = %#x"},
  {"ANDI", 0x29, 0, 0, 1, "%l = %#y & %z = %#x"},
  {"ANDN", 0x2a, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNI", 0x29, 0, 0, 1, "%l = %#y \\ %z = %#x"},
  {"NAND", 0x2a, 0, 0, 1, "%l = %#y ~& %#z = %#x"},
  {"NANDI", 0x29, 0, 0, 1, "%l = %#y ~& %z = %#x"},
  {"NXOR", 0x2a, 0, 0, 1, "%l = %#y ~^ %#z = %#x"},
  {"NXORI", 0x29, 0, 0, 1, "%l = %#y ~^ %z = %#x"},
  {"BDIF", 0x2a, 0, 0, 1, "%l = %#y bdif %#z = %#x"},
  {"BDIFI", 0x29, 0, 0, 1, "%l = %#y bdif %z = %#x"},
  {"WDIF", 0x2a, 0, 0, 1, "%l = %#y wdif %#z = %#x"},
  {"WDIFI", 0x29, 0, 0, 1, "%l = %#y wdif %z = %#x"},
  {"TDIF", 0x2a, 0, 0, 1, "%l = %#y tdif %#z = %#x"},
  {"TDIFI", 0x29, 0, 0, 1, "%l = %#y tdif %z = %#x"},
  {"ODIF", 0x2a, 0, 0, 1, "%l = %#y odif %#z = %#x"},
  {"ODIFI", 0x29, 0, 0, 1, "%l = %#y odif %z = %#x"},
  {"MUX", 0x2a, rM, 0, 1, "%l = %#b? %#y: %#z = %#x"},
  {"MUXI", 0x29, rM, 0, 1, "%l = %#b? %#y: %z = %#x"},
  {"SADD", 0x2a, 0, 0, 1, "%l = nu(%#y\\%#z) = %x"},
  {"SADDI", 0x29, 0, 0, 1, "%l = nu(%#y%?\\) = %x"},
  {"MOR", 0x2a, 0, 0, 1, "%l = %#y mor %#z = %#x"},
  {"MORI", 0x29, 0, 0, 1, "%l = %#y mor %z = %#x"},
  {"MXOR", 0x2a, 0, 0, 1, "%l = %#y mxor %#z = %#x"},
  {"MXORI", 0x29, 0, 0, 1, "%l = %#y mxor %z = %#x"},
  {"SETH", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETMH", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETML", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETL", 0x20, 0, 0, 1, "%l = %#z"},
  {"INCH", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCMH", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCML", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCL", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"ORH", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORMH", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORML", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORL", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ANDNH", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNMH", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNML", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNL", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"JMP", 0x40, 0, 0, 1, "-> %#z"},
  {"JMPB", 0x40, 0, 0, 1, "-> %#z"},
  {"PUSHJ", 0xc0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"PUSHJB", 0xc0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"GETA", 0x60, 0, 0, 1, "%l = %#z"},
  {"GETAB", 0x60, 0, 0, 1, "%l = %#z"},
  {"PUT", 0x02, 0, 0, 1, "%s = %r"},
  {"PUTI", 0x01, 0, 0, 1, "%s = %r"},
  {"POP", 0x80, rJ, 0, 3, "%lrL=%a, rO=%#b, -> %#y%?+"},
  {"RESUME", 0x00, 0, 0, 5, "{%#b} -> %#z"},
  {"SAVE", 0x20, 0, 20, 1, "%l = %#x"},
  {"UNSAVE", 0x82, 0, 20, 1, "%#z: rG=%x, ..., rL=%a"},
  {"SYNC", 0x01, 0, 0, 1, ""},
  {"SWYM", 0x00, 0, 0, 1, ""},
  {"GET", 0x20, 0, 0, 1, "%l = %s = %#x"},
  {"TRIP", 0x0a, 255, 0, 5, "rY=%#y, rZ=%#z, rB=%#b, g[255]=0"}

This code is used in section 65.

70. ⟨Convert relative address to absolute address 70⟩ ≡
  {
    if ((op & 0xfe) == JMP) yz = inst & 0xffffff;
    if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);
    y = inst_ptr; z = incr(loc, yz << 2);
  }

This code is used in section 60.
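The address computation of section 70 can be exercised with ordinary 64-bit integers in place of octa values. The sketch below makes that substitution; `branch_target` is a hypothetical name, the assertion encodes the assumption that the opcode lies in one of the ranges whose table entries carry rel_addr_bit (0x40 to 0x5f and 0xf0 to 0xf5), and the shift is written as a multiplication by 4 so that negative offsets stay well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define JMP 0xf0
#define JMPB 0xf1

/* Absolute target of a relative-address instruction located at loc. */
uint64_t branch_target(uint64_t loc, uint32_t inst)
{
  unsigned op = inst >> 24;
  int32_t yz = (int32_t) (inst & 0xffff);
  assert((op >= 0x40 && op <= 0x5f) || (op >= 0xf0 && op <= 0xf5));
  if ((op & 0xfe) == JMP) yz = (int32_t) (inst & 0xffffff); /* three-byte offset */
  if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);     /* backward variant */
  return loc + (uint64_t) ((int64_t) yz * 4);               /* offsets count tetrabytes */
}
```

An odd opcode denotes the backward form of the instruction, so its offset field is reduced by 2^16 (or 2^24 for JMPB) before being scaled.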

71. ⟨Install operand fields 71⟩ ≡
  if (resuming && rop != RESUME_AGAIN)
    ⟨Install special operands when resuming an interrupted operation 126⟩
  else {
    if (f & 0x10) ⟨Set b from register X 74⟩;
    if (info[op].third_operand) ⟨Set b from special register 79⟩;
    if (f & 0x1) z.l = zz;
    else if (f & 0x2) ⟨Set z from register Z 72⟩
    else if ((op & 0xf0) == SETH) ⟨Set z as an immediate wyde 78⟩;
    if (f & 0x4) y.l = yy;
    else if (f & 0x8) ⟨Set y from register Y 73⟩;
  }

This code is used in section 60.

b: octa, §61. f: register tetra, §62. incr: octa (), MMIX-ARITH §6. info: op_info[], §65. inst: tetra, §61. inst_ptr: octa, §61. JMP = 0xf0, §54. JMPB = 0xf1, §54. l: tetra, §10. loc: octa, §61. op: register mmix_opcode, §62. RESUME_AGAIN = 0, §125. resuming: bool, §61. rJ = 4, §55. rM = 5, §55. rop: int, §61. SETH = 0xe0, §54. third_operand: unsigned char, §64. y: octa, §61. yy: register int, §62. yz: register int, §62. z: octa, §61. zz: register int, §62.


72. There are 256 global registers, g[0] through g[255]; the first 32 of them are used for the special registers rA, rB, etc. There are lring_mask + 1 local registers, usually 256 but the user can increase this to a larger power of 2 if desired.

The current values of rL, rG, rO, and rS are kept in separate variables called L, G, O, and S for convenience. (In fact, O and S actually hold the values rO/8 and rS/8, modulo lring_size.)

⟨Set z from register Z 72⟩ ≡
  {
    if (zz >= G) z = g[zz];
    else if (zz < L) z = l[(O + zz) & lring_mask];
  }

This code is used in section 71.

73. ⟨Set y from register Y 73⟩ ≡
  {
    if (yy >= G) y = g[yy];
    else if (yy < L) y = l[(O + yy) & lring_mask];
  }

This code is used in section 71.

74. ⟨Set b from register X 74⟩ ≡
  {
    if (xx >= G) b = g[xx];
    else if (xx < L) b = l[(O + xx) & lring_mask];
  }

This code is used in section 71.

75. ⟨Local registers 62⟩ +≡
  register int G, L, O; /* accessible copies of key registers */

76. ⟨Global variables 19⟩ +≡
  octa g[256]; /* global registers */
  octa *l; /* local registers */
  int lring_size; /* the number of local registers (a power of 2) */
  int lring_mask; /* one less than lring_size */
  int S; /* congruent to rS >> 3 modulo lring_size */

77. Several of the global registers have constant values, because of the way MMIX

has been simpli�ed in this simulator.Special register rN has a constant value identifying the time of compilation. (The

macro ABSTIME is de�ned externally in the �le abstime.h, which should have justbeen created by ABSTIME; ABSTIME is a trivial program that computes the valueof the standard library function time (�). We assume that this number, which is thenumber of seconds in the \UNIX epoch," is less than 232. Beware: Our assumptionwill fail in February of 2106.)

#de�ne VERSION 1 =� version of the MMIX architecture that we support �=#de�ne SUBVERSION 0 =� secondary byte of version number �=#de�ne SUBSUBVERSION 1 =� further quali�cation to version number �=

Page 380: MMIXware - A RISC Computer for the Third Millennium - Knuth

373 MMIX-SIM: THE MAIN LOOP

⟨Initialize everything 14⟩ +≡
  g[rK] = neg_one;
  g[rN].h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
  g[rN].l = ABSTIME; /* see comment and warning above */
  g[rT].h = 0x80000005;
  g[rTT].h = 0x80000006;
  g[rV].h = 0x369c2004;
  if (lring_size < 256) lring_size = 256;
  lring_mask = lring_size - 1;
  if (lring_size & lring_mask)
    panic("The number of local registers must be a power of 2");
  l = (octa *) calloc(lring_size, sizeof(octa));
  if (!l) panic("No room for the local registers");
  cur_round = ROUND_NEAR;
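Two small idioms in the initialization deserve a second look: the test lring_size & lring_mask, which is nonzero exactly when lring_size is not a power of 2, and the packing of three version bytes into the high half of rN. The sketch below restates both with uint32_t; the function names is_power_of_2 and pack_version are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Nonzero iff n is a positive power of 2: such an n has a single 1 bit,
   so n and n - 1 have no 1 bits in common. */
int is_power_of_2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Pack three version bytes the way the high tetrabyte of rN is built. */
uint32_t pack_version(uint32_t version, uint32_t sub, uint32_t subsub)
{
    return (version << 24) + (sub << 16) + (subsub << 8);
}
```

With VERSION 1, SUBVERSION 0, and SUBSUBVERSION 1, the packed value is 0x01000100.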

78. In operations like INCH, we want z to be the yz field, shifted left 48 bits. We also want y to be register X, which has previously been placed in b; then INCH can be simulated as if it were ADDU.

⟨Set z as an immediate wyde 78⟩ ≡
  {
    switch (op & 3) {
    case 0: z.h = yz << 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz << 16; break;
    case 3: z.l = yz; break;
    }
    y = b;
  }

This code is used in section 71.
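On a machine with 64-bit integers the four cases collapse into a single shift. The following is a sketch under that assumption, with the hypothetical name immediate_wyde; the simulator itself works with two 32-bit halves.

```c
#include <stdint.h>

/* Place the 16-bit field yz in the wyde selected by the low two opcode
   bits: 0 = high, 1 = medium high, 2 = medium low, 3 = low. */
uint64_t immediate_wyde(unsigned op, uint16_t yz)
{
    return (uint64_t)yz << (16 * (3 - (op & 3)));
}
```

For example, INCH has opcode 0xe4, so its immediate operand lands in the top 16 bits.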

79. ⟨Set b from special register 79⟩ ≡
  b = g[info[op].third_operand];

This code is used in section 71.



80. ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ ≡
  if (xx >= G) {
    sprintf(lhs, "$%d=g[%d]", xx, xx);
    x_ptr = &g[xx];
  } else {
    while (xx >= L) ⟨Increase rL 81⟩;
    sprintf(lhs, "$%d=l[%d]", xx, (O + xx) & lring_mask);
    x_ptr = &l[(O + xx) & lring_mask];
  }

This code is used in section 60.

81. ⟨Increase rL 81⟩ ≡
  {
    if (((S - O) & lring_mask) == L && L != 0) stack_store();
    l[(O + L) & lring_mask] = zero_octa;
    L = g[rL].l = L + 1;
  }

This code is used in section 80.

82. The stack_store routine advances the "gamma" pointer in the ring of local registers, by storing the oldest local register into memory location rS and advancing rS.

#define test_store_bkpt(ll) if ((ll)->bkpt & write_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_store ARGS((void));
  void stack_store()
  {
    register mem_tetra *ll = mem_find(g[rS]);
    register int k = S & lring_mask;

    ll->tet = l[k].h; test_store_bkpt(ll);
    (ll + 1)->tet = l[k].l; test_store_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf(" M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n",
             g[rS].h, g[rS].l, k, l[k].h, l[k].l);
    }
    g[rS] = incr(g[rS], 8), S++;
  }

83. The stack_load routine is essentially the inverse of stack_store.

#define test_load_bkpt(ll) if ((ll)->bkpt & read_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_load ARGS((void));
  void stack_load()
  {
    register mem_tetra *ll;
    register int k;

    S--, g[rS] = incr(g[rS], -8);
    ll = mem_find(g[rS]);
    k = S & lring_mask;
    l[k].h = ll->tet; test_load_bkpt(ll);
    l[k].l = (ll + 1)->tet; test_load_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf(" rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n",
             k, g[rS].h, g[rS].l, l[k].h, l[k].l);
    }
  }



MMIX-SIM: SIMULATING THE INSTRUCTIONS 376

84. Simulating the instructions. The master switch branches in 256 directions, one for each MMIX instruction.

Let's start with ADD, since it is somehow the most typical case: not too easy, and not too hard. The task is to compute x = y + z, and to signal overflow if the sum is out of range. Overflow occurs if and only if y and z have the same sign but the sum has a different sign.

Overflow is one of the eight arithmetic exceptions. We record such exceptions in a variable called exc, which is set to zero at the beginning of each cycle and used to update rA at the end.

The main control routine has put the input operands into octabytes y and z. It has also made x_ptr point to the octabyte where the result should be placed.

⟨Cases for individual MMIX instructions 84⟩ ≡
  case ADD: case ADDI: x = w; /* w = oplus(y, z) */
    if (((y.h ^ z.h) & sign_bit) == 0 && ((y.h ^ x.h) & sign_bit) != 0) exc |= V_BIT;
  store_x: *x_ptr = x; break;

See also sections 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, and 124.

This code is used in section 60.
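The overflow rule for ADD can be checked on a host with 64-bit integers. The sketch below is an illustration, not the simulator's code: it uses uint64_t bit patterns instead of the two-tetrabyte octa, and the name add_signed is hypothetical.

```c
#include <stdint.h>

/* Add two's complement bit patterns y and z. Set *overflow to 1 when the
   operands have equal signs but the sum's sign differs, else to 0. */
uint64_t add_signed(uint64_t y, uint64_t z, int *overflow)
{
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t x = y + z;  /* unsigned addition wraps modulo 2^64 */
    *overflow = ((y ^ z) & sign) == 0 && ((y ^ x) & sign) != 0;
    return x;
}
```

Working with unsigned patterns avoids the undefined behavior that signed overflow would have in C.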

85. Other cases of signed and unsigned addition and subtraction are, of course, similar. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SUB: case SUBI: case NEG: case NEGI: x = ominus(y, z);
    if (((x.h ^ z.h) & sign_bit) == 0 && ((x.h ^ y.h) & sign_bit) != 0) exc |= V_BIT;
    goto store_x;
  case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL:
    x = w; goto store_x;
  case SUBU: case SUBUI: case NEGU: case NEGUI: x = ominus(y, z); goto store_x;
  case IIADDU: case IIADDUI: case IVADDU: case IVADDUI:
  case VIIIADDU: case VIIIADDUI: case XVIADDU: case XVIADDUI:
    x = oplus(shift_left(y, ((op & 0xf) >> 1) - 3), z); goto store_x;
  case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB:
    x = z; goto store_x;
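The expression ((op & 0xf) >> 1) - 3 turns the opcodes 0x28 through 0x2f into the shift amounts 1, 2, 3, and 4, which is how one case serves 2ADDU, 4ADDU, 8ADDU, and 16ADDU. Here is a sketch with 64-bit arithmetic; the name scaled_addu is an assumption made for this example.

```c
#include <stdint.h>

/* Compute (y << s) + z modulo 2^64, where s is derived from the opcode
   exactly as in the case above; op must lie in 0x28..0x2f. */
uint64_t scaled_addu(unsigned op, uint64_t y, uint64_t z)
{
    int shift = (int)((op & 0xf) >> 1) - 3;  /* 1, 2, 3, or 4 */
    return (y << shift) + z;
}
```

The immediate forms (odd opcodes) share the shift amount of their register forms because the low bit is discarded by the right shift.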

86. Let's get the simple bitwise operations out of the way too.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case OR: case ORI: case ORH: case ORMH: case ORML: case ORL:
    x.h = y.h | z.h; x.l = y.l | z.l; goto store_x;
  case ORN: case ORNI: x.h = y.h | ~z.h; x.l = y.l | ~z.l; goto store_x;
  case NOR: case NORI: x.h = ~(y.h | z.h); x.l = ~(y.l | z.l); goto store_x;
  case XOR: case XORI: x.h = y.h ^ z.h; x.l = y.l ^ z.l; goto store_x;
  case AND: case ANDI: x.h = y.h & z.h; x.l = y.l & z.l; goto store_x;
  case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL:
    x.h = y.h & ~z.h; x.l = y.l & ~z.l; goto store_x;
  case NAND: case NANDI: x.h = ~(y.h & z.h); x.l = ~(y.l & z.l); goto store_x;
  case NXOR: case NXORI: x.h = ~(y.h ^ z.h); x.l = ~(y.l ^ z.l); goto store_x;


87. The less simple bit manipulations are almost equally simple, given the subroutines of MMIX-ARITH. The MUX operation has three inputs; in such cases the inputs appear in y, z, and b.

#define shift_amt (z.h || z.l >= 64 ? 64 : z.l)

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SL: case SLI: x = shift_left(y, shift_amt);
    a = shift_right(x, shift_amt, 0);
    if (a.h != y.h || a.l != y.l) exc |= V_BIT;
    goto store_x;
  case SLU: case SLUI: x = shift_left(y, shift_amt); goto store_x;
  case SR: case SRI: case SRU: case SRUI:
    x = shift_right(y, shift_amt, op & 0x2); goto store_x;
  case MUX: case MUXI: x.h = (y.h & b.h) | (z.h & ~b.h); x.l = (y.l & b.l) | (z.l & ~b.l);
    goto store_x;
  case SADD: case SADDI: x.l = count_bits(y.h & ~z.h) + count_bits(y.l & ~z.l);
    goto store_x;
  case MOR: case MORI: x = bool_mult(y, z, false); goto store_x;
  case MXOR: case MXORI: x = bool_mult(y, z, true); goto store_x;
  case BDIF: case BDIFI: x.h = byte_diff(y.h, z.h); x.l = byte_diff(y.l, z.l);
    goto store_x;
  case WDIF: case WDIFI: x.h = wyde_diff(y.h, z.h); x.l = wyde_diff(y.l, z.l);
    goto store_x;
  case TDIF: case TDIFI: if (y.h > z.h) x.h = y.h - z.h;
  tdif_l: if (y.l > z.l) x.l = y.l - z.l; goto store_x;
  case ODIF: case ODIFI: if (y.h > z.h) x = ominus(y, z);
    else if (y.h == z.h) goto tdif_l;
    goto store_x;
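Three of these operations are easy to restate on 64-bit words. The sketch below is an illustration only; mux, sideways_add, and byte_diff64 are hypothetical names, and the loops are straightforward substitutes for the MMIX-ARITH subroutines count_bits and byte_diff, whose internals are not shown in this section.

```c
#include <stdint.h>

/* MUX: take bits of y where mask has 1s, bits of z elsewhere. */
uint64_t mux(uint64_t y, uint64_t z, uint64_t mask)
{
    return (y & mask) | (z & ~mask);
}

/* SADD: count the positions where y has a 1 bit and z has a 0 bit. */
int sideways_add(uint64_t y, uint64_t z)
{
    uint64_t v = y & ~z;
    int count = 0;
    while (v) {
        v &= v - 1;  /* clear the lowest 1 bit */
        count++;
    }
    return count;
}

/* BDIF: saturating difference max(0, y - z) in each of the 8 bytes. */
uint64_t byte_diff64(uint64_t y, uint64_t z)
{
    uint64_t x = 0;
    for (int i = 0; i < 64; i += 8) {
        uint64_t yb = (y >> i) & 0xff, zb = (z >> i) & 0xff;
        if (yb > zb) x |= (yb - zb) << i;
    }
    return x;
}
```

In the simulator the mask for MUX comes from b, which section 79 loads from a special register.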

88. When an operation has two outputs, the primary output is placed in x and the auxiliary output is placed in a.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case MUL: case MULI: x = signed_omult(y, z);
  test_overflow: if (overflow) exc |= V_BIT;
    goto store_x;
  case MULU: case MULUI: x = omult(y, z); a = g[rH] = aux; goto store_x;
  case DIV: case DIVI: x = signed_odiv(y, z); a = g[rR] = aux; goto test_overflow;
  case DIVU: case DIVUI: x = odiv(b, y, z); a = g[rR] = aux; goto store_x;

89. The floating point routines of MMIX-ARITH record exceptional events in a variable called exceptions. Here we simply merge those bits into the exc variable. The U_BIT is not exactly the same as "underflow," but the true definition of underflow will be applied when exc is combined with rA.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case FADD: x = fplus(y, z);
  fin_float: round_mode = cur_round;
  store_fx: exc |= exceptions; goto store_x;
  case FSUB: a = z; if (fcomp(a, zero_octa) != 2) a.h ^= sign_bit;
    x = fplus(y, a); goto fin_float;
  case FMUL: x = fmult(y, z); goto fin_float;
  case FDIV: x = fdivide(y, z); goto fin_float;
  case FREM: x = fremstep(y, z, 2500); goto fin_float;
  case FSQRT: x = froot(z, y.l);
  fin_unifloat: if (y.h || y.l > 4) goto illegal_inst;
    round_mode = (y.l ? y.l : cur_round); goto store_fx;
  case FINT: x = fintegerize(z, y.l); goto fin_unifloat;
  case FIX: x = fixit(z, y.l); goto fin_unifloat;
  case FIXU: x = fixit(z, y.l); exceptions &= ~W_BIT; goto fin_unifloat;
  case FLOT: case FLOTI: case FLOTU: case FLOTUI:
  case SFLOT: case SFLOTI: case SFLOTU: case SFLOTUI:
    x = floatit(z, y.l, op & 0x2, op & 0x4); goto fin_unifloat;


90. We have now done all of the arithmetic operations except for the cases that compare two registers and yield a value of -1 or 0 or 1.

#define cmp_zero store_x /* x is 0 by default */

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CMP: case CMPI: if ((y.h & sign_bit) > (z.h & sign_bit)) goto cmp_neg;
    if ((y.h & sign_bit) < (z.h & sign_bit)) goto cmp_pos;
  case CMPU: case CMPUI: if (y.h < z.h) goto cmp_neg;
    if (y.h > z.h) goto cmp_pos;
    if (y.l < z.l) goto cmp_neg;
    if (y.l == z.l) goto cmp_zero;
  cmp_pos: x.l = 1; goto store_x;
  cmp_neg: x = neg_one; goto store_x;
  case FCMPE: k = fepscomp(y, z, b, true);
    if (k) goto cmp_zero_or_invalid;
  case FCMP: k = fcomp(y, z);
    if (k < 0) goto cmp_neg;
  cmp_fin: if (k == 1) goto cmp_pos;
  cmp_zero_or_invalid: if (k == 2) exc |= I_BIT;
    goto cmp_zero;
  case FUN: if (fcomp(y, z) == 2) goto cmp_pos; else goto cmp_zero;
  case FEQL: if (fcomp(y, z) == 0) goto cmp_pos; else goto cmp_zero;
  case FEQLE: k = fepscomp(y, z, b, false);
    goto cmp_fin;
  case FUNE: if (fepscomp(y, z, b, true) == 2) goto cmp_pos; else goto cmp_zero;
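The reason CMP can fall through into CMPU is that, once the operands are known to have the same sign, unsigned order on the bit patterns agrees with signed order. A sketch of that argument on 64-bit patterns follows; the name cmp_signed is hypothetical.

```c
#include <stdint.h>

/* Signed three-way comparison of two's complement bit patterns.
   Returns -1, 0, or 1. */
int cmp_signed(uint64_t y, uint64_t z)
{
    const uint64_t sign = UINT64_C(1) << 63;
    if ((y & sign) > (z & sign)) return -1;  /* y negative, z not */
    if ((y & sign) < (z & sign)) return 1;   /* z negative, y not */
    /* same sign: unsigned order agrees with signed order */
    if (y < z) return -1;
    return y > z;
}
```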

91. We have now done all the register-register operations except for the conditional commands. Conditional commands and branch commands all make use of a simple subroutine that determines whether a given octabyte satisfies the condition of a given opcode.

⟨Subroutines 12⟩ +≡
  int register_truth ARGS((octa, mmix_opcode));
  int register_truth(o, op)
    octa o;
    mmix_opcode op;
  {
    register int b;

    switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break; /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break; /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break; /* positive? */
    case 3: b = o.l & 0x1; break; /* odd? */
    }
    if (op & 0x8) return b ^ 1;
    else return b;
  }
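The opcode encoding that register_truth exploits can be exercised with a 64-bit variant. This is a sketch under the assumption that uint64_t stands in for octa; the name register_truth64 is hypothetical, and the opcode values used in the note are those of the branch instructions (BN = 0x40, BZ = 0x42, BP = 0x44, BOD = 0x46, and their negations 8 higher).

```c
#include <stdint.h>

/* Bits 1-2 of the opcode select the test; bit 3 negates the outcome. */
int register_truth64(uint64_t o, unsigned op)
{
    int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = (int)(o >> 63); break;            /* negative? */
    case 1: b = (o == 0); break;                  /* zero? */
    case 2: b = (o >> 63) == 0 && o != 0; break;  /* positive? */
    default: b = (int)(o & 1); break;             /* odd? */
    }
    return (op & 0x8) ? b ^ 1 : b;
}
```

For instance, register_truth64(4, 0x4e) tests "even" (BEV) and yields 1, because the "odd" test fails and bit 3 of the opcode flips the result.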


92. The b operand will be zero on the ZS operations; it will be the contents of register X on the CS operations.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSN: case CSNI: case CSZ: case CSZI:
  case CSP: case CSPI: case CSOD: case CSODI:
  case CSNN: case CSNNI: case CSNZ: case CSNZI:
  case CSNP: case CSNPI: case CSEV: case CSEVI:
  case ZSN: case ZSNI: case ZSZ: case ZSZI:
  case ZSP: case ZSPI: case ZSOD: case ZSODI:
  case ZSNN: case ZSNNI: case ZSNZ: case ZSNZI:
  case ZSNP: case ZSNPI: case ZSEV: case ZSEVI:
    x = register_truth(y, op) ? z : b; goto store_x;


93. Didn't that feel good, when 32 opcodes reduced to a single case? We get to do it one more time. Happiness!

⟨Cases for individual MMIX instructions 84⟩ +≡
  case BN: case BNB: case BZ: case BZB:
  case BP: case BPB: case BOD: case BODB:
  case BNN: case BNNB: case BNZ: case BNZB:
  case BNP: case BNPB: case BEV: case BEVB:
  case PBN: case PBNB: case PBZ: case PBZB:
  case PBP: case PBPB: case PBOD: case PBODB:
  case PBNN: case PBNNB: case PBNZ: case PBNZB:
  case PBNP: case PBNPB: case PBEV: case PBEVB:
    x.l = register_truth(b, op);
    if (x.l) {
      inst_ptr = z;
      good = (op >= PBN);
    } else good = (op < PBN);
    if (good) good_guesses++;
    else bad_guesses++, g[rC].l += 2; /* penalty is 2υ for bad guess */
    break;
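The bookkeeping for branch guesses depends only on the opcode and on whether the branch was taken. Here is a minimal sketch of that rule; the function name guess_was_good and the constant PBN_OPCODE are assumptions for illustration (PBN is 0x50 in the simulator's opcode table, and all probable-branch opcodes lie at or above it).

```c
#include <stdbool.h>

enum { PBN_OPCODE = 0x50 };  /* first of the "probable branch" opcodes */

/* True if the static hint was right: a taken branch is a good guess for
   "probable" opcodes, an untaken branch for ordinary ones. */
bool guess_was_good(unsigned op, bool taken)
{
    return taken ? op >= PBN_OPCODE : op < PBN_OPCODE;
}
```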

94. Memory operations are next on our agenda. The memory address, y + z, has already been placed in w.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case LDB: case LDBI: case LDBU: case LDBUI:
    i = 56; j = (w.l & 0x3) << 3;
    goto fin_ld;
  case LDW: case LDWI: case LDWU: case LDWUI:
    i = 48; j = (w.l & 0x2) << 3;
    goto fin_ld;
  case LDT: case LDTI: case LDTU: case LDTUI:
    i = 32; j = 0; goto fin_ld;
  case LDHT: case LDHTI: i = j = 0;
  fin_ld: ll = mem_find(w); test_load_bkpt(ll);
    x.h = ll->tet;
    x = shift_right(shift_left(x, j), i, op & 0x2);
  check_ld: if (w.h & sign_bit) goto privileged_inst;
    goto store_x;
  case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI:
    w.l &= -8; ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    x.h = ll->tet; x.l = (ll + 1)->tet;
    goto check_ld;
  case LDSF: case LDSFI: ll = mem_find(w); test_load_bkpt(ll);
    x = load_sf(ll->tet); goto check_ld;
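The pair of shifts at fin_ld first moves the wanted field to the top of the octabyte and then brings it down to the bottom, extending the sign if the load is signed. The sketch below replays that idea on a uint64_t; the name load_field and its parameters are assumptions made for this example, and the sign extension is written out by hand rather than through MMIX-ARITH's shift_right.

```c
#include <stdint.h>

/* Extract a big-endian field of `width` bits (8, 16, or 32) from the
   aligned tetrabyte `tet` that contains byte address `addr`.
   Sign-extend the field if is_signed is nonzero. */
uint64_t load_field(uint32_t tet, uint64_t addr, int width, int is_signed)
{
    unsigned bytes = (unsigned)width / 8;                   /* 1, 2, or 4 */
    unsigned offset = (unsigned)(addr & 3) & ~(bytes - 1);  /* aligned offset */
    int j = (int)offset * 8;                /* left shift, as in the text */
    int i = 64 - width;                     /* right shift, as in the text */
    uint64_t x = ((uint64_t)tet << 32) << j;  /* field is now at the top */
    if (is_signed && (x >> 63))
        return (x >> i) | ~(UINT64_MAX >> i);  /* copy the sign bit down */
    return x >> i;
}
```

With tet = 0x1280f4ff, a signed byte load from an address ending in binary 01 selects the byte 0x80 and yields 0xffffffffffffff80.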


95. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case STB: case STBI: case STBU: case STBUI:
    i = 56; j = (w.l & 0x3) << 3;
    goto fin_pst;
  case STW: case STWI: case STWU: case STWUI:
    i = 48; j = (w.l & 0x2) << 3;
    goto fin_pst;
  case STT: case STTI: case STTU: case STTUI:
    i = 32; j = 0;
  fin_pst: ll = mem_find(w);
    if ((op & 0x2) == 0) {
      a = shift_right(shift_left(b, i), i, 0);
      if (a.h != b.h || a.l != b.l) exc |= V_BIT;
    }
    ll->tet ^= (ll->tet ^ (b.l << (i - 32 - j))) & ((((tetra) -1) << (i - 32)) >> j);
    goto fin_st;
  case STSF: case STSFI: ll = mem_find(w);
    ll->tet = store_sf(b); exc = exceptions;
    goto fin_st;
  case STHT: case STHTI: ll = mem_find(w); ll->tet = b.h;
  fin_st: test_store_bkpt(ll);
    w.l &= -8; ll = mem_find(w);
    a.h = ll->tet; a.l = (ll + 1)->tet; /* for trace output */
    goto check_st;
  case STCO: case STCOI: b.l = xx;
  case STO: case STOI: case STOU: case STOUI: case STUNC: case STUNCI:
    w.l &= -8; ll = mem_find(w);
    test_store_bkpt(ll); test_store_bkpt(ll + 1);
    ll->tet = b.h; (ll + 1)->tet = b.l;
  check_st: if (w.h & sign_bit) goto privileged_inst;
    break;
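The one-line update at fin_pst merges a field into a tetrabyte with the identity new = old ^ ((old ^ v) & mask), which changes only the bits selected by mask. The following sketch isolates that step; store_field and its parameters are hypothetical names, and its shift i is measured within a 32-bit word (so it corresponds to i - 32 in the code above).

```c
#include <stdint.h>

/* Replace a field of `width` bits (8, 16, or 32) inside the aligned
   tetrabyte `tet` with the low bits of `value`; `addr` is the byte
   address being stored to. */
uint32_t store_field(uint32_t tet, uint64_t addr, int width, uint32_t value)
{
    unsigned bytes = (unsigned)width / 8;                   /* 1, 2, or 4 */
    unsigned offset = (unsigned)(addr & 3) & ~(bytes - 1);  /* aligned offset */
    int j = (int)offset * 8;      /* distance of the field from the top */
    int i = 32 - width;           /* shift that puts a field at the top */
    uint32_t mask = (UINT32_MAX << i) >> j;  /* 1s exactly over the field */
    uint32_t v = value << (i - j);           /* value aligned with the mask */
    return tet ^ ((tet ^ v) & mask);
}
```

Bits of value above the field width are harmless, since the mask discards them.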

96. The CSWAP operation has elements of both loading and storing. We shuffle some of the operands around so that they will appear correctly in the trace output.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSWAP: case CSWAPI: w.l &= -8; ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    a = g[rP];
    if (ll->tet == a.h && (ll + 1)->tet == a.l) {
      x.h = 0, x.l = 1;
      test_store_bkpt(ll); test_store_bkpt(ll + 1);
      ll->tet = b.h, (ll + 1)->tet = b.l;
      strcpy(rhs, "M8[%#w]=%#b");
    } else {
      b.h = ll->tet, b.l = (ll + 1)->tet;
      g[rP] = b;
      strcpy(rhs, "rP=%#b");
    }
    goto check_ld;
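Stripped of tracing and breakpoints, the case above is an ordinary compare-and-swap on one octabyte. Here is a sketch of its effect, assuming 64-bit words; the name cswap is hypothetical, with expected playing the role of rP, desired the role of register X on entry, and the return value the role of register X on exit. Like the simulator, which executes one instruction at a time, it makes no attempt at atomicity.

```c
#include <stdint.h>

/* If *mem equals *expected, store desired into *mem and return 1;
   otherwise copy *mem into *expected and return 0. */
int cswap(uint64_t *mem, uint64_t *expected, uint64_t desired)
{
    if (*mem == *expected) {
        *mem = desired;
        return 1;
    }
    *expected = *mem;
    return 0;
}
```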


97. The GET command is permissive, but PUT is restrictive.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GET: if (yy != 0 || zz >= 32) goto illegal_inst;
    x = g[zz];
    goto store_x;
  case PUT: case PUTI: if (yy != 0 || xx >= 32) goto illegal_inst;
    strcpy(rhs, "%z = %#z");
    if (xx >= 8) {
      if (xx <= 11) goto illegal_inst; /* can't change rC, rN, rO, rS */
      if (xx <= 18) goto privileged_inst;
      if (xx == rA) ⟨Get ready to update rA 100⟩
      else if (xx == rL) ⟨Set L = z = min(z, L) 98⟩
      else if (xx == rG) ⟨Get ready to update rG 99⟩;
    }
    g[xx] = z; zz = xx; break;

98. ⟨Set L = z = min(z, L) 98⟩ ≡
  {
    x = z; strcpy(rhs, z.h ? "min(rL,%#x) = %z" : "min(rL,%x) = %z");
    if (z.l > L || z.h) z.h = 0, z.l = L;
    else old_L = L = z.l;
  }

This code is used in section 97.


99. ⟨Get ready to update rG 99⟩ ≡
  {
    if (z.h != 0 || z.l > 255 || z.l < L || z.l < 32) goto illegal_inst;
    for (j = z.l; j < G; j++) g[j] = zero_octa;
    G = z.l;
  }

This code is used in section 97.

100. #define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4

⟨Get ready to update rA 100⟩ ≡
  {
    if (z.h != 0 || z.l >= 0x40000) goto illegal_inst;
    cur_round = (z.l >= 0x10000 ? z.l >> 16 : ROUND_NEAR);
  }

This code is used in section 97.
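The rule for rA says that only the low 18 bits may be set, and that the two bits above the low 16 choose the rounding mode, with zero meaning "round to nearest." The sketch below restates it for a 64-bit value; the function name rounding_mode_from_rA and the convention of returning -1 for an illegal value are assumptions of this example.

```c
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP, ROUND_DOWN, ROUND_NEAR };

/* Return the rounding mode implied by a proposed rA value, or -1 if the
   value is illegal (some bit is set above the low 18). */
int rounding_mode_from_rA(uint64_t z)
{
    if (z >= 0x40000) return -1;
    return z >= 0x10000 ? (int)(z >> 16) : ROUND_NEAR;
}
```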

101. Pushing and popping are rather delicate, because we want to trace them coherently.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case PUSHGO: case PUSHGOI: inst_ptr = w; goto push;
  case PUSHJ: case PUSHJB: inst_ptr = z;
  push: if (xx >= L) {
      if (((S - O) & lring_mask) == L && L != 0) stack_store();
      xx = L++;
    }
    x.l = xx; l[(O + xx) & lring_mask] = x; /* the "hole" records the amount pushed */
    sprintf(lhs, "l[%d]=%d, ", (O + xx) & lring_mask, xx);
    x = g[rJ] = incr(loc, 4);
    L -= xx + 1; O += xx + 1;
    b = g[rO] = incr(g[rO], (xx + 1) << 3);
  sync_L: a.l = g[rL].l = L; break;
  case POP: if (xx != 0 && xx <= L) y = l[(O + xx - 1) & lring_mask];
    if (g[rS].l == g[rO].l) stack_load();
    k = l[(O - 1) & lring_mask].l & 0xff;
    while ((tetra) (O - S) <= (tetra) k) stack_load();
    L = k + (xx <= L ? xx : L + 1);
    if (L > G) L = G;
    if (L > k) {
      l[(O - 1) & lring_mask] = y;
      if (y.h) sprintf(lhs, "l[%d]=#%x%08x, ", (O - 1) & lring_mask, y.h, y.l);
      else sprintf(lhs, "l[%d]=#%x, ", (O - 1) & lring_mask, y.l);
    } else lhs[0] = '\0';
    y = g[rJ]; z.l = yz << 2; inst_ptr = oplus(y, z);
    O -= k + 1; b = g[rO] = incr(g[rO], -((k + 1) << 3));
    goto sync_L;


102. To complete our simulation of MMIX's register stack, we need to implement SAVE and UNSAVE.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SAVE: if (xx < G || yy != 0 || zz != 0) goto illegal_inst;
    if (((S - O) & lring_mask) == L && L != 0) stack_store();
    l[(O + L) & lring_mask].l = L;
    O += L + 1; g[rO] = incr(g[rO], (L + 1) << 3);
    L = g[rL].l = 0;
    while (g[rO].l != g[rS].l) stack_store();
    for (k = G; ; ) {
      ⟨Store g[k] in the register stack 103⟩;
      if (k == 255) k = rB;
      else if (k == rR) k = rP;
      else if (k == rZ + 1) break;
      else k++;
    }
    O = S, g[rO] = g[rS];
    x = incr(g[rO], -8); goto store_x;


103. This part of the program naturally has a lot in common with the stack_store subroutine. (There's a little white lie in the section name; if k is rZ + 1, we store rG and rA, not g[k].)

⟨Store g[k] in the register stack 103⟩ ≡
  ll = mem_find(g[rS]);
  if (k == rZ + 1) x.h = G << 24, x.l = g[rA].l;
  else x = g[k];
  ll->tet = x.h; test_store_bkpt(ll);
  (ll + 1)->tet = x.l; test_store_bkpt(ll + 1);
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf(" M8[#%08x%08x]=g[%d]=#%08x%08x, rS+=8\n",
                        g[rS].h, g[rS].l, k, x.h, x.l);
    else printf(" M8[#%08x%08x]=%s=#%08x%08x, rS+=8\n", g[rS].h,
                g[rS].l, k == rZ + 1 ? "(rG,rA)" : special_name[k], x.h, x.l);
  }
  S++, g[rS] = incr(g[rS], 8);

This code is used in section 102.

104. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case UNSAVE: if (xx != 0 || yy != 0) goto illegal_inst;
    z.l &= -8; g[rS] = incr(z, 8);
    for (k = rZ + 1; ; ) {
      ⟨Load g[k] from the register stack 105⟩;
      if (k == rP) k = rR;
      else if (k == rB) k = 255;
      else if (k == G) break;
      else k--;
    }
    S = g[rS].l >> 3;
    stack_load();
    k = l[S & lring_mask].l & 0xff;
    for (j = 0; j < k; j++) stack_load();
    O = S; g[rO] = g[rS];
    L = k > G ? G : k;
    g[rL].l = L; a = g[rL];
    g[rG].l = G; break;

105. ⟨Load g[k] from the register stack 105⟩ ≡
  g[rS] = incr(g[rS], -8);
  ll = mem_find(g[rS]);
  test_load_bkpt(ll); test_load_bkpt(ll + 1);
  if (k == rZ + 1) x.l = G = g[rG].l = ll->tet >> 24, a.l = g[rA].l = (ll + 1)->tet & 0x3ffff;
  else g[k].h = ll->tet, g[k].l = (ll + 1)->tet;
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf(" rS-=8, g[%d]=M8[#%08x%08x]=#%08x%08x\n", k,
                        g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else if (k == rZ + 1) printf(" (rG,rA)=M8[#%08x%08x]=#%08x%08x\n",
                                 g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else printf(" rS-=8, %s=M8[#%08x%08x]=#%08x%08x\n",
                special_name[k], g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
  }

This code is used in section 104.

106. The cache maintenance instructions don't affect this simulation, because there are no caches. But if the user has invoked them, we do provide a bit of information when tracing, indicating the scope of the instruction.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SYNCID: case SYNCIDI: case PREST: case PRESTI:
  case SYNCD: case SYNCDI: case PREGO: case PREGOI:
  case PRELD: case PRELDI: x = incr(w, xx); break;

107. Several loose ends remain to be nailed down.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GO: case GOI: x = inst_ptr; inst_ptr = w; goto store_x;
  case JMP: case JMPB: inst_ptr = z; break;
  case SYNC: if (xx != 0 || yy != 0 || zz > 6) goto illegal_inst;
    if (zz <= 3) break;
  case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs, "!privileged");
    goto break_inst;
  illegal_inst: strcpy(lhs, "!illegal");
  break_inst: breakpoint = tracing = true;
    if (!interacting && !interact_after_break) halted = true;
    break;

a: octa, §61. breakpoint: bool, §61. cur_line: int, §31. G: register int, §75. g: octa [ ], §76. GO = #9e, §54. GOI = #9f, §54. h: tetra, §10. halted: bool, §61. incr: octa ( ), MMIX-ARITH §6. inst_ptr: octa, §61. interact_after_break: bool, §61. interacting: bool, §61. j: register int, §62. JMP = #f0, §54. JMPB = #f1, §54. k: register int, §62. l: tetra, §10. l: octa *, §76. L: register int, §75. LDVTS = #98, §54. LDVTSI = #99, §54. lhs: char [ ], §139. ll: register mem_tetra *, §62. lring_mask: int, §76. mem_find: mem_tetra *( ), §20. O: register int, §75. PREGO = #9c, §54. PREGOI = #9d, §54. PRELD = #9a, §54. PRELDI = #9b, §54. PREST = #ba, §54. PRESTI = #bb, §54. printf: int ( ), <stdio.h>. rA = 21, §55. rB = 0, §55. rG = 19, §55. rL = 20, §55. rO = 10, §55. rP = 23, §55. rR = 6, §55. rS = 11, §55. rZ = 27, §55. S: int, §76. show_line: void ( ), §47. special_name: char *[ ], §56. stack_load: void ( ), §83. stack_store: void ( ), §82. stack_tracing: bool, §61. store_x: label, §84. strcpy: char *( ), <string.h>. SYNC = #fc, §54. SYNCD = #b8, §54. SYNCDI = #b9, §54. SYNCID = #bc, §54. SYNCIDI = #bd, §54. test_load_bkpt = macro ( ), §83. test_store_bkpt = macro ( ), §82. tet: tetra, §16. tracing: bool, §61. true = 1, §9. UNSAVE = #fb, §54. w: octa, §61. x: octa, §61. xx: register int, §62. yy: register int, §62. z: octa, §61. zz: register int, §62.


MMIX-SIM: TRIPS AND TRAPS 390

108. Trips and traps. We have now implemented 253 of the 256 instructions: all but TRIP, TRAP, and RESUME.

The TRIP instruction simply turns H_BIT on in the exc variable; this will trigger an interruption to location 0.

The TRAP instruction is not simulated, except for the system calls mentioned in the introduction.

⟨Cases for individual MMIX instructions 84⟩ +≡
case TRIP: exc |= H_BIT; break;
case TRAP: if (xx != 0 || yy > max_sys_call) goto privileged_inst;
  strcpy(rhs, trap_format[yy]);
  g[rWW] = inst_ptr;
  g[rXX].h = sign_bit; g[rXX].l = inst;
  g[rYY] = y; g[rZZ] = z;
  z.h = 0; z.l = zz;
  a = incr(b, 8);
  ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩;
  switch (yy) {
  case Halt: ⟨Either halt or print warning 109⟩; g[rBB] = g[255]; break;
  case Fopen: g[rBB] = mmix_fopen((unsigned char) zz, mb, ma); break;
  case Fclose: g[rBB] = mmix_fclose((unsigned char) zz); break;
  case Fread: g[rBB] = mmix_fread((unsigned char) zz, mb, ma); break;
  case Fgets: g[rBB] = mmix_fgets((unsigned char) zz, mb, ma); break;
  case Fgetws: g[rBB] = mmix_fgetws((unsigned char) zz, mb, ma); break;
  case Fwrite: g[rBB] = mmix_fwrite((unsigned char) zz, mb, ma); break;
  case Fputs: g[rBB] = mmix_fputs((unsigned char) zz, b); break;
  case Fputws: g[rBB] = mmix_fputws((unsigned char) zz, b); break;
  case Fseek: g[rBB] = mmix_fseek((unsigned char) zz, b); break;
  case Ftell: g[rBB] = mmix_ftell((unsigned char) zz); break;
  }
  x = g[255] = g[rBB]; break;

109. ⟨Either halt or print warning 109⟩ ≡
  if (!zz) halted = breakpoint = true;
  else if (zz == 1) {
    if (loc.h || loc.l >= 0x90) goto privileged_inst;
    print_trip_warning(loc.l >> 4, incr(g[rW], -4));
  } else goto privileged_inst;

This code is used in section 108.

110. ⟨Global variables 19⟩ +≡
char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};
char *trap_format[] = {"Halt(%z)",
  "$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
  "$255 = Fclose(%!z) = %x",
  "$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
  "$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
  "$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
  "$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
  "$255 = Fputs(%!z,%#b) = %x",
  "$255 = Fputws(%!z,%#b) = %x",
  "$255 = Fseek(%!z,%b) = %x",
  "$255 = Ftell(%!z) = %x"};


111. ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩ ≡
  if (arg_count[yy] == 3) {
    ll = mem_find(b); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    mb.h = ll->tet; mb.l = (ll + 1)->tet;
    ll = mem_find(a); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    ma.h = ll->tet; ma.l = (ll + 1)->tet;
  }

This code is used in section 108.

112. The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

113. ⟨Global variables 19⟩ +≡
extern void mmix_io_init ARGS((void));
extern octa mmix_fopen ARGS((unsigned char, octa, octa));
extern octa mmix_fclose ARGS((unsigned char));
extern octa mmix_fread ARGS((unsigned char, octa, octa));
extern octa mmix_fgets ARGS((unsigned char, octa, octa));
extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
extern octa mmix_fputs ARGS((unsigned char, octa));
extern octa mmix_fputws ARGS((unsigned char, octa));
extern octa mmix_fseek ARGS((unsigned char, octa));
extern octa mmix_ftell ARGS((unsigned char));
extern void print_trip_warning ARGS((int, octa));
extern void mmix_fake_stdin ARGS((FILE *));

a: octa, §61. ARGS = macro ( ), §11. b: octa, §61. breakpoint: bool, §61. exc: int, §61. Fclose = 2, §59. Fgets = 4, §59. Fgetws = 5, §59. FILE, <stdio.h>. Fopen = 1, §59. Fputs = 7, §59. Fputws = 8, §59. Fread = 3, §59. Fseek = 9, §59. Ftell = 10, §59. Fwrite = 6, §59. g: octa [ ], §76. h: tetra, §10. H_BIT = macro, §57. Halt = 0, §59. halted: bool, §61. incr: octa ( ), MMIX-ARITH §6. inst: tetra, §61. inst_ptr: octa, §61. l: tetra, §10. ll: register mem_tetra *, §62. loc: octa, §61. ma: octa, §61. max_sys_call = macro, §59. mb: octa, §61. mem_find: mem_tetra *( ), §20. mmix_fake_stdin: void ( ), MMIX-IO §10. mmix_fclose: octa ( ), MMIX-IO §11. mmix_fgets: octa ( ), MMIX-IO §14. mmix_fgetws: octa ( ), MMIX-IO §16. mmix_fopen: octa ( ), MMIX-IO §8. mmix_fputs: octa ( ), MMIX-IO §19. mmix_fputws: octa ( ), MMIX-IO §20. mmix_fread: octa ( ), MMIX-IO §12. mmix_fseek: octa ( ), MMIX-IO §21. mmix_ftell: octa ( ), MMIX-IO §22. mmix_fwrite: octa ( ), MMIX-IO §18. mmix_io_init: void ( ), MMIX-IO §7. octa = struct, §10. print_trip_warning: void ( ), MMIX-IO §23. privileged_inst: label, §107. rBB = 7, §55. rhs = macro, §139. rW = 24, §55. rWW = 28, §55. rXX = 29, §55. rYY = 30, §55. rZZ = 31, §55. sign_bit = macro, §15. strcpy: char *( ), <string.h>. test_load_bkpt = macro ( ), §83. tet: tetra, §16. TRAP = #00, §54. TRIP = #ff, §54. true = 1, §9. x: octa, §61. xx: register int, §62. y: octa, §61. yy: register int, §62. z: octa, §61. zz: register int, §62.


114. The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at address addr in the simulated memory and stores them in buf, continuing until size characters have been read or some other stopping criterion has been met. If stop < 0 there is no other criterion; if stop = 0 a null character will also terminate the process; otherwise addr is even, and two consecutive null bytes starting at an even address will terminate the process. The number of bytes read and stored, exclusive of terminating nulls, is returned.

⟨Subroutines 12⟩ +≡
int mmgetchars ARGS((char *, int, octa, int));
int mmgetchars(buf, size, addr, stop)
  char *buf;
  int size;
  octa addr;
  int stop;
{
  register char *p;
  register int m;
  register mem_tetra *ll;
  register tetra x;
  octa a;

  for (p = buf, m = 0, a = addr; m < size; ) {
    ll = mem_find(a); test_load_bkpt(ll);
    x = ll->tet;
    if ((a.l & 0x3) || m > size - 4) ⟨Read and store one byte; return if done 115⟩
    else ⟨Read and store up to four bytes; return if done 116⟩
  }
  return size;
}

115. ⟨Read and store one byte; return if done 115⟩ ≡
  {
    *p = (x >> (8 * ((~a.l) & 0x3))) & 0xff;
    if (!*p && stop >= 0) {
      if (stop == 0) return m;
      if ((a.l & 0x1) && *(p - 1) == '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }

This code is used in section 114.

116. ⟨Read and store up to four bytes; return if done 116⟩ ≡
  {
    *p = x >> 24;
    if (!*p && (stop == 0 || (stop > 0 && x < 0x10000))) return m;
    *(p + 1) = (x >> 16) & 0xff;
    if (!*(p + 1) && stop == 0) return m + 1;
    *(p + 2) = (x >> 8) & 0xff;
    if (!*(p + 2) && (stop == 0 || (stop > 0 && (x & 0xffff) == 0))) return m + 2;
    *(p + 3) = x & 0xff;
    if (!*(p + 3) && stop == 0) return m + 3;
    p += 4, m += 4, a = incr(a, 4);
  }

This code is used in section 114.
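The three stopping rules of mmgetchars can be tried out in isolation on an ordinary byte array. The following is only a sketch under simplifying assumptions: it reads from a flat array instead of the simulated memory, it proceeds one byte at a time without the four-byte fast path, and the names get_chars and mem are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for mmgetchars: copy up to size bytes from
   mem[addr], mem[addr+1], ... into buf.
   stop < 0:  no terminator;
   stop == 0: a null byte terminates;
   stop > 0:  two null bytes starting at an even address terminate.
   Returns the number of bytes stored, excluding terminating nulls. */
int get_chars(char *buf, int size, const unsigned char *mem, size_t addr, int stop)
{
    int m;
    for (m = 0; m < size; m++, addr++) {
        buf[m] = (char)mem[addr];
        if (buf[m] == '\0' && stop >= 0) {
            if (stop == 0) return m;
            /* wide mode: this null sits at an odd address and the byte
               before it, at the even address, was null as well */
            if ((addr & 1) && m > 0 && buf[m - 1] == '\0') return m - 1;
        }
    }
    return size;
}
```

With stop > 0 a single null byte inside a two-byte character does not end the string; only an aligned pair of nulls does.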

117. The subroutine mmputchars(buf, size, addr) puts size characters into the simulated memory starting at address addr.

⟨Subroutines 12⟩ +≡
int mmputchars ARGS((unsigned char *, int, octa));
int mmputchars(buf, size, addr)
  unsigned char *buf;
  int size;
  octa addr;
{
  register unsigned char *p;
  register int m;
  register mem_tetra *ll;
  register tetra x;
  octa a;

  for (p = buf, m = 0, a = addr; m < size; ) {
    ll = mem_find(a); test_store_bkpt(ll);
    if ((a.l & 0x3) || m > size - 4) ⟨Load and write one byte 118⟩
    else ⟨Load and write four bytes 119⟩;
  }
}

118. ⟨Load and write one byte 118⟩ ≡
  {
    register int s = 8 * ((~a.l) & 0x3);

    ll->tet ^= (((ll->tet >> s) ^ *p) & 0xff) << s;
    p++, m++, a = incr(a, 1);
  }

This code is used in section 117.

119. ⟨Load and write four bytes 119⟩ ≡
  {
    ll->tet = (*p << 24) + (*(p + 1) << 16) + (*(p + 2) << 8) + *(p + 3);
    p += 4, m += 4, a = incr(a, 4);
  }

This code is used in section 117.
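The single-byte store of section 118 replaces one byte of a big-endian tetrabyte without disturbing the other three, by XORing in the difference between the old byte and the new one. A minimal standalone sketch of that idiom follows; the function name store_byte and the use of uint32_t in place of tetra are assumptions made for illustration.

```c
#include <stdint.h>

/* Replace byte number pos (0 = most significant) of a big-endian
   32-bit word, using the shift-and-XOR idiom of section 118. */
uint32_t store_byte(uint32_t tet, unsigned pos, uint8_t byte)
{
    unsigned s = 8 * ((~pos) & 0x3);   /* shift count 24, 16, 8, 0 for pos 0..3 */
    tet ^= (((tet >> s) ^ byte) & 0xff) << s;
    return tet;
}
```

The expression (old ^ new) has ones exactly where the two bytes differ, so XORing it back into position flips only those bits.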

ARGS = macro ( ), §11. incr: octa ( ), MMIX-ARITH §6. l: tetra, §10. mem_find: mem_tetra *( ), §20. mem_tetra = struct, §16. octa = struct, §10. test_load_bkpt = macro ( ), §83. test_store_bkpt = macro ( ), §82. tet: tetra, §16. tetra = unsigned int, §10.


120. When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an fgets operation works much better than fread when we prompt for new input. But there is a slight complication, because fgets might read a null character before coming to a newline character. We cannot deduce the number of characters read by fgets simply by looking at strlen(stdin_buf).

⟨Subroutines 12⟩ +≡
char stdin_chr ARGS((void));
char stdin_chr()
{
  register char *p;
  while (stdin_buf_start == stdin_buf_end) {
    if (interacting) {
      printf("StdIn> "); fflush(stdout);
    }
    fgets(stdin_buf, 256, stdin);
    stdin_buf_start = stdin_buf;
    for (p = stdin_buf; p < stdin_buf + 254; p++)
      if (*p == '\n') break;
    stdin_buf_end = p + 1;
  }
  return *stdin_buf_start++;
}

121. ⟨Global variables 19⟩ +≡
char stdin_buf[256]; /* standard input to the simulated program */
char *stdin_buf_start; /* current position in that buffer */
char *stdin_buf_end; /* current end of that buffer */
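The point about strlen in section 120 can be seen with a small standalone sketch. The helper line_length is hypothetical; it mirrors the scan for a newline in stdin_chr, looking at no more than cap − 2 positions, and makes no claim about what fgets leaves in the rest of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters in buf up to and including the first newline,
   examining at most cap - 2 positions, as in the loop of section 120. */
size_t line_length(const char *buf, size_t cap)
{
    const char *p;
    for (p = buf; p < buf + cap - 2; p++)
        if (*p == '\n') break;
    return (size_t)(p + 1 - buf);
}
```

For a buffer holding the six bytes 'a', 'b', '\0', 'c', 'd', '\n', this scan reports 6 characters, whereas strlen stops at the embedded null and reports 2.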

122. Just after executing each instruction, we do the following. Underflow that is exact and not enabled is ignored. (This applies also to underflow that was triggered by RESUME_SET.)

⟨Check for trip interrupt 122⟩ ≡
  if ((exc & (U_BIT + X_BIT)) == U_BIT && !(g[rA].l & U_BIT)) exc &= ~U_BIT;
  if (exc) {
    if (exc & tracing_exceptions) tracing = true;
    j = exc & (g[rA].l | H_BIT); /* find all exceptions that have been enabled */
    if (j) ⟨Initiate a trip interrupt 123⟩;
    g[rA].l |= exc >> 8;
  }

This code is used in section 60.

123. ⟨Initiate a trip interrupt 123⟩ ≡
  {
    tripping = true;
    for (k = 0; !(j & H_BIT); j <<= 1, k++) ;
    exc &= ~(H_BIT >> k); /* trips taken are not logged as events */
    g[rW] = inst_ptr;
    inst_ptr.h = 0, inst_ptr.l = k << 4;
    g[rX].h = sign_bit, g[rX].l = inst;
    if ((op & 0xe0) == STB) g[rY] = w, g[rZ] = b;
    else g[rY] = y, g[rZ] = z;
    g[rB] = g[255];
    g[255] = zero_octa;
  }

This code is used in section 122.
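The loop in section 123 picks the leftmost enabled exception and turns its position into a handler address: bit H_BIT goes to location 0, the next bit to the right goes to location 16, and so on. Here is a standalone sketch of just that step. The value 1 << 16 for H_BIT is an assumption (the macro is defined in section 57, outside this part of the program), and trip_address is a hypothetical name.

```c
#include <stdint.h>

#define H_BIT (1u << 16)   /* assumed value of the trip bit; see section 57 */

/* Given a nonzero set j of enabled exception bits, none to the left of
   H_BIT, return the handler address k << 4 for the leftmost one and
   remove that exception from *exc. */
uint32_t trip_address(uint32_t j, uint32_t *exc)
{
    int k;
    for (k = 0; !(j & H_BIT); j <<= 1, k++)
        ;
    *exc &= ~(H_BIT >> k);
    return (uint32_t)k << 4;
}
```

The precondition matters: the simulator enters this code only when j is nonzero, and the sketch loops forever otherwise.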

124. We are finally ready for the last case.

⟨Cases for individual MMIX instructions 84⟩ +≡
case RESUME: if (xx || yy || zz) goto illegal_inst;
  inst_ptr = z = g[rW];
  b = g[rX];
  if (!(b.h & sign_bit)) ⟨Prepare to perform a ropcode 125⟩;
  break;

ARGS = macro ( ), §11. b: octa, §61. exc: int, §61. fflush: int ( ), <stdio.h>. fgets: char *( ), <stdio.h>. fread: size_t ( ), <stdio.h>. g: octa [ ], §76. h: tetra, §10. H_BIT = macro, §57. illegal_inst: label, §107. inst: tetra, §61. inst_ptr: octa, §61. interacting: bool, §61. j: register int, §62. k: register int, §62. l: tetra, §10. op: register mmix_opcode, §62. printf: int ( ), <stdio.h>. rA = 21, §55. rB = 0, §55. RESUME = #f9, §54. RESUME_SET = 2, §125. rW = 24, §55. rX = 25, §55. rY = 26, §55. rZ = 27, §55. sign_bit = macro, §15. STB = #a0, §54. stdin: FILE *, <stdio.h>. stdout: FILE *, <stdio.h>. strlen: int ( ), <string.h>. tracing: bool, §61. tracing_exceptions: int, §61. tripping: bool, §61. true = 1, §9. U_BIT = macro, §57. w: octa, §61. X_BIT = macro, §57. xx: register int, §62. y: octa, §61. yy: register int, §62. z: octa, §61. zero_octa: octa, MMIX-ARITH §4. zz: register int, §62.


125. Here we check to see if the ropcode restrictions hold. If so, the ropcode will actually be obeyed on the next fetch phase.

#define RESUME_AGAIN 0 /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1 /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2 /* set r[X] to rZ */

⟨Prepare to perform a ropcode 125⟩ ≡
  {
    rop = b.h >> 24; /* the ropcode is the leading byte of rX */
    switch (rop) {
    case RESUME_CONT: if ((1 << (b.l >> 28)) & 0x8f30) goto illegal_inst;
    case RESUME_SET: k = (b.l >> 16) & 0xff;
      if (k >= L && k < G) goto illegal_inst;
    case RESUME_AGAIN: if ((b.l >> 24) == RESUME) goto illegal_inst;
      break;
    default: goto illegal_inst;
    }
    resuming = true;
  }

This code is used in section 124.
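The test for RESUME_CONT in section 125 uses a 16-bit mask as a small set: bit n of 0x8f30 is 1 exactly when opcodes whose leading hexadecimal digit is n are forbidden. The set bits are 4, 5, 8, 9, 10, 11, and 15, which appear to correspond to the branch, load, store, and jump rows of the opcode chart. A hedged standalone sketch, with the hypothetical name cont_forbidden:

```c
#include <stdint.h>

/* Nonzero if the opcode in the top byte of insn lies in a row (leading
   hex digit) that the mask 0x8f30 forbids for RESUME_CONT:
   rows 4, 5, 8, 9, a, b, and f. */
int cont_forbidden(uint32_t insn)
{
    return ((1u << (insn >> 28)) & 0x8f30u) != 0;
}
```

A table lookup would do the same job; the mask simply packs the sixteen yes/no answers into one constant.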

126. ⟨Install special operands when resuming an interrupted operation 126⟩ ≡
  if (rop == RESUME_SET) {
    op = ORI;
    y = g[rZ];
    z = zero_octa;
    exc = g[rX].h & 0xff00;
    f = X_is_dest_bit;
  } else { /* RESUME_CONT */
    y = g[rY];
    z = g[rZ];
  }

This code is used in section 71.

127. We don't want to count the UNSAVE that bootstraps the whole process.

⟨Update the clocks 127⟩ ≡
  if (g[rU].l || g[rU].h || !resuming) {
    g[rC].h += info[op].mems; /* clock goes up by 2^32 for each μ */
    g[rC] = incr(g[rC], info[op].oops); /* clock goes up by 1 for each υ */
    g[rU] = incr(g[rU], 1); /* usage counter counts total instructions simulated */
    g[rI] = incr(g[rI], -1); /* interval timer counts down by 1 only */
    if (g[rI].l == 0 && g[rI].h == 0) tracing = breakpoint = true;
  }

This code is used in section 60.
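The clock register rC thus packs two counts into one octabyte: mems in the high tetrabyte and oops in the low one. A minimal sketch of the same bookkeeping, assuming rC is viewed as a single 64-bit integer instead of an octa struct (the function name charge is hypothetical):

```c
#include <stdint.h>

/* Hypothetical 64-bit view of rC: mems accumulate in the high half,
   oops in the low half. */
uint64_t charge(uint64_t rC, unsigned mems, unsigned oops)
{
    rC += (uint64_t)mems << 32;   /* each mem adds 2^32 */
    rC += oops;                   /* each oop adds 1 */
    return rC;
}
```

The two counts can be read back as rC >> 32 and rC & 0xffffffff, which is what show_stats does with g[rC].h and g[rC].l.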


397 MMIX-SIM: TRACING

128. Tracing. After an instruction has been executed, we often want to display its effect. This part of the program prints out a symbolic interpretation of what has just happened.

⟨Trace the current instruction, if requested 128⟩ ≡
  if (tracing) {
    if (showing_source && cur_line) show_line();
    ⟨Print the frequency count, the location, and the instruction 130⟩;
    ⟨Print a stream-of-consciousness description of the instruction 131⟩;
    if (showing_stats || breakpoint) show_stats(breakpoint);
    just_traced = true;
  } else if (just_traced) {
    printf(" ...............................................\n");
    just_traced = false;
    shown_line = -gap - 1; /* gap will not be filled */
  }

This code is used in section 60.

129. ⟨Global variables 19⟩ +≡
bool showing_stats; /* should traced instructions also show the statistics? */
bool just_traced; /* was the previous instruction traced? */

b: octa, §61. bool = enum, §9. breakpoint: bool, §61. cur_line: int, §31. exc: int, §61. f: register tetra, §62. false = 0, §9. G: register int, §75. g: octa [ ], §76. gap: int, §48. h: tetra, §10. illegal_inst: label, §107. incr: octa ( ), MMIX-ARITH §6. info: op_info [ ], §65. k: register int, §62. L: register int, §75. l: tetra, §10. mems: unsigned char, §64. oops: unsigned char, §64. op: register mmix_opcode, §62. ORI = #c1, §54. printf: int ( ), <stdio.h>. rC = 8, §55. RESUME = #f9, §54. resuming: bool, §61. rI = 12, §55. rop: int, §61. rU = 17, §55. rX = 25, §55. rY = 26, §55. rZ = 27, §55. show_line: void ( ), §47. show_stats: void ( ), §140. showing_source: bool, §48. shown_line: int, §48. tracing: bool, §61. true = 1, §9. UNSAVE = #fb, §54. X_is_dest_bit = #20, §65. y: octa, §61. z: octa, §61. zero_octa: octa, MMIX-ARITH §4.


130. ⟨Print the frequency count, the location, and the instruction 130⟩ ≡
  if (resuming && op != RESUME) {
    switch (rop) {
    case RESUME_AGAIN:
      printf(" (%08x%08x: %08x (%s)) ", loc.h, loc.l, inst, info[op].name);
      break;
    case RESUME_CONT:
      printf(" (%08x%08x: %04xrYrZ (%s)) ", loc.h, loc.l, inst >> 16, info[op].name);
      break;
    case RESUME_SET:
      printf(" (%08x%08x: ..%02x..rZ (SET)) ", loc.h, loc.l, (inst >> 16) & 0xff);
      break;
    }
  } else {
    ll = mem_find(loc);
    printf("%10d. %08x%08x: %08x (%s) ", ll->freq, loc.h, loc.l, inst, info[op].name);
  }

This code is used in section 128.

131. This part of the simulator was inspired by ideas of E. H. Satterthwaite, Software: Practice and Experience 2 (1972), 197–217. Online debugging tools have improved significantly since Satterthwaite published his work, but good offline tools are still valuable; alas, today's algebraic programming languages do not provide tracing facilities that come anywhere close to the level of quality that Satterthwaite was able to demonstrate for ALGOL in 1970.

⟨Print a stream-of-consciousness description of the instruction 131⟩ ≡
  if (lhs[0] == '!') printf("%s instruction!\n", lhs + 1); /* privileged or illegal */
  else {
    ⟨Print changes to rL 132⟩;
    if (z.l == 0 && (op == ADDUI || op == ORI)) p = "%l = %y = %#x"; /* LDA, SET */
    else p = info[op].trace_format;
    for ( ; *p; p++) ⟨Interpret character *p in the trace format 133⟩;
    if (exc) printf(", rA=#%05x", g[rA].l);
    if (tripping) tripping = false, printf(", -> #%02x", inst_ptr.l);
    printf("\n");
  }

This code is used in section 128.

132. Push, pop, and UNSAVE instructions display changes to rL and rO explicitly; otherwise the change is implicit, if L ≠ old_L.

⟨Print changes to rL 132⟩ ≡
  if (L != old_L && !(f & push_pop_bit)) printf("rL=%d, ", L);

This code is used in section 131.


ADDUI = #23, §54. exc: int, §61. f: register tetra, §62. false = 0, §9. freq: tetra, §16. g: octa [ ], §76. h: tetra, §10. info: op_info [ ], §65. inst: tetra, §61. inst_ptr: octa, §61. l: tetra, §10. L: register int, §75. lhs: char [ ], §139. ll: register mem_tetra *, §62. loc: octa, §61. mem_find: mem_tetra *( ), §20. name: char *, §64. old_L: int, §61. op: register mmix_opcode, §62. ORI = #c1, §54. p: register char *, §62. printf: int ( ), <stdio.h>. push_pop_bit = #80, §65. rA = 21, §55. RESUME = #f9, §54. RESUME_AGAIN = 0, §125. RESUME_CONT = 1, §125. RESUME_SET = 2, §125. resuming: bool, §61. rop: int, §61. trace_format: char *, §64. tripping: bool, §61. z: octa, §61.


133. Each MMIX instruction has a trace format string, which defines its symbolic representation. For example, the string for ADD is "%l = %y + %z = %x"; if the instruction is, say, ADD $1,$2,$3 with $2 = 5 and $3 = 8, and if the stack offset is 100, the trace output will be "$1=l[101] = 5 + 8 = 13".

Percent signs (%) induce special format conventions, as follows:

• %a, %b, %p, %q, %w, %x, %y, and %z stand for the numeric contents of octabytes a, b, ma, mb, w, x, y, and z, respectively; a "style" character may follow the percent sign in this case, as explained below.

• %( and %) are brackets that indicate the mode of floating point rounding. If round_mode = ROUND_NEAR, ROUND_OFF, ROUND_UP, ROUND_DOWN, the corresponding brackets are ( and ), [ and ], ^ and ^, _ and _. Such brackets are placed around a floating point operator; for example, floating point addition is denoted by `[+]' when the current rounding mode is rounding-off.

• %l stands for the string lhs, which usually represents the "left hand side" of the instruction just performed, formatted as a register number and its equivalent in the ring of local registers (e.g., `$1=l[101]') or as a register number and its equivalent in the array of global registers (e.g., `$255=g[255]'). The POP instruction uses lhs to indicate how the "hole" in the register stack was plugged.

• %r means to switch to string rhs and continue formatting from there. This mechanism allows us to use variable formats for opcodes like TRAP that have several variants.

• %t means to print either `Yes, ->loc' (where loc is the location of the next instruction) or `No', depending on the value of x.

• %g means to print ` (bad guess)' if good is false.

• %s stands for the name of special register g[zz].

• %? stands for omission of the following operator if z = 0. For example, the memory address of LDBI is described by `%#y%?+'; this means to treat the address as simply `%#y' if z = 0, otherwise as `%#y+%z'. This case is used only when z is a relatively small number (z.h = 0).

⟨Interpret character *p in the trace format 133⟩ ≡
  {
    if (*p != '%') fputc(*p, stdout);
    else {
      style = decimal;
    char_switch: switch (*++p) {
      ⟨Cases for formatting characters 134⟩;
      default: printf("BUG!!"); /* can't happen */
      }
    }
  }

This code is used in section 131.


134. Octabytes are printed as decimal numbers unless a "style" character intervenes between the percent sign and the name of the octabyte: `#' denotes hexadecimal notation, prefixed by #; `0' denotes hexadecimal notation with no prefixed # and with leading zeros not suppressed; `.' denotes floating decimal notation; and `!' means to use the names StdIn, StdOut, or StdErr if the value is 0, 1, or 2.

⟨Cases for formatting characters 134⟩ ≡
case '#': style = hex; goto char_switch;
case '0': style = zhex; goto char_switch;
case '.': style = floating; goto char_switch;
case '!': style = handle; goto char_switch;

See also sections 136 and 138.

This code is used in section 133.
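To make the conventions of sections 133 and 134 concrete, here is a much reduced sketch of a trace-format interpreter. It is an illustration under stated assumptions, not the simulator's code: it handles only %x, %y, and %z with the optional `#' style, prints unsigned values through snprintf instead of print_int and print_hex, and writes into a caller-supplied buffer; the name expand_format is hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Expand a tiny subset of the trace-format language into out (capacity n):
   %x, %y, %z print an operand in decimal; a '#' right after the percent
   sign switches to hexadecimal with a '#' prefix. Other characters are
   copied unchanged. */
void expand_format(const char *fmt, uint64_t x, uint64_t y, uint64_t z,
                   char *out, size_t n)
{
    size_t used = 0;
    if (n == 0) return;
    out[0] = '\0';
    for (; *fmt && used + 1 < n; fmt++) {
        if (*fmt != '%') {
            out[used++] = *fmt;
            out[used] = '\0';
            continue;
        }
        int hex = 0;
        if (fmt[1] == '#') { hex = 1; fmt++; }   /* style character */
        uint64_t v;
        switch (*++fmt) {
        case 'x': v = x; break;
        case 'y': v = y; break;
        case 'z': v = z; break;
        default: return;          /* unknown directive: give up, in this sketch */
        }
        int w = snprintf(out + used, n - used, hex ? "#%llx" : "%llu",
                         (unsigned long long)v);
        if (w < 0 || (size_t)w >= n - used) return;
        used += (size_t)w;
    }
}
```

Expanding "%y + %z = %x" with x = 13, y = 5, z = 8 gives the right-hand part of the ADD example in section 133.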

135. ⟨Type declarations 9⟩ +≡
typedef enum {
  decimal, hex, zhex, floating, handle
} fmt_style;

136. ⟨Cases for formatting characters 134⟩ +≡
case 'a': trace_print(a); break;
case 'b': trace_print(b); break;
case 'p': trace_print(ma); break;
case 'q': trace_print(mb); break;
case 'w': trace_print(w); break;
case 'x': trace_print(x); break;
case 'y': trace_print(y); break;
case 'z': trace_print(z); break;

a: octa, §61. b: octa, §61. false = 0, §9. fputc: int ( ), <stdio.h>. g: octa [ ], §76. good: bool, §61. h: tetra, §10. lhs: char [ ], §139. ma: octa, §61. mb: octa, §61. p: register char *, §62. printf: int ( ), <stdio.h>. rhs = macro, §139. ROUND_DOWN = 3, §100. round_mode: int, §61. ROUND_NEAR = 4, §100. ROUND_OFF = 1, §100. ROUND_UP = 2, §100. stdout: FILE *, <stdio.h>. style: fmt_style, §137. trace_print: void ( ), §137. w: octa, §61. x: octa, §61. y: octa, §61. z: octa, §61. zz: register int, §62.


137. ⟨Subroutines 12⟩ +≡
fmt_style style;
char *stream_name[] = {"StdIn", "StdOut", "StdErr"};
void trace_print ARGS((octa));
void trace_print(o)
  octa o;
{
  switch (style) {
  case decimal: print_int(o); return;
  case hex: fputc('#', stdout); print_hex(o); return;
  case zhex: printf("%08x%08x", o.h, o.l); return;
  case floating: print_float(o); return;
  case handle: if (o.h == 0 && o.l < 3) printf(stream_name[o.l]);
    else print_int(o); return;
  }
}

138. ⟨Cases for formatting characters 134⟩ +≡
case '(': fputc(left_paren[round_mode], stdout); break;
case ')': fputc(right_paren[round_mode], stdout); break;
case 't': if (x.l) printf(" Yes, -> #"), print_hex(inst_ptr);
  else printf(" No"); break;
case 'g': if (!good) printf(" (bad guess)"); break;
case 's': printf(special_name[zz]); break;
case '?': p++; if (z.l) printf("%c%d", *p, z.l); break;
case 'l': printf(lhs); break;
case 'r': p = switchable_string; break;

139. #define rhs &switchable_string[1]

⟨Global variables 19⟩ +≡
char left_paren[] = {0, '[', '^', '_', '('}; /* denotes the rounding mode */
char right_paren[] = {0, ']', '^', '_', ')'}; /* denotes the rounding mode */
char switchable_string[48]; /* holds rhs; position 0 is ignored */
  /* switchable_string must be able to hold any trap_format */
char lhs[32];
int good_guesses, bad_guesses; /* branch prediction statistics */


140. ⟨Subroutines 12⟩ +≡
void show_stats ARGS((bool));
void show_stats(verbose)
  bool verbose;
{
  octa o;

  printf(" %d instruction%s, %d mem%s, %d oop%s; %d good guess%s, %d bad\n",
    g[rU].l, g[rU].l == 1 ? "" : "s",
    g[rC].h, g[rC].h == 1 ? "" : "s",
    g[rC].l, g[rC].l == 1 ? "" : "s",
    good_guesses, good_guesses == 1 ? "" : "es", bad_guesses);
  if (!verbose) return;
  o = halted ? incr(inst_ptr, -4) : inst_ptr;
  printf(" (%s at location #%08x%08x)\n", halted ? "halted" : "now", o.h, o.l);
}

ARGS = macro ( ), §11. bool = enum, §9. decimal = 0, §135. floating = 3, §135. fmt_style = enum, §135. fputc: int ( ), <stdio.h>. g: octa [ ], §76. good: bool, §61. h: tetra, §10. halted: bool, §61. handle = 4, §135. hex = 1, §135. incr: octa ( ), MMIX-ARITH §6. inst_ptr: octa, §61. l: tetra, §10. octa = struct, §10. p: register char *, §62. print_float: void ( ), MMIX-ARITH §54. print_hex: void ( ), §12. print_int: void ( ), §15. printf: int ( ), <stdio.h>. rC = 8, §55. round_mode: int, §61. rU = 17, §55. special_name: char *[ ], §56. stdout: FILE *, <stdio.h>. trap_format: char *[ ], §110. x: octa, §61. z: octa, §61. zhex = 2, §135. zz: register int, §62.


MMIX-SIM: RUNNING THE PROGRAM 404

141. Running the program. Now we are ready to fit the pieces together into a working simulator.

#include <stdio.h>

#include <stdlib.h>

#include <ctype.h>

#include <string.h>

#include <signal.h>

#include "abstime.h"

⟨Preprocessor macros 11⟩
⟨Type declarations 9⟩
⟨Global variables 19⟩
⟨Subroutines 12⟩

main(argc, argv)
  int argc;
  char *argv[];
{
  ⟨Local registers 62⟩;
  mmix_io_init();
  ⟨Process the command line 142⟩;
  ⟨Initialize everything 14⟩;
  ⟨Load the command line arguments 163⟩;
  ⟨Get ready to UNSAVE the initial context 164⟩;
  while (1) {
    if (interrupt && !breakpoint) breakpoint = interacting = true, interrupt = false;
    else {
      breakpoint = false;
      if (interacting) ⟨Interact with the user 149⟩;
    }
    if (halted) break;
    do ⟨Perform one instruction 60⟩ while ((!interrupt && !breakpoint) || resuming);
    if (interact_after_break) interacting = true, interact_after_break = false;
  }
end_simulation: if (profiling) ⟨Print all the frequency counts 53⟩;
  if (interacting || profiling || showing_stats) show_stats(true);
}

142. Here we process the command-line options; when we finish, *cur_arg should be the name of the object file to be loaded and simulated.

#define mmo_file_name *cur_arg

⟨Process the command line 142⟩ ≡
  myself = argv[0];
  for (cur_arg = argv + 1; *cur_arg && (*cur_arg)[0] == '-'; cur_arg++)
    scan_option(*cur_arg + 1, true);
  if (!*cur_arg) scan_option("?", true); /* exit with usage note */
  argc -= cur_arg - argv; /* this is the argc of the user program */

This code is used in section 141.


breakpoint: bool, §61. cur_arg: char **, §144. false = 0, §9. halted: bool, §61. interact_after_break: bool, §61. interacting: bool, §61. interrupt: bool, §144. mmix_io_init: void ( ), MMIX-IO §7. myself: char *, §144. profiling: bool, §144. resuming: bool, §61. scan_option: void ( ), §143. show_stats: void ( ), §140. showing_stats: bool, §129. true = 1, §9.


143. Careful readers of the following subroutine will notice a little white bug: A tracing specification like t1000000000 or even t0000000000 or even t!!!!!!!!!! is silently converted to t4294967295.

The -b and -c options are effective only on the command line, but they are harmless while interacting.

⟨Subroutines 12⟩ +≡
void scan_option ARGS((char *, bool));
void scan_option(arg, usage)
  char *arg; /* command-line argument (without the `-') */
  bool usage; /* should we exit with usage note if unrecognized? */
{
  register int k;

  switch (*arg) {
  case 't': if (strlen(arg) > 10) trace_threshold = 0xffffffff;
    else if (sscanf(arg + 1, "%d", &trace_threshold) != 1) trace_threshold = 0;
    return;
  case 'e': if (!*(arg + 1)) tracing_exceptions = 0xff;
    else if (sscanf(arg + 1, "%x", &tracing_exceptions) != 1) tracing_exceptions = 0;
    return;
  case 'r': stack_tracing = true; return;
  case 's': showing_stats = true; return;
  case 'l': if (!*(arg + 1)) gap = 3;
    else if (sscanf(arg + 1, "%d", &gap) != 1) gap = 0;
    showing_source = true; return;
  case 'L': if (!*(arg + 1)) profile_gap = 3;
    else if (sscanf(arg + 1, "%d", &profile_gap) != 1) profile_gap = 0;
    profile_showing_source = true;
  case 'P': profiling = true; return;
  case 'v': trace_threshold = 0xffffffff; tracing_exceptions = 0xff;
    stack_tracing = true; showing_stats = true;
    gap = 10; showing_source = true;
    profile_gap = 10; profile_showing_source = true; profiling = true;
    return;
  case 'q': trace_threshold = tracing_exceptions = 0;
    stack_tracing = showing_stats = showing_source = false;
    profiling = profile_showing_source = false;
    return;
  case 'i': interacting = true; return;
  case 'I': interact_after_break = true; return;
  case 'b': if (sscanf(arg + 1, "%d", &buf_size) != 1) buf_size = 0; return;
  case 'c': if (sscanf(arg + 1, "%d", &lring_size) != 1) lring_size = 0; return;
  case 'f': ⟨Open a file for simulated standard input 145⟩; return;
  case 'D': ⟨Open a file for dumping binary output 146⟩; return;
  default: if (usage) {
      fprintf(stderr, "Usage: %s <options> progfile command-line-args...\n", myself);
      for (k = 0; usage_help[k][0]; k++) fprintf(stderr, usage_help[k]);
      exit(-1);
    } else for (k = 0; usage_help[k][1] != 'b'; k++) printf(usage_help[k]);
    return;
  }
}
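The "little white bug" of section 143 comes from testing the length of the argument before looking at its contents. A standalone sketch of the same logic shows the effect; the function name parse_trace_threshold is hypothetical, and the sketch reads with "%u" into an unsigned variable where the program uses "%d".

```c
#include <stdio.h>
#include <string.h>

/* Parse the argument of a -t option (arg points at the 't'), following
   the logic of case 't' in scan_option, white bug included. */
unsigned parse_trace_threshold(const char *arg)
{
    unsigned threshold;
    if (strlen(arg) > 10) return 0xffffffffu;   /* ten or more characters after 't' */
    if (sscanf(arg + 1, "%u", &threshold) != 1) return 0;
    return threshold;
}
```

Any argument with ten or more characters after the t, digits or not, yields 4294967295; shorter arguments are converted normally, and unreadable ones give 0.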

ARGS = macro ( ), §11. bool = enum, §9. buf_size: int, §40. exit: void ( ), <stdlib.h>. false = 0, §9. fprintf: int ( ), <stdio.h>. gap: int, §48. interact_after_break: bool, §61. interacting: bool, §61. lring_size: int, §76. myself: char *, §144. printf: int ( ), <stdio.h>. profile_gap: int, §48. profile_showing_source: bool, §48. profiling: bool, §144. showing_source: bool, §48. showing_stats: bool, §129. sscanf: int ( ), <stdio.h>. stack_tracing: bool, §61. stderr: FILE *, <stdio.h>. strlen: int ( ), <string.h>. trace_threshold: tetra, §61. tracing_exceptions: int, §61. true = 1, §9. usage_help: char *[ ], §144.


144. ⟨Global variables 19⟩ +≡
char *myself; /* argv[0], the name of this simulator */
char **cur_arg; /* pointer to current place in the argument vector */
bool interrupt; /* has the user interrupted the simulation recently? */
bool profiling; /* should we print the profile at the end? */
FILE *fake_stdin; /* file substituted for the simulated StdIn */
FILE *dump_file; /* file used for binary dumps */
char *usage_help[] = {
  " with these options: (<n>=decimal number, <x>=hex number)\n",
  "-t<n> trace each instruction the first n times\n",
  "-e<x> trace each instruction with an exception matching x\n",
  "-r trace hidden details of the register stack\n",
  "-l<n> list source lines when tracing, filling gaps <= n\n",
  "-s show statistics after each traced instruction\n",
  "-P print a profile when simulation ends\n",
  "-L<n> list source lines with the profile\n",
  "-v be verbose: show almost everything\n",
  "-q be quiet: show only the simulated standard output\n",
  "-i run interactively (prompt for online commands)\n",
  "-I interact, but only after the program halts\n",
  "-b<n> change the buffer size for source lines\n",
  "-c<n> change the cyclic local register ring size\n",
  "-f<filename> use given file to simulate standard input\n",
  "-D<filename> dump a file for use by other simulators\n",
  ""};
char *interactive_help[] = {
  "The interactive commands are:\n",
  "<return> trace one instruction\n",
  "n trace one instruction\n",
  "c continue until halt or breakpoint\n",
  "q quit the simulation\n",
  "s show current statistics\n",
  "l<n><t> set and/or show local register in format t\n",
  "g<n><t> set and/or show global register in format t\n",
  "$<n><t> set and/or show dynamic register in format t\n",
  "M<x><t> set and/or show memory octabyte in format t\n",
  "+<n><t> set and/or show n additional octabytes in format t\n",
  " <t> is ! (decimal) or . (floating) or # (hex) or \" (string)\n",
  " or <empty> (previous <t>) or =<value> (change value)\n",
  "@<x> go to location x\n",
  "b[rwx]<x> set or reset breakpoint at location x\n",
  "t<x> trace location x\n",
  "u<x> untrace location x\n",
  "T set current segment to Text_Segment\n",
  "D set current segment to Data_Segment\n",
  "P set current segment to Pool_Segment\n",
  "S set current segment to Stack_Segment\n",
  "B show all current breakpoints and tracepoints\n",
  "i<file> insert commands from file\n",
  "-<option> change a tracing/listing/profile option\n",
  "-? show the tracing/listing/profile options \n",
  ""};

145. ⟨Open a file for simulated standard input 145⟩ ≡
  if (fake_stdin) fclose(fake_stdin); /* close an earlier file, if any; fclose(NULL) is undefined */
  fake_stdin = fopen(arg + 1, "r");
  if (!fake_stdin) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);
  else mmix_fake_stdin(fake_stdin);

This code is used in section 143.

146. ⟨Open a file for dumping binary output 146⟩ ≡
  dump_file = fopen(arg + 1, "wb");
  if (!dump_file) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);

This code is used in section 143.

147. ⟨Initialize everything 14⟩ +≡
  signal(SIGINT, catchint); /* now catchint will catch the first interrupt */

148. ⟨Subroutines 12⟩ +≡
  void catchint ARGS((int));
  void catchint(n)
    int n;
  {
    interrupt = true;
    signal(SIGINT, catchint); /* now catchint will catch the next interrupt */
  }

arg: char *, §143. ARGS = macro ( ), §11. argv: char *[ ], §141. bool = enum, §9. fclose: int ( ), <stdio.h>. FILE, <stdio.h>. fopen: FILE *( ), <stdio.h>. fprintf: int ( ), <stdio.h>. mmix_fake_stdin: void ( ), MMIX-IO §10. SIGINT = macro, <signal.h>. signal: void (*( ))( ), <signal.h>. stderr: FILE *, <stdio.h>. true = 1, §9.


149. ⟨Interact with the user 149⟩ ≡
  { register int repeating;
  interact: ⟨Put a new command in command_buf 150⟩;
    p = command_buf;
    repeating = 0;
    switch (*p) {
    case '\n': case 'n': breakpoint = tracing = true; /* trace one inst and break */
    case 'c': goto resume_simulation; /* continue until breakpoint */
    case 'q': goto end_simulation;
    case 's': show_stats(true); goto interact;
    case '-': k = strlen(p); if (p[k - 1] == '\n') p[k - 1] = '\0';
      scan_option(p + 1, false); goto interact;
    ⟨Cases that change cur_disp_mode 152⟩;
    ⟨Cases that define cur_disp_type 153⟩;
    ⟨Cases that set and clear tracing and breakpoints 161⟩;
    default: what_say: k = strlen(command_buf);
      if (k < 10 && command_buf[k - 1] == '\n') command_buf[k - 1] = '\0';
      else strcpy(command_buf + 9, "...");
      printf("Eh? Sorry, I don't understand `%s'. (Type h for help)\n", command_buf);
      goto interact;
    case 'h': for (k = 0; interactive_help[k][0]; k++) printf(interactive_help[k]);
      goto interact;
    }
  check_syntax: if (*p != '\n') {
      if (!*p)
  incomplete_str: printf("Syntax error: Incomplete command!\n");
      else {
        p[strlen(p) - 1] = '\0';
        printf("Syntax error; I'm ignoring `%s'!\n", p);
      }
    }
    while (repeating) ⟨Display and/or set the value of the current octabyte 156⟩;
    goto interact;
  resume_simulation: ;
  }

This code is used in section 141.

150. ⟨Put a new command in command_buf 150⟩ ≡
  { register bool ready = false;
  incl_read: while (incl_file && !ready)
      if (!fgets(command_buf, command_buf_size, incl_file)) {
        fclose(incl_file);
        incl_file = NULL;
      } else if (command_buf[0] != '\n' && command_buf[0] != 'i' && command_buf[0] != '%')
        if (command_buf[0] == ' ') printf("%s", command_buf); /* echo a comment line literally */
        else ready = true;
    while (!ready) {
      printf("mmix> "); fflush(stdout);
      if (!fgets(command_buf, command_buf_size, stdin)) command_buf[0] = 'q';
      if (command_buf[0] != 'i') ready = true;
      else {
        command_buf[strlen(command_buf) - 1] = '\0';
        incl_file = fopen(command_buf + 1, "r");
        if (incl_file) goto incl_read;
        if (isspace(command_buf[1])) incl_file = fopen(command_buf + 2, "r");
        if (incl_file) goto incl_read;
        printf("Can't open file `%s'!\n", command_buf + 1);
      }
    }
  }

This code is used in section 149.

151. #define command_buf_size 1024 /* make it plenty long, for floating point tests */

⟨Global variables 19⟩ +≡
  char command_buf[command_buf_size];
  FILE *incl_file; /* file of commands included by `i' */
  char cur_disp_mode = 'l'; /* 'l' or 'g' or '$' or 'M' */
  char cur_disp_type = '!'; /* '!' or '.' or '#' or '"' */
  bool cur_disp_set; /* was the last <t> of the form =<val>? */
  octa cur_disp_addr; /* the h half is relevant only in mode 'M' */
  octa cur_seg; /* current segment offset */
  char spec_reg_code[] = {rA, rB, rC, rD, rE, rF, rG, rH, rI, rJ, rK, rL, rM,
    rN, rO, rP, rQ, rR, rS, rT, rU, rV, rW, rX, rY, rZ};
  char spec_regg_code[] = {0, rBB, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, rTT, 0, 0, rWW, rXX, rYY, rZZ};

bool = enum, §9. breakpoint: bool, §61. end_simulation: label, §141. false = 0, §9. fclose: int ( ), <stdio.h>. fflush: int ( ), <stdio.h>. fgets: char *( ), <stdio.h>. FILE, <stdio.h>. fopen: FILE *( ), <stdio.h>. h: tetra, §10. interactive_help: char *[ ], §144. isspace: int ( ), <ctype.h>. k: register int, §62. octa = struct, §10. p: register char *, §62. printf: int ( ), <stdio.h>. rA = 21, §55. rB = 0, §55. rBB = 7, §55. rC = 8, §55. rD = 1, §55. rE = 2, §55. rF = 22, §55. rG = 19, §55. rH = 3, §55. rI = 12, §55. rJ = 4, §55. rK = 15, §55. rL = 20, §55. rM = 5, §55. rN = 9, §55. rO = 10, §55. rP = 23, §55. rQ = 16, §55. rR = 6, §55. rS = 11, §55. rT = 13, §55. rTT = 14, §55. rU = 17, §55. rV = 18, §55. rW = 24, §55. rWW = 28, §55. rX = 25, §55. rXX = 29, §55. rY = 26, §55. rYY = 30, §55. rZ = 27, §55. rZZ = 31, §55. scan_option: void ( ), §143. show_stats: void ( ), §140. stdin: FILE *, <stdio.h>. stdout: FILE *, <stdio.h>. strcpy: char *( ), <string.h>. strlen: int ( ), <string.h>. tracing: bool, §61. true = 1, §9.


152. ⟨Cases that change cur_disp_mode 152⟩ ≡
  case 'l': case 'g': case '$': cur_disp_mode = *p++;
    for (cur_disp_addr.l = 0; isdigit(*p); p++)
      cur_disp_addr.l = 10 * cur_disp_addr.l + *p - '0';
    goto new_mode;
  case 'r': p++; cur_disp_mode = 'g';
    if (*p < 'A' || *p > 'Z') goto what_say;
    if (*(p + 1) != *p) cur_disp_addr.l = spec_reg_code[*p - 'A'], p++;
    else if (spec_regg_code[*p - 'A']) cur_disp_addr.l = spec_regg_code[*p - 'A'], p += 2;
    else goto what_say;
    goto new_mode;
  case 'M': cur_disp_mode = *p;
    cur_disp_addr = scan_hex(p + 1, cur_seg); cur_disp_addr.l &= -8; p = next_char;
  new_mode: cur_disp_set = false; /* the `=' is remembered only by `+' */
    repeating = 1;
    goto scan_type;
  case '+': if (!isdigit(*(p + 1))) repeating = 1;
    for (p++; isdigit(*p); p++) repeating = 10 * repeating + *p - '0';
    if (repeating) {
      if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
      else cur_disp_addr.l++;
    }
    goto scan_type;

This code is used in section 149.

153. ⟨Cases that define cur_disp_type 153⟩ ≡
  case '!': case '.': case '#': case '"': cur_disp_set = false;
    repeating = 1;
  set_type: cur_disp_type = *p++; break;
  scan_type: if (*p == '!' || *p == '.' || *p == '#' || *p == '"') goto set_type;
    if (*p != '=') break;
    goto scan_eql;
  case '=': repeating = 1;
  scan_eql: cur_disp_set = true;
    val = zero_octa;
    if (*++p == '#') cur_disp_type = *p, val = scan_hex(p + 1, zero_octa);
    else if (*p == '"' || *p == '\'') goto scan_string;
    else cur_disp_type = (scan_const(p) > 0 ? '.' : '!');
    p = next_char;
    if (*p != ',') break;
    val.h = 0; val.l &= 0xff;
  scan_string: cur_disp_type = '"';
    ⟨Scan a string constant 155⟩; break;

This code is used in section 149.

154. ⟨Subroutines 12⟩ +≡
  octa scan_hex ARGS((char *, octa));
  octa scan_hex(s, offset)
    char *s;
    octa offset;
  {
    register char *p;
    octa o;
    o = zero_octa;
    for (p = s; isxdigit(*p); p++) {
      o = incr(shift_left(o, 4), *p - '0');
      if (*p >= 'a') o = incr(o, '0' - 'a' + 10);
      else if (*p >= 'A') o = incr(o, '0' - 'A' + 10);
    }
    next_char = p;
    return oplus(o, offset);
  }

155. ⟨Scan a string constant 155⟩ ≡
  while (*p == ',') {
    if (*++p == '#') {
      aux = scan_hex(p + 1, zero_octa); p = next_char;
      val = incr(shift_left(val, 8), aux.l & 0xff);
    } else if (isdigit(*p)) {
      for (k = *p++ - '0'; isdigit(*p); p++) k = (10 * k + *p - '0') & 0xff;
      val = incr(shift_left(val, 8), k);
    }
    else if (*p == '\n') goto incomplete_str;
  }
  if (*p == '\'' && *(p + 2) == *p) *p = *(p + 2) = '"';
  if (*p == '"') {
    for (p++; *p && *p != '\n' && *p != '"'; p++) val = incr(shift_left(val, 8), *p);
    if (*p && *p++ == '"')
      if (*p == ',') goto scan_string;
  }

This code is used in section 153.
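The hex scanner of section 154 relies on the octa helpers incr, shift_left, and oplus from MMIX-ARITH. As a rough illustration only, the same idea can be sketched with a native 64-bit integer; the name scan_hex64 and the end parameter (standing in for the global next_char) are assumptions made for this sketch and are not part of MMIXware.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for scan_hex: parse the hex digits at s, add
   offset (modulo 2^64), and report where scanning stopped via *end. */
uint64_t scan_hex64(const char *s, uint64_t offset, const char **end)
{
    uint64_t o = 0;
    const char *p;
    for (p = s; isxdigit((unsigned char)*p); p++) {
        unsigned d;
        if (*p >= 'a') d = (unsigned)(*p - 'a' + 10);
        else if (*p >= 'A') d = (unsigned)(*p - 'A' + 10);
        else d = (unsigned)(*p - '0');
        o = (o << 4) + d;  /* bits shifted out at the top are discarded */
    }
    if (end) *end = p;     /* plays the role of next_char */
    return o + offset;     /* unsigned addition wraps modulo 2^64 */
}
```

Scanning stops at the first character that is not a hex digit, so an empty digit string simply yields the offset, which matches the way the interactive commands treat a bare segment-relative address.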

ARGS = macro ( ), §11. aux: octa, MMIX-ARITH §4. cur_disp_addr: octa, §151. cur_disp_mode: char, §151. cur_disp_set: bool, §151. cur_disp_type: char, §151. cur_seg: octa, §151. false = 0, §9. h: tetra, §10. incomplete_str: label, §149. incr: octa ( ), MMIX-ARITH §6. isdigit: int ( ), <ctype.h>. isxdigit: int ( ), <ctype.h>. k: register int, §62. l: tetra, §10. next_char: char *, MMIX-ARITH §69. octa = struct, §10. oplus: octa ( ), MMIX-ARITH §5. p: register char *, §62. repeating: register int, §149. scan_const: int ( ), MMIX-ARITH §68. shift_left: octa ( ), MMIX-ARITH §7. spec_reg_code: char [ ], §151. spec_regg_code: char [ ], §151. true = 1, §9. val: octa, MMIX-ARITH §69. what_say: label, §149. zero_octa: octa, MMIX-ARITH §4.


156. ⟨Display and/or set the value of the current octabyte 156⟩ ≡
  {
    if (cur_disp_set) ⟨Set the current octabyte to val 157⟩;
    ⟨Display the current octabyte 159⟩;
    fputc('\n', stdout);
    repeating--;
    if (!repeating) break;
    if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
    else cur_disp_addr.l++;
  }

This code is used in section 149.

157. ⟨Set the current octabyte to val 157⟩ ≡
  switch (cur_disp_mode) {
  case 'l': l[cur_disp_addr.l & lring_mask] = val; break;
  case '$': k = cur_disp_addr.l & 0xff;
    if (k < L) l[(O + k) & lring_mask] = val; else if (k >= G) g[k] = val;
    break;
  case 'g': k = cur_disp_addr.l & 0xff;
    if (k < 32) ⟨Set g[k] = val only if permissible 158⟩;
    g[k] = val; break;
  case 'M': if (!(cur_disp_addr.h & sign_bit)) {
      ll = mem_find(cur_disp_addr);
      ll->tet = val.h; (ll + 1)->tet = val.l;
    } break;
  }

This code is used in section 156.

158. Here we essentially simulate a PUT command, but we simply break if the PUT is illegal or privileged.

⟨Set g[k] = val only if permissible 158⟩ ≡
  if (k >= 9 && k != rI) {
    if (k <= 19) break;
    if (k == rA) {
      if (val.h != 0 || val.l >= 0x40000) break;
      cur_round = (val.l >= 0x10000 ? val.l >> 16 : ROUND_NEAR);
    } else if (k == rG) {
      if (val.h != 0 || val.l > 255 || val.l < L || val.l < 32) break;
      for (j = val.l; j < G; j++) g[j] = zero_octa;
      G = val.l;
    } else if (k == rL) {
      if (val.h == 0 && val.l < L) L = val.l;
      else break;
    }
  }

This code is used in section 157.


159. ⟨Display the current octabyte 159⟩ ≡
  switch (cur_disp_mode) {
  case 'l': k = cur_disp_addr.l & lring_mask;
    printf("l[%d]=", k); aux = l[k]; break;
  case '$': k = cur_disp_addr.l & 0xff;
    if (k < L)
      printf("$%d=l[%d]=", k, (O + k) & lring_mask), aux = l[(O + k) & lring_mask];
    else if (k >= G) printf("$%d=g[%d]=", k, k), aux = g[k];
    else printf("$%d=", k), aux = zero_octa;
    break;
  case 'g': k = cur_disp_addr.l & 0xff;
    printf("g[%d]=", k); aux = g[k]; break;
  case 'M': if (cur_disp_addr.h & sign_bit) aux = zero_octa;
    else {
      ll = mem_find(cur_disp_addr);
      aux.h = ll->tet; aux.l = (ll + 1)->tet;
    }
    printf("M8[#"); print_hex(cur_disp_addr); printf("]="); break;
  }
  switch (cur_disp_type) {
  case '!': print_int(aux); break;
  case '.': print_float(aux); break;
  case '#': fputc('#', stdout); print_hex(aux); break;
  case '"': print_string(aux); break;
  }

This code is used in section 156.

aux: octa, MMIX-ARITH §4. cur_disp_addr: octa, §151. cur_disp_mode: char, §151. cur_disp_set: bool, §151. cur_disp_type: char, §151. cur_round: int, MMIX-ARITH §30. fputc: int ( ), <stdio.h>. g: octa [ ], §76. G: register int, §75. h: tetra, §10. incr: octa ( ), MMIX-ARITH §6. j: register int, §62. k: register int, §62. l: tetra, §10. l: octa *, §76. L: register int, §75. ll: register mem_tetra *, §62. lring_mask: int, §76. mem_find: mem_tetra *( ), §20. O: register int, §75. print_float: void ( ), MMIX-ARITH §54. print_hex: void ( ), §12. print_int: void ( ), §15. print_string: void ( ), §160. printf: int ( ), <stdio.h>. PUT = #f6, §54. rA = 21, §55. repeating: register int, §149. rG = 19, §55. rI = 12, §55. rL = 20, §55. ROUND_NEAR = 4, §100. sign_bit = macro, §15. stdout: FILE *, <stdio.h>. tet: tetra, §16. val: octa, MMIX-ARITH §69. zero_octa: octa, MMIX-ARITH §4.


160. ⟨Subroutines 12⟩ +≡
  void print_string ARGS((octa));
  void print_string(o)
    octa o;
  {
    register int k, state, b;
    for (k = state = 0; k < 8; k++) {
      b = ((k < 4 ? o.h >> (8 * (3 - k)) : o.l >> (8 * (7 - k)))) & 0xff;
      if (b == 0) {
        if (state) printf("%s,0", state > 1 ? "\"" : ""), state = 1;
      } else if (b >= ' ' && b <= '~')
        printf("%s%c", state > 1 ? "" : state == 1 ? ",\"" : "\"", b), state = 2;
      else printf("%s#%x", state > 1 ? "\"," : state == 1 ? "," : "", b), state = 1;
    }
    if (state == 0) printf("0");
    else if (state > 1) printf("\"");
  }

161. ⟨Cases that set and clear tracing and breakpoints 161⟩ ≡
  case '@': inst_ptr = scan_hex(p + 1, cur_seg); p = next_char;
    halted = false; break;
  case 't': case 'u': k = *p;
    val = scan_hex(p + 1, cur_seg); p = next_char;
    if (val.h < 0x20000000) {
      ll = mem_find(val);
      if (k == 't') ll->bkpt |= trace_bit;
      else ll->bkpt &= ~trace_bit;
    }
    break;
  case 'b': for (k = 0, p++; !isxdigit(*p); p++)
      if (*p == 'r') k |= read_bit;
      else if (*p == 'w') k |= write_bit;
      else if (*p == 'x') k |= exec_bit;
    val = scan_hex(p, cur_seg); p = next_char;
    if (!(val.h & sign_bit)) {
      ll = mem_find(val);
      ll->bkpt = (ll->bkpt & -8) | k;
    }
    break;
  case 'T': cur_seg.h = 0; goto passit;
  case 'D': cur_seg.h = 0x20000000; goto passit;
  case 'P': cur_seg.h = 0x40000000; goto passit;
  case 'S': cur_seg.h = 0x60000000; goto passit;
  case 'B': show_breaks(mem_root);
  passit: p++; break;

This code is used in section 149.


162. ⟨Subroutines 12⟩ +≡
  void show_breaks ARGS((mem_node *));
  void show_breaks(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) show_breaks(p->left);
    for (j = 0; j < 512; j++)
      if (p->dat[j].bkpt) {
        cur_loc = incr(p->loc, 4 * j);
        printf(" %08x%08x %c%c%c%c\n", cur_loc.h, cur_loc.l,
          p->dat[j].bkpt & trace_bit ? 't' : '-', p->dat[j].bkpt & read_bit ? 'r' : '-',
          p->dat[j].bkpt & write_bit ? 'w' : '-', p->dat[j].bkpt & exec_bit ? 'x' : '-');
      }
    if (p->right) show_breaks(p->right);
  }

163. We put pointers to the command-line strings in M[Pool_Segment + 8 × (k + 1)]₈ for 0 ≤ k < argc; the strings themselves are octabyte-aligned, starting at M[Pool_Segment + 8 × (argc + 2)]₈. The location of the first free octabyte in the pool segment is placed in M[Pool_Segment]₈.

⟨Load the command line arguments 163⟩ ≡
  x.h = 0x40000000, x.l = 0x8;
  loc = incr(x, 8 * (argc + 1));
  for (k = 0; k < argc; k++, cur_arg++) {
    ll = mem_find(x);
    ll->tet = loc.h, (ll + 1)->tet = loc.l;
    ll = mem_find(loc);
    mmputchars(*cur_arg, strlen(*cur_arg), loc);
    x.l += 8, loc.l += 8 + (strlen(*cur_arg) & -8);
  }
  x.l = 0; ll = mem_find(x); ll->tet = loc.h, (ll + 1)->tet = loc.l;

This code is used in section 141.
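The address arithmetic of section 163 can be checked in isolation. The following is a minimal sketch, assuming a flat 64-bit address in place of the octa type; POOL_SEGMENT, layout_args, and the ptr array are names invented for the illustration, and no simulated memory is actually written.

```c
#include <stdint.h>
#include <string.h>

#define POOL_SEGMENT 0x4000000000000000ULL

/* Fill ptr[k] with the address at which argument k's string would be
   stored, and return the first free octabyte address. The caller must
   supply room for argc entries in ptr. */
uint64_t layout_args(int argc, char *const argv[], uint64_t ptr[])
{
    uint64_t loc = POOL_SEGMENT + 8 * ((uint64_t)argc + 2);
    for (int k = 0; k < argc; k++) {
        ptr[k] = loc;  /* value kept at M8[POOL_SEGMENT + 8*(k+1)] */
        /* string plus at least one zero byte, rounded to whole octabytes */
        loc += 8 + (strlen(argv[k]) & ~(uint64_t)7);
    }
    return loc;        /* value kept at M8[POOL_SEGMENT] */
}
```

For the two arguments "foo" and "12345678", this sketch places the strings at Pool_Segment + #20 and Pool_Segment + #28 and reports Pool_Segment + #38 as the first free octabyte; an eight-character string occupies two octabytes because room is always left for a terminating zero byte.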

argc: int, §141. ARGS = macro ( ), §11. bkpt: unsigned char, §16. cur_arg: char **, §144. cur_seg: octa, §151. dat: mem_tetra [ ], §16. exec_bit = macro, §58. false = 0, §9. h: tetra, §10. halted: bool, §61. incr: octa ( ), MMIX-ARITH §6. inst_ptr: octa, §61. isxdigit: int ( ), <ctype.h>. k: register int, §62. l: tetra, §10. left: mem_node *, §16. ll: register mem_tetra *, §62. loc: octa, §16. loc: octa, §61. mem_find: mem_tetra *( ), §20. mem_node = struct, §16. mem_root: mem_node *, §19. mmputchars: int ( ), §117. next_char: char *, MMIX-ARITH §69. octa = struct, §10. p: register char *, §62. printf: int ( ), <stdio.h>. read_bit = macro, §58. right: mem_node *, §16. scan_hex: octa ( ), §154. sign_bit = macro, §15. strlen: int ( ), <string.h>. tet: tetra, §16. trace_bit = macro, §58. val: octa, MMIX-ARITH §69. write_bit = macro, §58. x: octa, §61.


164. ⟨Get ready to UNSAVE the initial context 164⟩ ≡
  x.h = 0, x.l = 0x90;
  ll = mem_find(x);
  if (ll->tet) inst_ptr = x;
  resuming = true;
  rop = RESUME_AGAIN;
  g[rX].l = ((tetra) UNSAVE << 24) + 255;
  if (dump_file) {
    x.l = 1;
    dump(mem_root);
    dump_tet(0), dump_tet(0);
    exit(1);
  }

This code is used in section 141.

165. The special option `-D<filename>' can be used to prepare binary files needed by the MMIX-in-MMIX simulator of Section 1.4.3´. This option puts big-endian octabytes into a given file; a location l is followed by one or more nonzero octabytes M₈[l], M₈[l + 8], M₈[l + 16], ..., followed by zero. The simulated simulator knows how to load programs in such a format (see exercise 1.4.3´–20), and so does the meta-simulator MMMIX.

⟨Subroutines 12⟩ +≡
  void dump ARGS((mem_node *));
  void dump_tet ARGS((tetra));
  void dump(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) dump(p->left);
    for (j = 0; j < 512; j += 2)
      if (p->dat[j].tet || p->dat[j + 1].tet) {
        cur_loc = incr(p->loc, 4 * j);
        if (cur_loc.l != x.l || cur_loc.h != x.h) {
          if (x.l != 1) dump_tet(0), dump_tet(0);
          dump_tet(cur_loc.h); dump_tet(cur_loc.l); x = cur_loc;
        }
        dump_tet(p->dat[j].tet);
        dump_tet(p->dat[j + 1].tet);
        x = incr(x, 8);
      }
    if (p->right) dump(p->right);
  }

166. ⟨Subroutines 12⟩ +≡
  void dump_tet(t)
    tetra t;
  {
    fputc(t >> 24, dump_file);
    fputc((t >> 16) & 0xff, dump_file);
    fputc((t >> 8) & 0xff, dump_file);
    fputc(t & 0xff, dump_file);
  }
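The dump format described in section 165 (a location, then nonzero octabytes, then a zero octabyte, all big-endian) can be illustrated without the simulator's memory tree. The sketch below is an assumption-laden simplification: it encodes a single block into a caller-supplied buffer rather than a file, and the names put_octa_be and encode_dump_block are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Store octabyte v at out[0..7], most significant byte first. */
static void put_octa_be(uint64_t v, unsigned char *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * (7 - i)));
}

/* Encode one block: the location, n nonzero data octabytes, and a
   terminating zero octabyte. The buffer out needs 8*(n+2) bytes;
   the return value is the number of bytes written. */
size_t encode_dump_block(uint64_t loc, const uint64_t *data, size_t n,
                         unsigned char *out)
{
    size_t pos = 0;
    put_octa_be(loc, out + pos);
    pos += 8;
    for (size_t i = 0; i < n; i++) {
        put_octa_be(data[i], out + pos);
        pos += 8;
    }
    put_octa_be(0, out + pos);  /* zero marks the end of the block */
    pos += 8;
    return pos;
}
```

Because a zero octabyte ends a block, a run of memory that contains a zero octabyte has to be emitted as two blocks, which is what the location comparison in dump accomplishes.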

ARGS = macro ( ), §11. dat: mem_tetra [ ], §16. dump_file: FILE *, §144. exit: void ( ), <stdlib.h>. fputc: int ( ), <stdio.h>. g: octa [ ], §76. h: tetra, §10. incr: octa ( ), MMIX-ARITH §6. inst_ptr: octa, §61. l: tetra, §10. left: mem_node *, §16. ll: register mem_tetra *, §62. loc: octa, §16. mem_find: mem_tetra *( ), §20. mem_node = struct, §16. mem_root: mem_node *, §19. octa = struct, §10. RESUME_AGAIN = 0, §125. resuming: bool, §61. right: mem_node *, §16. rop: int, §61. rX = 25, §55. tet: tetra, §16. tetra = unsigned int, §10. true = 1, §9. UNSAVE = #fb, §54. x: octa, §61.


167. Names of the sections.

⟨Cases for formatting characters 134, 136, 138⟩ Used in section 133.
⟨Cases for individual MMIX instructions 84, 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, 124⟩ Used in section 60.
⟨Cases for lopcodes in the main loop 33, 34, 35, 36⟩ Used in section 29.
⟨Cases that change cur_disp_mode 152⟩ Used in section 149.
⟨Cases that define cur_disp_type 153⟩ Used in section 149.
⟨Cases that set and clear tracing and breakpoints 161⟩ Used in section 149.
⟨Check for trip interrupt 122⟩ Used in section 60.
⟨Check if the source file has been modified 44⟩ Used in section 42.
⟨Convert relative address to absolute address 70⟩ Used in section 60.
⟨Display and/or set the value of the current octabyte 156⟩ Used in section 149.
⟨Display the current octabyte 159⟩ Used in section 156.
⟨Either halt or print warning 109⟩ Used in section 108.
⟨Fetch the next instruction 63⟩ Used in section 60.
⟨Fix up the subtrees of *q 22⟩ Used in section 21.
⟨Get ready to UNSAVE the initial context 164⟩ Used in section 141.
⟨Get ready to update rA 100⟩ Used in section 97.
⟨Get ready to update rG 99⟩ Used in section 97.
⟨Global variables 19, 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, 151⟩ Used in section 141.
⟨Increase rL 81⟩ Used in section 80.
⟨Info for arithmetic commands 66⟩ Used in section 65.
⟨Info for branch commands 67⟩ Used in section 65.
⟨Info for load/store commands 68⟩ Used in section 65.
⟨Info for logical and control commands 69⟩ Used in section 65.
⟨Initialize everything 14, 18, 24, 32, 41, 77, 147⟩ Used in section 141.
⟨Initiate a trip interrupt 123⟩ Used in section 122.
⟨Install operand fields 71⟩ Used in section 60.
⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ Used in section 60.
⟨Install special operands when resuming an interrupted operation 126⟩ Used in section 71.
⟨Interact with the user 149⟩ Used in section 141.
⟨Interpret character *p in the trace format 133⟩ Used in section 131.
⟨Load and write four bytes 119⟩ Used in section 117.
⟨Load and write one byte 118⟩ Used in section 117.
⟨Load g[k] from the register stack 105⟩ Used in section 104.
⟨Load tet as a normal item 30⟩ Used in section 29.
⟨Load the command line arguments 163⟩ Used in section 141.
⟨Load the next item 29⟩ Used in section 32.
⟨Load the postamble 37⟩ Used in section 32.
⟨Load the preamble 28⟩ Used in section 32.
⟨Local registers 62, 75⟩ Used in section 141.
⟨Open a file for dumping binary output 146⟩ Used in section 143.
⟨Open a file for simulated standard input 145⟩ Used in section 143.
⟨Perform one instruction 60⟩ Used in section 141.
⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩ Used in section 108.
⟨Prepare to list lines from a new source file 49⟩ Used in section 47.
⟨Prepare to perform a ropcode 125⟩ Used in section 124.
⟨Preprocessor macros 11, 43, 46⟩ Used in section 141.
⟨Print a stream-of-consciousness description of the instruction 131⟩ Used in section 128.
⟨Print all the frequency counts 53⟩ Used in section 141.
⟨Print changes to rL 132⟩ Used in section 131.
⟨Print frequency data for location p->loc + 4*j 51⟩ Used in section 50.
⟨Print the frequency count, the location, and the instruction 130⟩ Used in section 128.
⟨Process the command line 142⟩ Used in section 141.
⟨Put a new command in command_buf 150⟩ Used in section 149.
⟨Read and store one byte; return if done 115⟩ Used in section 114.
⟨Read and store up to four bytes; return if done 116⟩ Used in section 114.
⟨Scan a string constant 155⟩ Used in section 153.
⟨Search for key in the treap, setting last_mem and p to its location 21⟩ Used in section 20.
⟨Set b from register X 74⟩ Used in section 71.
⟨Set b from special register 79⟩ Used in section 71.
⟨Set g[k] = val only if permissible 158⟩ Used in section 157.
⟨Set L = z = min(z, L) 98⟩ Used in section 97.
⟨Set the current octabyte to val 157⟩ Used in section 156.
⟨Set y from register Y 73⟩ Used in section 71.
⟨Set z as an immediate wyde 78⟩ Used in section 71.
⟨Set z from register Z 72⟩ Used in section 71.
⟨Store g[k] in the register stack 103⟩ Used in section 102.
⟨Subroutines 12, 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148, 154, 160, 162, 165, 166⟩ Used in section 141.
⟨Trace the current instruction, if requested 128⟩ Used in section 60.
⟨Type declarations 9, 10, 16, 38, 39, 54, 55, 59, 64, 135⟩ Used in section 141.
⟨Update the clocks 127⟩ Used in section 60.


MMIXAL

1. Definition of MMIXAL. This program takes input written in MMIXAL, the MMIX assembly language, and translates it into binary files that can be loaded and executed on MMIX simulators. MMIXAL is much simpler than the "industrial strength" assembly languages that computer manufacturers usually provide, because it is primarily intended for the simple demonstration programs in The Art of Computer Programming. Yet it tries to have enough features to serve also as the back end of compilers for C and other high-level languages.

Instructions for using the program appear at the end of this document. First we will discuss the input and output languages in detail; then we'll consider the translation process, step by step; then we'll put everything together.

2. A program in MMIXAL consists of a series of lines, each of which usually contains a single instruction. However, lines with no instructions are possible, and so are lines with two or more instructions.

Each instruction has three parts called its label field, opcode field, and operand field; these fields are separated from each other by one or more spaces. The label field, which is often empty, consists of all characters up to the first blank space. The opcode field, which is never empty, runs from the first nonblank after the label to the next blank space. The operand field, which again might be empty, runs from the next nonblank character (if any) to the first blank or semicolon that isn't part of a string or character constant. If the operand field is followed by a semicolon, possibly with intervening blanks, a new instruction begins immediately after the semicolon; otherwise the rest of the line is ignored. The end of a line is treated as a blank space for the purposes of these rules, with the additional proviso that string or character constants are not allowed to extend from one line to another.

The label field must begin with a letter or a digit; otherwise the entire line is treated as a comment. Popular ways to introduce comments, either at the beginning of a line or after the operand field, are to precede them by the character % as in TeX, or by // as in C++; MMIXAL is not very particular. However, Lisp-style comments introduced by single semicolons will fail if they follow an instruction, because they will be assumed to introduce another instruction.

3. MMIXAL has no built-in macro capability, nor does it know how to include header files and such things. But users can run their files through a standard C preprocessor to obtain MMIXAL programs in which macros and such things have been expanded. (Caution: The preprocessor also removes C-style comments, unless it is told not to do so.) Literate programming tools could also be used for preprocessing.

If a line begins with the special form `# ⟨integer⟩ ⟨string⟩', this program interprets it as a line directive emitted by a preprocessor. For example,

  # 13 "foo.mms"

means that the following line was line 13 in the user's source file foo.mms. Line directives allow us to correlate errors with the user's original file; we also pass them to the output, for use by simulators and debuggers.

D.E. Knuth: MMIXware, LNCS 1750, pp. 422-493, 1999. Springer-Verlag Berlin Heidelberg 1999


4. MMIXAL deals primarily with symbols and constants, which it interprets and combines to form machine language instructions and data. Constants are simplest, so we will discuss them first.

A decimal constant is a sequence of digits, representing a number in radix 10. A hexadecimal constant is a sequence of hexadecimal digits, preceded by #, representing a number in radix 16:

  ⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  ⟨hex digit⟩ → ⟨digit⟩ | A | B | C | D | E | F | a | b | c | d | e | f
  ⟨decimal constant⟩ → ⟨digit⟩ | ⟨decimal constant⟩⟨digit⟩
  ⟨hex constant⟩ → #⟨hex digit⟩ | ⟨hex constant⟩⟨hex digit⟩

Constants whose value is 2⁶⁴ or more are reduced modulo 2⁶⁴.
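The reduction modulo 2⁶⁴ falls out naturally when constants are accumulated in an unsigned 64-bit integer. The following sketch illustrates that point only; scan_constant is a hypothetical helper, not the routine MMIXAL itself uses, and it assumes its input already matches one of the two constant grammars.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: evaluate a decimal constant such as "123" or a
   hex constant such as "#7f". Values of 2^64 or more wrap around,
   because unsigned 64-bit arithmetic is arithmetic modulo 2^64. */
uint64_t scan_constant(const char *s)
{
    uint64_t v = 0;
    if (*s == '#') {
        for (s++; isxdigit((unsigned char)*s); s++) {
            int c = tolower((unsigned char)*s);
            v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else {
        for (; isdigit((unsigned char)*s); s++)
            v = v * 10 + (uint64_t)(*s - '0');
    }
    return v;
}
```

Under these assumptions the decimal constant 18446744073709551616, which equals 2⁶⁴, evaluates to 0, and a 17-digit hex constant keeps only its low 16 digits.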

5. A character constant is a single character enclosed in single quote marks; it denotes the ASCII or Unicode number corresponding to that character. For example, 'a' represents the constant #61, also known as 97. The quoted character can be anything except the character that the C library calls \n or newline; that character should be represented as #a.

  ⟨character constant⟩ → '⟨single byte character except newline⟩'
  ⟨constant⟩ → ⟨decimal constant⟩ | ⟨hex constant⟩ | ⟨character constant⟩

Notice that ''' represents a single quote, the code #27; and '\' represents a backslash, the code #5c. MMIXAL characters are never "quoted" by backslashes as in the C language.

In the present implementation a character constant will always be at most 255, since wyde character input is not supported. But if the input were in Unicode one could write, say, 'א' or 'Ж' for #05d0 or #0416. The present program does not support Unicode directly because basic software for inputting and outputting 16-bit characters was still in a primitive state at the time of writing. But the data structures below are designed so that a change to Unicode will not be difficult when the time is ripe.

6. A string constant like "Hello" is an abbreviation for a sequence of one or more character constants separated by commas: 'H','e','l','l','o'. Any character except newline or the double quote mark " can appear between the double quotes of a string constant. Similarly, "高德纳" is an abbreviation for '高','德','纳' (namely #9ad8,#5fb7,#7eb3) when Unicode is supported.


7. A symbol in MMIXAL is any sequence of letters and digits, beginning with a letter. A colon `:' or underscore symbol `_' is regarded as a letter, for purposes of this definition. All extended-ASCII characters like `é', whose 8-bit code exceeds 126, are also treated as letters.

  ⟨letter⟩ → A | B | ··· | Z | a | b | ··· | z | : | _ | ⟨character with code value > 126⟩
  ⟨symbol⟩ → ⟨letter⟩ | ⟨symbol⟩⟨letter⟩ | ⟨symbol⟩⟨digit⟩

In future implementations, when MMIXAL is used with Unicode, all wyde characters whose 16-bit code exceeds 126 will be regarded as letters; thus MMIXAL symbols will be able to involve Greek letters or Chinese characters or thousands of other glyphs.
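The symbol grammar above translates almost directly into a character test. Here is a minimal sketch for 8-bit input, assuming the default "C" locale; the function names is_mmixal_letter and is_mmixal_symbol are made up for the illustration.

```c
#include <ctype.h>
#include <stdbool.h>

/* A "letter" in the sense above: A-Z, a-z, ':', '_', or any 8-bit
   code greater than 126. */
bool is_mmixal_letter(unsigned char c)
{
    return isalpha(c) || c == ':' || c == '_' || c > 126;
}

/* True if s is a nonempty sequence of letters and digits that begins
   with a letter. */
bool is_mmixal_symbol(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (!is_mmixal_letter(*p)) return false;
    for (p++; *p; p++)
        if (!is_mmixal_letter(*p) && !isdigit(*p)) return false;
    return true;
}
```

With this test, :Foo:x1 and _tmp qualify as symbols, while 2H does not, since it begins with a digit; forms like 2H are handled separately as local symbols.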

8. A symbol is said to be fully qualified if it begins with a colon. Every symbol that is not fully qualified is an abbreviation for the fully qualified symbol obtained by placing the current prefix in front of it; the current prefix is always fully qualified. At the beginning of an MMIXAL program the current prefix is simply the single character `:', but the user can change it with the PREFIX command. For example,

  ADD x,y,z % means ADD :x,:y,:z
  PREFIX Foo: % current prefix is :Foo:
  ADD x,y,z % means ADD :Foo:x,:Foo:y,:Foo:z
  PREFIX Bar: % current prefix is :Foo:Bar:
  ADD :x,y,:z % means ADD :x,:Foo:Bar:y,:z
  PREFIX : % current prefix reverts to :
  ADD x,Foo:Bar:y,Foo:z % means ADD :x,:Foo:Bar:y,:Foo:z

This mechanism allows large programs to avoid conflicts between symbol names, when parts of the program are independent and/or written by different users. The current prefix conventionally ends with a colon, but this convention need not be obeyed.
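The qualification rule amounts to a conditional string concatenation, and the PREFIX examples suggest that the argument of PREFIX is itself qualified by the current prefix. The sketch below works under that reading; qualify and set_prefix are hypothetical names, and the fixed 256-byte scratch buffer is an arbitrary limit chosen for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Write the fully qualified form of sym into out (capacity n).
   Returns 0 on success, -1 if the result would not fit. */
int qualify(const char *prefix, const char *sym, char *out, size_t n)
{
    int len;
    if (sym[0] == ':') len = snprintf(out, n, "%s", sym);  /* already qualified */
    else len = snprintf(out, n, "%s%s", prefix, sym);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}

/* Apply a PREFIX command: the new prefix is the qualified form of arg.
   prefix is updated in place and has capacity n. */
int set_prefix(char *prefix, size_t n, const char *arg)
{
    char tmp[256];
    if (qualify(prefix, arg, tmp, sizeof tmp) != 0 || strlen(tmp) >= n)
        return -1;
    strcpy(prefix, tmp);
    return 0;
}
```

Starting from the prefix `:', the calls set_prefix with Foo: and then Bar: leave the prefix at :Foo:Bar:, after which y qualifies to :Foo:Bar:y while :x stays :x, in agreement with the example lines above.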

9. A local symbol is a decimal digit followed by one of the letters B, F, or H, meaning "backward," "forward," or "here":

  ⟨local operand⟩ → ⟨digit⟩ B | ⟨digit⟩ F
  ⟨local label⟩ → ⟨digit⟩ H

The B and F forms are permitted only in the operand field of MMIXAL instructions; the H form is permitted only in the location field. A local operand such as 2B stands for the last local label 2H in instructions before the current one, or 0 if 2H has not yet appeared as a label. A local operand such as 2F stands for the first 2H in instructions after the current one. Thus, in a sequence such as

  2H JMP 2F; 2H JMP 2B

the first instruction jumps to the second and the second jumps to the first.

Local symbols are useful for references to nearby points of a program, in cases where no meaningful name is appropriate. They can also be useful in special situations where a redefinable symbol is needed; for example, an instruction like

  9H IS 9B+1

will maintain a running counter.
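The backward and forward rules can be stated as two small searches over the instruction sequence. This is only a conceptual sketch for a single digit d, assuming the whole program is available as an array; an assembler working in one pass would more plausibly keep a current backward value and a list of pending forward references. The names resolve_backward and resolve_forward are invented here.

```c
#include <stdbool.h>

/* labeled[j] is true if instruction j carries the local label dH.
   Each function returns an instruction index, or -1 if there is none. */
int resolve_backward(const bool labeled[], int n, int i)
{
    (void)n;  /* unused; kept so both functions share one signature */
    for (int j = i - 1; j >= 0; j--)
        if (labeled[j]) return j;
    return -1;  /* in this case dB has the value 0 */
}

int resolve_forward(const bool labeled[], int n, int i)
{
    for (int j = i + 1; j < n; j++)
        if (labeled[j]) return j;
    return -1;  /* dF never becomes defined */
}
```

For the two-instruction sequence shown above, both instructions are labeled 2H, so the forward search from instruction 0 finds instruction 1 and the backward search from instruction 1 finds instruction 0; an instruction's own label is skipped in both directions.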


10. Each symbol receives a value called its equivalent when it appears in the label field of an instruction; it is said to be defined after its equivalent has been established. A few symbols, like rA and ROUND_OFF and Fopen, are predefined because they refer to fixed constants associated with the MMIX hardware or its rudimentary operating system; otherwise every symbol should be defined exactly once. The two appearances of `2H' in the example above do not violate this rule, because the second `2H' is not the same symbol as the first.

A predefined symbol can be redefined (given a new equivalent). After it has been redefined it acts like an ordinary symbol and cannot be redefined again. A complete list of the predefined symbols appears in the program listing below.

Equivalents are either pure or register numbers. A pure equivalent is an unsigned octabyte, but a register number equivalent is a one-byte value, between 0 and 255. A dollar sign is used to change a pure number into a register number; for example, `$20' means register number 20.


11. Constants and symbols are combined into expressions in a simple way:

⟨primary expression⟩ → ⟨constant⟩ | ⟨symbol⟩ | ⟨local operand⟩ | @ | (⟨expression⟩) | ⟨unary operator⟩⟨primary expression⟩
⟨term⟩ → ⟨primary expression⟩ | ⟨term⟩⟨strong operator⟩⟨primary expression⟩
⟨expression⟩ → ⟨term⟩ | ⟨expression⟩⟨weak operator⟩⟨term⟩
⟨unary operator⟩ → + | - | ~ | $ | &
⟨strong operator⟩ → * | / | // | % | << | >> | &
⟨weak operator⟩ → + | - | | | ^

Each expression has a value that is either pure or a register number. The character @ stands for the current location, which is always pure. The unary operators +, -, ~, $, and & mean, respectively, "do nothing," "subtract from zero," "complement the bits," "change from pure value to register number," and "take the serial number". Only the first of these, +, can be applied to a register number. The last unary operator, &, applies only to symbols, and it is of interest primarily to system programmers; it converts a symbol to the unique positive integer that is used to identify it in the binary file output by MMIXAL.

Binary operators come in two flavors, strong and weak. The strong ones are essentially concerned with multiplication or division: x*y, x/y, x//y, x%y, x<<y, x>>y, and x&y stand respectively for (x × y) mod 2⁶⁴ (multiplication), ⌊x/y⌋ (division), ⌊2⁶⁴x/y⌋ (fractional division), x mod y (remainder), (x × 2ʸ) mod 2⁶⁴ (left shift), ⌊x/2ʸ⌋ (right shift), and x ∧ y (bitwise and) on unsigned octabytes. Division is legal only if y > 0; fractional division is legal only if x < y. None of the strong binary operations can be applied to register numbers.

The weak binary operations x+y, x-y, x|y, and x^y stand respectively for (x + y) mod 2⁶⁴ (addition), (x − y) mod 2⁶⁴ (subtraction), x ∨ y (bitwise or), and x ⊕ y (bitwise exclusive or) on unsigned octabytes. These operations can be applied to register numbers only in four contexts: ⟨register⟩ + ⟨pure⟩, ⟨pure⟩ + ⟨register⟩, ⟨register⟩ − ⟨pure⟩ and ⟨register⟩ − ⟨register⟩. For example, if x denotes $1 and y denotes $10, then x+3 and 3+x denote $4, and y-x denotes the pure value 9.

Register numbers within expressions are allowed to be arbitrary octabytes, but a register number assigned as the equivalent of a symbol should not exceed 255.

(Incidentally, one might ask why the designer of MMIXAL did not simply adopt the existing rules of C for expressions. The primary reason is that the designers of C chose to give <<, >>, and & a lower precedence than +; but in MMIXAL we want to be able to write things like o<<24+x<<16+y<<8+z or @+yz<<2 or @+(#100-@)&#ff. Since the conventions of C were inappropriate, it was better to make a clean break, not pretending to have a close relationship with that language. The new rules are quite easily memorized, because MMIXAL has just two levels of precedence, and the strong binary operations are all essentially multiplicative by nature while the weak binary operations are essentially additive.)
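
The contrast with C can be made concrete. The sketch below is only an illustration, with hypothetical function names (frac_div, align256, pack_bytes) that do not occur in MMIXAL itself: it shows fractional division on unsigned octabytes, and it spells out with explicit parentheses how two of the MMIXAL expressions quoted above would have to be written in C.

```c
#include <stdint.h>

/* x//y, i.e. floor(2^64 * x / y); legal only if x < y.
   Restoring long division, one quotient bit per step. */
uint64_t frac_div(uint64_t x, uint64_t y)
{
    uint64_t q = 0, r = x;
    int i;
    for (i = 0; i < 64; i++) {
        int carry = (int)(r >> 63);   /* bit about to be shifted out of r */
        r <<= 1;
        q <<= 1;
        if (carry || r >= y) {
            r -= y;                   /* wraps to the right value when carry is set */
            q |= 1;
        }
    }
    return q;
}

/* MMIXAL `@+(#100-@)&#ff': & is strong, so it is applied before +. */
uint64_t align256(uint64_t loc)
{
    return loc + ((0x100 - loc) & 0xff);
}

/* MMIXAL `o<<24+x<<16+y<<8+z': every shift is done before the additions. */
uint64_t pack_bytes(uint64_t o, uint64_t x, uint64_t y, uint64_t z)
{
    return (o << 24) + (x << 16) + (y << 8) + z;
}
```

Without the inner parentheses, a C compiler would parse `o<<24+x<<16+y<<8+z` as a chain of shifts by sums, which is not what the MMIXAL expression means.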

12. A symbol is called a future reference until it has been defined. MMIXAL restricts the use of future references, so that programs can be assembled quickly in one pass over the input; therefore all expressions can be evaluated when the MMIXAL processor first sees them.


The restrictions are easily stated: Future references cannot be used in expressions together with unary or binary operators (except the unary +, which does nothing); moreover, future references can appear as operands only in instructions that have relative addresses (namely branches, probable branches, JMP, PUSHJ, GETA) or in octabyte constants (the pseudo-operation OCTA). Thus, for example, one can say JMP 1F or JMP 1B-4, but not JMP 1F-4.

13. We noted earlier that each MMIXAL instruction contains a label field, an opcode field, and an operand field. The label field is either empty or a symbol or local label; when it is nonempty, the symbol or local label receives an equivalent. The operand field is either empty or a sequence of expressions separated by commas; when it is empty, it is equivalent to the simple operand field `0'.

⟨instruction⟩ → ⟨label⟩⟨opcode⟩⟨operand list⟩
⟨label⟩ → ⟨empty⟩ | ⟨symbol⟩ | ⟨local label⟩
⟨operand list⟩ → ⟨empty⟩ | ⟨expression list⟩
⟨expression list⟩ → ⟨expression⟩ | ⟨expression list⟩,⟨expression⟩

The opcode field either contains a symbolic MMIX operation name (like ADD), or an alias operation, or a pseudo-operation. Alias operations are alternate names for MMIX operations whose standard names are inappropriate in certain contexts. Pseudo-operations do not correspond directly to MMIX commands, but they govern the assembly process in important ways.

There are two alias operations:

• SET $X,$Y is equivalent to OR $X,$Y,0; it sets register X to register Y. Similarly, SET $X,Y (when Y is not a register) is equivalent to SETL $X,Y.

• LDA $X,$Y,$Z is equivalent to ADDU $X,$Y,$Z; it loads the address of memory location $Y+$Z into register X. Similarly, LDA $X,$Y,Z is equivalent to ADDU $X,$Y,Z.

The symbolic operation names for genuine MMIX operations should not include the suffix I for an immediate operation or the suffix B for a backward jump; MMIXAL determines such things automatically. Thus, one never writes ADDI or JMPB in the source input to MMIXAL, although such opcodes might appear when a simulator or debugger or disassembler is presenting a numeric instruction in symbolic form.

⟨opcode⟩ → ⟨symbolic MMIX operation⟩ | ⟨pseudo-operation⟩
⟨symbolic MMIX operation⟩ → TRAP | FCMP | ··· | TRIP
⟨pseudo-operation⟩ → IS | LOC | PREFIX | GREG | LOCAL | BSPEC | ESPEC | BYTE | WYDE | TETRA | OCTA


14. MMIX operations like ADD require exactly three expressions as operands. The first two must be register numbers. The third must be either a register number or a pure number between 0 and 255; in the latter case, ADD becomes ADDI in the assembled output. Thus, for example, the command "set register 1 to the sum of register 2 and register 3" could be expressed as

ADD $1,$2,$3

or as, say,

ADD x,y,y+1

if the equivalent of x is $1 and the equivalent of y is $2. The command "subtract 5 from register 1" could be expressed as

SUB $1,$1,5

or as

SUB x,x,5

but not as `SUBI $1,$1,5' or `SUBI x,x,5'.

MMIX operations like FLOT require either three operands (register, pure, register/pure) or only two (register, register/pure). In the first case the middle operand is the rounding mode, which is best expressed in terms of the predefined symbolic values ROUND_CURRENT, ROUND_OFF, ROUND_UP, ROUND_DOWN, ROUND_NEAR, for (0, 1, 2, 3, 4) respectively. In the second case the middle operand is understood to be zero (namely, ROUND_CURRENT).

MMIX operations like SETL or INCH, which involve a wyde intermediate constant, require exactly two operands, (register, pure). The value of the second operand should fit in two bytes.

MMIX operations like BNZ, which mention a register and a relative address, also require two operands. The first operand should be a register number. The second operand, when subtracted from the current location and divided by four, should yield a result r in the range −2¹⁶ ≤ r < 2¹⁶. The second operand might also be undefined; in that case, the eventual value must satisfy the restriction stated for defined values. The opcodes GETA and PUSHJ are similar, except that the first operand to PUSHJ might also be pure (see below). The JMP operation is also similar, but it has only one operand, and it allows the larger address range −2²⁴ ≤ r < 2²⁴.

MMIX operations that refer to memory, like LDO and STHT and GO, are treated like ADD if they have three operands, except that the first operand should be pure (not a register number) in the case of PRELD, PREGO, PREST, STCO, SYNCD, and SYNCID. These opcodes also accept a special two-operand form, in which the second operand stands for a base address and an immediate offset (see below).

The first operand of PUSHJ and PUSHGO can be either a pure number or a register number. In the first case (`PUSHJ 2,Sub' or `PUSHGO 2,Sub') the programmer might be thinking "let's push down two registers"; in the second case (`PUSHJ $2,Sub' or `PUSHGO $2,Sub') the programmer might be thinking "let's make register 2 the hole position for this subroutine call." Both cases result in the same assembled output.
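
The range rule for relative addresses stated above can be sketched in C as follows. This is a minimal illustration, not the assembler's own code: the function name encode_branch is hypothetical, both addresses are assumed to be multiples of 4, and the sketch assumes the usual MMIX encoding in which the backward form of a branch has an opcode one greater than the forward form and stores r + 2¹⁶ in its YZ field.

```c
#include <stdint.h>

/* Encode a branch with forward opcode `op` (for example #42 for BZ) on
   register x, located at `loc` and aimed at `target`.
   Returns 0 and stores the tetrabyte in *out, or returns -1 if the
   relative address r = (target - loc)/4 violates -2^16 <= r < 2^16. */
int encode_branch(unsigned op, unsigned x,
                  uint64_t loc, uint64_t target, uint32_t *out)
{
    uint64_t dist;                       /* |target - loc| in tetrabytes */
    uint32_t yz;

    if ((target - loc) & 3)
        return -1;                       /* assumption: aligned addresses only */
    if (target >= loc) {
        dist = (target - loc) / 4;
        if (dist >= 65536)
            return -1;                   /* need r < 2^16 */
        yz = (uint32_t)dist;
    } else {
        dist = (loc - target) / 4;
        if (dist > 65536)
            return -1;                   /* need r >= -2^16 */
        op += 1;                         /* backward form, e.g. BZB */
        yz = (uint32_t)(65536 - dist);   /* r + 2^16 */
    }
    *out = ((uint32_t)op << 24) | ((uint32_t)(x & 0xff) << 16) | yz;
    return 0;
}
```

A JMP would follow the same pattern with a 24-bit field and no register operand.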


The remaining MMIX opcodes are idiosyncratic:

NEG r,p,z;

PUT s,z;

GET r,s;

POP p,yz;

RESUME xyz;

SAVE r,0;

UNSAVE r;

SYNC xyz;

TRAP x,y,z or TRAP x,yz or TRAP xyz;

SWYM and TRIP are like TRAP. Here s is an integer between 0 and 31, preferably given by one of the predefined symbols rA, rB, ... for special register codes; r is a register number; p is a pure byte; x, y, and z are either register numbers or pure bytes; yz and xyz are pure values that fit respectively in two and three bytes.

All of these rules can be summarized by saying that MMIXAL treats each MMIX opcode in the most natural way. When there are three operands, they affect fields X, Y, and Z of the assembled MMIX instruction; when there are two operands, they affect fields X and YZ; when there is just one operand, it affects field XYZ.
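
The three layouts just summarized amount to simple bit packing. The following C sketch shows one way to express them; the function names are hypothetical, and the opcode byte is passed in already resolved (for instance #81 for the immediate form of LDB).

```c
#include <stdint.h>

/* Three operands: fields X, Y, and Z, one byte each. */
uint32_t pack_x_y_z(unsigned op, unsigned x, unsigned y, unsigned z)
{
    return ((uint32_t)(op & 0xff) << 24) | ((uint32_t)(x & 0xff) << 16)
         | ((uint32_t)(y & 0xff) << 8) | (uint32_t)(z & 0xff);
}

/* Two operands: field X and the two-byte field YZ. */
uint32_t pack_x_yz(unsigned op, unsigned x, unsigned yz)
{
    return ((uint32_t)(op & 0xff) << 24) | ((uint32_t)(x & 0xff) << 16)
         | (uint32_t)(yz & 0xffff);
}

/* One operand: the three-byte field XYZ. */
uint32_t pack_xyz(unsigned op, uint32_t xyz)
{
    return ((uint32_t)(op & 0xff) << 24) | (xyz & 0xffffff);
}
```

For instance, pack_x_y_z(0x81, 3, 0xfe, 1) gives #8103fe01 and pack_xyz(0xf0, 10) gives #f000000a, two tetrabytes that appear in the sample mmo file of section 23.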

15. In all cases when the opcode corresponds to an MMIX operation, the MMIXAL instruction tells the assembler to carry out four steps: (1) Align the current location so that it is a multiple of 4, by adding 1, 2, or 3 if necessary; (2) Define the equivalent of the label field to be the current location, if the label is nonempty; (3) Evaluate the operands and assemble the specified MMIX instruction into the current location; (4) Increase the current location by 4.


16. Now let's consider the pseudo-operations, starting with the simplest cases.

• ⟨label⟩ IS ⟨expression⟩ defines the value of the label to be the value of the expression, which must not be a future reference. The expression may be either pure or a register number.

• ⟨label⟩ LOC ⟨expression⟩ first defines the label to be the value of the current location, if the label is nonempty. Then the current location is changed to the value of the expression, which must be pure.

For example, `LOC #1000' will start assembling subsequent instructions or data in location whose hexadecimal value is #1000. `X LOC @+500' defines X to be the address of the first of 500 bytes in memory; assembly will continue at location X + 500. The operation of aligning the current location to a multiple of 256, if it is not already aligned in that way, can be expressed as `LOC @+(256-@)&255'.

A less trivial example arises if we want to emit instructions and data into two separate areas of memory, but we want to intermix them in the MMIXAL source file. We could start by defining 8H and 9H to be the starting addresses of the instruction and data segments, respectively. Then, a sequence of instructions could be enclosed in `LOC 8B; ...; 8H IS @'; a sequence of data could be enclosed in `LOC 9B; ...; 9H IS @'. Any number of such sequences could then be combined. Instead of the two pseudo-instructions `8H IS @; LOC 9B' one could in fact write simply `8H LOC 9B' when switching from instructions to data.

• PREFIX ⟨symbol⟩ redefines the current prefix to be the given symbol (fully qualified). The label field should be blank.

17. The next pseudo-operations assemble bytes, wydes, tetrabytes, or octabytes of data.

• ⟨label⟩ BYTE ⟨expression list⟩ defines the label to be the current location, if the label field is nonempty; then it assembles one byte for each expression in the expression list, and advances the current location by the number of bytes. The expressions should all be pure numbers that fit in one byte.

String constants are often used in such expression lists. For example, if the current location is #1000, the instruction BYTE "Hello",0 assembles six bytes containing the constants 'H', 'e', 'l', 'l', 'o', and 0 into locations #1000, ..., #1005, and advances the current location to #1006.

• ⟨label⟩ WYDE ⟨expression list⟩ is similar, but it first makes the current location even, by adding 1 to it if necessary. Then it defines the label (if a nonempty label is present), and assembles each expression as a two-byte value. The current location is advanced by twice the number of expressions in the list. The expressions should all be pure numbers that fit in two bytes.

• ⟨label⟩ TETRA ⟨expression list⟩ is similar, but it aligns the current location to a multiple of 4 before defining the label; then it assembles each expression as a four-byte value. The current location is advanced by 4n if there are n expressions in the list. Each expression should be a pure number that fits in four bytes.

• ⟨label⟩ OCTA ⟨expression list⟩ is similar, but it first aligns the current location to a multiple of 8; it assembles each expression as an eight-byte value. The current location is advanced by 8n if there are n expressions in the list. Any or all of the expressions may be future references, but they should all be defined as pure numbers eventually.
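
The alignment and advancement rules for BYTE, WYDE, TETRA, and OCTA follow a single pattern, which the C sketch below tries to capture. The names align_up, data_layout, and lay_out_data are hypothetical; the element size is 1, 2, 4, or 8 bytes, and n is the number of expressions in the list.

```c
#include <stdint.h>

/* Round loc up to the next multiple of size, where size is a power of 2. */
uint64_t align_up(uint64_t loc, unsigned size)
{
    return (loc + size - 1) & ~(uint64_t)(size - 1);
}

/* Result of one data pseudo-operation: the equivalent given to the label
   (if there is one) and the current location afterwards. */
typedef struct {
    uint64_t label;
    uint64_t next;
} data_layout;

data_layout lay_out_data(uint64_t loc, unsigned size, unsigned n)
{
    data_layout d;
    d.label = align_up(loc, size);          /* align first, then define the label */
    d.next = d.label + (uint64_t)size * n;  /* advance by size * n bytes */
    return d;
}
```

With loc = #1000, size = 1, and n = 6 this reproduces the BYTE "Hello",0 example: the label is #1000 and the next location is #1006.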


18. Global registers are important for accessing memory in MMIX programs. They could be allocated by hand, and defined with IS instructions, but MMIXAL provides a mechanism that is usually much more convenient:

• ⟨label⟩ GREG ⟨expression⟩ allocates a new global register, and assigns its number as the equivalent of the label. At the beginning of assembly, the current global threshold G is $255. Each distinct GREG instruction decreases G by 1; the final value of G will be the initial value of rG when the assembled program is loaded.

The value of the expression will be loaded into the global register at the beginning of the program. If this value is nonzero, it should remain constant throughout the program execution; such global registers are considered to be base addresses. Two or more base addresses with the same constant value are assigned to the same global register number.

Base addresses can simplify memory accesses in an important way. Suppose, for example, five octabyte values appear in a data segment, and their addresses are called AA, BB, CC, DD, and EE:

AA LOC @+8; BB LOC @+8; CC LOC @+8; DD LOC @+8; EE LOC @+8

Then if you say Base GREG AA, you will be able to write simply `LDO $1,AA' to bring AA into register $1, and `LDO $2,CC' to bring CC into register $2.

Here's how it works: Whenever a memory operation such as LDO or STB or GO has only two operands, the second operand should be a pure number whose value can be expressed as b+δ, where 0 ≤ δ < 256 and b is the value of a base address in one of the preceding GREG commands. The MMIXAL processor will find the closest base address and manufacture an appropriate command. For example, the instruction `LDO $2,CC' in the example of the preceding paragraph would be converted automatically to `LDO $2,Base,16'.

If no base address is close enough, an error message will be generated, unless this program is run with the -x option on the command line. The -x option inserts additional instructions if necessary, using global register 255, so that any address is accessible. For example, if there is no base address that allows LDO $2,FF to be implemented in a single instruction, but if FF equals Base+1000, then the -x option would assemble two instructions,

SETL $255,1000; LDO $2,Base,$255

in place of LDO $2,FF. Caution: The -x feature makes the number of actual MMIX instructions hard to predict, so extreme care must be used if your style of coding includes relative branch instructions in dangerous forms like `BNZ x,@+8'.

This base address convention can be used also with the alias operation LDA. For example, `LDA $3,CC' loads the address of CC into register 3, by assembling the instruction `ADDU $3,Base,16'.
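
The search for the closest base address can be pictured with the following C sketch. It is an assumption-laden illustration rather than the assembler's actual routine: base_addr and find_base are hypothetical names, and the table is assumed to hold only the global registers that were declared by GREG with a nonzero value.

```c
#include <stddef.h>
#include <stdint.h>

/* One base address: a global register number and its constant value. */
typedef struct {
    unsigned reg;
    uint64_t value;
} base_addr;

/* Find the base b closest below addr with addr = b + delta, 0 <= delta < 256.
   Returns the register number and stores delta, or returns -1 if no base
   address is close enough. */
int find_base(const base_addr *bases, size_t n, uint64_t addr, unsigned *delta)
{
    int best = -1;
    uint64_t best_delta = 256;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t d = addr - bases[i].value;  /* wraps to a huge value if base > addr */
        if (d < best_delta) {
            best_delta = d;
            best = (int)bases[i].reg;
        }
    }
    if (best >= 0)
        *delta = (unsigned)best_delta;
    return best;
}
```

With a single base address equal to AA in register 254, looking up AA+16 yields register 254 and delta 16, which corresponds to the `LDO $2,Base,16' example; looking up AA+1000 fails, which is the situation that the -x option addresses.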


MMIXAL also allows a two-operand form for memory operations such as

LDO $1,$2

to be an abbreviation for `LDO $1,$2,0'.

When MMIXAL programs use subroutines with a memory stack in addition to the built-in register stack, they usually begin with the instructions `sp GREG 0; fp GREG 0'; these instructions allocate a stack pointer sp=$254 and a frame pointer fp=$253. However, subroutine libraries are free to implement any conventions for global registers and stacks that they like.

19. Short programs rarely run out of global registers, but long programs need a mechanism to check that GREG hasn't been used too often. The following pseudo-instruction provides the necessary safety valve:

• LOCAL ⟨expression⟩ ensures that the expression will be a local register in the program being assembled. The expression should be a register number, and the label field should be blank. At the close of assembly, MMIXAL will report an error if the final value of G does not exceed all register numbers that are declared local in this way.

A LOCAL instruction need not be given unless the register number is 32 or more. (MMIX always considers $0 through $31 to be local, so MMIXAL implicitly acts as if the instruction `LOCAL $31' were present.)

20. Finally, there are two pseudo-instructions to pass information and hints to the loading routine and/or to debuggers that will be using the assembled program.

• BSPEC ⟨expression⟩ begins "special mode"; the ⟨expression⟩ should have a value that fits in two bytes, and the label field should be blank.

• ESPEC ends "special mode"; the operand field is ignored, and the label field should be blank.

All material assembled between BSPEC and ESPEC is passed directly to the output, but not loaded as part of the assembled program. Ordinary MMIX instructions cannot appear in special mode; only the pseudo-operations IS, PREFIX, BYTE, WYDE, TETRA, OCTA, GREG, and LOCAL are allowed. The operand of BSPEC should have a value that fits in two bytes; this value identifies the kind of data that follows. (For example, BSPEC 0 might introduce information about subroutine calling conventions at the current location, and BSPEC 1 might introduce line numbers from a high-level-language program that was compiled into the code at the current place. System routines often need to pass such information through an assembler to the operating system, hence MMIXAL provides a general-purpose conduit.)


21. A program should begin at the special symbolic location Main (more precisely, at the address corresponding to the fully qualified symbol :Main). This symbol always has serial number 1, and it must always be defined.

Locations should not receive assembled data more than once. (More precisely, the loader will load the bitwise xor of all the data assembled for each byte position; but the general rule "do not load two things into the same byte" is safest.) All locations that do not receive assembled data are initially zero, except that the loading routine will put register stack data into segment 3, and the operating system may put command-line data and debugger data into segment 2. (The rudimentary MMIX operating system starts a program with the number of command-line arguments in $0, and a pointer to the beginning of an array of argument pointers in $1.) Segments 2 and 3 should not get assembled data, unless the user is a true hacker who is willing to take the risk that such data might crash the system.


22. Binary MMO output. When the MMIXAL processor assembles a file called foo.mms, it produces a binary output file called foo.mmo. (The suffix mms stands for "MMIX symbolic," and mmo stands for "MMIX object.") Such mmo files have a simple structure consisting of a sequence of tetrabytes. Some of the tetrabytes are instructions to a loading routine; others are data to be loaded.

Loader instructions are distinguished from tetrabytes of data by their first (most significant) byte, which has the special escape-code value #98, called mm in the program below. This code value corresponds to MMIX's opcode LDVTS, which is unlikely to occur in tetras of data. The second byte X of a loader instruction is the loader opcode, called the lopcode. The third and fourth bytes, Y and Z, are operands. Sometimes they are combined into a single 16-bit operand called YZ.

#define mm #98
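
To make the byte layout concrete, here is a small C sketch that tests for the escape code and splits a loader instruction into its fields. The names MM, lop_fields, is_loader_instruction, and split_lop are hypothetical, and the sketch ignores the fact that a tetrabyte following lop_quote is data even when it begins with #98.

```c
#include <stdint.h>

#define MM 0x98   /* the escape code, written #98 in the text */

/* Fields of a loader instruction. */
typedef struct {
    unsigned x;    /* the lopcode */
    unsigned y;
    unsigned z;
    unsigned yz;   /* Y and Z combined into one 16-bit operand */
} lop_fields;

/* Nonzero if the most significant byte of t is the escape code. */
int is_loader_instruction(uint32_t t)
{
    return (t >> 24) == MM;
}

lop_fields split_lop(uint32_t t)
{
    lop_fields f;
    f.x = (t >> 16) & 0xff;
    f.y = (t >> 8) & 0xff;
    f.z = t & 0xff;
    f.yz = t & 0xffff;
    return f;
}
```

Applied to the tetrabyte #98090101, this gives X = 9, Y = 1, and Z = 1.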


23. A small, contrived example will help explain the basic ideas of mmo format. Consider the following input file, called test.mms:

    % A peculiar example of MMIXAL
         LOC   Data_Segment   % location #2000000000000000
         OCTA  1F             % a future reference
    a    GREG  @              % $254 is base address for ABCD
    ABCD BYTE  "ab"           % two bytes of data
         LOC   #123456789     % switch to the instruction segment
    Main JMP   1F             % another future reference
         LOC   @+#4000        % skip past 16384 bytes
    2H   LDB   $3,ABCD+1      % use the base address
         BZ    $3,1F; TRAP    % and refer to the future again
    # 3 "foo.mms"             % this comment is a line directive
         LOC   2B-4*10        % move 10 tetras before previous location
    1H   JMP   2B             % resolve previous references to 1F
         BSPEC 5              % begin special data of type 5
         TETRA &a<<8          % four bytes of special data
         WYDE  a-$0           % two more bytes of special data
         ESPEC                % end a special data packet
         LOC   ABCD+2         % resume the data segment
         BYTE  "cd",#98       % assemble three more bytes of data

It defines a silly program that essentially puts 'b' into register 3; the program halts when it gets to an all-zero TRAP instruction following the BZ. But the assembled output of this file illustrates most of the features of MMIX objects, and in fact test.mms was the first test file tried by the author when the MMIXAL processor was originally written.

The binary output file test.mmo assembled from test.mms consists of the following tetrabytes, shown in hexadecimal notation with brief comments. Fuller explanations appear with the descriptions of individual lopcodes below.

    98090101  lop_pre 1,1 (preamble, version 1, 1 tetra)
    36f4a363  (the file creation time)
    98012001  lop_loc #20,1 (data segment, 1 tetra)
    00000000  (low tetrabyte of address in data segment)
    00000000  (high tetrabyte of OCTA 1F)
    00000000  (low tetrabyte, will be fixed up later)
    61620000  ("ab", padded with trailing zeros)
    98010002  lop_loc 0,2 (instruction segment, 2 tetras)
    00000001  (high tetrabyte of address in instruction segment)
    2345678c  (low tetrabyte of address, after alignment)
    98060002  lop_file 0,2 (file name 0, 2 tetras)
    74657374  ("test")
    2e6d6d73  (".mms")
    98070007  lop_line 7 (line 7 of the current file)
    f0000000  (JMP 1F, will be fixed up later)
    98024000  lop_skip #4000 (advance 16384 bytes)
    98070009  lop_line 9 (line 9 of the current file)
    8103fe01  (LDB $3,a,1, uses base address a)
    42030000  (BZ $3,1F, will be fixed later)
    9807000a  lop_line 10 (stay on line 10)
    00000000  (TRAP)
    98010002  lop_loc 0,2 (instruction segment, 2 tetras)
    00000001  (high tetrabyte of address in instruction segment)
    2345a768  (low tetrabyte of address 1H)
    98050010  lop_fixrx 16 (fix 16-bit relative address)
    0100fff5  (fixup for location @-4*-11)
    98040ff7  lop_fixr #ff7 (fix @-4*#ff7)
    98032001  lop_fixo #20,1 (data segment, 1 tetra)
    00000000  (low tetrabyte of data segment address to fix)
    98060102  lop_file 1,2 (file name 1, 2 tetras)
    666f6f2e  ("foo.")
    6d6d7300  ("mms", 0)
    98070004  lop_line 4 (line 4 of the current file)
    f000000a  (JMP 2B)
    98080005  lop_spec 5 (begin special data of type 5)
    00000200  (TETRA &a<<8)
    00fe0000  (WYDE a-$0)
    98012001  lop_loc #20,1 (data segment, 1 tetra)
    0000000a  (low tetrabyte of address in data segment)
    00006364  ("cd" with leading zeros, because of alignment)
    98000001  lop_quote (don't treat next tetrabyte as a lopcode)
    98000000  (BYTE #98, padded with trailing zeros)
    980a00fe  lop_post $254 (begin postamble, G is 254)
    20000000  (high tetrabyte of the initial contents of $254)
    00000008  (low tetrabyte of base address $254)
    00000001  (high tetrabyte of the initial contents of $255)
    2345678c  (low tetrabyte of $255, is address of Main)
    980b0000  lop_stab (begin symbol table)
    203a5040  (compressed form for symbol table as a ternary trie)
    50404020
    41204220
    43094408
    83404020  (ABCD = #2000000000000008, serial 3)
    4d206120
    69056e01
    2345678c
    81400f61  (Main = #000000012345678c, serial 1)
    fe820000  (a = $254, serial 2)
    980c000a  lop_end (end symbol table, 10 tetras)



24. When a tetrabyte of the mmo file does not begin with the escape code, it is loaded into the current location λ, and λ is increased to the next higher multiple of 4. (If λ is not a multiple of 4, the tetrabyte actually goes into location λ ∧ (−4) = 4⌊λ/4⌋, according to MMIX's usual conventions.) The current line number is also increased by 1, if it is nonzero.

When a tetrabyte does begin with the escape code, its next byte is the lopcode defining a loader instruction. There are thirteen lopcodes:

• lop_quote: X = #00, YZ = 1. Treat the next tetra as an ordinary tetrabyte, even if it begins with the escape code.

• lop_loc: X = #01, Y = high byte, Z = tetra count (Z = 1 or 2). Set the current location to the 64-bit address defined by the next Z tetras, plus 2⁵⁶Y. Usually Y = 0 (for the instruction segment) or Y = #20 (for the data segment). If Z = 2, the high tetra appears first.

• lop_skip: X = #02, YZ = delta. Increase the current location by YZ.

• lop_fixo: X = #03, Y = high byte, Z = tetra count (Z = 1 or 2). Load the value of the current location λ into octabyte P, where P is the 64-bit address defined by the next Z tetras plus 2⁵⁶Y as in lop_loc. (The octabyte at P was previously assembled as zero because of a future reference.)

• lop_fixr: X = #04, YZ = delta. Load YZ into the YZ field of the tetrabyte in location P, where P is λ − 4YZ, namely the address that precedes the current location by YZ tetrabytes. (This tetrabyte was previously loaded with an MMIX instruction that takes a relative address: a branch, probable branch, JMP, PUSHJ, or GETA. Its YZ field was previously assembled as zero because of a future reference.)

• lop_fixrx: X = #05, Y = 0, Z = 16 or 24. Proceed as in lop_fixr, but load δ into tetrabyte P = λ − 4δ instead of loading YZ into P = λ − 4YZ. Here δ is the value of the tetrabyte following the lop_fixrx instruction; its leading byte will be either 0 or 1. If the leading byte is 1, δ should be treated as the negative number (δ ∧ #ffffff) − 2^Z (that is, 2 raised to the power Z) when calculating the address P. (The latter case arises only rarely, but it is needed when fixing up a relative "future" reference that ultimately leads to a "backward" instruction. The value of δ that is xored into location P in such cases will change BZ to BZB, or JMP to JMPB, etc.; we have Z = 24 when fixing a JMP, Z = 16 otherwise.)

• lop_file: X = #06, Y = file number, Z = tetra count. Set the current file number to Y and the current line number to zero. If this file number has occurred previously, Z should be zero; otherwise Z should be positive, and the next Z tetrabytes are the characters of the file name in big-endian order. Trailing zeros follow the file name if its length is not a multiple of 4.

• lop_line: X = #07, YZ = line number. Set the current line number to YZ. If the line number is nonzero, the current file and current line should correspond to the source location that generated the next data to be loaded, for use in diagnostic messages. (The MMIXAL processor gives precise line numbers to the sources of tetrabytes in segment 0, which tend to be instructions, but not to the sources of tetrabytes assembled in other segments.)

• lop_spec: X = #08, YZ = type. Begin special data of type YZ. The subsequent tetrabytes, continuing until the next loader operation other than lop_quote, comprise the special data. A lop_quote instruction allows tetrabytes of special data to begin with the escape code.

• lop_pre: X = #09, Y = 1, Z = tetra count. A lop_pre instruction, which defines the "preamble," must be the first tetrabyte of every mmo file. The Y field specifies the version number of mmo format, currently 1; other version numbers may be defined later, but version 1 should always be supported as described in the present document. The Z tetrabytes following a lop_pre command provide additional information that might be of interest to system routines. If Z > 0, the first tetra of additional information records the time that this mmo file was created, measured in seconds since 00:00:00 Greenwich Mean Time on 1 Jan 1970.

• lop_post: X = #0a, Y = 0, Z = G (must be 32 or more). This instruction begins the postamble, which follows all instructions and data to be loaded. It causes the loaded program to begin with rG equal to the stated value of G, and with $G, G+1, ..., $255 initially set to the values of the next (256 − G) × 2 tetrabytes. These tetrabytes specify 256 − G octabytes in big-endian fashion (high half first).

• lop_stab: X = #0b, YZ = 0. This instruction must appear immediately after the (256 − G) × 2 tetrabytes following lop_post. It is followed by the symbol table, which lists the equivalents of all user-defined symbols in a compact form that will be described later.

• lop_end: X = #0c, YZ = tetra count. This instruction must be the very last tetrabyte of each mmo file. Furthermore, exactly YZ tetrabytes must appear between it and the lop_stab command. (Therefore a program can easily find the symbol table without reading forward through the entire mmo file.)

A separate routine called MMOtype is available to translate binary mmo files into human-readable form.

#define lop_quote #0 /* the quotation lopcode */
#define lop_loc #1 /* the location lopcode */
#define lop_skip #2 /* the skip lopcode */
#define lop_fixo #3 /* the octabyte-fix lopcode */
#define lop_fixr #4 /* the relative-fix lopcode */
#define lop_fixrx #5 /* extended relative-fix lopcode */
#define lop_file #6 /* the file name lopcode */
#define lop_line #7 /* the file position lopcode */
#define lop_spec #8 /* the special hook lopcode */
#define lop_pre #9 /* the preamble lopcode */
#define lop_post #a /* the postamble lopcode */
#define lop_stab #b /* the symbol table lopcode */
#define lop_end #c /* the end-it-all lopcode */
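
Two of the address computations described in this section can be written out as a short C sketch. The function names lop_address, fixrx_delta, and fixrx_target are hypothetical and are not part of any MMIX loader; the sketch covers only the arithmetic, not the reading of the file.

```c
#include <stdint.h>

/* Address used by lop_loc and lop_fixo: the 64-bit value given by the next
   Z tetras (high tetra first when Z = 2), plus 2^56 times Y. */
uint64_t lop_address(unsigned y, unsigned z, const uint32_t *tetras)
{
    uint64_t a;
    if (z == 2)
        a = ((uint64_t)tetras[0] << 32) | tetras[1];
    else
        a = tetras[0];
    return a + ((uint64_t)y << 56);
}

/* Signed delta encoded in the tetrabyte t that follows lop_fixrx; z is the
   Z field of the lop_fixrx instruction (16 or 24). */
int32_t fixrx_delta(uint32_t t, unsigned z)
{
    int32_t d = (int32_t)(t & 0xffffff);
    if ((t >> 24) == 1)
        d -= (int32_t)1 << z;   /* leading byte 1 marks a negative delta */
    return d;
}

/* Location P = lambda - 4*delta of the tetrabyte to be fixed. */
uint64_t fixrx_target(uint64_t lambda, uint32_t t, unsigned z)
{
    int64_t d = fixrx_delta(t, z);
    return lambda - (uint64_t)(4 * d);   /* modular arithmetic on 64 bits */
}
```

In the sample file of section 23, the tetrabyte #0100fff5 after `lop_fixrx 16' decodes to delta = −11, so with the current location at #12345a768 the fixup applies to location #12345a794, where the BZ instruction was assembled.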


25. Many readers will have noticed that MMIXAL has no facilities for relocatable output, nor does mmo format support such features. The author's first drafts of MMIXAL and mmo did allow relocatable objects, with external linkages, but the rules were substantially more complicated and therefore inconsistent with the goals of The Art of Computer Programming. The present design might actually prove to be superior to the current practice, now that computer memory is significantly cheaper than it used to be, because one-pass assembly and loading are extremely fast when relocatability and external linkages are disallowed. Different program modules can be assembled together about as fast as they could be linked together under a relocatable scheme, and they can communicate with each other in much more flexible ways. Debugging tools are enhanced when open-source libraries are combined with user programs, and such libraries will certainly improve in quality when their source form is accessible to a larger community of users.


26. Basic data types. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the simulators. The definition of type tetra should be changed, if necessary, to conform with the definitions found in MMIX-ARITH.

⟨Type definitions 26⟩ ≡
typedef unsigned int tetra; /* assumes that an int is exactly 32 bits wide */
typedef struct {
  tetra h, l;
} octa; /* two tetrabytes makes one octabyte */
typedef enum {
  false, true
} bool;

See also sections 30, 54, 58, 62, 68, and 82.

This code is used in section 136.

27. ⟨Global variables 27⟩ ≡
extern octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
extern octa neg_one; /* neg_one.h = neg_one.l = -1 */
extern octa aux; /* auxiliary output of a subroutine */
extern bool overflow; /* set by certain subroutines for signed arithmetic */

See also sections 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, and 143.

This code is used in section 136.

28. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

⟨Subroutines 28⟩ ≡
extern octa oplus ARGS((octa y, octa z)); /* unsigned y + z */
extern octa ominus ARGS((octa y, octa z)); /* unsigned y - z */
extern octa incr ARGS((octa y, int delta)); /* unsigned y + δ (δ is signed) */
extern octa oand ARGS((octa y, octa z)); /* y ∧ z */
extern octa shift_left ARGS((octa y, int s)); /* y ≪ s, 0 ≤ s ≤ 64 */
extern octa shift_right ARGS((octa y, int s, int uns)); /* y ≫ s, signed if !uns */
extern octa omult ARGS((octa y, octa z)); /* unsigned (aux, x) = y × z */
extern octa odiv ARGS((octa x, octa y, octa z)); /* unsigned (x, y)/z; aux = (x, y) mod z */

See also sections 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, and 74.

This code is used in section 136.
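To make the two-halves representation concrete, here is a minimal sketch of how a 64-bit unsigned sum can be formed from 32-bit pieces. It is an illustration written for this passage, not a copy of the routine in MMIX-ARITH; the name oplus_sketch is hypothetical, and uint32_t is assumed in place of unsigned int so that the 32-bit width is guaranteed.

```c
#include <stdint.h>

typedef uint32_t tetra;                 /* assumption: exact 32-bit type */
typedef struct { tetra h, l; } octa;    /* high and low halves */

/* Unsigned y + z modulo 2^64, computed with 32-bit arithmetic only. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.h = y.h + z.h;
    x.l = y.l + z.l;
    if (x.l < y.l) x.h++;   /* the low half wrapped around, so carry into the high half */
    return x;
}
```

The comparison x.l < y.l detects the carry because unsigned addition wraps modulo 2^32; no wider integer type is needed.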

ARGS = macro ( ), §31. aux: octa, MMIX-ARITH §4. incr: octa ( ), MMIX-ARITH §6. neg_one: octa, MMIX-ARITH §4. oand: octa ( ), MMIX-ARITH §25. odiv: octa ( ), MMIX-ARITH §13. ominus: octa ( ), MMIX-ARITH §5. omult: octa ( ), MMIX-ARITH §8. oplus: octa ( ), MMIX-ARITH §5. overflow: bool, MMIX-ARITH §4. shift_left: octa ( ), MMIX-ARITH §7. shift_right: octa ( ), MMIX-ARITH §7. zero_octa: octa, MMIX-ARITH §4.


29. Here's a rudimentary check to see if arithmetic is in trouble.

⟨Initialize everything 29⟩ ≡
acc = shift_left(neg_one, 1);
if (acc.h != 0xffffffff) panic("Type tetra is not implemented correctly");

See also sections 32, 61, 71, 84, 91, and 140.

This code is used in section 136.

30. Future versions of this program will work with symbols formed from Unicode characters, but the present code limits itself to an 8-bit subset. The type Char is defined here in order to ease the later transition: At present, Char is the same as char, but Char can be changed to a 16-bit type in the Unicode version.

Other changes will also be necessary when the transition to Unicode is made; for example, some calls of fprintf will become calls of fwprintf, and some occurrences of %s will become %ls in print formats. The switchable type name Char provides at least a first step towards a brighter future with Unicode.

⟨Type definitions 26⟩ +≡
typedef char Char; /* bytes that will become wydes some day */

31. While we're talking about classic systems versus future systems, we might as well define the ARGS macro, which makes function prototypes available on ANSI C systems without making them uncompilable on older systems. Each subroutine below is declared first with a prototype, then with an old-style definition.

⟨Preprocessor definitions 31⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

See also section 39.

This code is used in section 136.


32. Basic input and output. Input goes into a buffer that is normally limited to 72 characters. This limit can be raised, by using the -b option when invoking the assembler; but short buffers will keep listings from becoming unwieldy, because a symbolic listing adds 19 characters per line.

⟨Initialize everything 29⟩ +≡
if (buf_size < 72) buf_size = 72;
buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
lab_field = (Char *) calloc(buf_size + 1, sizeof(Char));
op_field = (Char *) calloc(buf_size, sizeof(Char));
operand_list = (Char *) calloc(buf_size, sizeof(Char));
err_buf = (Char *) calloc(buf_size + 60, sizeof(Char));
if (!buffer || !lab_field || !op_field || !operand_list || !err_buf)
  panic("No room for the buffers");

33. ⟨Global variables 27⟩ +≡
Char *buffer; /* raw input of the current line */
Char *buf_ptr; /* current position within buffer */
Char *lab_field; /* copy of the label field of the current instruction */
Char *op_field; /* copy of the opcode field of the current instruction */
Char *operand_list; /* copy of the operand field of the current instruction */
Char *err_buf; /* place where dynamic error messages are sprinted */

34. ⟨Get the next line of input text, or break if the input has ended 34⟩ ≡
if (!fgets(buffer, buf_size + 1, src_file)) break;
line_no++;
line_listed = false;
j = strlen(buffer);
if (buffer[j - 1] == '\n') buffer[j - 1] = '\0'; /* remove the newline */
else if (fgetc(src_file) != EOF) ⟨Flush the excess part of an overlong line 35⟩;
if (buffer[0] == '#') ⟨Check for a line directive 38⟩;
buf_ptr = buffer;

This code is used in section 136.

__STDC__, Standard C. acc: octa, §83. buf_size: int, §139. calloc: void *( ), <stdlib.h>. EOF = (-1), <stdio.h>. false = 0, §26. fgetc: int ( ), <stdio.h>. fgets: char *( ), <stdio.h>. fprintf: int ( ), <stdio.h>. fwprintf: int ( ), multibyte function. h: tetra, §26. j: register int, §136. line_listed: bool, §36. line_no: int, §36. neg_one: octa, MMIX-ARITH §4. panic = macro ( ), §45. shift_left: octa ( ), MMIX-ARITH §7. src_file: FILE *, §139. strlen: int ( ), <string.h>.


35. ⟨Flush the excess part of an overlong line 35⟩ ≡
{
  do j = fgetc(src_file); while (j != '\n' && j != EOF);
  if (!long_warning_given) {
    long_warning_given = true;
    err("*trailing characters of long input line have been dropped");
    fprintf(stderr,
        "(say `-b <number>' to increase the length of my input buffer)\n");
  } else err("*trailing characters dropped");
}

This code is used in section 34.
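The bounded-buffer technique of sections 34 and 35 can be sketched as a standalone function. This is an illustrative sketch, not the assembler's own code: the name read_line and the truncated output parameter are hypothetical, and the sketch tests each character before discarding it, so a line that exactly fills the buffer is not reported as truncated.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf, which must have room for buf_size + 1 chars.
   Returns 1 if a line was read and 0 at end of input; *truncated is set
   to 1 when trailing characters of an overlong line were dropped. */
int read_line(FILE *f, char *buf, int buf_size, int *truncated)
{
    size_t n;
    int c;
    *truncated = 0;
    if (!fgets(buf, buf_size + 1, f)) return 0;
    n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n') buf[n - 1] = '\0';  /* remove the newline */
    else {
        /* discard the remainder of the line, up to and including the newline */
        while ((c = fgetc(f)) != '\n' && c != EOF) *truncated = 1;
    }
    return 1;
}
```

A caller could issue the one-time hint about a larger buffer the first time truncated comes back nonzero, in the spirit of long_warning_given.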

36. ⟨Global variables 27⟩ +≡
int cur_file; /* index of the current file in filename */
int line_no; /* current position in the file */
bool line_listed; /* have we listed the buffer contents? */
bool long_warning_given; /* have we given the hint about -b? */

37. We keep track of source file name and line number at all times, for error reporting and for synchronization data in the object file. Up to 256 different source file names can be remembered.

⟨Global variables 27⟩ +≡
char *filename[257]; /* source file names, including those in line directives */
int filename_count; /* how many filename entries have we filled? */

38. If the current line is a line directive, it will also be treated as a comment by the assembler.

⟨Check for a line directive 38⟩ ≡
{
  for (p = buffer + 1; isspace(*p); p++) ;
  for (j = *p++ - '0'; isdigit(*p); p++) j = 10 * j + *p - '0';
  for ( ; isspace(*p); p++) ;
  if (*p == '\"') {
    if (!filename[filename_count]) {
      filename[filename_count] = (Char *) calloc(FILENAME_MAX + 1, sizeof(Char));
      if (!filename[filename_count])
        panic("Capacity exceeded: Out of filename memory");
    }
    for (p++, q = filename[filename_count]; *p && *p != '\"'; p++, q++) *q = *p;
    if (*p == '\"' && *(p - 1) != '\"') { /* yes, it's a line directive */
      *q = '\0';
      for (k = 0; strcmp(filename[k], filename[filename_count]) != 0; k++) ;
      if (k == filename_count) filename_count++;
      cur_file = k;
      line_no = j - 1;
    }
  }
}

This code is used in section 34.
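The scanning steps of section 38 can be illustrated with a small standalone parser for lines of the form # 13 "foo.mms". This is a sketch under stated assumptions: the function name parse_line_directive and its output parameters are hypothetical, and it is stricter than the code above in that it insists on at least one digit and checks the capacity of the name buffer.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 and fills *line and name if buf is a line directive, else 0. */
int parse_line_directive(const char *buf, int *line, char *name, size_t cap)
{
    const char *p = buf;
    size_t n = 0;
    int num = 0;
    if (*p++ != '#') return 0;
    while (isspace((unsigned char)*p)) p++;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) num = 10 * num + (*p++ - '0');
    while (isspace((unsigned char)*p)) p++;
    if (*p++ != '"') return 0;
    while (*p && *p != '"') {            /* copy the file name */
        if (n + 1 >= cap) return 0;
        name[n++] = *p++;
    }
    if (*p != '"' || n == 0) return 0;   /* need a closing quote and a nonempty name */
    name[n] = '\0';
    *line = num;
    return 1;
}
```

The assembler itself stores j - 1 in line_no, because the directive names the number of the line that follows it; the sketch simply returns the number as written.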


39. Archaic versions of the C library do not define FILENAME_MAX.

⟨Preprocessor definitions 31⟩ +≡
#ifndef FILENAME_MAX
#define FILENAME_MAX 256
#endif

40. ⟨Local variables 40⟩ ≡
register Char *p, *q; /* the place where we're currently scanning */

See also section 65.

This code is used in section 136.

41. The next several subroutines are useful for preparing a listing of the assembled results. In such a listing, which the user can request with a command-line option, we fill the leftmost 19 columns with a representation of the output that has been assembled from the input in the buffer. Sometimes the assembled output requires more than one line, because we have room to output only a tetrabyte per line.

The flush_listing_line subroutine is called when we have finished generating one line's worth of assembled material. Its parameter is a string to be printed between the assembled material and the buffer contents, if the input line hasn't yet been echoed. The length of this string should be 19 minus the number of characters already printed on the current line of the listing.

⟨Subroutines 28⟩ +≡
void flush_listing_line ARGS((char *));
void flush_listing_line(s)
  char *s;
{
  if (line_listed) fprintf(listing_file, "\n");
  else {
    fprintf(listing_file, "%s%s\n", s, buffer);
    line_listed = true;
  }
}

ARGS = macro ( ), §31. bool: enum, §26. buffer: Char *, §33. calloc: void *( ), <stdlib.h>. Char = char, §30. EOF = (-1), <stdio.h>. err = macro ( ), §45. fgetc: int ( ), <stdio.h>. FILENAME_MAX = macro, <stdio.h>. fprintf: int ( ), <stdio.h>. isdigit: int ( ), <ctype.h>. isspace: int ( ), <ctype.h>. j: register int, §136. k: register int, §136. listing_file: FILE *, §139. panic = macro ( ), §45. src_file: FILE *, §139. stderr: FILE *, <stdio.h>. strcmp: int ( ), <string.h>. true = 1, §26.


42. Only the three least significant hex digits of a location are shown on the listing, unless the other digits have changed. The following subroutine prints an extra line when a change needs to be shown.

⟨Subroutines 28⟩ +≡
void update_listing_loc ARGS((int));
void update_listing_loc(k)
  int k; /* the location to display, mod 4 */
{
  octa o;
  if (cur_loc.h != listing_loc.h || ((cur_loc.l ^ listing_loc.l) & 0xfffff000)) {
    fprintf(listing_file, "%08x%08x:", cur_loc.h, (cur_loc.l & -4) | k);
    flush_listing_line("  ");
  }
  listing_loc.h = cur_loc.h; listing_loc.l = (cur_loc.l & -4) | k;
}

43. ⟨Global variables 27⟩ +≡
octa cur_loc; /* current location of assembled output */
octa listing_loc; /* current location on the listing */
unsigned char hold_buf[4]; /* assembled bytes */
unsigned char held_bits; /* which bytes of hold_buf are active? */
unsigned char listing_bits; /* which of them haven't been listed yet? */
bool spec_mode; /* are we between BSPEC and ESPEC? */
tetra spec_mode_loc; /* number of bytes in the current special output */

44. When bytes are assembled, they are placed into the hold_buf. More precisely, a byte assembled for a location that is j plus a multiple of 4 is placed into hold_buf[j]; two auxiliary variables, held_bits and listing_bits, are then increased by 1 ≪ j. Furthermore, listing_bits is increased by #10 ≪ j if that byte is a future reference to be resolved later.

The bytes are held until we need to output them. The listing_clear routine lists any that have been held but not yet shown. It should be called only when listing_bits ≠ 0.

⟨Subroutines 28⟩ +≡
void listing_clear ARGS((void));
void listing_clear()
{
  register int j, k;
  for (k = 0; k < 4; k++)
    if (listing_bits & (1 << k)) break;
  if (spec_mode) fprintf(listing_file, "         ");
  else {
    update_listing_loc(k);
    fprintf(listing_file, " ...%03x: ", (listing_loc.l & 0xffc) | k);
  }
  for (j = 0; j < 4; j++)
    if (listing_bits & (0x10 << j)) fprintf(listing_file, "xx");
    else if (listing_bits & (1 << j)) fprintf(listing_file, "%02x", hold_buf[j]);
    else fprintf(listing_file, "  ");
  flush_listing_line("  ");
  listing_bits = 0;
}

ARGS = macro ( ), §31. bool: enum, §26. BSPEC = #104, §62. ESPEC = #105, §62. flush_listing_line: void ( ), §41. fprintf: int ( ), <stdio.h>. h: tetra, §26. l: tetra, §26. listing_file: FILE *, §139. octa = struct, §26. tetra = unsigned int, §26.


45. Error messages are written to stderr. If the message begins with `*' it is merely a warning; if it begins with `!' it is fatal; otherwise the error is probably serious enough to make manual correction necessary, yet it is not tragic. Errors and warnings appear also on the optional listing file.

#define err(m) { report_error(m); if (m[0] != '*') goto bypass; }
#define derr(m, p) { sprintf(err_buf, m, p); \
    report_error(err_buf); if (err_buf[0] != '*') goto bypass; }
#define dderr(m, p, q) { sprintf(err_buf, m, p, q); \
    report_error(err_buf); if (err_buf[0] != '*') goto bypass; }
#define panic(m) { sprintf(err_buf, "!%s", m); report_error(err_buf); }
#define dpanic(m, p) { err_buf[0] = '!'; sprintf(err_buf + 1, m, p); report_error(err_buf); }

⟨Subroutines 28⟩ +≡
void report_error ARGS((char *));
void report_error(message)
  char *message;
{
  if (!filename[cur_file]) filename[cur_file] = "(nofile)";
  if (message[0] == '*')
    fprintf(stderr, "\"%s\", line %d warning: %s\n",
        filename[cur_file], line_no, message + 1);
  else if (message[0] == '!')
    fprintf(stderr, "\"%s\", line %d fatal error: %s\n",
        filename[cur_file], line_no, message + 1);
  else {
    fprintf(stderr, "\"%s\", line %d: %s!\n", filename[cur_file], line_no, message);
    err_count++;
  }
  if (listing_file) {
    if (!line_listed) flush_listing_line("****************** ");
    if (message[0] == '*')
      fprintf(listing_file, "************ warning: %s\n", message + 1);
    else if (message[0] == '!')
      fprintf(listing_file, "******** fatal error: %s!\n", message + 1);
    else fprintf(listing_file, "********** error: %s!\n", message);
  }
  if (message[0] == '!') exit(-2);
}

46. ⟨Global variables 27⟩ +≡
int err_count; /* this many errors were found */

47. Output to the binary obj_file occurs four bytes at a time. The bytes are assembled in small buffers, not output as single tetrabytes, because we want the output to be big-endian even when the assembler is running on a little-endian machine.

#define mmo_write(buf) \
  if (fwrite(buf, 1, 4, obj_file) != 4) dpanic("Can't write on %s", obj_file_name)

⟨Subroutines 28⟩ +≡
void mmo_clear ARGS((void));
void mmo_out ARGS((void));
unsigned char lop_quote_command[4] = {mm, lop_quote, 0, 1};

void mmo_clear() /* clears hold_buf, when held_bits != 0 */
{
  if (hold_buf[0] == mm) mmo_write(lop_quote_command);
  mmo_write(hold_buf);
  if (listing_file && listing_bits) listing_clear();
  held_bits = 0;
  hold_buf[0] = hold_buf[1] = hold_buf[2] = hold_buf[3] = 0;
  mmo_cur_loc = incr(mmo_cur_loc, 4); mmo_cur_loc.l &= -4;
  if (mmo_line_no) mmo_line_no++;
}

unsigned char mmo_buf[4];
int mmo_ptr;

void mmo_out() /* output the contents of mmo_buf */
{
  if (held_bits) mmo_clear();
  mmo_write(mmo_buf);
}

ARGS = macro ( ), §31. bypass: label, §102. cur_file: int, §36. err_buf: Char *, §33. exit: void ( ), <stdlib.h>. filename: char *[ ], §37. flush_listing_line: void ( ), §41. fprintf: int ( ), <stdio.h>. fwrite: size_t ( ), <stdio.h>. held_bits: unsigned char, §43. hold_buf: unsigned char [ ], §43. incr: octa ( ), MMIX-ARITH §6. l: tetra, §26. line_listed: bool, §36. line_no: int, §36. listing_bits: unsigned char, §43. listing_clear: void ( ), §44. listing_file: FILE *, §139. lop_quote = #0, §24. mm = #98, §22. mmo_cur_loc: octa, §51. mmo_line_no: int, §51. obj_file: FILE *, §139. obj_file_name: char [ ], §139. sprintf: int ( ), <stdio.h>. stderr: FILE *, <stdio.h>.


48. ⟨Subroutines 28⟩ +≡
void mmo_tetra ARGS((tetra));
void mmo_byte ARGS((unsigned char));
void mmo_lop ARGS((char, char, char));
void mmo_lopp ARGS((char, unsigned short));

void mmo_tetra(t) /* output a tetrabyte */
  tetra t;
{
  mmo_buf[0] = t >> 24; mmo_buf[1] = (t >> 16) & 0xff;
  mmo_buf[2] = (t >> 8) & 0xff; mmo_buf[3] = t & 0xff;
  mmo_out();
}

void mmo_byte(b)
  unsigned char b;
{
  mmo_buf[(mmo_ptr++) & 3] = b;
  if (!(mmo_ptr & 3)) mmo_out();
}

void mmo_lop(x, y, z) /* output a loader operation */
  char x, y, z;
{
  mmo_buf[0] = mm; mmo_buf[1] = x; mmo_buf[2] = y; mmo_buf[3] = z;
  mmo_out();
}

void mmo_lopp(x, yz) /* output a loader operation with two-byte operand */
  char x;
  unsigned short yz;
{
  mmo_buf[0] = mm; mmo_buf[1] = x;
  mmo_buf[2] = yz >> 8; mmo_buf[3] = yz & 0xff;
  mmo_out();
}
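The reason for assembling bytes in a small buffer can be seen in isolation with a short sketch. The function name tetra_to_bytes is hypothetical and uint32_t is assumed for the tetrabyte; the shifting and masking mirror what mmo_tetra does before it calls mmo_out.

```c
#include <stdint.h>

/* Split a tetrabyte into four bytes, most significant first, so that the
   byte order written to a file does not depend on the host machine. */
void tetra_to_bytes(uint32_t t, unsigned char out[4])
{
    out[0] = (unsigned char)(t >> 24);
    out[1] = (unsigned char)((t >> 16) & 0xff);
    out[2] = (unsigned char)((t >> 8) & 0xff);
    out[3] = (unsigned char)(t & 0xff);
}
```

Writing out[0..3] with a single fwrite of four bytes then yields big-endian output on big-endian and little-endian hosts alike, whereas writing the integer t directly would not.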

49. The mmo_loc subroutine makes the current location in the object file equal to cur_loc.

⟨Subroutines 28⟩ +≡
void mmo_loc ARGS((void));
void mmo_loc()
{
  octa o;
  if (held_bits) mmo_clear();
  o = ominus(cur_loc, mmo_cur_loc);
  if (o.h == 0 && o.l < 0x10000) {
    if (o.l) mmo_lopp(lop_skip, o.l);
  } else {
    if (cur_loc.h & 0xffffff) {
      mmo_lop(lop_loc, 0, 2);
      mmo_tetra(cur_loc.h);
    } else mmo_lop(lop_loc, cur_loc.h >> 24, 1);
    mmo_tetra(cur_loc.l);
  }
  mmo_cur_loc = cur_loc;
}
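The case analysis in mmo_loc can be summarized by a small decision function. This is only a sketch of the choice being made, with hypothetical names (move_kind, choose_move) and with 64-bit integers assumed in place of the octa structure; it emits nothing.

```c
#include <stdint.h>

typedef enum { MOVE_NONE, MOVE_SKIP, MOVE_LOC1, MOVE_LOC2 } move_kind;

/* Decide how the object file's location could be brought from mmo_cur to cur. */
move_kind choose_move(uint64_t cur, uint64_t mmo_cur)
{
    uint64_t delta = cur - mmo_cur;   /* unsigned difference, wraps modulo 2^64 */
    if (delta < 0x10000)              /* small forward gap: lop_skip, or nothing at all */
        return delta ? MOVE_SKIP : MOVE_NONE;
    if ((cur >> 32) & 0xffffff)       /* high tetrabyte needs more than its top byte */
        return MOVE_LOC2;             /* lop_loc followed by two tetrabytes */
    return MOVE_LOC1;                 /* lop_loc with the top byte in Y, one tetrabyte */
}
```

For example, a jump from the text segment to address #2000000000000000 falls into the MOVE_LOC1 case, because only the top byte of the high tetrabyte is nonzero.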

50. Similarly, the mmo_sync subroutine makes sure that the current file and line number in the output file agree with cur_file and line_no.

⟨Subroutines 28⟩ +≡
void mmo_sync ARGS((void));
void mmo_sync()
{
  register int j, k;
  register char *p;
  if (cur_file != mmo_cur_file) {
    if (filename_passed[cur_file]) mmo_lop(lop_file, cur_file, 0);
    else {
      mmo_lop(lop_file, cur_file, (strlen(filename[cur_file]) + 3) >> 2);
      for (j = 0, p = filename[cur_file]; *p; p++, j = (j + 1) & 3) {
        mmo_buf[j] = *p;
        if (j == 3) mmo_out();
      }
      if (j) {
        for ( ; j < 4; j++) mmo_buf[j] = 0;
        mmo_out();
      }
      filename_passed[cur_file] = 1;
    }
    mmo_cur_file = cur_file;
    mmo_line_no = 0;
  }
  if (line_no != mmo_line_no) {
    if (line_no >= 0x10000)
      panic("I can't deal with line numbers exceeding 65535");
    mmo_lopp(lop_line, line_no);
    mmo_line_no = line_no;
  }
}
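The expression (strlen(...) + 3) >> 2 in mmo_sync rounds the name length up to a whole number of tetrabytes, and the loop pads the final tetrabyte with zeros. A minimal sketch of the same packing follows; the function name pack_name and the caller-supplied output array are assumptions made for illustration.

```c
#include <string.h>

/* Copy name into out, padded with zero bytes to a multiple of four.
   Returns the number of tetrabytes used; out must hold 4 * that many bytes. */
size_t pack_name(const char *name, unsigned char *out)
{
    size_t len = strlen(name);
    size_t tetras = (len + 3) >> 2;   /* round up to whole tetrabytes */
    memset(out, 0, 4 * tetras);
    memcpy(out, name, len);
    return tetras;
}
```

A seven-character name such as foo.mms therefore occupies two tetrabytes, with one zero byte of padding at the end.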

ARGS = macro ( ), §31. cur_file: int, §36. cur_loc: octa, §43. filename: char *[ ], §37. filename_passed: char [ ], §51. h: tetra, §26. held_bits: unsigned char, §43. l: tetra, §26. line_no: int, §36. lop_file = #6, §24. lop_line = #7, §24. lop_loc = #1, §24. lop_skip = #2, §24. mm = #98, §22. mmo_buf: unsigned char [ ], §47. mmo_clear: void ( ), §47. mmo_cur_file: int, §51. mmo_cur_loc: octa, §51. mmo_line_no: int, §51. mmo_out: void ( ), §47. mmo_ptr: int, §47. octa = struct, §26. ominus: octa ( ), MMIX-ARITH §5. panic = macro ( ), §45. strlen: int ( ), <string.h>. tetra = unsigned int, §26.


51. ⟨Global variables 27⟩ +≡
octa mmo_cur_loc; /* current location in the object file */
int mmo_line_no; /* current line number in the mmo output so far */
int mmo_cur_file; /* index of the current file in the mmo output so far */
char filename_passed[256]; /* has a filename been recorded in the output? */

52. Here is a basic subroutine that assembles k bytes starting at cur_loc. The value of k should be 1, 2, or 4, and cur_loc should be a multiple of k. The x_bits parameter tells which bytes, if any, are part of a future reference.

⟨Subroutines 28⟩ +≡
void assemble ARGS((char, tetra, unsigned char));
void assemble(k, dat, x_bits)
  char k;
  tetra dat;
  unsigned char x_bits;
{
  register int j, jj, l;
  if (spec_mode) l = spec_mode_loc;
  else {
    l = cur_loc.l;
    ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩;
    if (!held_bits && !(cur_loc.h & 0xe0000000)) mmo_sync();
  }
  for (j = 0; j < k; j++) {
    jj = (l + j) & 3;
    hold_buf[jj] = (dat >> (8 * (k - 1 - j))) & 0xff;
    held_bits |= 1 << jj;
    listing_bits |= 1 << jj;
  }
  listing_bits |= x_bits;
  if (((l + k) & 3) == 0) {
    if (listing_file) listing_clear();
    mmo_clear();
  }
  if (spec_mode) spec_mode_loc += k;
  else cur_loc = incr(cur_loc, k);
}

53. ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩ ≡
if (cur_loc.h != mmo_cur_loc.h || ((cur_loc.l ^ mmo_cur_loc.l) & 0xfffffffc)) mmo_loc();

This code is used in section 52.
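The byte-placement loop of assemble can be tried out on its own. The sketch below keeps only that loop; the function name hold_bytes and its explicit parameters (in place of the globals hold_buf, held_bits, and cur_loc) are assumptions for illustration, and the listing and output side effects are omitted.

```c
#include <stdint.h>

/* Place the k low-order bytes of dat, most significant first, into the
   four-byte holding buffer at positions loc, loc+1, ... (mod 4).
   Returns 1 when the placement reaches a tetrabyte boundary. */
int hold_bytes(unsigned char hold[4], unsigned *held_bits,
               uint32_t loc, int k, uint32_t dat)
{
    int j;
    for (j = 0; j < k; j++) {
        int jj = (int)((loc + (uint32_t)j) & 3);
        hold[jj] = (unsigned char)((dat >> (8 * (k - 1 - j))) & 0xff);
        *held_bits |= 1u << jj;       /* remember that this byte is active */
    }
    return ((loc + (uint32_t)k) & 3) == 0;
}
```

The return value corresponds to the test ((l + k) & 3) == 0 above, which is the moment at which the assembler lists and writes the completed tetrabyte.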


54. The symbol table. Symbols are stored and retrieved by means of a ternary search trie, following ideas of Bentley and Sedgewick. (See ACM–SIAM Symp. on Discrete Algorithms 8 (1997), 360–369; R. Sedgewick, Algorithms in C (Reading, Mass.: Addison–Wesley, 1998), §15.4.) Each trie node stores a character, and there are branches to subtries for the cases where a given character is less than, equal to, or greater than the character in the trie. There also is a pointer to a symbol table entry if a symbol ends at the current node.

⟨Type definitions 26⟩ +≡
typedef struct ternary_trie_struct {
  unsigned short ch; /* the (possibly wyde) character stored here */
  struct ternary_trie_struct *left, *mid, *right; /* downward in the ternary trie */
  struct sym_tab_struct *sym; /* equivalents of symbols */
} trie_node;

55. We allocate trie nodes in chunks of 1000 at a time.

⟨Subroutines 28⟩ +≡
trie_node *new_trie_node ARGS((void));
trie_node *new_trie_node()
{
  register trie_node *t = next_trie_node;
  if (t == last_trie_node) {
    t = (trie_node *) calloc(1000, sizeof(trie_node));
    if (!t) panic("Capacity exceeded: Out of trie memory");
    last_trie_node = t + 1000;
  }
  next_trie_node = t + 1;
  return t;
}

56. ⟨Global variables 27⟩ +≡
trie_node *trie_root; /* root of the trie */
trie_node *op_root; /* root of subtrie for opcodes */
trie_node *next_trie_node, *last_trie_node; /* allocation control */
trie_node *cur_prefix; /* root of subtrie for unqualified symbols */

ARGS = macro ( ), §31. calloc: void *( ), <stdlib.h>. cur_loc: octa, §43. h: tetra, §26. held_bits: unsigned char, §43. hold_buf: unsigned char [ ], §43. incr: octa ( ), MMIX-ARITH §6. l: tetra, §26. listing_bits: unsigned char, §43. listing_clear: void ( ), §44. listing_file: FILE *, §139. mmo_clear: void ( ), §47. mmo_loc: void ( ), §49. mmo_sync: void ( ), §50. octa = struct, §26. panic = macro ( ), §45. spec_mode: bool, §43. spec_mode_loc: tetra, §43. sym_tab_struct: struct, §58. tetra = unsigned int, §26.


57. The trie_search subroutine starts at a given node of the trie and finds a given string in its middle subtrie, inserting new nodes if necessary. The string ends with the first character that is neither a letter nor a digit; the location of the terminating character is stored in global variable terminator.

#define isletter(c) (isalpha(c) || c == '_' || c == ':' || c > 126)

⟨Subroutines 28⟩ +≡
trie_node *trie_search ARGS((trie_node *, Char *));
Char *terminator; /* where the search ended */

trie_node *trie_search(t, s)
  trie_node *t;
  Char *s;
{
  register trie_node *tt = t;
  register Char *p = s;
  while (1) {
    if (!isletter(*p) && !isdigit(*p)) {
      terminator = p; return tt;
    }
    if (tt->mid) {
      tt = tt->mid;
      while (*p != tt->ch) {
        if (*p < tt->ch) {
          if (tt->left) tt = tt->left;
          else {
            tt->left = new_trie_node(); tt = tt->left; goto store_new_char;
          }
        } else {
          if (tt->right) tt = tt->right;
          else {
            tt->right = new_trie_node(); tt = tt->right; goto store_new_char;
          }
        }
      }
      p++;
    } else {
      tt->mid = new_trie_node(); tt = tt->mid;
    store_new_char: tt->ch = *p++;
    }
  }
}
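For readers who want to experiment with the data structure, here is a compact find-or-insert routine for a ternary search trie. It is a simplified sketch rather than a transcription of trie_search: the names tst_node and tst_search are hypothetical, the string is assumed to end at its null terminator instead of at the first nonletter, and each node is allocated individually with calloc.

```c
#include <stdlib.h>

typedef struct tst_node {
    unsigned short ch;                    /* character stored at this node */
    struct tst_node *left, *mid, *right;  /* less than, continue, greater than */
} tst_node;

/* Find or create the node reached by spelling s in the middle subtrie of t.
   Returns NULL only if memory runs out. */
tst_node *tst_search(tst_node *t, const char *s)
{
    for (; *s; s++) {
        tst_node **link = &t->mid;
        unsigned short c = (unsigned char)*s;
        while (*link && (*link)->ch != c)      /* binary search among siblings */
            link = (c < (*link)->ch) ? &(*link)->left : &(*link)->right;
        if (!*link) {                          /* not present: insert a new node */
            *link = calloc(1, sizeof(tst_node));
            if (!*link) return NULL;
            (*link)->ch = c;
        }
        t = *link;                             /* descend for the next character */
    }
    return t;
}
```

Searching twice for the same string returns the same node, so a caller can attach a symbol table entry to the node on first use and find it again later.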

58. Symbol table nodes hold the serial numbers and equivalents of defined symbols. They also hold "fixup information" for undefined symbols; this will allow the loader to correct any previously assembled instructions that refer to such symbols when they are eventually defined.

In the symbol table node for a defined symbol, the link field has one of the special codes DEFINED or REGISTER or PREDEFINED, and the equiv field holds the defined value. The serial number is a unique identifier for all user-defined symbols.

In the symbol table node for an undefined symbol, the equiv field is ignored. The link field points to the first node of fixup information; that node is, in turn, a symbol table node that might link to other fixups. The serial number in a fixup node is either 0 or 1 or 2, meaning respectively "fixup the octabyte pointed to by equiv" or "fixup the relative address in the YZ field of the instruction pointed to by equiv" or "fixup the relative address in the XYZ field of the instruction pointed to by equiv."

#define DEFINED (sym_node *) 1 /* code value for octabyte equivalents */
#define REGISTER (sym_node *) 2 /* code value for register-number equivalents */
#define PREDEFINED (sym_node *) 3 /* code value for not-yet-used equivalents */
#define fix_o 0 /* serial code for octabyte fixup */
#define fix_yz 1 /* serial code for relative fixup */
#define fix_xyz 2 /* serial code for JMP fixup */

⟨Type definitions 26⟩ +≡
typedef struct sym_tab_struct {
  int serial; /* serial number of symbol; type number for fixups */
  struct sym_tab_struct *link; /* DEFINED status or link to fixup */
  octa equiv; /* the equivalent value */
} sym_node;

ARGS = macro ( ), §31. ch: unsigned short, §54. Char = char, §30. isalpha: int ( ), <ctype.h>. isdigit: int ( ), <ctype.h>. left: trie_node *, §54. mid: trie_node *, §54. new_trie_node: trie_node *( ), §55. octa = struct, §26. right: trie_node *, §54. trie_node = struct, §54.


59. The allocation of new symbol table nodes proceeds in chunks, like the allocation of trie nodes. But in this case we also have the possibility of reusing old fixup nodes that are no longer needed.

#define recycle_fixup(pp) pp->link = sym_avail, sym_avail = pp

⟨Subroutines 28⟩ +≡
sym_node *new_sym_node ARGS((bool));
sym_node *new_sym_node(serialize)
  bool serialize; /* should the new node receive a unique serial number? */
{
  register sym_node *p = sym_avail;
  if (p) {
    sym_avail = p->link; p->link = NULL; p->serial = 0; p->equiv = zero_octa;
  } else {
    p = next_sym_node;
    if (p == last_sym_node) {
      p = (sym_node *) calloc(1000, sizeof(sym_node));
      if (!p) panic("Capacity exceeded: Out of symbol memory");
      last_sym_node = p + 1000;
    }
    next_sym_node = p + 1;
  }
  if (serialize) p->serial = ++serial_number;
  return p;
}
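The combination of chunked allocation and a stack of recycled nodes can be exercised with a stripped-down sketch. The names node, new_node, recycle, and CHUNK are hypothetical stand-ins for sym_node, new_sym_node, recycle_fixup, and the constant 1000; serial numbering and the equiv field are left out.

```c
#include <stdlib.h>

#define CHUNK 1000                       /* nodes obtained per calloc call */

typedef struct node { struct node *link; int serial; } node;

static node *next_node, *last_node;      /* allocation control */
static node *avail;                      /* stack of recycled nodes */

node *new_node(void)
{
    node *p = avail;
    if (p) {                             /* reuse a recycled node first */
        avail = p->link; p->link = NULL; p->serial = 0;
    } else {
        p = next_node;
        if (p == last_node) {            /* current chunk is used up (or none yet) */
            p = calloc(CHUNK, sizeof(node));
            if (!p) return NULL;
            last_node = p + CHUNK;
        }
        next_node = p + 1;
    }
    return p;
}

void recycle(node *p) { p->link = avail; avail = p; }
```

Because both pointers start out null, the first call finds next_node equal to last_node and allocates the first chunk without any special initialization.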

60. ⟨Global variables 27⟩ +≡
int serial_number;
sym_node *sym_root; /* root of the sym */
sym_node *next_sym_node, *last_sym_node; /* allocation control */
sym_node *sym_avail; /* stack of recycled symbol table nodes */

61. We initialize the trie by inserting all the predefined symbols. Opcodes are given the prefix ^, to distinguish them from ordinary symbols; this character nicely divides uppercase letters from lowercase letters.

⟨Initialize everything 29⟩ +≡
trie_root = new_trie_node();
cur_prefix = trie_root;
op_root = new_trie_node();
trie_root->mid = op_root;
trie_root->ch = ':';
op_root->ch = '^';
⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩;
⟨Put the special register names into the trie 66⟩;
⟨Put other predefined symbols into the trie 70⟩;


62. Most of the assembly work can be table driven, based on bits that are stored as the "equivalents" of opcode symbols like ^ADD.

#define rel_addr_bit 0x1 /* is YZ or XYZ relative? */
#define immed_bit 0x2 /* should opcode be immediate if Z or YZ not register? */
#define zar_bit 0x4 /* should register status of Z be ignored? */
#define zr_bit 0x8 /* must Z be a register? */
#define yar_bit 0x10 /* should register status of Y be ignored? */
#define yr_bit 0x20 /* must Y be a register? */
#define xar_bit 0x40 /* should register status of X be ignored? */
#define xr_bit 0x80 /* must X be a register? */
#define yzar_bit 0x100 /* should register status of YZ be ignored? */
#define yzr_bit 0x200 /* must YZ be a register? */
#define xyzar_bit 0x400 /* should register status of XYZ be ignored? */
#define xyzr_bit 0x800 /* must XYZ be a register? */
#define one_arg_bit 0x1000 /* is it OK to have zero or one operand? */
#define two_arg_bit 0x2000 /* is it OK to have exactly two operands? */
#define three_arg_bit 0x4000 /* is it OK to have exactly three operands? */
#define many_arg_bit 0x8000 /* is it OK to have more than three operands? */
#define align_bits 0x30000 /* how much alignment: byte, wyde, tetra, or octa? */
#define no_label_bit 0x40000 /* should the label be blank? */
#define mem_bit 0x80000 /* must YZ be a memory reference? */
#define spec_bit 0x100000 /* is this opcode allowed in SPEC mode? */

⟨Type definitions 26⟩ +≡
typedef struct {
  char *name; /* symbolic opcode */
  short code; /* numeric opcode */
  int bits; /* treatment of operands */
} op_spec;

typedef enum {
  SET = 0x100, IS, LOC, PREFIX, BSPEC, ESPEC, GREG, LOCAL,
  BYTE, WYDE, TETRA, OCTA
} pseudo_op;
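As a hedged illustration of how such a bit mask might be consulted, the sketch below decodes two properties from a bits value. The helper names operand_count_ok and align_log2 are hypothetical, and the way the assembler actually tests these bits may differ in detail; only the mask values are taken from the definitions above.

```c
#define one_arg_bit 0x1000
#define two_arg_bit 0x2000
#define three_arg_bit 0x4000
#define many_arg_bit 0x8000
#define align_bits 0x30000

/* Does an opcode with these bits accept n operands? */
int operand_count_ok(int bits, int n)
{
    if (n <= 1) return (bits & one_arg_bit) != 0;    /* zero or one operand */
    if (n == 2) return (bits & two_arg_bit) != 0;
    if (n == 3) return (bits & three_arg_bit) != 0;
    return (bits & many_arg_bit) != 0;               /* more than three */
}

/* Alignment code: 0 = byte, 1 = wyde, 2 = tetra, 3 = octa. */
int align_log2(int bits)
{
    return (bits & align_bits) >> 16;
}
```

For instance, a bits value of #240a2 decomposes into tetra alignment, three_arg_bit, xr_bit, yr_bit, and immed_bit, so exactly three operands are acceptable and X and Y must be registers.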

ARGS = macro ( ), §31. bool: enum, §26. calloc: void *( ), <stdlib.h>. ch: unsigned short, §54. cur_prefix: trie_node *, §56. equiv: octa, §58. link: sym_node *, §58. mid: trie_node *, §54. new_trie_node: trie_node *( ), §55. op_root: trie_node *, §56. panic = macro ( ), §45. serial: int, §58. sym_node = struct, §58. trie_root: trie_node *, §56. zero_octa: octa, MMIX-ARITH §4.


63. ⟨Global variables 27⟩ +≡
op_spec op_init_table[] = {
  {"TRAP", 0x00, 0x27554}, {"FCMP", 0x01, 0x240a8},
  {"FUN", 0x02, 0x240a8}, {"FEQL", 0x03, 0x240a8},
  {"FADD", 0x04, 0x240a8}, {"FIX", 0x05, 0x26288},
  {"FSUB", 0x06, 0x240a8}, {"FIXU", 0x07, 0x26288},
  {"FLOT", 0x08, 0x26282}, {"FLOTU", 0x0a, 0x26282},
  {"SFLOT", 0x0c, 0x26282}, {"SFLOTU", 0x0e, 0x26282},
  {"FMUL", 0x10, 0x240a8}, {"FCMPE", 0x11, 0x240a8},
  {"FUNE", 0x12, 0x240a8}, {"FEQLE", 0x13, 0x240a8},
  {"FDIV", 0x14, 0x240a8}, {"FSQRT", 0x15, 0x26288},
  {"FREM", 0x16, 0x240a8}, {"FINT", 0x17, 0x26288},
  {"MUL", 0x18, 0x240a2}, {"MULU", 0x1a, 0x240a2},
  {"DIV", 0x1c, 0x240a2}, {"DIVU", 0x1e, 0x240a2},
  {"ADD", 0x20, 0x240a2}, {"ADDU", 0x22, 0x240a2},
  {"SUB", 0x24, 0x240a2}, {"SUBU", 0x26, 0x240a2},
  {"2ADDU", 0x28, 0x240a2}, {"4ADDU", 0x2a, 0x240a2},
  {"8ADDU", 0x2c, 0x240a2}, {"16ADDU", 0x2e, 0x240a2},
  {"CMP", 0x30, 0x240a2}, {"CMPU", 0x32, 0x240a2},
  {"NEG", 0x34, 0x26082}, {"NEGU", 0x36, 0x26082},
  {"SL", 0x38, 0x240a2}, {"SLU", 0x3a, 0x240a2},
  {"SR", 0x3c, 0x240a2}, {"SRU", 0x3e, 0x240a2},
  {"BN", 0x40, 0x22081}, {"BZ", 0x42, 0x22081},
  {"BP", 0x44, 0x22081}, {"BOD", 0x46, 0x22081},
  {"BNN", 0x48, 0x22081}, {"BNZ", 0x4a, 0x22081},
  {"BNP", 0x4c, 0x22081}, {"BEV", 0x4e, 0x22081},
  {"PBN", 0x50, 0x22081}, {"PBZ", 0x52, 0x22081},
  {"PBP", 0x54, 0x22081}, {"PBOD", 0x56, 0x22081},
  {"PBNN", 0x58, 0x22081}, {"PBNZ", 0x5a, 0x22081},
  {"PBNP", 0x5c, 0x22081}, {"PBEV", 0x5e, 0x22081},
  {"CSN", 0x60, 0x240a2}, {"CSZ", 0x62, 0x240a2},
  {"CSP", 0x64, 0x240a2}, {"CSOD", 0x66, 0x240a2},
  {"CSNN", 0x68, 0x240a2}, {"CSNZ", 0x6a, 0x240a2},
  {"CSNP", 0x6c, 0x240a2}, {"CSEV", 0x6e, 0x240a2},
  {"ZSN", 0x70, 0x240a2}, {"ZSZ", 0x72, 0x240a2},
  {"ZSP", 0x74, 0x240a2}, {"ZSOD", 0x76, 0x240a2},
  {"ZSNN", 0x78, 0x240a2}, {"ZSNZ", 0x7a, 0x240a2},
  {"ZSNP", 0x7c, 0x240a2}, {"ZSEV", 0x7e, 0x240a2},
  {"LDB", 0x80, 0xa60a2}, {"LDBU", 0x82, 0xa60a2},
  {"LDW", 0x84, 0xa60a2}, {"LDWU", 0x86, 0xa60a2},
  {"LDT", 0x88, 0xa60a2}, {"LDTU", 0x8a, 0xa60a2},
  {"LDO", 0x8c, 0xa60a2}, {"LDOU", 0x8e, 0xa60a2},
  {"LDSF", 0x90, 0xa60a2}, {"LDHT", 0x92, 0xa60a2},
  {"CSWAP", 0x94, 0xa60a2}, {"LDUNC", 0x96, 0xa60a2},
  {"LDVTS", 0x98, 0xa60a2}, {"PRELD", 0x9a, 0xa6022},
  {"PREGO", 0x9c, 0xa6022}, {"GO", 0x9e, 0xa60a2},
  {"STB", 0xa0, 0xa60a2}, {"STBU", 0xa2, 0xa60a2},
  {"STW", 0xa4, 0xa60a2}, {"STWU", 0xa6, 0xa60a2},
  {"STT", 0xa8, 0xa60a2}, {"STTU", 0xaa, 0xa60a2},
  {"STO", 0xac, 0xa60a2}, {"STOU", 0xae, 0xa60a2},
  {"STSF", 0xb0, 0xa60a2}, {"STHT", 0xb2, 0xa60a2},
  {"STCO", 0xb4, 0xa6022}, {"STUNC", 0xb6, 0xa60a2},
  {"SYNCD", 0xb8, 0xa6022}, {"PREST", 0xba, 0xa6022},
  {"SYNCID", 0xbc, 0xa6022}, {"PUSHGO", 0xbe, 0xa6062},
  {"OR", 0xc0, 0x240a2}, {"ORN", 0xc2, 0x240a2},
  {"NOR", 0xc4, 0x240a2}, {"XOR", 0xc6, 0x240a2},
  {"AND", 0xc8, 0x240a2}, {"ANDN", 0xca, 0x240a2},
  {"NAND", 0xcc, 0x240a2}, {"NXOR", 0xce, 0x240a2},
  {"BDIF", 0xd0, 0x240a2}, {"WDIF", 0xd2, 0x240a2},
  {"TDIF", 0xd4, 0x240a2}, {"ODIF", 0xd6, 0x240a2},
  {"MUX", 0xd8, 0x240a2}, {"SADD", 0xda, 0x240a2},
  {"MOR", 0xdc, 0x240a2}, {"MXOR", 0xde, 0x240a2},
  {"SETH", 0xe0, 0x22080}, {"SETMH", 0xe1, 0x22080},
  {"SETML", 0xe2, 0x22080}, {"SETL", 0xe3, 0x22080},
  {"INCH", 0xe4, 0x22080}, {"INCMH", 0xe5, 0x22080},
  {"INCML", 0xe6, 0x22080}, {"INCL", 0xe7, 0x22080},
  {"ORH", 0xe8, 0x22080}, {"ORMH", 0xe9, 0x22080},
  {"ORML", 0xea, 0x22080}, {"ORL", 0xeb, 0x22080},
  {"ANDNH", 0xec, 0x22080}, {"ANDNMH", 0xed, 0x22080},
  {"ANDNML", 0xee, 0x22080}, {"ANDNL", 0xef, 0x22080},
  {"JMP", 0xf0, 0x21001}, {"PUSHJ", 0xf2, 0x22041},
  {"GETA", 0xf4, 0x22081}, {"PUT", 0xf6, 0x22002},
  {"POP", 0xf8, 0x23000}, {"RESUME", 0xf9, 0x21000},
  {"SAVE", 0xfa, 0x22080}, {"UNSAVE", 0xfb, 0x23a00},
  {"SYNC", 0xfc, 0x21000}, {"SWYM", 0xfd, 0x27554},
  {"GET", 0xfe, 0x22080}, {"TRIP", 0xff, 0x27554},
  {"SET", SET, 0x22180}, {"LDA", 0x22, 0xa60a2},
  {"IS", IS, 0x101400}, {"LOC", LOC, 0x1400},
  {"PREFIX", PREFIX, 0x141000},
  {"BYTE", BYTE, 0x10f000}, {"WYDE", WYDE, 0x11f000},
  {"TETRA", TETRA, 0x12f000}, {"OCTA", OCTA, 0x13f000},
  {"BSPEC", BSPEC, 0x41400}, {"ESPEC", ESPEC, 0x141000},
  {"GREG", GREG, 0x101000}, {"LOCAL", LOCAL, 0x141800}};
int op_init_size; /* the number of items in op_init_table */

BSPEC = #104, §62.
BYTE = #108, §62.
ESPEC = #105, §62.
GREG = #106, §62.
IS = #101, §62.
LOC = #102, §62.
LOCAL = #107, §62.
OCTA = #10b, §62.
op_spec = struct, §62.
PREFIX = #103, §62.
SET = #100, §62.
TETRA = #10a, §62.
WYDE = #109, §62.

64. ⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩ ≡
  op_init_size = (sizeof op_init_table) / sizeof(op_spec);
  for (j = 0; j < op_init_size; j++) {
    tt = trie_search(op_root, op_init_table[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = op_init_table[j].code; pp->equiv.l = op_init_table[j].bits;
  }

This code is used in section 61.

65. ⟨Local variables 40⟩ +≡
  register trie_node *tt;
  register sym_node *pp, *qq;

66. ⟨Put the special register names into the trie 66⟩ ≡
  for (j = 0; j < 32; j++) {
    tt = trie_search(trie_root, special_name[j]);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.l = j;
  }

This code is used in section 61.

67. ⟨Global variables 27⟩ +≡
  char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
    "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG",
    "rL", "rA", "rF", "rP", "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

68. ⟨Type definitions 26⟩ +≡
  typedef struct {
    char *name;
    tetra h, l;
  } predef_spec;

69. ⟨Global variables 27⟩ +≡
  predef_spec predefs[] = {
    {"ROUND_CURRENT", 0, 0}, {"ROUND_OFF", 0, 1}, {"ROUND_UP", 0, 2},
    {"ROUND_DOWN", 0, 3}, {"ROUND_NEAR", 0, 4},
    {"Inf", 0x7ff00000, 0},
    {"Data_Segment", 0x20000000, 0}, {"Pool_Segment", 0x40000000, 0},
    {"Stack_Segment", 0x60000000, 0},
    {"D_BIT", 0, 0x80}, {"V_BIT", 0, 0x40}, {"W_BIT", 0, 0x20}, {"I_BIT", 0, 0x10},
    {"O_BIT", 0, 0x08}, {"U_BIT", 0, 0x04}, {"Z_BIT", 0, 0x02}, {"X_BIT", 0, 0x01},
    {"D_Handler", 0, 0x10}, {"V_Handler", 0, 0x20}, {"W_Handler", 0, 0x30},
    {"I_Handler", 0, 0x40}, {"O_Handler", 0, 0x50}, {"U_Handler", 0, 0x60},
    {"Z_Handler", 0, 0x70}, {"X_Handler", 0, 0x80},
    {"StdIn", 0, 0}, {"StdOut", 0, 1}, {"StdErr", 0, 2},
    {"TextRead", 0, 0}, {"TextWrite", 0, 1}, {"BinaryRead", 0, 2},
    {"BinaryWrite", 0, 3}, {"BinaryReadWrite", 0, 4},
    {"Halt", 0, 0}, {"Fopen", 0, 1}, {"Fclose", 0, 2}, {"Fread", 0, 3},
    {"Fgets", 0, 4}, {"Fgetws", 0, 5}, {"Fwrite", 0, 6}, {"Fputs", 0, 7},
    {"Fputws", 0, 8}, {"Fseek", 0, 9}, {"Ftell", 0, 10}};
  int predef_size;


70. ⟨Put other predefined symbols into the trie 70⟩ ≡
  predef_size = (sizeof predefs) / sizeof(predef_spec);
  for (j = 0; j < predef_size; j++) {
    tt = trie_search(trie_root, predefs[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = predefs[j].h; pp->equiv.l = predefs[j].l;
  }

This code is used in section 61.

71. We place Main into the trie at the beginning of assembly, so that it will show up as an undefined symbol if the user specifies no starting point.

⟨Initialize everything 29⟩ +≡
  trie_search(trie_root, "Main")->sym = new_sym_node(true);

bits: int, §62.
code: short, §62.
equiv: octa, §58.
false = 0, §26.
h: tetra, §26.
j: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
name: char *, §62.
new_sym_node: sym_node *( ), §59.
op_init_size: int, §63.
op_init_table: op_spec [ ], §63.
op_root: trie_node *, §56.
op_spec = struct, §62.
PREDEFINED = macro, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
tetra = unsigned int, §26.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *( ), §57.
true = 1, §26.

72. At the end of assembly we traverse the entire symbol table, visiting each symbol in lexicographic order and transmitting the trie structure to the output file. We detect any undefined future references at this time.

The order of traversal has a simple recursive pattern: To traverse the subtrie rooted at t, we

  traverse t->left, if the left subtrie is nonempty;
  visit t->sym, if this symbol table entry is present;
  traverse t->mid, if the middle subtrie is nonempty;
  traverse t->right, if the right subtrie is nonempty.

This pattern leads to a compact representation in the mmo file, usually requiring fewer than two bytes per trie node plus the bytes needed to encode the equivalents and serial numbers. Each node of the trie is encoded as a "master byte" followed by the encodings of the left subtrie, character, equivalent, middle subtrie, and right subtrie. The master byte is the sum of

  #80, if the character occupies two bytes instead of one;
  #40, if the left subtrie is nonempty;
  #20, if the middle subtrie is nonempty;
  #10, if the right subtrie is nonempty;
  #01 to #08, if the symbol's equivalent is one to eight bytes long;
  #09 to #0e, if the symbol's equivalent is 2^61 plus one to six bytes;
  #0f, if the symbol's equivalent is $0 plus one byte;

the character is omitted if the middle subtrie and the equivalent are both empty. The "equivalent" of an undefined symbol is zero, but stated as two bytes long. Symbol equivalents are followed by the serial number, represented as a sequence of one or more bytes in radix 128; the final byte of the serial number is tagged by adding 128. (Thus, serial number 2^14 - 1 is encoded as #7fff; serial number 2^14 is #010080.)
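The radix-128 serial number encoding described above can be illustrated with a small standalone sketch. The function name encode_serial and the output buffer are assumptions made for this illustration; the assembler itself emits the bytes through mmo_byte rather than into a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of MMIXAL): writes the radix-128 encoding
   of serial into out, most significant digit first, and returns the number
   of bytes written. out must have room for at least 5 bytes. */
size_t encode_serial(uint32_t serial, uint8_t *out)
{
    int m;
    size_t n = 0;

    /* m becomes the index of the most significant 7-bit digit needed */
    for (m = 0; m < 4; m++)
        if (serial < (UINT32_C(1) << (7 * (m + 1)))) break;

    /* emit digits from most to least significant; the final byte
       (digit 0) is tagged by adding 128 */
    for (; m >= 0; m--)
        out[n++] = (uint8_t)(((serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
    return n;
}
```

With this sketch, 2^14 - 1 yields the two bytes #7f #ff and 2^14 yields #01 #00 #80, matching the examples in the text.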

73. First we prune the trie by removing all predefined symbols that the user did not redefine.

⟨Subroutines 28⟩ +≡
  trie_node *prune ARGS((trie_node *));
  trie_node *prune(t)
    trie_node *t;
  {
    register int useful = 0;
    if (t->sym) {
      if (t->sym->serial) useful = 1;
      else t->sym = NULL;
    }
    if (t->left) {
      t->left = prune(t->left);
      if (t->left) useful = 1;
    }
    if (t->mid) {
      t->mid = prune(t->mid);
      if (t->mid) useful = 1;
    }
    if (t->right) {
      t->right = prune(t->right);
      if (t->right) useful = 1;
    }
    if (useful) return t;
    else return NULL;
  }
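As a standalone illustration of the pruning idea, here is a minimal sketch on a simplified ternary trie. The types sk_sym and sk_node and the function sk_prune are stand-ins invented for this example; they keep only the fields that pruning inspects and are not the assembler's own trie_node and sym_node.

```c
#include <stddef.h>

/* Simplified stand-ins (assumptions for this sketch) */
typedef struct { int serial; } sk_sym;
typedef struct sk_node {
    struct sk_node *left, *mid, *right;
    sk_sym *sym;
} sk_node;

/* Returns t if it or any descendant carries a symbol with a nonzero
   serial number; otherwise returns NULL so the parent can drop it. */
sk_node *sk_prune(sk_node *t)
{
    int useful = 0;
    if (t->sym) {
        if (t->sym->serial) useful = 1;
        else t->sym = NULL;              /* symbol was never used */
    }
    if (t->left)  { t->left  = sk_prune(t->left);  if (t->left)  useful = 1; }
    if (t->mid)   { t->mid   = sk_prune(t->mid);   if (t->mid)   useful = 1; }
    if (t->right) { t->right = sk_prune(t->right); if (t->right) useful = 1; }
    return useful ? t : NULL;
}
```

A node survives only because of its own symbol or a surviving child, so whole unused branches disappear in a single bottom-up pass.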

74. Then we output the trie by following the recursive traversal pattern.

⟨Subroutines 28⟩ +≡
  trie_node *out_stab ARGS((trie_node *));
  trie_node *out_stab(t)
    trie_node *t;
  {
    register int m = 0, j;
    register sym_node *pp;
    if (t->ch > 0xff) m += 0x80;
    if (t->left) m += 0x40;
    if (t->mid) m += 0x20;
    if (t->right) m += 0x10;
    if (t->sym) {
      if (t->sym->link == REGISTER) m += 0xf;
      else if (t->sym->link == DEFINED) ⟨Encode the length of t->sym->equiv 76⟩
      else if (t->sym->link || t->sym->serial == 1) ⟨Report an undefined symbol 79⟩;
    }
    mmo_byte(m);
    if (t->left) out_stab(t->left);
    if (m & 0x2f) ⟨Visit t and traverse t->mid 75⟩;
    if (t->right) out_stab(t->right);
  }

ARGS = macro ( ), §31.
ch: unsigned short, §54.
DEFINED = macro, §58.
equiv: octa, §58.
left: trie_node *, §54.
link: sym_node *, §58.
mid: trie_node *, §54.
mmo_byte: void ( ), §48.
REGISTER = macro, §58.
right: trie_node *, §54.
serial: int, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
trie_node = struct, §54.

75. A global variable called sym_buf holds all characters on middle branches to the current trie node; sym_ptr is the first currently unused character in sym_buf.

⟨Visit t and traverse t->mid 75⟩ ≡
  {
    if (m & 0x80) mmo_byte(t->ch >> 8);
    mmo_byte(t->ch & 0xff);
    *sym_ptr++ = (m & 0x80 ? '?' : t->ch);  /* Unicode? not yet */
    m &= 0xf;
    if (m && t->sym->link) {
      if (listing_file) ⟨Print symbol sym_buf and its equivalent 78⟩;
      if (m == 15) m = 1;
      else if (m > 8) m -= 8;
      for ( ; m > 0; m--)
        if (m > 4) mmo_byte((t->sym->equiv.h >> (8 * (m - 5))) & 0xff);
        else mmo_byte((t->sym->equiv.l >> (8 * (m - 1))) & 0xff);
      for (m = 0; m < 4; m++)
        if (t->sym->serial < (1 << (7 * (m + 1)))) break;
      for ( ; m >= 0; m--)
        mmo_byte(((t->sym->serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
    }
    if (t->mid) out_stab(t->mid);
    sym_ptr--;
  }

This code is used in section 74.

76. ⟨Encode the length of t->sym->equiv 76⟩ ≡
  { register tetra x;
    if ((t->sym->equiv.h & 0xffff0000) == 0x20000000)
      m += 8, x = t->sym->equiv.h - 0x20000000;  /* data segment */
    else x = t->sym->equiv.h;
    if (x) m += 4; else x = t->sym->equiv.l;
    for (j = 1; j < 4; j++)
      if (x < (1 << (8 * j))) break;
    m += j;
  }

This code is used in section 74.
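The length code computed in section 76 is easier to check as a pure function. The following is a minimal sketch under that assumption; the name equiv_length_code is hypothetical, and the function returns only the low four bits that would be added to the master byte for a DEFINED symbol.

```c
#include <stdint.h>

/* Hypothetical helper: h and l are the high and low tetrabytes of a
   symbol's equivalent. Returns 1..8 for an ordinary value of that many
   bytes, or 9..14 for a value of the form 2^61 plus one to six bytes. */
unsigned equiv_length_code(uint32_t h, uint32_t l)
{
    unsigned m = 0, j;
    uint32_t x;

    if ((h & 0xffff0000u) == 0x20000000u) {   /* data segment address */
        m += 8;
        x = h - 0x20000000u;
    } else x = h;
    if (x) m += 4;                            /* high tetra is significant */
    else x = l;
    for (j = 1; j < 4; j++)                   /* count bytes needed for x */
        if (x < (UINT32_C(1) << (8 * j))) break;
    return m + j;
}
```

For instance, an equivalent of zero gets code 1, while Data_Segment itself (h = #20000000, l = 0) gets code 9, so that only one byte of equivalent follows in the mmo file.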

77. We make room for symbols up to 999 bytes long. Strictly speaking, the program should check if this limit is exceeded; but really!

⟨Global variables 27⟩ +≡
  Char sym_buf[1000];
  Char *sym_ptr;


78. The initial `:' of each fully qualified symbol is omitted here, since most users of MMIXAL will probably not need the PREFIX feature. One consequence of this omission is that the one-character symbol `:' itself, which is allowed by the rules of MMIXAL, is printed as the null string.

⟨Print symbol sym_buf and its equivalent 78⟩ ≡
  {
    *sym_ptr = '\0';
    fprintf(listing_file, " %s = ", sym_buf + 1);
    pp = t->sym;
    if (pp->link == DEFINED)
      fprintf(listing_file, "#%08x%08x", pp->equiv.h, pp->equiv.l);
    else if (pp->link == REGISTER) fprintf(listing_file, "$%03d", pp->equiv.l);
    else fprintf(listing_file, "?");
    fprintf(listing_file, " (%d)\n", pp->serial);
  }

This code is used in section 75.

79. ⟨Report an undefined symbol 79⟩ ≡
  {
    *sym_ptr = (m & 0x80 ? '?' : t->ch);  /* Unicode? not yet */
    *(sym_ptr + 1) = '\0';
    fprintf(stderr, "undefined symbol: %s\n", sym_buf + 1);
    err_count++;
    m += 2;
  }

This code is used in section 74.

80. ⟨Check and output the trie 80⟩ ≡
  op_root->mid = NULL;  /* annihilate all the opcodes */
  prune(trie_root);
  sym_ptr = sym_buf;
  if (listing_file) fprintf(listing_file, "\nSymbol table:\n");
  mmo_lop(lop_stab, 0, 0);
  out_stab(trie_root);
  while (mmo_ptr & 3) mmo_byte(0);
  mmo_lopp(lop_end, mmo_ptr >> 2);

This code is used in section 142.

ch: unsigned short, §54.
Char = char, §30.
DEFINED = macro, §58.
equiv: octa, §58.
err_count: int, §46.
fprintf: int ( ), <stdio.h>.
h: tetra, §26.
j: register int, §74.
l: tetra, §26.
link: sym_node *, §58.
listing_file: FILE *, §139.
lop_end = #c, §24.
lop_stab = #b, §24.
m: register int, §74.
mid: trie_node *, §54.
mmo_byte: void ( ), §48.
mmo_lop: void ( ), §48.
mmo_lopp: void ( ), §48.
mmo_ptr: int, §47.
op_root: trie_node *, §56.
out_stab: trie_node *( ), §74.
pp: register sym_node *, §74.
prune: trie_node *( ), §73.
REGISTER = macro, §58.
serial: int, §58.
stderr: FILE *, <stdio.h>.
sym: sym_node *, §54.
t: trie_node *, §74.
tetra = unsigned int, §26.
trie_root: trie_node *, §56.

81. Expressions. The most intricate part of the assembly process is the task of scanning and evaluating expressions in the operand field. Fortunately, MMIXAL's expressions have a simple structure that can be handled easily with a stack-based approach.

Two stacks hold pending data as the operand field is scanned and evaluated. The op_stack contains operators that have not yet been performed; the val_stack contains values that have not yet been used. After an entire operand list has been scanned, the op_stack will be empty and the val_stack will hold the operand values needed to assemble the current instruction.

82. Entries on op_stack have one of the constant values defined here, and they have one of the precedence levels defined here.

Entries on val_stack have equiv, link, and status fields; the link points to a trie node if the expression is a symbol that has not yet been subjected to any operations.

⟨Type definitions 26⟩ +≡
  typedef enum {
    negate, serialize, complement, registerize, inner_lp,
    plus, minus, times, over, frac, mod, shl, shr, and, or, xor,
    outer_lp, outer_rp, inner_rp
  } stack_op;
  typedef enum {
    zero, weak, strong, unary
  } prec;
  typedef enum {
    pure, reg_val, undefined
  } stat;
  typedef struct {
    octa equiv;        /* current value */
    trie_node *link;   /* trie reference for symbol */
    stat status;       /* pure, reg_val, or undefined */
  } val_node;

83.
  #define top_op op_stack[op_ptr - 1]      /* top entry on the operator stack */
  #define top_val val_stack[val_ptr - 1]   /* top entry on the value stack */
  #define next_val val_stack[val_ptr - 2]  /* next-to-top entry of the value stack */

⟨Global variables 27⟩ +≡
  stack_op *op_stack;   /* stack for pending operators */
  int op_ptr;           /* number of items on op_stack */
  val_node *val_stack;  /* stack for pending operands */
  int val_ptr;          /* number of items on val_stack */
  prec precedence[] = {unary, unary, unary, unary, zero,
    weak, weak, strong, strong, strong, strong, strong, strong, strong, weak, weak,
    zero, zero, zero};  /* precedences of the respective stack_op values */
  stack_op rt_op;       /* newly scanned operator */
  octa acc;             /* temporary accumulator */


84. ⟨Initialize everything 29⟩ +≡
  op_stack = (stack_op *) calloc(buf_size, sizeof(stack_op));
  val_stack = (val_node *) calloc(buf_size, sizeof(val_node));
  if (!op_stack || !val_stack) panic("No room for the stacks");

85. The operand field of an instruction will have been copied into a separate Char array called operand_list when we reach this part of the program.

⟨Scan the operand field 85⟩ ≡
  p = operand_list;
  val_ptr = 0;  /* val_stack is empty */
  op_stack[0] = outer_lp, op_ptr = 1;  /* op_stack contains an "outer left parenthesis" */
  while (1) {
    ⟨Scan opening tokens until putting something on val_stack 86⟩;
  scan_close: ⟨Scan a binary operator or closing token, rt_op 97⟩;
    while (precedence[top_op] >= precedence[rt_op])
      ⟨Perform the top operation on op_stack 98⟩;
  hold_op: op_stack[op_ptr++] = rt_op;
  }
  operands_done:

This code is used in section 102.
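The two-stack loop of section 85 can be tried out in isolation with a much reduced sketch. Everything named sk_ below is an assumption made for this illustration: it handles only decimal constants, unary minus, the operators + - * /, and parentheses, on unsigned 64-bit values, and it returns an error code instead of calling err or derr. The control structure (opening tokens, scan_close, the precedence loop, and the final push of rt_op) follows the pattern shown above.

```c
#include <ctype.h>
#include <stdint.h>

enum sk_op { SK_NEGATE, SK_INNER_LP, SK_PLUS, SK_MINUS, SK_TIMES, SK_OVER,
             SK_OUTER_LP, SK_OUTER_RP, SK_INNER_RP };
enum sk_prec { SK_ZERO, SK_WEAK, SK_STRONG, SK_UNARY };

/* precedences of the respective sk_op values */
static const enum sk_prec sk_precedence[] = {
    SK_UNARY, SK_ZERO, SK_WEAK, SK_WEAK, SK_STRONG, SK_STRONG,
    SK_ZERO, SK_ZERO, SK_ZERO
};

#define SK_MAX 64  /* arbitrary stack limit chosen for the sketch */

/* Returns 0 and stores the value in *result on success; returns -1 on a
   syntax error, division by zero, or stack overflow. */
int sk_eval(const char *p, uint64_t *result)
{
    enum sk_op op_stack[SK_MAX], rt_op;
    uint64_t val_stack[SK_MAX], v;
    int op_ptr = 0, val_ptr = 0;

    op_stack[op_ptr++] = SK_OUTER_LP;   /* "outer left parenthesis" */
    for (;;) {
        /* opening tokens: unary signs and '(' until a constant appears */
        while (*p == '-' || *p == '+' || *p == '(') {
            if (op_ptr >= SK_MAX) return -1;
            if (*p == '-') op_stack[op_ptr++] = SK_NEGATE;
            else if (*p == '(') op_stack[op_ptr++] = SK_INNER_LP;
            p++;
        }
        if (!isdigit((unsigned char)*p) || val_ptr >= SK_MAX) return -1;
        for (v = 0; isdigit((unsigned char)*p); p++)
            v = v * 10 + (uint64_t)(*p - '0');
        val_stack[val_ptr++] = v;
    scan_close:
        switch (*p++) {                 /* binary operator or closing token */
        case '+': rt_op = SK_PLUS; break;
        case '-': rt_op = SK_MINUS; break;
        case '*': rt_op = SK_TIMES; break;
        case '/': rt_op = SK_OVER; break;
        case ')': rt_op = SK_INNER_RP; break;
        case '\0': rt_op = SK_OUTER_RP; break;
        default: return -1;
        }
        /* perform pending operators of equal or higher precedence */
        while (sk_precedence[op_stack[op_ptr - 1]] >= sk_precedence[rt_op]) {
            switch (op_stack[--op_ptr]) {
            case SK_INNER_LP:
                if (rt_op == SK_INNER_RP) goto scan_close;
                return -1;              /* missing right parenthesis */
            case SK_OUTER_LP:
                if (rt_op != SK_OUTER_RP) return -1;  /* missing left parenthesis */
                *result = val_stack[0];
                return 0;
            case SK_NEGATE:
                val_stack[val_ptr - 1] = 0 - val_stack[val_ptr - 1]; break;
            case SK_PLUS:
                val_stack[val_ptr - 2] += val_stack[val_ptr - 1]; val_ptr--; break;
            case SK_MINUS:
                val_stack[val_ptr - 2] -= val_stack[val_ptr - 1]; val_ptr--; break;
            case SK_TIMES:
                val_stack[val_ptr - 2] *= val_stack[val_ptr - 1]; val_ptr--; break;
            case SK_OVER:
                if (val_stack[val_ptr - 1] == 0) return -1;
                val_stack[val_ptr - 2] /= val_stack[val_ptr - 1]; val_ptr--; break;
            default: return -1;
            }
        }
        if (op_ptr >= SK_MAX) return -1;
        op_stack[op_ptr++] = rt_op;     /* hold the new operator */
    }
}
```

Because the closing tokens have precedence zero, a right parenthesis or end of input forces every pending weak, strong, and unary operator to be performed before the matching left parenthesis is reached; that is what makes the single comparison in the while loop sufficient.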

buf_size: int, §139.
calloc: void *( ), <stdlib.h>.
octa = struct, §26.
operand_list: Char *, §33.
p: register Char *, §40.
panic = macro ( ), §45.
trie_node = struct, §54.

86. A comment that follows an empty operand list needs to be detected here.

⟨Scan opening tokens until putting something on val_stack 86⟩ ≡
  scan_open: if (isletter(*p)) ⟨Scan a symbol 87⟩
  else if (isdigit(*p)) {
    if (*(p + 1) == 'F') ⟨Scan a forward local 88⟩
    else if (*(p + 1) == 'B') ⟨Scan a backward local 89⟩
    else ⟨Scan a decimal constant 94⟩;
  } else switch (*p++) {
    case '#': ⟨Scan a hexadecimal constant 95⟩; break;
    case '\'': ⟨Scan a character constant 92⟩; break;
    case '\"': ⟨Scan a string constant 93⟩; break;
    case '@': ⟨Scan the current location 96⟩; break;
    case '-': op_stack[op_ptr++] = negate;
    case '+': goto scan_open;
    case '&': op_stack[op_ptr++] = serialize; goto scan_open;
    case '~': op_stack[op_ptr++] = complement; goto scan_open;
    case '$': op_stack[op_ptr++] = registerize; goto scan_open;
    case '(': op_stack[op_ptr++] = inner_lp; goto scan_open;
    default:
      if (p == operand_list + 1) {  /* treat operand list as empty */
        operand_list[0] = '0', operand_list[1] = '\0', p = operand_list;
        goto scan_open;
      }
      if (*(p - 1)) derr("syntax error at character `%c'", *(p - 1));
      derr("syntax error after character `%c'", *(p - 2));
  }

This code is used in section 85.

87. ⟨Scan a symbol 87⟩ ≡
  {
    if (*p == ':') tt = trie_search(trie_root, p + 1);
    else tt = trie_search(cur_prefix, p);
    p = terminator;
  symbol_found: val_ptr++;
    pp = tt->sym;
    if (!pp) pp = tt->sym = new_sym_node(true);
    top_val.link = tt, top_val.equiv = pp->equiv;
    if (pp->link == PREDEFINED) pp->link = DEFINED;
    top_val.status = (pp->link == DEFINED ? pure : pp->link == REGISTER ? reg_val : undefined);
  }

This code is used in section 86.


88. ⟨Scan a forward local 88⟩ ≡
  {
    tt = &forward_local_host[*p - '0']; p += 2; goto symbol_found;
  }

This code is used in section 86.

89. ⟨Scan a backward local 89⟩ ≡
  {
    tt = &backward_local_host[*p - '0']; p += 2; goto symbol_found;
  }

This code is used in section 86.

90. Statically allocated variables forward_local_host[j] and backward_local_host[j] masquerade as nodes of the trie.

⟨Global variables 27⟩ +≡
  trie_node forward_local_host[10], backward_local_host[10];
  sym_node forward_local[10], backward_local[10];

91. Initially 0H, 1H, ..., 9H are defined to be zero.

⟨Initialize everything 29⟩ +≡
  for (j = 0; j < 10; j++) {
    forward_local_host[j].sym = &forward_local[j];
    backward_local_host[j].sym = &backward_local[j];
    backward_local[j].link = DEFINED;
  }

complement = 2, §82.
cur_prefix: trie_node *, §56.
DEFINED = macro, §58.
derr = macro ( ), §45.
equiv: octa, §82.
equiv: octa, §58.
inner_lp = 4, §82.
isdigit: int ( ), <ctype.h>.
isletter = macro ( ), §57.
j: register int, §136.
link: trie_node *, §82.
link: sym_node *, §58.
negate = 0, §82.
new_sym_node: sym_node *( ), §59.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
PREDEFINED = macro, §58.
pure = 0, §82.
reg_val = 1, §82.
REGISTER = macro, §58.
registerize = 3, §82.
serialize = 1, §82.
status: stat, §82.
sym: sym_node *, §54.
sym_node = struct, §58.
terminator: Char *, §57.
top_val = macro, §83.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *( ), §57.
true = 1, §26.
tt: register trie_node *, §65.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.

92. We have already checked to make sure that the character constant is legal.

⟨Scan a character constant 92⟩ ≡
  acc.h = 0, acc.l = *p;
  p += 2;
  goto constant_found;

This code is used in section 86.

93. ⟨Scan a string constant 93⟩ ≡
  acc.h = 0, acc.l = *p;
  if (*p == '\"') {
    p++;
    acc.l = 0;
    err("*null string is treated as zero");
  } else if (*(p + 1) == '\"') p += 2;
  else *p = '\"', *--p = ',';
  goto constant_found;

This code is used in section 86.

94. ⟨Scan a decimal constant 94⟩ ≡
  acc.h = 0, acc.l = *p - '0';
  for (p++; isdigit(*p); p++) {
    acc = oplus(acc, shift_left(acc, 2));
    acc = incr(shift_left(acc, 1), *p - '0');
  }
  constant_found: val_ptr++;
  top_val.link = NULL;
  top_val.equiv = acc;
  top_val.status = pure;

This code is used in section 86.

95. ⟨Scan a hexadecimal constant 95⟩ ≡
  if (!isxdigit(*p)) err("illegal hexadecimal constant");
  acc.h = acc.l = 0;
  for ( ; isxdigit(*p); p++) {
    acc = incr(shift_left(acc, 4), *p - '0');
    if (*p >= 'a') acc = incr(acc, '0' - 'a' + 10);
    else if (*p >= 'A') acc = incr(acc, '0' - 'A' + 10);
  }
  goto constant_found;

This code is used in section 86.
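The arithmetic in sections 94 and 95 is written with the octabyte routines oplus, incr, and shift_left. As a rough illustration, the same two tricks can be restated with a native 64-bit integer; the function names below are hypothetical, and the sketch assumes an ASCII-compatible character set, as the correction terms '0' - 'a' + 10 and '0' - 'A' + 10 do.

```c
#include <ctype.h>
#include <stdint.h>

/* Decimal: multiply by 10 as (acc + 4*acc) * 2, then add the digit. */
uint64_t scan_decimal_sketch(const char *p)
{
    uint64_t acc = 0;
    for (; isdigit((unsigned char)*p); p++) {
        acc = acc + (acc << 2);                   /* acc * 5 */
        acc = (acc << 1) + (uint64_t)(*p - '0');  /* acc * 10 + digit */
    }
    return acc;
}

/* Hexadecimal: first add *p - '0' as if the character were a decimal
   digit, then correct the result for letters. Unsigned wraparound makes
   the negative correction behave like a subtraction. */
uint64_t scan_hex_sketch(const char *p)
{
    uint64_t acc = 0;
    for (; isxdigit((unsigned char)*p); p++) {
        acc = (acc << 4) + (uint64_t)(*p - '0');
        if (*p >= 'a') acc += (uint64_t)('0' - 'a' + 10);
        else if (*p >= 'A') acc += (uint64_t)('0' - 'A' + 10);
    }
    return acc;
}
```

Both functions stop at the first character that is not a digit of the respective radix, just as the scanning loops above leave p pointing at the terminator.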

96. ⟨Scan the current location 96⟩ ≡
  acc = cur_loc;
  goto constant_found;

This code is used in section 86.

97. ⟨Scan a binary operator or closing token, rt_op 97⟩ ≡
  switch (*p++) {
    case '+': rt_op = plus; break;
    case '-': rt_op = minus; break;
    case '*': rt_op = times; break;
    case '/': if (*p != '/') rt_op = over;
      else p++, rt_op = frac; break;
    case '%': rt_op = mod; break;
    case '<': rt_op = shl; goto sh_check;
    case '>': rt_op = shr;
    sh_check: p++; if (*(p - 1) == *(p - 2)) break;
      derr("syntax error at `%c'", *(p - 2));
    case '&': rt_op = and; break;
    case '|': rt_op = or; break;
    case '^': rt_op = xor; break;
    case ')': rt_op = inner_rp; break;
    case '\0': case ',': rt_op = outer_rp; break;
    default: derr("syntax error at `%c'", *(p - 1));
  }

This code is used in section 85.

98. ⟨Perform the top operation on op_stack 98⟩ ≡
  switch (op_stack[--op_ptr]) {
    case inner_lp: if (rt_op == inner_rp) goto scan_close;
      err("*missing right parenthesis"); break;
    case outer_lp: if (rt_op == outer_rp) {
        if (top_val.status == reg_val && (top_val.equiv.l > 0xff || top_val.equiv.h)) {
          err("*register number too large, will be reduced mod 256");
          top_val.equiv.h = 0, top_val.equiv.l &= 0xff;
        }
        if (!*(p - 1)) goto operands_done;
        else rt_op = outer_lp; goto hold_op;  /* comma */
      } else {
        op_ptr++; err("*missing left parenthesis");
        goto scan_close;
      }
    ⟨Cases for unary operators 100⟩
    ⟨Cases for binary operators 99⟩
  }

This code is used in section 85.

acc: octa, §83.
and = 13, §82.
cur_loc: octa, §43.
derr = macro ( ), §45.
equiv: octa, §82.
err = macro ( ), §45.
frac = 9, §82.
h: tetra, §26.
hold_op: label, §85.
incr: octa ( ), MMIX-ARITH §6.
inner_lp = 4, §82.
inner_rp = 18, §82.
isdigit: int ( ), <ctype.h>.
isxdigit: int ( ), <ctype.h>.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operands_done: label, §85.
oplus: octa ( ), MMIX-ARITH §5.
or = 14, §82.
outer_lp = 16, §82.
outer_rp = 17, §82.
over = 8, §82.
p: register Char *, §40.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
rt_op: stack_op, §83.
scan_close: label, §85.
shift_left: octa ( ), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
times = 7, §82.
top_val = macro, §83.
val_ptr: int, §83.
xor = 15, §82.

99. Now we come to the part where equivalents are changed by unary or binary operators found in the expression being scanned.

The most typical operator, and in some ways the fussiest one to deal with, is binary addition. Once we've written the code for this case, the other cases almost take care of themselves.

⟨Cases for binary operators 99⟩ ≡
  case plus: if (top_val.status == undefined) err("cannot add an undefined quantity");
    if (next_val.status == undefined) err("cannot add to an undefined quantity");
    if (top_val.status == reg_val && next_val.status == reg_val)
      err("cannot add two register numbers");
    next_val.equiv = oplus(next_val.equiv, top_val.equiv);
  fin_bin: next_val.status = (top_val.status == next_val.status ? pure : reg_val);
    val_ptr--;
  delink: top_val.link = NULL; break;

See also section 101.

This code is used in section 98.

100.
  #define unary_check(verb) \
    if (top_val.status != pure) derr("can %s pure values only", verb)

⟨Cases for unary operators 100⟩ ≡
  case negate: unary_check("negate");
    top_val.equiv = ominus(zero_octa, top_val.equiv); goto delink;
  case complement: unary_check("complement");
    top_val.equiv.h = ~top_val.equiv.h, top_val.equiv.l = ~top_val.equiv.l;
    goto delink;
  case registerize: unary_check("registerize");
    top_val.status = reg_val; goto delink;
  case serialize: if (!top_val.link) err("can take serial number of symbol only");
    top_val.equiv.h = 0, top_val.equiv.l = top_val.link->sym->serial;
    top_val.status = pure; goto delink;

This code is used in section 98.

101.
  #define binary_check(verb) \
    if (top_val.status != pure || next_val.status != pure) \
      derr("can %s pure values only", verb)

⟨Cases for binary operators 99⟩ +≡
  case minus: if (top_val.status == undefined)
      err("cannot subtract an undefined quantity");
    if (next_val.status == undefined)
      err("cannot subtract from an undefined quantity");
    if (top_val.status == reg_val && next_val.status != reg_val)
      err("cannot subtract register number from pure value");
    next_val.equiv = ominus(next_val.equiv, top_val.equiv); goto fin_bin;
  case times: binary_check("multiply");
    next_val.equiv = omult(next_val.equiv, top_val.equiv); goto fin_bin;
  case over: case mod: binary_check("divide");
    if (top_val.equiv.l == 0 && top_val.equiv.h == 0) err("*division by zero");
    next_val.equiv = odiv(zero_octa, next_val.equiv, top_val.equiv);
    if (op_stack[op_ptr] == mod) next_val.equiv = aux;
    goto fin_bin;
  case frac: binary_check("compute a ratio of");
    if (next_val.equiv.h >= top_val.equiv.h &&
        (next_val.equiv.l >= top_val.equiv.l || next_val.equiv.h > top_val.equiv.h))
      err("*illegal fraction");
    next_val.equiv = odiv(next_val.equiv, zero_octa, top_val.equiv); goto fin_bin;
  case shl: case shr: binary_check("compute a bitwise shift of");
    if (top_val.equiv.h || top_val.equiv.l > 63) next_val.equiv = zero_octa;
    else if (op_stack[op_ptr] == shl)
      next_val.equiv = shift_left(next_val.equiv, top_val.equiv.l);
    else next_val.equiv = shift_right(next_val.equiv, top_val.equiv.l, true);
    goto fin_bin;
  case and: binary_check("compute bitwise and of");
    next_val.equiv.h &= top_val.equiv.h; next_val.equiv.l &= top_val.equiv.l;
    goto fin_bin;
  case or: binary_check("compute bitwise or of");
    next_val.equiv.h |= top_val.equiv.h; next_val.equiv.l |= top_val.equiv.l;
    goto fin_bin;
  case xor: binary_check("compute bitwise xor of");
    next_val.equiv.h ^= top_val.equiv.h; next_val.equiv.l ^= top_val.equiv.l;
    goto fin_bin;

and = 13, §82.
aux: octa, MMIX-ARITH §4.
complement = 2, §82.
derr = macro ( ), §45.
equiv: octa, §82.
err = macro ( ), §45.
frac = 9, §82.
h: tetra, §26.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
negate = 0, §82.
next_val = macro, §83.
odiv: octa ( ), MMIX-ARITH §13.
ominus: octa ( ), MMIX-ARITH §5.
omult: octa ( ), MMIX-ARITH §8.
op_ptr: int, §83.
op_stack: stack_op *, §83.
oplus: octa ( ), MMIX-ARITH §5.
or = 14, §82.
over = 8, §82.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
registerize = 3, §82.
serial: int, §58.
serialize = 1, §82.
shift_left: octa ( ), MMIX-ARITH §7.
shift_right: octa ( ), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
sym: sym_node *, §54.
times = 7, §82.
top_val = macro, §83.
true = 1, §26.
undefined = 2, §82.
val_ptr: int, §83.
xor = 15, §82.
zero_octa: octa, MMIX-ARITH §4.

102. Assembling an instruction. Now let's move up from the expression level to the instruction level. We get to this part of the program at the beginning of a line, or after a semicolon at the end of an instruction earlier on the current line. Our current position in the buffer is the value of buf_ptr.

⟨Process the next MMIXAL instruction or comment 102⟩ ≡
  p = buf_ptr; buf_ptr = "";
  ⟨Scan the label field; goto bypass if there is none 103⟩;
  ⟨Scan the opcode field; goto bypass if there is none 104⟩;
  ⟨Copy the operand field 106⟩;
  buf_ptr = p;
  if (spec_mode && !(op_bits & spec_bit))
    derr("cannot use `%s' in special mode", op_field);
  if ((op_bits & no_label_bit) && lab_field[0]) {
    derr("*label field of `%s' instruction is ignored", op_field);
    lab_field[0] = '\0';
  }
  if (op_bits & align_bits) ⟨Align the location pointer 107⟩;
  ⟨Scan the operand field 85⟩;
  if (opcode == GREG) ⟨Allocate a global register 108⟩;
  if (lab_field[0]) ⟨Define the label 109⟩;
  ⟨Do the operation 116⟩;
  bypass:

This code is used in section 136.

103. ⟨Scan the label field; goto bypass if there is none 103⟩ ≡
  if (!*p) goto bypass;
  q = lab_field;
  if (!isspace(*p)) {
    if (!isdigit(*p) && !isletter(*p)) goto bypass;  /* comment */
    for (*q++ = *p++; isdigit(*p) || isletter(*p); p++, q++) *q = *p;
    if (*p && !isspace(*p)) derr("label syntax error at `%c'", *p);
  }
  *q = '\0';
  if (isdigit(lab_field[0]) && (lab_field[1] != 'H' || lab_field[2]))
    derr("improper local label `%s'", lab_field);
  for (p++; isspace(*p); p++) ;

This code is used in section 102.

104. We copy the opcode field to a special buffer because we might want to refer to the symbolic opcode in error messages.

⟨Scan the opcode field; goto bypass if there is none 104⟩ ≡
  q = op_field; while (isletter(*p) || isdigit(*p)) *q++ = *p++; *q = '\0';
  if (!isspace(*p) && *p && op_field[0]) derr("opcode syntax error at `%c'", *p);
  pp = trie_search(op_root, op_field)->sym;
  if (!pp) {
    if (op_field[0]) derr("unknown operation code `%s'", op_field);
    if (lab_field[0]) derr("*no opcode; label `%s' will be ignored", lab_field);
    goto bypass;
  }
  opcode = pp->equiv.h, op_bits = pp->equiv.l;
  while (isspace(*p)) p++;

This code is used in section 102.

105. ⟨Global variables 27⟩ +≡
  tetra opcode;   /* numeric code for MMIX operation or MMIXAL pseudo-op */
  tetra op_bits;  /* flags describing an operator's special characteristics */

106. We copy the operand field to a special buffer so that we can change string constants while scanning them later.

⟨Copy the operand field 106⟩ ≡
  q = operand_list;
  while (*p) {
    if (*p == ';') break;
    if (*p == '\'') {
      *q++ = *p++;
      if (!*p) err("incomplete character constant");
      *q++ = *p++;
      if (*p != '\'') err("illegal character constant");
    } else if (*p == '\"') {
      for (*q++ = *p++; *p && *p != '\"'; p++, q++) *q = *p;
      if (!*p) err("incomplete string constant");
    }
    *q++ = *p++;
    if (isspace(*p)) break;
  }
  while (isspace(*p)) p++;
  if (*p == ';') p++;
  else p = "";  /* if not followed by semicolon, rest of the line is a comment */
  if (q == operand_list) *q++ = '0';  /* change empty operand field to `0' */
  *q = '\0';

This code is used in section 102.

align_bits = #30000, §62.
buf_ptr: Char *, §33.
derr = macro ( ), §45.
equiv: octa, §58.
err = macro ( ), §45.
GREG = #106, §62.
h: tetra, §26.
isdigit: int ( ), <ctype.h>.
isletter = macro ( ), §57.
isspace: int ( ), <ctype.h>.
l: tetra, §26.
lab_field: Char *, §33.
no_label_bit = #40000, §62.
op_field: Char *, §33.
op_root: trie_node *, §56.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
q: register Char *, §40.
spec_bit = #100000, §62.
spec_mode: bool, §43.
sym: sym_node *, §54.
tetra = unsigned int, §26.
trie_search: trie_node *( ), §57.

107. It is important to do the alignment in this step before defining the label or evaluating the operand field.

⟨Align the location pointer 107⟩ ≡
  {
    j = (op_bits & align_bits) >> 16;
    acc.h = -1, acc.l = -(1 << j);
    cur_loc = oand(incr(cur_loc, (1 << j) - 1), acc);
  }

This code is used in section 102.
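The alignment step rounds the location up to the next multiple of 2^j, where j comes from the align_bits field of op_bits (in the table of section 63 that field is 1 for WYDE, 2 for TETRA and for ordinary instructions, and 3 for OCTA). A minimal sketch with a native 64-bit integer, using the hypothetical name align_up_sketch in place of the octabyte routines incr and oand:

```c
#include <stdint.h>

/* Round loc up to a multiple of 2^j; j is assumed to be small (0..3 here). */
uint64_t align_up_sketch(uint64_t loc, unsigned j)
{
    uint64_t low = (UINT64_C(1) << j) - 1;  /* (1 << j) - 1 */
    return (loc + low) & ~low;              /* ~low is -(1 << j) as a bit pattern */
}
```

Adding 2^j - 1 first and then clearing the low j bits leaves an already aligned location unchanged and moves any other location forward to the next boundary.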

108. ⟨Allocate a global register 108⟩ ≡
  {
    if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
      for (j = greg; j < 255; j++)
        if (greg_val[j].l == val_stack[0].equiv.l && greg_val[j].h == val_stack[0].equiv.h) {
          cur_greg = j;
          goto got_greg;
        }
    }
    if (greg == 32) err("too many global registers");
    greg--;
    greg_val[greg] = val_stack[0].equiv; cur_greg = greg;
  got_greg: ;
  }

This code is used in section 102.
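The allocation policy of section 108 can be paraphrased as: reuse an existing global register when a nonzero value has already been allocated, otherwise take the next register downward, and never share a register whose value is zero. The sketch below restates that policy with hypothetical names (greg_pool, greg_pool_init, greg_alloc); it assumes the allocator starts at 255, which is not shown in this part of the program, and it returns -1 where the assembler reports "too many global registers".

```c
#include <stdint.h>

typedef struct {
    uint64_t greg_val[256];  /* values of the allocated registers */
    int greg;                /* lowest register allocated so far */
} greg_pool;

/* Assumption: nothing is allocated initially, so greg starts at 255. */
void greg_pool_init(greg_pool *g) { g->greg = 255; }

/* Returns the register number holding value, or -1 if none is left. */
int greg_alloc(greg_pool *g, uint64_t value)
{
    int j;
    if (value != 0)                       /* zero-valued registers are never shared */
        for (j = g->greg; j < 255; j++)
            if (g->greg_val[j] == value) return j;
    if (g->greg == 32) return -1;         /* registers below $32 are not handed out */
    g->greg--;
    g->greg_val[g->greg] = value;
    return g->greg;
}
```

Under these assumptions two GREG requests for the same nonzero value receive the same register, while each GREG 0 receives a register of its own, presumably because such a register is meant to be changed at run time.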

109. If the label is, say 2H, we will already have used the old value of 2B when evaluating the operands. Furthermore, an operand of 2F will have been treated as undefined, which it still is.

Symbols can be defined more than once, but only if each definition gives them the same equivalent value.

A warning message is given when a predefined symbol is being redefined, if its predefined value has already been used.

⟨Define the label 109⟩ ≡
  {
    sym_node *new_link = DEFINED;
    acc = cur_loc;
    if (opcode == IS) {
      cur_loc = val_stack[0].equiv;
      if (val_stack[0].status == reg_val) new_link = REGISTER;
    } else if (opcode == GREG)
      cur_loc.h = 0, cur_loc.l = cur_greg, new_link = REGISTER;
    ⟨Find the symbol table node, pp 111⟩;
    if (pp->link == DEFINED || pp->link == REGISTER) {
      if (pp->equiv.l != cur_loc.l || pp->equiv.h != cur_loc.h || pp->link != new_link) {
        if (pp->serial) derr("symbol `%s' is already defined", lab_field);
        pp->serial = ++serial_number;
        derr("*redefinition of predefined symbol `%s'", lab_field);
      }
    } else if (pp->link == PREDEFINED) pp->serial = ++serial_number;
    else if (pp->link) {
      if (new_link == REGISTER) err("future reference cannot be to a register");
      do ⟨Fix prior references to this label 112⟩ while (pp->link);
    }
    if (isdigit(lab_field[0])) pp = &backward_local[lab_field[0] - '0'];
    pp->equiv = cur_loc; pp->link = new_link;
    ⟨Fix references that might be in the val_stack 110⟩;
    if (listing_file && (opcode == IS || opcode == LOC))
      ⟨Make special listing to show the label equivalent 115⟩;
    cur_loc = acc;
  }

This code is used in section 102.

110. hFix references that might be in the val stack 110 i �if (:isdigit (lab �eld [0]))for (j = 0; j < val ptr ; j++)if (val stack [j]:status � unde�ned ^ val stack [j]:link~sym � pp) f

val stack [j]:status = (new link � REGISTER ? reg val : pure );val stack [j]:equiv = cur loc ;

g

This code is used in section 109.

111. hFind the symbol table node, pp 111 i �if (isdigit (lab �eld [0])) pp = &forward local [lab �eld [0]� '0'];else fif (lab �eld [0] � ':') tt = trie search (trie root ; lab �eld + 1);else tt = trie search (cur pre�x ; lab �eld );pp = tt~sym ;if (:pp ) pp = tt~sym = new sym node (true );

g

This code is used in section 109.

acc : octa, x83.align bits =#

30000, x62.backward local : sym node [ ],x90.

cur greg : int, x143.cur loc : octa, x43.cur pre�x : trie node �, x56.DEFINED=macro, x58.derr =macro ( ), x45.equiv : octa, x82.equiv : octa, x58.err =macro ( ), x45.forward local : sym node [ ],x90.

GREG=#106, x62.

greg : int, x143.greg val : octa [ ], x133.h: tetra, x26.

incr : octa ( ), MMIX-ARITH x6.IS=#

101, x62.isdigit : int ( ), <ctype.h>.j: register int, x136.l: tetra, x26.lab �eld : Char �, x33.link : sym node �, x58.link : trie node �, x82.listing �le : FILE �, x139.LOC=#

102, x62.new sym node : sym node�( ), x59.

oand : octa ( ),MMIX-ARITH x25.

op bits : tetra, x105.opcode : tetra, x105.pp : register sym node �,x65.

PREDEFINED=macro, x58.pure =0, x82.reg val =1, x82.REGISTER=macro, x58.serial : int, x58.serial number : int, x60.status : stat, x82.sym : sym node �, x54.sym node= struct, x58.trie root : trie node �, x56.trie search : trie node �( ),x57.

true =1, x26.tt : register trie node �, x65.unde�ned =2, x82.val ptr : int, x83.val stack : val node �, x83.


112. hFix prior references to this label 112 i �f

qq = pp~ link ;pp~ link = qq~ link ;mmo loc( );if (qq~serial � �x o ) hFix a future reference from an octabyte 113 ielse hFix a future reference from a relative address 114 i;recycle �xup (qq );

g

This code is used in section 109.

113. hFix a future reference from an octabyte 113 i �fif (qq~equiv :h&

#ffffff) f

mmo lop (lop �xo ; 0; 2);mmo tetra (qq~equiv :h);

g else mmo lop(lop �xo ; qq~equiv :h� 24; 1);mmo tetra (qq~equiv :l);

g

This code is used in section 112.

114. hFix a future reference from a relative address 114 i �focta o;

o = ominus (cur loc ; qq~equiv );if (o:l & 3)

dderr ("*relative address in location #%08x%08x not divisible by 4";qq~equiv :h; qq~equiv :l);

o = shift right (o; 2; 0); k = 0;if (o:h � 0)if (o:l < #

10000) mmo lopp (lop �xr ; o:l);else if (qq~serial � �x xyz ^ o:l < #

1000000) fmmo lop (lop �xrx ; 0; 24); mmo tetra (o:l);

g else k = 1;else if (o:h � #

ffffffff)if (qq~serial � �x xyz ^ o:l � #

ff000000) fmmo lop (lop �xrx ; 0; 24); mmo tetra (o:l & #

1ffffff);g else if (qq~serial � �x yz ^ o:l � #

ffff0000) fmmo lop (lop �xrx ; 0; 16); mmo tetra (o:l & #

100ffff);g else k = 1;

else k = 1;if (k) dderr ("relative address in location #%08x%08x is too far away";

qq~equiv :h; qq~equiv :l);g

This code is used in section 112.


115. hMake special listing to show the label equivalent 115 i �if (new link � DEFINED) f

fprintf (listing �le ; "(%08x%08x)"; cur loc :h; cur loc :l); ush listing line (" ");

g else ffprintf (listing �le ; "($%03d)"; cur loc :l & #

ff); ush listing line (" ");

g

This code is used in section 109.

116. hDo the operation 116 i �future bits = 0;if (op bits &many arg bit ) hDo a many-operand operation 117 ielse switch (val ptr ) fcase 1: if (:(op bits & one arg bit ))

derr ("opcode `%s' needs more than one operand"; op �eld );hDo a one-operand operation 129 i;

case 2: if (:(op bits & two arg bit ))if (op bits & one arg bit )

derr ("opcode `%s' must not have two operands"; op �eld )else derr ("opcode `%s' must have more than two operands"; op �eld );

hDo a two-operand operation 124 i;case 3: if (:(op bits & three arg bit ))

derr ("opcode `%s' must not have three operands"; op �eld );hDo a three-operand operation 119 i;

default: derr ("too many operands for opcode `%s'"; op �eld );g

This code is used in section 102.

cur loc : octa, x43.dderr =macro ( ), x45.DEFINED=macro, x58.derr =macro ( ), x45.equiv : octa, x58.�x o =0, x58.�x xyz =2, x58.�x yz =1, x58. ush listing line : void ( ), x41.fprintf : int ( ), <stdio.h>.future bits : int, x120.h: tetra, x26.k: register int, x136.l: tetra, x26.

link : sym node �, x58.listing �le : FILE �, x139.lop �xo =#

3, x24.lop �xr =#

4, x24.lop �xrx =#

5, x24.many arg bit =#

8000, x62.mmo loc : void ( ), x49.mmo lop : void ( ), x48.mmo lopp : void ( ), x48.mmo tetra : void ( ), x48.new link : sym node �, x109.octa= struct, x26.ominus : octa ( ),MMIX-ARITH x5.

one arg bit =#1000, x62.

op bits : tetra, x105.op �eld : Char �, x33.pp : register sym node �,x65.

qq : register sym node �,x65.

recycle �xup =macro ( ), x59.serial : int, x58.shift right : octa ( ),MMIX-ARITH x7.

three arg bit =#4000, x62.

two arg bit =#2000, x62.

val ptr : int, x83.


117. The many-operand operators are BYTE, WYDE, TETRA, and OCTA.

⟨Do a many-operand operation 117⟩ ≡
  for (j = 0; j < val_ptr; j++) {
    ⟨Deal with cases where val_stack[j] is impure 118⟩;
    k = 1 << (opcode - BYTE);
    if ((val_stack[j].equiv.h && opcode < OCTA) ||
        (val_stack[j].equiv.l > 0xffff && opcode < TETRA) ||
        (val_stack[j].equiv.l > 0xff && opcode < WYDE))
      if (k == 1) err("*constant doesn't fit in one byte")
      else derr("*constant doesn't fit in %d bytes", k);
    if (k < 8) assemble(k, val_stack[j].equiv.l, 0);
    else if (val_stack[j].status == undefined) assemble(4, 0, 0xf0), assemble(4, 0, 0xf0);
    else assemble(4, val_stack[j].equiv.h, 0), assemble(4, val_stack[j].equiv.l, 0);
  }

This code is used in section 116.
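The size check in section 117 relies on the pseudo-opcodes BYTE, WYDE, TETRA, and OCTA being consecutive, so that `1 << (opcode - BYTE)` gives the operand width in bytes. A small sketch of that test, using the opcode values from the mini-index; the helper names `operand_bytes` and `constant_fits` are illustrative and do not appear in MMIXAL:

```c
#include <stdint.h>

enum { BYTE = 0x108, WYDE = 0x109, TETRA = 0x10a, OCTA = 0x10b };

/* Number of bytes assembled per operand for a many-operand pseudo-op. */
int operand_bytes(int opcode) { return 1 << (opcode - BYTE); }

/* Nonzero if v can be stored in an operand of the given pseudo-op. */
int constant_fits(uint64_t v, int opcode)
{
    int k = operand_bytes(opcode);
    return k == 8 || (v >> (8 * k)) == 0;
}
```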

118. hDeal with cases where val stack [j] is impure 118 i �if (val stack [j]:status � reg val ) err ("*register number used as a constant")else if (val stack [j]:status � unde�ned ) fif (opcode 6= OCTA) err ("undefined constant");pp = val stack [j]:link~sym ;qq = new sym node (false );qq~ link = pp~ link ;pp~ link = qq ;qq~serial = �x o ;qq~equiv = cur loc ;

g

This code is used in section 117.

119. ⟨Do a three-operand operation 119⟩ ≡
  ⟨Do the Z field 121⟩;
  ⟨Do the Y field 122⟩;
assemble_X: ⟨Do the X field 123⟩;
assemble_inst: assemble(4, (opcode << 24) + xyz, future_bits);
  break;

This code is used in section 116.

120. Individual fields of an instruction are placed into global variables z, y, x, yz, and/or xyz.

⟨Global variables 27⟩ +≡
  tetra z, y, x, yz, xyz;   /* pieces for assembly */
  int future_bits;   /* places where there are future references */

121. hDo the Z �eld 121 i �if (val stack [2]:status � unde�ned ) err ("Z field is undefined");if (val stack [2]:status � reg val ) fif (:(op bits & (immed bit + zr bit + zar bit )))

derr ("*Z field of `%s' should not be a register number"; op �eld );g else if (op bits & immed bit ) opcode++; =� immediate �=


else if (op bits & zr bit )derr ("*Z field of `%s' should be a register number"; op �eld );

if (val stack [2]:equiv :h _ val stack [2]:equiv :l > #ff)

err ("*Z field doesn't fit in one byte");z = val stack [2]:equiv :l & #

ff;

This code is used in section 119.

122. hDo the Y �eld 122 i �if (val stack [1]:status � unde�ned ) err ("Y field is undefined");if (val stack [1]:status � reg val ) fif (:(op bits & (yr bit + yar bit )))

derr ("*Y field of `%s' should not be a register number"; op �eld );g else if (op bits & yr bit )

derr ("*Y field of `%s' should be a register number"; op �eld );if (val stack [1]:equiv :h _ val stack [1]:equiv :l > #

ff)err ("*Y field doesn't fit in one byte");

y = val stack [1]:equiv :l & #ff; yz = (y � 8) + z;

This code is used in section 119.

123. hDo the X �eld 123 i �if (val stack [0]:status � unde�ned ) err ("X field is undefined");if (val stack [0]:status � reg val ) fif (:(op bits & (xr bit + xar bit )))

derr ("*X field of `%s' should not be a register number"; op �eld );g else if (op bits & xr bit )

derr ("*X field of `%s' should be a register number"; op �eld );if (val stack [0]:equiv :h _ val stack [0]:equiv :l > #

ff)err ("*X field doesn't fit in one byte");

x = val stack [0]:equiv :l & #ff; xyz = (x� 16) + yz ;

This code is used in section 119.
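Sections 119 through 123 build the instruction word from right to left: Z first, then YZ, then XYZ, and finally the opcode in the top byte. The packing can be sketched as follows; `pack_instruction` is a hypothetical helper, and the masking to eight bits stands in for the range checks performed above.

```c
#include <stdint.h>

/* Combine an 8-bit opcode with three 8-bit fields into one tetrabyte,
   following yz = (y << 8) + z and xyz = (x << 16) + yz. */
uint32_t pack_instruction(uint32_t opcode, uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t yz = ((y & 0xff) << 8) + (z & 0xff);
    uint32_t xyz = ((x & 0xff) << 16) + yz;
    return (opcode << 24) + xyz;
}
```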

assemble : void ( ), x52.BYTE=#

108, x62.cur loc : octa, x43.derr =macro ( ), x45.equiv : octa, x82.equiv : octa, x58.err =macro ( ), x45.false =0, x26.�x o =0, x58.h: tetra, x26.immed bit =#

2, x62.j: register int, x136.k: register int, x136.l: tetra, x26.

link : trie node �, x82.link : sym node �, x58.new sym node : sym node�( ), x59.

OCTA=#10b, x62.

op bits : tetra, x105.op �eld : Char �, x33.opcode : tetra, x105.pp : register sym node �,x65.

qq : register sym node �,x65.

reg val =1, x82.serial : int, x58.

status : stat, x82.sym : sym node �, x54.TETRA=#

10a, x62.tetra=unsigned int, x26.unde�ned =2, x82.val ptr : int, x83.val stack : val node �, x83.WYDE=#

109, x62.xar bit =#

40, x62.xr bit =#

80, x62.yar bit =#

10, x62.yr bit =#

20, x62.zar bit =#

4, x62.zr bit =#

8, x62.


124. hDo a two-operand operation 124 i �if (val stack [1]:status � unde�ned ) fif (op bits & rel addr bit )hAssemble YZ as a future reference and goto assemble X 125 i

else err ("YZ field is undefined");g else if (val stack [1]:status � reg val ) fif (:(op bits & (immed bit + yzr bit + yzar bit )))

derr ("*YZ field of `%s' should not be a register number"; op �eld );if (opcode � SET) val stack [1]:equiv :l�= 8; opcode = #

c1; =� change to OR �=else if (op bits &mem bit ) val stack [1]:equiv :l �= 8; opcode++;

=� silently append ,0 �=g else f =� val stack [1]:status � pure �=if (op bits &mem bit )hAssemble YZ as a memory address and goto assemble X 127 i;

if (opcode � SET) opcode = #e3; =� change to SETL �=

else if (op bits & immed bit ) opcode++; =� immediate �=else if (op bits & yzr bit ) f

derr ("*YZ field of `%s' should be a register number"; op �eld );gif (op bits & rel addr bit )hAssemble YZ as a relative address and goto assemble X 126 i;

gif (val stack [1]:equiv :h _ val stack [1]:equiv :l > #

ffff)err ("*YZ field doesn't fit in two bytes");

yz = val stack [1]:equiv :l & #ffff;

goto assemble X ;

This code is used in section 116.

125. hAssemble YZ as a future reference and goto assemble X 125 i �f

pp = val stack [1]:link~sym ;qq = new sym node (false );qq~ link = pp~ link ;pp~ link = qq ;qq~serial = �x yz ;qq~equiv = cur loc ;yz = 0;future bits = #

c0;goto assemble X ;

g

This code is used in section 124.

126. ⟨Assemble YZ as a relative address and goto assemble_X 126⟩ ≡
  {
    octa source, dest;
    if (val_stack[1].equiv.l & 3) err("*relative address is not divisible by 4");
    source = shift_right(cur_loc, 2, 0);
    dest = shift_right(val_stack[1].equiv, 2, 0);
    acc = ominus(dest, source);
    if (!(acc.h & 0x80000000)) {
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #ffff tetrabytes forward");
    } else {
      acc = incr(acc, 0x10000);
      opcode++;
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #10000 tetrabytes backward");
    }
    yz = acc.l;
    goto assemble_X;
  }

This code is used in section 124.
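The relative-address rule of section 126 can be restated on ordinary integers: a forward distance of at most #ffff tetrabytes is stored directly, and a backward distance is biased by #10000 and flagged by incrementing the opcode. The sketch below is only an approximation under stated assumptions: `encode_rel16` is a hypothetical name, addresses are assumed to lie below 2^63, and a misaligned target is rejected outright, whereas the code above merely issues a warning for it.

```c
#include <stdint.h>

/* Encode a branch from address cur to address dest as a 16-bit YZ field.
   Returns 0 on success and sets *yz and *backward (1 means "use opcode+1");
   returns -1 if the target is misaligned or out of range.
   Assumes both addresses are below 2^63 so the signed difference is exact. */
int encode_rel16(uint64_t cur, uint64_t dest, uint32_t *yz, int *backward)
{
    if (dest & 3) return -1;   /* not divisible by 4 */
    int64_t delta = (int64_t)(dest >> 2) - (int64_t)(cur >> 2);  /* in tetrabytes */
    if (delta >= 0) {
        if (delta > 0xffff) return -1;
        *backward = 0;
    } else {
        delta += 0x10000;                      /* bias, as incr(acc, #10000) */
        if (delta < 0) return -1;              /* more than #10000 back */
        *backward = 1;
    }
    *yz = (uint32_t)delta;
    return 0;
}
```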

127. ⟨Assemble YZ as a memory address and goto assemble_X 127⟩ ≡
  {
    octa o;
    o = val_stack[1].equiv, k = 0;
    for (j = greg; j < 255; j++)
      if (greg_val[j].h || greg_val[j].l) {
        acc = ominus(val_stack[1].equiv, greg_val[j]);
        if (acc.h <= o.h && (acc.l <= o.l || acc.h < o.h)) o = acc, k = j;
      }
    if (o.l <= 0xff && !o.h && k) yz = (k << 8) + o.l, opcode++;
    else if (!expanding) err("no base address is close enough to the address A")
    else ⟨Assemble instructions to put supplementary data in $255 128⟩;
    goto assemble_X;
  }

This code is used in section 124.
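The loop in section 127 looks for the global register whose nonzero value lies closest below the target address, comparing the differences as unsigned 64-bit quantities. A sketch of that search, assuming `uint64_t` values in place of `octa` and a hypothetical function name `pick_base`:

```c
#include <stdint.h>

/* Scan global registers greg..254 for the nonzero base value that leaves
   the smallest unsigned offset addr - base. Returns the chosen register
   (0 if none qualifies) and stores the offset in *off. */
int pick_base(const uint64_t val[256], int greg, uint64_t addr, uint64_t *off)
{
    uint64_t best = addr;   /* offset relative to "no base at all" */
    int k = 0;
    for (int j = greg; j < 255; j++)
        if (val[j] != 0) {
            uint64_t d = addr - val[j];     /* wraps when base > addr */
            if (d <= best) { best = d; k = j; }
        }
    *off = best;
    return k;
}
```

The caller would then accept the result only when the register is nonzero and the offset is at most #ff, as the original does before falling back to the expansion of section 128.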

acc : octa, x83.assemble X : label, x119.cur loc : octa, x43.derr =macro ( ), x45.equiv : octa, x82.equiv : octa, x58.err =macro ( ), x45.expanding : int, x139.false =0, x26.�x yz =1, x58.future bits : int, x120.greg : int, x143.greg val : octa [ ], x133.h: tetra, x26.immed bit =#

2, x62.incr : octa ( ), MMIX-ARITH x6.

j: register int, x136.k: register int, x136.l: tetra, x26.link : trie node �, x82.link : sym node �, x58.mem bit =#

80000, x62.new sym node : sym node�( ), x59.

octa= struct, x26.ominus : octa ( ),MMIX-ARITH x5.

op bits : tetra, x105.op �eld : Char �, x33.opcode : tetra, x105.pp : register sym node �,x65.

pure =0, x82.qq : register sym node �,x65.

reg val =1, x82.rel addr bit =#

1, x62.serial : int, x58.SET=#

100, x62.shift right : octa ( ),MMIX-ARITH x7.

status : stat, x82.sym : sym node �, x54.unde�ned =2, x82.val stack : val node �, x83.yz : tetra, x120.yzar bit =#

100, x62.yzr bit =#

200, x62.


128. #define SETH 0xe0
#define ORH 0xe8
#define ORL 0xeb

⟨Assemble instructions to put supplementary data in $255 128⟩ ≡
  {
    for (j = SETH; j <= ORL; j++) {
      switch (j & 3) {
      case 0: yz = o.h >> 16; break;   /* SETH */
      case 1: yz = o.h & 0xffff; break;   /* SETMH or ORMH */
      case 2: yz = o.l >> 16; break;   /* SETML or ORML */
      case 3: yz = o.l & 0xffff; break;   /* SETL or ORL */
      }
      if (yz) {
        assemble(4, (j << 24) + (255 << 16) + yz, 0);
        j |= ORH;
      }
    }
    if (k) yz = (k << 8) + 255;   /* Y = $k, Z = $255 */
    else yz = 255 << 8, opcode++;   /* Y = $255, Z = 0 */
  }

This code is used in section 127.
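The loop of section 128 emits at most four instructions, one per nonzero 16-bit piece of the offset: the first is a SET-type instruction, and the trick `j |= ORH` turns all later ones into OR-type instructions. The same sequence can be sketched with an explicit piece index; `load_into_255` and its output array are assumptions for illustration.

```c
#include <stdint.h>

/* Fill out[] with the instructions that load the 64-bit value v into $255,
   skipping 16-bit pieces that are zero. Returns the number of instructions. */
int load_into_255(uint64_t v, uint32_t out[4])
{
    enum { SETH = 0xe0, ORH = 0xe8 };
    int n = 0;
    uint32_t base = SETH;              /* becomes ORH after the first emit */
    for (int piece = 0; piece < 4; piece++) {
        uint32_t wyde = (uint32_t)(v >> (48 - 16 * piece)) & 0xffff;
        if (wyde) {
            out[n++] = ((base + piece) << 24) | (255u << 16) | wyde;
            base = ORH;
        }
    }
    return n;
}
```

For the value #0000000100000002 this yields two instructions, a SETMH followed by an ORL.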

129. hDo a one-operand operation 129 i �if (val stack [0]:status � unde�ned ) fif (op bits & rel addr bit )hAssemble XYZ as a future reference and goto assemble inst 130 i

else if (opcode 6= PREFIX) err ("the operand is undefined");g else if (val stack [0]:status � reg val ) fif (:(op bits & (xyzr bit + xyzar bit )))

derr ("*operand of `%s' should not be a register number"; op �eld );g else f =� val stack [0]:status � pure �=if (op bits & xyzr bit )

derr ("*operand of `%s' should be a register number"; op �eld );if (op bits & rel addr bit )hAssemble XYZ as a relative address and goto assemble inst 131 i;

gif (opcode > #

ff) hDo a pseudo-operation and goto bypass 132 i;if (val stack [0]:equiv :h _ val stack [0]:equiv :l > #

ffffff)err ("*XYZ field doesn't fit in three bytes");

xyz = val stack [0]:equiv :l & #ffffff;

goto assemble inst ;

This code is used in section 116.

130. hAssemble XYZ as a future reference and goto assemble inst 130 i �f

pp = val stack [0]:link~sym ;qq = new sym node (false );qq~ link = pp~ link ;pp~ link = qq ;


qq~serial = �x xyz ;qq~equiv = cur loc ;xyz = 0;future bits = #

e0;goto assemble inst ;

g

This code is used in section 129.

131. hAssemble XYZ as a relative address and goto assemble inst 131 i �focta source ; dest ;

if (val stack [0]:equiv :l & 3) err ("*relative address is not divisible by 4");source = shift right (cur loc ; 2; 0);dest = shift right (val stack [0]:equiv ; 2; 0);acc = ominus (dest ; source );if (:(acc :h& #

80000000)) fif (acc :l > #

ffffff _ acc :h)err ("relative address is more than #ffffff tetrabytes forward");

g else facc = incr (acc ;#1000000);opcode++;if (acc :l > #

ffffff _ acc :h)err ("relative address is more than #1000000 tetrabytes backward");

gxyz = acc :l;goto assemble inst ;

g

This code is used in section 129.

acc : octa, x83.assemble : void ( ), x52.assemble inst : label, x119.bypass : label, x102.cur loc : octa, x43.derr =macro ( ), x45.equiv : octa, x82.equiv : octa, x58.err =macro ( ), x45.false =0, x26.�x xyz =2, x58.future bits : int, x120.h: tetra, x26.incr : octa ( ), MMIX-ARITH x6.j: register int, x136.k: register int, x136.

l: tetra, x26.link : trie node �, x82.link : sym node �, x58.new sym node : sym node�( ), x59.

o: octa, x127.octa= struct, x26.ominus : octa ( ),MMIX-ARITH x5.

op bits : tetra, x105.op �eld : Char �, x33.opcode : tetra, x105.pp : register sym node �,x65.

PREFIX=#103, x62.

pure =0, x82.

qq : register sym node �,x65.

reg val =1, x82.rel addr bit =#

1, x62.serial : int, x58.shift right : octa ( ),MMIX-ARITH x7.

status : stat, x82.sym : sym node �, x54.unde�ned =2, x82.val stack : val node �, x83.xyz : tetra, x120.xyzar bit =#

400, x62.xyzr bit =#

800, x62.yz : tetra, x120.


132. hDo a pseudo-operation and goto bypass 132 i �switch (opcode ) fcase LOC: cur loc = val stack [0]:equiv ;case IS: goto bypass ;case PREFIX: if (:val stack [0]:link ) err ("not a valid prefix");

cur pre�x = val stack [0]:link ; goto bypass ;case GREG: if (listing �le ) hMake listing for GREG 134 i;goto bypass ;

case LOCAL: if (val stack [0]:equiv :l > lreg ) lreg = val stack [0]:equiv :l;if (listing �le ) f

fprintf (listing �le ; "($%03d)"; val stack [0]:equiv :l); ush listing line (" ");

ggoto bypass ;

case BSPEC: if (val stack [0]:equiv :l > #ffff _ val stack [0]:equiv :h)

err ("*operand of `BSPEC' doesn't fit in two bytes");mmo loc( ); mmo sync( );mmo lopp(lop spec ; val stack [0]:equiv :l);spec mode = true ; spec mode loc = 0; goto bypass ;

case ESPEC: spec mode = false ; goto bypass ;g

This code is used in section 129.

133. hGlobal variables 27 i +�octa greg val [256]; =� initial values of global registers �=

134. hMake listing for GREG 134 i �if (val stack [0]:equiv :l _ val stack [0]:equiv :h) f

fprintf (listing �le ; "($%03d=#%08x"; cur greg ; val stack [0]:equiv :h); ush listing line (" ");fprintf (listing �le ; " %08x)"; val stack [0]:equiv :l); ush listing line (" ");

g else ffprintf (listing �le ; "($%03d)"; cur greg ); ush listing line (" ");

g

This code is used in section 132.


135. Running the program. On a UNIX-like system, the command

mmixal [options] sourcefilename

will assemble the MMIXAL program in file sourcefilename, writing any error messages on the standard error file. (Nothing is written to the standard output.) The options, which may appear in any order, are:

• -o objectfilename Send the output to a binary file called objectfilename. If no -o specification is given, the object file name is obtained from the input file name by changing the final letter from `s' to `o', or by appending `.mmo' if sourcefilename doesn't end with s.

• -l listingname Output a listing of the assembled input and output to a text file called listingname.

• -x Expand memory-oriented commands that cannot be assembled as single instructions, by assembling auxiliary instructions that make temporary use of global register $255.

• -b bufsize Allow up to bufsize characters per line of input.

BSPEC=#104, x62.

bypass : label, x102.cur greg : int, x143.cur loc : octa, x43.cur pre�x : trie node �, x56.equiv : octa, x82.err =macro ( ), x45.ESPEC=#

105, x62.false =0, x26. ush listing line : void ( ), x41.fprintf : int ( ), <stdio.h>.

GREG=#106, x62.

h: tetra, x26.IS=#

101, x62.l: tetra, x26.link : trie node �, x82.listing �le : FILE �, x139.LOC=#

102, x62.LOCAL=#

107, x62.lop spec =#

8, x24.lreg : int, x143.

mmo loc : void ( ), x49.mmo lopp : void ( ), x48.mmo sync : void ( ), x50.octa= struct, x26.opcode : tetra, x105.PREFIX=#

103, x62.spec mode : bool, x43.spec mode loc : tetra, x43.true =1, x26.val stack : val node �, x83.


136. Here, finally, is the overall structure of this program.

#include <stdio.h>

#include <stdlib.h>

#include <ctype.h>

#include <string.h>

#include <time.h>

⟨Preprocessor definitions 31⟩
⟨Type definitions 26⟩
⟨Global variables 27⟩
⟨Subroutines 28⟩

int main(argc, argv)
  int argc; char *argv[];
{
  register int j, k;   /* all-purpose integers */
  ⟨Local variables 40⟩;
  ⟨Process the command line 137⟩;
  ⟨Initialize everything 29⟩;
  while (1) {
    ⟨Get the next line of input text, or break if the input has ended 34⟩;
    while (1) {
      ⟨Process the next MMIXAL instruction or comment 102⟩;
      if (!*buf_ptr) break;
    }
    if (listing_file) {
      if (listing_bits) listing_clear();
      else if (!line_listed) flush_listing_line(" ");
    }
  }
  ⟨Finish the assembly 142⟩;
}

137. The space after "-b" is optional, because MMIX-SIM does not use a space in this context.

⟨Process the command line 137⟩ ≡
  for (j = 1; j < argc - 1 && argv[j][0] == '-'; j++)
    if (!argv[j][2]) {
      if (argv[j][1] == 'x') expanding = 1;
      else if (argv[j][1] == 'o') j++, strcpy(obj_file_name, argv[j]);
      else if (argv[j][1] == 'l') j++, strcpy(listing_name, argv[j]);
      else if (argv[j][1] == 'b' && sscanf(argv[j + 1], "%d", &buf_size) == 1) j++;
      else break;
    } else if (argv[j][1] != 'b' || sscanf(argv[j] + 2, "%d", &buf_size) != 1) break;
  if (j != argc - 1) {
    fprintf(stderr, "Usage: %s %s sourcefilename\n", argv[0],
      "[-x] [-l listingname] [-b buffersize] [-o objectfilename]");
    exit(-1);
  }
  src_file_name = argv[j];

This code is used in section 136.

138. ⟨Open the files 138⟩ ≡
  src_file = fopen(src_file_name, "r");
  if (!src_file) dpanic("Can't open the source file %s", src_file_name);
  if (!obj_file_name[0]) {
    j = strlen(src_file_name);
    if (src_file_name[j - 1] == 's') {
      strcpy(obj_file_name, src_file_name); obj_file_name[j - 1] = 'o';
    }
    else sprintf(obj_file_name, "%s.mmo", src_file_name);
  }
  obj_file = fopen(obj_file_name, "wb");
  if (!obj_file) dpanic("Can't open the object file %s", obj_file_name);
  if (listing_name[0]) {
    listing_file = fopen(listing_name, "w");
    if (!listing_file) dpanic("Can't open the listing file %s", listing_name);
  }

This code is used in section 140.
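The default object file name rule of section 138 (replace a final `s' by `o', otherwise append `.mmo') can be isolated into a small function. The following is a sketch only: `default_obj_name` is a hypothetical helper, and unlike the original it checks the output capacity and rejects an empty source name.

```c
#include <stdio.h>
#include <string.h>

/* Write the default object file name for src into out (capacity n).
   Returns 0 on success, -1 if out is too small or src is empty. */
int default_obj_name(const char *src, char *out, size_t n)
{
    size_t len = strlen(src);
    if (len == 0) return -1;
    if (src[len - 1] == 's') {
        if (len + 1 > n) return -1;
        memcpy(out, src, len + 1);     /* copy including the terminator */
        out[len - 1] = 'o';            /* foo.mms -> foo.mmo */
        return 0;
    }
    int w = snprintf(out, n, "%s.mmo", src);
    return (w >= 0 && (size_t)w < n) ? 0 : -1;
}
```

Under this rule `foo.mms` becomes `foo.mmo`, and `prog.txt` becomes `prog.txt.mmo`.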

139. ⟨Global variables 27⟩ +≡
  char *src_file_name;   /* name of the MMIXAL input file */
  char obj_file_name[FILENAME_MAX + 1];   /* name of the binary output file */
  char listing_name[FILENAME_MAX + 1];   /* name of the optional listing file */
  FILE *src_file, *obj_file, *listing_file;
  int expanding;   /* are we expanding instructions when base addresses fail? */
  int buf_size;   /* maximum number of characters per line of input */

140. ⟨Initialize everything 29⟩ +≡
  ⟨Open the files 138⟩;
  filename[0] = src_file_name;
  filename_count = 1;
  ⟨Output the preamble 141⟩;

141. ⟨Output the preamble 141⟩ ≡
  mmo_lop(lop_pre, 1, 1);
  mmo_tetra(time(NULL));
  mmo_cur_file = -1;

This code is used in section 140.

buf ptr : Char �, x33.dpanic =macro ( ), x45.exit : void ( ), <stdlib.h>.FILE, <stdio.h>.�lename : char �[ ], x37.�lename count : int, x37.FILENAME_MAX=macro,<stdio.h>.

ush listing line : void ( ), x41.

fopen : FILE �( ), <stdio.h>.fprintf : int ( ), <stdio.h>.line listed : bool, x36.listing bits : unsigned char,x43.

listing clear : void ( ), x44.lop pre =#

9, x24.mmo cur �le : int, x51.

mmo lop : void ( ), x48.mmo tetra : void ( ), x48.sprintf : int ( ), <stdio.h>.sscanf : int ( ), <stdio.h>.stderr : FILE �, <stdio.h>.strcpy : char �( ), <string.h>.strlen : int ( ), <string.h>.time : time t ( ), <time.h>.


142. ⟨Finish the assembly 142⟩ ≡
  if (lreg >= greg)
    dpanic("Danger: Must reduce the number of GREGs by %d", lreg - greg + 1);
  ⟨Output the postamble 144⟩;
  ⟨Check and output the trie 80⟩;
  ⟨Report any undefined local symbols 145⟩;
  if (err_count) {
    if (err_count > 1) fprintf(stderr, "(%d errors were found.)\n", err_count);
    else fprintf(stderr, "(One error was found.)\n");
  }
  exit(err_count);

This code is used in section 136.

143. ⟨Global variables 27⟩ +≡
  int greg = 255;   /* global register allocator */
  int cur_greg;   /* global register just allocated */
  int lreg = 32;   /* local register allocator */

144. ⟨Output the postamble 144⟩ ≡
  mmo_lop(lop_post, 0, greg);
  greg_val[255] = trie_search(trie_root, "Main")->sym->equiv;
  for (j = greg; j < 256; j++) {
    mmo_tetra(greg_val[j].h);
    mmo_tetra(greg_val[j].l);
  }

This code is used in section 142.

145. ⟨Report any undefined local symbols 145⟩ ≡
  for (j = 0; j < 10; j++)
    if (forward_local[j].link)
      err_count++, fprintf(stderr, "undefined local symbol %dF\n", j);

This code is used in section 142.


dpanic =macro ( ), x45.equiv : octa, x58.err count : int, x46.exit : void ( ), <stdlib.h>.forward local : sym node [ ],x90.

fprintf : int ( ), <stdio.h>.

greg val : octa [ ], x133.h: tetra, x26.j: register int, x136.l: tetra, x26.link : sym node �, x58.lop post =#

a, x24.mmo lop : void ( ), x48.

mmo tetra : void ( ), x48.stderr : FILE �, <stdio.h>.sym : sym node �, x54.trie root : trie node �, x56.trie search : trie node �( ),x57.


146. Names of the sections.

hAlign the location pointer 107 i Used in section 102.

hAllocate a global register 108 i Used in section 102.

hAssemble instructions to put supplementary data in $255 128 i Used in section 127.

hAssemble XYZ as a future reference and goto assemble inst 130 i Used in sec-

tion 129.

hAssemble XYZ as a relative address and goto assemble inst 131 i Used in sec-

tion 129.

hAssemble YZ as a future reference and goto assemble X 125 i Used in section 124.

hAssemble YZ as a memory address and goto assemble X 127 i Used in section 124.

hAssemble YZ as a relative address and goto assemble X 126 i Used in section 124.

hCases for binary operators 99, 101 i Used in section 98.

hCases for unary operators 100 i Used in section 98.

hCheck and output the trie 80 i Used in section 142.

hCheck for a line directive 38 i Used in section 34.

hCopy the operand �eld 106 i Used in section 102.

hDeal with cases where val stack [j] is impure 118 i Used in section 117.

hDe�ne the label 109 i Used in section 102.

hDo a many-operand operation 117 i Used in section 116.

hDo a one-operand operation 129 i Used in section 116.

hDo a pseudo-operation and goto bypass 132 i Used in section 129.

hDo a three-operand operation 119 i Used in section 116.

hDo a two-operand operation 124 i Used in section 116.

hDo the operation 116 i Used in section 102.

hDo the X �eld 123 i Used in section 119.

hDo the Y �eld 122 i Used in section 119.

hDo the Z �eld 121 i Used in section 119.

hEncode the length of t~sym~equiv 76 i Used in section 74.

hFind the symbol table node, pp 111 i Used in section 109.

hFinish the assembly 142 i Used in section 136.

hFix a future reference from a relative address 114 i Used in section 112.

hFix a future reference from an octabyte 113 i Used in section 112.

hFix prior references to this label 112 i Used in section 109.

hFix references that might be in the val stack 110 i Used in section 109.

hFlush the excess part of an overlong line 35 i Used in section 34.

hGet the next line of input text, or break if the input has ended 34 i Used in

section 136.

hGlobal variables 27, 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, 143 iUsed in section 136.

h Initialize everything 29, 32, 61, 71, 84, 91, 140 i Used in section 136.

hLocal variables 40, 65 i Used in section 136.

hMake listing for GREG 134 i Used in section 132.

hMake special listing to show the label equivalent 115 i Used in section 109.

hMake sure cur loc and mmo cur loc refer to the same tetrabyte 53 i Used in

section 52.


hOpen the �les 138 i Used in section 140.

hOutput the postamble 144 i Used in section 142.

hOutput the preamble 141 i Used in section 140.

hPerform the top operation on op stack 98 i Used in section 85.

hPreprocessor de�nitions 31, 39 i Used in section 136.

hPrint symbol sym buf and its equivalent 78 i Used in section 75.

hProcess the command line 137 i Used in section 136.

hProcess the next MMIXAL instruction or comment 102 i Used in section 136.

hPut other prede�ned symbols into the trie 70 i Used in section 61.

hPut the MMIX opcodes and MMIXAL pseudo-ops into the trie 64 i Used in section 61.

hPut the special register names into the trie 66 i Used in section 61.

hReport an unde�ned symbol 79 i Used in section 74.

hReport any unde�ned local symbols 145 i Used in section 142.

hScan a backward local 89 i Used in section 86.

hScan a binary operator or closing token, rt op 97 i Used in section 85.

hScan a character constant 92 i Used in section 86.

hScan a decimal constant 94 i Used in section 86.

hScan a forward local 88 i Used in section 86.

hScan a hexadecimal constant 95 i Used in section 86.

hScan a string constant 93 i Used in section 86.

hScan a symbol 87 i Used in section 86.

hScan opening tokens until putting something on val stack 86 i Used in section 85.

hScan the current location 96 i Used in section 86.

hScan the label �eld; goto bypass if there is none 103 i Used in section 102.

hScan the opcode �eld; goto bypass if there is none 104 i Used in section 102.

hScan the operand �eld 85 i Used in section 102.

hSubroutines 28, 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, 74 i Used in section 136.

hType de�nitions 26, 30, 54, 58, 62, 68, 82 i Used in section 136.

hVisit t and traverse t~mid 75 i Used in section 74.


MMMIX

1. Introduction. This CWEB program simulates how the MMIX computer might be implemented with a high-performance pipeline in many different configurations. All of the complexities of MMIX's architecture are treated, except for multiprocessing and low-level details of memory mapped input/output.

The present program module, which contains the main routine for the MMIX meta-simulator, is primarily devoted to administrative tasks. Other modules do the actual work after this module has told them what to do.

2. A user typically invokes the meta-simulator with a UNIX-like command line of the general form `mmmix configfile progfile', where the configfile describes the characteristics of an MMIX implementation and the progfile contains a program to be downloaded and run. Rules for configuration files appear in the module called mmix-config. The program file is either an "MMIX binary file" dumped by MMIX-SIM, or an ASCII text file that describes hexadecimal data in a rudimentary format. It is assumed to be binary if its name ends with the extension `.mmb'.

#include <stdio.h>
#include "mmix-pipe.h"

char *config_file_name, *prog_file_name;
⟨Global variables 5⟩
⟨Subroutines 10⟩

int main(argc, argv)
  int argc;
  char *argv[];
{
  ⟨Parse the command line 3⟩;
  MMIX_config(config_file_name);
  MMIX_init();
  mmix_io_init();
  ⟨Input the program 4⟩;
  ⟨Run the simulation interactively 13⟩;
  printf("Simulation ended at time %d.\n", ticks.l);
  print_stats();
}

3. The command line might also contain options, some day. For now I'm forgettingthem and simplifying everything until I gain further experience.

⟨Parse the command line 3⟩ ≡
  if (argc != 3) {
    fprintf(stderr, "Usage: %s configfile progfile\n", argv[0]);
    exit(-3);
  }
  config_file_name = argv[1];
  prog_file_name = argv[2];

This code is used in section 2.

D.E. Knuth: MMIXware, LNCS 1750, pp. 494-508, 1999. Springer-Verlag Berlin Heidelberg 1999


4. ⟨Input the program 4⟩ ≡
  if (strcmp(prog_file_name + strlen(prog_file_name) - 4, ".mmb") == 0)
    ⟨Input an MMIX binary file 9⟩
  else ⟨Input a rudimentary hexadecimal file 6⟩;
  fclose(prog_file);

This code is used in section 2.
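
The suffix test in section 4 subtracts 4 from the length of the file name without checking that the name has at least four characters. A minimal sketch of a more defensive test is given here; the helper name has_suffix is an assumption made for illustration and is not part of MMIXware.

```c
#include <string.h>

/* Return 1 if name ends with suffix, 0 otherwise.
   The length comparison guards against names shorter than the suffix,
   so the pointer arithmetic never steps before the start of name. */
int has_suffix(const char *name, const char *suffix)
{
  size_t n = strlen(name), s = strlen(suffix);
  if (n < s) return 0;
  return strcmp(name + n - s, suffix) == 0;
}
```

With such a helper the decision in section 4 could be written as has_suffix(prog_file_name, ".mmb"), and a one-letter file name would simply be treated as a hexadecimal file.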

exit: void ( ), <stdlib.h>.
fclose: int ( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.
l: tetra, MMIX-PIPE §17.
MMIX_config: void ( ), MMIX-CONFIG §38.
MMIX_init: void ( ), MMIX-PIPE §10.
mmix_io_init: void ( ), MMIX-IO §7.
print_stats: void ( ), MMIX-PIPE §162.
printf: int ( ), <stdio.h>.
prog_file: FILE *, §5.
stderr: FILE *, <stdio.h>.
strcmp: int ( ), <string.h>.
strlen: int ( ), <string.h>.
ticks = macro, MMIX-PIPE §87.


5. Hexadecimal input to memory. A rudimentary hexadecimal input format is implemented here so that the simulator can be run with essentially arbitrary data in the simulated memory. The rules of this format are extremely simple: Each line of the file either begins with (i) 12 hexadecimal digits followed by a colon; or (ii) a space followed by 16 hexadecimal digits. In case (i), the 12 hex digits specify a 48-bit physical address, called the current location. In case (ii), the 16 hex digits specify an octabyte to be stored in the current location; the current location is then increased by 8. The current location should be a multiple of 8, but its three least significant bits are actually ignored. Arbitrary comments can follow the specification of a new current location or a new octabyte, as long as each line is less than 99 characters long. For example, the file

0123456789ab: SILLY EXAMPLE
 0123456789abcdef first octabyte
 fedcba9876543210 second

places the octabyte #0123456789abcdef into memory location #0123456789a8 and #fedcba9876543210 into location #0123456789b0.

#define BUF_SIZE 100

⟨Global variables 5⟩ ≡
  octa cur_loc;
  octa cur_dat;
  bool new_chunk;
  char buffer[BUF_SIZE];
  FILE *prog_file;

See also sections 16 and 25.

This code is used in section 2.

6. ⟨Input a rudimentary hexadecimal file 6⟩ ≡
  {
    prog_file = fopen(prog_file_name, "r");
    if (!prog_file) {
      fprintf(stderr, "Panic: Can't open MMIX hexadecimal file %s!\n", prog_file_name);
      exit(-3);
    }
    new_chunk = true;
    while (1) {
      if (!fgets(buffer, BUF_SIZE, prog_file)) break;
      if (buffer[strlen(buffer) - 1] != '\n') {
        fprintf(stderr, "Panic: Hexadecimal file line too long: `%s...'!\n", buffer);
        exit(-3);
      }
      if (buffer[12] == ':') ⟨Change the current location 7⟩
      else if (buffer[0] == ' ') ⟨Read an octabyte and advance cur_loc 8⟩
      else {
        fprintf(stderr, "Panic: Improper hexadecimal file line: `%s'!\n", buffer);
        exit(-3);
      }
    }
  }

This code is used in section 4.
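
The line classification used for the rudimentary hexadecimal format can be tried in isolation. The following is a sketch under stated assumptions: the type octa_sketch and the function parse_hex_line are hypothetical stand-ins invented here, not names from MMIXware, and the function only classifies and converts a line without touching simulated memory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the octa type: high and low 32-bit halves. */
typedef struct { unsigned int h, l; } octa_sketch;

/* Classify one input line.
   Returns 1 for "12 hex digits + colon" (a new current location),
           2 for "space + 16 hex digits" (an octabyte of data),
           0 for anything else. */
int parse_hex_line(const char *line, octa_sketch *out)
{
  if (strlen(line) > 12 && line[12] == ':')
    return sscanf(line, "%4x%8x", &out->h, &out->l) == 2 ? 1 : 0;
  if (line[0] == ' ')
    return sscanf(line + 1, "%8x%8x", &out->h, &out->l) == 2 ? 2 : 0;
  return 0;
}
```

The length test before inspecting position 12 is an addition of this sketch; it keeps a short line from being indexed past its terminating null character.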

7. ⟨Change the current location 7⟩ ≡
  {
    if (sscanf(buffer, "%4x%8x", &cur_loc.h, &cur_loc.l) != 2) {
      fprintf(stderr, "Panic: Improper hexadecimal file location: `%s'!\n", buffer);
      exit(-3);
    }
    new_chunk = true;
  }

This code is used in section 6.

8. ⟨Read an octabyte and advance cur_loc 8⟩ ≡
  {
    if (sscanf(buffer + 1, "%8x%8x", &cur_dat.h, &cur_dat.l) != 2) {
      fprintf(stderr, "Panic: Improper hexadecimal file data: `%s'!\n", buffer);
      exit(-3);
    }
    if (new_chunk) mem_write(cur_loc, cur_dat);
    else mem_hash[last_h].chunk[(cur_loc.l & 0xffff) >> 3] = cur_dat;
    cur_loc.l += 8;
    if ((cur_loc.l & 0xfff8) != 0) new_chunk = false;
    else {
      new_chunk = true;
      if ((cur_loc.l & 0xffff0000) == 0) cur_loc.h++;
    }
  }

This code is used in section 6.
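
The masks in section 8 suggest that simulated memory is handled in chunks of 2^16 bytes, each holding 8192 octabytes; that reading is an inference from the constants #ffff and #fff8, not something stated in this module. A small sketch of the two tests, with hypothetical function names:

```c
#include <stdint.h>

/* Index of an octabyte within its (assumed) 2^16-byte chunk,
   given the low 32 bits of the address. */
unsigned chunk_index(uint32_t lo)
{
  return (lo & 0xffff) >> 3;
}

/* After the location has been advanced by 8, report whether it now
   sits at the first octabyte of a chunk, so that the next store must
   look the chunk up again instead of reusing the previous one. */
int starts_new_chunk(uint32_t lo)
{
  return (lo & 0xfff8) == 0;
}
```

The low three bits are masked out in both tests, which matches the remark in section 5 that those bits of the current location are ignored.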

bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
exit: void ( ), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgets: char *( ), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void ( ), MMIX-PIPE §213.
octa = struct, MMIX-PIPE §17.
prog_file_name, §2.
sscanf: int ( ), <stdio.h>.
stderr: FILE *, <stdio.h>.
strlen: int ( ), <string.h>.
true = 1, MMIX-PIPE §11.


9. Binary input to memory. When the program file was dumped by MMIX-SIM, it has the simple format discussed in exercise 1.4.3'-20 of the MMIX fascicle. In this case we assume that the user's program has text, data, pool, and stack segments, as in the conventions of that book. We load it into four 2^32-byte pages of physical memory, one for each segment; page zero of segment i is mapped to physical location 2^32 i. Page tables are kept in physical locations starting at 2^32 × 4; static traps begin at 2^32 × 5 and dynamic traps at 2^32 × 6. (These conventions agree with the special register settings rT = #8000000500000000, rTT = #8000000600000000, rV = #369c200400000000 assumed by the stripped-down simulator.)

⟨Input an MMIX binary file 9⟩ ≡
  {
    prog_file = fopen(prog_file_name, "rb");
    if (!prog_file) {
      fprintf(stderr, "Panic: Can't open MMIX binary file %s!\n", prog_file_name);
      exit(-3);
    }
    while (1) {
      if (!undump_octa()) break;
      new_chunk = true;
      cur_loc = cur_dat;
      if (cur_loc.h & 0x9fffffff) bad_address = true;
      else bad_address = false, cur_loc.h >>= 29;
        /* apply trivial mapping function for each segment */
      ⟨Input consecutive octabytes beginning at cur_loc 11⟩;
    }
    ⟨Set up the canned environment 12⟩;
  }

This code is used in section 4.
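
The trivial mapping of section 9 can be restated with 64-bit arithmetic. The sketch below assumes a host with <stdint.h>; the function name map_segment_address is hypothetical. It applies the same mask #9fffffff to the high tetrabyte and, when the address is acceptable, moves the two segment bits down so that segment i lands at physical location 2^32 i.

```c
#include <stdint.h>

/* Map a user virtual address to the physical address used by the
   canned environment. Returns 1 and stores the result in *phys on
   success; returns 0 (leaving *phys unchanged) if the address is
   negative or lies beyond the first 2^32 bytes of its segment. */
int map_segment_address(uint64_t virt, uint64_t *phys)
{
  uint32_t h = (uint32_t)(virt >> 32);
  if (h & 0x9fffffffu) return 0;          /* same mask as in section 9 */
  *phys = ((uint64_t)(h >> 29) << 32) | (uint32_t)virt;
  return 1;
}
```

For instance, an address in the data segment such as #2000000000000100 would map to physical #0000000100000100 under this rule.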

10. The undump_octa routine reads eight bytes from the binary file prog_file into the global octabyte cur_dat, taking care as usual to be big-Endian regardless of the host computer's bias.

⟨Subroutines 10⟩ ≡
  static bool undump_octa ARGS((void));
  static bool undump_octa()
  {
    register int t0, t1, t2, t3;
    t0 = fgetc(prog_file); if (t0 == EOF) return false;
    t1 = fgetc(prog_file); if (t1 == EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 == EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 == EOF) goto oops;
    cur_dat.h = (t0 << 24) + (t1 << 16) + (t2 << 8) + t3;
    t0 = fgetc(prog_file); if (t0 == EOF) goto oops;
    t1 = fgetc(prog_file); if (t1 == EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 == EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 == EOF) goto oops;
    cur_dat.l = (t0 << 24) + (t1 << 16) + (t2 << 8) + t3;
    return true;
  oops: fprintf(stderr, "Premature end of file on %s!\n", prog_file_name);
    return false;
  }

See also sections 17 and 20.

This code is used in section 2.
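
The byte packing done by undump_octa does not depend on the host's byte order because each byte is shifted into place explicitly. A minimal sketch of just that packing step, working from a byte array instead of a file; octa_sketch and pack_octa are hypothetical names introduced here:

```c
#include <stdint.h>

/* Hypothetical stand-in for octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa_sketch;

/* Pack eight bytes, most significant first, into high and low halves.
   Each byte is widened before shifting so that no signed overflow
   can occur when the top bit of b[0] or b[4] is set. */
octa_sketch pack_octa(const unsigned char b[8])
{
  octa_sketch v;
  v.h = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
      | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
  v.l = ((uint32_t)b[4] << 24) | ((uint32_t)b[5] << 16)
      | ((uint32_t)b[6] << 8) | (uint32_t)b[7];
  return v;
}
```

The explicit widening to an unsigned type is a choice of this sketch; the routine in section 10 shifts int values instead.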

11. ⟨Input consecutive octabytes beginning at cur_loc 11⟩ ≡
  while (1) {
    if (!undump_octa()) {
      fprintf(stderr, "Unexpected end of file on %s!\n", prog_file_name);
      break;
    }
    if (!(cur_dat.h || cur_dat.l)) break;
    if (bad_address) {
      fprintf(stderr, "Panic: Unsupported virtual address %08x%08x!\n", cur_loc.h, cur_loc.l);
      exit(-5);
    }
    if (new_chunk) mem_write(cur_loc, cur_dat);
    else mem_hash[last_h].chunk[(cur_loc.l & 0xffff) >> 3] = cur_dat;
    cur_loc.l += 8;
    if ((cur_loc.l & 0xfff8) != 0) new_chunk = false;
    else {
      new_chunk = true;
      if ((cur_loc.l & 0xffff0000) == 0) {
        bad_address = true;
        cur_loc.h = (cur_loc.h << 29) + 1;
      }
    }
  }

This code is used in section 9.

ARGS = macro ( ), MMIX-PIPE §6.
bad_address: bool, §25.
bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
cur_dat: octa, §5.
cur_loc: octa, §5.
EOF = (-1), <stdio.h>.
exit: void ( ), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgetc: int ( ), <stdio.h>.
fopen: FILE *( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void ( ), MMIX-PIPE §213.
new_chunk: bool, §5.
prog_file: FILE *, §5.
prog_file_name, §2.
stderr: FILE *, <stdio.h>.
true = 1, MMIX-PIPE §11.


12. The primitive operating system assumed in simple programs of The Art of Computer Programming will set up text segment, data segment, pool segment, and stack segment as in MMIX-SIM. The runtime stack will be initialized if we UNSAVE from the last location loaded in the .mmb file.

#define rQ 16

⟨Set up the canned environment 12⟩ ≡
  if (cur_loc.h != 3) {
    fprintf(stderr, "Panic: MMIX binary file didn't set up the stack!\n");
    exit(-6);
  }
  inst_ptr.o = mem_read(incr(cur_loc, -8 * 14));   /* Main */
  inst_ptr.p = NULL;
  cur_loc.h = 0x60000000;
  g[255].o = incr(cur_loc, -8);   /* place to UNSAVE */
  cur_dat.l = 0x90;
  if (mem_read(cur_dat).h) inst_ptr.o = cur_dat;   /* start at #90 if nonzero */
  head->inst = (UNSAVE << 24) + 255; tail--;   /* prefetch a fabricated command */
  head->loc = incr(inst_ptr.o, -4);   /* in case the UNSAVE is interrupted */
  g[rT].o.h = 0x80000005; g[rTT].o.h = 0x80000006;
  cur_dat.h = (RESUME << 24) + 1; cur_dat.l = 0; cur_loc.h = 5; cur_loc.l = 0;
  mem_write(cur_loc, cur_dat);   /* the primitive trap handler */
  cur_dat.l = cur_dat.h; cur_dat.h = (NEGI << 24) + (255 << 16) + 1;
  cur_loc.h = 6; cur_loc.l = 8;
  mem_write(cur_loc, cur_dat);   /* the primitive dynamic trap handler */
  cur_dat.h = (GET << 24) + rQ; cur_dat.l = (PUTI << 24) + (rQ << 16); cur_loc.l = 0;
  mem_write(cur_loc, cur_dat);   /* more of the primitive dynamic trap handler */
  cur_dat.h = 0; cur_dat.l = 7;   /* generate a PTE with rwx permission */
  cur_loc.h = 4;   /* beginning of skeleton page table */
  mem_write(cur_loc, cur_dat);   /* PTE for the text segment */
  ITcache->set[0][0].tag = zero_octa;
  ITcache->set[0][0].data[0] = cur_dat;   /* prime the IT cache */
  cur_dat.l = 6;   /* PTE with read and write permission only */
  cur_dat.h = 1; cur_loc.l = 3 << 13;
  mem_write(cur_loc, cur_dat);   /* PTE for the data segment */
  cur_dat.h = 2; cur_loc.l = 6 << 13;
  mem_write(cur_loc, cur_dat);   /* PTE for the pool segment */
  cur_dat.h = 3; cur_loc.l = 9 << 13;
  mem_write(cur_loc, cur_dat);   /* PTE for the stack segment */
  g[rK].o = neg_one;   /* enable all interrupts */
  g[rV].o.h = 0x369c2004;   /* agrees with rV = #369c200400000000 of section 9 and with page_b[4] = 12 */
  page_bad = false; page_r = 4 << (32 - 13); page_s = 32; page_mask.l = 0xffffffff;
  page_b[1] = 3; page_b[2] = 6; page_b[3] = 9; page_b[4] = 12;

This code is used in section 9.


cur_dat: octa, §5.
cur_loc: octa, §5.
data: octa *, MMIX-PIPE §167.
exit: void ( ), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fprintf: int ( ), <stdio.h>.
g: int, MMIX-PIPE §167.
GET: ???, §0.
h: tetra, MMIX-PIPE §17.
head: fetch *, MMIX-PIPE §69.
incr: octa ( ), MMIX-ARITH §6.
inst: tetra, MMIX-PIPE §68.
inst_ptr: spec, MMIX-PIPE §284.
ITcache: cache *, MMIX-PIPE §168.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
mem_read: octa ( ), MMIX-PIPE §210.
mem_write: void ( ), MMIX-PIPE §213.
neg_one: octa, MMIX-ARITH §4.
NEGI: ???, §0.
o: octa, MMIX-PIPE §40.
p: specnode *, MMIX-PIPE §40.
page_b: int [ ], MMIX-PIPE §238.
page_bad: bool, MMIX-PIPE §238.
page_mask: octa, MMIX-PIPE §238.
page_r: int, MMIX-PIPE §238.
page_s: int, MMIX-PIPE §238.
PUTI: ???, §0.
RESUME: ???, §0.
rK = 15, MMIX-PIPE §52.
rT = 13, MMIX-PIPE §52.
rTT = 14, MMIX-PIPE §52.
rV = 18, MMIX-PIPE §52.
set: cacheset *, MMIX-PIPE §167.
stderr: FILE *, <stdio.h>.
tag: octa, MMIX-PIPE §167.
tail: fetch *, MMIX-PIPE §69.
UNSAVE: ???, §0.
zero_octa: octa, MMIX-ARITH §4.


13. Interaction. When prompted for instructions, this simulator understands the following terse commands:

• ⟨positive integer⟩: Run for this many clock cycles.

• @⟨hexadecimal integer⟩: Set the instruction pointer to this virtual address; successive instructions will be fetched from here.

• b⟨hexadecimal integer⟩: Set the breakpoint to this virtual address; simulation will pause when an instruction from the breakpoint address enters the fetch buffer.

• v⟨hexadecimal integer⟩: Set the desired level of diagnostic output; each bit in the hexadecimal integer enables certain printouts when the simulator is running. Bit #1 shows instructions when issued, deissued, or committed; #2 shows the pipeline and locks after each cycle; #4 shows each coroutine activation; #8 each coroutine scheduling; #10 reports when reading from an uninitialized chunk of memory; #20 asks for online input when reading from addresses ≥ 2^48; #40 reports all I/O to memory address ≥ 2^48.

• -⟨integer⟩: Deissue this many instructions.

• l⟨integer⟩ or g⟨integer⟩: Show current "hot" contents of a local or global register.

• m⟨hexadecimal integer⟩: Show current contents of a physical memory address. (This value may not be up to date; newer values might appear in the write buffer and/or in the caches.)

• f⟨hexadecimal integer⟩: Insert a tetrabyte into the fetch buffer. (Use with care!)

• i⟨integer⟩: Set the interval counter rI to the given value; this will trigger an interrupt after the specified number of cycles.

• IT, DT, I, D, or S: Show current contents of a cache.

• D* or S*: Show dirty blocks of a cache.

• p: Show current contents of the pipeline.

• s: Show current statistics on branch prediction and speed of instruction issue.

• h: Help (show the possibilities for interaction).

• q: Quit.

⟨Run the simulation interactively 13⟩ ≡
  while (1) {
    printf("mmmix> ");
    fgets(buffer, BUF_SIZE, stdin);
    switch (buffer[0]) {
    default: what_say:
      printf("Eh? Sorry, I don't understand. (Type h for help)\n");
      continue;
    case 'q': case 'x': goto done;
    ⟨Cases for interaction 14⟩
    }
  }
  done:

This code is used in section 2.


14. ⟨Cases for interaction 14⟩ ≡
  case 'h': case '?': printf("The interactive commands are as follows:\n");
    printf(" <n> to run for n cycles\n");
    printf(" @<x> to take next instruction from location x\n");
    printf(" b<x> to pause when location x is fetched\n");
    printf(" v<x> to print specified diagnostics when running;\n");
    printf(" x=1[insts enter/leave pipe]+2[whole pipeline each cycle]+\n");
    printf(" 4[coroutine activations]+8[coroutine scheduling]+\n");
    printf(" 10[uninitialized read]+20[online I/O read]+\n");
    printf(" 40[I/O read/write]+80[branch prediction details]+\n");
    printf(" 100[invalid cache blocks displayed too]\n");
    printf(" -<n> to deissue n instructions\n");
    printf(" l<n> to print current value of local register n\n");
    printf(" g<n> to print current value of global register n\n");
    printf(" m<x> to print current value of memory address x\n");
    printf(" f<x> to insert instruction x into the fetch buffer\n");
    printf(" i<n> to initiate a timer interrupt after n cycles\n");
    printf(" IT, DT, I, D, or S to print current cache contents\n");
    printf(" D* or S* to print dirty blocks of a cache\n");
    printf(" p to print current pipeline contents\n");
    printf(" s to print current stats\n");
    printf(" h to print this message\n");
    printf(" q to exit\n");
    printf("(Here <n> is a decimal integer, <x> is hexadecimal.)\n");
    continue;

See also sections 15, 18, 19, 21, 22, 23, and 24.

This code is used in section 13.

15. ⟨Cases for interaction 14⟩ +≡
  case '0': case '1': case '2': case '3': case '4':
  case '5': case '6': case '7': case '8': case '9':
    if (sscanf(buffer, "%d", &n) != 1) goto what_say;
    printf("Running %d at time %d", n, ticks.l);
    if (bp.h == -1 && bp.l == -1) printf("\n");
    else printf(" with breakpoint %08x%08x\n", bp.h, bp.l);
    MMIX_run(n, bp); continue;
  case '@': inst_ptr.o = read_hex(buffer + 1); inst_ptr.p = NULL; continue;
  case 'b': bp = read_hex(buffer + 1); continue;
  case 'v': verbose = read_hex(buffer + 1).l; continue;

bp: octa, §16.
BUF_SIZE = 100, §5.
buffer: char [ ], §5.
fgets: char *( ), <stdio.h>.
h: tetra, MMIX-PIPE §17.
inst_ptr: spec, MMIX-PIPE §284.
l: tetra, MMIX-PIPE §17.
MMIX_run: void ( ), MMIX-PIPE §10.
n: int, §16.
o: octa, MMIX-PIPE §40.
p: specnode *, MMIX-PIPE §40.
printf: int ( ), <stdio.h>.
read_hex: octa ( ), §17.
sscanf: int ( ), <stdio.h>.
stdin: FILE *, <stdio.h>.
ticks = macro, MMIX-PIPE §87.
verbose: int, MMIX-PIPE §4.


16. ⟨Global variables 5⟩ +≡
  int n, m;   /* temporary integer */
  octa bp = {-1, -1};   /* breakpoint */
  octa tmp;   /* an octabyte of temporary interest */
  static unsigned char d[BUF_SIZE];

17. Here's a simple program to read an octabyte in hexadecimal notation from a buffer. It changes the buffer by storing a null character after the input.

⟨Subroutines 10⟩ +≡
  octa read_hex ARGS((char *));
  octa read_hex(p)
    char *p;
  {
    register int j, k;
    octa val;
    val.h = val.l = 0;
    for (j = 0; ; j++) {
      if (p[j] >= '0' && p[j] <= '9') d[j] = p[j] - '0';
      else if (p[j] >= 'a' && p[j] <= 'f') d[j] = p[j] - 'a' + 10;
      else if (p[j] >= 'A' && p[j] <= 'F') d[j] = p[j] - 'A' + 10;
      else break;
    }
    p[j] = '\0';
    for (j--, k = 0; k <= j; k++) {
      if (k >= 8) val.h += d[j - k] << (4 * k - 32);
      else val.l += d[j - k] << (4 * k);
    }
    return val;
  }
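
For comparison, the same conversion can be sketched with a single 64-bit accumulator on hosts that provide <stdint.h>. This is not the routine used by the simulator: parse_hex64 is a hypothetical name, it leaves the buffer unmodified, and it keeps only the low 64 bits when more than 16 digits are supplied.

```c
#include <stdint.h>

/* Convert the leading hexadecimal digits of s to a 64-bit value.
   Conversion stops at the first character that is not a hex digit;
   an empty digit string yields 0. */
uint64_t parse_hex64(const char *s)
{
  uint64_t v = 0;
  for (; *s; s++) {
    int digit;
    if (*s >= '0' && *s <= '9') digit = *s - '0';
    else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
    else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
    else break;
    v = (v << 4) | (uint64_t)digit;   /* shift in four more bits */
  }
  return v;
}
```

Accumulating from the left avoids the per-digit shift amounts of read_hex, which exceed the width of a tetrabyte if a user types more than 16 digits.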

18. ⟨Cases for interaction 14⟩ +≡
  case '-': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (cool <= hot) m = hot - cool;
    else m = (hot - reorder_bot) + 1 + (reorder_top - cool);
    if (n > m) deissues = m; else deissues = n;
    continue;
  case 'l': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (n >= lring_size) goto what_say;
    printf(" l[%d]=%08x%08x\n", n, l[n].o.h, l[n].o.l); continue;
  case 'm': tmp = mem_read(read_hex(buffer + 1));
    printf(" m[%s]=%08x%08x\n", buffer + 1, tmp.h, tmp.l); continue;


19. The register stack pointers, rO and rS, are not kept up to date in the g array. Therefore we have to deduce their values by examining the pipeline.

⟨Cases for interaction 14⟩ +≡
  case 'g': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (n >= 256) goto what_say;
    if (n == rO || n == rS) {
      if (hot == cool)   /* pipeline empty */
        g[rO].o = sl3(cool_O), g[rS].o = sl3(cool_S);
      else g[rO].o = sl3(hot->cur_O), g[rS].o = sl3(hot->cur_S);
    }
    printf(" g[%d]=%08x%08x\n", n, g[n].o.h, g[n].o.l);
    continue;

20. ⟨Subroutines 10⟩ +≡
  static octa sl3 ARGS((octa));
  static octa sl3(y)   /* shift left by 3 bits */
    octa y;
  {
    register tetra yhl = y.h << 3, ylh = y.l >> 29;
    y.h = yhl + ylh; y.l <<= 3;
    return y;
  }
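
The shift in sl3 has to carry the top three bits of the low tetrabyte into the bottom of the high tetrabyte. A small sketch with fixed-width types makes that carry visible; octa_sketch and shl3 are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa_sketch;

/* Shift a 64-bit quantity, held as two 32-bit halves, left by 3 bits.
   The three bits that leave the top of y.l reappear at the bottom of
   y.h; the three bits that leave the top of y.h are discarded. */
octa_sketch shl3(octa_sketch y)
{
  y.h = (y.h << 3) | (y.l >> 29);
  y.l <<= 3;
  return y;
}
```

Shifting #10000000e0000001 left by three bits, for example, gives #8000000700000008, the same result as a native 64-bit shift.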

21. ⟨Cases for interaction 14⟩ +≡
  case 'I': print_cache(buffer[1] == 'T' ? ITcache : Icache, false); continue;
  case 'D': print_cache(buffer[1] == 'T' ? DTcache : Dcache, buffer[1] == '*'); continue;
  case 'S': print_cache(Scache, buffer[1] == '*'); continue;
  case 'p': print_pipe(); print_locks(); continue;
  case 's': print_stats(); continue;
  case 'i': if (sscanf(buffer + 1, "%d", &n) == 1) g[rI].o = incr(zero_octa, n);
    continue;

ARGS = macro ( ), MMIX-PIPE §6.
BUF_SIZE = 100, §5.
buffer: char [ ], §5.
cool: control *, MMIX-PIPE §60.
cool_O: octa, MMIX-PIPE §98.
cool_S: octa, MMIX-PIPE §98.
cur_O: octa, MMIX-PIPE §44.
cur_S: octa, MMIX-PIPE §44.
Dcache: cache *, MMIX-PIPE §168.
deissues: int, MMIX-PIPE §60.
DTcache: cache *, MMIX-PIPE §168.
false = 0, MMIX-PIPE §11.
g: int, MMIX-PIPE §167.
h: tetra, MMIX-PIPE §17.
hot: control *, MMIX-PIPE §60.
Icache: cache *, MMIX-PIPE §168.
incr: octa ( ), MMIX-ARITH §6.
ITcache: cache *, MMIX-PIPE §168.
l: tetra, MMIX-PIPE §17.
lring_size: int, MMIX-PIPE §86.
mem_read: octa ( ), MMIX-PIPE §210.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §17.
print_cache: void ( ), MMIX-PIPE §176.
print_locks: void ( ), MMIX-PIPE §39.
print_pipe: void ( ), MMIX-PIPE §253.
print_stats: void ( ), MMIX-PIPE §162.
printf: int ( ), <stdio.h>.
reorder_bot: control *, MMIX-PIPE §60.
reorder_top: control *, MMIX-PIPE §60.
rI = 12, MMIX-PIPE §52.
rO = 10, MMIX-PIPE §52.
rS = 11, MMIX-PIPE §52.
Scache: cache *, MMIX-PIPE §168.
sscanf: int ( ), <stdio.h>.
tetra = unsigned int, MMIX-PIPE §17.
what_say: label, §13.
zero_octa: octa, MMIX-ARITH §4.


22. ⟨Cases for interaction 14⟩ +≡
  case 'f': tmp = read_hex(buffer + 1);
    {
      register fetch *new_tail;
      if (tail == fetch_bot) new_tail = fetch_top;
      else new_tail = tail - 1;
      if (new_tail == head) printf("Sorry, the fetch buffer is full!\n");
      else {
        tail->loc = inst_ptr.o;
        tail->inst = tmp.l;
        tail->interrupt = 0;
        tail->noted = false;
        tail = new_tail;
      }
      continue;
    }

23. A hidden case here, for me when debugging. It essentially disables the translation caches, by mapping everything to zero.

⟨Cases for interaction 14⟩ +≡
  case 'd': if (ticks.l)
      printf("Sorry: I disable ITcache and DTcache only at the beginning!\n");
    else {
      ITcache->set[0][0].tag = zero_octa;
      ITcache->set[0][0].data[0] = seven_octa;
      DTcache->set[0][0].tag = zero_octa;
      DTcache->set[0][0].data[0] = seven_octa;
      g[rK].o = neg_one;
      page_bad = false;
      page_mask = neg_one;
      inst_ptr.p = (specnode *) 1;
    }
    continue;

24. And another case, for me when kludging. At the moment, it simply lists the functional unit names. But I might decide to put other stuff here when giving a demo.

⟨Cases for interaction 14⟩ +≡
  case 'k': {
      register int j;
      for (j = 0; j < funit_count; j++) printf("unit %s %d\n", funit[j].name, funit[j].k);
    }
    continue;


25. ⟨Global variables 5⟩ +≡
  bool bad_address;
  extern bool page_bad;
  extern octa page_mask;
  extern int page_r, page_s, page_b[5];
  extern octa zero_octa;
  extern octa neg_one;
  octa seven_octa = {0, 7};
  extern octa incr ARGS((octa y, int delta));   /* unsigned y + δ (δ is signed) */
  extern void mmix_io_init ARGS((void));

ARGS = macro ( ), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buffer: char [ ], §5.
data: octa *, MMIX-PIPE §167.
DTcache: cache *, MMIX-PIPE §168.
false = 0, MMIX-PIPE §11.
fetch = struct, MMIX-PIPE §68.
fetch_bot: fetch *, MMIX-PIPE §69.
fetch_top: fetch *, MMIX-PIPE §69.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
g: int, MMIX-PIPE §167.
head: fetch *, MMIX-PIPE §69.
incr: octa ( ), MMIX-ARITH §6.
inst: tetra, MMIX-PIPE §68.
inst_ptr: spec, MMIX-PIPE §284.
interrupt: unsigned int, MMIX-PIPE §68.
ITcache: cache *, MMIX-PIPE §168.
k: register int, §17.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
mmix_io_init: void ( ), MMIX-IO §7.
name: char *, MMIX-PIPE §167.
neg_one: octa, MMIX-ARITH §4.
noted: bool, MMIX-PIPE §68.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §17.
p: specnode *, MMIX-PIPE §40.
page_b: int [ ], MMIX-PIPE §238.
page_bad: bool, MMIX-PIPE §238.
page_mask: octa, MMIX-PIPE §238.
page_r: int, MMIX-PIPE §238.
page_s: int, MMIX-PIPE §238.
printf: int ( ), <stdio.h>.
read_hex: octa ( ), §17.
rK = 15, MMIX-PIPE §52.
set: cacheset *, MMIX-PIPE §167.
specnode = struct, MMIX-PIPE §40.
tag: octa, MMIX-PIPE §167.
tail: fetch *, MMIX-PIPE §69.
ticks = macro, MMIX-PIPE §87.
tmp: octa, §16.
zero_octa: octa, MMIX-ARITH §4.


26. Names of the sections.

⟨Cases for interaction 14, 15, 18, 19, 21, 22, 23, 24⟩ Used in section 13.

⟨Change the current location 7⟩ Used in section 6.

⟨Global variables 5, 16, 25⟩ Used in section 2.

⟨Input a rudimentary hexadecimal file 6⟩ Used in section 4.

⟨Input an MMIX binary file 9⟩ Used in section 4.

⟨Input consecutive octabytes beginning at cur_loc 11⟩ Used in section 9.

⟨Input the program 4⟩ Used in section 2.

⟨Parse the command line 3⟩ Used in section 2.

⟨Read an octabyte and advance cur_loc 8⟩ Used in section 6.

⟨Run the simulation interactively 13⟩ Used in section 2.

⟨Set up the canned environment 12⟩ Used in section 9.

⟨Subroutines 10, 17, 20⟩ Used in section 2.


MMOTYPE

1. Introduction. This program reads a binary mmo file output by the MMIXAL processor and lists it in human-readable form. It lists only the symbol table, if invoked with the -s option. It lists also the tetrabytes of input, if invoked with the -v option.

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

⟨Prototype preparations 5⟩
⟨Type definitions 7⟩
⟨Global variables 4⟩
⟨Subroutines 8⟩

int main(argc, argv)
  int argc; char *argv[];
{
  register int j, k, delta, postamble = 0;
  register char *p;
  register tetra t;
  ⟨Process the command line 2⟩;
  ⟨Initialize everything 3⟩;
  ⟨List the preamble 23⟩;
  do ⟨List the next item 13⟩ while (!postamble);
  ⟨List the postamble 24⟩;
  ⟨List the symbol table 25⟩;
}

2. ⟨Process the command line 2⟩ ≡
  listing = 1; verbose = 0;
  for (j = 1; j < argc - 1 && argv[j][0] == '-' && argv[j][2] == '\0'; j++) {
    if (argv[j][1] == 's') listing = 0;
    else if (argv[j][1] == 'v') verbose = 1;
    else break;
  }
  if (j != argc - 1) {
    fprintf(stderr, "Usage: %s [-s] [-v] mmofile\n", argv[0]);
    exit(-1);
  }

This code is used in section 1.
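
The option scan of section 2 can be packaged as a function so that it is easy to exercise with sample argument vectors. The sketch below follows the same rules (single-letter options -s and -v, then exactly one file name); the function name parse_options and its return convention are assumptions made for this example.

```c
/* Scan options of the form -s and -v that precede the final argument.
   On success, return the index of the mmo file name in argv;
   on a usage error, return -1. The flags are always reset first. */
int parse_options(int argc, char *argv[], int *listing, int *verbose)
{
  int j;
  *listing = 1; *verbose = 0;
  for (j = 1; j < argc - 1; j++) {
    const char *a = argv[j];
    /* accept only a dash followed by exactly one character */
    if (a[0] != '-' || a[1] == '\0' || a[2] != '\0') break;
    if (a[1] == 's') *listing = 0;
    else if (a[1] == 'v') *verbose = 1;
    else break;
  }
  return j == argc - 1 ? j : -1;
}
```

Checking a[1] before a[2] is a small change from section 2; it keeps a lone "-" argument from being indexed past its terminating null character.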

3. ⟨Initialize everything 3⟩ ≡
  mmo_file = fopen(argv[argc - 1], "rb");
  if (!mmo_file) {
    fprintf(stderr, "Can't open file %s!\n", argv[argc - 1]);
    exit(-2);
  }

See also sections 12 and 17.

This code is used in section 1.

D.E. Knuth: MMIXware, LNCS 1750, pp. 510-523, 1999. Springer-Verlag Berlin Heidelberg 1999


4. ⟨Global variables 4⟩ ≡
  int listing;   /* are we listing everything? */
  int verbose;   /* are we also showing the tetras of input as they are read? */
  FILE *mmo_file;   /* the input file */

See also sections 11, 16, and 29.

This code is used in section 1.

5. ⟨Prototype preparations 5⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

This code is used in section 1.

6. A complete definition of mmo format appears in the MMIXAL document. Here we need to define only the basic constants used for interpretation.

#define mm 0x98   /* the escape code of mmo format */
#define lop_quote 0x0   /* the quotation lopcode */
#define lop_loc 0x1   /* the location lopcode */
#define lop_skip 0x2   /* the skip lopcode */
#define lop_fixo 0x3   /* the octabyte-fix lopcode */
#define lop_fixr 0x4   /* the relative-fix lopcode */
#define lop_fixrx 0x5   /* extended relative-fix lopcode */
#define lop_file 0x6   /* the file name lopcode */
#define lop_line 0x7   /* the file position lopcode */
#define lop_spec 0x8   /* the special hook lopcode */
#define lop_pre 0x9   /* the preamble lopcode */
#define lop_post 0xa   /* the postamble lopcode */
#define lop_stab 0xb   /* the symbol table lopcode */
#define lop_end 0xc   /* the end-it-all lopcode */

__STDC__, Standard C.
exit: void ( ), <stdlib.h>.
FILE, <stdio.h>.
fopen: FILE *( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.
stderr: FILE *, <stdio.h>.


7. Low-level arithmetic. This program is intended to work correctly whenever an int has at least 32 bits.

⟨Type definitions 7⟩ ≡
  typedef unsigned char byte;   /* a monobyte */
  typedef unsigned int tetra;   /* a tetrabyte */
  typedef struct { tetra h, l; } octa;   /* an octabyte */

This code is used in section 1.

8. The incr subroutine adds a signed integer to an (unsigned) octabyte.

⟨Subroutines 8⟩ ≡
  octa incr ARGS((octa, int));
  octa incr(o, delta)
    octa o;
    int delta;
  {
    register tetra t;
    octa x;
    if (delta >= 0) {
      t = 0xffffffff - delta;
      if (o.l <= t) x.l = o.l + delta, x.h = o.h;
      else x.l = o.l - t - 1, x.h = o.h + 1;
    }
    else {
      t = -delta;
      if (o.l >= t) x.l = o.l - t, x.h = o.h;
      else x.l = o.l + (0xffffffff + delta) + 1, x.h = o.h - 1;
    }
    return x;
  }

See also sections 9, 10, and 26.

This code is used in section 1.
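
The carry and borrow handled by incr can also be expressed with wraparound arithmetic on fixed-width unsigned types. The following sketch is an alternative formulation, not the routine of section 8; octa_sketch and incr_sketch are hypothetical names, and <stdint.h> is assumed to be available.

```c
#include <stdint.h>

/* Hypothetical stand-in for octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa_sketch;

/* Add a signed delta to an unsigned 64-bit value held as two halves. */
octa_sketch incr_sketch(octa_sketch o, int delta)
{
  octa_sketch x;
  if (delta >= 0) {
    uint32_t d = (uint32_t)delta;
    x.l = o.l + d;               /* wraps modulo 2^32 */
    x.h = o.h + (x.l < o.l);     /* carry if the low half wrapped */
  } else {
    /* magnitude of delta, computed in 64 bits so the most negative
       32-bit int does not overflow when negated */
    uint32_t d = (uint32_t)(-(int64_t)delta);
    x.l = o.l - d;               /* wraps modulo 2^32 */
    x.h = o.h - (o.l < d);       /* borrow if the low half underflowed */
  }
  return x;
}
```

Adding 8 to #00000000fffffffc, for instance, carries into the high half and gives #0000000100000004.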


9. Low-level input. The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this program is supposed to work also on computers that are little-endian. Therefore we read four successive bytes and pack them into a tetrabyte, instead of reading a single tetrabyte.

⟨Subroutines 8⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
    if (fread(buf, 1, 4, mmo_file) != 4) {
      fprintf(stderr, "Unexpected end of file after %d tetras!\n", count);
      exit(-3);
    }
    yz = (buf[2] << 8) + buf[3];
    tet = (((buf[0] << 8) + buf[1]) << 16) + yz;
    if (verbose) printf(" %08x\n", tet);
    count++;
  }

10. ⟨Subroutines 8⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
    register byte b;
    if (!byte_count) read_tet();
    b = buf[byte_count];
    byte_count = (byte_count + 1) & 3;
    return b;
  }

11. ⟨Global variables 4⟩ +≡
  int count;   /* the number of tetrabytes we've read */
  int byte_count;   /* index of the next-to-be-read byte */
  byte buf[4];   /* the most recently read bytes */
  int yz;   /* the two least significant bytes */
  tetra tet;   /* buf bytes packed big-endianwise */

12. ⟨Initialize everything 3⟩ +≡
  count = byte_count = 0;

ARGS = macro ( ), §5.
exit: void ( ), <stdlib.h>.
fprintf: int ( ), <stdio.h>.
fread: size_t ( ), <stdio.h>.
mmo_file: FILE *, §4.
printf: int ( ), <stdio.h>.
stderr: FILE *, <stdio.h>.
verbose: int, §4.


13. The main loop. Now for the bread-and-butter part of this program.

⟨List the next item 13⟩ ≡
  {
    read_tet();
  loop: if (buf[0] == mm)
      switch (buf[1]) {
      case lop_quote: if (yz != 1) err("YZ field of lop_quote should be 1");
        read_tet(); break;
      ⟨Cases for lopcodes in the main loop 18⟩
      default: err("Unknown lopcode");
      }
    if (listing) ⟨List tet as a normal item 15⟩;
  }

This code is used in section 1.
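
The main loop inspects a tetrabyte one byte at a time: the first byte is compared with the escape code, the second selects the lopcode, and the last two are the Y and Z fields (together YZ). A sketch of that decomposition on an already packed tetrabyte follows; the struct mmo_fields and the function split_tetra are hypothetical names, and the escape value #98 is the mm constant of section 6.

```c
#include <stdint.h>

/* Hypothetical record of the four byte fields of a tetrabyte. */
typedef struct {
  int is_escape;      /* nonzero if the leading byte is the escape code */
  unsigned lopcode;   /* second byte */
  unsigned y, z;      /* third and fourth bytes */
  unsigned yz;        /* the two low bytes taken as one 16-bit field */
} mmo_fields;

/* Split a big-endian packed tetrabyte into its byte fields. */
mmo_fields split_tetra(uint32_t tet)
{
  mmo_fields f;
  f.is_escape = (tet >> 24) == 0x98;
  f.lopcode = (tet >> 16) & 0xff;
  f.y = (tet >> 8) & 0xff;
  f.z = tet & 0xff;
  f.yz = tet & 0xffff;
  return f;
}
```

For the tetrabyte #98010002 this yields the escape flag, lopcode 1 (lop_loc), Y = 0, and Z = 2.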

14. We want to catch all cases where the rules of mmo format are not obeyed. The error macro ameliorates this somewhat tedious chore.

#define err(m) { fprintf(stderr, "Error in tetra %d: %s!\n", count, m); continue; }

15. In a normal situation, the newly read tetrabyte is simply supposed to be loaded into the current location. We list not only the current location but also the current file position, if cur_line is nonzero and cur_loc belongs to segment 0.

⟨List tet as a normal item 15⟩ ≡
  {
    printf("%08x%08x: %08x", cur_loc.h, cur_loc.l, tet);
    if (!cur_line) printf("\n");
    else {
      if (cur_loc.h & 0xe0000000) printf("\n");
      else {
        if (cur_file == listed_file) printf(" (line %d)\n", cur_line);
        else {
          printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line);
          listed_file = cur_file;
        }
      }
      cur_line++;
    }
    cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
  }

This code is used in section 13.
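
Two small address computations appear in section 15: the test for segment 0, which looks at the top three bits of the 64-bit location, and the advance to the next tetrabyte boundary. A sketch with a native 64-bit type, using hypothetical function names:

```c
#include <stdint.h>

/* Nonzero if the location lies in segment 0, that is, if the three
   most significant bits of the 64-bit address are all zero. */
int in_segment_zero(uint64_t loc)
{
  return (loc >> 61) == 0;
}

/* Advance by one tetrabyte and clear the two low bits, so that the
   result is always a multiple of 4. */
uint64_t next_tetra_loc(uint64_t loc)
{
  return (loc + 4) & ~(uint64_t)3;
}
```

An unaligned location such as #101 therefore advances to #104, the same place an aligned #100 would reach.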

16. ⟨Global variables 4⟩ +≡
  octa cur_loc;   /* the current location */
  int listed_file;   /* the most recently listed file number */
  int cur_file;   /* the most recently selected file number */
  int cur_line;   /* the current position in cur_file */
  char *file_name[256];   /* file names seen */
  octa tmp;   /* an octabyte of temporary interest */


17. ⟨Initialize everything 3⟩ +≡
  cur_loc.h = cur_loc.l = 0;
  listed_file = cur_file = -1;
  cur_line = 0;

buf: byte [ ], §11.
count: int, §11.
fprintf: int ( ), <stdio.h>.
h: tetra, §7.
incr: octa ( ), §8.
l: tetra, §7.
listing: int, §4.
lop_quote = #0, §6.
mm = #98, §6.
octa, §7.
printf: int ( ), <stdio.h>.
read_tet: void ( ), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
yz: int, §11.


18. The simple lopcodes. We have already implemented lop_quote, which falls through to the normal case after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

#define y buf[2]  /* the next-to-least significant byte */
#define z buf[3]  /* the least significant byte */

⟨Cases for lopcodes in the main loop 18⟩ ≡
  case lop_loc: if (z == 2) {
      read_tet(); cur_loc.h = (y << 24) + tet;
    } else if (z == 1) cur_loc.h = y << 24;
    else err("Z field of lop_loc should be 1 or 2");
    read_tet(); cur_loc.l = tet;
    continue;
  case lop_skip: cur_loc = incr(cur_loc, yz); continue;

See also sections 19, 20, 21, and 22.

This code is used in section 13.
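The two forms of lop_loc can be illustrated by a small sketch that computes the new location from Y, Z and the tetrabytes that follow. The function lop_loc_target and its array argument are hypothetical; MMOTYPE reads the tetras one at a time with read_tet instead.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

/* Compute the new location for lop_loc.
   y is the high byte; z (1 or 2) says how many tetras follow in t[].
   Returns 0 on success, -1 if z is invalid. */
static int lop_loc_target(unsigned y, unsigned z, const uint32_t *t, octa *out)
{
    if (z == 2) {
        out->h = ((uint32_t)y << 24) + t[0];
        out->l = t[1];
    } else if (z == 1) {
        out->h = (uint32_t)y << 24;
        out->l = t[0];
    } else return -1;
    return 0;
}
```

With Y = #20, Z = 1 and a following tetra of #100, the location becomes #2000000000000100; the short form thus reaches any address whose high tetra has the form Y followed by 24 zero bits.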

19. Fixups load information out of order, when future references have been resolved. The current file name and line number are not considered relevant.

⟨Cases for lopcodes in the main loop 18⟩ +≡
  case lop_fixo: if (z == 2) {
      read_tet(); tmp.h = (y << 24) + tet;
    } else if (z == 1) tmp.h = y << 24;
    else err("Z field of lop_fixo should be 1 or 2");
    read_tet(); tmp.l = tet;
    if (listing) printf("%08x%08x: %08x%08x\n", tmp.h, tmp.l, cur_loc.h, cur_loc.l);
    continue;
  case lop_fixr: delta = yz; goto fixr;
  case lop_fixrx: j = yz; if (j != 16 && j != 24)
      err("YZ field of lop_fixrx should be 16 or 24");
    read_tet(); delta = tet;
    if (delta & 0xfe000000) err("increment of lop_fixrx is too large");
  fixr: tmp = incr(cur_loc, -(delta >= 0x1000000 ? (delta & 0xffffff) - (1 << j) : delta) << 2);
    if (listing) printf("%08x%08x: %08x\n", tmp.h, tmp.l, delta);
    continue;
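The address arithmetic at the label fixr is compact, so a worked sketch may help. It assumes only what the expression above shows: a delta below #1000000 is a nonnegative tetra count, while a larger delta carries a 24-bit field that is reduced by 2^j to give a negative count; the fixup address is the current location minus four times that count. The function name fixr_offset is hypothetical, and the sketch multiplies by 4 rather than shifting so that negative values are handled portably.

```c
#include <stdint.h>

/* Byte offset to add to the current location for lop_fixr / lop_fixrx.
   delta is YZ for lop_fixr, or the tetra following lop_fixrx;
   j is 16 or 24 for lop_fixrx and is ignored when delta < 0x1000000. */
static int64_t fixr_offset(uint32_t delta, int j)
{
    int64_t rel = (delta >= 0x1000000u)
        ? (int64_t)(delta & 0xffffffu) - ((int64_t)1 << j)
        : (int64_t)delta;
    return -rel * 4;   /* multiply instead of shifting a negative value */
}
```

For example, delta = 3 refers to the tetra 12 bytes before the current location, whereas delta = #1fffffe with j = 24 stands for a count of -2 and therefore refers to the tetra 8 bytes after it.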

20. The space for file names isn't allocated until we are sure we need it.

⟨Cases for lopcodes in the main loop 18⟩ +≡
  case lop_file: if (file_name[y]) {
      if (z) err("Two file names with the same number");
      for (j = z; j > 0; j--) read_tet();
      cur_file = y;
    } else {
      if (!z) err("No name given for newly selected file");
      file_name[y] = (char *) calloc(4 * z + 1, 1);
      if (!file_name[y]) {
        fprintf(stderr, "No room to store the file name!\n"); exit(-4);
      }
      cur_file = y;
      for (j = z, p = file_name[y]; j > 0; j--, p += 4) {
        read_tet();
        *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
      }
    }
    cur_line = 0; continue;
  case lop_line: if (cur_file < 0) err("No file was selected for lop_line");
    cur_line = yz; continue;
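The allocation of 4z + 1 zeroed bytes guarantees that the stored name is a terminated C string even when all 4z bytes are used. The following sketch shows the same unpacking on an array of tetrabytes. The function unpack_file_name is a hypothetical stand-in, and it assumes that a name shorter than 4z bytes is padded with zero bytes in the file.

```c
#include <stdint.h>
#include <stdlib.h>

/* Unpack z big-endian tetrabytes into a freshly allocated C string.
   The 4*z+1 bytes are zeroed by calloc, so the result is always terminated.
   Returns NULL if z is 0 or memory is exhausted; the caller frees the result. */
static char *unpack_file_name(const uint32_t *tetras, size_t z)
{
    char *name, *p;
    size_t j;
    if (z == 0) return NULL;
    name = calloc(4 * z + 1, 1);
    if (!name) return NULL;
    for (j = 0, p = name; j < z; j++, p += 4) {
        p[0] = (char)((tetras[j] >> 24) & 0xff);
        p[1] = (char)((tetras[j] >> 16) & 0xff);
        p[2] = (char)((tetras[j] >> 8) & 0xff);
        p[3] = (char)(tetras[j] & 0xff);
    }
    return name;
}
```

Under that padding assumption, the two tetras #666f6f2e and #6d6d7300 would yield the seven-character name foo.mms.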

21. Special bytes in the file might be in synch with the current location and/or the current file position, so we list those parameters too.

⟨Cases for lopcodes in the main loop 18⟩ +≡
  case lop_spec: if (listing) {
      printf("Special data %d at loc %08x%08x", yz, cur_loc.h, cur_loc.l);
      if (!cur_line) printf("\n");
      else if (cur_file == listed_file) printf(" (line %d)\n", cur_line);
      else {
        printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line);
        listed_file = cur_file;
      }
    }
    while (1) {
      read_tet();
      if (buf[0] == mm) {
        if (buf[1] != lop_quote || yz != 1) goto loop;  /* end of special data */
        read_tet();
      }
      if (listing) printf(" %08x\n", tet);
    }

buf: byte [ ], §11.
calloc: void *( ), <stdlib.h>.
cur_file: int, §16.
cur_line: int, §16.
cur_loc: octa, §16.
delta: int, §8.
err = macro ( ), §14.
exit: void ( ), <stdlib.h>.
file_name: char *[ ], §16.
fprintf: int ( ), <stdio.h>.
h: tetra, §7.
incr: octa ( ), §8.
j: register int, §1.
l: tetra, §7.
listed_file: int, §16.
listing: int, §4.
loop: label, §13.
lop_file = #6, §6.
lop_fixo = #3, §6.
lop_fixr = #4, §6.
lop_fixrx = #5, §6.
lop_line = #7, §6.
lop_loc = #1, §6.
lop_quote = #0, §6.
lop_skip = #2, §6.
lop_spec = #8, §6.
mm = #98, §6.
p: register char *, §1.
printf: int ( ), <stdio.h>.
read_tet: void ( ), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
yz: int, §11.


22. The other cases shouldn't appear in the main loop.

⟨Cases for lopcodes in the main loop 18⟩ +≡
  case lop_pre: err("Can't have another preamble");
  case lop_post: postamble = 1;
    if (y) err("Y field of lop_post should be zero");
    if (z < 32) err("Z field of lop_post must be 32 or more");
    continue;
  case lop_stab: err("Symbol table must follow postamble");
  case lop_end: err("Symbol table can't end before it begins");


23. The preamble and postamble. Now here's what we do before and after the main loop.

⟨List the preamble 23⟩ ≡
  read_tet();  /* read the first tetrabyte of input */
  if (buf[0] != mm || buf[1] != lop_pre) {
    fprintf(stderr, "Input is not an MMO file (first two bytes are wrong)!\n");
    exit(-5);
  }
  if (y != 1) fprintf(stderr,
      "Warning: I'm reading this file as version 1, not version %d!", y);
  if (z > 0) {
    j = z;
    read_tet();
    if (listing) printf("File was created %s", asctime(localtime((time_t *) &tet)));
    for (j--; j > 0; j--) {
      read_tet();
      if (listing) printf("Preamble data %08x\n", tet);
    }
  }

This code is used in section 1.
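The creation time is passed to localtime by casting the address of tet. That works when time_t has the same size and representation as a tetra; on a platform where time_t is wider, the library would read past the end of tet. A cautious variant, sketched here under that assumption, copies the value into a time_t first. The function format_creation_time is hypothetical, and it uses gmtime and strftime (rather than localtime and asctime) only so that the result does not depend on the local time zone.

```c
#include <stdint.h>
#include <time.h>

/* Format the creation time stored in the first preamble tetra.
   The value is widened to time_t before the library call, instead of
   casting the address of a 32-bit variable.
   Returns the number of characters written, or 0 on failure. */
static size_t format_creation_time(uint32_t tet, char *out, size_t n)
{
    time_t t = (time_t)tet;
    struct tm *tm = gmtime(&t);   /* UTC, unlike the localtime call of the listing */
    if (!tm) return 0;
    return strftime(out, n, "%Y-%m-%d %H:%M:%S", tm);
}
```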

24. ⟨List the postamble 24⟩ ≡
  for (j = z; j < 256; j++) {
    read_tet(); tmp.h = tet; read_tet();
    if (listing) {
      if (tmp.h || tet) printf("g%03d: %08x%08x\n", j, tmp.h, tet);
      else printf("g%03d: 0\n", j);
    }
  }

This code is used in section 1.
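The loop above implies that the postamble body holds one octabyte, stored as a high tetra followed by a low tetra, for each global register from Z through 255. A small sketch of the resulting count and of the line format follows; the names format_greg and postamble_tetras are hypothetical helpers, not part of MMOTYPE.

```c
#include <stdint.h>
#include <stdio.h>

/* Format the listing line (without the newline) for global register j,
   whose octabyte value arrives as a high tetra followed by a low tetra.
   Returns the length that snprintf reports. */
static int format_greg(char *out, size_t n, int j, uint32_t hi, uint32_t lo)
{
    if (hi || lo)
        return snprintf(out, n, "g%03d: %08lx%08lx", j,
                        (unsigned long)hi, (unsigned long)lo);
    return snprintf(out, n, "g%03d: 0", j);
}

/* Number of tetrabytes in the postamble body when lop_post has Z = z. */
static int postamble_tetras(int z) { return 2 * (256 - z); }
```

With Z = 254, for instance, four tetrabytes follow lop_post: two for g254 and two for g255.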

asctime: char *( ), <time.h>.
buf: byte [ ], §11.
err = macro ( ), §14.
exit: void ( ), <stdlib.h>.
fprintf: int ( ), <stdio.h>.
h: tetra, §7.
j: register int, §1.
listing: int, §4.
localtime: struct tm *( ), <time.h>.
lop_end = #c, §6.
lop_post = #a, §6.
lop_pre = #9, §6.
lop_stab = #b, §6.
mm = #98, §6.
postamble: register int, §1.
printf: int ( ), <stdio.h>.
read_tet: void ( ), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
y = macro, §18.
z = macro, §18.


25. The symbol table. Finally we come to the symbol table, which is the most interesting part of this program because it recursively traces an implicit ternary trie structure.

⟨List the symbol table 25⟩ ≡
  read_tet();
  if (buf[0] != mm || buf[1] != lop_stab) {
    fprintf(stderr, "Symbol table does not follow the postamble!\n");
    exit(-6);
  }
  if (yz) fprintf(stderr, "YZ field of lop_stab should be zero!\n");
  printf("Symbol table (beginning at tetra %d):\n", count);
  stab_start = count;
  sym_ptr = sym_buf;
  print_stab();
  ⟨Check the lop_end 30⟩;

This code is used in section 1.

26. The main work is done by a recursive subroutine called print_stab, which manipulates a global array sym_buf containing the current symbol prefix; the global variable sym_ptr points to the first unfilled character of that array.

⟨Subroutines 8⟩ +≡
  void print_stab ARGS((void));

  void print_stab()
  {
    register int m = read_byte();  /* the master control byte */
    register int c;                /* the character at the current trie node */
    register int j, k;
    if (m & 0x40) print_stab();    /* traverse the left subtrie, if it is nonempty */
    if (m & 0x2f) {
      ⟨Read the character c 27⟩;
      *sym_ptr++ = c;
      if (sym_ptr == &sym_buf[sym_length_max]) {
        fprintf(stderr, "Oops, the symbol is too long!\n"); exit(-7);
      }
      if (m & 0xf) ⟨Print the current symbol with its equivalent and serial number 28⟩;
      if (m & 0x20) print_stab();  /* traverse the middle subtrie */
      sym_ptr--;
    }
    if (m & 0x10) print_stab();    /* traverse the right subtrie, if it is nonempty */
  }
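To see the traversal order in isolation, here is a minimal sketch that walks a byte array laid out as print_stab expects and collects only the symbol names. It follows the bit tests shown in sections 26 through 28 (#40 left, #20 middle, #10 right, #80 two-byte character, low four bits describing the equivalent) but skips over equivalents and serial numbers instead of printing them. The type trie_walk and the functions next_byte and walk are hypothetical, the buffers are fixed-size with no overflow checks, and the whole prefix is recorded, whereas MMOTYPE prints from sym_buf + 1.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const unsigned char *in;  /* remaining input bytes */
    char prefix[64];          /* characters on the path to the current node */
    size_t len;               /* current prefix length */
    char out[256];            /* symbols found, each followed by a space */
} trie_walk;                  /* hypothetical state, replacing the globals */

static int next_byte(trie_walk *w) { return *w->in++; }

static void walk(trie_walk *w)
{
    int m = next_byte(w);                 /* master control byte */
    int j;
    if (m & 0x40) walk(w);                /* left subtrie */
    if (m & 0x2f) {
        if (m & 0x80) next_byte(w);       /* high byte of a 16-bit character, ignored here */
        w->prefix[w->len++] = (char)next_byte(w);
        if (m & 0xf) {                    /* a symbol ends at this node */
            j = m & 0xf;
            j = (j == 15) ? 1 : (j <= 8) ? j : j - 8;
            while (j-- > 0) next_byte(w); /* skip the equivalent */
            while (next_byte(w) < 128) ;  /* skip the serial number */
            strncat(w->out, w->prefix, w->len);
            strcat(w->out, " ");
        }
        if (m & 0x20) walk(w);            /* middle subtrie extends the prefix */
        w->len--;
    }
    if (m & 0x10) walk(w);                /* right subtrie */
}
```

Given the twelve bytes #31 'a' #05 #81 #01 'b' #06 #82 #01 'c' #07 #83, the walk visits a node for a, descends through its middle link to ab, and then follows the right link to c, so the names are found in the order a, ab, c.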


27. The present implementation doesn't support Unicode; characters with more than 8-bit codes are printed as `?'. However, the changes for 16-bit codes would be quite easy if proper fonts for Unicode output were available. In that case, sym_buf would be an array of wyde characters.

⟨Read the character c 27⟩ ≡
  if (m & 0x80) j = read_byte();  /* 16-bit character */
  else j = 0;
  c = read_byte();
  if (j) c = '?';  /* oops, we can't print (j << 8) + c easily at this time */

This code is used in section 26.

28. ⟨Print the current symbol with its equivalent and serial number 28⟩ ≡
  {
    *sym_ptr = '\0';
    j = m & 0xf;
    if (j == 15) sprintf(equiv_buf, "$%03d", read_byte());
    else if (j <= 8) {
      strcpy(equiv_buf, "#");
      for ( ; j > 0; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
      if (strcmp(equiv_buf, "#0000") == 0) strcpy(equiv_buf, "?");  /* undefined */
    } else {
      strncpy(equiv_buf, "#20000000000000", 33 - 2 * j);
      equiv_buf[33 - 2 * j] = '\0';  /* strncpy does not terminate a truncated copy */
      for ( ; j > 8; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
    }
    for (j = k = read_byte(); ; k = read_byte(), j = (j << 7) + k)
      if (k >= 128) break;  /* the serial number is now j - 128 */
    printf(" %s = %s (%d)\n", sym_buf + 1, equiv_buf, j - 128);
  }

This code is used in section 26.
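The final loop of section 28 reads the serial number as a sequence of 7-bit digits, most significant first, in which only the last byte has its high bit set; subtracting 128 removes that marker. A minimal sketch of the same decoding on a byte array follows. The function decode_serial and its used parameter are hypothetical.

```c
#include <stddef.h>

/* Decode a serial number stored as base-128 digits, most significant
   first, where only the final byte has its high bit set.
   *used receives the number of bytes consumed. */
static long decode_serial(const unsigned char *p, size_t *used)
{
    size_t i = 0;
    long j = p[i];
    while (p[i] < 128) {       /* more digits follow */
        i++;
        j = (j << 7) + p[i];
    }
    *used = i + 1;
    return j - 128;            /* remove the high bit of the final byte */
}
```

The single byte #83 decodes to serial number 3, and the two bytes #01 #85 decode to 1 × 128 + 5 = 133.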

29. #define sym_length_max 1000

⟨Global variables 4⟩ +≡
  int stab_start;                 /* where the symbol table began */
  char sym_buf[sym_length_max];   /* the characters on middle transitions to current node */
  char *sym_ptr;                  /* the character in sym_buf following the current prefix */
  char equiv_buf[20];             /* equivalent of the current symbol */

ARGS = macro ( ), §5.
buf: byte [ ], §11.
count: int, §11.
exit: void ( ), <stdlib.h>.
fprintf: int ( ), <stdio.h>.
lop_end = #c, §6.
lop_stab = #b, §6.
mm = #98, §6.
printf: int ( ), <stdio.h>.
read_byte: byte ( ), §10.
read_tet: void ( ), §9.
sprintf: int ( ), <stdio.h>.
stderr: FILE *, <stdio.h>.
strcmp: int ( ), <string.h>.
strcpy: char *( ), <string.h>.
strlen: int ( ), <string.h>.
strncpy: char *( ), <string.h>.
yz: int, §11.


30. ⟨Check the lop_end 30⟩ ≡
  while (byte_count)
    if (read_byte()) fprintf(stderr, "Nonzero byte follows the symbol table!\n");
  read_tet();
  if (buf[0] != mm || buf[1] != lop_end)
    fprintf(stderr, "The symbol table isn't followed by lop_end!\n");
  else if (count != stab_start + yz + 1)
    fprintf(stderr, "YZ field at lop_end should have been %d!\n", count - yz - 1);
  else {
    if (verbose) printf("Symbol table ends at tetra %d.\n", count);
    if (fread(buf, 1, 1, mmo_file))
      fprintf(stderr, "Extra bytes follow the lop_end!\n");
  }

This code is used in section 25.


31. Names of the sections.

⟨Cases for lopcodes in the main loop 18, 19, 20, 21, 22⟩ Used in section 13.
⟨Check the lop_end 30⟩ Used in section 25.
⟨Global variables 4, 11, 16, 29⟩ Used in section 1.
⟨Initialize everything 3, 12, 17⟩ Used in section 1.
⟨List tet as a normal item 15⟩ Used in section 13.
⟨List the next item 13⟩ Used in section 1.
⟨List the postamble 24⟩ Used in section 1.
⟨List the preamble 23⟩ Used in section 1.
⟨List the symbol table 25⟩ Used in section 1.
⟨Print the current symbol with its equivalent and serial number 28⟩ Used in section 26.
⟨Process the command line 2⟩ Used in section 1.
⟨Prototype preparations 5⟩ Used in section 1.
⟨Read the character c 27⟩ Used in section 26.
⟨Subroutines 8, 9, 10, 26⟩ Used in section 1.
⟨Type definitions 7⟩ Used in section 1.

buf: byte [ ], §11.
byte_count: int, §11.
count: int, §11.
fprintf: int ( ), <stdio.h>.
fread: size_t ( ), <stdio.h>.
lop_end = #c, §6.
mm = #98, §6.
mmo_file: FILE *, §4.
printf: int ( ), <stdio.h>.
read_byte: byte ( ), §10.
read_tet: void ( ), §9.
stab_start: int, §29.
stderr: FILE *, <stdio.h>.
verbose: int, §4.
yz: int, §11.


MASTER INDEX

The following list, a compilation of the indexes produced from all the MMIXware programs and documentation, shows the section numbers where each identifier makes an appearance. Underlined numbers indicate a place of definition. Single-letter identifiers are indexed only when they are defined.

Further characteristics of the program segments, such as `system dependencies', can also be found here, together with significant error messages and other indexable things like the names of people whose work is cited.

Digits follow letters in the lexicographic order of this index. For example, `t1' follows `tt'.

?? : MMIX-PIPE 25.%% : MMIX-CONFIG 18.__STDC__ : MMIX-ARITH 2, MMIX-IO 2, MMIX-

PIPE 6, MMIX-SIM 11, MMIXAL 31,MMOTYPE 5.

a: MMIX-ARITH 28, 29, 59, MMIX-PIPE 44, 91,167, 381, 384, MMIX-SIM 61, 114, 117.

aa : MMIX-CONFIG 16, 23, 31, 32, MMIX-PIPE 167, 177, 181, 186, 187, 189, 191, 193,196, 199, 205, 233, 234.

aaaaa : MMIX-PIPE 237, 243, 244.abort : MMIX-IO 8.absolute value, floating point: MMIX 13.ABSTIME : MMIX-PIPE 89, MMIX-SIM 77.acc : MMIX-ARITH 8, 11, 12, 13, 19, MMIXAL 29,

83, 92, 93, 94, 95, 96, 107, 109, 126, 127, 131.access time : MMIX-CONFIG 16, 23, MMIX-

PIPE 167, 217, 224, 230, 233, 234, 257, 261,262, 266, 267, 268, 270, 271, 272, 273, 274,288, 291, 292, 295, 296, 300, 326, 353, 354,358, 359, 360, 364, 365, 366.

acctm : MMIX-CONFIG 13, 15, 23.ADD : MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

84, MMIXAL 63.add : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 140.add go : MMIX-PIPE 331.ADDI : MMIX-PIPE 47, MMIX-SIM 54, 84.addr : MMIX-IO 4, MMIX-MEM 2, 3, MMIX-

PIPE 40, 43, 44, 73, 89, 95, 100, 115, 116,144, 208, 209, 210, 212, 213, 216, 236,240, 246, 251, 255, 256, 257, 259, 260,261, 262, 281, 297, 356, 378, 379, 381, 384,MMIX-SIM 20, 114, 117.

addr found : MMIX-PIPE 256.ADDU : MMIX 7, 9, MMIX-PIPE 47, MMIX-SIM 54,

78, 85, MMIXAL 63.addu : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 139.ADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85, 131.Advanced Micro Devices: MMIX 42.after : MMIX-PIPE 282.alf : MMIX-PIPE 192, 193, 195, 205.align bits : MMIXAL 62, 102, 107.

alloc cache : MMIX-CONFIG 31, 35.alloc slot : MMIX-PIPE 204, 205, 218, 222, 225,

261, 272, 274, 276, 298, 300, 326.Alpha computers: MMIX 45.alt name : MMIX-SIM 24.AND : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.and : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

138, MMIXAL 82, 97, 101.Anderson, Jennifer-Ann Monique: MMIX 40.ANDI : MMIX-PIPE 47, MMIX-SIM 54, 86.ANDN : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.andn : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 138.ANDNH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.ANDNI : MMIX-PIPE 47, MMIX-SIM 54, 86.ANDNL : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.ANDNMH : MMIX 13, MMIX-PIPE 47, MMIX-

SIM 54, 86, MMIXAL 63.ANDNML : MMIX 13, MMIX-PIPE 47, MMIX-

SIM 54, 86, MMIXAL 63.Aragon, Cecilia Rodriguez: MMIX-SIM 16.arg : MMIX-SIM 143, 145, 146.arg count : MMIX-PIPE 374, 380, MMIX-

SIM 110, 111.arg loc : MMIX-PIPE 380.argc : MMIX-SIM 37, 141, 142, 163, MMIXAL 136,

137, MMMIX 2, 3, MMOTYPE 1, 2, 3.ARGS : MMIX-ARITH 2, MMIX-IO 2, MMIX-

PIPE 6, MMIX-SIM 11, MMIXAL 31,MMOTYPE 5.

argv : MMIX-SIM 141, 142, 144, MMIXAL 136,137, MMMIX 2, 3, MMOTYPE 1, 2, 3.

arith exc : MMIX-PIPE 44, 46, 59, 98, 100,146, 307.

ASCII: MMIX 6.asctime : MMOTYPE 23.assemble : MMIXAL 52, 117, 119, 128.assemble inst : MMIXAL 119, 129, 130, 131.assemble X : MMIXAL 119, 124, 125, 126, 127.


assembly language: MMIXAL 1.assoc : MMIX-CONFIG 13, 15, 23.AT&T Bell Laboratories: MMIX 42.atomic instruction: MMIX 31.Attempt to get characters... : MMIX-

PIPE 381.Attempt to put characters... : MMIX-

PIPE 384.aux : MMIX-ARITH 4, 8, 9, 11, 12, 13, 14, 19, 24,

43, 45, MMIX-PIPE 20, 21, 343, MMIX-SIM 13,37, 88, 155, 159, MMIXAL 27, 28, 101.

avoid D : MMIX-PIPE 273, 277.awaken : MMIX-PIPE 125, 222, 224, 245.b: MMIX-ARITH 28, 29, 59, MMIX-PIPE 44, 56,

82, 157, 167, 172, MMIX-SIM 27, 61, 91, 160,MMIXAL 48, MMOTYPE 10.

B_BIT : MMIX-PIPE 54, 118, 304, 323, 329,330, 332, 336, 337.

backward local : MMIXAL 90, 91, 109.backward local host : MMIXAL 89, 90, 91.Bad object file : MMIX-SIM 26.bad address : MMMIX 9, 11, 25.bad fetch : MMIX-PIPE 288, 293, 296, 298, 301.bad guesses : MMIX-SIM 93, 139, 140.bad inst mask : MMIX-PIPE 304, 305, 323.bad resume : MMIX-PIPE 323.bb : MMIX-CONFIG 16, 23, 30, 31, 32, 33, 35, 36,

37, MMIX-PIPE 167, 170, 172, 179, 185, 193,201, 203, 216, 217, 218, 219, 221, 223, 224,226, 227, 228, 229, 259, 262, 265, 268, 271,273, 275, 276, 280, 292, 294, 364, 378, 379.

BDIF : MMIX 11, MMIX-PIPE 47, MMIX-SIM 54,87, MMIXAL 63.

bdif : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.BDIFI : MMIX-PIPE 47, MMIX-SIM 54, 87.before : MMIX-PIPE 282.Bentley, Jon Louis: MMIXAL 54.Berc, Lance Michael: MMIX 40.BEV : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BEVB : MMIX-PIPE 47, MMIX-SIM 54, 93.big-endian versus little-endian: MMIX 6, 12,

MMIX-IO 16, MMIX-PIPE 304, MMIXAL 47.bignum: MMIX-ARITH 54, 59, 60, 61, 62,

66, 68, 81, 82.bignum compare : MMIX-ARITH 54, 61, 64,

65, 83.bignum dec : MMIX-ARITH 54, 62, 65, 83.bignum double : MMIX-ARITH 68, 82, 83.bignum prec : MMIX-ARITH 59, 62, 65, 83.bignum times ten : MMIX-ARITH 54, 60,

64, 65, 82.binary files: MMMIX 9.binary-to-decimal conversion: MMIX-ARITH 54.binary check : MMIXAL 101.BinaryRead : MMIX-SIM 4, MMIXAL 69.BinaryReadWrite : MMIX-SIM 4, MMIXAL 69.BinaryWrite : MMIX-SIM 4, MMIXAL 69.bit code map : MMIX-PIPE 54, 56.

bits : MMIXAL 62, 64.bkpt : MMIX-SIM 16, 58, 63, 82, 83, 161, 162.blksz : MMIX-CONFIG 13, 15, 23.block diff : MMIX-PIPE 217, 219.BN : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BNB : MMIX-PIPE 47, MMIX-SIM 54, 93.BNN : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BNNB : MMIX-PIPE 47, MMIX-SIM 54, 93.BNP : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BNPB : MMIX-PIPE 47, MMIX-SIM 54, 93.BNZ : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BNZB : MMIX-PIPE 47, MMIX-SIM 54, 93.BOD : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BODB : MMIX-PIPE 47, MMIX-SIM 54, 93.bool: MMIX-ARITH 1, 4, 9, 29, 70, MMIX-

PIPE 11, 12, 20, 21, 40, 44, 65, 66, 68, 75,169, 170, 175, 176, 202, 203, 238, 242, 303,315, MMIX-SIM 9, 13, 48, 52, 61, 129, 140,143, 144, 150, 151, MMIXAL 26.

bool mult : MMIX-ARITH 29, MMIX-PIPE 21,344, MMIX-SIM 13, 87.

Boolean multiplication: MMIX 12.borrow : MMIX-ARITH 62.BP : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.bp : MMMIX 15, 16.bp a : MMIX-CONFIG 15, 37, MMIX-PIPE 150,

153.bp amask : MMIX-PIPE 151, 152, 153, 154.bp b : MMIX-CONFIG 15, 37, MMIX-PIPE 150,

151, 152, 153.bp bad stat : MMIX-PIPE 154, 155, 162.bp bcmask : MMIX-PIPE 151, 152, 153, 154.bp c : MMIX-CONFIG 15, 37, MMIX-PIPE 150,

153.bp cmask : MMIX-PIPE 151, 152, 153, 154.bp good stat : MMIX-PIPE 154, 155, 162.bp n : MMIX-CONFIG 15, 37, MMIX-PIPE 150,

153.bp nmask : MMIX-PIPE 152, 153, 154.bp npower : MMIX-PIPE 151, 152, 153, 154, 160.bp ok stat : MMIX-PIPE 152, 154, 162.bp rev stat : MMIX-PIPE 152, 154, 162.bp table : MMIX-CONFIG 37, MMIX-PIPE 150,

151, 152, 160, 162.BPB : MMIX-PIPE 47, MMIX-SIM 54, 93.br : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

85, 106, 152, 155.break inst : MMIX-SIM 107.breakpoint : MMIX-PIPE 9, 10, 304, MMIX-

SIM 61, 63, 82, 83, 107, 109, 127, 128,141, 149.

breakpoint hit : MMIX-PIPE 10, 12, 304.BSPEC : MMIXAL 63, MMIXAL 43, 62, 63, 132.


buf : MMIX-ARITH 75, 76, 79, MMIX-IO 4, 12,13, 14, 15, 16, 17, 18, 19, 20, MMIX-MEM 1,2, MMIX-PIPE 381, 384, MMIX-SIM 13, 25,26, 27, 28, 29, 33, 35, 36, 45, 114, 117,MMIXAL 47, MMOTYPE 9, 10, 11, 13, 18,20, 21, 23, 25, 30.

buf max : MMIX-ARITH 73, 74, 75.buf pointer : MMIX-CONFIG 9, 10.buf ptr : MMIXAL 33, 34, 102, 136.BUF_SIZE : MMIX-CONFIG 9, 10, MMMIX 5,

6, 13, 16.buf size : MMIX-SIM 40, 41, 42, 45, 143,

MMIXAL 32, 34, 84, 137, 139.bu�er : MMIX-CONFIG 9, 10, MMIX-IO 12,

14, 16, 18, MMIX-SIM 4, 40, 41, 42, 45,MMIXAL 32, 33, 34, 38, 41, MMMIX 5, 6,7, 8, 13, 15, 18, 19, 21, 22.

buf0 : MMIX-ARITH 73, 74, 75, 79.bus words : MMIX-CONFIG 36, 37, MMIX-

PIPE 214, 216, 219, 223, 297.bypass : MMIXAL 45, 102, 103, 104, 132.BYTE : MMIXAL 62, 63, 117, MMIXAL 63.byte: MMIX 6.byte: MMIX-SIM 10, 25, 27, MMOTYPE 7,

10, 11.byte count : MMIX-SIM 24, 25, 27,

MMOTYPE 10, 11, 12, 30.byte di� : MMIX-ARITH 27, 28, MMIX-PIPE 21,

344, MMIX-SIM 13, 87.BZ : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.BZB : MMIX-PIPE 47, MMIX-SIM 54, 93.c: MMIX-ARITH 29, MMIX-CONFIG 16, 23, 31,

MMIX-PIPE 25, 28, 31, 33, 46, 159, 167,170, 172, 174, 176, 179, 181, 183, 185, 193,196, 199, 201, 203, 205, 215, 217, 222, 224,237, 326, MMOTYPE 26.

C preprocessor: MMIXAL 3.c param: MMIX-CONFIG 13.cache: MMIX-PIPE 167, 168, 169, 170, 171,

172, 173, 174, 175, 176, 178, 179, 180,181, 182, 183, 184, 185, 192, 193, 195, 196,198, 199, 200, 201, 202, 203, 204, 205, 215,217, 222, 224, 237, 326.

cache addr : MMIX-PIPE 192, 193, 196, 201,205, 217.

cache search : MMIX-PIPE 192, 193, 195, 205,206, 217, 224, 233, 234, 262, 267, 268,271, 272, 273, 291, 292, 296, 302, 353, 354,365, 366, 367, 378, 379.

cacheblock: MMIX-PIPE 167, 169, 170, 171,172, 178, 179, 184, 185, 186, 187, 188, 189,190, 191, 192, 193, 195, 196, 198, 199, 200,201, 202, 203, 204, 205, 217, 222, 224, 232,237, 257, 258, 378, 379.

caches: MMIX 30, MMIX-PIPE 163.cacheset: MMIX-PIPE 167, 186, 187, 188, 189,

190, 191, 193, 194, 196, 205.calloc : MMIX-CONFIG 16, 18, 26, 31, 32, 33,

34, 36, 37, 38, MMIX-IO 12, MMIX-PIPE 213,MMIX-SIM 17, 24, 35, 41, 42, 77, MMIXAL 32,38, 55, 59, 84, MMOTYPE 20.

can complement... : MMIXAL 100.can compute... : MMIXAL 101.can divide... : MMIXAL 101.can multiply... : MMIXAL 101.can negate... : MMIXAL 100.can registerize... : MMIXAL 100.can take serial number... : MMIXAL 100.Can't allocate... : MMIX-CONFIG 16, 18, 31,

32, 33, 34, 36, 37, MMIX-SIM 17.Can't have another... : MMOTYPE 22.Can't open... : MMIX-CONFIG 38, MMIX-

SIM 24, MMIXAL 138, MMMIX 6, 9,MMOTYPE 3.

Can't write... : MMIXAL 47.cannot add... : MMIXAL 99.cannot subtract... : MMIXAL 101.cannot use... : MMIXAL 102.Capacity exceeded... : MMIXAL 38, 55, 59.carry : MMIX-ARITH 60, 82.carry-save addition: MMIX 40.catchint : MMIX-SIM 147, 148.cc : MMIX-CONFIG 16, 23, 31, 32, MMIX-

PIPE 158, 159, 167, 177, 181, 184, 185, 222,224, 233, 234, 237, 245, 357.

cease : MMIX-PIPE 10.ch : MMIXAL 54, 57, 61, 74, 75, 79.Char: MMIX-SIM 39, 40, 41, MMIXAL 30, 32,

33, 38, 40, 57, 77.char switch : MMIX-SIM 133, 134.check ld : MMIX-SIM 94, 96.check st : MMIX-SIM 95.check syntax : MMIX-SIM 149.choose victim : MMIX-PIPE 186, 187, 196, 205.chunk : MMIX-CONFIG 37, MMIX-PIPE 206, 209,

210, 213, 216, 219, 223, 297, MMMIX 8, 11.chunknode: MMIX-PIPE 206, 207.citm : MMIX-CONFIG 13, 15, 23.clean block : MMIX-PIPE 178, 179, 181, 276,

365, 366, 367.clean co : MMIX-PIPE 230, 231, 361, 363,

364, 368.clean ctl : MMIX-PIPE 230, 231, 361, 368.clean lock : MMIX-PIPE 39, 230, 233, 234,

361, 368.cleanup : MMIX-PIPE 129, 230, 231, 232.clearerr : MMIX-IO 13.Clock time is... : MMIX-PIPE 14.CMP : MMIX 15, MMIX-PIPE 47, MMIX-SIM 54,

90, MMIXAL 63.cmp : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 143.cmp fin : MMIX-PIPE 348, MMIX-SIM 90.cmp neg : MMIX-PIPE 143, 348, MMIX-SIM 90.cmp pos : MMIX-PIPE 143, 348, MMIX-SIM 90.cmp zero : MMIX-PIPE 143, 348, MMIX-SIM 90.cmp zero or invalid : MMIX-PIPE 348, MMIX-

SIM 90.


CMPI : MMIX-PIPE 47, MMIX-SIM 54, 90.CMPU : MMIX 9, 15, MMIX-PIPE 47, MMIX-

SIM 54, 90, MMIXAL 63.cmpu : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 143.CMPUI : MMIX-PIPE 47, MMIX-SIM 54, 90.co : MMIX-CONFIG 26, MMIX-PIPE 76, 81,

82, 237, 243, 244.code : MMIXAL 62, 64.command line arguments: MMIX-SIM 2, 6, 163.command buf : MMIX-SIM 149, 150, 151.command buf size : MMIX-SIM 150, 151.commit max : MMIX-CONFIG 15, MMIX-

PIPE 59, 67, 145, 330.compare-and-swap: MMIX 31.complement : MMIXAL 82, 86, 100.config file line... : MMIX-CONFIG 10.config file : MMIX-CONFIG 9, 10, 19, 38.config file name : MMMIX 2, 3.Configuration error... : MMIX-CONFIG 20,

23, 24, 25, 29, 31, 35, 37.Configuration syntax error... : MMIX-

CONFIG 19, 23.confusion : MMIX-PIPE 13, 28, 135, 185, 187.constant doesn't fit... : MMIXAL 117.constant found : MMIXAL 92, 93, 94, 95, 96.continuous profiling: MMIX 40.control: MMIX-PIPE 44, 45, 46, 60, 63, 73,

78, 124, 127, 158, 159, 167, 230, 235, 248,254, 255, 285, 357.

control struct: MMIX-PIPE 44.cool : MMIX-PIPE 60, 61, 63, 67, 69, 75, 78, 81,

82, 84, 85, 86, 98, 99, 100, 102, 103, 104,105, 106, 108, 109, 110, 111, 112, 113, 114,117, 118, 119, 120, 121, 122, 123, 145, 152,158, 160, 227, 308, 309, 312, 314, 316, 322,323, 324, 332, 333, 334, 335, 337, 338, 339,340, 341, 347, 355, 372, MMMIX 18, 19.

cool G : MMIX-PIPE 99, 102, 104, 105, 106, 110,117, 120, 312, 323, 335, 337.

cool hist : MMIX-PIPE 74, 75, 99, 151, 152,160, 308, 309, 316.

cool L: MMIX-PIPE 99, 102, 104, 105, 106, 110,112, 119, 120, 312, 323, 337, 338.

cool O : MMIX-PIPE 75, 98, 100, 104, 105, 106,110, 112, 117, 118, 119, 120, 145, 147, 333,337, 338, 339, MMMIX 19.

cool S : MMIX-PIPE 75, 98, 100, 110, 113, 114,118, 119, 120, 145, 147, 337, MMMIX 19.

copy block : MMIX-PIPE 184, 185, 217, 221.copy in time : MMIX-CONFIG 16, 23, MMIX-

PIPE 167, 217, 222, 224, 237, 276.copy out time : MMIX-CONFIG 16, 23, MMIX-

PIPE 167, 203, 221, 233, 234, 259.coroutine: MMIX-PIPE 23, 24, 25, 26, 27, 28,

29, 30, 31, 32, 33, 34, 35, 36, 37, 44, 76, 124,127, 167, 222, 224, 230, 235, 237, 248, 285.

coroutine bit : MMIX-PIPE 8, 10, 125.coroutine struct: MMIX-PIPE 23.

cotm : MMIX-CONFIG 13, 15, 23.count : MMIX-PIPE 216, 219, 223, MMOTYPE 9,

11, 12, 14, 25, 30.count bits : MMIX-ARITH 26, 28, MMIX-PIPE 21,

344, MMIX-SIM 13, 87.counting leading zeros: MMIX 28.counting ones: MMIX 12.counting trailing zeros: MMIX 37.CPV : MMIX-CONFIG 15, 16, 17, 23.CPV size : MMIX-CONFIG 15, 17, 23.cpv spec: MMIX-CONFIG 13, 15, 17.cset : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 345.CSEV : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSEVI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSNI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSNN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSNNI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSNP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSNPI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSNZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSNZI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSOD : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSODI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSPI : MMIX-PIPE 47, MMIX-SIM 54, 92.CSWAP : MMIX 31, 50, MMIX-PIPE 47, 281,

MMIX-SIM 54, 96, MMIXAL 63.cswap : MMIX-PIPE 49, 51, 117, 283, 307.CSWAPI : MMIX-PIPE 47, MMIX-SIM 54, 96.CSZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.CSZI : MMIX-PIPE 47, MMIX-SIM 54, 92.ctl : MMIX-CONFIG 16, MMIX-PIPE 23, 30, 31,

32, 44, 81, 124, 125, 128, 134, 222, 224, 231,236, 243, 244, 245, 249, 255, 286.

ctl change bit : MMIX-PIPE 81, 83, 85.cur arg : MMIX-SIM 142, 144, 163.cur dat : MMMIX 5, 8, 9, 10, 11, 12.cur disp addr : MMIX-SIM 151, 152, 156,

157, 159.cur disp mode : MMIX-SIM 151, 152, 156,

157, 159.cur disp set : MMIX-SIM 151, 152, 153, 156.cur disp type : MMIX-SIM 151, 153, 159.cur file : MMIX-SIM 30, 31, 32, 35, 42, 44, 45,

47, 49, 51, 53, 63, MMIXAL 36, 38, 45, 50,MMOTYPE 15, 16, 17, 20, 21.

cur greg : MMIXAL 108, 109, 134, 143.cur line : MMIX-SIM 30, 31, 32, 35, 47, 51, 53,

63, 82, 83, 103, 105, 128, MMOTYPE 15,16, 17, 20, 21.


cur loc : MMIX-SIM 30, 31, 32, 33, 34, 50, 51,162, 165, MMIXAL 42, 43, 49, 52, 53, 96,107, 109, 110, 114, 115, 118, 125, 126,130, 131, 132, MMMIX 5, 7, 8, 9, 11, 12,MMOTYPE 15, 16, 17, 18, 19, 21.

cur O : MMIX-PIPE 44, 100, 145, 147,MMMIX 19.

cur prefix : MMIXAL 56, 61, 87, 111, 132.cur round : MMIX-ARITH 30, 40, 43, 45, 46,

47, 86, 88, 89, 91, MMIX-PIPE 20, 346,MMIX-SIM 13, 77, 89, 100, 158.

cur S : MMIX-PIPE 44, 100, 145, 147,MMMIX 19.

cur seg : MMIX-SIM 151, 152, 161.cur time : MMIX-PIPE 28, 29, 125.cycs : MMIX-PIPE 9, 10.d: MMIX-ARITH 13, 27, 46, 50, MMIX-PIPE 28,

31, 97, 170, 197, 201, 203, 220, MMMIX 16.D_BIT : MMIX-ARITH 31, MMIX-PIPE 54, 308,

343, MMIX-SIM 57, MMIXAL 69.D_Handler : MMIXAL 69.Danger : MMIXAL 142.dat : MMIX-ARITH 59, 60, 61, 62, 63, 64, 65,

79, 80, 82, 83, MMIX-SIM 16, 20, 50, 51,162, 165, MMIXAL 52.

data : MMIX-CONFIG 31, 32, 33, MMIX-PIPE 124, 125, 130, 131, 132, 133, 134,135, 137, 138, 139, 140, 141, 142, 143,144, 155, 156, 160, 167, 172, 179, 185,197, 201, 203, 215, 216, 217, 218, 219, 220,222, 223, 224, 225, 226, 232, 233, 234, 237,239, 243, 244, 245, 257, 259, 260, 261, 262,264, 265, 266, 267, 268, 269, 270, 271, 272,273, 274, 275, 276, 277, 278, 279, 280, 281,282, 283, 288, 289, 291, 292, 293, 294, 295,296, 297, 298, 300, 301, 302, 304, 307, 308,309, 310, 313, 325, 326, 327, 328, 329, 330,331, 336, 342, 343, 344, 345, 346, 348, 350,351, 352, 353, 354, 356, 357, 358, 359, 360,361, 363, 364, 365, 366, 367, 368, 369, 370,378, 379, MMMIX 12, 23.

Data_Segment : MMIX-SIM 3, MMIXAL 69.Dcache : MMIX-CONFIG 17, 21, 35, 36, MMIX-

PIPE 39, 128, 168, 215, 217, 222, 227, 228,233, 234, 257, 259, 261, 262, 263, 265, 267,268, 271, 273, 274, 275, 276, 280, 360, 364,366, 378, 379, MMMIX 21.

Dclean : MMIX-PIPE 233.Dclean inc : MMIX-PIPE 233.Dclean loop : MMIX-PIPE 233.dd : MMIX-PIPE 197, 203.dderr : MMIXAL 45, 114.Dean, Jeffrey Adgate: MMIX 40.dec pt : MMIX-ARITH 73, 74, 76, 77, 79.decgamma : MMIX-PIPE 49, 114, 147, 327.decimal : MMIX-SIM 133, 135, 137.decimal-to-binary conversion: MMIX-ARITH 68.default go : MMIX-PIPE 46.DEFINED : MMIXAL 58, 74, 78, 87, 91, 109, 115.

defval : MMIX-CONFIG 12, 13, 14, 16, 17.deissues : MMIX-PIPE 60, 61, 63, 64, 67, 145,

160, 308, 309, 316, MMMIX 18.del : MMIX-PIPE 216.delay : MMIX-PIPE 219.delink : MMIXAL 99, 100.delta : MMIX-ARITH 6, 93, 94, MMIX-PIPE 21,

MMIX-SIM 13, 25, 34, MMIXAL 28,MMMIX 25, MMOTYPE 1, 8, 19.

demote and fix : MMIX-PIPE 198, 199, 233, 234,268, 271, 273, 353, 354, 365, 366, 367.

demote usage : MMIX-PIPE 190, 191, 199.denin : MMIX-PIPE 44, 100, 133, 346, 348.denin penalty : MMIX-CONFIG 15, MMIX-

PIPE 279, 346, 348, 349, 350.denormal numbers: MMIX 21.denout : MMIX-PIPE 44, 100, 133, 134, 346.denout penalty : MMIX-CONFIG 15, MMIX-

PIPE 281, 346, 349, 351.derr : MMIXAL 45, 86, 97, 100, 101, 102, 103,

104, 109, 116, 117, 121, 122, 123, 124, 129.dest : MMIXAL 126, 131.die : MMIX-PIPE 144, 160, 265, 308, 309, 310.dig : MMIX-SIM 15.dirty : MMIX-CONFIG 31, 32, 33, MMIX-

PIPE 167, 170, 172, 179, 181, 185, 197, 201,203, 216, 221, 259, 262.

dirty only : MMIX-PIPE 176, 177.dispatch count : MMIX-PIPE 64, 65, 81.dispatch done : MMIX-PIPE 101, 112, 113,

114, 332.dispatch lock : MMIX-PIPE 39, 64, 65, 75, 81,

85, 310, 329, 330, 356.dispatch max : MMIX-CONFIG 15, 37, MMIX-

PIPE 59, 74, 75, 85, 162.dispatch stat : MMIX-CONFIG 37, MMIX-

PIPE 64, 66, 162.Ditzel, David Roger: MMIX 42.DIV : MMIX 20, 50, MMIX-PIPE 47, MMIX-

SIM 54, 88, MMIXAL 63.div : MMIX-CONFIG 15, 27, 28, MMIX-PIPE 7,

49, 51, 121, 343.DIVI : MMIX-PIPE 47, MMIX-SIM 54, 88.divide check exception: MMIX 20, 32.division by zero : MMIXAL 101.DIVU : MMIX 20, MMIX-PIPE 47, MMIX-SIM 54,

88, MMIXAL 63.divu : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

121, 343.DIVUI : MMIX-PIPE 47, MMIX-SIM 54, 88.Dlocker : MMIX-PIPE 127, 128, 276.do resume trans : MMIX-PIPE 325, 326.do syncd : MMIX-PIPE 280, 364, 369.do syncid : MMIX-PIPE 280, 364, 369.doing interrupt : MMIX-PIPE 63, 64, 65, 314,

317, 318.done : MMIX-ARITH 65, MMIX-IO 12, 13,

MMIX-PIPE 125, 134, 233, 234, MMMIX 13.done with write : MMIX-PIPE 256.


down : MMIX-PIPE 40, 86, 89, 95, 97, 116.dpanic : MMIXAL 45, 47, 138, 142.DPTco : MMIX-PIPE 235, 236, 237.DPTctl : MMIX-PIPE 235, 236.DPTname : MMIX-PIPE 235, 236.DT hit : MMIX-PIPE 267, 268, 270, 271,

272, 273.DT miss : MMIX-PIPE 267, 270, 272.DT retry : MMIX-PIPE 272.DTcache : MMIX-CONFIG 17, 21, 35, MMIX-

PIPE 39, 128, 168, 236, 237, 266, 267, 268,269, 270, 272, 325, 353, 358, MMMIX 21, 23.

dump : MMIX-SIM 164, 165.
dump file : MMIX-SIM 144, 146, 164, 166.
dump tet : MMIX-SIM 164, 165, 166.
DUNNO : MMIX-PIPE 254, 255, 268, 270, 271, 278.
dynamic traps: MMIX 35, 37.
e: MMIX-ARITH 31, 34, 37, 38, 39, 40, 50, 56, 89.
E_BIT : MMIX-ARITH 31, 93, 94, MMIX-PIPE 54,

56, 306, 314, 317, 351.ee : MMIX-ARITH 37, 38, 50, 51, 53.ef : MMIX-ARITH 50, 51.Emerson, Ralph Waldo: MMIX 7.emulate virt : MMIX-PIPE 272, 310, 327.emulation: MMIX 24, 27, 33, 36, 38, 47, 49,

MMIX-CONFIG 6.end simulation : MMIX-SIM 141, 149.EOF : MMIXAL 34, 35, MMMIX 10.eof : MMIX-IO 14, 15, 16, 17.eps : MMIX-PIPE 21, MMIX-SIM 13.equiv : MMIXAL 58, 59, 64, 66, 70, 75, 76, 78,

82, 87, 94, 98, 99, 100, 101, 104, 108, 109,110, 113, 114, 117, 118, 121, 122, 123, 124,125, 126, 127, 129, 130, 131, 132, 134, 144.

equiv buf : MMOTYPE 28, 29.err : MMIXAL 35, 45, 93, 95, 98, 99, 100, 101,

106, 108, 109, 117, 118, 121, 122, 123, 124,126, 127, 129, 131, 132, MMOTYPE 13,14, 18, 19, 20, 22.

err buf : MMIXAL 32, 33, 45.err count : MMIXAL 45, 46, 79, 142, 145.Error in tetra... : MMOTYPE 14.errprint coroutine id : MMIX-PIPE 24, 25, 28.errprint0 : MMIX-CONFIG 8, 18, 24, 35, 36,

37, MMIX-PIPE 13, 22, 25.errprint1 : MMIX-CONFIG 8, 10, 16, 19, 20, 23,

24, 25, 29, 31, 32, 33, 34, 38, MMIX-PIPE 13,14, 28, 213.

errprint2 : MMIX-CONFIG 8, 20, 23, 31, 32, 33,MMIX-PIPE 13, 14, 25, 210.

errprint3 : MMIX-CONFIG 8, 32.es : MMIX-ARITH 50.ESPEC : MMIXAL 63, MMIXAL 43, 62, 63, 132.et : MMIX-ARITH 50.ex : MMIX-ARITH 90.exc : MMIX-SIM 60, 61, 84, 85, 87, 88, 89, 90,

95, 108, 122, 123, 126, 131.exceptions: MMIX 32.

exceptions : MMIX-ARITH 31, 32, 33, 35, 36,37, 38, 40, 41, 42, 44, 46, 86, 88, 89, 90,91, 93, 94, MMIX-PIPE 20, 281, 346, 351,MMIX-SIM 13, 89, 95.

exec bit : MMIX-SIM 58, 63, 161, 162.exit : MMIX-CONFIG 8, 38, MMIX-PIPE 14,

MMIX-SIM 7, 14, 24, 26, 35, 143, 164,MMIXAL 45, 137, 142, MMMIX 3, 6, 7, 8, 9,11, 12, MMOTYPE 2, 3, 9, 20, 23, 25, 26.

exp : MMIX-ARITH 73, 76, 77, 79, 83, 84.exp sign : MMIX-ARITH 77.expanding : MMIXAL 127, 137, 139.expire : MMIX-PIPE 13, 14.Extern: MMIX-PIPE 4, 5.Extra bytes follow... : MMOTYPE 30.f : MMIX-ARITH 31, 34, 37, 38, 39, 40, 56, 60,

61, 62, 82, MMIX-IO 10, MMIX-PIPE 75,MMIX-SIM 62.

F_BIT : MMIX-PIPE 54, 122, 256, 302, 306, 309,310, 313, 314, 317, 319, 320, 321, 327.

FADD : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54,89, MMIXAL 63.

fadd : MMIX-CONFIG 15, 28, MMIX-PIPE 49,51, 346.

fake stdin : MMIX-SIM 144, 145.false : MMIX-ARITH 1, 24, 68, MMIX-CONFIG 10,

15, MMIX-PIPE 11, 12, 59, 75, 81, 100, 112,113, 114, 147, 170, 179, 201, 203, 205, 221,239, 244, 259, 301, 304, 314, 323, 324, 330,332, 337, 340, 351, 363, 369, MMIX-SIM 9,49, 51, 60, 87, 90, 128, 131, 133, 141, 143,149, 150, 152, 153, 161, MMIXAL 26, 34,64, 66, 70, 118, 125, 130, 132, MMMIX 8,9, 10, 11, 12, 21, 22, 23.

Fascicle 1: MMIX 4.Fclose : MMIXAL 69.Fclose : MMIX-PIPE 371, 372, MMIX-SIM 59,

108.fclose : MMIX-IO 8, 11, MMIX-SIM 4, 32, 145,

150, MMMIX 4.FCMP : MMIX 23, MMIX-PIPE 47, MMIX-SIM 54,

90, MMIXAL 63.fcmp : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 348.FCMPE : MMIX 25, MMIX-PIPE 47, 348, MMIX-

SIM 54, 90, MMIXAL 63.fcomp : MMIX-ARITH 85, MMIX-PIPE 21, 346,

348, MMIX-SIM 13, 89, 90.FDIV : MMIX 22, 50, MMIX-PIPE 47, MMIX-

SIM 54, 89, MMIXAL 63.fdiv : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 346.fdivide : MMIX-ARITH 44, MMIX-PIPE 21, 346,

MMIX-SIM 13, 89.feof : MMIX-IO 15, 17, MMIX-SIM 42.feps : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 348.fepscomp : MMIX-ARITH 50, MMIX-PIPE 21,

348, MMIX-SIM 13, 90.


FEQL : MMIX 23, MMIX-PIPE 47, MMIX-SIM 54,90, MMIXAL 63.

FEQLE : MMIX 25, MMIX-PIPE 47, 348, MMIX-SIM 54, 90, MMIXAL 63.

ferror : MMIX-IO 13.
fetch: MMIX-PIPE 68, 69, 70, 73, 74, 301.
fetch bot : MMIX-CONFIG 37, MMIX-PIPE 69, 73, 74, 75, 301, MMMIX 22.
fetch buf size : MMIX-CONFIG 15, 37.
fetch co : MMIX-PIPE 285, 286, 287.
fetch ctl : MMIX-PIPE 285, 286.
fetch hi : MMIX-PIPE 285, 294, 297, 301.
fetch lo : MMIX-PIPE 285, 294, 297, 301, 304.
fetch max : MMIX-CONFIG 15, MMIX-PIPE 59, 284, 301.
fetch one : MMIX-PIPE 301.
fetch ready : MMIX-PIPE 285, 291, 292, 296, 297, 299, 301.
fetch retry : MMIX-PIPE 298, 300.
fetch top : MMIX-CONFIG 37, MMIX-PIPE 69, 71, 73, 74, 75, 301, MMMIX 22.
fetched : MMIX-CONFIG 37, MMIX-PIPE 284, 285, 294, 297, 301, 304.
ff: MMIX-ARITH 63, 64, 65, 66, 79, 80, 81, 83.
fflush : MMIX-IO 18, 19, 20, MMIX-PIPE 387, MMIX-SIM 4, 120, 150.
fgetc : MMIXAL 34, 35, MMMIX 10.
Fgets : MMIX-SIM 4, MMIXAL 69.
Fgets : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
fgets : MMIX-CONFIG 10, 38, MMIX-IO 15, MMIX-MEM 2, MMIX-PIPE 387, MMIX-SIM 4, 42, 45, 120, 150, MMIXAL 34, MMMIX 6, 13.
Fgetws : MMIX-SIM 4, MMIXAL 69.
Fgetws : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
fgetws : MMIX-SIM 4.
file : MMIX-SIM 4.
File...was modified : MMIX-SIM 44.
file info : MMIX-SIM 35, 40, 42, 44, 45, 49.
file name : MMOTYPE 15, 16, 20, 21.
file no : MMIX-SIM 16, 30, 51, 63.
file node: MMIX-SIM 38, 40.
filename : MMIX-CONFIG 38, MMIXAL 36, 37, 38, 45, 50, 140.
filename count : MMIXAL 37, 38, 140.
FILENAME_MAX : MMIX-IO 2, 8, MMIXAL 38, 39, 139.
filename passed : MMIXAL 50, 51.
fill from mem : MMIX-CONFIG 35, MMIX-PIPE 129, 222, 224, 237.
fill from S : MMIX-CONFIG 35, MMIX-PIPE 129, 224, 237.
fill from virt : MMIX-CONFIG 35, MMIX-PIPE 129, 237, 242.
fill lock : MMIX-PIPE 167, 174, 222, 224, 225, 226, 237, 257, 261, 272, 274, 298, 300.
filler : MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 195, 196, 204, 218, 224, 225, 261, 272, 274, 276, 298, 300, 326.
filler ctl : MMIX-CONFIG 16, MMIX-PIPE 167, 176, 225, 236, 261, 272, 274, 298, 300, 326.
fin bflot : MMIX-PIPE 346.
fin bin : MMIXAL 99, 101.
fin ex : MMIX-PIPE 135, 144, 155, 266, 269, 271, 272, 273, 274, 276, 279, 281, 283, 296, 298, 300, 301, 313, 325, 326, 327, 328, 329, 331, 336, 342, 345, 346, 350, 351, 356, 360, 363, 364, 370.
fin float : MMIX-SIM 89.
fin flot : MMIX-PIPE 346.
fin ld : MMIX-PIPE 279, MMIX-SIM 94.
fin pst : MMIX-SIM 95.
fin st : MMIX-PIPE 281, MMIX-SIM 95.
fin uflot : MMIX-PIPE 346.
fin unifloat : MMIX-SIM 89.
finish store : MMIX-PIPE 272, 279, 280.
FINT : MMIX 22, 24, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fint : MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.
fintegerize : MMIX-ARITH 86, 88, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
first : MMIX-PIPE 216.
FIX : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fix : MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.
fix o : MMIXAL 58, 112, 118.
fix xyz : MMIXAL 58, 114, 130.
fix yz : MMIXAL 58, 114, 125.
fixit : MMIX-ARITH 88, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
fixr : MMIX-SIM 34, MMOTYPE 19.
FIXU : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
flags : MMIX-PIPE 80, 81, 83, 312, 320, MMIX-SIM 60, 64, 65.
float-to-fix exception: MMIX 27, 32.
floating : MMIX-SIM 134, 135, 137.
floating point arithmetic: MMIX 21.
floatit : MMIX-ARITH 89, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
FLOT : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
flot : MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.
FLOTI : MMIX-PIPE 47, MMIX-SIM 54, 89.
FLOTU : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
FLOTUI : MMIX-PIPE 47, MMIX-SIM 54, 89.
flush cache : MMIX-PIPE 202, 203, 205, 217, 233, 234, 263.
flush listing line : MMIXAL 41, 42, 44, 45, 115, 132, 134, 136.
flush to mem : MMIX-CONFIG 35, MMIX-PIPE 129, 215.
flush to S : MMIX-CONFIG 35, MMIX-PIPE 129, 217.


flusher : MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 202, 203, 204, 205, 215, 217, 221, 233, 234, 259, 263.
flusher ctl : MMIX-CONFIG 16, MMIX-PIPE 167.
fmt style: MMIX-SIM 135, 137.
FMUL : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54,

89, MMIXAL 63.fmul : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 346.fmult : MMIX-ARITH 41, MMIX-PIPE 21, 346,

MMIX-SIM 13, 89.Fopen : MMIX-SIM 4, MMIXAL 69.Fopen : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.fopen : MMIX-CONFIG 38, MMIX-IO 8, MMIX-

SIM 4, 24, 49, 145, 146, 150, MMIXAL 138,MMMIX 6, 9, MMOTYPE 3.

forced traps: MMIX 35, 36.forward local : MMIXAL 90, 91, 111, 145.forward local host : MMIXAL 88, 90, 91.found : MMIX-SIM 21.fp : MMIX-IO 5, 7, 8, 10, 11, 13, 15, 17, 18,

19, 20, 21, 22, 23.fpack : MMIX-ARITH 31, 34, 36, 39, 43, 45, 46,

47, 49, 84, 87, 89, 92, 94.fplus : MMIX-ARITH 46, MMIX-PIPE 21, 346,

MMIX-SIM 13, 89.fprintf : MMIX-CONFIG 8, MMIX-IO 23, MMIX-

PIPE 13, 381, 384, MMIX-SIM 14, 24, 26, 35,44, 49, 143, 145, 146, MMIXAL 30, 35, 41,42, 44, 45, 78, 79, 80, 115, 132, 134, 137,142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11, 12,MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30.

fputc : MMIX-SIM 133, 137, 138, 156, 159, 166.Fputs : MMIX-SIM 4, MMIXAL 69.Fputs : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.fputs : MMIX-SIM 4.Fputws : MMIX-SIM 4, MMIXAL 69.Fputws : MMIX-PIPE 371, 372, MMIX-SIM 59,

108.fputws : MMIX-SIM 4.frac : MMIXAL 82, 97, 101.frame pointer: MMIXAL 18.Fread : MMIX-SIM 4, MMIXAL 69.Fread : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.fread : MMIX-IO 13, 17, MMIX-PIPE 387,

MMIX-SIM 4, 26, 120, MMOTYPE 9, 30.free : MMIX-IO 12, MMIX-SIM 24.freeze dispatch : MMIX-PIPE 75, 81, 118, 355.FREM : MMIX 22, 34, 50, MMIX-PIPE 47,

MMIX-SIM 54, 89, MMIXAL 63.frem : MMIX-CONFIG 27, 28, MMIX-PIPE 49,

51, 320, 350, 351.frem max : MMIX-CONFIG 15, MMIX-PIPE 349,

351.fremstep : MMIX-ARITH 93, MMIX-PIPE 21, 350,

351, MMIX-SIM 13, 89.freopen : MMIX-SIM 4, 49.freq : MMIX-SIM 16, 50, 51, 63, 130.

froot : MMIX-ARITH 91, MMIX-PIPE 21, 346,MMIX-SIM 13, 89.

Fseek : MMIX-SIM 4, MMIXAL 69.Fseek : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.fseek : MMIX-IO 21, MMIX-SIM 4, 45.FSQRT : MMIX 22, 28, 50, MMIX-PIPE 47,

MMIX-SIM 54, 89, MMIXAL 63.fsqrt : MMIX-CONFIG 15, 28, MMIX-PIPE 7,

49, 51, 346, 347.FSUB : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54,

89, MMIXAL 63.fsub : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 346.Ftell : MMIX-SIM 4, MMIXAL 69.Ftell : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.ftell : MMIX-IO 22, MMIX-SIM 4, 42.ftype: MMIX-ARITH 36, 37, 38, 39, 40, 41,

44, 46, 85, 86, 88, 91, 93.FUN : MMIX 23, MMIX-PIPE 47, 348, MMIX-

SIM 54, 90, MMIXAL 63.func: MMIX-PIPE 76.func struct: MMIX-PIPE 76.FUNE : MMIX 25, MMIX-PIPE 47, 348, MMIX-

SIM 54, 90, MMIXAL 63.funeq : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 348.funit : MMIX-CONFIG 18, 25, 26, 29, MMIX-

PIPE 77, 79, 82, MMMIX 24.funit count : MMIX-CONFIG 18, 19, 25, 26,

MMIX-PIPE 77, 79, 82, MMMIX 24.funpack : MMIX-ARITH 36, 37, 40, 41, 44, 46,

50, 85, 86, 88, 91, 93.future reference cannot... : MMIXAL 109.future bits : MMIXAL 116, 119, 120, 125, 130.fwprintf : MMIXAL 30.Fwrite : MMIX-SIM 4, MMIXAL 69.Fwrite : MMIX-PIPE 371, 372, MMIX-SIM 59,

108.fwrite : MMIX-IO 18, 19, 20, MMIX-SIM 4,

MMIXAL 47.G: MMIX 29, MMIX-SIM 75.g: MMIX-ARITH 56, 61, 62, MMIX-PIPE 86,

167, 172, MMIX-SIM 76.gap : MMIX-SIM 47, 48, 53, 128, 143.GET : MMIX 43, MMIX-PIPE 47, MMIX-SIM 54,

97, MMIXAL 63, MMMIX 12.get : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

118, 146, 328.get int : MMIX-CONFIG 11, 20, 23, 24.get reader : MMIX-PIPE 182, 183, 233, 257, 266,

267, 271, 272, 273, 288, 291, 296, 353, 354,358, 359, 360, 365, 366.

get token : MMIX-CONFIG 10, 11, 18, 19,22, 23, 25.

GETA : MMIX 18, MMIX-PIPE 47, MMIX-SIM 54,85, MMIXAL 63.

GETAB : MMIX-PIPE 47, MMIX-SIM 54, 85.gg : MMIX-ARITH 63, 64, 65, 66, MMIX-

CONFIG 16, 23, 31, 35, MMIX-PIPE 167,170, 172, 216.


Ghemawat, Sanjay: MMIX 40.Gill, Stanley: MMIX-ARITH 26.Gillies, Donald Bruce: MMIX-ARITH 26.global registers: MMIX 29.GO : MMIX 19, MMIX-PIPE 47, 235, MMIX-

SIM 54, 107, MMIXAL 63.go : MMIX-CONFIG 16, 28, MMIX-PIPE 44, 46,

49, 51, 85, 100, 119, 120, 122, 123, 128,155, 160, 231, 236, 249, 286, 308, 312,320, 321, 322, 331, 364.

GOI : MMIX-PIPE 47, MMIX-SIM 54, 107.good : MMIX-SIM 61, 93, 133, 138.good guesses : MMIX-SIM 93, 139, 140.got DT : MMIX-PIPE 272.got greg : MMIXAL 108.got IT : MMIX-PIPE 291, 298.got one : MMIX-PIPE 291, 300, 301.gran : MMIX-CONFIG 13, 15, 23.graphics: MMIX 11.Gray, James Nicholas: MMIX 31.GREG : MMIXAL 62, 63, 102, 109, 132,

MMIXAL 63.greg : MMIXAL 108, 127, 142, 143, 144.greg val : MMIXAL 108, 127, 133, 144.h: MMIX-ARITH 3, MMIX-IO 3, MMIX-

PIPE 17, 151, 152, 210, 213, MMIX-SIM 10,MMIXAL 26, 68, MMOTYPE 7.

H_BIT : MMIX-PIPE 54, 146, 306, 308, 313,314, 317, 319, 320, 321, MMIX-SIM 57,108, 122, 123.

h down : MMIX-PIPE 152.h up : MMIX-PIPE 152.Halt : MMIX-SIM 7, MMIXAL 69.Halt : MMIX-PIPE 371, 372, MMIX-SIM 59, 108.halted : MMIX-PIPE 10, 12, 356, 373, MMIX-

SIM 61, 107, 109, 140, 141, 161.handle : MMIX-IO 8, 11, 12, 13, 14, 15, 16, 17,

18, 19, 20, 21, 22, MMIX-SIM 4, 134, 135, 137.handlers: MMIX 32, 35, 38.hardware PT : MMIX-CONFIG 15, 37.hash prime : MMIX-CONFIG 15, 37, MMIX-

PIPE 207, 209, 210, 213.head : MMIX-PIPE 69, 71, 73, 74, 75, 80, 81, 84,

85, 100, 110, 151, 152, 160, 228, 229, 301,308, 309, 316, 323, 335, 341, MMMIX 12, 22.

held bits : MMIXAL 43, 44, 47, 49, 52.
Hennessy, John LeRoy: MMIX 1, 3, MMIX-PIPE 58, 150, 163.
Henzinger, Monika Hildegard Rauch: MMIX 40.
hex : MMIX-SIM 134, 135, 137.
Hexadecimal file line... : MMMIX 6.
hexadecimal files: MMMIX 5.
hi : MMIX-SIM 15.
hist : MMIX-PIPE 44, 46, 68, 75, 85, 100,

160, 308, 309.hit : MMIX-PIPE 193.hit and miss : MMIX-PIPE 267, 268, 271, 273.hit set : MMIX-PIPE 192, 193, 194, 196, 199,

201, 217.

hold buf : MMIXAL 43, 44, 47, 52.hold op : MMIXAL 85, 98.holding time : MMIX-CONFIG 15, MMIX-

PIPE 247, 256, 257.hot : MMIX-PIPE 60, 61, 63, 64, 67, 69, 86, 101,

146, 147, 149, 256, 314, 316, 317, 318, 319,320, 321, 357, MMMIX 18, 19.

i: MMIX-ARITH 8, 13, MMIX-CONFIG 38,MMIX-PIPE 10, 12, 44, 172, 176, 181, 185,201, 246, MMIX-SIM 62.

I can't allocate... : MMIX-PIPE 213.I can't deal with... : MMIXAL 50.I can't open... : MMIX-SIM 49.I'm reading this file... : MMOTYPE 23.I/O: MMIX 33, 37, 44, MMIX-IO 1, MMIX-

MEM 1, MMIX-SIM 4.I_BIT : MMIX-ARITH 31, 40, 41, 42, 44,

46, 86, 88, 91, 93, MMIX-PIPE 54, 348,MMIX-SIM 57, 90, MMIXAL 69.

I_Handler : MMIXAL 69.IBM Corporation: MMIX 31.Icache : MMIX-CONFIG 17, 21, 35, 36, 37,

MMIX-PIPE 39, 128, 168, 222, 227, 229,265, 280, 291, 292, 294, 296, 300, 359,364, 365, MMMIX 21.

IEEE/ANSI Standard 754: MMIX 21.Ihit and miss : MMIX-PIPE 291, 292, 296,

298, 299.ii : MMIX-PIPE 185, 216.IIADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.IIADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.illegal character constant : MMIXAL 106.illegal fraction : MMIXAL 101.illegal hexadecimal constant : MMIXAL 95.illegal instructions: MMIX 28, 29, 33, 37,

38, 43, 45, 51.illegal inst : MMIX-PIPE 118, 347, MMIX-SIM 89,

97, 99, 100, 102, 104, 107, 124, 125.immed bit : MMIXAL 62, 121, 124.immediate operands: MMIX 5, 13.implied loc : MMIX-SIM 51, 52, 53.Improper hexadecimal... : MMMIX 6, 7, 8.improper local label... : MMIXAL 103.inbuf : MMIX-CONFIG 31, MMIX-PIPE 167, 200,

201, 219, 220, 222, 223, 226, 245, 379.incgamma : MMIX-PIPE 49, 113, 147, 323,

327, 338.INCH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

78, 85, MMIXAL 63.INCL : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.
incl file : MMIX-SIM 150, 151.
incl read : MMIX-SIM 150.
INCMH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.INCML : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.incomplete...constant : MMIXAL 106.incomplete str : MMIX-SIM 149, 155.


Incorrect implementation... : MMIX-

PIPE 22, MMIX-SIM 14.incr : MMIX-ARITH 6, 33, 53, 55, 73, 87, 92,

MMIX-IO 4, 14, 16, 18, 19, 20, MMIX-PIPE 21,46, 64, 84, 85, 100, 113, 114, 119, 120, 236,240, 265, 279, 301, 314, 320, 322, 323, 325,333, 338, 339, 369, 370, 373, 380, 381, 382,383, 384, 385, 386, MMIX-SIM 13, 30, 33,34, 37, 51, 60, 63, 70, 82, 83, 101, 102, 103,104, 105, 106, 108, 109, 115, 116, 118, 119,127, 140, 152, 154, 155, 156, 162, 163, 165,MMIXAL 28, 47, 52, 94, 95, 107, 126, 131,MMMIX 12, 21, 25, MMOTYPE 8, 15, 18, 19.

increase L: MMIX-PIPE 110, 312.increment...too large : MMOTYPE 19.incrl : MMIX-PIPE 49, 112, 119, 327.inexact exception: MMIX 21, 32.Inf : MMIXAL 69.inf : MMIX-ARITH 36, 37, 38, 39, 40, 41, 42,

44, 46, 50, 85, 86, 88, 91, 93.
inf octa : MMIX-ARITH 4, 39, 41, 44, 46.
infinity: MMIX 21.
info : MMIX-SIM 51, 60, 65, 71, 79, 127,

130, 131.initialization of a user program: MMIX-

SIM 6, 164.inner lp : MMIXAL 82, 86, 98.inner rp : MMIXAL 82, 97, 98.Input is not... : MMOTYPE 23.input/output: MMIX 33, 37, 44, MMIX-IO 1,

MMIX-MEM 1, MMIX-SIM 4.inst : MMIX-PIPE 68, 73, 75, 84, 100, 110, 228,

229, 304, 323, 335, 341, MMIX-SIM 60, 61,63, 70, 108, 123, 130, MMMIX 12, 22.

inst ptr : MMIX-PIPE 71, 73, 81, 85, 119, 120,122, 123, 160, 284, 288, 290, 294, 301,302, 304, 308, 309, 310, 312, 314, 322,323, MMIX-SIM 37, 60, 61, 63, 70, 93, 101,107, 108, 123, 124, 131, 138, 140, 161, 164,MMMIX 12, 15, 22, 23.

INT_MAX : MMIX-CONFIG 15, 38.int op : MMIX-CONFIG 27, 28.int stages : MMIX-CONFIG 27, 28.interact : MMIX-SIM 149.interact after break : MMIX-SIM 61, 107,

141, 143.interacting : MMIX-SIM 61, 107, 120, 141, 143.interactive help : MMIX-SIM 144, 149.interactive read bit : MMIX-MEM 2, MMIX-

PIPE 8.interim : MMIX-PIPE 44, 46, 81, 100, 112, 113,

114, 227, 320, 330, 332, 337, 340, 342, 350,351, 361, 363, 364, 369.

internal op : MMIX-CONFIG 28, MMIX-PIPE 51,80, 320.

internal op name : MMIX-PIPE 46, 50.internal opcode: MMIX-PIPE 49.interrupt : MMIX-PIPE 44, 46, 59, 68, 73, 81,

100, 118, 122, 132, 140, 141, 144, 146, 149,

160, 256, 266, 269, 271, 272, 281, 282, 288,301, 302, 304, 306, 307, 308, 309, 310, 313,314, 317, 319, 320, 321, 322, 323, 327, 329,330, 331, 332, 336, 337, 343, 346, 348, 351,MMIX-SIM 141, 144, 148, MMMIX 22.

interrupts: MMIX 33, 34, 35, 36, 37, 38,MMIX-PIPE 306, MMIX-SIM 1, 2, 108.

INTERVAL_TIMEOUT : MMIX-PIPE 57, 314.invalid exception: MMIX 21, 32.IPTco : MMIX-PIPE 235, 236, 237.IPTctl : MMIX-PIPE 235, 236.IPTname : MMIX-PIPE 235, 236.IS : MMIXAL 62, 63, 109, 132, MMIXAL 63.is denormal : MMIX-PIPE 346, 348, 350, 351.is dirty : MMIX-PIPE 169, 170, 177, 205,

233, 234.is load store : MMIX-PIPE 307, 310, 316, 320.is trivial : MMIX-PIPE 346, 350.isalpha : MMIXAL 57.isdigit : MMIX-ARITH 68, 73, 74, 77, MMIX-

SIM 152, 155, MMIXAL 38, 57, 86, 94, 103,104, 109, 110, 111.

isletter : MMIXAL 57, 86, 103, 104.isspace : MMIX-CONFIG 10, 38, MMIX-SIM 150,

MMIXAL 38, 103, 104, 106.issue bit : MMIX-PIPE 8, 10, 81, 145, 146, 147,

149, 283, 310, 314, 319, 320, 321.issued between : MMIX-PIPE 158, 159, 160,

308, 309, 316.isxdigit : MMIX-SIM 154, 161, MMIXAL 95.IT hit : MMIX-PIPE 291, 292, 295, 296, 298, 299.IT miss : MMIX-PIPE 291, 295, 298, 299.ITcache : MMIX-CONFIG 17, 21, 35, MMIX-

PIPE 39, 128, 168, 236, 237, 288, 291,292, 293, 295, 298, 302, 325, 354, 360,MMMIX 12, 21, 23.

IVADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.IVADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.j: MMIX-ARITH 8, 13, 56, MMIX-CONFIG 23,

30, 31, 38, MMIX-PIPE 10, 12, 56, 162,170, 172, 176, 179, 181, 183, 185, 189, 191,203, 205, MMIX-SIM 15, 50, 62, 162, 165,MMIXAL 44, 50, 52, 74, 136, MMMIX 17,24, MMOTYPE 1, 26.

jj : MMIX-PIPE 185, MMIXAL 52.JMP : MMIX 19, MMIX-PIPE 47, MMIX-SIM 54,

70, 107, MMIXAL 63.jmp : MMIX-PIPE 49, 51, 84, 85, 327.JMPB : MMIX-PIPE 47, MMIX-SIM 54, 70, 107.just traced : MMIX-SIM 128, 129.k: MMIX-ARITH 8, 13, 29, 56, 81, 91, MMIX-

CONFIG 31, MMIX-PIPE 76, MMIX-SIM 42,45, 47, 62, 82, 83, 143, 160, MMIXAL 42, 44,50, 52, 136, MMMIX 17, MMOTYPE 1, 26.

K_BIT : MMIX-PIPE 54, 118, 322.keep : MMIX-PIPE 202, 203.key : MMIX-PIPE 210, 213, MMIX-SIM 20, 21, 22.known : MMIX-PIPE 40, 43, 44, 46, 59, 85,

89, 93, 100, 102, 112, 119, 120, 131, 132,


133, 135, 144, 237, 244, 255, 265, 290,312, 322, 331, 338, 364.

known phys : MMIX-PIPE 296, 298.Knuth, Donald Ervin: MMIX-ARITH 58.L: MMIX 29, MMIX-SIM 75.l: MMIX-ARITH 3, MMIX-CONFIG 30, MMIX-

IO 3, MMIX-PIPE 17, 86, 187, 189, 191,MMIX-SIM 10, 22, 42, 76, MMIXAL 26,52, 68, MMOTYPE 7.

lab field : MMIXAL 32, 33, 102, 103, 104, 109, 110, 111.
label field...ignored : MMIXAL 102.
label syntax error... : MMIXAL 103.
last h : MMIX-PIPE 209, 210, 211, 213, 216, 219, 223, 297, MMMIX 8, 11.
last mem : MMIX-SIM 18, 19, 20, 21.
last off : MMIX-PIPE 216.
last sym node : MMIXAL 59, 60.
last trie node : MMIXAL 55, 56.
ld : MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51,

117, 265, 307, 327, 357.ld ready : MMIX-PIPE 267, 268, 270, 271, 273,

274, 277, 278, 279.ld retry : MMIX-PIPE 272, 273, 274.ld st launch : MMIX-PIPE 265, 266, 354.LDA : MMIX 7, MMIXAL 13, 18, 63.LDB : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDBI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDBU : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDBUI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDHT : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDHTI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDO : MMIX 7, MMIX-PIPE 47, MMIX-SIM 54,

94, MMIXAL 63.LDOI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDOU : MMIX 7, MMIX-PIPE 47, 114, 332,

MMIX-SIM 54, 94, MMIXAL 63.LDOUI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDPTE : MMIX-PIPE 235, 236, 279.ldpte : MMIX-PIPE 49, 235, 236, 265.LDPTP : MMIX-PIPE 235, 236, 279.ldptp : MMIX-PIPE 49, 235, 236, 265.LDSF : MMIX 26, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDSFI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDT : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDTI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDTU : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDTUI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDUNC : MMIX 30, MMIX-PIPE 47, MMIX-SIM 54,

94, MMIXAL 63.ldunc : MMIX-PIPE 49, 51, 117, 265, 268,

271, 273, 357.LDUNCI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDVTS : MMIX 46, MMIX-PIPE 47, MMIX-SIM 54,107, MMIXAL 63.

ldvts : MMIX-PIPE 49, 51, 118, 265, 352.LDVTSI : MMIX-PIPE 47, MMIX-SIM 54, 107.LDW : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDWI : MMIX-PIPE 47, MMIX-SIM 54, 94.LDWU : MMIX 7, MMIX-PIPE 47, 279, MMIX-

SIM 54, 94, MMIXAL 63.LDWUI : MMIX-PIPE 47, MMIX-SIM 54, 94.left : MMIX-SIM 16, 21, 22, 50, 162, 165,

MMIXAL 54, 57, 72, 73, 74.left paren : MMIX-SIM 138, 139.Leung, Shun-Tak Albert: MMIX 40.lg : MMIX-CONFIG 30, 31.lhs : MMIX-SIM 80, 101, 107, 131, 133, 138, 139.lim : MMIX-PIPE 185.line directives: MMIXAL 3.line count : MMIX-SIM 38, 42, 45.line listed : MMIXAL 34, 36, 41, 45, 136.line no : MMIX-SIM 16, 30, 51, 63, MMIXAL 34,

36, 38, 45, 50.line shown : MMIX-SIM 45, 48, 51.link : MMIXAL 58, 59, 64, 66, 70, 74, 75, 78,

82, 87, 91, 94, 99, 100, 109, 110, 112, 118,125, 130, 132, 145.

Liptay, John S.: MMIX 30.list : MMIX-ARITH 2, MMIX-IO 2, MMIX-PIPE 6,

MMIX-SIM 11, MMIXAL 31, MMOTYPE 5.
listed file : MMOTYPE 15, 16, 17, 21.
listing : MMOTYPE 2, 4, 13, 19, 21, 23, 24.
listing bits : MMIXAL 43, 44, 47, 52, 136.
listing clear : MMIXAL 44, 47, 52, 136.
listing file : MMIXAL 41, 42, 44, 45, 47, 52, 75,

78, 80, 109, 115, 132, 134, 136, 138, 139.listing loc : MMIXAL 42, 43, 44.listing name : MMIXAL 137, 138, 139.literate programming: MMIXAL 3.little-endian versus big-endian: MMIX 6, 12,

MMIX-IO 16, MMIX-PIPE 304, MMIXAL 47.ll : MMIX-SIM 30, 37, 62, 63, 82, 83, 94, 95,

96, 103, 105, 111, 114, 117, 118, 119, 130,157, 159, 161, 163, 164.

lo : MMIX-SIM 15.load cache : MMIX-PIPE 200, 201, 222, 224, 237.load sf : MMIX-ARITH 39, MMIX-PIPE 21, 279,

MMIX-SIM 13, 94.LOC : MMIXAL 63, MMIXAL 62, 63, 109, 132.loc : MMIX-IO 23, MMIX-PIPE 44, 46, 68, 73, 80,

81, 84, 85, 100, 118, 119, 122, 144, 149, 151,152, 160, 236, 266, 271, 296, 304, 320, 322,323, 331, 355, 364, 368, 372, MMIX-SIM 16,18, 20, 21, 22, 30, 51, 60, 61, 63, 70, 101,109, 130, 162, 163, 165, MMMIX 12, 22.

loc implied : MMIX-SIM 51.LOCAL : MMIXAL 63, MMIXAL 62, 63, 132.local registers: MMIX 29.localtime : MMOTYPE 23.


lock : MMIX-PIPE 167, 174, 200, 217, 222, 224,225, 226, 233, 234, 237, 257, 261, 266, 267,271, 272, 273, 274, 276, 288, 291, 296, 300,326, 353, 354, 358, 359, 360, 365, 366, 367.

lockloc : MMIX-PIPE 23, 37, 125, 145, 234, 257,279, 287, 301, 360, 361, 364.

lockvar: MMIX-PIPE 37, 65, 167, 214, 230, 247.long warning given : MMIXAL 35, 36.loop : MMIX-SIM 29, 36, 42, MMOTYPE 13, 21.lop end : MMIX-SIM 23, MMIXAL 23, 24, 80,

MMOTYPE 6, 22, 30.
lop file : MMIX-SIM 23, 35, MMIXAL 23, 24, 50, MMOTYPE 6, 20.
lop fixo : MMIX-SIM 23, 34, MMIXAL 23, 24, 113, MMOTYPE 6, 19.
lop fixr : MMIX-SIM 23, 34, MMIXAL 23, 24, 114, MMOTYPE 6, 19.
lop fixrx : MMIX-SIM 23, 34, MMIXAL 23, 24,

114, MMOTYPE 6, 19.lop line : MMIX-SIM 23, 35, MMIXAL 23, 24,

50, MMOTYPE 6, 20.lop loc : MMIX-SIM 23, 33, MMIXAL 23, 24,

49, MMOTYPE 6, 18.lop post : MMIX-SIM 23, 25, 29, MMIXAL 23,

24, 144, MMOTYPE 6, 22.lop pre : MMIX-SIM 23, 28, MMIXAL 23, 24,

141, MMOTYPE 6, 22, 23.lop quote : MMIX-SIM 23, 29, 33, 36,

MMIXAL 23, 24, 47, MMOTYPE 6, 13, 18, 21.lop quote command : MMIXAL 47.lop skip : MMIX-SIM 23, 33, MMIXAL 23, 24,

49, MMOTYPE 6, 18.lop spec : MMIX-SIM 23, 36, MMIXAL 23, 24,

132, MMOTYPE 6, 21.lop stab : MMIX-SIM 23, MMIXAL 23, 24, 80,

MMOTYPE 6, 22, 25.lopcodes: MMIXAL 22.lreg : MMIXAL 132, 142, 143.lring mask : MMIX-PIPE 88, 89, 104, 105, 106,

110, 112, 113, 114, 117, 119, 120, 337, 338,MMIX-SIM 72, 73, 74, 76, 77, 80, 81, 82,83, 101, 102, 104, 157, 159.

lring size : MMIX-CONFIG 15, 37, MMIX-PIPE 86, 88, 89, MMIX-SIM 72, 76, 77,143, MMMIX 18.

lru : MMIX-CONFIG 22, MMIX-PIPE 164, 186,187, 189, 191.

μ: MMIX 50, MMIX-SIM 1.
m: MMIX-ARITH 27, MMIX-PIPE 12, 187, 189,

191, 268, 270, 271, 278, 381, 384, MMIX-SIM 114, 117, MMIXAL 74, MMMIX 16,MMOTYPE 26.

ma : MMIX-PIPE 372, 380, MMIX-SIM 61, 108,111, 133, 136.

magic done : MMIX-PIPE 372.
magic offset : MMIX-ARITH 63.
magic read : MMIX-PIPE 377, 378, 380, 381, 385.
magic write : MMIX-PIPE 377, 379, 385, 386.
Main : MMIX-SIM 6, MMIXAL 21, 71.
main : MMIX-SIM 141, MMIXAL 136, MMMIX 2, MMOTYPE 1.
make it infinite : MMIX-ARITH 72, 79.
make it zero : MMIX-ARITH 79.
make map : MMIX-SIM 42, 49.
many arg bit : MMIXAL 62, 116.
map : MMIX-SIM 38, 42, 45, 49.
marginal registers: MMIX 29.
mask : MMIX-ARITH 13, 18, MMIX-PIPE 282.
matrices of bits: MMIX 12.
max : MMIX-PIPE 268, 292.
max cycs : MMIX-CONFIG 15, 23, 24, 36.
max mem slots : MMIX-CONFIG 15, MMIX-PIPE 86, 89.
max pipe op : MMIX-CONFIG 27, MMIX-PIPE 49,

133, 136.max real command : MMIX-CONFIG 27, 28,

MMIX-PIPE 49, 81.max rename regs : MMIX-CONFIG 15, MMIX-

PIPE 86, 89.max stage : MMIX-CONFIG 36, MMIX-PIPE 26,

129.max sys call : MMIX-PIPE 371, 372, MMIX-

SIM 59, 108.maxval : MMIX-CONFIG 12, 13, 20, 23.mb : MMIX-PIPE 372, 380, MMIX-SIM 61, 108,

111, 133, 136.McClellan, Hubert Rae, Jr.: MMIX 42.mem : MMIX-PIPE 113, 114, 115, 116, 117,

227, 236, 246, 249, 254, 255, 265, 333,334, 339, 355.

mem addr time : MMIX-CONFIG 15, 36, MMIX-PIPE 214, 216, 219, 225, 260, 261, 271,274, 277, 297, 300.

mem bit : MMIXAL 62, 124.mem bus bytes : MMIX-CONFIG 15, 36.mem chunks : MMIX-CONFIG 37, MMIX-

PIPE 207, 213.mem chunks max : MMIX-CONFIG 15, 37,

MMIX-PIPE 206, 207, 213.
mem direct : MMIX-PIPE 257.
mem find : MMIX-SIM 20, 30, 37, 63, 82, 83,

94, 95, 96, 103, 105, 111, 114, 117, 130,157, 159, 161, 163, 164.

mem hash : MMIX-CONFIG 37, MMIX-PIPE 207,209, 210, 213, 216, 219, 223, 297,MMMIX 8, 11.

mem lock : MMIX-PIPE 39, 214, 215, 219, 222,225, 260, 261, 271, 274, 277, 297, 300.

mem locker : MMIX-PIPE 127, 128, 219, 260,271, 277, 297.

mem node: MMIX-SIM 16, 17, 19, 20, 21,22, 50, 162, 165.

mem node struct: MMIX-SIM 16.mem read : MMIX-PIPE 208, 209, 210, 219, 222,

271, 277, 297, 378, MMMIX 12, 18.mem read time : MMIX-CONFIG 15, 36, MMIX-

PIPE 214, 219, 222, 223, 271, 277, 297.


mem root : MMIX-SIM 18, 19, 21, 53, 161, 164.mem slots : MMIX-PIPE 63, 86, 89, 111, 145,

147, 256.mem tetra: MMIX-SIM 16, 20, 62, 82, 83,

114, 117.mem write : MMIX-PIPE 208, 212, 213, 216,

260, 379, MMMIX 8, 11, 12.mem write time : MMIX-CONFIG 15, 36, MMIX-

PIPE 214, 216, 260.mem x : MMIX-PIPE 44, 46, 100, 111, 113, 117,

123, 144, 145, 146, 147, 255, 327, 339, 355.memory-mapped input/output: MMIX 44,

MMIX-MEM 1.
mems: MMIX 50, MMIX-SIM 1.
mems : MMIX-SIM 64, 127.
message : MMIXAL 45.
Metze, Gernot: MMIX 40.
mid : MMIXAL 54, 57, 61, 72, 73, 74, 75, 80.
Miller, Jeffrey Charles Percy: MMIX-ARITH 26.
minus : MMIXAL 82, 97, 101.
minus zero: MMIX 21, 22, 23.
minval : MMIX-CONFIG 12, 13, 20, 23.
missing left parenthesis : MMIXAL 98.
missing right parenthesis : MMIXAL 98.
mm : MMIX-SIM 23, 28, 29, 36, MMIXAL 22, 47, 48, MMOTYPE 6, 13, 21, 23, 25, 30.
mmgetchars : MMIX-IO 4, 8, 18, 19, 20, MMIX-PIPE 377, 381, MMIX-SIM 114.
MMIX binary file... : MMMIX 12.
mmix> : MMIX-SIM 3, 150.
MMIX config : MMIX-CONFIG 8, 38, MMIX-PIPE 1, 9, 23, 29, 49, 59, 136, 207, 259, MMMIX 2.

mmix fake stdin : MMIX-IO 10, MMIX-SIM 113,145.

mmix fclose : MMIX-IO 11, MMIX-PIPE 372,376, MMIX-SIM 108, 113.

mmix fgets : MMIX-IO 14, MMIX-PIPE 372, 376,MMIX-SIM 108, 113.

mmix fgetws : MMIX-IO 16, MMIX-PIPE 372,376, MMIX-SIM 108, 113.

mmix fopen : MMIX-IO 8, MMIX-PIPE 372, 376,MMIX-SIM 108, 113.

mmix fputs : MMIX-IO 19, MMIX-PIPE 372, 376,MMIX-SIM 108, 113.

mmix fputws : MMIX-IO 20, MMIX-PIPE 372,376, MMIX-SIM 108, 113.

mmix fread : MMIX-IO 12, MMIX-PIPE 372,376, MMIX-SIM 108, 113.

mmix fseek : MMIX-IO 21, MMIX-PIPE 372, 376,MMIX-SIM 108, 113.

mmix ftell : MMIX-IO 22, MMIX-PIPE 372, 376,MMIX-SIM 108, 113.

mmix fwrite : MMIX-IO 18, MMIX-PIPE 372,376, MMIX-SIM 108, 113.

MMIX init : MMIX-PIPE 1, 9, 10, MMMIX 2.mmix io init : MMIX-IO 7, MMIX-SIM 113,

141, MMMIX 2, 25.

mmix opcode: MMIX-PIPE 47, MMIX-SIM 54, 62, 91.

MMIX run : MMIX-PIPE 1, 9, 10, MMMIX 15.
mmmix> : MMMIX 13.
mmo buf : MMIXAL 47, 48, 50.
mmo byte : MMIXAL 48, 74, 75, 80.
mmo clear : MMIXAL 47, 49, 52.
mmo cur file : MMIXAL 50, 51, 141.
mmo cur loc : MMIXAL 47, 49, 51, 53.
mmo err : MMIX-SIM 26, 28, 29, 33, 34, 35.
mmo file : MMIX-SIM 24, 25, 26, 32, MMOTYPE 3, 4, 9, 30.
mmo file name : MMIX-SIM 24, 142.
mmo line no : MMIXAL 47, 50, 51.
mmo load : MMIX-SIM 30, 34.
mmo loc : MMIXAL 49, 53, 112, 132.
mmo lop : MMIXAL 48, 49, 50, 80, 113, 114,

141, 144.mmo lopp : MMIXAL 48, 49, 50, 80, 114, 132.mmo out : MMIXAL 47, 48, 50.mmo ptr : MMIXAL 47, 48, 80.mmo sync : MMIXAL 50, 52, 132.mmo tetra : MMIXAL 48, 49, 113, 114, 141, 144.mmo write : MMIXAL 47.mmputchars : MMIX-IO 4, 12, 14, 16, MMIX-

PIPE 377, 384, MMIX-SIM 117, 163.mod : MMIXAL 82, 97, 101.mode : MMIX-CONFIG 16, 23, 31, MMIX-IO 5,

7, 8, 11, 12, 14, 16, 18, 19, 20, 21, 22,23, MMIX-PIPE 21, 167, 217, 257, 263,MMIX-SIM 4, 13.

mode code : MMIX-IO 8, 9.mode string : MMIX-IO 8, 9, MMIX-SIM 4.MOR : MMIX 12, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.mor : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 344.More...chunks are needed : MMIX-PIPE 213.MORI : MMIX-PIPE 47, MMIX-SIM 54, 87.MSE: MMIX-SIM 4.MUL : MMIX 20, 50, MMIX-PIPE 47, MMIX-

SIM 54, 88, MMIXAL 63.mul : MMIX-CONFIG 27, 28, MMIX-PIPE 49,

51, 343.MULI : MMIX-PIPE 47, MMIX-SIM 54, 88.multiprecision conversion: MMIX-ARITH 54, 68.multiprecision division: MMIX-ARITH 13.multiprecision multiplication: MMIX-ARITH 8.MULU : MMIX 20, MMIX-PIPE 47, MMIX-SIM 54,

88, MMIXAL 63.mulu : MMIX-PIPE 49, 51, 121, 343.MULUI : MMIX-PIPE 47, MMIX-SIM 54, 88.mul0 : MMIX-CONFIG 15, 27, MMIX-PIPE 49,

343.mul1 : MMIX-CONFIG 15, MMIX-PIPE 49, 343.mul2 : MMIX-CONFIG 15, MMIX-PIPE 49.mul3 : MMIX-CONFIG 15, MMIX-PIPE 49.mul4 : MMIX-CONFIG 15, MMIX-PIPE 49.mul5 : MMIX-CONFIG 15, MMIX-PIPE 49.


mul6 : MMIX-CONFIG 15, MMIX-PIPE 49.mul7 : MMIX-CONFIG 15, MMIX-PIPE 49.mul8 : MMIX-CONFIG 15, 27, MMIX-PIPE 49,

343.MUX : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.mux : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 142.MUXI : MMIX-PIPE 47, MMIX-SIM 54, 87.MXOR : MMIX 12, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.MXORI : MMIX-PIPE 47, MMIX-SIM 54, 87.my div : MMIX-PIPE 7.my fsqrt : MMIX-PIPE 7.my random : MMIX-PIPE 7.myself : MMIX-SIM 142, 143, 144.n: MMIX-ARITH 13, MMIX-CONFIG 23, 30,

38, MMIX-IO 12, 14, 16, 18, 19, 20, 23,MMIX-SIM 148, MMMIX 16.

N_BIT : MMIX-PIPE 54, 271.name : MMIX-CONFIG 12, 13, 14, 16, 18, 20,

23, 24, 25, 26, 29, 31, 32, 33, 34, 35, 36,MMIX-IO 8, MMIX-PIPE 23, 25, 39, 76,128, 167, 174, 176, 231, 236, 249, 286,MMIX-SIM 4, 35, 38, 44, 49, 51, 64, 130,MMIXAL 62, 64, 68, 70, MMMIX 24.

name buf : MMIX-IO 8.NaN: MMIX 21.NaN : MMIX-ARITH 68, 70, 73, 84.nan : MMIX-ARITH 36, 37, 38, 39, 40, 42,

50, 85, 86, 88, 91.NAND : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.nand : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 138.NANDI : MMIX-PIPE 47, MMIX-SIM 54, 86.need b : MMIX-PIPE 44, 46, 100, 106, 108, 112,

113, 114, 131, 312, 345.need ra : MMIX-PIPE 44, 46, 100, 108, 112,

113, 131, 324.NEG : MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.neg one : MMIX-ARITH 4, 24, MMIX-IO 4, 8,

11, 12, 14, 15, 16, 17, 19, 20, 21, 22,MMIX-PIPE 20, 22, 143, 236, 282, 372,MMIX-SIM 13, 14, 53, 77, 90, MMIXAL 27,29, MMMIX 12, 23, 25.

negate : MMIXAL 82, 86, 100.negate q : MMIX-ARITH 24.negation, floating point: MMIX 13.negative locations: MMIX 35, 40, 44.NEGI : MMIX-PIPE 47, MMIX-SIM 54, 85,

MMMIX 12.NEGU : MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.NEGUI : MMIX-PIPE 47, MMIX-SIM 54, 85.new cache : MMIX-CONFIG 16, 17, 21.new chunk : MMMIX 5, 6, 7, 8, 9, 11.new cool : MMIX-PIPE 75, 78, 101.

new fetch : MMIX-PIPE 288, 298, 301, 302.new head : MMIX-PIPE 74, 75, 81, 85, 120.new L: MMIX-PIPE 120.new link : MMIXAL 109, 110, 115.new mem : MMIX-SIM 17, 18, 21.new mode : MMIX-SIM 152.new O : MMIX-PIPE 75, 99, 100, 119, 120,

333, 334, 338, 339.new Q : MMIX-PIPE 146, 148, 149, 310,

314, 329.new S : MMIX-PIPE 75, 99, 100, 113, 114,

333, 334, 339.new sym node : MMIXAL 59, 64, 66, 70, 71,

87, 111, 118, 125, 130.new tail : MMIX-PIPE 301, MMMIX 22.new trie node : MMIXAL 55, 57, 61.next : MMIX-PIPE 23, 26, 28, 32, 33, 35, 82,

125, 134, 145, 176, 183, 196, 202, 205,217, 218, 221, 225, 233, 234, 259, 261,263, 266, 272, 274, 276, 298, 300, 326,350, 361, 363, 364, 368.

next char : MMIX-ARITH 68, 69, 71, 72, 73, 77,MMIX-SIM 13, 152, 153, 154, 155, 161.

next sym node : MMIXAL 59, 60.next sync : MMIX-PIPE 364.next trie node : MMIXAL 55, 56.next val : MMIXAL 83, 99, 101.NNIX operating system: MMIX 2.no base address... : MMIXAL 127.No file was selected... : MMOTYPE 20.No name given... : MMOTYPE 20.no opcode... : MMIXAL 104.No room... : MMIX-SIM 35, 42, 77, MMIXAL 32,

84, MMOTYPE 20.no-op: MMIX 49.no const found : MMIX-ARITH 68.no hardware PT : MMIX-CONFIG 37, MMIX-

PIPE 242, 272, 298.no label bit : MMIXAL 62, 102.NONEXISTENT_MEMORY : MMIX-PIPE 57.Nonzero byte follows... : MMOTYPE 30.noop : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

80, 118, 122, 322, 323, 327, 332, 337.noop inst : MMIX-PIPE 118, 227.NOR : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.nor : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.NORI : MMIX-PIPE 47, MMIX-SIM 54, 86.normal numbers: MMIX 21.not a valid prefix : MMIXAL 132.note usage : MMIX-PIPE 188, 189, 190, 196.noted : MMIX-PIPE 68, 73, 75, 85, 304, 323,

MMMIX 22.null string... : MMIXAL 93.nullifying : MMIX-PIPE 75, 85, 146, 147,

310, 315, 316.num : MMIX-ARITH 36, 37, 38, 39, 40, 41, 42,

44, 46, 50, 85, 86, 88, 91, 93.

NXOR : MMIX-PIPE 47, MMIX-SIM 54, 86,MMIXAL 63.

nxor : MMIX-CONFIG 28, MMIX-PIPE 49,51, 138.

NXORI : MMIX-PIPE 47, MMIX-SIM 54, 86.nybble: MMIX 6, 11.O: MMIX-SIM 75.o: MMIX-ARITH 29, 31, 34, 47, 50, 88, MMIX-

IO 12, 14, 16, 19, 20, 22, MMIX-PIPE 19,40, 157, 246, MMIX-SIM 12, 15, 91, 137,140, 154, 160, MMIXAL 42, 49, 114, 127,MMOTYPE 8.

O_BIT : MMIX-ARITH 31, 33, 35, MMIX-PIPE 54,MMIX-SIM 57, MMIXAL 69.

O_Handler : MMIXAL 69.oand : MMIX-ARITH 25, MMIX-PIPE 21, 241,

MMIX-SIM 13, MMIXAL 28, 107.oandn : MMIX-ARITH 25, MMIX-PIPE 21, 146,

240, 279, 325.obj file : MMIXAL 47, 138, 139.obj file name : MMIXAL 47, 137, 138, 139.obj time : MMIX-SIM 28, 31, 44.object files: MMIXAL 22.OCTA : MMIXAL 62, 63, 117, 118, MMIXAL 63.octa: MMIX-ARITH 3, 4, 5, 6, 7, 8, 9, 12, 13,

24, 25, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46,47, 50, 54, 56, 69, 85, 86, 87, 88, 89, 91, 93,MMIX-IO 3, 4, 8, 11, 12, 14, 16, 18, 19, 20,21, 22, 23, MMIX-SIM 10, 12, 13, 15, 16, 20,31, 50, 52, 61, 76, 77, 91, 113, 114, 117, 137,140, 151, 154, 160, 162, 165, MMIXAL 26,27, 28, 42, 43, 49, 51, 58, 82, 83, 114, 126,127, 131, 133, MMOTYPE 7, 8, 16.

octabyte: MMIX 6.odd : MMIX-ARITH 93, 94, 95.ODIF : MMIX 11, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.odif : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.ODIFI : MMIX-PIPE 47, MMIX-SIM 54, 87.odiv : MMIX-ARITH 13, 24, 45, MMIX-PIPE 21,

343, MMIX-SIM 13, 88, MMIXAL 28, 101.off : MMIX-PIPE 185, 210, 213, 216, 219,

223, 226.offset : MMIX-IO 21, MMIX-SIM 4, 20, 154.old hot : MMIX-PIPE 60, 64, 276, 283, 310, 322,

328, 329, 342, 351, 353, 356, 364.old L: MMIX-SIM 60, 61, 98, 132.old tail : MMIX-PIPE 64, 69, 70, 74, 75, 85,

160, 308, 309.ominus : MMIX-ARITH 5, 12, 24, 47, 53, 73, 88,

89, 92, 94, 95, MMIX-IO 4, 12, 18, MMIX-PIPE 21, 139, 140, 344, MMIX-SIM 13, 85, 87,MMIXAL 28, 49, 100, 101, 114, 126, 127, 131.

omult : MMIX-ARITH 8, 12, 43, MMIX-PIPE 21,343, MMIX-SIM 13, 88, MMIXAL 28, 101.

one arg bit : MMIXAL 62, 116.oo : MMIX-ARITH 31, 34, 47, 49, 50, 53, 87.oops: MMIX 50, MMIX-SIM 1.oops : MMIX-SIM 64, 127, MMMIX 10.

Oops...too long : MMOTYPE 26.OP : MMIX-CONFIG 15, 17, 24.op : MMIX-PIPE 44, 46, 75, 80, 81, 82, 84, 85,

100, 102, 103, 108, 109, 112, 113, 114, 117,124, 139, 151, 152, 155, 156, 157, 236, 279,281, 282, 312, 320, 321, 327, 332, 339, 344,345, 346, 348, MMIX-SIM 60, 62, 65, 70,71, 78, 79, 85, 87, 89, 91, 92, 93, 94, 95,123, 126, 127, 130, 131.

OP codes: MMIX 5.OP codes, table: MMIX 51.op bits : MMIXAL 102, 104, 105, 107, 116, 121,

122, 123, 124, 129.op field : MMIXAL 32, 33, 102, 104, 116, 121,

122, 123, 124, 129.op info: MMIX-SIM 64, 65.op init size : MMIXAL 63, 64.op init table : MMIXAL 63, 64.op ptr : MMIXAL 83, 85, 86, 98, 101.op root : MMIXAL 56, 61, 64, 80, 104.OP size : MMIX-CONFIG 15, 17, 24.op spec: MMIX-CONFIG 14, 15, 17,

MMIXAL 62, 63, 64.op stack : MMIXAL 81, 82, 83, 84, 85, 86,

98, 101.opcode : MMIXAL 102, 104, 105, 109, 117, 118,

119, 121, 124, 126, 127, 128, 129, 131, 132.opcode syntax error... : MMIXAL 104.opcode...operand(s) : MMIXAL 116.opcode name : MMIX-PIPE 48, 73.open : MMIX-ARITH 65.operand of `BSPEC'... : MMIXAL 132.operand...register number : MMIXAL 129.operand list : MMIXAL 32, 33, 85, 86, 106.operands done : MMIXAL 85, 98.operating system: MMIX 2, 29, 30, 33, 35, 37,

38, 43, 44, 47, MMIX-PIPE 243.oplus : MMIX-ARITH 5, 47, 53, 73, MMIX-

IO 4, MMIX-PIPE 21, 139, 140, 241, 265,331, MMIX-SIM 13, 60, 84, 85, 101, 154,MMIXAL 28, 94, 99.

ops : MMIX-CONFIG 18, 25, 29, MMIX-PIPE 76,79, 82.

OR : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,86, MMIXAL 63.

or : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138,MMIXAL 82, 97, 101.

ORH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,86, MMIXAL 128, MMIXAL 63.

ORI : MMIX-PIPE 47, MMIX-SIM 54, 86, 126, 131.origin : MMIX-ARITH 63, 64, 65.ORL : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 128, MMIXAL 63.ORMH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.ORML : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.ORN : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.

orn : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.ORNI : MMIX-PIPE 47, MMIX-SIM 54, 86.out stab : MMIXAL 74, 75, 80.outbuf : MMIX-CONFIG 31, MMIX-PIPE 167,

176, 202, 203, 215, 216, 217, 218, 219,221, 259, 379.

outer lp : MMIXAL 82, 85, 98.outer rp : MMIXAL 82, 97, 98.over : MMIXAL 82, 97, 101.overflow: MMIX 20, 21, 22, 32.overflow : MMIX 8, 9.overflow : MMIX-ARITH 4, 9, 12, 24, MMIX-PIPE 20, 21, 343, MMIX-SIM 13, 88, MMIXAL 27.

owner : MMIX-PIPE 44, 46, 63, 67, 73, 81, 124,134, 144, 145, 244, 314, 357.

oxor : MMIX-ARITH 25.o0 : MMIX-ARITH 34.p: MMIX-ARITH 60, 61, 62, 66, 70, 82, 89,

MMIX-CONFIG 10, 36, MMIX-IO 13, 14, 16,MMIX-PIPE 26, 28, 33, 35, 40, 63, 73, 120,170, 172, 179, 185, 187, 189, 191, 193, 196,199, 201, 203, 205, 251, 255, 256, 258, 378,379, 381, 384, 387, MMIX-SIM 17, 20, 42, 50,62, 114, 117, 120, 154, 162, 165, MMIXAL 40,50, 57, 59, MMMIX 17, MMOTYPE 1.

P_BIT : MMIX-PIPE 54, 81, 149, 160, 322, 331.pack bytes : MMIX-PIPE 320, 335, 341.packit : MMIX-ARITH 71, 78, 79.page coloring: MMIX-PIPE 268, 292.page fault: MMIX 37.page table entry: MMIX 45.page table pointer: MMIX 45.page b : MMIX-PIPE 238, 239, 243, 244,

MMMIX 12, 25.page bad : MMIX-PIPE 238, 239, 266, 288,

MMMIX 12, 23, 25.page mask : MMIX-PIPE 238, 239, 240, 241,

279, 325, MMMIX 12, 23, 25.page n : MMIX-PIPE 238, 239, 240, 279.page r : MMIX-PIPE 238, 239, 244, MMMIX 12,

25.page s : MMIX-PIPE 238, 239, 243, 268, 292,

MMMIX 12, 25.panic : MMIX-CONFIG 8, 10, 16, 18, 19, 20,

23, 24, 25, 29, 31, 32, 33, 34, 35, 36, 37,38, MMIX-PIPE 13, 22, 28, 135, 185, 187,213, MMIX-SIM 14, 17, 42, 77, MMIXAL 29,32, 38, 45, 50, 55, 59, 84.

PARITY_ERROR : MMIX-PIPE 57.pass after : MMIX-PIPE 125, 134, 266, 268,

270, 271, 288, 350, 353.pass data : MMIX-PIPE 134, 135.passit : MMIX-PIPE 134, 266, 268, 270, 271,

288, 350, 353, MMIX-SIM 161.Patterson, David Andrew: MMIX 1, MMIX-

PIPE 58, 150, 163.PBEV : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.

PBEVB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBN : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBNB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBNN : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBNNB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBNP : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBNPB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBNZ : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBNZB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBOD : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBODB : MMIX-PIPE 47, MMIX-SIM 54, 93.PBP : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBPB : MMIX-PIPE 47, MMIX-SIM 54, 93.pbr : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

81, 85, 106, 152, 155.PBZ : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54,

93, MMIXAL 63.PBZB : MMIX-PIPE 47, MMIX-SIM 54, 93.pcs : MMIX-CONFIG 21, 23.peek hist : MMIX-PIPE 68, 74, 75, 85, 99,

100, 151, 152.peekahead : MMIX-CONFIG 15, MMIX-PIPE 59,

74.performance monitoring: MMIX 40.permission bits: MMIX 37, 46.phys addr : MMIX-PIPE 240, 241, 268, 270,

272, 292, 295, 298.physical addresses: MMIX 44, 45, 47.pipe bit : MMIX-PIPE 8, 10.pipe limit : MMIX-CONFIG 24, MMIX-PIPE 136.pipe seq : MMIX-CONFIG 17, 24, 27, MMIX-

PIPE 133, 134, 136, 141.pixels: MMIX 11.plus : MMIXAL 82, 97, 99.policy : MMIX-PIPE 186, 187, 189, 191.Pool_Segment : MMIX-SIM 3, 6, 37, 163,

MMIXAL 69.POP : MMIX 29, MMIX-PIPE 47, MMIX-SIM 54,

101, MMIXAL 63.pop : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

85, 120, 331.pop unsave : MMIX-PIPE 120, 332.population counting: MMIX 12.ports : MMIX-CONFIG 16, 23, 34, MMIX-

PIPE 128, 167, 183.postamble : MMIX-SIM 25, 29, 32, MMOTYPE 1,

22.power-saver mode: MMIX 31.POWER_FAILURE : MMIX-PIPE 57.power of two : MMIX-CONFIG 12, 13, 20, 23.pp : MMIX-ARITH 61, 80, 81, MMIX-PIPE 184,

185, MMIXAL 59, 64, 65, 66, 70, 74, 78, 87,104, 109, 110, 111, 112, 118, 125, 130.

ppol : MMIX-CONFIG 22, 23.PR_BIT : MMIX-PIPE 54, 266, 269.prec: MMIXAL 82, 83.precedence : MMIXAL 83, 85.predef size : MMIXAL 69, 70.predef spec: MMIXAL 68, 69, 70.PREDEFINED : MMIXAL 58, 64, 66, 70, 87, 109.predefined symbols: MMIXAL 10, 67, 69.predefs : MMIXAL 69, 70.predicted : MMIX-PIPE 85, 151.PREFIX : MMIXAL 62, 63, 129, 132, MMIXAL 63.PREGO : MMIX 30, MMIX-PIPE 47, 235, MMIX-SIM 54, 106, MMIXAL 63.prego : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 81,

227, 265, 288, 289, 294, 296, 298, 300, 301.PREGOI : MMIX-PIPE 47, MMIX-SIM 54, 106.PRELD : MMIX 30, MMIX-PIPE 47, MMIX-SIM 54,

106, MMIXAL 63.preld : MMIX-PIPE 49, 51, 81, 227, 265, 266,

269, 271, 272, 273, 274.PRELDI : MMIX-PIPE 47, MMIX-SIM 54, 106.Premature end of file... : MMMIX 10.PREST : MMIX 30, 34, MMIX-PIPE 47, 275,

MMIX-SIM 54, 106, MMIXAL 63.prest : MMIX-PIPE 49, 51, 81, 227, 265, 269,

271, 272, 273, 274, 275.prest span : MMIX-PIPE 275, 276.prest win : MMIX-PIPE 267, 276.PRESTI : MMIX-PIPE 47, MMIX-SIM 54, 106.print bits : MMIX-PIPE 46, 55, 56, 73.print cache : MMIX-PIPE 175, 176, MMMIX 21.print cache block : MMIX-PIPE 171, 172, 177.print cache locks : MMIX-PIPE 39, 173, 174.print control block : MMIX-PIPE 45, 46, 63,

81, 125, 145, 146, 147.print coroutine id : MMIX-PIPE 24, 25, 28, 33,

63, 73, 81, 125, 145.print fetch buffer : MMIX-PIPE 72, 73, 253.print float : MMIX-ARITH 54, 59, MMIX-SIM 13,

137, 159.print freqs : MMIX-SIM 50, 53.print hex : MMIX-SIM 12, 137, 138, 159.print int : MMIX-SIM 15, 137, 159.print line : MMIX-SIM 45, 47.print locks : MMIX-PIPE 10, 38, 39, MMMIX 21.print octa : MMIX-PIPE 18, 19, 43, 46, 73,

91, 149, 152, 160, 176, 251, 283, 310,314, 319, 320, 321.

print pipe : MMIX-PIPE 10, 252, 253, MMMIX 21.print reorder buffer : MMIX-PIPE 62, 63, 253.print spec : MMIX-PIPE 42, 43, 46.print specnode : MMIX-PIPE 43, 46.print specnode id : MMIX-PIPE 43, 73, 90, 91.print stab : MMOTYPE 25, 26.print stats : MMIX-PIPE 161, 162, MMMIX 2, 21.print string : MMIX-SIM 159, 160.print trip warning : MMIX-IO 23, MMIX-PIPE 373, 376, MMIX-SIM 109, 113.print write buffer : MMIX-PIPE 250, 251, 253.

printf : MMIX-ARITH 54, 55, 57, 67, MMIX-MEM 2, 3, MMIX-PIPE 10, 19, 25, 28, 33,39, 43, 46, 56, 63, 73, 81, 91, 125, 145,146, 147, 149, 152, 160, 162, 172, 174, 176,177, 251, 283, 310, 314, 319, 320, 321, 387,MMIX-SIM 12, 15, 45, 47, 49, 51, 53, 82,83, 103, 105, 120, 128, 130, 131, 132, 133,137, 138, 140, 143, 149, 150, 159, 160, 162,MMMIX 2, 13, 14, 15, 18, 19, 22, 23, 24,MMOTYPE 9, 15, 19, 21, 23, 24, 25, 28, 30.

priority : MMIX-SIM 17, 19, 21.privileged instructions: MMIX 37.privileged operations: MMIX 31, 33, 37, 43, 44.privileged inst : MMIX-PIPE 118, 355, MMIX-SIM 60, 94, 95, 97, 107, 108, 109.profile gap : MMIX-SIM 48, 53, 143.profile showing source : MMIX-SIM 48, 53, 143.profile started : MMIX-SIM 51, 52.profiling : MMIX-SIM 141, 143, 144.prog file : MMMIX 4, 5, 6, 9, 10.prog file name : MMMIX 2, 3, 4, 6, 9, 10, 11.program counter: MMIX-PIPE 284.PROT_OFFSET : MMIX-PIPE 54, 269, 272,

293, 298.protection bits: MMIX 37, 46.protection fault: MMIX 45.prototypes for functions: MMIX-ARITH 2,

MMIX-PIPE 6.prts : MMIX-CONFIG 13, 15, 23.prune : MMIXAL 73, 80.PRW_BITS : MMIX-PIPE 266, 269, 272.pseudo lru : MMIX-CONFIG 22, MMIX-PIPE 164,

186, 187, 189, 191.pseudo op: MMIXAL 62.pst : MMIX-PIPE 49, 51, 117, 254, 265, 266,

280, 321, 357.PTE: MMIX 45, 47.PTP: MMIX 45, 47.ptr a : MMIX-CONFIG 16, MMIX-PIPE 44, 114,

117, 215, 217, 222, 224, 227, 236, 237, 249,254, 255, 325, 326, 333, 334.

ptr b : MMIX-PIPE 44, 217, 218, 222, 224, 225,232, 233, 234, 237, 257, 261, 262, 272,274, 298, 300, 326.

ptr c : MMIX-PIPE 44, 224, 225, 236, 237.pure : MMIXAL 82, 87, 94, 99, 100, 101,

110, 124, 129.push : MMIX-SIM 101.push pop bit : MMIX-SIM 65, 132.PUSHGO : MMIX 29, MMIX-PIPE 47, MMIX-

SIM 54, 101, MMIXAL 63.pushgo : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 85, 119, 331.PUSHGOI : MMIX-PIPE 47, MMIX-SIM 54, 101.PUSHJ : MMIX 29, MMIX-PIPE 47, MMIX-SIM 54,

101, MMIXAL 63.pushj : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

85, 119, 327.PUSHJB : MMIX-PIPE 47, MMIX-SIM 54, 101.

PUT : MMIX 43, MMIX-PIPE 47, MMIX-SIM 54,97, 158, MMIXAL 63.

put : MMIX-CONFIG 28, MMIX-PIPE 49, 51,118, 146, 149, 329.

PUTI : MMIX-PIPE 47, MMIX-SIM 54, 97,MMMIX 12.

PV : MMIX-CONFIG 15, 17, 20.PV size : MMIX-CONFIG 15, 17, 20.pv spec: MMIX-CONFIG 12, 15, 17.PW_BIT : MMIX-PIPE 54, 266, 269.PX_BIT : MMIX-PIPE 54, 269, 293, 298, 301.q: MMIX-ARITH 13, 24, 60, 61, 62, 70, 82,

MMIX-CONFIG 10, MMIX-PIPE 35, 196, 205,255, 256, 258, 378, 379, MMIX-SIM 21,MMIXAL 40.

qhat : MMIX-ARITH 13, 20, 21, 22, 23.qq : MMIX-ARITH 61, 62, MMIXAL 65, 112,

113, 114, 118, 125, 130.quantify mul : MMIX-PIPE 343.queuelist : MMIX-PIPE 34, 35, 125.quiet NaN: MMIX 21.r: MMIX-ARITH 31, 34, 62, 86, 88, 89, 91,

MMIX-PIPE 35, 93, 95, 189, 191, MMIX-SIM 15, 22.

rA: MMIX 21, 22, 32, 38.rA: MMIX-PIPE 52, 107, 108, 146, 324, 329,

334, 342, MMIX-SIM 55, 72, 97, 103, 105,122, 131, 151, 158.

ra : MMIX-PIPE 44, 46, 59, 100, 108, 131, 144,307, 308, 324, 346.

radix conversion: MMIX-ARITH 54, 68,MMMIX 17.

random : MMIX-CONFIG 16, 22, MMIX-PIPE 7,164, 167, 186, 187.

rank : MMIX-PIPE 167, 172, 186, 187, 188,189, 191.

rB: MMIX 35.rB : MMIX-PIPE 52, 86, 310, 312, 319, MMIX-

SIM 55, 72, 102, 104, 123, 151.rBB: MMIX 36, 38.rBB : MMIX-PIPE 52, 312, 319, 322, 372, 380,

MMIX-SIM 55, 108, 151.rC: MMIX 40, MMIX-SIM 1.rC : MMIX-PIPE 52, 87, MMIX-SIM 55, 93,

127, 140, 151.rD: MMIX 20.rD : MMIX-PIPE 52, 107, MMIX-SIM 55, 66, 151.rE: MMIX 25.rE : MMIX-PIPE 52, 107, 108, MMIX-SIM 55,

66, 151.read bit : MMIX-SIM 58, 83, 161, 162.read byte : MMIX-SIM 27, MMOTYPE 10, 26,

27, 28, 30.read hex : MMIX-MEM 1, 2, MMMIX 15,

17, 18, 22.read tet : MMIX-SIM 26, 27, 28, 29, 33, 34,

35, 36, 37, MMOTYPE 9, 10, 13, 18, 19,20, 21, 23, 24, 25, 30.

reader : MMIX-CONFIG 34, MMIX-PIPE 128, 167,183, 233, 257, 266, 267, 271, 272, 273, 288,291, 296, 353, 354, 358, 359, 360, 365, 366.

ready : MMIX-SIM 150.REBOOT_SIGNAL : MMIX-PIPE 57.recycle fixup : MMIXAL 59, 112.redefinition... : MMIXAL 109.reg val : MMIXAL 82, 87, 98, 99, 100, 101, 109,

110, 118, 121, 122, 123, 124, 129.REGISTER : MMIXAL 58, 74, 78, 87, 109, 110.register number... : MMIXAL 98, 118.register stack: MMIX 29, 42, 43.register truth : MMIX-PIPE 155, 156, 157, 345,

MMIX-SIM 91, 92, 93.registerize : MMIXAL 82, 86, 100.rel addr bit : MMIX-PIPE 75, 83, 106, MMIX-

SIM 60, 65, MMIXAL 62, 124, 129.relative address... : MMIXAL 114, 126, 131.release lock : MMIX-PIPE 37, 222, 226, 233,

234, 272, 298, 356.ren a : MMIX-PIPE 44, 46, 100, 111, 117,

119, 121, 123, 144, 145, 146, 147, 312,322, 334, 340.

ren x : MMIX-PIPE 44, 46, 100, 110, 111, 112,114, 118, 119, 120, 123, 144, 145, 146, 147,236, 312, 322, 333, 334, 338.

rename registers: MMIX-PIPE 44, 86.rename regs : MMIX-PIPE 63, 86, 89, 111,

145, 146, 147.reorder bot : MMIX-CONFIG 37, MMIX-PIPE 60,

63, 67, 75, 145, 159, 318, 357, MMMIX 18.reorder buf size : MMIX-CONFIG 15, 37.reorder top : MMIX-CONFIG 37, MMIX-PIPE 60,

61, 63, 67, 75, 145, 159, 318, 357, MMMIX 18.repeating : MMIX-SIM 149, 152, 153, 156.repl : MMIX-CONFIG 16, 23, MMIX-PIPE 167,

196, 199, 205.replace policy: MMIX-PIPE 164, 167, 186,

187, 188, 189, 190, 191.report error : MMIXAL 45.res : MMIX-PIPE 93.resum : MMIX-PIPE 49, 67, 314, 323, 325.RESUME : MMIX 38, 49, 50, MMIX-PIPE 47, 304,

305, 323, MMIX-SIM 54, 60, 124, 125, 130,MMIXAL 63, MMMIX 12.

resume : MMIX-CONFIG 28, MMIX-PIPE 49, 51,85, 149, 322, 323, 325.

RESUME_AGAIN : MMIX-PIPE 320, 323, MMIX-SIM 71, 125, 130, 164.

resume again : MMIX-PIPE 323.RESUME_CONT : MMIX-PIPE 320, 323, 364,

MMIX-SIM 125, 126, 130.RESUME_SET : MMIX-PIPE 307, 320, 323, 324,

MMIX-SIM 122, 125, 126, 130.resume simulation : MMIX-SIM 149.RESUME_TRANS : MMIX-PIPE 242, 320, 323, 325.resume trans : MMIX-PIPE 325, 326.resuming : MMIX-PIPE 73, 78, 81, 103, 160,

308, 309, 316, 323, 324, MMIX-SIM 60, 61,

71, 125, 127, 130, 141, 164.Reuter, Andreas Horst: MMIX 31.reversed : MMIX-PIPE 152.rewind : MMIX-CONFIG 19, 38.rF: MMIX 48.rF : MMIX-PIPE 52, MMIX-SIM 55, 151.rf : MMIX-ARITH 91, 92.rG: MMIX 29, 39.rG : MMIX-PIPE 52, 89, 102, 329, 330, 334, 342,

MMIX-SIM 55, 97, 104, 105, 151, 158.rH: MMIX 20.rH : MMIX-PIPE 52, 121, MMIX-SIM 55, 88, 151.rhat : MMIX-ARITH 13, 21.rhs : MMIX-SIM 96, 97, 98, 108, 133, 139.rI: MMIX 40, MMIX-SIM 1.rI : MMIX-PIPE 52, 314, MMIX-SIM 55, 127,

151, 158, MMMIX 21.right : MMIX-SIM 16, 21, 22, 50, 162, 165,

MMIXAL 54, 57, 72, 73, 74.right paren : MMIX-SIM 138, 139.ring : MMIX-CONFIG 36, MMIX-PIPE 26, 28,

29, 34, 35.ring of local registers: MMIX 42, 43.ring size : MMIX-CONFIG 36, MMIX-PIPE 26,

27, 28, 29, 125.rJ: MMIX 19, 29.rJ : MMIX-PIPE 52, 85, 107, 119, MMIX-SIM 55,

69, 101, 151.rK: MMIX 36, 37, 38.rK : MMIX-PIPE 52, 149, 314, 317, 322, 328,

MMIX-SIM 55, 77, 151, MMMIX 12, 23.rL: MMIX 29, 39, 43.rL: MMIX-PIPE 52, 102, 112, 119, 120, 329,

330, 334, 338, MMIX-SIM 37, 55, 81, 97,101, 102, 104, 151, 158.

rl : MMIX-PIPE 44, 46, 100, 112, 119, 120, 123,145, 146, 147, 334, 338.

rM: MMIX 10.rM : MMIX-PIPE 52, 107, MMIX-SIM 55, 69, 151.rN: MMIX 41.rN : MMIX-PIPE 52, 89, MMIX-SIM 55, 77, 151.rO: MMIX 42, 43.rO : MMIX-PIPE 52, 98, 118, MMIX-SIM 55, 101,

102, 104, 151, MMMIX 19.Robertson, James Evans: MMIX 40.rop : MMIX-SIM 61, 71, 125, 126, 130, 164.ropcodes: MMIX 38, 47, 49.ROUND_CURRENT : MMIXAL 14, 69.ROUND_DOWN : MMIX 28, MMIX-ARITH 30, 33,

35, 46, 87, MMIX-PIPE 346, MMIX-SIM 100,133, MMIXAL 14, 69.

round mode : MMIX-SIM 61, 89, 133, 138.ROUND_NEAR : MMIX 28, MMIX-ARITH 30, 33,

35, 84, 87, MMIX-PIPE 346, MMIX-SIM 77,100, 133, 158, MMIXAL 14, 69.

ROUND_OFF : MMIX 28, MMIX-ARITH 30, 33,35, 39, 46, 87, 94, MMIX-PIPE 346, MMIX-SIM 100, 133, MMIXAL 14, 69.

ROUND_UP : MMIX 28, MMIX-ARITH 30, 33,35, 87, MMIX-PIPE 346, MMIX-SIM 100,133, MMIXAL 14, 69.

rounding modes: MMIX 21, 32.rP: MMIX 31.rP : MMIX-PIPE 52, 283, 335, 341, MMIX-

SIM 55, 96, 102, 104, 151.rQ: MMIX 37, 40, 43.rQ : MMIX-PIPE 52, 146, 149, 310, 314, 329,

MMIX-SIM 55, 151, MMMIX 12.rR: MMIX 20.rR: MMIX-PIPE 52, 121, 335, 341, MMIX-

SIM 55, 88, 102, 104, 151.rr : MMIX-CONFIG 22.rS: MMIX 42, 43.rS : MMIX-PIPE 52, 98, 118, MMIX-SIM 55, 82,

83, 101, 102, 103, 104, 105, 151, MMMIX 19.rT: MMIX 36.rT : MMIX-PIPE 52, 122, 310, 312, 319, 372,

MMIX-SIM 55, 77, 151, MMMIX 12.rt op : MMIXAL 83, 85, 97, 98.rTT: MMIX 37.rTT : MMIX-PIPE 52, 314, 319, MMIX-SIM 55,

77, 151, MMMIX 12.rU: MMIX 40, MMIX-SIM 1.rU : MMIX-PIPE 52, 100, 146, MMIX-SIM 55,

127, 140, 151.running times, approximate: MMIX 50.rV: MMIX 44, 45, 47.rV : MMIX-PIPE 52, 329, MMIX-SIM 55, 77,

151, MMMIX 12.rv : MMIX-PIPE 239.rW: MMIX 34, 38.rW : MMIX-PIPE 52, 320, 322, 373, MMIX-

SIM 55, 109, 123, 124, 151.rWW: MMIX 36, 38.rWW : MMIX-PIPE 52, 320, 322, 373, MMIX-

SIM 55, 108, 151.rX: MMIX 34, 37, 38.rX : MMIX-PIPE 52, 320, 322, MMIX-SIM 55,

60, 123, 124, 126, 151, 164.rXX: MMIX 36, 38.rXX : MMIX-PIPE 52, 320, 322, 372, MMIX-

SIM 55, 108, 151.rY: MMIX 34, 38.rY : MMIX-PIPE 52, 321, 324, MMIX-SIM 55,

123, 126, 151.rYY: MMIX 36, 38.rYY : MMIX-PIPE 52, 321, 323, 324, MMIX-

SIM 55, 108, 151.rZ: MMIX 34, 38.rZ : MMIX-PIPE 52, 321, 324, 335, 339, MMIX-

SIM 55, 102, 103, 104, 105, 123, 126, 151.rZZ: MMIX 36, 38.rZZ : MMIX-PIPE 52, 321, 323, 324, MMIX-

SIM 55, 108, 151.S: MMIX-SIM 76.s: MMIX-ARITH 7, 31, 34, 37, 38, 39, 40, 50,

66, 68, 89, MMIX-IO 14, 16, MMIX-PIPE 21,

28, 43, 133, 134, 187, 189, 191, 193,196, 205, 385, MMIX-SIM 13, 118, 154,MMIXAL 28, 41, 57.

S_BIT : MMIX-PIPE 54, 149.S non miss : MMIX-PIPE 224.SADD : MMIX 12, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.sadd : MMIX-CONFIG 15, 28, MMIX-PIPE 49,

51, 344.SADDI : MMIX-PIPE 47, MMIX-SIM 54, 87.Satterthwaite, Edwin Hallowell, Jr.: MMIX-

SIM 131.saturating arithmetic: MMIX 11.sav : MMIX-PIPE 49, 327, 337.SAVE : MMIX 43, 50, MMIX-PIPE 47, 81, 281,

305, 341, MMIX-SIM 54, 102, MMIXAL 63.save : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

327, 337, 340.Scache : MMIX-CONFIG 17, 21, 35, 36, MMIX-

PIPE 39, 168, 215, 217, 218, 219, 220, 221,222, 224, 225, 226, 234, 261, 274, 300, 360,364, 367, 378, 379, MMMIX 21.

scan close : MMIXAL 85, 98.scan const : MMIX-ARITH 68, 69, MMIX-

SIM 13, 153.scan eql : MMIX-SIM 153.scan hex : MMIX-SIM 152, 153, 154, 155, 161.scan open : MMIXAL 86.scan option : MMIX-SIM 142, 143, 149.scan string : MMIX-SIM 153, 155.scan type : MMIX-SIM 152, 153.schedule : MMIX-PIPE 27, 28, 31, 125, 326, 368.schedule bit : MMIX-PIPE 8, 10, 28, 33.Sclean : MMIX-PIPE 234.Sclean inc : MMIX-PIPE 234.Sclean loop : MMIX-PIPE 234.security violation: MMIX 37.security disabled : MMIX-CONFIG 15, MMIX-

PIPE 66, 67.Sedgewick, Robert: MMIXAL 54.SEEK_END : MMIX-IO 2, 21, MMIX-SIM 4.SEEK_SET : MMIX-IO 2, 21, MMIX-SIM 4, 45, 46.segments: MMIX 44, 45, 47, MMMIX 9.Seidel, Raimund: MMIX-SIM 16.self : MMIX-PIPE 124, 125, 134, 215, 217, 222,

224, 225, 226, 233, 234, 237, 257, 259, 260,261, 262, 264, 266, 272, 274, 279, 298, 300,301, 310, 350, 356, 358, 359, 360, 361, 362,364, 365, 366, 367, 368.

sentinel : MMIX-PIPE 35, 36, 125.serial : MMIX-CONFIG 22, MMIX-PIPE 164, 186,

187, 189, 191, MMIXAL 58, 59, 73, 74, 75,78, 100, 109, 112, 114, 118, 125, 130.

serial number: MMIXAL 11, 21.serial number : MMIXAL 59, 60, 109.serialize : MMIXAL 59, 82, 86, 100.SET : MMIX 10, MMIXAL 62, 63, 124,

MMIXAL 13, 63.

set : MMIX-CONFIG 28, 32, MMIX-PIPE 49, 51,109, 137, 167, 177, 181, 192, 233, 234,343, MMMIX 12, 23.

set l : MMIX-PIPE 44, 46, 100, 112, 119, 120,123, 145, 146, 147, 334, 338.

set lock : MMIX-PIPE 37, 81, 215, 217, 219,222, 224, 225, 226, 233, 234, 237, 259,260, 261, 262, 264, 271, 272, 274, 276,277, 297, 298, 300, 310, 358, 359, 360, 361,362, 365, 366, 367, 368.

set round : MMIX-PIPE 346.set type : MMIX-SIM 153.SETH : MMIX 13, MMIX-PIPE 47, 112, 323,

MMIX-SIM 54, 71, 85, MMIXAL 63,MMIXAL 128.

SETL : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,85, MMIXAL 63.

SETMH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,85, MMIXAL 63.

SETML : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,85, MMIXAL 63.

setsz : MMIX-CONFIG 13, 15, 23.seven octa : MMMIX 23, 25.sfile : MMIX-IO 6, 7, 8, 10, 11, 12, 13, 14, 15,

16, 17, 18, 19, 20, 21, 22, 23.SFLOT : MMIX 27, 28, MMIX-PIPE 47, MMIX-

SIM 54, 89, MMIXAL 63.SFLOTI : MMIX-PIPE 47, MMIX-SIM 54, 89.SFLOTU : MMIX 27, 28, MMIX-PIPE 47, MMIX-

SIM 54, 89, MMIXAL 63.SFLOTUI : MMIX-PIPE 47, MMIX-SIM 54, 89.sfpack : MMIX-ARITH 34, 39, 40, 90.sfunpack : MMIX-ARITH 38, 39, 90.sh : MMIX-CONFIG 15, 28, MMIX-PIPE 49, 141.sh check : MMIXAL 97.shift amt : MMIX-PIPE 141, MMIX-SIM 87.shift left : MMIX-ARITH 7, 31, 34, 37, 38, 43,

45, 47, 49, 51, 52, 53, 55, 63, 73, 87, 88,89, 92, 94, 95, MMIX-PIPE 21, 22, 113,114, 118, 139, 141, 244, 279, 282, 333, 339,MMIX-SIM 13, 14, 85, 87, 94, 95, 154, 155,MMIXAL 28, 29, 94, 95, 101.

shift right : MMIX-ARITH 7, 29, 31, 33, 39,45, 47, 49, 51, 53, 63, 87, 88, 89, 94,MMIX-PIPE 21, 141, 239, 243, 279, 282, 334,343, MMIX-SIM 13, 87, 94, 95, MMIXAL 28,101, 114, 126, 131.

shl : MMIX-PIPE 49, 51, 141, MMIXAL 82,97, 101.

shlu : MMIX-PIPE 49, 51, 141.short float: MMIX 26, 27.show breaks : MMIX-SIM 161, 162.show line : MMIX-SIM 47, 50, 51, 82, 83,

103, 105, 128.show pred bit : MMIX-PIPE 8, 46, 152, 160.show spec bit : MMIX-MEM 2, 3, MMIX-PIPE 8.show stats : MMIX-SIM 128, 140, 141, 149.show wholecache bit : MMIX-PIPE 8, 177.

showing source : MMIX-SIM 48, 49, 51, 53,128, 143.

showing stats : MMIX-SIM 128, 129, 141, 143.shown file : MMIX-SIM 47, 48, 49, 53.shown line : MMIX-SIM 47, 48, 49, 53, 128.shr : MMIX-PIPE 49, 51, 141, MMIXAL 82,

97, 101.shrt : MMIX-PIPE 21, MMIX-SIM 13.shru : MMIX-PIPE 49, 51, 141.SIGINT : MMIX-SIM 147, 148.sign : MMIX-ARITH 68, 70, 73, 84.sign bit : MMIX-ARITH 4, 12, 24, 33, 35, 37,

38, 39, 40, 41, 44, 46, 54, 87, 89, 91,93, MMIX-CONFIG 32, 33, MMIX-IO 21,MMIX-PIPE 80, 81, 82, 85, 89, 91, 100, 118,119, 140, 143, 144, 149, 157, 160, 177, 179,205, 230, 233, 234, 244, 266, 271, 279, 288,296, 320, 322, 331, 346, 353, 354, 355, 364,368, MMIX-SIM 15, 84, 85, 89, 90, 91, 94,95, 108, 123, 124, 157, 159, 161.

signal : MMIX-SIM 147, 148.signaling NaN: MMIX 21.signed integers: MMIX 6, 7.signed odiv : MMIX-ARITH 24, MMIX-PIPE 21,

343, MMIX-SIM 13, 88.signed omult : MMIX-ARITH 12, MMIX-PIPE 21,

343, MMIX-SIM 13, 88.sim : MMIX-PIPE 21, MMIX-SIM 13.sim file info: MMIX-IO 5, 6.Sites, Richard Lee: MMIX 3, 40.size : MMIX-IO 4, 12, 13, 14, 15, 16, 17, 18,

MMIX-PIPE 381, 384, MMIX-SIM 4, 114, 117.SL : MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.sleep : MMIX-PIPE 125, 224, 257, 272, 274,

298, 300, 301.sleepy : MMIX-PIPE 301, 302, 303.SLI : MMIX-PIPE 47, MMIX-SIM 54, 87.SLU : MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.SLUI : MMIX-PIPE 47, MMIX-SIM 54, 87.sl3 : MMMIX 19, 20.Sorry, I can't open... : MMIX-SIM 145, 146.source : MMIXAL 126, 131.spec: MMIX-PIPE 40, 41, 42, 43, 44, 92, 93, 284.spec bit : MMIXAL 62, 102.spec install : MMIX-PIPE 94, 95, 110, 112, 113,

114, 117, 118, 119, 120, 121, 312, 322, 333,334, 338, 339, 340, 355.

spec mode : MMIXAL 43, 44, 52, 102, 132.spec mode loc : MMIXAL 43, 52, 132.spec read : MMIX-MEM 1, 2, MMIX-PIPE 206,

208, 210.spec reg code : MMIX-SIM 151, 152.spec regg code : MMIX-SIM 151, 152.spec rem : MMIX-PIPE 96, 97, 123, 145, 146,

147, 256.spec write : MMIX-MEM 1, 3, MMIX-PIPE 206,

208, 213.

special registers: MMIX 39, 43.special name : MMIX-PIPE 53, 91, MMIX-SIM 56,

103, 105, 138, MMIXAL 66, 67.special reg: MMIX-SIM 55.specnode: MMIX-PIPE 40, 43, 44, 71, 86, 92,

93, 94, 95, 96, 97, 100, 115, 120, 255.specnode struct: MMIX-PIPE 40.specval : MMIX-PIPE 92, 93, 104, 105, 106, 108,

113, 118, 120, 122, 312, 322, 323, 324, 339.speed lock : MMIX-PIPE 39, 247, 257, 362.Sprep : MMIX-PIPE 233, 234.sprintf : MMIX-SIM 24, 45, 80, 101, MMIXAL 45,

138, MMOTYPE 28.square one : MMIX-PIPE 272, 369, 370.SR : MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.src file : MMIX-SIM 42, 45, 47, 48, 49,

MMIXAL 34, 35, 138, 139.src file name : MMIXAL 137, 138, 139, 140.SRI : MMIX-PIPE 47, MMIX-SIM 54, 87.SRU : MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.SRUI : MMIX-PIPE 47, MMIX-SIM 54, 87.sscanf : MMIX-CONFIG 11, 38, MMIX-SIM 143,

MMIXAL 137, MMMIX 7, 8, 15, 18, 19, 21.st : MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51,

117, 254, 265, 266, 267, 270, 271, 272,279, 280, 321, 327.

st mtime : MMIX-SIM 44.st ready : MMIX-PIPE 267, 270, 271, 272, 280.stab start : MMOTYPE 25, 29, 30.stack pointer: MMIXAL 18.stack load : MMIX-SIM 83, 101, 104.stack op: MMIXAL 82, 83, 84.Stack_Segment : MMIX-SIM 3, 37, MMIXAL 69.stack store : MMIX-SIM 81, 82, 83, 101, 102, 103.stack tracing : MMIX-SIM 61, 82, 83, 103,

105, 143.stage : MMIX-CONFIG 26, 34, 35, 36, MMIX-

PIPE 23, 25, 26, 28, 39, 59, 124, 125, 126,128, 129, 134, 136, 174, 231, 236, 249, 284.

stages : MMIX-CONFIG 27, 28, 29.stall : MMIX-PIPE 75, 82, 101, 102, 111, 120,

312, 322, 332.stamp : MMIX-PIPE 246, 251, 256, 257, MMIX-SIM 16, 17, 21.standard floating point conventions: MMIX 22.standard NaN : MMIX-ARITH 4, 41, 44,

46, 91, 93.start fetch : MMIX-PIPE 288, 289.start ld st : MMIX-PIPE 265.startup : MMIX-PIPE 30, 31, 81, 203, 219, 221,

225, 233, 244, 249, 257, 259, 260, 261, 266,267, 271, 272, 273, 274, 276, 277, 286, 287,288, 291, 296, 297, 298, 300, 353, 354, 358,359, 360, 361, 365, 366.

stat: MMIXAL 82.stat : MMIX-SIM 43, 44.stat buf : MMIX-SIM 44.

state : MMIX-PIPE 30, 31, 44, 46, 124, 125,130, 131, 133, 134, 135, 215, 217, 219,222, 224, 232, 233, 234, 237, 257, 259,260, 262, 264, 265, 267, 268, 270, 271, 272,273, 274, 276, 277, 278, 279, 280, 281, 288,291, 292, 295, 296, 297, 298, 300, 301, 310,325, 326, 345, 351, 354, 358, 359, 360, 361,364, 368, MMIX-SIM 160.

state 4 : MMIX-PIPE 308, 310, 311.state 5 : MMIX-PIPE 307, 310, 311.status : MMIXAL 82, 87, 94, 98, 99, 100, 101,

109, 110, 117, 118, 121, 122, 123, 124, 129.STB : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, 123, MMIXAL 63.STBI : MMIX-PIPE 47, MMIX-SIM 54, 95.STBU : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STBUI : MMIX-PIPE 47, MMIX-SIM 54, 95.STCO : MMIX 8, MMIX-PIPE 47, 117, MMIX-

SIM 54, 95, MMIXAL 63.STCOI : MMIX-PIPE 47, MMIX-SIM 54, 95.StdErr : MMIX-SIM 4, 134, 137, MMIXAL 69.stderr : MMIX-CONFIG 8, MMIX-IO 7, MMIX-

PIPE 13, 381, 384, MMIX-SIM 4, 14, 24, 26,35, 44, 49, 143, 145, 146, MMIXAL 35, 45, 79,137, 142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11,12, MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30.

StdIn : MMIX-SIM 4, 134, 137, MMIXAL 69.stdin : MMIX-IO 7, 10, 13, 15, 17, MMIX-MEM 2,

MMIX-PIPE 387, MMIX-SIM 4, 120, 150,MMMIX 13.

StdIn> : MMIX-PIPE 387, MMIX-SIM 120.stdin buf : MMIX-PIPE 387, 388, MMIX-

SIM 120, 121.stdin buf end : MMIX-PIPE 387, 388, MMIX-

SIM 120, 121.stdin buf start : MMIX-PIPE 387, 388, MMIX-

SIM 120, 121.stdin chr : MMIX-IO 4, 13, 15, 17, MMIX-

PIPE 377, 387, MMIX-SIM 120.StdOut : MMIX-SIM 4, 134, 137, MMIXAL 69.stdout : MMIX-IO 7, MMIX-PIPE 387, MMIX-

SIM 4, 120, 133, 137, 138, 150, 156, 159.STHT : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STHTI : MMIX-PIPE 47, MMIX-SIM 54, 95.sticky bit: MMIX-ARITH 31, 34, 49, 53, 79, 87.STO : MMIX 8, MMIX-PIPE 47, MMIX-SIM 54,

95, MMIXAL 63.STOI : MMIX-PIPE 47, MMIX-SIM 54, 95.Stone, Harold Stuart: MMIX 31.stop : MMIX-IO 4, MMIX-PIPE 381, 382, 383,

MMIX-SIM 114, 115, 116.store fx : MMIX-SIM 89.store new char : MMIXAL 57.store sf : MMIX-ARITH 40, MMIX-PIPE 21, 281,

MMIX-SIM 13, 95.store x : MMIX-SIM 84, 85, 86, 87, 88, 89, 90,

92, 94, 97, 102, 107.

STOU : MMIX 8, MMIX-PIPE 47, 113, 339,MMIX-SIM 54, 95, MMIXAL 63.

STOUI : MMIX-PIPE 47, MMIX-SIM 54, 95.strcmp : MMIX-CONFIG 18, 19, 20, 21, 22, 23,

24, 38, MMIXAL 38, MMMIX 4, MMOTYPE 28.strcpy : MMIX-CONFIG 10, 18, 25, 38, MMIX-

SIM 96, 97, 98, 107, 108, 149, MMIXAL 137,138, MMOTYPE 28.

stream name : MMIX-SIM 137.string : MMIX-IO 19, 20, MMIX-SIM 4.strlen : MMIX-ARITH 67, MMIX-CONFIG 10, 25,

27, 38, MMIX-PIPE 387, MMIX-SIM 4, 24, 42,45, 120, 143, 149, 150, 163, MMIXAL 34, 50,138, MMMIX 4, 6, MMOTYPE 28.

strncmp : MMIX-ARITH 68, 79.strncpy : MMOTYPE 28.strong : MMIXAL 82, 83.STSF : MMIX 26, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STSFI : MMIX-PIPE 47, MMIX-SIM 54, 95.STT : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STTI : MMIX-PIPE 47, MMIX-SIM 54, 95.STTU : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STTUI : MMIX-PIPE 47, MMIX-SIM 54, 95.STUNC : MMIX 30, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.stunc : MMIX-PIPE 49, 251, 254, 257, 281.STUNCI : MMIX-PIPE 47, MMIX-SIM 54, 95.STW : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STWI : MMIX-PIPE 47, MMIX-SIM 54, 95.STWU : MMIX 8, MMIX-PIPE 47, 281, MMIX-

SIM 54, 95, MMIXAL 63.STWUI : MMIX-PIPE 47, MMIX-SIM 54, 95.style : MMIX-SIM 133, 134, 137.SUB : MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.sub : MMIX-CONFIG 28, MMIX-PIPE 44, 49,

51, 140.SUBI : MMIX-PIPE 47, MMIX-SIM 54, 85.subroutine library initialization: MMIX-SIM 6,

164.SUBSUBVERSION : MMIX-PIPE 89, MMIX-SIM 77.SUBU : MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.subu : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 139.SUBUI : MMIX-PIPE 47, MMIX-SIM 54, 85.SUBVERSION : MMIX-PIPE 89, MMIX-SIM 77.support : MMIX-PIPE 78, 79, 80.suppress dispatch : MMIX-PIPE 64, 65, 317.switchable string : MMIX-SIM 138, 139.switch0 : MMIX-PIPE 288, 299.switch1 : MMIX-PIPE 130, 133, 265, 327,

345, 359, 360.switch2 : MMIX-PIPE 135, 364.


SWYM : MMIX 49, MMIX-PIPE 47, 301, 321, 323,325, MMIX-SIM 54, MMIXAL 63.

swym one : MMIX-PIPE 301, 302.sy : MMIX-ARITH 24.sym : MMIXAL 54, 64, 66, 70, 71, 72, 73, 74,

75, 76, 78, 87, 91, 100, 104, 110, 111,118, 125, 130, 144.

sym avail : MMIXAL 59, 60.sym buf : MMIXAL 75, 77, 78, 79, 80,

MMOTYPE 25, 26, 27, 28, 29.sym length max : MMOTYPE 26, 29.sym ptr : MMIXAL 75, 77, 78, 79, 80,

MMOTYPE 25, 26, 28, 29.sym root : MMIXAL 60.sym tab struct: MMIXAL 58.Symbol table... : MMOTYPE 22, 25..symbol...already defined : MMIXAL 109.symbol found : MMIXAL 87, 88, 89.SYNC : MMIX 31, MMIX-PIPE 47, 304, 305, 323,

MMIX-SIM 54, 107, MMIXAL 63.sync : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

230, 233, 234, 251, 254, 256, 257, 355,356, 361.

sync check : MMIX-PIPE 269, 271, 272, 370.sync L: MMIX-SIM 101.SYNCD : MMIX 30, MMIX-PIPE 47, MMIX-SIM 54,

106, MMIXAL 63.syncd : MMIX-PIPE 49, 51, 230, 265, 269, 271,

272, 280, 320, 323, 364, 368, 369.SYNCDI : MMIX-PIPE 47, MMIX-SIM 54, 106.SYNCID : MMIX 30, MMIX-PIPE 47, MMIX-

SIM 54, 106, MMIXAL 63.syncid : MMIX-PIPE 49, 51, 85, 119, 265, 266,

267, 269, 270, 271, 272, 280, 320, 323.SYNCIDI : MMIX-PIPE 47, MMIX-SIM 54, 106.syntax error... : MMIXAL 86, 97.syntax of floating point constants: MMIX-ARITH 68.sys call: MMIX-PIPE 371, MMIX-SIM 59.system dependencies: MMIX-ARITH 3, MMIX-IO 16, MMIX-PIPE 17, 89, MMIX-SIM 10, 43, 44, 77, MMIXAL 26, MMOTYPE 27.

System/360: MMIX 7.System/370: MMIX 31.sz : MMIX-ARITH 24.t: MMIX-ARITH 8, 13, 39, 40, 89, 90, MMIX-

PIPE 35, 82, 95, 97, 197, 241, MMIX-SIM 15, 166, MMIXAL 48, 55, 57, 73, 74,MMOTYPE 1, 8.

table lookaside buffer: MMIX-PIPE 163.tag : MMIX-CONFIG 32, 33, MMIX-PIPE 167,

172, 176, 177, 179, 185, 193, 196, 197, 201,203, 205, 206, 210, 213, 216, 217, 218, 219,221, 223, 226, 233, 234, 245, 259, 276, 353,354, 379, MMMIX 12, 23.

tagmask : MMIX-CONFIG 31, MMIX-PIPE 167,192, 193, 379.

tail : MMIX-PIPE 64, 69, 71, 73, 74, 85, 120,160, 301, 304, 308, 309, 316, MMMIX 12, 22.

TC: MMIX 46.TDIF : MMIX 11, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.tdif : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.tdif l : MMIX-PIPE 344, MMIX-SIM 87.TDIFI : MMIX-PIPE 47, MMIX-SIM 54, 87.terabytes: MMIX 42, 45.terminate : MMIX-PIPE 125, 126, 144, 215, 217,

221, 222, 224, 232, 237.terminator : MMIXAL 57, 87.ternary trie struct: MMIXAL 54.test load bkpt : MMIX-SIM 83, 94, 96, 105,

111, 114.test overflow : MMIX-SIM 88.test store bkpt : MMIX-SIM 82, 95, 96, 103, 117.tet : MMIX-SIM 16, 25, 26, 28, 30, 33, 34, 37,

51, 63, 82, 83, 94, 95, 96, 103, 105, 111,114, 118, 119, 157, 159, 163, 164, 165,MMOTYPE 9, 11, 15, 18, 19, 21, 23, 24.

TETRA : MMIXAL 63, MMIXAL 62, 63, 117.tetra: MMIX-ARITH 3, 7, 8, 13, 26, 27, 28,

29, 34, 38, 39, 40, 54, 59, 60, 61, 62, 82,90, MMIX-IO 3, MMIX-PIPE 17, 21, 68, 76,78, 120, 206, 210, 213, 246, MMIX-SIM 10,13, 15, 16, 19, 25, 31, 44, 61, 62, 95, 101,114, 117, 164, 165, 166, MMIXAL 26, 43,48, 52, 68, 76, 105, 120.

tetrabyte: MMIX 6.Text_Segment : MMIX-SIM 3.TextRead : MMIX-SIM 4, MMIXAL 69.TextWrite : MMIX-SIM 4, MMIXAL 69.The number of local... : MMIX-SIM 77.the operand is undefined : MMIXAL 129.The symbol table isn't... : MMOTYPE 30.thinking big: MMIX-PIPE 58, 74.third operand : MMIX-PIPE 103, 107, 108,

MMIX-SIM 64, 71, 79.This can't happen : MMIX-PIPE 13.three arg bit : MMIXAL 62, 116.thresh : MMIX-ARITH 93, 94.ticks : MMIX-MEM 2, 3, MMIX-PIPE 10, 14, 28,

64, 87, 187, 251, 256, 257, MMMIX 2, 15, 23.time : MMIX-PIPE 89, MMIX-SIM 77,

MMIXAL 141.times : MMIXAL 82, 97, 101.TLB: MMIX-PIPE 163.tmp : MMIX-IO 8, MMIX-SIM 31, 34, MMMIX 16,

18, 22, MMOTYPE 16, 19, 24.tmpo : MMIX-PIPE 141.token : MMIX-CONFIG 9, 10, 11, 18, 19, 20,

21, 22, 23, 24, 25.token prescanned : MMIX-CONFIG 9, 10, 22, 24.Tomasulo, Robert Marco: MMIX-PIPE 58.too many global registers : MMIXAL 108.too many operands... : MMIXAL 116.top op : MMIXAL 83, 85.top val : MMIXAL 83, 87, 94, 98, 99, 100, 101.trace bit : MMIX-SIM 58, 63, 161, 162.trace format : MMIX-SIM 64, 65, 131.


trace print : MMIX-SIM 136, 137.trace threshold : MMIX-SIM 61, 63, 143.tracing : MMIX-SIM 61, 63, 82, 83, 103, 105,

107, 122, 127, 128, 149.tracing exceptions : MMIX-SIM 61, 122, 143.trailing characters... : MMIXAL 35.trans : MMIX-PIPE 241.trans key : MMIX-PIPE 240, 245, 267, 272, 291,

298, 302, 326, 353, 354.translation caches: MMIX 46, 47, 49, MMIX-

PIPE 163.TRAP : MMIX 33, 36, 50, MMIX-PIPE 47, 80, 82,

320, MMIX-SIM 54, 108, MMIXAL 63.trap : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 80,

81, 82, 85, 103, 149, 310, 312, 313, 317, 320.trap format : MMIX-SIM 108, 110, 139.trap loc : MMIX-PIPE 373.traps: MMIX 35.trie node: MMIXAL 54, 55, 56, 57, 65, 73,

74, 82, 90.trie root : MMIXAL 56, 61, 66, 70, 71, 80,

87, 111, 144.trie search : MMIXAL 57, 64, 66, 70, 71, 87,

104, 111, 144.TRIP : MMIX 33, 35, 50, MMIX-PIPE 47,

MMIX-SIM 54, 108, MMIXAL 63.trip : MMIX-CONFIG 28, MMIX-PIPE 49, 51,

80, 85, 312, 313, 317.trip warning : MMIX-IO 23, 24.tripping : MMIX-SIM 61, 123, 131.trips: MMIX 35.true : MMIX-ARITH 1, 24, 68, MMIX-CONFIG 15,

22, 24, MMIX-PIPE 11, 59, 68, 85, 89, 100,106, 108, 110, 112, 113, 114, 117, 118, 119,120, 121, 144, 170, 185, 217, 227, 236, 238,239, 259, 262, 263, 265, 302, 304, 310, 312,314, 316, 317, 322, 324, 330, 331, 332, 333,334, 337, 338, 339, 340, 345, 350, 355, 361,364, 373, MMIX-SIM 9, 45, 51, 63, 82, 83,87, 90, 103, 105, 107, 109, 122, 123, 125,127, 128, 141, 142, 143, 148, 149, 150, 153,164, MMIXAL 26, 35, 41, 71, 87, 101, 111,132, MMMIX 6, 7, 8, 9, 10, 11.

true head : MMIX-PIPE 74, 81.try complement : MMIX-ARITH 94, 95.trying to interrupt : MMIX-PIPE 314, 315, 330,

351, 363, 364.tt : MMIX-ARITH 65, 66, 81, 83, MMIX-PIPE 28,

MMIXAL 57, 64, 65, 66, 70, 87, 88, 89, 111.Two file names... : MMOTYPE 20.two arg bit : MMIXAL 62, 116.Type tetra... : MMIXAL 29.t0 : MMMIX 10.t1 : MMMIX 10.t2 : MMMIX 10.t3 : MMMIX 10.�: MMIX 50, MMIX-SIM 1.u: MMIX-ARITH 7, 8, 13, 89, MMIX-PIPE 75,

79, 97.

U_BIT : MMIX-ARITH 31, 33, 35, MMIX-PIPE 54,307, MMIX-SIM 57, 89, 122, MMIXAL 69.

U_Handler : MMIXAL 69.unary : MMIXAL 82, 83.unary check : MMIXAL 100.undefined : MMIXAL 82, 87, 99, 101, 110, 117, 118, 121, 122, 123, 124, 129.undefined constant : MMIXAL 118.undefined local symbol : MMIXAL 145.undefined symbol : MMIXAL 79.underflow: MMIX 21, 22, 32, MMIX-ARITH 31.undump octa : MMMIX 9, 10, 11.Unexpected end of file... : MMMIX 11,

MMOTYPE 9.Unicode: MMIX 6, MMIXAL 5, 6, 7, 30, 75,

MMOTYPE 27.uninit mem bit : MMIX-PIPE 8, 210.uninitialized memory... : MMIX-PIPE 210.unit busy : MMIX-PIPE 82.unit found : MMIX-PIPE 82.Unknown lopcode : MMOTYPE 13.unknown operation code : MMIXAL 104.UNKNOWN_SPEC : MMIX-PIPE 71, 73, 85, 120,

123, 290, 309.uns : MMIX-PIPE 21, MMIX-SIM 13, MMIXAL 28.unsav : MMIX-PIPE 49, 327, 332.UNSAVE : MMIX 43, 50, MMIX-PIPE 47, 81, 102,

279, 305, 332, 335, MMIX-SIM 37, 54, 102,104, 127, 164, MMIXAL 63, MMMIX 12.

unsave : MMIX-CONFIG 28, MMIX-PIPE 49,51, 327, 332.

unschedule : MMIX-PIPE 32, 33, 145, 287.unsgnd : MMIX-PIPE 21, MMIX-SIM 13.Unsupported virtual address : MMMIX 11.up : MMIX-PIPE 40, 73, 85, 86, 89, 93, 95, 97,

100, 102, 114, 116, 117, 120, 146, 227,254, 255, 312, 333, 334.

update listing loc : MMIXAL 42, 44.usage : MMIX-PIPE 44, 46, 81, 100, 146, 324,

MMIX-SIM 143.Usage: ... : MMIX-SIM 143, MMIXAL 137,

MMMIX 3, MMOTYPE 2.usage help : MMIX-SIM 143, 144.use and fix : MMIX-PIPE 195, 196, 198, 201,

217, 262, 268, 269, 271, 273, 292, 293,296, 353, 354.

useful : MMIXAL 73.v: MMIX-ARITH 8, 13, MMIX-CONFIG 11, 12,

13, 14, MMIX-PIPE 167.V_BIT : MMIX-ARITH 31, MMIX-PIPE 54, 140,

141, 282, 343, MMIX-SIM 57, 84, 85, 87,88, 95, MMIXAL 69.

V_Handler : MMIXAL 69.val : MMIX-ARITH 68, 69, 71, 73, 83, 84,

MMIX-MEM 2, 3, MMIX-PIPE 208, 212,213, 379, MMIX-SIM 13, 30, 153, 155, 157,158, 161, MMMIX 17.

val node: MMIXAL 82, 83, 84.


val ptr : MMIXAL 83, 85, 87, 94, 99, 110,116, 117.

val stack : MMIXAL 81, 82, 83, 84, 85, 108, 109,110, 117, 118, 121, 122, 123, 124, 125, 126,127, 129, 130, 131, 132, 134.

Vandevoorde, Mark Thierry: MMIX 40.vanish : MMIX-CONFIG 34, MMIX-PIPE 126,

128, 129, 260.vanish ctl : MMIX-PIPE 127, 128.vctsz : MMIX-CONFIG 13, 15, 23.verb : MMIXAL 100, 101.verbose : MMIX-MEM 2, 3, MMIX-PIPE 4, 10,

28, 33, 46, 81, 125, 145, 146, 147, 149,152, 160, 177, 210, 283, 310, 314, 319,320, 321, MMIX-SIM 140, MMMIX 15,MMOTYPE 2, 4, 9, 30.

VERSION : MMIX-PIPE 89, MMIX-SIM 77.version number: MMIX 41, 51.vh : MMIX-ARITH 13, 17, 21.victim : MMIX-CONFIG 33, MMIX-PIPE 167,

177, 181, 193, 196, 199, 205, 233, 234.VIIIADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.VIIIADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.virt : MMIX-PIPE 241.virtual address emulation: MMIX 49.virtual addresses: MMIX 44, 45, 47.vmh : MMIX-ARITH 13, 17, 21.vrepl : MMIX-CONFIG 16, 23, MMIX-PIPE 167,

196, 199, 205.Vuillemin, Jean Etienne: MMIX-SIM 16.vv : MMIX-CONFIG 16, 23, 31, 33, MMIX-

PIPE 167, 177, 181, 193, 196, 199, 205,233, 234.

w: MMIX-ARITH 8, MMIX-SIM 61.W_BIT : MMIX-ARITH 31, 88, MMIX-PIPE 54,

346, MMIX-SIM 57, 89, MMIXAL 69.W_Handler : MMIXAL 69.wait : MMIX-PIPE 125, 131, 133, 134, 215, 216,

217, 218, 219, 221, 222, 223, 224, 225,233, 234, 237, 257, 259, 260, 261, 262,263, 264, 266, 271, 272, 273, 276, 277,278, 279, 281, 283, 288, 290, 297, 298, 301,310, 326, 328, 329, 330, 342, 350, 351, 353,354, 356, 357, 358, 359, 360, 361, 362, 363,364, 365, 366, 367, 368.

wait or pass : MMIX-PIPE 288, 292, 295, 296.Waldspurger, Carl Alan: MMIX 40.wbuf bot : MMIX-CONFIG 37, MMIX-PIPE 247,

251, 255, 256, 257, 378, 379.wbuf lock : MMIX-PIPE 39, 247, 256, 257, 259,

260, 262, 264, 360.wbuf top : MMIX-CONFIG 37, MMIX-PIPE 247,

249, 251, 255, 256, 257, 378, 379.wcslen : MMIX-SIM 4.WDIF : MMIX 11, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.wdif : MMIX-CONFIG 28, MMIX-PIPE 49,

51, 344.WDIFI : MMIX-PIPE 47, MMIX-SIM 54, 87.

weak : MMIXAL 82, 83.Weihl, William Edward: MMIX 40.what say : MMIX-SIM 149, 152, MMMIX 13,

15, 18, 19.Wheeler, David John: MMIX-ARITH 26.Wilkes, Maurice Vincent: MMIX 30, MMIX-

ARITH 26.Wirth, Niklaus Emil: MMIX 7.wow : MMIX-PIPE 11.wra : MMIX-CONFIG 13, 15, 23.wrb : MMIX-CONFIG 13, 15, 23.WRITE_ALLOC : MMIX-CONFIG 23, 31, MMIX-

PIPE 166, 167, 217, 257.WRITE_BACK : MMIX-CONFIG 23, MMIX-

PIPE 166, 167, 217, 263.write bit : MMIX-SIM 58, 82, 161, 162.write buf size : MMIX-CONFIG 15, 37.write co : MMIX-PIPE 248, 249.write ctl : MMIX-PIPE 248, 249, 360.write from wbuf : MMIX-PIPE 129, 249, 257,

272.write head : MMIX-PIPE 247, 249, 251, 255, 256,

257, 259, 260, 261, 262, 360, 362, 378, 379.write node: MMIX-PIPE 246, 247, 251, 255,

256, 378, 379.write restart : MMIX-PIPE 257, 261.write search : MMIX-PIPE 254, 255, 268, 270,

271, 278.write tail : MMIX-PIPE 247, 249, 251, 255, 256,

257, 360, 362, 378, 379.WYDE : MMIXAL 62, 63, 117, MMIXAL 63.wyde: MMIX 6.wyde diff : MMIX-ARITH 28, MMIX-PIPE 21,

344, MMIX-SIM 13, 87.x: MMIX-ARITH 5, 6, 13, 25, 26, 27, 29, 37, 38,

39, 40, 41, 44, 46, 54, 60, 62, 81, 82, 85, 91,93, MMIX-IO 22, MMIX-PIPE 21, 44, 56, 119,120, 381, 384, MMIX-SIM 13, 61, 114, 117,MMIXAL 28, 48, 76, 120, MMOTYPE 8.

X field doesn't fit... : MMIXAL 123.X field is undefined : MMIXAL 123.X field...register number : MMIXAL 123.X_BIT : MMIX-ARITH 31, 33, 35, MMIX-PIPE 54,

307, MMIX-SIM 57, 122, MMIXAL 69.x bits : MMIXAL 52.X_Handler : MMIXAL 69.X is dest bit : MMIX-PIPE 83, 101, 312, 320,

MMIX-SIM 60, 65, 126.X is source bit : MMIX-SIM 65.x ptr : MMIX-SIM 61, 80, 84.xar bit : MMIXAL 62, 123.xe : MMIX-ARITH 41, 43, 44, 45, 46, 47, 49,

91, 92, 93, 94.xf : MMIX-ARITH 41, 43, 44, 45, 46, 47, 86,

87, 91, 92, 93, 94.XOR : MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63.xor : MMIX-ARITH 29, MMIX-CONFIG 28,

MMIX-PIPE 21, 49, 51, 138, MMIX-SIM 13,


MMIXAL 82, 97, 101.XORI : MMIX-PIPE 47, MMIX-SIM 54, 86.xr bit : MMIXAL 62, 123.xs : MMIX-ARITH 41, 43, 44, 45, 46, 47, 93, 94.XVIADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.XVIADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.xx : MMIX-ARITH 26, MMIX-PIPE 44, 46, 100,

102, 106, 110, 117, 118, 119, 120, 146,227, 265, 275, 312, 320, 323, 325, 329,332, 335, 336, 337, 340, 341, 364, 369, 370,MMIX-SIM 60, 62, 74, 80, 95, 97, 101, 102,104, 106, 107, 108, 124.

xyz : MMIXAL 119, 120, 123, 129, 130, 131.XYZ field doesn't fit... : MMIXAL 129.xyzar bit : MMIXAL 62, 129.xyzr bit : MMIXAL 62, 129.y: MMIX-ARITH 5, 6, 7, 8, 12, 13, 24, 25, 27,

28, 29, 41, 44, 46, 50, 85, 93, MMIX-PIPE 21,44, MMIX-SIM 13, 61, MMIXAL 28, 48, 120,MMMIX 20, 25, MMOTYPE 18.

Y field doesn't fit... : MMIXAL 122.Y field is undefined : MMIXAL 122.Y field of lop_post... : MMOTYPE 22.Y field...register number : MMIXAL 122.Y is immed bit : MMIX-SIM 65.Y is source bit : MMIX-SIM 65.yar bit : MMIXAL 62, 122.ybyte : MMIX-SIM 28, 29, 33, 34, 35.ye : MMIX-ARITH 41, 43, 44, 45, 46, 47, 48,

50, 51, 52, 85, 93, 94, 95.Yellin, Frank Nathan: MMIX 40.yf : MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, 49,

50, 51, 52, 53, 85, 93, 94, 95.yhl : MMIX-ARITH 7, MMMIX 20.ylh : MMIX-ARITH 7, MMMIX 20.yr bit : MMIXAL 62, 122.ys : MMIX-ARITH 41, 44, 46, 47, 48, 49, 50,

53, 85, 93, 94.yt : MMIX-ARITH 41, 44, 46, 50, 85, 93.yy : MMIX-ARITH 24, MMIX-PIPE 44, 46, 100,

103, 105, 118, 320, 333, 335, 337, 339, 341,372, 380, MMIX-SIM 60, 62, 71, 73, 97, 102,104, 107, 108, 111, 124.

yz : MMIX-PIPE 75, 84, 85, 109, 120, MMIX-SIM 60, 62, 70, 78, 101, MMIXAL 48,120, 122, 123, 124, 125, 126, 127, 128,MMOTYPE 9, 11, 13, 18, 19, 20, 21, 25, 30.

YZ field at lop_end... : MMOTYPE 30.YZ field doesn't fit... : MMIXAL 124.YZ field is undefined : MMIXAL 124.YZ field of lop_fixrx... : MMOTYPE 19.YZ field...register number : MMIXAL 124.YZ field...should be zero : MMOTYPE 25.YZ field...should be 1 : MMOTYPE 13.yzar bit : MMIXAL 62, 124.yzbytes : MMIX-SIM 25, 26, 29, 33, 34, 35, 36.yzr bit : MMIXAL 62, 124.z: MMIX-ARITH 5, 8, 12, 13, 24, 25, 27, 28, 29,

39, 40, 41, 44, 46, 50, 85, 86, 88, 89, 91,

93, MMIX-PIPE 21, 44, MMIX-SIM 13, 61,MMIXAL 28, 48, 120, MMOTYPE 18.

Z field doesn't fit... : MMIXAL 121.Z field is undefined : MMIXAL 121.Z field of lop_fixo... : MMOTYPE 19.Z field of lop_loc... : MMOTYPE 18.Z field of lop_post... : MMOTYPE 22.Z field...register number : MMIXAL 121.Z_BIT : MMIX-ARITH 31, 44, MMIX-PIPE 54,

MMIX-SIM 57, MMIXAL 69.Z_Handler : MMIXAL 69.Z is immed bit : MMIX-SIM 65.Z is source bit : MMIX-SIM 65.zap cache : MMIX-PIPE 180, 181, 358, 359, 360.zar bit : MMIXAL 62, 121.zbyte : MMIX-SIM 28, 29, 33, 34, 35, 37.ze : MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, 50,

51, 52, 85, 86, 87, 88, 91, 92, 93, 94, 95.zero : MMIXAL 82, 83.zero exponent : MMIX-ARITH 36, 37, 38, 51.zero octa : MMIX-ARITH 4, 24, 29, 31, 39, 41,

44, 45, 46, 53, 73, 83, 88, 89, 93, MMIX-IO 4,8, 11, 14, 16, 18, 19, 20, 21, MMIX-PIPE 20,100, 112, 179, 237, 243, 244, 265, 279,288, 312, 317, 319, 330, 346, 356, 364, 380,MMIX-SIM 13, 60, 81, 89, 99, 123, 126, 153,154, 155, 158, 159, MMIXAL 27, 59, 100,101, MMMIX 12, 21, 23, 25.

zero out : MMIX-ARITH 93, 95.zero spec : MMIX-PIPE 41, 85, 100, 109, 112,

113, 114.zeros : MMIX-ARITH 74, 76, 77, 79.zf : MMIX-ARITH 41, 43, 44, 45, 46, 47, 48,

49, 50, 51, 52, 53, 85, 86, 87, 88, 91,92, 93, 94, 95.

zhex : MMIX-SIM 134, 135, 137.zr bit : MMIXAL 62, 121.zro : MMIX-ARITH 36, 37, 38, 39, 40, 41, 42,

44, 46, 50, 85, 86, 88, 91, 93.zs : MMIX-ARITH 41, 44, 46, 47, 48, 49, 50,

53, 85, 86, 87, 88, 91, 93.zset : MMIX-CONFIG 28, MMIX-PIPE 49, 51, 345.ZSEV : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSEVI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSNI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSNN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSNNI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSNP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSNPI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSNZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSNZI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSOD : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.


ZSODI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSPI : MMIX-PIPE 47, MMIX-SIM 54, 92.ZSZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.ZSZI : MMIX-PIPE 47, MMIX-SIM 54, 92.zt : MMIX-ARITH 41, 44, 46, 50, 85, 86,

88, 91, 93.zz : MMIX-ARITH 24, MMIX-PIPE 44, 46, 100,

103, 104, 118, 146, 320, 322, 323, 328,337, 338, 339, 341, 355, 356, 372, 373,MMIX-SIM 60, 62, 71, 72, 97, 102, 107,108, 109, 124, 133, 138.

16ADDU : MMIX 9, MMIXAL 63.2ADDU : MMIX 9, MMIXAL 63.4ADDU : MMIX 9, MMIXAL 63.8ADDU : MMIX 9, MMIXAL 63.