The Circular Synchronous Computer and Its Relation to Functional Programming

North-Holland Publishing Company Microprocessing and Microprogramming 10 (1982) 333-340



Dan Tomescu and Cristian Băleanu
Electronics Department, I.P.B., C.P. 68-86, Of. post. 68, Bucuresti, Romania

The model of a computing system - a circular pipeline of identical processing elements with a simple structure - is presented. It is shown that a circular synchronous computer is fit for the execution of functional programs. The processing elements of a circular synchronous computer execute in cooperation the same program; during its execution a program, represented as a string that is circularly shifted from one processing element to its next neighbour, is successively transformed until it is turned into the empty string.

Keywords: Computer architecture, computer system organization, pipelining, multiprocessing, functional programming.

1. Introduction

Computer architecture is far from keeping step with the impressive advances that component technology has made during the last decade. The conceptual framework underlying computing structures has remained unchanged during all that time, actually more or less since the advent of computers. We feel that we could do better with LSI technology than manufacture microprocessors and computers on a chip, because while they are responsible for the dramatic lowering of hardware costs and size, they emphasize the programmers' need for more manageable computers. We regard the fact that hardware costs tend to become negligible as compared to software costs as an anomaly due, to a certain extent, to conservative architectures supporting the same class of languages: variations on a well known theme by John von Neumann.

The tremendous complexity of non-trivial programs seems to be considered a kind of fatality. The techniques grouped round structured programming help us deal with this complexity, but almost nothing has been done to reduce it drastically. Moreover, despite the efforts of many computer scientists, testing is the only practical means of verifying programs and, as Dijkstra [3] pointed out: "Program testing can be used to show the presence of bugs, but never to show their absence".

If we want to design a powerful computing system, a multiprocessor is an obvious choice. But here we face many difficult problems, memory contention among them. Not only are operating systems for multiprocessors far from trivial programs, but the addition of extra processors may not always increase the computing power. Actually, a very careful design is needed if one wants to be able to extend, within limits, the number of processors.

Backus [1] made a refreshing criticism of current models of computing systems and proposed a functional style of programming. A computer is always "seen" through some language that is deeply influenced by the kind of machine it was designed for. From this viewpoint "the differences between Fortran and Algol 68, although considerable, are less significant than the fact that both are based on the programming style of the von Neumann computer" [1].

The present paper describes the model of a multiprocessor originating in some considerations regarding computer system organization. Subsequently we found out that our machine was fit to support languages from the class of the functional style of programming, which made us put more confidence in this model because we think that [1] proposes a very promising research area. This is the reason why, after an outline of the structure of a "circular synchronous computer", we show that it can efficiently execute functional programs.


2. The Circular Synchronous Computer

The circular synchronous computer (CSC) is a model of a computing system which has as a starting-point our experience with a multiprocessor whose functional units (processors, memories, peripheral devices) are connected to a circular synchronous bus (CSB) [9].

A CSB is a set of registers in which a clock triggers, at fixed intervals, circular shifts of data. Each register (called a "station") is coupled to one of the system's functional units (Fig. 1). Any unit may exchange data with any other by placing, within the limits of a protocol, information "waggons" into the station it is coupled to. After a number of clock periods that depends on the distance between the two units relative to the rotation direction on the bus, any emitted waggon reaches its destination.
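The shifting behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the waggon format (a destination index plus a payload) and the names `rotate` and `deliver` are hypothetical and are not taken from the CSB design in [9]; the bus protocol itself is not modelled.

```python
from typing import List, Optional, Tuple

# Hypothetical waggon format: (destination station index, payload).
Waggon = Tuple[int, str]


def rotate(stations: List[Optional[Waggon]]) -> List[Optional[Waggon]]:
    """One clock period: every station hands its content to the next station."""
    return [stations[-1]] + stations[:-1]


def deliver(n_stations: int, src: int, dst: int, payload: str) -> int:
    """Emit one waggon at station `src`; count clock periods until it is at `dst`."""
    stations: List[Optional[Waggon]] = [None] * n_stations
    stations[src] = (dst, payload)
    periods = 0
    while stations[dst] is None:
        stations = rotate(stations)
        periods += 1
    return periods


if __name__ == "__main__":
    # The delay depends on the distance measured in the rotation direction.
    print(deliver(8, 1, 2, "w"), deliver(8, 2, 1, "w"))
```

In this model the delay equals `(dst - src) mod n_stations`, so a neighbour in the rotation direction is reached in one period, while the neighbour on the other side needs almost a full rotation.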

Fig. 1. A multiprocessor with a circular synchronous bus.

Because "there is only one reason to utilize a multiprocessor for its performance or throughput characteristics; that reason is that no single processor is large enough to handle a single task in the time available" [4], it follows that various multiprocessor organizations have one common drawback: if a program is to take advantage of a multiprocessor's computing capability then it has to be put in a parallel form. Hence, not all programs can benefit from execution on a multiprocessor, while others have to be rewritten (in special cases, parallelisms may be automatically detected by time-consuming programs).

The amount of bookkeeping to be performed by a system is dependent upon the "unit of computation" that may be executed in parallel with others [7]. In a pipeline organization [8] bookkeeping is practically of no significance, whereas it may become considerable for systems with processes as units of computation, i.e. multiprocessors [7]. But, what is more important, in a pipeline organization parallel execution is totally transparent on the program level. Hence, any program is executed faster if arithmetic operations or machine instruction cycles are performed along pipelines.

One more remark: the addition of extra processors to a multiprocessor may involve considerable changes on many levels of that system.

The circular synchronous computer is an attempt to extend pipelining on a larger scale, i.e. a CSC is a circular pipeline of processors (Fig. 2, where PE = processing element, DFC = data flow controller, I/O = input/output module). Fig. 2 is almost identical with Fig. 1, but while in the latter FUi are modules of a conventional multiprocessor, in the former PEi are processing elements with a simple structure (Fig. 3, where RI = input register, RO = output register, CU = computing unit, IB = input buffer).

Fig. 2. The structure of the circular synchronous computer.

Fig. 3. The structure of a processing element.

Any number of PE's can be added to or removed from the system with no changes except for the hardware connections.

No processor "knows" about the existence of other processors and, from an abstract point of view, a one PE CSC is equivalent to a 100 PE CSC. The number of PE's influences a program's execution time. The structure of a CSC eliminates any kind of contention among its PE's.

PE1, ..., PEn execute in cooperation a program that is circularly shifted from one station into another. To continue the analogy involving stations and waggons, a program is a "train" that is processed as it moves round the bus. A program is injected into the system by the DFC in a manner to be detailed further below.

As in a pipeline, PE's are units which process data that pass from station to station, but unlike in a pipeline they are not dedicated modules, as they perform any operation from an "instruction set" common to all PE's.

A program is not stored in a random access memory during execution; it rotates in the CSC from the moment it is injected into the system until completion of execution. What does such a program look like? We postpone this issue till the next section; for now we shall only make a few remarks:

A program is built out of a set of primitive functions and a set of combining forms. The evaluation of primitive functions and combining forms is performed by "programs" on the PE level. The pipeline structure of the system requires that evaluation procedures be performed as much as possible in sequences of simple actions (evaluation stages) carried out by consecutive PE's.

Some evaluation stages may take several pipeline clock periods, while new waggons enter that PE. In order to preserve the proper sequence of evaluations, data entering a PE during this interval are delayed into its input buffer.

A computing unit (CU) is a simple "processor" with no von Neumann store. Its actions are driven by data stored in a transition memory (TM). The TM's of all PE's have the same contents.

A CSC has a random access memory as part of the I/O module (Fig. 2) and any memory access passes through the DFC.
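The role of the input buffer in the remarks above can be sketched as follows. This is a minimal model, not the authors' design: the class name `ProcessingElement`, the `EMPTY` marker and the `cost` function (clock periods needed by one evaluation stage) are assumptions introduced for illustration, and the "evaluation" simply passes the waggon through unchanged.

```python
from collections import deque

EMPTY = None  # hypothetical marker for an empty waggon


class ProcessingElement:
    """Toy PE: one waggon in, one waggon out per clock period."""

    def __init__(self):
        self.buffer = deque()  # input buffer (IB) holding delayed waggons
        self.pending = None    # waggon currently being processed
        self.busy = 0          # clock periods left for the current stage

    def clock(self, waggon, cost=lambda w: 1):
        """Accept the incoming waggon and return the waggon emitted this period."""
        if waggon is not EMPTY:
            self.buffer.append(waggon)
        if self.busy == 0 and self.buffer:
            self.pending = self.buffer.popleft()
            self.busy = cost(self.pending)
        if self.busy > 0:
            self.busy -= 1
            if self.busy == 0:
                out, self.pending = self.pending, None
                return out
        return EMPTY  # still busy (or idle): an empty waggon leaves the PE


if __name__ == "__main__":
    pe = ProcessingElement()
    slow_a = lambda w: 3 if w == "a" else 1  # stage for "a" takes 3 periods
    print([pe.clock(w, slow_a) for w in ["a", "b", "c", EMPTY, EMPTY]])
```

While the stage for `"a"` is under way, `"b"` and `"c"` wait in the buffer and empty waggons leave the PE; the original order of the waggons is preserved on output.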

3. Functional Programming and the Circular Synchronous Computer

Backus proposed in [1] a new programming style in an attempt to avoid the main drawbacks of the "von Neumann style":

- Inflexible languages, closely coupled to the von Neumann computer model, with the CPU and the memory connected through a "von Neumann bottleneck" [1], which imply a word-at-a-time programming style.

- Languages with a bulky and rigid framework due to the close coupling between states and semantics, but with weak changeable parts.

- Formal proofs of programs are beyond current practice because languages lack useful mathematical properties.

The purpose of this section is to show that functional programs can be efficiently executed on a CSC.

3.1. FP systems

We shall introduce Functional Programming (FP) Systems [1] by means of an example. To this end a short description of the relevant FP elements is given.

Programs in an FP system are functions without variables. New functions can be built out of a set of primitive functions and the use of functional forms and definitions. All functions map objects into objects and have only one argument.

An FP system is made up of a set of objects, a set of functions, the operation of application, a set of functional forms and a set of definitions:

- objects: an object is either an atom $x$, a sequence $\langle x_1, \ldots, x_n \rangle$ with objects as elements, or $\bot$ ("undefined"). $\phi$ is the empty sequence, T is the "true" atom, F is the "false" atom;

- application: $f : x$ denotes an object, i.e. the result of applying a function $f$ to an object $x$;

- functions: a function is either primitive or defined or a functional form.

Examples of primitive functions:

- add:

$$+ : x \equiv x = \langle y, z \rangle \;\&\; y, z \text{ are numbers} \to y + z;\ \bot \tag{1}$$

(All definitions have the form $p_1 \to e_1; \ldots; p_n \to e_n; e_{n+1}$, which means: if one of the $p_i$, taken from left to right, is true, then $e_i$, else $e_{n+1}$. Thus (1) must be read: "+ applied to an $x$ has the following meaning: if $x$ is a sequence of two numbers $y$ and $z$, then $y + z$, otherwise it is undefined".)

- transpose:

$$\mathrm{trans} : x \equiv x = \langle \phi, \ldots, \phi \rangle \to \phi;\ x = \langle x_1, \ldots, x_n \rangle \to \langle y_1, \ldots, y_m \rangle;\ \bot \tag{2}$$

where $x_i = \langle x_{i1}, \ldots, x_{im} \rangle$, $y_j = \langle x_{1j}, \ldots, x_{nj} \rangle$, $1 \le i \le n$, $1 \le j \le m$.

Examples of functional forms:

- composition:

$$(f \circ g) : x \equiv f : (g : x) \tag{3}$$

- apply to all:

$$\alpha f : x \equiv x = \phi \to \phi;\ x = \langle x_1, \ldots, x_n \rangle \to \langle f : x_1, \ldots, f : x_n \rangle;\ \bot \tag{4}$$

Assuming a linear representation of matrices in row-major order, a function performing the addition of two matrices is defined as:

$$\text{Def } \mathrm{MA} \equiv (\alpha\alpha +) \circ (\alpha\,\mathrm{trans}) \circ \mathrm{trans} \tag{5}$$

Let two $2 \times 3$ matrices be $\langle \langle 1, 2, 3 \rangle, \langle 4, 5, 6 \rangle \rangle$ and $\langle \langle 10, 20, 30 \rangle, \langle 40, 50, 60 \rangle \rangle$. The function MA, when applied to the sequence with the two matrices as its elements, is evaluated as shown in Eq. (6), presented in Table 1.
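For readers who wish to experiment, definitions (1) to (5) can be mimicked in a few lines of Python. This is a sketch under stated assumptions: FP sequences are modelled as Python tuples, `BOTTOM` (here `None`) is a hypothetical stand-in for the undefined object, and the names `add`, `trans`, `compose` and `alpha` are ours rather than part of any FP implementation.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def is_num(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def add(x):
    """Definition (1): defined only on a pair of numbers."""
    if isinstance(x, tuple) and len(x) == 2 and all(is_num(v) for v in x):
        return x[0] + x[1]
    return BOTTOM


def trans(x):
    """Definition (2): transpose a sequence of equally long sequences."""
    if not isinstance(x, tuple) or not x or not all(isinstance(r, tuple) for r in x):
        return BOTTOM
    if all(len(r) == 0 for r in x):
        return ()
    if len({len(r) for r in x}) != 1:
        return BOTTOM
    return tuple(zip(*x))


def compose(*fs):
    """Definition (3), extended to several functions: (f o g o h):x = f:(g:(h:x))."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed


def alpha(f):
    """Definition (4): apply f to every element of a sequence."""
    return lambda x: tuple(f(e) for e in x) if isinstance(x, tuple) else BOTTOM


# Definition (5): matrix addition.
MA = compose(alpha(alpha(add)), alpha(trans), trans)

if __name__ == "__main__":
    m1 = ((1, 2, 3), (4, 5, 6))
    m2 = ((10, 20, 30), (40, 50, 60))
    print(MA((m1, m2)))
```

Note that `MA` contains no reference to matrix dimensions, so the same definition applies to any pair of equally shaped matrices.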

As one can easily notice, a functional program is data independent. No argument is named and, generally, it uses no variables; no order of argument evaluation is implied by the definition of a function. Complex functions (i.e. programs) may be built out of simpler ones by the use of functional forms and the operation of application. Thus FP offers a framework more appropriate for hierarchical construction of programs than procedural languages do. The advantages of the FP style over procedural languages are more striking if one compares the MA definition (5) with a sequence of statements performing the same task. In Fortran, one of the most widely used higher level languages, our two matrices would be added by:

      DO 1 I = 1, 2
      DO 1 J = 1, 3
    1 C(I, J) = A(I, J) + B(I, J)                                   (7)

with A, B the initial matrices and C the sum matrix. This program is "word at a time", i.e. the result is constructed by repetition of an assignment statement. Because it names its arguments (A, B, C) and the number of times its actions are performed is dependent upon the length of arguments, it is far from being general (in contrast with MA (5), which works for any pair of matrices). Of course, one can write a more general matrix add program than (7) by the use of procedures, but procedures imply clumsy declarations and parameter passing conventions which contribute greatly to the complexity of programs made out of such components. For a more detailed discussion on this subject and a full presentation of FP systems, the interested reader is referred to [1].

Table 1

$$
\begin{aligned}
\mathrm{MA} &: \langle \langle \langle 1,2,3 \rangle, \langle 4,5,6 \rangle \rangle, \langle \langle 10,20,30 \rangle, \langle 40,50,60 \rangle \rangle \rangle \\
&= (\alpha\alpha +) \circ (\alpha\,\mathrm{trans}) \circ \mathrm{trans} : \langle \langle \langle 1,2,3 \rangle, \langle 4,5,6 \rangle \rangle, \langle \langle 10,20,30 \rangle, \langle 40,50,60 \rangle \rangle \rangle \\
&= (\alpha\alpha +) : ((\alpha\,\mathrm{trans}) : (\mathrm{trans} : \langle \langle \langle 1,2,3 \rangle, \langle 4,5,6 \rangle \rangle, \langle \langle 10,20,30 \rangle, \langle 40,50,60 \rangle \rangle \rangle)) \\
&= (\alpha\alpha +) : ((\alpha\,\mathrm{trans}) : \langle \langle \langle 1,2,3 \rangle, \langle 10,20,30 \rangle \rangle, \langle \langle 4,5,6 \rangle, \langle 40,50,60 \rangle \rangle \rangle) \\
&= (\alpha\alpha +) : \langle \mathrm{trans} : \langle \langle 1,2,3 \rangle, \langle 10,20,30 \rangle \rangle,\ \mathrm{trans} : \langle \langle 4,5,6 \rangle, \langle 40,50,60 \rangle \rangle \rangle \\
&= (\alpha\alpha +) : \langle \langle \langle 1,10 \rangle, \langle 2,20 \rangle, \langle 3,30 \rangle \rangle, \langle \langle 4,40 \rangle, \langle 5,50 \rangle, \langle 6,60 \rangle \rangle \rangle \\
&= \langle \alpha + : \langle \langle 1,10 \rangle, \langle 2,20 \rangle, \langle 3,30 \rangle \rangle,\ \alpha + : \langle \langle 4,40 \rangle, \langle 5,50 \rangle, \langle 6,60 \rangle \rangle \rangle \\
&= \langle \langle + : \langle 1,10 \rangle, + : \langle 2,20 \rangle, + : \langle 3,30 \rangle \rangle, \langle + : \langle 4,40 \rangle, + : \langle 5,50 \rangle, + : \langle 6,60 \rangle \rangle \rangle \\
&= \langle \langle 11,22,33 \rangle, \langle 44,55,66 \rangle \rangle
\end{aligned}
\tag{6}
$$

Actually, any procedural language "simulates" at a higher level of abstraction the actions performed by the CPU of a von Neumann computer. If one is not convinced of this fact, the coding of (7) in any assembly language might be very instructive.

To conclude this short introduction to FP systems, we give some more definitions [1]:

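Primitive functions:

- equals:

$$\mathrm{eq} : x \equiv x = \langle y, z \rangle \;\&\; y = z \to T;\ x = \langle y, z \rangle \;\&\; y \ne z \to F;\ \bot \tag{8}$$

- multiply:

$$\times : x \equiv x = \langle y, z \rangle \;\&\; y, z \text{ are numbers} \to y \times z;\ \bot \tag{9}$$

- distribute from left:

$$\mathrm{distl} : x \equiv x = \langle y, \phi \rangle \to \phi;\ x = \langle y, \langle z_1, \ldots, z_n \rangle \rangle \to \langle \langle y, z_1 \rangle, \ldots, \langle y, z_n \rangle \rangle;\ \bot \tag{10}$$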

Functional forms:

- construction:

$$[f_1, \ldots, f_n] : x \equiv \langle f_1 : x, \ldots, f_n : x \rangle \tag{11}$$

- condition:

$$(p \to f; g) : x \equiv (p : x) = T \to f : x;\ (p : x) = F \to g : x;\ \bot \tag{12}$$

- insert:

$$/f : x \equiv x = \langle x_1 \rangle \to x_1;\ x = \langle x_1, \ldots, x_n \rangle \;\&\; n \ge 2 \to f : \langle x_1, /f : \langle x_2, \ldots, x_n \rangle \rangle;\ \bot \tag{13}$$
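Definitions (8) to (13) translate into Python in the same spirit. The sketch below makes the following assumptions: sequences are tuples, the atoms T and F are the Python booleans, `BOTTOM` (`None`) stands for the undefined object, and the function names are ours. Strict propagation of the undefined object through sequences is not modelled.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def is_pair(x):
    return isinstance(x, tuple) and len(x) == 2


def is_num(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def eq(x):
    """Definition (8)."""
    return (x[0] == x[1]) if is_pair(x) else BOTTOM


def mul(x):
    """Definition (9)."""
    return x[0] * x[1] if is_pair(x) and is_num(x[0]) and is_num(x[1]) else BOTTOM


def distl(x):
    """Definition (10): pair y with every element of the second sequence."""
    if is_pair(x) and isinstance(x[1], tuple):
        return tuple((x[0], z) for z in x[1])
    return BOTTOM


def construction(*fs):
    """Definition (11): [f1, ..., fn]:x = <f1:x, ..., fn:x>."""
    return lambda x: tuple(f(x) for f in fs)


def condition(p, f, g):
    """Definition (12): select f or g according to p:x."""
    def run(x):
        t = p(x)
        if t is True:
            return f(x)
        if t is False:
            return g(x)
        return BOTTOM
    return run


def insert(f):
    """Definition (13): right-to-left insertion of f between the elements."""
    def run(x):
        if not isinstance(x, tuple) or not x:
            return BOTTOM
        if len(x) == 1:
            return x[0]
        return f((x[0], run(x[1:])))
    return run


if __name__ == "__main__":
    print(distl((1, (2, 3))), insert(mul)((2, 3, 4)))
```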

3.2. Execution of Functional Programs on a CSC

The CSC is a functional language machine, i.e. it executes directly programs expressed in an FP form by making use of a circular pipeline of processing elements (PE's). Suppose that a program containing FP primitive functions and functional forms is injected into a CSC. The behaviour of a PE on detection of function symbols is detailed below. The actions performed by PE's are specified as follows:

$$S_I \Rightarrow S_O,$$

where $S_I$ = input string to $PE_i$, $S_O$ = output string out of $PE_i$ (input to $PE_{i+1}$).

Functional forms constitute the prime candidates for parallel (i.e. pipeline) evaluation. Apply to all, construction and condition are good examples in this respect.

- apply to all (4):

$$\alpha f : \langle x_1, \ldots, x_n \rangle \Rightarrow \langle \alpha f : \langle x_2, \ldots, x_n \rangle \,.\, f_1 : x_1 \rangle \tag{14}$$

(14) requires a few comments. $f_1$ is the result of the first (sometimes the only) step in the evaluation of $f$. Remember that $f$ may itself be a functional form which in its turn is evaluated step by step. The full stop separates the unevaluated portion from the partially evaluated one. Each PE "moves" the full stop to the left (in fact it applies $f$ to the first element in the actual argument sequence of the functional form and makes the result of the first evaluation step into the last element of the "evaluated" part). The full stop "disappears" when there are no other symbols to its left except for "$\langle$". As an example consider:

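$$
\begin{aligned}
\alpha\times : \langle \langle 3,4 \rangle, \langle 5,6 \rangle, \langle 7,8 \rangle \rangle
&\Rightarrow \langle \alpha\times : \langle \langle 5,6 \rangle, \langle 7,8 \rangle \rangle \,.\, 12 \rangle \\
&\Rightarrow \langle \alpha\times : \langle \langle 7,8 \rangle \rangle \,.\, 12, 30 \rangle \\
&\Rightarrow \langle 12, 30, 56 \rangle
\end{aligned}
$$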

Each PE extracts two operands from the input data and, as it is performing the multiplication, other PE's will fetch their operands etc. It is assumed that the multiplication time is longer than the pipeline clock period. Thus a PE is continuously scanning its input string. Under certain circumstances a parallel process is initiated: it extracts from the input string an operator (primitive function) and its evaluated argument in order to perform the function evaluation. As the program rotates into the pipeline, the place where the result of the evaluation is to be inserted is detected. If the operation is still under way, the input waggons are delayed into the input buffer and "empty" waggons leave the PE until it is able to emit the result. Since we are not concerned with implementation details, we do not make any assumptions about the actual representation of information in the CSC. An obvious requirement (which is easily met) is, however, that programs rotating in the system be deterministically parsable.
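The stage-by-stage behaviour of apply to all (14) can be traced with a short sketch. It assumes, for simplicity, that $f$ is evaluated in a single step (so $f_1 = f$) and ignores timing, buffering and the string representation; the pair `(pending, done)` is a hypothetical encoding of the parts to the left and to the right of the full stop.

```python
def alpha_step(f, pending, done):
    """One PE stage of (14): apply f to the first pending element and
    append the result to the evaluated part."""
    if not pending:
        return pending, done
    return pending[1:], done + (f(pending[0]),)


def run_pipeline(f, xs):
    """Return the successive (pending, done) states, one per PE stage."""
    pending, done = tuple(xs), ()
    trace = [(pending, done)]
    while pending:
        pending, done = alpha_step(f, pending, done)
        trace.append((pending, done))
    return trace


if __name__ == "__main__":
    times = lambda pair: pair[0] * pair[1]
    for pending, done in run_pipeline(times, ((3, 4), (5, 6), (7, 8))):
        print(pending, ".", done)
```

Each printed line corresponds to one step of the worked example above; the evaluation needs as many stages as there are elements in the argument sequence.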


With the above conventions, two other functional forms are detailed:

- construction (11):

$$[f_1, \ldots, f_n] : x \Rightarrow \langle [f_2, \ldots, f_n] : x \,.\, f_1 : x \rangle$$

- condition (12):

$$(p \to f; g) : x \Rightarrow (p : x \to f : x;\ g : x)$$

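Example:

$$
\begin{aligned}
&(\mathrm{eq} \circ \alpha + \to + \circ \alpha\times;\ + \circ \alpha + \circ [\alpha\times, \alpha +]) : \langle \langle 3,5 \rangle, \langle 7,1 \rangle \rangle \\
&\Rightarrow (\mathrm{eq} : (\alpha + : \langle \langle 3,5 \rangle, \langle 7,1 \rangle \rangle) \to + : (\alpha\times : \langle \langle 3,5 \rangle, \langle 7,1 \rangle \rangle);\ + : (\alpha + : ([\alpha\times, \alpha +] : \langle \langle 3,5 \rangle, \langle 7,1 \rangle \rangle))) \\
&\Rightarrow (\mathrm{eq} : \langle \alpha + : \langle \langle 7,1 \rangle \rangle \,.\, 8 \rangle \to + : \langle \alpha\times : \langle \langle 7,1 \rangle \rangle \,.\, 15 \rangle;\ + : (\alpha + : \langle \langle \alpha\times : \langle \langle 7,1 \rangle \rangle \,.\, 15 \rangle, \langle \alpha + : \langle \langle 7,1 \rangle \rangle \,.\, 8 \rangle \rangle)) \\
&\Rightarrow (\mathrm{eq} : \langle 8,8 \rangle \to + : \langle 15,7 \rangle;\ + : (\alpha + : \langle \langle 15,7 \rangle, \langle 8,8 \rangle \rangle)) \\
&\Rightarrow (T \to 22;\ + : \langle \alpha + : \langle \langle 8,8 \rangle \rangle \,.\, 22 \rangle) \\
&\Rightarrow 22
\end{aligned}
$$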

All functions from the example above are evaluated in parallel. This process somewhat resembles the "colonel-sergeants" strategy suggested in [6]. When the Boolean function is evaluated, its result selects either another result (if the evaluation of the function to be selected is already complete) or a process (if evaluation is under way). At the same time a result or an evaluation process is discarded.
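The idea that the predicate and both branches are evaluated at the same time, with one branch discarded afterwards, can be imitated on a conventional machine with threads. The sketch below is only an analogy, not a model of the CSC pipeline: it uses Python's standard `concurrent.futures` module, and `parallel_condition` together with the small FP helpers are names introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x): return x[0] + x[1]
def mul(x): return x[0] * x[1]
def eq(x): return x[0] == x[1]
def alpha(f): return lambda x: tuple(f(e) for e in x)
def construction(*fs): return lambda x: tuple(f(x) for f in fs)


def compose(*fs):
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed


def parallel_condition(p, f, g):
    """Start p, f and g together; the value of p selects which result is kept."""
    def run(x):
        with ThreadPoolExecutor(max_workers=3) as pool:
            fp, ff, fg = pool.submit(p, x), pool.submit(f, x), pool.submit(g, x)
            chosen, dropped = (ff, fg) if fp.result() else (fg, ff)
            dropped.cancel()  # has an effect only if the task has not started yet
            return chosen.result()
    return run


# (eq o alpha+  ->  + o alpha x ;  + o alpha+ o [alpha x, alpha+])
program = parallel_condition(
    compose(eq, alpha(add)),
    compose(add, alpha(mul)),
    compose(add, alpha(add), construction(alpha(mul), alpha(add))),
)

if __name__ == "__main__":
    print(program(((3, 5), (7, 1))))
```

For the argument of the worked example the predicate yields T, so the result of the first branch is kept and the second branch is dropped.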

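Some primitive functions, too, may be evaluated by the use of the pipeline:

- transpose (2):

$$\mathrm{trans} : \langle \langle x_{11}, \ldots, x_{1m} \rangle, \ldots, \langle x_{n1}, \ldots, x_{nm} \rangle \rangle \Rightarrow \langle \mathrm{trans} : \langle \langle x_{12}, \ldots, x_{1m} \rangle, \ldots, \langle x_{n2}, \ldots, x_{nm} \rangle \rangle \,.\, \langle x_{11}, \ldots, x_{n1} \rangle \rangle$$

- distribute from left (10):

$$\mathrm{distl} : \langle y, \langle z_1, \ldots, z_n \rangle \rangle \Rightarrow \langle \mathrm{distl} : \langle y, \langle z_2, \ldots, z_n \rangle \rangle \,.\, \langle y, z_1 \rangle \rangle$$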
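The two stepwise rules above can be sketched in the same style as apply to all. The sketch assumes rectangular input for the transpose, models sequences as tuples, and keeps the unevaluated and evaluated parts as separate values; all function names are hypothetical.

```python
def trans_step(rows, done=()):
    """One stage: peel the first column off every row and append it to `done`."""
    if not rows or not rows[0]:
        return (), done
    column = tuple(r[0] for r in rows)
    rest = tuple(r[1:] for r in rows)
    if not rest[0]:       # all columns consumed
        rest = ()
    return rest, done + (column,)


def distl_step(y, zs, done=()):
    """One stage: pair y with the first remaining z and append the pair to `done`."""
    if not zs:
        return zs, done
    return zs[1:], done + ((y, zs[0]),)


def trans_pipeline(rows):
    done = ()
    while rows:
        rows, done = trans_step(rows, done)
    return done


def distl_pipeline(y, zs):
    done = ()
    while zs:
        zs, done = distl_step(y, zs, done)
    return done


if __name__ == "__main__":
    print(trans_pipeline(((1, 2, 3), (4, 5, 6))))
    print(distl_pipeline("y", (1, 2, 3)))
```

A matrix with m columns is transposed in m stages, and a distribute over n elements takes n stages, so both operations spread naturally over consecutive PE's.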

4. The Data Flow Controller

The data flow controller (DFC) performs all I/O and memory functions of a CSC, injects programs into the processing bus, performs bookkeeping operations at each rotation, maintains a library of function definitions and translates "source" programs.

Suppose that a source program is introduced into the system via an input device coupled to the DFC. The program may contain new function definitions, functional forms with primitive and library functions, I/O and memory read/write operations. Function definitions are stored into the library and non-primitive function symbols are replaced by their library definitions (if such definitions exist; otherwise the program is aborted). As any definition may contain non-primitive functions, the translation process formulated above continues until the program is a string of functional forms with primitive functions, I/O and memory operations. The program is now ready for execution and it can be injected into the processing bus one waggon per bus clock period. Since the length of a real program expressed as a number of waggons is supposed to be greater than the number of PE's, waggons already processed complete their first rotation before the end of the program injection; they are delayed into a FIFO store until the last waggon of the program is introduced into the processing bus. Then the waggons that represent the intermediate form leave the head of the FIFO store in order to be introduced into the bus for the second rotation. If one of these waggons specifies an I/O or memory operation and if that operation can be performed, then it is performed before new waggons are introduced into the bus. A read operation means the extraction of a substring from the program string and storing of that substring into the memory. The DFC repeats the execution cycle outlined above until it detects the program completion.
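The translation step described above (replacing non-primitive function symbols by their library definitions until only primitives remain) can be sketched as a token-level expansion. Everything concrete here is an assumption made for illustration: the token spelling, the `PRIMITIVES` and `FORM_SYMBOLS` sets, the library contents and the exception name. Recursive definitions, which would make this simple expansion loop forever, are not handled.

```python
PRIMITIVES = {"+", "x", "eq", "trans", "distl"}               # assumed primitive set
FORM_SYMBOLS = {"o", "alpha", "/", "(", ")", "[", "]", ","}   # assumed form syntax


class UndefinedFunction(Exception):
    """Raised when a symbol is neither primitive nor in the library (program aborted)."""


def expand(program, library):
    """Replace library symbols by their definitions until only primitives remain."""
    out = []
    stack = list(reversed(program))
    while stack:
        token = stack.pop()
        if token in PRIMITIVES or token in FORM_SYMBOLS:
            out.append(token)
        elif token in library:
            # A definition may itself contain non-primitive symbols: rescan it.
            stack.extend(reversed(library[token]))
        else:
            raise UndefinedFunction(token)
    return out


# Hypothetical library: MA is defined in terms of another library function.
LIBRARY = {
    "ADDROWS": ["alpha", "alpha", "+"],
    "MA": ["(", "ADDROWS", ")", "o", "(", "alpha", "trans", ")", "o", "trans"],
}

if __name__ == "__main__":
    print(" ".join(expand(["MA"], LIBRARY)))
```

The expanded token list is what would then be cut into waggons and injected into the processing bus.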

The execution of a functional program on a CSC with a number of PE's that allows its evaluation in a single rotation is shown in Fig. 4. The program [1] performs the inner product of two vectors:

$$\text{Def } \mathrm{IP} \equiv (/+) \circ (\alpha\times) \circ \mathrm{trans}$$

Fig. 5 details the evaluation of the same IP function on a CSC with fewer PE's, where several rotations are needed. The representation of information could be rather different in an actual implementation. In particular, one would choose to represent more than one separator per waggon.
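Since Figs. 4 and 5 trace the IP program, it may help to see what the function computes. The sketch below evaluates IP on a conventional machine, with tuples for sequences and helper names of our own choosing; it says nothing about the waggon-level representation shown in the figures.

```python
def add(x): return x[0] + x[1]
def mul(x): return x[0] * x[1]
def trans(x): return tuple(zip(*x))
def alpha(f): return lambda x: tuple(f(e) for e in x)


def insert(f):
    """/f as in definition (13): f:<x1, /f:<x2, ..., xn>>."""
    def run(x):
        return x[0] if len(x) == 1 else f((x[0], run(x[1:])))
    return run


def compose(*fs):
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed


# Def IP = (/+) o (alpha x) o trans
IP = compose(insert(add), alpha(mul), trans)

if __name__ == "__main__":
    pairs = trans(((1, 2, 3), (6, 5, 4)))   # <<1,6>, <2,5>, <3,4>>
    products = alpha(mul)(pairs)            # <6, 10, 12>
    print(pairs, products, IP(((1, 2, 3), (6, 5, 4))))
```

The three intermediate values correspond to the three functions in the definition, applied from right to left.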

Fig. 4. Execution of the inner product program on a CSC with 7 PE's. (Axis label: clock period.)

Fig. 5. Execution of the inner product program on a CSC with 3 PE's. (Labels: rotation number, clock period, PE 1 to PE 3.)

In fact we have only sketched out the structure of a CSC and the way it could be used to evaluate functional programs. As a final remark, not all of the program need rotate in the pipeline. The DFC could select the innermost functions and wait for their (partial) evaluation before it injects outer functions, or it could make use of some other strategy to dynamically control the length of the "active" (i.e. rotating in the pipeline) part of the program.

5. Conclusions

The CSC is a model, but we intend to use it as a starting point in the design of an experimental computing system dedicated to research work in the area of functional programming.

The set of primitive functions is specified in the transition memories (TM) of the processing elements. We think of introducing as "resident" primitive functions only some write operations into TM's. Thus the set of primitive functions, i.e. the set of CSC "machine instructions", may be dynamically changed and a CSC becomes a collection of virtual machines that implement a family of FP-like languages. A TM alteration may occur at any moment in the course of the execution of a program.

The addition of any number of PE's is transparent on all levels of a CSC. Moreover, no competition exists among PE's. A PE has a simpler structure as compared to a conventional processor.

During its execution a program is successively transformed until the processing bus (pipeline) remains empty (the final results are normally stored into the memory). We only mention now this potentially interesting feature not to be found in "classical" computing systems.

While procedural languages are based on von Neumann machines, fundamentally incompatible with distributed computing systems [5], functional programs are suited for parallel evaluation. Other functional forms [2, 6] that could be embedded in an FP system lend themselves to pipeline evaluation.

Acknowledgement

The authors wish to thank the referee for his helpful comments.

References

[1] J. Backus, Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs, Comm. ACM 21, Nr. 8 (1978) 613-641.

[2] A. Chiarini, On FP Languages Combining Forms, SIGPLAN Notices 15, Nr. 9 (1980) 25-27.

[3] O.J. Dahl, E.W. Dijkstra, C.A.R. Hoare, Structured Programming (Academic Press, 1972).

[4] P.H. Enslow Jr. (ed.), Multiprocessors and Parallel Processing (John Wiley, New York, 1974).

[5] M.J. Flynn and J.L. Hennessy, Parallelism and Representation Problems in Distributed Systems, IEEE Trans. Comput. C-29, Nr. 12 (1980) 1080-1086.

[6] D.P. Friedman and D.S. Wise, Aspects of Applicative Programming for Parallel Processing, IEEE Trans. Comput. C-27, Nr. 4 (1978) 289-296.

[7] A.K. Jones and P. Schwartz, Experience Using Multiprocessor Systems - A Status Report, Computing Surveys 12, Nr. 2 (1980) 121-165.

[8] C.V. Ramamoorthy and H.F. Li, Pipeline Architecture, Computing Surveys 9, Nr. 1 (1977) 61-102.

[9] D. Tomescu, C. Băleanu and A. Botta, The Circular Synchronous Bus, Microprocessing and Microprogramming 7, Nr. 5 (1981) 344-350.

Dan Tomescu was born in Bucuresti, Romania, in 1952. He graduated (MS) in Computer Science at Bucuresti Polytechnic Institute. Since 1977 he has been a research engineer at Bucuresti Polytechnic Institute. His interests include computer architecture, programming languages and operating systems.

Cristian Băleanu was born in Bucuresti, Romania, in 1952. He graduated (MS) in Electronics Engineering at Bucuresti Polytechnic Institute where he has been a research engineer since 1977. His interests include computer architecture, microprogramming and microprocessor based systems and he is co-author of a book on computer peripherals.
