
Reprinted from Behavioral Science, Volume 7, No. 2, April, 1962. Printed in U.S.A.

COMPUTERS IN BEHAVIORAL SCIENCE

Publication of this department is partially supported by a grant from the National Science Foundation to the chairman of its editorial committee, Steven G. Vandenberg.

An Experimental Course in Simulation of Cognitive Processes, Edward A. Feigenbaum, School of Business Administration, University of California, Berkeley.

In the fall of 1960 a graduate seminar in computer simulation of cognitive processes was offered by the Graduate School of Business Administration, University of California, Berkeley. The course, given on an experimental basis, was supported by a grant from the Committee on Simulation of Cognitive Processes of the Social Science Research Council as part of their program of sponsoring the initiation of educational efforts in this research area.

Thirteen students were registered in the course. An equal number of auditors (graduate students and faculty members) attended regularly. University departments and research facilities represented in the group were: mathematics, engineering, psychology, physics, biophysics, business administration, linguistics, philosophy, Lawrence Radiation Laboratory, the Berkeley Computer Center, and the Survey Research Center.

The course was designed with three objectives in mind: first, to survey past and present research in simulation of cognitive processes (broadly construed to include artificial intelligence research); second, to teach the technological and conceptual tools necessary for independent research in the area; third, to proceed in detail through an existing large simulation to show how the tools may be applied to a particular research problem.

Following this plan, about half of the total time of the seminar was devoted to formal instruction in the use of the list language Information Processing Language 5 (1961); informal "workshop" instruction dealing with IPL 5 programming problems generated by the students themselves; and a thorough presentation of the Elementary Perceiver and Memorizer (Feigenbaum, 1961), a simulation of human verbal learning behavior (a discussion which proceeded from general conceptual features of EPAM to the actual IPL programming which realized the simulation). Students were also acquainted with other list languages: LISP, Fortran List Processing Language, and COMIT.

The remainder of the course was devoted to: a discussion of theories of cognition, construction of models, problems in the collection of empirical data, criteria for reasonable models, and verification problems; a review of previous and current research studies, including fairly complete treatments of the Newell-Shaw-Simon General Problem Solver (1960) and Feldman's Binary Choice Simulation (1961); and a discussion of the artificial intelligence literature, including Tonge's Line Balancing Program (1961), Gelernter's Geometry Theorem Prover (1960), chess playing programs, associative memories, information retrieval problems, and a number of pattern recognition and learning machines (for references to this literature, see Minsky, 1961). A special lecture by visiting Professor of Philosophy Yehoshua Bar-Hillel on mechanical translation of languages was arranged.

Each student was required to carry through a computer simulation project during the semester. Other student projects treated character recognition, list-language programming problems, chess end game problems, and the game of Shogi (Japanese chess).

Computer time for student projects was made available by the Berkeley Computing Center (30.5 hours on an IBM 704).

The quality of the completed student projects was excellent and afforded further proof that the techniques and concepts of simulation of cognitive processes can be successfully communicated in one semester of a graduate seminar.


The graduate course just described was offered in substantially the same form by Professor Julian Feldman in the spring of 1961. Regular attendance (students and auditors) was about twenty persons. In May, 1961, the course was made a regular offering by the faculty of the School of Business Administration and will be permanently incorporated in the Berkeley curriculum as the spring semester of B.A. 210, Applications of Digital Computers to Problems in the Social Sciences (the fall semester of which deals with simulation of economic and industrial behavior).

Five reports on research done by graduate students in the course follow which are of particular interest and significance. The first two papers, by students of psychology, concern studies in concept attainment, using as a point of departure Bruner, Goodnow, and Austin's book, A Study of Thinking. The third paper, by a student of biophysics, summarizes an attempt to explore the consequences of certain assumptions about neural processes and organization. The last two papers are oriented more toward artificial intelligence than simulation of human cognition. The fourth paper, by a student of mathematics, deals with an interesting facet of the problem of analytic integration by machine. The fifth paper, by a member of the staff of the Lawrence Radiation Laboratory, is a report of an important attempt to program pattern recognition heuristics for the problem of automatic identification of interesting nuclear events in 3-dimensional bubble chamber photographs. The inability to process these photographs fast enough is perhaps the most important data-processing problem faced by high-energy physics today.

REFERENCES

Bruner, J. S., Goodnow, J. J., & Austin, G. A. A study of thinking. New York: John Wiley, 1956.

Feigenbaum, E. A. The simulation of verbal learning behavior. Proc. 1961 Western Joint Computer Conference, ACM-AIEE-IRE, 1961, pp. 121-131.

Feldman, J. Simulation of behavior in the binary choice experiment. Proc. 1961 Western Joint Computer Conference, ACM-AIEE-IRE, 1961, pp. 133-144.

Gelernter, H. Realization of a geometry theorem proving machine. Proc. International Conference on Information Processing, UNESCO, 1959. London: Butterworths, 1960, pp. 275-282.

Information Processing Language 5 manual. Englewood Cliffs: Prentice-Hall, 1961.

Minsky, M. Steps toward artificial intelligence. Proc. Institute of Radio Engineers, 1961, 49, pp. 8-30.

Newell, A., Shaw, J. C., & Simon, H. A. Report on a general problem solver. Proc. International Conference on Information Processing, UNESCO, 1959. London: Butterworths, 1960, pp. 256-264.

Tonge, F. A heuristic program for assembly line balancing. Englewood Cliffs: Prentice-Hall, 1961.

A Simulation Program for Concept Attainment by Conservative Focusing, Wayne A. Wickelgren, University of California, Berkeley.

This is a summary of an information-processing model for conservative focusing in a concept-attainment task (Bruner, Goodnow, & Austin, 1956). S is presented with an array of cards differing on n attributes with m possible values for each attribute. Some of these cards are exemplars of a concept that E has in mind, and the rest are not. S's task is to discover the concept, that is, the basis of classification as exemplar or nonexemplar. The model is restricted to apply only to situations in which the permissible concepts are conjunctive combinations of one particular value for each of several relevant attributes (e.g., all cards with red circles). Disjunctive concepts (e.g., all cards with a red circle or a black square) and relational concepts (e.g., all cards with a greater number of figures than borders) are excluded, and S is aware of this restriction. S is first presented with a positive instance of the concept called the focus card, and is permitted to obtain further information about the concept by selecting cards from the array of all possible combinations of values for each attribute and being told whether the card is an example of the concept. The conservative focusing strategy selects cards differing on only one attribute from the focus card and systematically determines the relevance or irrelevance of every attribute.


Both cards and concepts are represented in the IPL 5 computer by lists of attributes, which in turn have description lists that contain both attributes and values. The simulation program is divided into a subject program and an experimenter program. The experimenter program has access to the list of relevant attributes, whose description list contains the correct values. The subject program has access to only the lists of possible values for each attribute and the focus card, whose description list contains the attributes and values of the initial positive instance of the concept.
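The division just described can be sketched in a modern language (Python here, since IPL 5 itself is long obsolete). The dictionaries, attribute names, and the `classify` routine below are illustrative stand-ins of my own, not the article's actual symbols.

```python
# Rough analogue of the data layout described above: dicts standing in for
# IPL 5 description lists. All names and values are illustrative.

# Experimenter program's private data: relevant attributes with correct values.
experimenter_concept = {"color": "red", "shape": "circle"}

# Subject program's data: possible values per attribute, plus the focus card
# (a known positive instance of the concept).
possible_values = {"color": ["red", "green", "blue"],
                   "shape": ["circle", "square", "cross"],
                   "borders": [1, 2, 3]}
focus_card = {"color": "red", "shape": "circle", "borders": 2}

def classify(card):
    """Experimenter routine: is the card an exemplar of the concept?"""
    return all(card[a] == v for a, v in experimenter_concept.items())

print(classify(focus_card))                                           # True
print(classify({"color": "green", "shape": "circle", "borders": 2}))  # False
```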

The subject program has two major subdivisions: an intratask problem-solving program and an intertask learning program. The problem-solving program is concerned with the solution of a given concept-attainment task, and the learning program is concerned with the improvement of performance over a set of tasks.

The only learning feature built into the program so far is change of AO, the attribute order, which determines the order in which the subject tests the relevance of the possible attributes. The learning mechanism promotes by one the position (in the priority sequence) of the attributes that proved relevant in the preceding task. It should be noted that, given a conservative focusing strategy and the type of concept-attainment tasks that are being simulated at present, the learning mechanism does not improve performance if this is judged by the number of trials necessary to attain the concept. However, if performance is measured by accuracy of guessing the concept after a fixed number of trials, and certain reasonable assumptions are made about an individual's guessing habits under incomplete information, then this learning mechanism will exploit any persistent biases in the experimenter's selection of relevant attributes. The learning mechanism would be of greatest benefit to a focus gambling or successive scanning strategy, and its inclusion in this conservative focusing model is a consequence of the assumption that the most frequently relevant attributes will gradually assume increased saliency, no matter what the initial saliency ordering is.
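A minimal sketch of this promote-by-one reordering, under the assumption that each relevant attribute moves up one position and that relevant attributes do not pass one another (the list encoding and names are mine):

```python
# Sketch of the intertask learning mechanism: attributes found relevant in the
# preceding task are promoted one position in the attribute order AO.

def promote_relevant(attribute_order, relevant):
    """Move each relevant attribute up one place in the priority sequence."""
    order = list(attribute_order)
    for i in range(1, len(order)):
        # Swap only past a non-relevant neighbor, so two relevant attributes
        # keep their relative order.
        if order[i] in relevant and order[i - 1] not in relevant:
            order[i - 1], order[i] = order[i], order[i - 1]
    return order

ao = ["size", "color", "shape", "borders"]
print(promote_relevant(ao, {"color", "shape"}))
# ['color', 'shape', 'size', 'borders']
```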

The problem-solving program has three logical subdivisions: card selection, information storage, and information evaluation relative to the concept-attainment task. The basic flow of the simulation is: card selection, confirmation or infirmation by the experimenter, information storage, and testing to determine if sufficient information has been obtained to discover the concept. When the answer to this evaluation is yes, the subject generates the concept and exits from the cycle.
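The cycle might look like this in outline (an illustrative Python stand-in, not the IPL 5 program; the selection rule shown is the one-attribute variation of conservative focusing described earlier):

```python
# Sketch of the problem-solving cycle: select a card, get confirmation or
# infirmation from the experimenter, store the result, test for completion.

def solve(focus_card, possible_values, is_exemplar):
    untested = list(focus_card)            # attribute order
    memory = {}                            # attribute -> relevant? (bool)
    while untested:
        attr = untested.pop(0)             # card selection: vary one attribute
        probe = dict(focus_card)
        probe[attr] = next(v for v in possible_values[attr]
                           if v != focus_card[attr])
        positive = is_exemplar(probe)      # experimenter's verdict
        memory[attr] = not positive        # infirmation => attribute relevant
    # Evaluation: all attributes tested, so generate the concept.
    return {a: focus_card[a] for a, rel in memory.items() if rel}

# Hypothetical task: the concept is "circles," regardless of color.
concept = {"shape": "circle"}
oracle = lambda card: all(card[a] == v for a, v in concept.items())
focus = {"color": "red", "shape": "circle"}
vals = {"color": ["red", "green"], "shape": ["circle", "square"]}
print(solve(focus, vals, oracle))  # {'shape': 'circle'}
```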

The heart of any simulation program for this type of concept-attainment task is the type of memory structure it posits. In the development of the information-storage routine the following assumptions were made about human memory, borrowing extensively from Miller (1956):

1. Immediate memory can contain without loss only a highly limited number of chunks of information.

2. The maximum number of chunks that can be stored in immediate memory is a parameter varying over individuals, but in the vicinity of seven.

3. The amount of information contained in a chunk does not affect the immediate memory capacity for chunks.

4. Humans solve problems requiring extensive storage of information by the construction of chunks richer in information than the "atomic" chunks in which the problem is presented.

5. These richer chunks are names of lists of poorer chunks.

6. There are limits to this abstraction process.

7. There is a semipermanent memory structure which is the subject program itself and is conceived to be a part of the structure of the individual.

8. Task instructions and the initial focus card temporarily become part of this semipermanent memory structure and do not need to be channeled through immediate memory to be recalled.
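Assumptions 1-6 can be caricatured as follows. The capacity value, the pairing rule, and all class and method names are illustrative assumptions of mine, not the article's IPL structures:

```python
# Sketch of a chunk-limited immediate memory: when capacity is exceeded, one
# chunk is lost and the rest are recoded into richer chunks (pairs of results).

class ImmediateMemory:
    def __init__(self, capacity=7):
        self.capacity = capacity
        self.chunks = []          # each chunk: an (attribute, relevant?) pair

    def store(self, attr, relevant):
        self.chunks.append((attr, relevant))
        if len(self.chunks) > self.capacity:
            self.chunks.pop(0)    # one chunk pops out of the full "box"
            self.restructure()

    def restructure(self):
        # Recode into second-level chunks: each new chunk names two first-level
        # results, so the same capacity now holds twice the information.
        self.chunks = [tuple(self.chunks[i:i + 2])
                       for i in range(0, len(self.chunks), 2)]

mem = ImmediateMemory(capacity=3)
for attr in ["color", "shape", "size", "borders"]:
    mem.store(attr, attr in {"color", "shape"})
print(len(mem.chunks))  # 2 second-level chunks remain
```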

The primary memory structure combines an attribute with a symbol IO or LO for irrelevant or relevant, and this pair of symbols is the basic chunk which is stored in immediate memory, MO. The attribute is in MO, and the list of pairs is the description list of MO. If the number of attributes tested exceeds the primary capacity of MO, then the subject program restructures immediate memory by constructing richer chunks of information. These chunks are new symbols naming lists containing two attributes each, and the attributes on these new lists are considered to be chunks representing both the attribute and its relevance or irrelevance to the concept. Depending upon the immediate memory capacity and the complexity of the problem, the subject program may operate entirely at the first level of abstraction or initially at the first and later at the second level of abstraction. If the number of attributes exceeds both primary and secondary memory capacity, then the model assumes the subject is unable to solve the problem. This is an acknowledged oversimplification. Another oversimplification is that loss of exactly one chunk occurs at exactly one point, when the subject program exceeds initial memory capacity. The model assumes that the subject does not restructure memory until he has exceeded capacity and forgotten one chunk. The intuitive notion is of a box with a capacity for n blocks such that, when the n+1st block is forced in, one block has to pop out somewhere. Realizing the information loss, the subject restructures memory in order to proceed with the task.

The card selection mechanism is really a card generator, in that it constructs a card out of attributes and values rather than selecting from an array of cards read into the computer as input. The card selector first chooses a new attribute to be tested from UO, the list of untested attributes. UO is ordered at the beginning of the task by AO. Both AO and UO are presumed to be part of the temporary hardware of the organism and are not channeled through MO. A value for the currently tested attribute is selected randomly from the list of possible values, excluding the value on the focus card. Then the rest of the untested attributes and the relevant attributes are added with their focus values, and the irrelevant attributes are added with any admissible values. A card is selected on the basis of information contained in the immediate memory structure, the list of untested attributes, and the focus card.
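A sketch of that generation rule, assuming a dict-based card representation; the function signature and names are illustrative, with a dict standing in for the memory structure and a set for the untested list:

```python
# Sketch of the card-generation step: the currently tested attribute gets a
# random non-focus value, untested and relevant attributes keep their focus
# values, and irrelevant attributes may take any admissible value.
import random

def generate_card(tested_attr, focus_card, possible_values, memory, untested):
    card = {}
    for attr in focus_card:
        if attr == tested_attr:
            card[attr] = random.choice(
                [v for v in possible_values[attr] if v != focus_card[attr]])
        elif attr in untested or memory.get(attr):   # untested or relevant
            card[attr] = focus_card[attr]
        else:                                        # known irrelevant
            card[attr] = random.choice(possible_values[attr])
    return card

focus = {"color": "red", "shape": "circle", "size": "large"}
vals = {"color": ["red", "green"],
        "shape": ["circle", "square"],
        "size": ["large", "small"]}
# color is known relevant, size is still untested, shape is being tested:
print(generate_card("shape", focus, vals, {"color": True}, {"size"}))
# {'color': 'red', 'shape': 'square', 'size': 'large'}
```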

The evaluation program is quite simple. After each trial it merely tests to see if all the attributes have been tested and remembered, recycles if they have not, and produces the concept from immediate memory and the focus card if they have.

The simulation program has been debugged for operation under both primary and secondary memory structures, and the results are qualitatively very similar to those obtained with human conservative focusers. Nevertheless it is possible to point up a number of features that need modification. First, the model probably forgets less often than normal human subjects. Second, the model never codes an attribute incorrectly or remembers it incorrectly. The program either remembers correctly whether an attribute was determined to be relevant, or it does not remember at all. Third, the model never offers an incorrect hypothesis before it possesses complete information, and human subjects sometimes do this.

REFERENCES

Bruner, J. S., Goodnow, J. J., & Austin, G. A. A study of thinking. New York: John Wiley, 1956.

Miller, G. A. The magical number seven. Psychol. Rev., 1956, 63, 81-97.

A Concept Attainment Program that Simulates a Simultaneous-Scanning Strategy, Max Allen, University of California, Berkeley.

INTRODUCTION

Bruner, Goodnow, and Austin (1956) have suggested that humans employ four basic strategies in concept attainment: (1) simultaneous scanning, (2) successive scanning, (3) conservative focusing, and (4) focus gambling. The programs described below are simulations of the simultaneous scanning strategy demonstrated in one of their experimental situations. The situation consists of an experimenter E, a subject S, and a finite array of instances known to both E and S. The instances in this case are 27 cards, each of which may vary in three dimensions: (1) shape of the figures, (2) color of figures, and (3) number of figures. Each of the three dimensions in turn may vary in value. The shape dimension has three possible values, circle, square, and triangle; while the hue dimension has two values, black or white. The number-of-figures dimension may take the values 1 or 2.

In simultaneous scanning E thinks of a concept (e.g., all cards with two circles) and S attempts to attain the concept by choosing instances and asking E whether the instance chosen is an exemplar of the concept. Each instance serves as an opportunity for S to deduce which concepts are still tenable and which have been eliminated. Note that the strategy requires S to keep all possible concepts in memory. As can be readily surmised, this strategy, although quite efficient for small arrays of instances which have only a few derived concepts, fails for arrays in which the possible number of derived concepts is large. It does, then, have limited use as an "everyday" working strategy in human concept attainment.
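A rough rendering of the strategy in Python (the dimension values follow the card array described above; the function names, the concept-enumeration helper, and the sample instances are mine):

```python
# Sketch of simultaneous scanning: S holds every possible conjunctive concept
# in memory and, after each tested instance, drops the concepts ruled out.
from itertools import combinations, product

def all_conjunctive_concepts(dimensions):
    """Every conjunction of one value per chosen subset of dimensions."""
    concepts = []
    dims = list(dimensions)
    for r in range(1, len(dims) + 1):
        for subset in combinations(dims, r):
            for values in product(*(dimensions[d] for d in subset)):
                concepts.append(dict(zip(subset, values)))
    return concepts

def matches(concept, card):
    return all(card[d] == v for d, v in concept.items())

def eliminate(tenable, card, positive):
    """Keep only concepts consistent with this instance's outcome."""
    return [c for c in tenable if matches(c, card) == positive]

dims = {"shape": ["circle", "square", "triangle"],
        "hue": ["black", "white"],
        "number": [1, 2]}
tenable = all_conjunctive_concepts(dims)
# A positive instance of the (hidden) concept "black figures, two of them":
tenable = eliminate(tenable, {"shape": "circle", "hue": "black", "number": 2}, True)
# A negative instance:
tenable = eliminate(tenable, {"shape": "square", "hue": "black", "number": 2}, False)
print(len(tenable))  # 4 concepts remain tenable
```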

SOME ASSUMPTIONS

The following assumptions were made concerning the concept-attainment behavior typical of the simultaneous-scanning situation:

1. Humans do not utilize negative information as well as positive information. The two types of information may be differentiated as follows: a negative instance (i.e., a card which is not an exemplar of the concept to be attained) is said to transmit negative information, while a positive instance (a card which is an exemplar of the concept) transmits positive information.

2. Each instance is used as an occasion for deducing which concepts are still tenable and which have been eliminated. This assumption is subject to some modification as a result of the implications of assumption 1. Because subjects are assumed to use positive information better than negative, the first version of the program regulated the use of instances according to the character of the instance. If the instance was positive, it was utilized to deduce which concepts were still tenable in the light of this new information and which were untenable. If the instance was negative, no deductions concerning tenable and untenable concepts were made; the instance was merely deleted from list L4 (the nature of L4 will be explained below). The second version of the program satisfied the unmodified assumption; each instance, both positive and negative, served as an occasion for deducing the tenable and untenable concepts.

3. All possible concepts are available (i.e., they are easily retrievable) in the subject's memory. This implies that S knows all of the concepts derivable from the array of instances (the 27 cards) and can recall them at will from memory.

4. Information gain is not maximized by humans employing the strategy. Ideally, the choice of the next instance will be determined by the objective of eliminating as many still-tenable hypotheses as possible. In "real human behavior," however, this is almost impossible, as it requires careful and complex analyses of the results of picking each card as the next instance to be tested (a test is here defined as choosing a card, and determining whether it is a positive or a negative instance). Without pencil and paper such analysis, even when the number of instances is small and the derived concepts few, is well beyond the capabilities of most subjects.

DESCRIPTION OF THE PROGRAMS

The 27 instances in the stimulus array are represented in the program (written in the IPL 5 language) by the regional symbols I1 through I27. The 60 conjunctive concepts derived from the dimensions and their values are represented in the program as description lists, the names of the concept lists being the regional symbols B1 through B60. The main list of any concept list contains the attributes and values specifying the concept.

Two subroutines simulating the experimenter and the subject are operative in the program. The only function of the experimenter routine (E0) is to inform the subject routine (S0) as to the positive or negative character of the instance S0 designated for testing. This information is then utilized by S0 in determining which concepts are tenable and which are to be eliminated.

Before examining the program more closely, the following extensively used lists must be briefly described.

L0: A describable list of all 27 instances.

L1: A describable list containing all positive instances of the concept to be attained. Only the experimenter routine (E0) has access to L1.

L3: Initially a describable list of the names of all 60 concepts. As the program proceeds L3 is shortened as untenable concepts are eliminated. At the end of the program L3 contains only the name of the concept to be attained.

L4: Initially an empty list. As the program proceeds, instances generated from the still-tenable concept lists are added to L4 if they are not already on it. L4 can then contain both positive and negative instances. It serves as a source of instances to be used next by S0 for deducing which concepts have been eliminated. As the program proceeds, the negative instances are deleted from L4 so that when the program runs to completion, L4 contains only the positive instances of the concept to be attained.

At the start of the programs an initial instance is input to the subject routine S0. S0 asks the experimenter routine (E0) if the instance is positive or negative. If the instance is negative S0 goes to the instance list (L0) and gets another instance to test. This operation is repeated until a positive instance is found.

When a positive instance is found from L0, or if the initial instance is positive, the instance is placed on L4. Then the main list of each concept list (B1 through B60) is searched for the positive instance. If found, the next instance on the main list of that particular concept list is placed on L4 if the new instance is not already on L4. If the positive instance is not found on a particular concept list structure, that list structure is erased and the name of that concept is deleted from L3. When the end of L3 is reached, the next instance-to-be-processed from L4 is found and the process is repeated. The result, then, is that L3 is always being shortened, until finally only the name of the concept to be attained remains. L4, on the other hand, starts empty, grows in length to some maximum value (it contains both positive and negative instances), and then decreases in length as the negative instances are eliminated. L3 is printed out after each deletion of a concept from the concept list; L4 is printed out after each attempted insertion of an instance.

In the first version of the program, negative information did not result in any elimination of untenable concepts. This version worked well on long concepts (e.g., concepts which have a great many positive instances on their description lists) but poorly on short concepts. With short concepts not enough information was supplied to eliminate all concepts except the correct one. Consequently, L4 was correct at the end of the program, but L3 was not, since it contained the names of other concepts as well as the one to be attained.

In the second version, a negative instance from L4 serves to eliminate concepts but does not result in the generation of any new instances to be placed on L4. The concept lists named on L3 are searched, and if the negative instance is found, that concept list structure is erased and the name of the concept deleted from L3. This is done for each concept on L3. When the last concept on L3 is processed, the next instance-to-be-processed is found from L4 and the operation of testing and eliminating is begun again. The program terminates when the end of L4 is reached, by printing out L4 and L3. In both versions, the concept is considered attained if (1) L4 contains just the positive instances defining the concept to be attained, and (2) L3 contains only the name of the concept to be attained.
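The two policies for a negative instance might be contrasted as follows (illustrative Python, not the IPL 5 code; the `concepts` dict, its keys, and the instance names stand in for the B-lists and I-symbols):

```python
# Sketch of the difference between the two versions: version 1 only deletes a
# negative instance from the L4 instance list, while version 2 also uses it to
# erase untenable concepts from the L3 concept list.

def process_negative(instance, l3, l4, concepts, use_negative_info):
    """concepts: dict concept-name -> set of its positive instances."""
    if instance in l4:
        l4.remove(instance)              # both versions delete it from L4
    if use_negative_info:                # version 2 only: eliminate concepts
        for name in list(l3):
            if instance in concepts[name]:
                l3.remove(name)          # concept covers a negative instance

concepts = {"B1": {"i1", "i2"}, "B2": {"i1", "i3"}}
l3_v1, l4_v1 = ["B1", "B2"], ["i1", "i3"]
l3_v2, l4_v2 = ["B1", "B2"], ["i1", "i3"]
process_negative("i3", l3_v1, l4_v1, concepts, use_negative_info=False)
process_negative("i3", l3_v2, l4_v2, concepts, use_negative_info=True)
print(l3_v1, l4_v1)  # ['B1', 'B2'] ['i1']  -- L3 still too long in version 1
print(l3_v2, l4_v2)  # ['B1'] ['i1']        -- version 2 eliminates B2
```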

DISCUSSION

A comparison of the two versions shows that the behavioral differences exhibited are due to differing interpretations of the first two assumptions. Both versions of the program satisfy assumption 1, although each does so in a slightly different sense. The first version (P1) did not utilize negative information as well as the second. A negative instance, upon identification, was merely eliminated from L4. Thus the instance did not serve to eliminate any untenable concepts, nor did it help directly in confirming which concepts were still tenable. It did have an indirect effect, however, since its elimination from L4 aided what may be loosely termed the "focusing" of S0 on the correct concept. In the second version (P2) a negative instance resulted in both a deduction as to the still-tenable hypotheses and an elimination of untenable hypotheses. The only difference in the function of negative and positive instances is that positive instances are responsible for executing that part of the program which adds another instance to L4, whereas negative instances do not execute that part of the program. The adding of to-be-tested instances to L4 is therefore dependent upon only the positive instances.

Assumption 2 was not satisfied by the first version of the program, since a strict interpretation of assumption 1 resulted in a differential use of positive and negative instances in deducing tenable concepts and eliminating untenable ones. As explained above, L4 was correct for this version of the program, but L3 was not, since the correct concept was not the only concept name remaining on L3. Although the correct short concept was not attained by this version, it does have one advantage over P2: L4 is correct before L3. In P2, the correct concept is attained for both short and long concepts, but L3 is correct before L4.

In other words, P2 simulates behavior in which the concept is learned first and then the exemplars of it. Numerous studies in psychology have indicated that the opposite is generally found in human concept-attainment behavior. P1 more closely simulated human behavior in this respect, since L4 is correct even though S0 doesn't "know" which of the remaining concepts on L3 is the correct one.

REFERENCE

Bruner, J. S., Goodnow, J. J., & Austin, G. A. A study of thinking. New York: John Wiley, 1956.

A Nerve Net Simulation, Robert Wyman, University of California, Berkeley.¹

In 1956, Rochester, Holland, Haibt, and Duda published a report concerning the simulation of a nerve net to test a psychological hypothesis put forth by D. O. Hebb. This paper shows how certain different types of processing can be used to simulate more closely the factors which are probably crucial to real nerve synaptic decisions.

Hebb argues (1949, 1953) that since all behavior is not immediate stimulus-response, the brain must have some means of holding stimuli in readiness until the organism is ready to act upon them. He suggests that this may be accomplished by re-entrant nerve circuits, which are internally reverberating. Once stimulated they can maintain their activity for some period of time. He cites as evidence that a freshly excised section of cortex can maintain a reverberating activity for over an hour after stimulation.

The second important facet of Hebb's theory suggests that learning takes place by the association of several of these nerve circuits. Thus if two nerve circuits are active at the same time, something must change so that at a later time the activity of one of these circuits can elicit the activity of another.

However, there is probably not enough genetic material to specify explicitly the interconnections of 10¹⁰ neurons, each connected to perhaps 1,000 others. Thus it appears that most of the connections will occur in a random fashion (Ashby, 1960). One may then ask whether nerves connected randomly will indeed form these reverberating circuits. This is the problem investigated by Rochester et al. (1956).

They computed a threshold for each nerve based on its state of refractoriness and whether it was fatigued by having been fired excessively. Their program then summed up the magnitudes of the signals sent to the receptor nerve by all the afferent cells and considered the cell as fired if the sum exceeded the threshold value. They postulated mechanisms for forming the cell assemblies from neighboring cells. They introduced inhibitory neurons between cell assemblies. Within an assembly, then, the interconnections were largely excitatory, and between cell assemblies they were largely inhibitory. It seems, however, that both these hypotheses, introduced in order to make the cell assemblies form, are ad hoc and bear little resemblance to the neurophysiological condition.

¹ This investigation was carried out during the tenure of a fellowship from the National Institute of Mental Health, United States Public Health Service.
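Rochester et al.'s threshold rule, as just described, can be reduced to a few lines; the numerical values below are arbitrary illustrations, not their actual parameters:

```python
# Sketch of the "1-number" synapse: afferent signal magnitudes are summed and
# the cell fires if that single sum exceeds a single threshold, which rises
# with refractoriness and fatigue. All parameter values are illustrative.

def fires(afferent_signals, refractoriness, fatigue, base_threshold=10.0):
    threshold = base_threshold + refractoriness + fatigue
    return sum(afferent_signals) > threshold

print(fires([4.0, 5.0, 3.0], refractoriness=0.0, fatigue=0.0))  # True: 12 > 10
print(fires([4.0, 5.0, 3.0], refractoriness=3.0, fatigue=1.0))  # False: 12 < 14
```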

In real nerves the refractory period lastsapproximately one millisecond, a very shorttime compared to the usual frequency withwhich cortical neurons will fire spontaneously.At an alpha rhythm of 10 per sec, or even 10times this rate, the millisecond refractoryperiod would be too small to influence thethreshold state at the next firing.

As to the use of fatigue as a threshold-determining factor, Rochester et al. mention Eccles' use of firing the same nerves for a million volleys (1953). It is very difficult to tire real nerves, and at any reasonable rate of firing the nerves could fire ceaselessly without the onset of any appreciable fatigue.

Therefore, I believe that neither of the threshold-varying factors in this model is a reasonable simulation of the real situation. Unrealistic too is the assignment of interconnections among nearby nerves only. As this is anatomically untenable, they have noted that one of the proposed revisions of their theory is to rectify this. Their use of inhibitory neurons is also questionable. Very little is known of inhibitory neurons in the association cortex (Roberts, 1960), but throughout the rest of the nervous system inhibitory neurons are used solely for very specific functions (Renshaw cells, etc.). It does not seem warranted to assume that these cells would also be found in the association cortex in a randomly jumbled fashion. More likely they are absent unless they have a specific purpose.

The whole idea of a "1-number" synapse is questionable. In such a model, one number is computed (no matter how many factors are taken into account) and compared with a 1-number threshold value. More likely there are a number of different conditions under which a nerve will fire.

The reinforcement conditions of the model are the most tenuous of the assumptions made by Rochester et al., nothing like them ever having been seen in real nerves. However, in my own work I have had to adhere to a similar hypothesis for lack of a better alternative. With the electron microscopy work of De Robertis (1961) (revealing little vacuoles of ACh at the end buttons) it may soon be possible to determine whether on repeated stimulation these vacuoles grow in size or number, thus giving a possibility of validating the reinforcement hypothesis. However, certainly the forgetting mechanism is not a simple constant, decreasing function over time of the effectiveness of the synapses, as stated in the model of Rochester et al.

The membranes of nervous tissue cells take on, in the resting state, a polarization. A depolarization anywhere on the surface of the cell may start a wave of depolarization which spreads throughout the whole surface. The afferent neurons project end buttons onto the surface of the cell body of the efferent nerve. The cell body is globular and may have as many as 1,000 end buttons on its surface. All that is required, probably, for the whole nerve to fire is that a wave of depolarization be started at any local region among the 1,000 end buttons. Thus in the real case, many firing terminals scattered widely may have no effect, while a few grouped together and firing simultaneously may depolarize the whole cell. This is perhaps the most interesting part of the following simulation program: the recognition of the significance for switching of the geometrical relations among the end buttons.

The simulation program I have written in the IPL 5 language for an IBM 704 has three major segments. The "D" routines take as input the number of cells in the net and assign essentially random connections among the cells, and random positions for each of the different endings on the cell body of the efferent neuron. The "S" routines simulate the synapses, taking as input the name of a cell fired into and the list of nerves stimulating it, and determine whether the excitation thresholds have been exceeded or not. The


"R" region routines mediate between the D and S routines, organizing the parallel structure of the net as constructed by the D routines into the linear serial processing which the synapse routines perform. In this summary only the synapse routine will be discussed further.

In the simulation program each end button has attached to it a stimulus value, which is a measure of the efficiency with which it excites the soma upon which it is resting. This stimulus value is increased each time the end button participates in the successful depolarization of the postsynaptic fiber. Thus every nerve which successfully stimulates another nerve will have a greater probability of successfully stimulating it again. This satisfies the Hebb conditions and, hopefully, endows the net with learning capabilities. Of course these stimulus values are allowed to build up only to a point of maximal efficiency. Another routine, the "forgetter," has an opposite effect. After each cycle it decreases all the stimulus values. However, all are not decreased equally. If a stimulus value is high it means that it has recently played a part in an active nerve circuit. While that circuit is still active the stimulus values should slowly build up, and the pattern will be consequently reinforced. However, as soon as that circuit ceases to act, the values should decay quickly to a mean value, so that the circuit will not start up again immediately. But the decay must not proceed so far that the circuit cannot be reactivated easily. After reaching this mean value, decay must proceed more and more slowly as the stimulus value gets smaller, so that the circuit is never lost permanently.
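The build-up and decay just described can be sketched as follows; the constants and the exact shape of the decay curve are illustrative assumptions, since the summary does not give the values used in the program:

```python
# Sketch of the reinforcement and "forgetter" rules described above.
# All constants (MAX_VALUE, MEAN_VALUE, the two rates) are illustrative
# assumptions, not values from the original IPL 5 program.

MAX_VALUE = 10.0    # point of maximal efficiency
MEAN_VALUE = 2.0    # resting value that decay approaches quickly
FAST_RATE = 0.5     # decay rate while the value is above the mean
SLOW_RATE = 0.05    # decay slows further as the value shrinks below the mean

def reinforce(value, increment=1.0):
    """Increase an end button's stimulus value after it helps depolarize
    the postsynaptic fiber, but only up to maximal efficiency."""
    return min(value + increment, MAX_VALUE)

def forget(value):
    """Decrease a stimulus value after each cycle: quickly down toward the
    mean, then more and more slowly, so the circuit is never lost."""
    if value > MEAN_VALUE:
        return MEAN_VALUE + (value - MEAN_VALUE) * (1.0 - FAST_RATE)
    # below the mean, decay is proportional to the value itself,
    # so it slows as the value gets smaller
    return value * (1.0 - SLOW_RATE)
```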

The synapse routine itself considers each postsynaptic fiber separately. There are four subroutines, any one of which can indicate a sufficiency of stimulation and result in a firing. The first routine simply counts the number of active afferent nerves and checks if this exceeds threshold 1. If not, the next routine tallies the stimulus values of the afferent nerves to see if their sum exceeds threshold 2. The third routine determines whether any three active afferents have contiguous end buttons on the cell soma. This is considered a sufficient stimulation. The last routine is a compromise of the previous two. It looks for two contiguous active afferents, and if these two exist their stimulus values are summed and compared with a fourth threshold value, which is, of course, lower than threshold 2. It is obvious that no real synapse operates in this manner, but it is the purpose of this study to indicate only prototypically how the kinds of factors operating at a synapse may be simulated on a computer.
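A minimal sketch of the four firing tests, assuming integer end-button positions on the soma where "contiguous" means adjacent positions; the threshold values are invented for illustration, not taken from the program:

```python
# Illustrative sketch of the four firing tests. The thresholds, the integer
# position encoding of end buttons, and the contiguity test are assumptions
# for demonstration, not the original IPL 5 code.

def fires(afferents, t1=6, t2=12.0, t4=4.0):
    """afferents: list of (position_on_soma, stimulus_value) pairs for the
    currently active afferent end buttons."""
    positions = sorted(p for p, _ in afferents)
    values = {p: v for p, v in afferents}

    # 1. count of active afferents exceeds threshold 1
    if len(afferents) > t1:
        return True
    # 2. sum of the stimulus values exceeds threshold 2
    if sum(values.values()) > t2:
        return True
    # 3. any three active afferents with contiguous end buttons
    for i in range(len(positions) - 2):
        if positions[i + 2] - positions[i] == 2:
            return True
    # 4. two contiguous active afferents whose summed stimulus values
    #    exceed a fourth threshold, lower than threshold 2
    for i in range(len(positions) - 1):
        if positions[i + 1] - positions[i] == 1:
            if values[positions[i]] + values[positions[i + 1]] > t4:
                return True
    return False
```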

At present, the only net which has been processed is a net of 16 neurons, each connected to an average of five others. About 45,000 IPL instructions are executed in the setting up of the random net, and each firing cycle takes about 7,000 executions. In a net of this size, activity initiated at one or several points either spreads through the net or decays to quiescence, but the net is not large enough to allow for differentiated nerve circuits. The time of processing increases very rapidly with increase in net size. Therefore, a much larger net is being prepared for simulation on the faster IBM 7090.

REFERENCES

Ashby, W. R. Design for a brain. New York: Wiley, 1960.

De Robertis, E., Pellegrino de Iraldi, A., Rodriguez, G., & Gomez, C. On the isolation of nerve endings and synaptic vesicles. J. biophys. biochem. Cytol., 1961, 9, 229-232.

Eccles, J. C. The physiology of nerve cells. Baltimore: Johns Hopkins, 1957.

Hebb, D. O. The organization of behavior. New York: Wiley, 1949.

Hebb, D. O. A textbook of psychology. Philadelphia: Saunders, 1953.

Roberts, E. (Ed.) Inhibition in the nervous system and GABA. Proceedings of an International Conference. New York: Pergamon, 1960.

Rochester, N., Holland, J., Haibt, L. H., & Duda, W. L. Tests on a cell assembly theory of the action of the brain, using a large digital computer. IRE Transact. Information Theory, 1956, 2, 80-93.

An IPL 5 Program for Formal Integration Using Tables, Herbert Haver, University of California, Berkeley.

INTRODUCTION

The research herein described is concerned with mechanizing a process commonly performed by humans, viz., given an (indefinite) integral and given a table of integrals, evaluate the integral when it is possible to do so using the table. We shall call this process "integration using tables," or simply "integration" for short.

Clearly integration using tables involves table lookup. But it is a different kind of table lookup from what is ordinarily thought of as a mathematical table lookup (nor is it the same as dictionary lookup). One does not interpolate, for example, to evaluate an integral not found in the table. On the other hand, one table entry stands for a whole class of integrals: the entry ∫aˣ dx, for example, in some sense evaluates ∫5ᵘ du as well as many other integrals. Put another way, an assistant with very little mathematical training could be taught to use efficiently a table of the sine function, say, with little instruction. Teaching the same assistant to use efficiently a table of integrals would require considerable instruction. Or again, mechanizing the former process on a digital computer is commonplace; mechanizing the latter is not.

OVERVIEW

Given an integral, even one in the given table, how do we go about finding it? The table has an order which we may call its reading order; but clearly it would be inefficient to compare the given integral with each table entry successively. Most tables (and we assume this true of our given table) are divided into sections. For example, one section might contain only rational algebraic combinations of x and (a + bx), another might contain only rational algebraic combinations of x and √(a + bx), all of the trigonometric forms may be grouped together in a third section, and so on. A reasonable procedure would be to leaf through the table (or its table of contents, if it has one) until we come to a promising section before starting to make detailed comparisons of the given integral with the table entries. Even then, glancing through the section might indicate that integrals of a type similar to the one of interest were not at the beginning of the section, so we would do some more skimming before settling down to detailed comparisons.

In short, at each stage we single out a promising subsection of the preceding section for further investigation.

Just how we go about "singling out" these subsections is an interesting question in itself. The program about to be described makes no attempt to simulate human behavior in this respect. The simulation goes only this far: at each stage the program does single out a subsection.

PROGRAM

Ideally, the program should accept integrals in a form as close to that given to a human as the hardware of the machine allows. Limitations of time have forced me to compromise on this point. However, it will be clear that a routine could be written which would accept integrals in such a form and process them so that they are in the form now required by the program.

As it is, the program is written for an IPL 5 machine and requires that an integral be input as a list structure organized in the following way. First the integral sign is replaced by a machine symbol (L 99). This symbol is set as the name of a list structure, so that the structure, strictly speaking, represents the integrand alone. Now the integrand, as a mathematical expression, has a primary operation associated with it, e.g., +, −, ×, /, sine, exponentiation, log. That is, the integrand is an expressed sum, or an expressed product, or an expressed logarithm, etc. The operands, of course, may themselves be mathematical expressions of arbitrary complexity; but again each of them has an associated primary operation.

The next step is to symbolize the primary operation and its associated operand(s), and to list them. The primary operation goes into the head of the list and is followed by the operand(s). If an operand itself is an expressed operation then it should be given a local name, and the sublist to which it refers is given the same structure as the main list; otherwise the operand should be given a regional name. Proceeding in this way, we can structure any integral as a finite tree. As an illustration the integral

∫[sec x tan x + eˣ] a dx    (1)


where a is an arbitrary constant, could be structured as shown in Figure 1.
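The tree of Figure 1 can be rendered, for illustration, as nested tuples standing in for IPL 5 lists; the operation names below are assumptions for readability, not the program's machine symbols:

```python
# Sketch of the integrand-as-tree idea in Python: each node is a tuple
# (operation, operand, ...) whose head is the primary operation, and
# terminal operands are constants or the variable of integration. This
# encoding of the integrand of ∫[sec x tan x + e^x] a dx is illustrative.

VAR = ("variable",)            # the variable of integration, x

integrand = (
    "*",                                        # primary operation: product
    ("+",                                       # first factor: a sum
        ("*", ("sec", VAR), ("tan", VAR)),      # sec x tan x
        ("exp", ("const", "e"), VAR)),          # e^x: constant e to power x
    ("const", "a"),                             # second factor: constant a
)

def primary_operation(node):
    """Each (sub)expression exposes its primary operation at the list head."""
    return node[0]
```

Descending from the head reaches every subexpression, so the whole integrand is a finite tree of at-most-binary operations, as the text requires.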

Note that each node is an operation except possibly for terminal nodes. The "constant" operation with operand 5 is just the constant 5. The "variable" operation always has as operand the variable of integration, which is omitted for simplicity. Note also we have restricted ourselves to operations taking at most two operands so that, as in the present case, some preliminary associating is necessary. This restriction is not essential.

Figure 1. TREE STRUCTURE OF INTEGRAL (1)

Now we need to construct an index for our table. It has as many headings as there are primary operations. For example, if the primary operation is +, it will send us to section 1; if the primary operation is −, it will send us to section 2, and so on. But why stop there? Why not have a similar index for each section? Reflection will show that if we are willing to construct our own table we may have the table itself act as its own index. This can be done rather efficiently on the IPL 5 machine as follows:

The table itself is a list structure, with each list in the form of a description list. The "attributes" on each of these description lists are the primary operations; the "values" are the names of lists which are either themselves sublists of the table, i.e., further description lists, or else list structures which evaluate the integral.

The head of each table list contains the name of a process to be performed if the list itself does not contain the operation in question as an attribute. For example, the main list of the table does not contain the attribute "addition." In its head, however, it names a routine which divides the given integrand into its two summands and proceeds to look up each of these in the table; if it evaluates these, it outputs the sum of their evaluations as the answer. Another such process named in a sublist head is one to commute two factors if the first one is not found in the appropriate table sublist. An example of an extreme process is the one which "gives up." (A more refined procedure would name in the head a list of processes to try, with provision for marking these procedures after they have been tried.)

With the table structured as indicated, it is easy to see how the program proceeds: it works down the integral list structure (using a generator); at each entry it looks for a match in the table; if a match is found, either the integral is evaluated (value = regional name) or the next sublist to consult is indicated (value = local name); if no match is found, a process is recommended. The program terminates when either the integral is completely evaluated or one of the recommended processes is the "give up" process.

EXAMPLE

The following is a description of how the present program evaluated the integral ∫[sec x tan x + eˣ] a dx.

First of all, the integral was input as the following list structure (the meaning of each regional name follows it in parentheses):

L 99  M 1 (multiplication)  9-1 9-2 0
9-1   S 1 (addition)  9-3 9-4 0
9-3   M 1 (multiplication)  9-5 9-6 0
9-5   S 3 0 (secant)
9-6   T 2 0 (tangent)
9-4   X 1 (exponentiation)  9-7 9-8 0
9-7   X 1 (constant)  E 0 0 (e)
9-8   V 1 0 (variable)
9-2   X 1 (constant)  A 0 0 (a)

The table of integrals list structure is named T 98 and has a main sublist which begins as follows:

T 98  G 87  X 1 9-15  M 1 9-1  X 1 9-3

The primary operation in the integrand was M 1. This has the value 9-1 in the table, so next S 1 was looked up in sublist 9-1:

9-1  G 99  X 1 9-60  T 2 9-36  M 1 9-68  0

S 1 is not in sublist 9-1, hence operation G 99 was performed, which commutes the order of the factors following the symbol M 1 and again tries to look up the first factor (this time the symbol X 1) in 9-1. X 1 is in


sublist 9-1 and has the value 9-60. Sublist 9-60 of the table:

9-60  G 98  0

has no attributes whatsoever. Attempting to look up A 0 in 9-60 recommends process G 98, which factors the constant A 0 outside the integral and attempts to integrate the second factor; i.e., it tests 9-1 as the integrand and starts looking it up in T 98 again, etc.

The program continued in this fashion until the integrand was completely evaluated.

RECOG, A Computer Program for the Automatic Scanning of Bubble Chamber Photographs, Mark W. Horovitz, University of California, Berkeley.

INTRODUCTION

Bubble chambers form one of the main experimental tools of high energy particle physics. In these chambers, bubbles are formed along the tracks of ionizing particles. Stereo photographs of the chamber are taken. Tracks on the film are measured and analyzed to obtain data on the interaction of elementary particles. A bubble chamber installation may produce several million photographs per year. Consequently a high-speed data analysis system is necessary to process all the information contained in the photos.

At present, a projected image of the film is scanned by a technician in order to locate the tracks to be measured and analyzed. The object of the work presented is to demonstrate a computer program which can identify particle tracks on bubble chamber photographs. No emphasis is placed on scanning in the most efficient possible way. This would become a matter for concern at a later stage of development.

PROGRAM DESCRIPTION

One method of automatic scanning consists of the following sequence of processes:

1. Digitizing. A moving spot of light scans the film, producing a pattern of "zeros" at points on the film where the optical density is less than some value and "ones" where the density is greater than this value.

2. Track identification. One or more points on each continuous track section are identified.

3. Track following. Starting at the identified points, the tracks are followed to their ends or until a large gap is encountered.

4. Fitting. The separate track segments are fitted together and the parameters of a curve that fits the tracks are computed.

5. Selection. Select those tracks which are to be processed further.

Methods for accomplishing items (3) and (4) have been described by Innes (1960).

The core of the RECOG program is a method of identifying line segments and thus accomplishing process 2. The program constitutes only one part of a complete system. However, RECOG can be tested by permitting it to operate over the whole picture area. This should produce a picture containing all "clear" track segments.

The RECOG program also includes a number of other functions which are required in a complete system.

The program is divided into a FORTRAN/SAP input-output section and an IPL 5 processing section. The input conversion routine prepares an IPL list containing a description of the bit pattern. For each continuous series of "ones" on a scan line, the list contains the X co-ordinate of the left-most "one" together with the number of continuous "ones." The output routine converts the lists prepared by the RECOG program to a bit pattern which can be printed for visual inspection. The output of this routine conforms to the format of the original input data tape.
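The input conversion amounts to run-length encoding of each scan line, which might be sketched as follows (a plain Python stand-in for the FORTRAN/SAP-to-IPL step):

```python
# Sketch of the input conversion described above: for each continuous run
# of "ones" on a scan line, record the x co-ordinate of the left-most "one"
# and the run length.

def encode_scan_line(bits):
    """bits: sequence of 0/1 for one scan line -> list of (x_left, length)."""
    runs = []
    x = 0
    n = len(bits)
    while x < n:
        if bits[x] == 1:
            start = x
            while x < n and bits[x] == 1:
                x += 1
            runs.append((start, x - start))
        else:
            x += 1
    return runs
```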

The main RECOG program

The first function performed by this IPL 5 program is to build a list structure. The input data are classified according to the number of continuous ones, and the line number on which they occur. The basic process by which track recognition is performed in RECOG is a search for line segments. The picture is divided into sections. A line of each section is scanned for black points


(ones). Each such point found initiates a search for line segments.

A test is made for a vertical line segment passing through the initiation point. An acceptable line segment contains more than a certain number of black points in a vertical zone five points wide, centered about the initiation point. The zone width and picture section size are chosen so that all tracks with radius of curvature greater than some value and ionization density above some level are accepted.

In order to apply this test to nonvertical tracks, the picture section is tilted. This is accomplished by incrementing the X co-ordinates on different lines by varying amounts. In this way, tracks passing through the initiation point at different angles are in turn brought to the vertical position.
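A sketch of the vertical-zone test and the tilting trick follows; the five-point zone width is from the text, but the section height, point threshold, and shear step are assumptions for illustration:

```python
# Sketch of the tilting test: shear a picture section by incrementing the
# x co-ordinates on successive lines, so that a track crossing the
# initiation point at an angle is brought vertical and can be tested with
# the same narrow vertical zone.

def vertical_segment(points, x0, y0, height=10, zone=5, min_points=6):
    """points: set of (x, y) black points. Count points in a vertical zone
    `zone` points wide centred on x0, over `height` lines starting at y0."""
    half = zone // 2
    count = sum(1 for (x, y) in points
                if y0 <= y < y0 + height and abs(x - x0) <= half)
    return count >= min_points

def tilted_segment(points, x0, y0, slope, **kw):
    """Shear each line y by slope*(y - y0), so a track of that slope through
    (x0, y0) becomes vertical, then apply the vertical test."""
    sheared = {(x - round(slope * (y - y0)), y) for (x, y) in points}
    return vertical_segment(sheared, x0, y0, **kw)
```

A 45-degree track fails the vertical test but passes once the section has been sheared by the matching slope.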

In order to apply the method described to highly curved tracks, we must reduce the length of the track segments to be tested. This reduces the discrimination of the process against noise. To overcome this difficulty, scanning for highly curved tracks is carried out after high momentum tracks and "identifiable noise" have been removed from the input data. Also, a crude form of track following is used to increase the selectivity of the process. Fiducial marks are detected by searching for straight line segments of specified length in certain well-localized areas of the picture.

BACKGROUND NOISE

Isolated dark spots can be ignored; they do not interfere appreciably with track detection. "Fixed features" (thermocouples, flare spots, etc.) can be eliminated by a table lookup procedure. Electron spirals often interfere with track detection. It is not feasible to detect electron spirals by reconstructing their tracks. A combination of the following conditions will identify many electron spirals: (1) the number of "ones" per area is greater than some level, (2) the number of 01 pairs per area is greater than some level, and (3) the presence of several neighboring short track segments.

OVER-ALL PROCESS

It is realized that certain refinements have to be added at a later stage. For instance, a careful search for short tracks emanating from a vertex may be necessary. Note that data points are removed from the "picture" only after they have been positively identified. Any "doubtful" tracks are left in the picture. The amount of data left in the picture after one complete scanning sequence will be much reduced. Therefore, repetition of the scanning process should permit the resolution of some of the "doubtful" cases left by the first scan.

In the system described, all functions beyond digitizing the data are carried out by a computer program. It is apparent that many of these functions could be carried out more efficiently by special purpose equipment. The RECOG program could be useful for simulating the operation of such equipment and in exploring the feasibility of alternative methods of scanning.

PROGRAM DEVELOPMENT

Program development will follow the stages outlined below:
1. Flow charting of program
2. Simulation of operations using digitized pictures
3. Coding of the line search process
4. Testing of the line search process
5. Completion of output section of RECOG
6. Completion of the output-input program
7. Add "search for highly curved tracks"
8. Add "search for fiducial marks"
Items 1, 2, and 3 have been carried out.

A more detailed description of the program is found in memo No. 282, Lawrence Radiation Laboratory, University of California, Berkeley.

REFERENCE

Innes, D. Filter—a topological pattern-separation computer program. Proc. Eastern Joint Computer Conference, 1960, 25-57.

(Manuscript received November 4, 1961)


The Procrustes Program: Producing Direct Rotation to Test a Hypothesized Factor Structure, John R. Hurley and Raymond B. Cattell, University of Illinois.

SETTING OF THE PROBLEM

Rotation, the final and sometimes fatal step in a factor analytic experiment, may serve one of three purposes:

1. To make factors coincide with some preconceived criteria set by the experimenter, e.g., as in "criterion rotation" (Eysenck, 1950), which aligns a factor with the direction of difference between two groups.

2. To discover new structure in new data by following one of the two major principles for independent, unique resolution: (a) simple structure (Tucker, 1953; Thurstone, 1954) or (b) confactor (proportional profiles) rotation (Cattell, in press). No computer program has yet been constructed for the latter, since certain difficulties remain to be overcome (Cattell and Cattell, 1955), but a variety of propositions and programs (Barlow and Burt, 1954; Mosier, 1939; Neuhaus and Wrigley, 1954) exist for analytically finding simple structure. Some, unfortunately, are restricted to an orthogonal outcome, incompatible with the best approximation to simple structure; but others, such as oblimax (Saunders, 1961), minimax (Carroll, 1953), and maxplane (Cattell and Muerle, 1960) are free to reach either oblique or orthogonal simple structure, according to the needs of the data.

3. To test hypotheses, by seeing how well the factor patterns obtainable from a given set of data will fit a previously stated hypothesis as to what the factor patterns should be. (The hypothesis about the number of factors will normally already have been checked before this stage is reached.)

This paper is concerned with the theory, mathematics, and computer implementation of the third purpose. The authors had already essentially solved the problem when Ahmavaara's independent solution appeared (1957) and had deposited the program with the University of Illinois Digital Computer Library in February of 1957. Publication was delayed for two reasons: (1) to accumulate substantive experience with the method, and (2) to introduce such statistical checks for judging the success of the achieved hypothesis-matching as would stop abuses through false claims of hypothesis confirmation. Except for these developments, and the provision of a working computer program, our solution and Ahmavaara's are essentially the same.

REQUIREMENTS FOR HYPOTHESIS TESTING

In principle it was proposed that hypothesis testing would require that the investigator write out a rotated factor matrix which contains the number of factors and variables in the experiment, as well as the loading pattern by which he defines the hypothesis, or hypotheses. A calculation would then be made transforming (rotating) the experimentally obtained unrotated (centroid or principal components) matrix to the best possible (least squares) fit to the hypothesized matrix. A statistical test would then be made to determine the likelihood that a fit of the obtained degree of goodness would be reached by chance.

The statement of the hypothesis could be based either on reasoning outside factor analytic evidence, e.g., on the principles of physics, as in the Cattell-Dickman ball problem (in press), or on earlier exploratory factor analyses in the given domain, if a fairly precise pattern can be inferred from their central tendency. A delicate theoretical issue arises in this stating of the hypothesis. For the researcher can either be allowed to benefit from the knowledge of the communalities already obtained in the first stages of the new experiment and state his loadings within this framework, i.e., so that they sum to the given communalities for each variable, or he can state them without this guidance. There are arguments for each method, and we would not conclude definitely that one is better. But in any use of this method it should be clearly stated which method is being used, and the statistical tests should take cognizance of the larger degrees of freedom in the latter case.


It might be objected that the latter, "unguided" procedure is in any case unworkable, since one might inadvertently, or out of naivete, write down a series of loadings which produce a communality greater than one for a given variable. This overlooks the fact that our program was specifically designed from the beginning for the usual and indispensable conditions holding for oblique factors (including orthogonality as a special case). Thus, providing the investigator knows enough not to posit a correlation greater than unity (for any variable with any factor), any values are in principle possible for the series of correlations of a given variable (a series of, say, 0.96's would simply require that the factors be highly correlated). On the other hand, it would be possible to hypothesize correlations for a series of variables over a series of factors that are statistically, internally impossible. But here it may be said that an investigator should be free to state absurd, internally inconsistent hypotheses, such as a great number of arm-chair theories are bound to be. However, considerations of such possible restrictions do bring out what is often overlooked, i.e., that a statement of a factor hypothesis with oblique conditions implies and demands definition both of (a) the loading or correlation pattern for each factor, and (b) the correlations among factors (while (c), the number of factors, must be considered independently tested beforehand).

Although it has so far not been usual to require a statement of the expected correlations between factors in the hypothesis, our Procrustes method enables such hypotheses to be tested, though with priority to the pattern. For, having stated (a) the hypothetical factor pattern, Vrs, and (b) the matrix of intercorrelations between factors, Cf, the investigator uses the program first to reach the best possible fit to the pattern; and only then does he examine the factor intercorrelations which achieve this, against his hypothesized Cf. Some might prefer a compromise solution which balances goodness of fit on Vrs and Cf simultaneously. The other possible choice mentioned above, namely, of permitting or not permitting the person stating the hypothesis to adjust his loadings (transformed to orthogonality) by bringing the sum of their squares to the discovered communalities of the actual experiment, has not been included in our design, because this also seems to be throwing away part of the independence of the hypothetical statement.

DESIGN OF THE PROGRAM

A number of approaches have been made to the problem of transforming one factor matrix to closest agreement with another, or a derivative of another. There are various advantages and disadvantages in assumptions, ease of computation, etc., in each of those proposed by Ahmavaara (1957), Cattell and Cattell (1955), and Tucker (1951). Our procedure takes the orthogonal, unrotated factor matrix, V0, and sets out the hypothesis with which it is to be brought into maximum possible congruence, stated in the form of correlations of the variables with the reference vectors, i.e., by the matrix Vrs. This choice of the reference vector structure from among the six main possible dimension profiles (Cattell, in press) has certain advantages, including the fact that the simple structure of the hypothesis will show up as a lot of zero entries in the matrix Vrs. The problem is now to find the transformation matrix Λ in the following equation that will transform V0 to Vrs*, where Vrs* is the closest possible fit to Vrs:

    V0 Λ = Vrs*    (1)

Premultiplying by V0' we get:

    V0' V0 Λ = V0' Vrs*    (2)

and thence,

    Λ = (V0' V0)⁻¹ V0' Vrs*    (3)

What we actually possess as the source of our solution, however, is not Vrs* but a statement of the hypothesis presented in Vrs. The transformation matrix (which we will call Λh) that would give Vrs* is not, as such transformation matrices usually are, normalized as to columns. We may obtain it from:

    Λh = (V0' V0)⁻¹ V0' Vrs    (4)

and then normalize its columns, upon which it will become Λ and give the Vrs* solution required in equation (1), i.e., the approximation, in a least squares sense, to the hypothesized reference vector matrix.

The computations programmed for the solution consist of:

1. Transposition of V0 to get V0'
2. Matrix multiplication V0'V0
3. Inversion of the matrix from 2
4. Two matrix multiplications: (a) by V0' and (b) by Vrs
5. Normalization of the columns of the resulting λu
6. Matrix multiplication of V0 by the λ1 thus obtained.
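For readers with access to a modern machine, the six steps can be sketched in a few lines. The following is an illustrative pure-Python reconstruction under our own naming (procrustes, v0, vrs), not the original program itself:

```python
# Illustrative reconstruction of steps 1-6 (our function names, not
# the original routines). v0 is the unrotated factor matrix, vrs the
# hypothesized reference vector structure.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]

def inverse(m):
    # Gauss-Jordan elimination with partial pivoting (small matrices).
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def normalize_columns(m):
    cols = transpose(m)
    return transpose([[v / sum(x * x for x in c) ** 0.5 for v in c]
                      for c in cols])

def procrustes(v0, vrs):
    """Least-squares transformation of v0 toward the hypothesis vrs."""
    v0t = transpose(v0)                                    # step 1
    lam_u = matmul(matmul(inverse(matmul(v0t, v0)), v0t),  # steps 2-4
                   vrs)
    lam = normalize_columns(lam_u)                         # step 5
    return matmul(v0, lam)                                 # step 6
```

When the hypothesis already coincides with the unrotated matrix, the transformation reduces to the identity and the matrix is returned unchanged; in the interesting case, the zeros of the hypothesis pull the fitted matrix toward the intended hyperplanes.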

This program has been used over the last five years in our Laboratory of Personality Assessment and Group Behavior, on problems varying in size from 8 variables by 3 factors to 100 variables by 20 factors, and on data varying from objective personality measures, through sociological and physiological measures, to high-reliability physical measurements (Cattell and Dickman, in press; Cattell and Sullivan, this issue). The total time required has proved to be practicable: from about five minutes for the smaller of the above-mentioned problems to about 40 minutes for the larger one. No untoward problems have been encountered in the running of the program.

DISCUSSION

It will be recalled that the Greek hero Theseus encountered in his wanderings a character called Procrustes, whose beds fitted all travelers. Those who were too short for his beds he cruelly stretched, and those who were too tall he cut down to size. If an investigator is satisfied, as many are, to announce from visual judgment that the fit is good, then this program lends itself to the brutal feat of making almost any data fit almost any hypothesis! Because of this possible proclivity we gave the code name Procrustes to this program, for this reference describes what it does, for better or worse.

To publish widely a program which permits any tyro, by pressing a computer button, to seem to verify any theory, is as irresponsible as loosing opium on the open market.

That computers and their programs can be a real danger to proper values and directions in research must already be evident in several fields. For example, there has begun, and will doubtless continue, a spate of quick journal publications, apparently unhindered by editors, of "orthogonal simple structure" solutions by Kaiser's Varimax program (1958). This neatly worked out program was certainly intended by its author for the uncommon case in which simple structure and orthogonality are compatible, or as a preliminary to final oblique adjustment by visual rotation or by his own effective oblique Binormamin program (Kaiser and Dickman, 1959). But the extreme facility of application of Varimax, plus a lack of forewarnings in its publication, has led to a great number of man hours of research ending in nothing, or worse, when the factor analytically unqualified psychologist or editor accepts, as uniquely meaningful, the machine-given answer. The obviously still vaster possibilities of abuse in Procrustes constituted a major reason, as explained above, for our abstaining from early publication. Fortunately, statistical significance tests of goodness of fit have been developed by forced marches in the meantime, which make it possible to supply an antidote along with the drug, or a means of detection along with a means of crime. Though still admittedly imperfect, the tests of Barlow and Burt (1954), Cattell and Baggaley (1960), Kaiser (1958), and Tucker (1953) give a basis for evaluation. When a Procrustean fit that might satisfy the eye, and such as has been quoted as verification of a favored hypothesis, can be challenged by such statistical significance tests, the danger of abuse of Procrustes has passed.

Actually, we have found this program to have many uses besides that of direct hypothesis testing. For example, when a blind visual rotation seems to be approaching finality, but the hyperplane is still broad and fuzzy, we have written a Procrustes matrix in which these putative hyperplane variables are all exactly zero and have then let Procrustes "tidy up" by pulling them all into a truly narrow hyperplane which can then be tested for significance. This has close resemblance in intention to Thurstone's single plane rotation method (1954) and has the same limitation, namely, that one may include a variable which, strictly speaking, does not belong. Another helpful role for Procrustes has been to shift the centroid matrix as soon as it is obtained, in a field where the main structure has long been known (based on a series of initial blind rotations), to an approximately correct position. From that, the final determination of the best simple structure for the sample, by Rotoplot (Cattell and Foster, in press), can be made quickly, with economy of effort. In new fields where one has already found the apparently maximized simple structure by blind rotation, yet another use has been the writing out of some almost random alternatives putatively aiming at better simple structure, to see if such higher maxima actually exist anywhere. Finally, one should not despise the convenience which Procrustes offers in cases where someone has proceeded from a centroid to a final rotated position and has either failed to publish, or has even lost, the transformation matrix, λ. On more than one occasion we have been glad of Procrustes' out-of-character benevolence in retrieving a lost lambda. References to experience with Procrustes primarily in hypothesis testing will be found in the recent monograph surveying fifteen years of consecutive studies of personality factors (Hundleby, Pawlik, and Cattell, in press).

Although, as stated above, devices now exist for testing the statistical significance of the goodness of fit of an achieved to an hypothesized factor matrix, this problem still urgently needs much greater attention from mathematical statisticians than it has received. Burt has tackled the matter parametrically, through correlation of the loading patterns of putatively matching factors, and this approach is currently being followed up by Kaiser, Tucker, and Wrigley. Cattell (1949b) and Cattell and Baggaley (1960) have pursued the nonparametric approach in their salient variable similarity index, s. The aptness of the results of this index has been examined in a number of studies (Cattell, 1957; Hundleby, Pawlik, and Cattell, in press). But these are the merest beginnings. Still needed is a thorough logical and statistical treatment of the goodness of fit problem which includes (a) the number of factors expected, (b) the correlation among the factors, (c) some adjustment to communalities in the particular sample, and (d) a statement simultaneously of factor structure and factor pattern (Cattell, in press).

REFERENCES

Ahmavaara, Y. On the unified factor theory of mind. Ann. Acad. Sci. Fenn., Ser. B, 106, Helsinki, Finland, 1957.

Barlow, J. A., & Burt, C. L. The identification of factors from different experiments. Brit. J. Stat. Psychol., 1954, 7, 52-53.

Carroll, J. B. An analytic solution for approximating simple structure in factor analysis. Psychometrika, 1953, 18, 23-28.

Cattell, R. B. A note on factor invariance and the identification of factors. Brit. J. Stat. Psychol., 1949, 2, 134-138.

Cattell, R. B. Personality and motivation structure and measurement. New York: Harcourt, Brace, World, 1957.

Cattell, R. B. The basis of recognition and interpretation of factors. Educ. Psychol. Measmt., in press.

Cattell, R. B., & Baggaley, A. R. The salient variable similarity index for factor matching. Brit. J. Stat. Psychol., 1960, 13, 33-46.

Cattell, R. B., & Cattell, A. K. S. Factor rotations for proportional profiles: analytical solution and an example. Brit. J. Stat. Psychol., 1955, 8, 83-92.

Cattell, R. B., & Dickman, K. A dynamic model of physical influences demonstrating the necessity of oblique simple structure. Psychol. Bull., in press.

Cattell, R. B., & Muerle, J. L. The maxplane program for factor rotation to oblique simple structure. Educ. Psychol. Measmt., 1960, 20, 569-590.

Cattell, R. B., & Sullivan, W. The scientific nature of factors: a demonstration by cups of coffee. Behav. Sci., 1962, 7, 184-193.

Hundleby, J., Pawlik, K., & Cattell, R. B. The first 21 personality dimensions in objective behavioral measurements. Univ. Chicago Psychometric Monogs., in press.

Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 1958, 23, 187-200.

Kaiser, H. F., & Dickman, K. W. Analytical determination of common factors. Amer. Psychol., 1959, 14, 425.

Mosier, C. I. Determining a simple structure when loadings for certain tests are known. Psychometrika, 1939, 4, 149-162.

Neuhaus, J. O., & Wrigley, C. The quartimax method. Brit. J. Stat. Psychol., 1954, 7, 81-91.

Saunders, D. R. The rationale for an "Oblimax" method of transformation in factor analysis. Psychometrika, 1961, 26, 317-324.

Thurstone, L. L. An analytical method for simple structure. Psychometrika, 1954, 19, 173-182.

Tucker, L. R. A method for synthesis of factor analysis studies. Personnel Research Section Report, No. 984. Washington, D.C.: Personnel Research Branch, Adjutant General's Office, Dept. of the Army, 1951.

Tucker, L. R. An objective determination of simple structure in factor analysis. Amer. Psychologist, 1953, 8, 448.

(Manuscript received November 13, 1961)

COMPUTER PROGRAM ABSTRACTS

IBM 7070 t-Test Program for Independent Groups, A. W. Bendig, University of Pittsburgh, Pittsburgh 13, Pa. (CPA 64)

Description: This program computes means, standard deviations, t-tests between sample means, and F-tests of the sample variances for two independent samples of subjects (equal or unequal N's) on up to 999 variables. Input data (one- to three-digit scores) is on punched cards with one or more cards per subject and up to 15 variables on each card. Printed output (two-decimal-place accuracy) gives N's of both groups and, for each variable, means, standard deviations, t-tests, and an F-test. The program is self-restoring so that different pairs of groups can be processed sequentially without reloading the program.

Computer: IBM 7070 with 10K core storage, floating-point arithmetic, on-line card reader and printer. Program Language: Modified Four-Tape Autocoder. Comment: The data card format is identical with G. Lotto's IBM 7070 CORR2 intercorrelation program. Consequently, groups of data cards prepared for CORR2 can be processed by the present program.

IBM 7070 Program for Normalized Varimax Factor Rotation, A. W. Bendig, University of Pittsburgh, Pittsburgh 13, Pa. (CPA 65)

Description: The program rotates the factor loadings of V variables on F factors (2 ≤ V ≤ 130, 2 ≤ F ≤ 20) to orthogonal simple structure, using Kaiser's normalized varimax criterion. Fixed or floating-point input can be from tapes or punched cards. The optional card output of the principal axis factor analysis program (Bendig, 1962) may be used as input. Pairs of factors are rotated by an iterative process (Kaiser, 1959) until all pairs are stabilized within either a programmed tolerance value (angle of rotation of less than ten minutes) or a value read in from the control card. The normalized varimax criterion value is printed at the end of each cycle of F(F-1)/2 pairs of rotations, and the rotated factor loadings and transformation matrix at the end of the final cycle. The program may be modified to use the straight varimax (unnormalized) criterion.

Computer: IBM 7070 with 10K core storage, floating-point arithmetic, on-line card reader and printer, optional tape units. Programming Language: Modified Four-Tape Autocoder.

References:

Bendig, A. W. IBM 7070 program for principal axis factor analysis. Behav. Sci., 1962, 7, 126-127.

Kaiser, H. F. Computer program for varimax rotation in factor analysis. Educ. Psychol. Measmt., 1959, 19, 413-420.

A Bendix G-15D Program for Numerical Mathematical Analysis, Donald L. Whitley, The Dow Chemical Company, Engineering & Construction Services Division, Freeport, Texas. (CPA 65)

Description: The program computes the following (singularly or together):

1. Correlates 1 to 10 sets of (xi, yi) into a power series polynomial, i.e., curve fitting. Method: Power series (Sherwood & Reed, 1939, p. 185).

2. Computes the following at selected values of xi:

(a) Value of the polynomial. Method: Power series polynomial (Sherwood & Reed, 1939, p. 185).

(b) First and second derivatives, dyi/dxi and d²yi/dxi². Method: First and second derivatives of Newton's Forward Interpolation Formula (Scarborough, 1955, p. 55; Sokolnikoff & Redheffer, 1958, p. 698).

(c) Value of definite integrals (including double integration). Method: Simpson's Rule (Scarborough, 1955, p. 132).

(d) Inherent error in Simpson's Rule, E. Method: Fourth derivative of Stirling's Interpolation Formula (Scarborough, 1955, p. 178).

3. Interpolation (between tabular values). Method: Power series (Sherwood & Reed, 1939, p. 185).

TABLE 1
Running Time
[The tabulated entries are garbled in this scan; the columns give the number of sets (xi, yi) of input data, the ai, f(x), dyi/dxi, d²yi/dxi², the inherent error, and the integrals, with times in minutes.]

Computer: Bendix G-15D, 2K drum, 1 tape. Program Language: Intercom 101. Running Time: Typical time requirements are shown in Table 1. Time is minutes of computing time per increment of x or set of ai, where ai are power series polynomial coefficients. A manual for this program will be published by the American Institute of Chemical Engineers, 25 West 45th Street, New York 36, N.Y., if sufficient interest develops. Reprinted from Chem. Eng. Prog., 1961, 57, 9, 74-76, where it was program 080.

References:

Sherwood, T. K., & Reed, C. E. Applied mathematics in chemical engineering. New York: McGraw-Hill, 1939.

Scarborough, J. B. Numerical mathematical analysis. Baltimore: Johns Hopkins, 1955.

Sokolnikoff, I. S., & Redheffer, R. M. Mathematics of physics and modern engineering. New York: McGraw-Hill, 1958.

An IBM 704 FORTRAN Program for an Equation of Third Degree Polynomial with Three Variables, Donald L. Whitley, The Dow Chemical Company, Engineering & Construction Services Division, Freeport, Texas. (CPA 66)

Description: The program computes the equation of a third degree polynomial with three variables, i.e., the equation of a "family of curves." A third degree polynomial with three variables may be written as

y = [b00 + b01 z + b02 z² + b03 z³] + x [b10 + b11 z + b12 z² + b13 z³] + x² [b20 + b21 z + b22 z² + b23 z³] + x³ [b30 + b31 z + b32 z² + b33 z³]   (1)

There are 16 unknown equation coefficients, i.e., b00, b01, ..., and b33. These are calculated by setting up 16 simultaneous equations from the 16 sets of xi, yi, and zi data input, one equation for each set (xi, yi, zi), i = 1, ..., 16. These equations are solved by the method of elimination.

Computer: Bendix G-15D, 2K drum, 1 tape. Program Language: Intercom 101. Running Time: Equation coefficients of curve, i.e., b00, b01, ...,

b30, and b33: 22½ minutes. Value of yi after equation coefficients are known: 5 seconds.

A manual for this program will be published by the American Institute of Chemical Engineers, 25 West 45th Street, New York 36, N.Y., if sufficient interest develops. Reprinted from Chem. Eng. Prog., 1961, 57, 10, 102, where it was program 079.
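The fitting scheme just described, 16 simultaneous equations solved by elimination, can be sketched as follows. This is an illustrative pure-Python version under our own naming, not the original FORTRAN or Intercom code:

```python
# Illustrative sketch (our names, not the original program): fit
# y = sum over j, k = 0..3 of b[4*j + k] * x**j * z**k
# from 16 (x, y, z) data triples by Gaussian elimination.

def solve(a, y):
    # Gaussian elimination with partial pivoting on the 16x16 system.
    n = len(a)
    m = [row[:] + [yi] for row, yi in zip(a, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]

def evaluate(b, x, z):
    # Evaluate the family-of-curves polynomial at (x, z).
    return sum(b[4 * j + k] * x ** j * z ** k
               for j in range(4) for k in range(4))

def fit_family_of_curves(points):
    """points: exactly 16 (x, y, z) triples; returns b00..b33 (flat)."""
    assert len(points) == 16
    a = [[x ** j * z ** k for j in range(4) for k in range(4)]
         for x, _, z in points]
    return solve(a, [p[1] for p in points])
```

A 4-by-4 grid of distinct (x, z) values gives a nonsingular system, so the 16 coefficients are recovered exactly up to rounding.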

A Bendix G-15D Program to Compute Arc Length, Donald L. Whitley, The Dow Chemical Company, Engineering & Construction Services Division, Freeport, Texas. (CPA 67)

Description: The program computes the finite length of curvilinear lines. A power series polynomial is computed from sets of input data (xi, yi) taken from the curvilinear line. This polynomial is used to calculate closely spaced (xi, yi) points. The distance between adjacent points is computed as the hypotenuse, with Δx and Δy as the two sides of a right triangle. The sum of the incremental hypotenuses is the arc length. An approximate length is

S (arc length) ≈ Σ (a = 1 to n) √((Δxa)² + (Δya)²)

When the number of division points is increased indefinitely while the lengths of the individual segments tend to zero, the exact length is obtained:

S (arc length) = lim (n → ∞) Σ (a = 1 to n) √((Δxa)² + (Δya)²)
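A minimal modern sketch of this chord-summing scheme follows; it is illustrative only, and it evaluates a supplied function where the original program evaluates its fitted polynomial:

```python
import math

def arc_length(f, a, b, n=10000):
    """Sum of incremental hypotenuses over n chords of y = f(x)."""
    pts = [(a + (b - a) * i / n, f(a + (b - a) * i / n))
           for i in range(n + 1)]
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

For a straight line the chords reproduce the exact length for any n; for a curved line the sum converges to the arc length as n grows.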

Computer: Bendix G-15D, 2K drum, 1 tape. Program Language: Intercom 101. Running Time: Time required depends primarily on the number of sets of input data (xi, yi) taken from the curvilinear line.

Number of (xi, yi) Sets of Input Data    Time Required
4                                        1.1 minutes
10                                       9.7 minutes

Time required for computing hypotenuse after correlation coefficients of line have been computed: 11 seconds.

A manual for this program will be published by the American Institute of Chemical Engineers, 25 West 45th Street, New York 36, N.Y., if sufficient interest develops. Reprinted from Chem. Eng. Prog., 1961, 57, 9, 74, where it was program 078.

BIMED, a Series of Statistical Programs for the IBM 709, Part 1, W. J. Dixon, University of California at Los Angeles (CPA 68-93)

Under the direction of W. J. Dixon a series of statistical programs has been developed at the Division of Biostatistics of the Department of Preventive Medicine and Health of the School of Medicine, University of California at Los Angeles. The work was supported by Contract SA-43-ph-3039 of the National Chemotherapy Service Center of the National Cancer Institute. These abstracts were prepared by Steven G. Vandenberg.

BIMED 1. Life table and survival rate (CPA 68)

The program computes a selected period survival rate, its standard error, and effective sample size. It also computes selected survival rates and their standard errors with combinations of several cohorts. If desired, the program can compute survival rates and their standard errors for successively reduced periods.

Reference:

Cutler, S. J., & Ederer, F. Maximum utilization of the life table method in analyzing survival. J. Chron. Diseases, 1958, 8, 699-712.

BIMED 2. Component analysis (CPA 69)

This program computes, for up to 25 variables: (1) correlations, (2) eigenvalues, including cumulative proportion of total variance, (3) eigenvectors (principal components of standardized data), and (4) rank order of each standardized case, ordered by size of each principal component separately.

BIMED 3. Regression on primary principalcomponents (CPA 70)

This program is an extension of BIMED 2 and computes, in addition, (5) coefficients of regression using up to 20 orthogonal components as independent variables for each dependent variable, (6) reduction in sum of squares of residuals due to the use of these orthogonal components, (7) coefficients of the regression equation when only the first, first two, first three, and all components are used as independent variables, with each component expressed in terms of standardized data.

Reference:

Kendall, M. G. A course in multivariate analysis. New York: Hafner, 1957.


BIMED 4. Discriminant analysis for up to 5 groups of different sizes and 25 variables (CPA 71)

This program computes (1) means, (2) cross products of deviations from means, (3) dispersion matrix, (4) inverse of dispersion matrix, (5) coefficients and constants, (6) evaluation of classification function for each individual, and (7) classification matrix. Maximum group size accepted is 150.

Reference:

Kendall, M. G. A course in multivariate analysis. New York: Hafner, 1957.

BIMED 5. Discriminant analysis for two groups (CPA 72)

This program computes a linear function which discriminates best between two groups for up to 25 variables for each individual. The criterion of "best" is that the difference between the mean indices for the two groups, divided by a pooled standard deviation of the indices, is maximized. The analysis may be repeated for subsets of the variables. The program permits logarithmic transformations (base 10) of any variables. The groups may differ in size but not exceed 150.

Reference:

Hoel, P. G. Introduction to mathematical statistics. New York: Wiley, 1954.
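The criterion described is that of Fisher's linear discriminant, whose weight vector solves S w = (m1 - m2) with S the pooled dispersion matrix. A small illustrative sketch (our names, not BIMED's, and omitting the subset and log-transformation options):

```python
# Illustrative two-group linear discriminant (Fisher's criterion);
# not the BIMED 5 program itself.

def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_dispersion(g1, g2):
    # Pooled within-group covariance matrix of the two samples.
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        m = mean(rows)
        for row in rows:
            d = [x - mi for x, mi in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    return [[v / df for v in row] for row in s]

def solve(a, b):
    # Gaussian elimination with partial pivoting for S w = d.
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]

def discriminant_weights(g1, g2):
    d = [a - b for a, b in zip(mean(g1), mean(g2))]
    return solve(pooled_dispersion(g1, g2), d)
```

The resulting index w'x separates the two groups as widely as possible relative to their pooled spread.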

BIMED 6. Multiple regression and correlation analysis no. 1 (CPA 73)

Up to 30 variables and 5000 cases can be processed. Any variable can be named the dependent variable. The maximum number of independent variables which can be deleted from the equation at one time is 28. The program can make a log 10, square root, or square transformation of any or all variables.

Reference:

Dixon, W. J., & Massey, F. J. Introduction to statistical analysis. New York: McGraw-Hill, 1957, pp. 275-278.

BIMED 7. Multiple regression and correlationanalysis no. 2 (CPA 74)

This program performs multiple regression and correlation analyses by selecting different combinations of subsamples, such as diagnostic groups, types of treatment, etc., for up to 30 variables and 28 subsamples. Total sample size allowed is 32,000, the largest subsample 5000. Transformations can be made. Up to 28 independent variables can be deleted.

Reference:

Dixon, W. J., & Massey, F. J. Introduction to statistical analysis. New York: McGraw-Hill, 1957, pp. 275-278.

BIMED 8. Polynomial regressions (CPA 75)

This program computes polynomial regressions Y = A + B1X + B2X² + ... + BkX^k, with k specified by the Program Card but not exceeding 10, for up to 500 observations.

BIMED 9. Step-wise multiple regression (CPA 76)

This program computes a multiple linear regression equation for m sets of data containing n ≤ 59 independent variables and one dependent variable. A weighting factor Wi can be given to each set of observations. Regression coefficients for k ≤ n variables significant at a specified significance level are obtained, as well as a number of intermediate regression equations, adding one variable at a time. The variable added is each time the one making the greatest improvement in the goodness of fit. A log 10, square root, or square transformation can be made for any or all variables. Variables which are approximate linear combinations of other independent variables are not entered into the regression to prevent degeneracy.
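The selection rule described, adding at each step the variable that most improves the fit, can be sketched as forward selection on the residual sum of squares. An illustrative pure-Python version (our names, not the BIMED 9 code, and without its weights Wi, significance test, or collinearity screen):

```python
# Illustrative forward step-wise selection on residual sum of squares;
# not the BIMED 9 program itself.

def solve(a, b):
    # Gaussian elimination with partial pivoting.
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]

def rss(xcols, y):
    # Residual sum of squares of y regressed on xcols plus an intercept.
    n = len(y)
    X = [[1.0] + [c[i] for c in xcols] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    fit = [sum(bv * x for bv, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

def forward_select(cols, y, k):
    # Greedily pick k predictors, one at a time.
    chosen, remaining = [], list(range(len(cols)))
    while remaining and len(chosen) < k:
        best = min(remaining,
                   key=lambda j: rss([cols[i] for i in chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

A production version would stop when the next variable fails the specified significance test rather than after a fixed k.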

BIMED 10. FORTRAN subroutine: multiple regression and correlation for 29 variables (CPA 77)

This subroutine computes: (1) inverse of correlation matrix, (2) multiple correlation coefficient, (3) standard error of estimate, (4) intercept (B0 value), (5) joint regression coefficients, (6) standard deviations of regression coefficients, (7) t-values.

Reference:

Ostle, B. Statistics in research. Iowa City: Iowa State College Press, 1954, Ch. 8.

BIMED 11. Analysis of variance no. 1 (CPA 78)

This program computes analysis of variance for a complete factorial design for up to 8 variables with any number of replicates, and the breakdown of sums of squares into orthogonal components for up to 4 variables, with tables of interaction means bordered by column and row means. The number of categories or levels of any one variable may not exceed 999, and the data in one replicate cannot exceed 20,000, i.e., L1 L2 L3 ... Ln ≤ 20,000 where Li is level i of the n variables and n ≤ 8. Transformations of data can be performed.

Reference:

Scheffe, H. The analysis of variance. New York: Wiley, 1959.

BIMED 12. Analysis of variance no. 2 (CPA79)

This program is the same as BIMED 11, but n ≤ 14. The number of interactions computed may be limited by an entry on the Program Card.

Reference:

Scheffe, H. The analysis of variance. New York: Wiley, 1959.

BIMED 13. Analysis of covariance (CPA 80)

Two to six variables can be processed with as many as 8 covariates for up to 999 replicates and up to 999 levels, but L1 L2 ... Ln ≤ 20,000, where Li is level i of the variables and n ≤ 6. Transformations of data can be performed.

Reference:

Scheffe, H. The analysis of variance. New York: Wiley, 1959.

BIMED 14. General linear hypothesis (CPA 81)

This program analyzes the statistical significance of independent variables for those experimental designs that can be formulated in terms of a General Linear Hypothesis model. Two general types of variables can be analyzed: classification or analysis-of-variance variables, and regression variables or covariates.

The model may include p analysis-of-variance variables and q covariates, where p ≤ 60, q ≤ 60, and p + q ≤ 60. The program can analyze unbalanced analysis of variance and covariance designs according to up to 9 stated hypotheses. Data transformations are available.

BIMED 15. Data screening no. 1 (CPA 82)

This program checks data for later statistical analyses by finding the form of their distributions, frequencies, and possible outliers for n observations on p variables, where n ≤ 2000, p ≤ 50, and np ≤ 20,000. Data transformations are available. Extreme values are printed out.

BIMED 16. Data screening no. 2 (CPA 83)

This program computes (1) frequency distributions, (2) means, variances, standard deviations, and standard errors of means for a specified set of p variables conditioned on one variable in the set, and (3) correlations for those variables, where p ≤ 30 and n ≤ 650. Transformations can be performed on any variable.

BIMED 17. Factor analysis (CPA 84)

This program computes for up to 80 variables of any size sample (1) means and standard deviations, (2) correlations, (3) eigenvalues including cumulative proportions of total variance, (4) eigenvectors, (5) factor matrix, (6) factor check matrix, (7) varimax rotated factor matrix, (8) original and successive variances, (9) check on communalities. Data transformations are available for any variable.

Rotation can be performed for factors with eigenvalues (1) greater than unity, (2) greater than zero when ones are used in the diagonals of the correlation matrix, or (3) greater than zero when the squared multiple correlation of each variable with the remaining is inserted in the diagonal.

References:

Harman, H. H. Factor analysis. In A. Ralston and H. S. Wilf (Eds.), Mathematical methods for digital computers. New York: Wiley, 1959, Ch. 18.

Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 1958, 23, 187-200.

BIMED 18. FORTRAN subroutine for Varimax rotation "ROTATE" (CPA 85)

1. Main program Dimension statement:
DIMENSION A (100, 75), TV (50), H (100), HD (100), HN (100)

2. Calling statement:
CALL ROTATE (A, N, L, TV, H, HN, HD, NY)

3. The subroutine provides the rotated matrix A (N, L), the number of iteration cycles NY, original communalities H (1) to H (N), final communalities HN (1) to HN (N), a column vector of the original variances in TV (2), and successive variances in each iteration cycle to TV (NY). The ROTATE subroutine requires 1216 storage locations and takes approximately ¼ (NL) seconds running time.

Reference:

Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 1958, 23, 187-200.

BIMED 19. Varimax rotation (CPA 86)

This program performs the "varimax" orthogonal rotation of a factor matrix for up to 100 variables and 75 factors. The maximum number of iteration cycles is 49.

Reference:

Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 1958, 23, 187-200.
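For readers curious about the mechanics, a single cycle of Kaiser's pairwise rotations for the raw (unnormalized) varimax criterion can be sketched as follows. This pure-Python illustration uses the standard angle formula tan 4φ = (D - 2AB/p)/(C - (A² - B²)/p); it is not the BIMED code itself:

```python
import math

def raw_varimax_criterion(A):
    # Sum over factors of the variance of the squared loadings.
    p = len(A)
    v = 0.0
    for j in range(len(A[0])):
        col2 = [row[j] ** 2 for row in A]
        v += (p * sum(c * c for c in col2) - sum(col2) ** 2) / (p * p)
    return v

def varimax_cycle(A):
    """One cycle of pairwise rotations maximizing the raw criterion."""
    A = [row[:] for row in A]
    p = len(A)
    k = len(A[0])
    for j in range(k):
        for l in range(j + 1, k):
            u = [row[j] ** 2 - row[l] ** 2 for row in A]
            v = [2 * row[j] * row[l] for row in A]
            su, sv = sum(u), sum(v)
            num = 2 * sum(a * b for a, b in zip(u, v)) - 2 * su * sv / p
            den = (sum(a * a - b * b for a, b in zip(u, v))
                   - (su * su - sv * sv) / p)
            phi = 0.25 * math.atan2(num, den)
            c, s = math.cos(phi), math.sin(phi)
            for row in A:
                rj, rl = row[j], row[l]
                row[j] = c * rj + s * rl
                row[l] = -s * rj + c * rl
    return A
```

Each pair rotation maximizes the criterion over its own angle, so the criterion never decreases across a cycle; the full program iterates cycles until every angle falls below tolerance.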

BIMED 20. Analysis of covariance no. 2 (CPA 87)

This program performs an analysis of covariance with a single variable of classification and unequal sample sizes. Subsamples and combinations of subsamples can be selected from the data. An analysis-of-variance table can be computed for each subsample, or one table for the combined groups. The maximum number of groups is 1000, and 999 replicates are permitted. The total sample size of all replicates cannot exceed 32,000. The maximum number of subsamples is 500. The maximum number of subsamples in a combination is 23.

BIMED 21. Periodic regression (CPA 88)

This program performs periodic or harmonic regression analysis by

Y = a0 + Σ (i = 1 to n) [ai cos(iθ) + bi sin(iθ)]

The regressions are computed up to the harmonic n specified (n ≤ 6), or, if a good fit is achieved, the program stops after fewer terms.

An analysis-of-variance table is printed out after each harmonic; an equal number of up to 20 replicates is required in each time interval for this output. For analysis of covariance, only a single covariate is allowed at one time. The maximum number of time intervals is 400.
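Under this model, and assuming equally spaced observations over one full period, the harmonic coefficients reduce to discrete Fourier sums. A sketch under that assumption (our names, not BIMED's):

```python
import math

def harmonic_fit(y, n):
    """a0 and (ai, bi), i = 1..n, from N equally spaced observations
    of y over one full period (illustrative; not the BIMED code)."""
    N = len(y)
    th = [2 * math.pi * t / N for t in range(N)]
    a0 = sum(y) / N
    ab = []
    for i in range(1, n + 1):
        ai = 2.0 / N * sum(yt * t and yt * math.cos(i * t) or yt * math.cos(i * t) for yt, t in zip(y, th))
        bi = 2.0 / N * sum(yt * math.sin(i * t) for yt, t in zip(y, th))
        ab.append((ai, bi))
    return a0, ab
```

By discrete orthogonality of the cosines and sines, coefficients up to harmonic n are recovered exactly whenever N exceeds 2n.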

Reference:

Bliss, C. I. Periodic regression in biology and climatology. Connecticut Agric. Exper. Sta. Bull. 615, 1958.

BIMED 22. Cross-tabulation no. 1 (CPA 89)

This program makes cross-tabulations and computes correlations for data which may be used in later statistical analyses to find the form of their distributions, outliers, etc. Positive integers on p variables for n cases are accepted for n ≤ 1000, p ≤ 100, and np ≤ 16,000. Maximum and minimum values of each variable to be cross-tabulated are specified with these restrictions:

max. x - min. x ≤ 34
max. y - min. y ≤ 99

where x and y are the abscissa and ordinate. Up to 50 values falling outside the specified range for each variable are printed out under the heading "Values not entered." If there are more than 50, their number is printed. The maximum frequency for each cross-tabulation cell is 999. Greater frequencies are counted and listed under a heading "Overflow frequencies" (with row and column indices).

BIMED 23. Cross-tabulation no. 2 (CPA 90)

This is the same as BIMED 22, but with no corrections, np ≤ 20,000, and no "Values not entered" or "Overflow frequencies" allowed. Missing values are excluded from cross-tabulation and identified in the output. Transformations of the data can be performed, and q new variables created, up to p + q ≤ 100.

BIMED 24. Data screening no. 3 (CPA 91)

This is an expanded version of BIMED 15 which accepts n ≤ 2000 observations for p ≤ 50 variables, up to np ≤ 20,000. One of several transformations can be made of any variable. Simple regression and correlation coefficients are computed for paired variables using the original or transformed values, but before the values are standardized. Extreme values in the data and in residuals from the computed regressions are identified and printed out. Extreme deviates from regressions with coefficients specified in the regression card are identified and printed out. This allows, for instance, screening for departure from "normal" or standard relationships.
