AI Magazine Volume 12 Number 2 (1991) (© AAAI)
Why is there so much excitement about neural networks today, and how is this related to research in AI? Much has been said, in the popular press, as though these were conflicting activities. This seems exceedingly strange to me because both are parts of the same enterprise. What caused this misconception?

The symbol-oriented community in AI has brought this rift upon itself by supporting models in research that are far too rigid and specialized. This focus on well-defined problems produced many successful applications, no matter that the underlying systems were too inflexible to function well outside the domains for which they were designed. (It seems to me that this occurred because of the researchers' excessive concern with logical consistency and provability. Ultimately, this concern would be a proper one but not in the subject's current state of immaturity.) Thus, contemporary symbolic AI systems are now too constrained to be able to deal with exceptions to rules or to exploit fuzzy, approximate, or heuristic fragments of knowledge. Partly in reaction to this constraint, the connectionist movement initially tried to develop more flexible systems but soon came to be imprisoned in its own peculiar ideology—trying to build learning systems endowed with as little architectural structure as possible, hoping to create machines that could serve all masters equally well. The trouble with this attempt is that even a seemingly neutral architecture still embodies an implicit assumption about which things are presumed to be similar.

The field called AI includes many different aspirations. Some researchers simply want machines to do the various sorts of things that people call intelligent. Others hope to understand what enables people to do such things. Still other researchers want to simplify programming. Why can't we build, once and for all, machines that grow and improve themselves by learning from experience? Why can't we simply explain what we want and then let our machines do experiments, read some books, or go to school—the sorts of things that people do? Our machines today do no such things: Connectionist networks learn a bit but show few signs of becoming smart; symbolic systems are shrewd from the start but don't yet show any common sense. How strange that our most advanced systems can compete with human specialists yet are unable to do many things that seem easy to children. I suggest that this stems from the nature of what we call specialties—because the very act of naming a specialty amounts to celebrating the discovery of some model of some aspect of reality, which is useful despite being isolated from most of our other concerns. These models have rules that reliably work—as long as we stay in their special domains. But when we return to the commonsense world, we rarely find rules that precisely apply. Instead, we must know how to adapt each fragment of knowledge to particular contexts and circumstances, and we must expect to need more and different kinds of knowledge as our concerns broaden. Inside such simple "toy" domains, a rule might seem to be general, but whenever we broaden these domains, we find more and more exceptions, and the early advantage of context-free rules then mutates into strong limitations.

AI research must now move beyond its traditional focus on particular schemes.


Engineering and scientific education condition us to expect everything, including intelligence, to have a simple, compact explanation. Accordingly, when people new to AI ask "What's AI all about?" they seem to expect an answer that defines AI in terms of a few basic mathematical laws.

Today, some researchers who seek a simple, compact explanation hope that systems modeled on neural nets or some other connectionist idea will quickly overtake more traditional systems based on symbol manipulation. Others believe that symbol manipulation, with a history that goes back millennia, remains the only viable approach.

Marvin Minsky subscribes to neither of these extremist views. Instead, he argues that AI must use many approaches. AI is not like circuit theory and electromagnetism. There is nothing wonderfully unifying like Kirchhoff's laws are to circuit theory or Maxwell's equations are to electromagnetism. Instead of looking for a "right way," the time has come to build systems out of diverse components, some connectionist and some symbolic, each with its own diverse justification.

—Patrick Winston


There is no one best way to represent knowledge or to solve problems, and the limitations of current machine intelligence largely stem from seeking unified theories or trying to repair the deficiencies of theoretically neat but conceptually impoverished ideological positions. Our purely numeric connectionist networks are inherently deficient in abilities to reason well; our purely symbolic logical systems are inherently deficient in abilities to represent the all-important heuristic connections between things—the uncertain, approximate, and analogical links that we need for making new hypotheses. The versatility that we need can be found only in larger-scale architectures that can exploit and manage the advantages of several types of representations at the same time. Then, each can be used to overcome the deficiencies of the others. To accomplish this task, each formally neat type of knowledge representation or inference must be complemented with some scruffier kind of machinery that can embody the heuristic connections between the knowledge itself and what we hope to do with it.

Figure 1. Conflict between theoretical extremes.

Top Down versus Bottom Up

Although different workers have diverse goals, all AI researchers seek to make machines that solve problems. One popular way to pursue this quest is to start with a top-down strategy: Begin at the level of commonsense psychology, and try to imagine processes that could play a certain game, solve a certain kind of puzzle, or recognize a certain kind of object. If this task can't be done in a single step, then break things down into simpler parts until you can actually embody them in hardware or software.

This basically reductionist technique is typical of the approach to AI called heuristic programming. These techniques have developed productively for several decades, and today, heuristic programs based on top-down analysis have found many successful applications in technical, specialized areas. This progress is largely the result of the maturation of many techniques for representing knowledge. However, the same techniques have seen less success when applied to commonsense problem solving. Why can we build robots that compete with highly trained workers to assemble intricate machinery in factories but not robots that can help with ordinary housework? It is because the conditions in factories are constrained, and the objects and activities of everyday life are too endlessly varied to be described by precise, logical definitions and deductions. Commonsense reality is too disorderly to represent in terms of universally valid axioms. To deal with such variety and novelty, we need more flexible styles of thought, such as those we see in human commonsense reasoning, which is based more on analogies and approximations than on precise formal procedures. Nonetheless, top-down procedures have important advantages in being able to perform efficient, systematic search procedures, manipulate and rearrange the elements of complex situations, and supervise the management of intricately interacting subgoals—all functions that seem beyond the capabilities of connectionist systems with weak architectures.

Shortsighted critics have always complained that progress in top-down symbolic AI research is slowing. In one way, this slowing is natural: In the early phases of any field, it becomes ever harder to make important new advances as we put the easier problems behind us; in addition, new workers must face a squared challenge because there is so much more to learn. However, the slowdown of progress in symbolic AI is not just a matter of laziness. Those top-down systems are inherently poor at solving problems that involve large numbers of weaker kinds of interactions such as occur in many areas of pattern recognition and knowledge retrieval. Hence, there has been a mounting clamor for finding another, new, more flexible approach, which is one reason for the recent popular turn toward connectionist models.

The bottom-up approach goes the opposite way. We begin with simpler elements—they might be small computer programs, elementary logical principles, or simplified models of what brain cells do—and then move upward in complexity by finding ways to interconnect these units to produce larger-scale phenomena. The currently popular form of this, the connectionist neural network approach, developed more sporadically than heuristic programming. This development was sporadic in part because heuristic programming developed so rapidly in the 1960s that connectionist networks were swiftly outclassed. Also, the networks needed computation and memory resources that were too prodigious for that period. Now that faster computers are available, bottom-up connectionist research has shown considerable promise in mimicking some of what we admire in the behavior of lower animals, particularly in the areas of pattern recognition, automatic optimization, clustering, and knowledge retrieval. However, the performance of these networks has been far weaker in precisely the areas in which symbolic systems have successfully mimicked much of what we admire in high-level human thinking, for example, in goal-based reasoning, parsing, and causal analysis. These weakly structured connectionist networks cannot deal with the sorts of tree search explorations and complex, composite knowledge structures required for parsing, recursion, complex scene analysis, or other sorts of problems that involve functional parallelism. It is an amusing paradox that connectionists frequently boast about the massive parallelism of their computations, yet the homogeneity and interconnectedness of these structures make them virtually unable to do more than one thing at a time—at least, at levels above that of their basic associative function. This is essentially because they lack the architecture needed to maintain adequate short-term memories.

Thus, the current systems of both types show serious limitations. The top-down systems are handicapped by inflexible mechanisms for dealing with very numerous, albeit very weak, interactions, while the bottom-up systems are crippled by inflexible architectures and organizational limitations. Neither type of system has been developed to be able to exploit multiple, diverse varieties of knowledge.

Which approach is best to pursue? This question itself is simply wrong. Each has virtues and deficiencies, and we need integrated systems that can exploit the advantages of both. In favor of the top-down side, AI research has told us a little—but only a little—about how to solve problems by using methods that resemble reasoning. If we understood more about such processes, perhaps we could more easily work down toward finding out how brain cells do such things. In favor of the bottom-up approach, the brain sciences have told us something—but again only a little—about the workings of brain cells and their connections. More research in this area might help us discover how the activities of brain cell networks support our higher-level processes. However, right now we're caught in the middle; neither purely connectionist nor purely symbolic systems seem able to support the sorts of intellectual performances we take for granted even in young children. This article aims at understanding why both types of AI systems have developed to become so inflexible. I'll argue that the solution lies somewhere between these two extremes, and our problem will be to find out how to build a suitable bridge. We already have plenty of ideas at either extreme. On the connectionist side, we can extend our efforts to design neural networks that can learn various ways to represent knowledge. On the symbolic side, we can extend our research on knowledge representations to the designing of systems that can more effectively exploit the knowledge thus represented. However, above all, we currently need more research on how to combine both types of ideas.

Representation and Retrieval: Structure and Function

In order that a machine may learn, it must represent what it will learn. The knowledge must be embodied in some form of mechanism, data structure, or representation. AI researchers have devised many ways to embody this knowledge, for example, in the forms of rule-based systems, frames with default assignments, predicate calculus, procedural representations, associative databases, semantic networks, object-oriented data structures, conceptual dependency, action scripts, neural networks, and natural language.

In the 1960s and 1970s, students frequently asked, "Which kind of representation is best?" and I usually replied that we'd need more research before answering. But now I would give a different reply: "To solve really hard problems, we'll have to use several different representations." This is because each particular kind of data structure has its own virtues and deficiencies, and none by itself seems adequate for all the different functions involved with what we call common sense. Each has domains of competence and efficiency, so that one might work where another fails. Furthermore, if we only rely on any single, unified scheme, then we'll have no way to recover from failure. As suggested in section 6.9 of The Society of Mind (Minsky 1987), "The secret of what something means lies in how it connects to other things we know. That's why it's almost always wrong to seek the real meaning of anything. A thing with just one meaning has scarcely any meaning at all."

To get around these limitations, we must develop systems that combine the expressiveness and procedural versatility of symbolic systems with the fuzziness and adaptiveness of connectionist representations. Why has there been so little work on synthesizing these techniques? I suspect that it is because both of these AI communities suffer from a common cultural-philosophical disposition: They would like to explain intelligence in the image of what was successful in physics—by minimizing the amount and variety of its assumptions. But this seems to be a wrong ideal. We should take our cue from biology rather than physics because what we call thinking does not directly emerge from a few fundamental principles of wave-function symmetry and exclusion rules. Mental activities are not the sort of unitary or elementary phenomenon that can be described by a few mathematical operations on logical axioms. Instead, the functions performed by the brain are the products of the work of thousands of different, specialized subsystems, the intricate product of hundreds of millions of years of biological evolution. We cannot hope to understand such an organization by emulating the techniques of those particle physicists who search for the simplest possible unifying conceptions. Constructing a mind is simply a different kind of problem—how to synthesize organizational systems that can support a large enough diversity of different schemes yet enable them to work together to exploit one another's abilities.

To solve typical real-world commonsense problems, a mind must have at least several different kinds of knowledge. First, we need to represent goals: What is the problem to be solved? Then, the system must also possess adequate knowledge about the domain or context in which this problem occurs. Finally, the system must know what kinds of reasoning are applicable in this area. Superimposed on all this knowledge, our systems must have management schemes that can operate different representations and procedures in parallel, so that when any particular method breaks down or gets stuck, the system can quickly shift to analogous operations in other realms that might be able to continue the work. For example, when you hear a natural language expression such as "Mary gave Jack the book," you will produce, albeit unconsciously, many different kinds of thoughts (Minsky [1987], section 29.2), that is, mental activities in such different realms as a visual representation of the scene; postural and tactile representations of the experience; a script sequence for a typical act of giving; representations of the participants' roles; representations of their social motivations; default assumptions about Jack, Mary, and the book; and other assumptions about past and future expectations.

How could a brain possibly coordinate the use of such different kinds of processes and representations? The conjecture is that our brains construct and maintain them in different brain agencies. (The corresponding neural structures need not, of course, be entirely separate in their spatial extents inside the brain.) However, it is not enough to maintain separate processes inside separate agencies; we also need additional mechanisms to enable each of them to support the activities of the others or, at least, to provide alternative operations in case of failures. Chapters 19 through 23 of The Society of Mind (Minsky 1987) sketch some ideas about how the representations in different agencies could be coordinated. These sections introduce the concepts of the polyneme, a hypothetical neuronal mechanism for activating corresponding slots in different representations; the microneme, a context-representing mechanism that similarly biases all the agencies to activate knowledge related to the current situation and goal; and the paranome, yet another mechanism that can simultaneously apply corresponding processes or operations to the short-term memory agents—called pronomes—of these various agencies.

It is impossible to briefly summarize how all these mechanisms are imagined to work, but section 29.3 of The Society of Mind (Minsky 1987) gives some of the flavor of the theory. What controls those paranomes? I suspect that in human minds, this control comes from the mutual exploitation of a long-range planning agency (whose scripts are influenced by various strong goals and ideals; this agency resembles the Freudian superego and is based on early imprinting), another supervisory agency capable of using semiformal inferences and natural language reformulations, and a Freudian-like censorship agency that incorporates massive records of previous failures of various sorts.
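To make the preceding management idea concrete, here is a minimal Python sketch, not Minsky's mechanism: a few hypothetical representation "agencies" each try to build their own account of the same utterance, and a manager simply collects whatever the agencies that did not get stuck produce. Every name and data structure below is invented for illustration.

# Hypothetical sketch of a manager running several representational
# "agencies" over the same input while tolerating individual failures.

class Agency:
    """One specialized representation builder (e.g., visual, script, role)."""
    def __init__(self, name, build):
        self.name = name
        self.build = build          # function: utterance -> representation

    def process(self, utterance):
        try:
            return self.build(utterance)
        except Exception:           # this agency got stuck on this input
            return None

def visual(u):  return {"scene": f"image of: {u}"}
def script(u):  return {"script": ["offer", "transfer", "accept"]}
def roles(u):
    giver, _, recipient, obj = u.split(maxsplit=3)
    return {"giver": giver, "recipient": recipient, "object": obj}

class Manager:
    """Keeps whatever each agency produced; skips the ones that failed."""
    def __init__(self, agencies):
        self.agencies = agencies

    def interpret(self, utterance):
        thoughts = {}
        for agency in self.agencies:
            result = agency.process(utterance)
            if result is not None:
                thoughts[agency.name] = result
        return thoughts

manager = Manager([Agency("visual", visual),
                   Agency("script", script),
                   Agency("roles", roles)])
print(manager.interpret("Mary gave Jack the book"))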

Relevance and Similarity

Problem solvers must find relevant data. How does the human mind retrieve what it needs from among so many millions of knowledge items? Different AI systems have attempted to use a variety of different methods for this. Some assign keywords, attributes, or descriptors to each item and then locate data by feature-matching or using more sophisticated associative database methods. Others use graph matching or analogical case-based adaptation. Still others try to find relevant information by threading their way through systematic, usually hierarchical classifications of knowledge—sometimes called ontologies. To me, all such ideas seem deficient because it is not enough to classify items of information simply in terms of the features or structures of the items themselves: We rarely use a representation in an intentional vacuum, but we always have goals—and two objects might seem similar for one purpose but different for another purpose. Consequently, we must also account for the functional aspects of what we know, and therefore, we must classify things (and ideas) according to what they can be used for or which goals they can help us achieve. Two armchairs of identical shape might seem equally comfortable as objects for sitting in, but these same chairs might seem very different for other purposes, for example, if they differ much in weight, fragility, cost, or appearance. The further a feature or difference lies from the surface of the chosen representation, the harder it will be to respond to, exploit, or adapt to, which is why the choice of representation is so important. In each functional context, we need to represent particularly well the heuristic connections between each object's internal features and relationships and the possible functions of that object. That is, we must be able to easily relate the structural features of each object's representation to how this object might behave in regard to achieving our current goals (see sections 12.4, 12.5, 12.12, and 12.13, Minsky [1987]).

Figures 2A and 2B. Armchair.

Figure 3. Functional similarity.

New problems, by definition, are different from those we have already encountered; so, we cannot always depend on using records of past experience. However, to do better than random search, we have to exploit what was learned from the past, no matter that it might not perfectly match. Which records should we retrieve as likely to be the most relevant?

Explanations of relevance in traditional theories abound with synonyms for nearness and similarity. If a certain item gives bad results, it makes sense to try something different. However, when something we try turns out to be good, then a similar one might be better. We see this idea in myriad forms, and whenever we solve problems, we find ourselves using metrical metaphors: We're "getting close" or "on the right track," using words that express proximity. But what do we mean by "close" or "near?" Decades of research on different forms of this question have produced theories and procedures for use in signal processing, pattern recognition, induction, classification, clustering, generalization, and so on, and each of these methods has been found useful for certain applications but ineffective for others. Recent connectionist research has considerably enlarged our resources in these areas. Each method has its advocates, but I contend that it is now time to move to another stage of research: Although each such concept or method might have merit in certain domains, none of them seem powerful enough alone to make our machines more intelligent. It is time to stop arguing over which type of pattern-classification technique is best because that depends on our context and goal. Instead, we should work at a higher level of organization and discover how to build managerial systems to exploit the different virtues and evade the different limitations of each of these ways of comparing things. Different types of problems and representations may require different concepts of similarity. Within each realm of discourse, some representation will make certain problems and concepts appear more closely related than others. To make matters worse, even within the same problem domain, we might need different notions of similarity for descriptions of problems and goals, descriptions of knowledge about the subject domain, and descriptions of procedures to be used.

Figure 4. "Heureka!"

For small domains, we can try to apply all our reasoning methods to all our knowledge and test for satisfactory solutions. However, this approach becomes impractical when the search becomes too huge—in both symbolic and connectionist systems. To constrain the extent of mindless search, we must incorporate additional kinds of knowledge, embodying expertise about problem solving itself and, particularly, about managing the resources that might be available. The spatial metaphor helps us think about such issues by providing us with a superficial unification: If we envision problem solving as searching for solutions in a spacelike realm, then it is tempting to analogize between the ideas of similarity and nearness, to think about similar things as being in some sense near or close to one another.

But near in what sense? To a mathematician, the most obvious idea would be to imagine the objects under comparison to be like points in some abstract space; then each representation of this space would induce (or reflect) some sort of topologylike structure or relationship among the possible objects being represented. Thus, the languages of many sciences, not merely those of AI and psychology, are replete with attempts to portray families of concepts in terms of various sorts of spaces equipped with various measures of similarity. If, for example, you represent things in terms of (allegedly independent) properties, then it seems natural to try to assign magnitudes to each and then to sum the squares of their differences—in effect, representing these objects as vectors in Euclidean space. This approach further encourages us to formulate the function of knowledge in terms of helping us to decide "which way to go." This method is often usefully translated into the popular metaphor of hill climbing because if we can impose a suitable metric structure on this space, we might be able to devise iterative ways to find solutions by analogy with the method of hill climbing or gradient ascent; that is, when any experiment seems more or less successful than another, then we exploit this metrical structure to help us make the next move in the proper direction. (Later, I emphasize that having a sense of direction entails a little more than a sense of proximity: It is not enough just to know metrical distances; we must also respond to other kinds of heuristic differences, and these differences might be difficult to detect.)
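As a concrete illustration of the two metaphors above, objects as points in a property space and problem solving as movement through that space, here is a hedged sketch; the chair features, goal weightings, and step sizes are all invented. The point is only that the choice of goal changes which objects count as near, and that a metric supports simple hill climbing.

import math

# Each object is a vector of (allegedly independent) properties.
chairs = {
    "chair_a": {"shape": 1.0, "weight_kg": 6.0,  "fragility": 0.2, "cost": 40.0},
    "chair_b": {"shape": 1.0, "weight_kg": 25.0, "fragility": 0.8, "cost": 400.0},
}

def distance(x, y, weights):
    """Goal-weighted Euclidean distance: the goal decides which features matter."""
    return math.sqrt(sum(w * (x[f] - y[f]) ** 2 for f, w in weights.items()))

# For sitting, only shape matters; for shipping, weight, fragility, and cost do.
sitting  = {"shape": 1.0}
shipping = {"weight_kg": 1.0, "fragility": 100.0, "cost": 0.01}

print(distance(chairs["chair_a"], chairs["chair_b"], sitting))   # 0.0: similar for sitting
print(distance(chairs["chair_a"], chairs["chair_b"], shipping))  # large: different for shipping

# Hill climbing: use the metric to keep moving toward a goal description.
goal = {"shape": 1.0, "weight_kg": 5.0, "fragility": 0.1, "cost": 30.0}

def hill_climb(start, goal, weights, step=1.0, iterations=500):
    current = dict(start)
    for _ in range(iterations):
        best = current
        best_d = distance(current, goal, weights)
        for f in weights:                     # try one small move per weighted feature
            for delta in (-step, +step):
                candidate = dict(current)
                candidate[f] = candidate[f] + delta
                d = distance(candidate, goal, weights)
                if d < best_d:
                    best, best_d = candidate, d
        if best is current:                   # no neighbor is closer: stop
            break
        current = best
    return current

print(hill_climb(chairs["chair_b"], goal, {"weight_kg": 1.0, "cost": 0.01}))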

Whenever we design or select a particular representation, this particular choice will bias our dispositions about which objects to consider more or less similar to us (or to the programs we apply to them) and, thus, will affect how we apply our knowledge to achieve goals and solve problems. Once we understand the effects of such commitments, we will be better prepared to select and modify these representations to produce more heuristically useful distinctions and confusions. Thus, let us now examine, from this point of view, some of the representations that have become popular in the AI field.

Heuristic Connections of Pure Logic

Why have logic-based formalisms been so widely used in AI research? I see two motives for selecting this type of representation. One virtue of logic is clarity, its lack of ambiguity. Another advantage is the preexistence of many technical mathematical theories about logic. But logic also has its disadvantages. Logical generalizations only apply to their literal lexical instances, and logical implications only apply to expressions that precisely instantiate their antecedent conditions. No exceptions are allowed, no matter how closely they match. This approach permits you to use no near misses, no suggestive clues, no compromises, no analogies, and no metaphors. To shackle yourself so inflexibly is to shoot your own mind in the foot—if you know what I mean.

These limitations of logic begin at the foundation with the basic connectives and quantifiers. The trouble is that worldly statements of the form "For all x, P(x)" are never beyond suspicion. To be sure, such a statement can indeed be universally valid inside a mathematical realm; however, this validity is because such realms are themselves based on expressions of this kind. The use of such formalisms in AI has led most researchers to seek universal validity, to the virtual exclusion of practicality or interest, as though nothing would do except certainty. Now, this approach is acceptable in mathematics (wherein we ourselves define the worlds in which we solve problems), but when it comes to reality, there is little advantage in demanding inferential perfection when there is no guarantee that even our assumptions will always be correct. Logic theorists seem to have forgotten that any expression in actual life—that is, in a world that we find but don't make—such as (x)(Px) must be seen as only a convenient abbreviation for something more like the following: "For any thing x being considered in the current context, the assertion P{x} is likely to be useful for achieving goals like G, provided that we apply it in conjunction with other heuristically appropriate inference methods." In other words, we cannot ask our problem-solving systems to be absolutely perfect or even consistent; we can only hope that they will grow increasingly better than blind search at generating, justifying, supporting, rejecting, modifying, and developing evidence for new hypotheses.
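The reinterpretation of (x)(Px) suggested above can be given a rough data-structure form. The following is a hedged sketch with invented field names and toy knowledge: each assertion carries the context and goals for which it is expected to be useful, along with known exceptions, and an inference step merely proposes a conclusion with some confidence rather than guaranteeing one.

from dataclasses import dataclass, field

@dataclass
class HeuristicAssertion:
    """'For any x considered in this context, P(x) is likely to be useful
    for goals like G' -- not a universally valid axiom."""
    predicate: str
    context: str
    useful_for: set
    exceptions: set = field(default_factory=set)
    confidence: float = 0.9

birds_fly = HeuristicAssertion(
    predicate="can_fly",
    context="everyday reasoning about animals",
    useful_for={"predict-behavior"},
    exceptions={"penguin", "ostrich", "injured bird"},
    confidence=0.9,
)

def propose(assertion, x, goal, context):
    """Suggest a conclusion; never claim certainty, and tolerate exceptions."""
    if context != assertion.context or goal not in assertion.useful_for:
        return None                            # not heuristically relevant here
    if x in assertion.exceptions:
        return (x, assertion.predicate, 0.0)   # known exception: withhold support
    return (x, assertion.predicate, assertion.confidence)

print(propose(birds_fly, "sparrow", "predict-behavior", "everyday reasoning about animals"))
print(propose(birds_fly, "penguin", "predict-behavior", "everyday reasoning about animals"))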

It has become particularly popular in AI logic programming to restrict the representation to expressions written in first-order predicate calculus. This practice, which is so pervasive that most students engaged in it don't even know what "first order" means here, facilitates the use of certain types of inference but at a high price: The predicates of such expressions are prohibited from referring in certain ways to one another. This restriction prevents the representation of metaknowledge, rendering these systems incapable, for example, of describing what the knowledge that they contain can be used for. In effect, it precludes the use of functional descriptions. We need to develop systems for logic that can reason about their own knowledge and make heuristic adaptations and interpretations of it by using knowledge about this knowledge; however, the aforementioned limitations of expressiveness make logic unsuitable for such purposes.

Furthermore, it must be obvious that to apply our knowledge to commonsense problems, we need to be able to recognize which pairs of expressions are similar in whatever heuristic sense may be appropriate. But this seems too technically hard to do—at least for the most commonly used logical formalisms—namely, expressions in which absolute quantifiers range over stringlike normal forms. For example, to use the popular method of resolution theorem proving, one usually ends up using expressions that consist of logical disjunctions of separate, almost meaningless conjunctions. Consequently, the natural topology of any such representation will almost surely be heuristically irrelevant to any real-life problem space. Consider how dissimilar these three expressions seem when written in conjunctive form:

A ∨ B ∨ C ∨ D,
AB ∨ AC ∨ AD ∨ BC ∨ BD ∨ CD, and
ABC ∨ ABD ∨ ACD ∨ BCD.

The simplest way to assess the distances or differences between expressions is to compare such superficial factors as the numbers of terms or subexpressions they have in common. Any such assessment would seem meaningless for expressions such as these. In most situations, however, it would almost surely be useful to recognize that these expressions are symmetric in their arguments and, hence, will clearly seem more similar if we rerepresent them—for example, by using Sn to mean n of S's arguments have truth value T—so that they can then be written in the form S1, S2, and S3. Even in mathematics itself, we consider it a great discovery to find a new representation for which the most natural-seeming heuristic connection can be recognized as close to the representation's surface structure. However, such a discovery is too much to expect in general, so it is usually necessary to gauge the similarity of two expressions by using more complex assessments based, for example, on the number of set-inclusion levels between them, or on the number of available operations required to transform one into the other, or on the basis of the partial ordering suggested by their lattice of common generalizations and instances. This means that making good similarity judgments might itself require the use of other heuristic kinds of knowledge, until eventually—that is, when our problems grow hard enough—we are forced to resort to techniques that exploit knowledge that is not so transparently expressed in any such "mathematically elegant" formulation.

Indeed, we can think about much of AI research in terms of a tension between solving problems by searching for solutions inside a compact and well-defined problem space (which is feasible only for prototypes) versus using external systems (that exploit larger amounts of heuristic knowledge) to reduce the complexity of that inner search. Compound systems of this sort need retrieval machinery that can select and extract knowledge that is relevant to the problem at hand. Although it is not especially hard to write such programs, it cannot be done in first-order systems. In my view, this can best be achieved in systems that allow us to simultaneously use object-oriented structure-based descriptions and goal-oriented functional descriptions.

How can we make logic more expressive given that each fundamental quantifier and connective is defined so narrowly from the start? This deficiency could well be beyond repair, and the most satisfactory replacement might be some sort of object-oriented frame-based language. After all, once we leave the domain of abstract mathematics and free ourselves from these rigid notations, we can see that some virtues of logiclike reasoning might remain, for example, in the sorts of deductive chaining we used and the kinds of substitution procedures we applied to these expressions. The spirit of some of these formal techniques can then be approximated by other, less formal techniques of making chains (see chapter 18, Minsky [1987]). For example, the mechanisms of defaults and frame arrays could be used to approximate the formal effects of instantiating generalizations. When we use heuristic chaining, of course, we cannot assume absolute validity of the result; so, after each reasoning step, we might have to look for more evidence. If we notice exceptions and disparities, then later we must return to each or else remember them as assumptions or problems to be justified or settled at some later time, all things that humans so often do.

Figure 5. Default assumption.
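Returning to the three conjunctive-form expressions displayed earlier in this section: the brute-force check below (an invented illustration) confirms that each is equivalent to a threshold statement, read here as "at least n of the four arguments are true," so a representation that counts true arguments makes their kinship obvious even though they share no disjuncts on the surface.

from itertools import product

def expr1(a, b, c, d):  # A ∨ B ∨ C ∨ D
    return a or b or c or d

def expr2(a, b, c, d):  # AB ∨ AC ∨ AD ∨ BC ∨ BD ∨ CD
    return (a and b) or (a and c) or (a and d) or (b and c) or (b and d) or (c and d)

def expr3(a, b, c, d):  # ABC ∨ ABD ∨ ACD ∨ BCD
    return (a and b and c) or (a and b and d) or (a and c and d) or (b and c and d)

def S(n):
    """Threshold re-representation: at least n of the arguments are true."""
    return lambda *args: sum(args) >= n

# Brute-force check over all 16 truth assignments: expression k equals Sk.
for k, expr in [(1, expr1), (2, expr2), (3, expr3)]:
    assert all(expr(*v) == S(k)(*v) for v in product([False, True], repeat=4))
    print(f"expression {k} is equivalent to S{k} ('at least {k} of 4 are true')")

# A surface similarity measure (shared disjuncts) sees nothing in common,
# even though in the S-form the three differ only in a single threshold number.
terms1 = {"A", "B", "C", "D"}
terms2 = {"AB", "AC", "AD", "BC", "BD", "CD"}
terms3 = {"ABC", "ABD", "ACD", "BCD"}
print(terms1 & terms2, terms2 & terms3)   # empty sets: no shared terms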

Heuristic Connections of Rule-Based Systems

Although logical representations have been used in research, rule-based representations have been more successful in applications. In these systems, each fragment of knowledge is represented by an if-then rule, so that whenever a description of the current problem situation precisely matches the rule's antecedent if condition, the system performs the action described by this rule's then consequent. What if no antecedent condition applies? The answer is simple: The programmer adds another rule. It is this seeming modularity that made rule-based systems so attractive. You don't have to write complicated programs. Instead, whenever the system fails to perform or does something wrong, you simply add another rule. This approach usually works well at first, but whenever we try to move beyond the realm of toy problems and start to accumulate more and more rules, we usually get into trouble because each added rule is increasingly likely to interact in unexpected ways with the others. Then, what should we ask the program to do when no antecedent fits perfectly? We can equip the program to select the rule whose antecedent most closely describes the situation; again, we're back to "similar." To make any real-world application program resourceful, we must supplement its formal reasoning facilities with matching facilities that are heuristically appropriate for the problem domain it is working in.
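A minimal sketch of the kind of matching facility just described; the rules and the scoring are invented for illustration. When no antecedent matches the situation exactly, the program falls back to the rule whose if-part most closely describes the situation instead of doing nothing.

# Each rule: (antecedent conditions, action). Conditions are feature:value pairs.
rules = [
    ({"temperature": "high", "pressure": "high"}, "open relief valve"),
    ({"temperature": "high", "pressure": "low"},  "check heater"),
    ({"temperature": "low"},                      "close vents"),
]

def match_score(antecedent, situation):
    """Fraction of the rule's conditions satisfied by the current situation."""
    satisfied = sum(1 for k, v in antecedent.items() if situation.get(k) == v)
    return satisfied / len(antecedent)

def select_rule(rules, situation):
    # Prefer an exact match; otherwise take the closest-matching antecedent.
    scored = [(match_score(ant, situation), ant, act) for ant, act in rules]
    best_score, _, best_action = max(scored, key=lambda t: t[0])
    return best_action, best_score

situation = {"temperature": "high", "pressure": "medium", "valve": "stuck"}
action, score = select_rule(rules, situation)
print(action, score)   # no rule fits perfectly; the closest one is chosen

A tie here simply takes the first rule found; the next paragraph discusses less arbitrary ways to resolve such conflicts.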

What if several rules match equally well? Of course, we could choose the first on the list, choose one at random, or use some other superficial scheme—but why be so unimaginative? In The Society of Mind, I try to regard conflicts as opportunities rather than obstacles, as openings that we can use to exploit other kinds of knowledge. For example, section 3.2 of The Society of Mind (Minsky 1987) suggests invoking a principle of noncompromise to discard sets of rules with conflicting antecedents or consequents. The general idea is that whenever two fragments of knowledge disagree, it may be better to ignore them both and refer to some other, independent agency. In effect, this approach is managerial: One agency can engage some other body of expertise to help decide which rules to apply. For example, one might turn to case-based reasoning to ask which method worked best in similar previous situations.

Yet another approach would be to engage a mechanism for inventing a new rule by trying to combine elements of those rules that almost fit already. Section 8.2 of The Society of Mind (Minsky 1987) suggests using K-line representations for this purpose. To do so, we must be immersed in a society-of-agents framework in which each response to a situation involves activating not one but a variety of interacting processes. In such a system, all the agents activated by several rules can then be left to interact, if only momentarily (both with one another and with the input signals) to make a useful self-selection about which of the agents should remain active. This could be done by combining certain current connectionist concepts with other ideas about K-line mechanisms. But that can't be done until we learn how to design network architectures that can support new forms of management and supervision of developmental staging.

In any case, current rule-based systems are still too limited in ability to express "typical" knowledge. They need better default machinery. They deal with exceptions too passively; they need censors. They need better "ring-closing" mechanisms for retrieving knowledge (see Minsky [1987], section 19.10). Above all, we need better ways to connect them with other kinds of representations so that we can use them in problem-solving organizations that can exploit other kinds of models and search procedures.

Connectionist Networks

Up to this point, we have considered ways to overcome the deficiencies of symbolic systems by augmenting them with connectionist machinery. However, this kind of research should go both ways. Connectionist systems have equally crippling limitations, which might be ameliorated by augmentation with the sorts of architectures developed for symbolic applications. Perhaps such extensions and synthesis will recapitulate some aspects of how the primate brain grew over millions of years by evolving symbolic systems to supervise its primitive connectionist learning mechanisms.

What do we mean by connectionist? The use of this term is still rapidly evolving, but here it refers to attempts to embody knowledge by assigning numeric conductivities or weights to the connections inside a network of nodes. The most common form of such a node is made by combining an analog, nearly linear part that "adds up evidence" with a nonlinear, nearly digital part that makes a decision based on a threshold. The most popular such networks today take the form of multilayer perceptrons, that is, sequences of layers of such nodes, each layer sending signals to the next. More complex arrangements are also under study that can support cyclic internal activities; hence, they are potentially more versatile but harder to understand. What makes such architectures attractive? Mainly, they appear to be so simple and homogeneous. At least on the surface, they can be seen as ways to represent knowledge without any complex syntax. The entire configuration state of such a net can be described as nothing more than a simple vector—and the network's input-output characteristics as nothing more than a map from one vector space into another. This arrangement makes it easy to reformulate pattern recognition and learning problems in simple terms, for example, of finding the best such mapping. Seen in this way, the subject presents a pleasing mathematical simplicity. It is often not mentioned that we still possess little theoretical understanding of the computational complexity of finding such mappings, that is, of how to discover good values for the connection weights. Most current publications still merely exhibit successful small-scale examples without probing into either assessing the computational difficulty of these problems themselves or scaling these results to similar problems of larger size.

Figure 6. Weighty decisions.
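As a deliberately minimal sketch of the kind of node and network just described, with arbitrary invented weights: each node adds up weighted evidence and compares it with a threshold, and a layered net is nothing more than a map from an input vector to an output vector.

def node(inputs, weights, threshold):
    """Analog part: add up weighted evidence. Digital part: compare to a threshold."""
    evidence = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if evidence >= threshold else 0.0

def layer(inputs, weight_rows, thresholds):
    """One layer: every node sees the same input vector."""
    return [node(inputs, w, t) for w, t in zip(weight_rows, thresholds)]

def two_layer_net(x):
    """The whole net is just a map from one vector space into another."""
    hidden = layer(x, [[0.7, -0.4, 0.1], [0.2, 0.9, -0.3]], [0.3, 0.5])
    output = layer(hidden, [[1.0, 1.0]], [1.5])   # fires only if both hidden nodes fire
    return output

print(two_layer_net([1.0, 0.0, 0.5]))   # [0.0]
print(two_layer_net([1.0, 1.0, 0.0]))   # [1.0]

Flattening all the weights and thresholds into a single list gives the configuration-state-as-vector view mentioned above; learning is then a search for good values in that list, whose computational difficulty, as the text notes, is still poorly understood.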

However, we now know of many situations in which even such simple systems have been made to compute (and, more important, to learn to compute) interesting functions, particularly in such domains as clustering, classification, and pattern recognition. In some instances, this has occurred without any external supervision; furthermore, some of these systems have also performed acceptably in the presence of incomplete or noisy input and, thus, correctly recognized patterns that were novel or incomplete. This achievement means that the architectures of those systems must indeed have embodied heuristic connectivities that were appropriate for those particular problem domains. In such situations, these networks can be useful for the kind of reconstruction-retrieval operations we call ring closing.

However, connectionist networks have limitations as well. The next few sections discuss some of these limitations along with suggestions on how to overcome them by embedding such networks in more advanced architectural schemes.

Limitation of Fragmentation: The Parallel Paradox

In the Epilogue to Perceptrons, Papert and I argued as follows:

It is often argued that the use of distributed representations enables a system to exploit the advantages of parallel processing. But what are the advantages of parallel processing? Suppose that a certain task involves two unrelated parts. To deal with both concurrently, we would have to maintain their representations in two decoupled agencies, both active at the same time. Then, should either of those agencies become involved with two or more sub-tasks, we'd have to deal with each of them with no more than a quarter of the available resources! If that proceeded on and on, the system would become so fragmented that each job would end up with virtually no resources assigned to it. In this regard, distribution may oppose parallelism: the more distributed a system is—that is, the more intimately its parts interact—the fewer different things it can do at the same time. On the other side, the more we do separately in parallel, the less machinery can be assigned to each element of what we do, and that ultimately leads to increasing fragmentation and incompetence. This is not to say that distributed representations and parallel processing are always incompatible. When we simultaneously activate two distributed representations in the same network, they will be forced to interact. In favorable circumstances, those interactions can lead to useful parallel computations, such as the satisfaction of simultaneous constraints. But that will not happen in general; it will occur only when the representations happen to mesh in suitably fortunate ways. Such problems will be especially serious when we try to train distributed systems to deal with problems that require any sort of structural analysis in which the system must represent relationships between substructures of related types—that is, problems that are likely to demand the same structural resources. (Minsky and Papert 1988, p. 277)

(See also Minsky [1987], section 15.11.)

For these reasons, it will always be hard for a homogeneous network to perform parallel high-level computations, unless we can arrange for it to become divided into effectively disconnected parts. There is no general remedy for this problem, and it is no special peculiarity of connectionist hardware; computers have similar limitations, and the only answer is providing more hardware. More generally, it seems obvious that without adequate memory buffering, homogeneous networks must remain incapable of recursion, as long as successive function calls have to use the same hardware. This inability is because without such facilities, either the different calls will cause side effects for one another, or some of them must be erased, leaving the system unable to execute proper returns or continuations. Again, this might easily be fixed by providing enough short-term memory, for example, in the form of a stack of temporary K-lines.
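The point about recursion and short-term memory can be illustrated directly. In the invented sketch below (which is not a model of K-lines), nested calls that must share a single set of registers clobber one another, while giving the procedure an explicit pushdown store of temporary records restores correct returns.

# One fixed set of "registers" shared by every call, with no stack:
state = {"n": None}

def sum_to_n_no_stack(n):
    state["n"] = n                     # a nested call will overwrite this
    if n == 0:
        return 0
    inner = sum_to_n_no_stack(n - 1)   # clobbers the shared register
    return state["n"] + inner          # wrong: state["n"] has been reset to 0

def sum_to_n_with_stack(n):
    """Same computation, but each pending call keeps its own temporary record."""
    stack, total = [], 0
    while n > 0:
        stack.append(n)                # short-term memory for the pending call
        n -= 1
    while stack:
        total += stack.pop()           # proper "returns" become possible
    return total

print(sum_to_n_no_stack(4))     # 0, not the intended 10: the calls interfered
print(sum_to_n_with_stack(4))   # 10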

Limitations of Specialization and Efficiency

Each connectionist net, once trained, can only do what it has learned to do. To make it do something else—for example, to compute a different measure of similarity or to recognize a different class of patterns—would, in general, require a complete change in the matrix of connection coefficients. Usually, we can change the function of a computer much more easily (at least, when the desired functions can each be computed by compact algorithms) because a computer's memory cells are so much more interchangeable. It is curious how even technically well-informed people tend to forget how computationally massive a fully connected neural network is. It is instructive to compare its size with the few hundred rules that drive a typically successful commercial rule-based expert system.

How connected do networks need to be? Several points in The Society of Mind suggest that commonsense reasoning systems might not need to increase the density of physical connectivity as fast as they increase the complexity and scope of their performances. Chapter 6 (Minsky 1987) argues that knowledge systems must evolve into clumps of specialized agencies, rather than homogeneous networks, because they develop different types of internal representations. As this evolution proceeds, it will become decreasingly feasible for any of these agencies directly to communicate with the interior of others. Furthermore, there will be a tendency for most newly acquired skills to develop from the relatively few that are already well developed, which again will bias the largest-scale connections toward evolving into recursively clumped, rather than uniformly connected, arrangements. A different tendency to limit connectivities is discussed in section 20.8, which proposes a sparse connection scheme that can simulate in real time the behavior of fully connected nets—in which only a small proportion of agents are simultaneously active. This method, based on a half-century-old idea of Calvin Mooers, allows many intermittently active agents to share the same relatively narrow, common connection bus. This might seem, at first, a mere economy, but section 20.9 suggests that this technique could also induce a more heuristically useful tendency if the separate signals on that bus were to represent meaningful symbols. Finally, chapter 17 suggests other developmental reasons why minds might virtually be forced to grow in relatively discrete stages rather than as homogeneous networks. Our progress in making theories about this area might parallel our progress in understanding the stages we see in the growth of every child's thought.
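A hedged sketch of the shared-bus idea attributed to Calvin Mooers above; the details are invented here, in the spirit of superimposed coding rather than the specific scheme in the book. Each agent gets a small random signature over a narrow bus, the few currently active agents OR their signatures together, and any listener can test, with a small risk of false positives, whether a given agent seems active. The scheme works only because few agents are active at once.

import random

BUS_WIDTH = 64        # the narrow, common connection bus
BITS_PER_AGENT = 4    # each agent's sparse signature
random.seed(0)

def make_code():
    return frozenset(random.sample(range(BUS_WIDTH), BITS_PER_AGENT))

agents = {name: make_code() for name in
          ("see-red", "grasp", "retreat", "name-object", "plan-route")}

def broadcast(active_names):
    """OR together the codes of the few simultaneously active agents."""
    bus = set()
    for name in active_names:
        bus |= agents[name]
    return bus

def seems_active(bus, name):
    """A listener tests whether an agent's whole signature is on the bus.
    False positives are possible but rare while activity stays sparse."""
    return agents[name] <= bus

bus = broadcast(["see-red", "grasp"])
for name in agents:
    print(name, seems_active(bus, name))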

If our minds are assembled of agencies with so little intercommunication, how can those parts cooperate? What keeps them working on related aspects of the same problem? The first answer I propose in The Society of Mind is that it is less important for agencies to cooperate than to exploit one another because those agencies tend to become specialized, developing their own internal languages and representations. Consequently, they cannot understand each other's internal operations well—and each must learn to exploit some of the others for the effects that those others produce—without knowing in any detail how these other effects are produced. Similarly, there must be other agencies to manage all these specialists, to keep the system from too much fruitless conflict for access to limited resources. These management agencies cannot directly deal with all the small interior details of what happens inside their subordinates. Instead, they must work with summaries of what those subordinates seem to do. This also suggests that there must be constraints on internal connectivity: Too much detailed information would overwhelm those managers. Such constraints also apply recursively to the insides of every large agency. Thus, in chapter 8 of The Society of Mind (Minsky 1987), I argue that relatively few direct connections are needed except between adjacent level bands.

Figure 7. Homostructural vs. heterostructural. (The original figure contrasts two panels: "Homogeneous Network," in which a fixed learning procedure is applied to a single network with an input, an output, and a critic, and "Society of Agencies and Managers," in which an A-brain, B-brain, and C-brain contain several learners and critics; its caption notes that the simplest schemes apply a fixed learning procedure to a single network, whereas more advanced systems will be able to learn new ways to learn.)

All this suggests (but does not prove) that large commonsense reasoning systems will not need to be fully connected. Instead, the system could consist of localized clumps of expertise. At the lowest levels, these clumps would have to be densely connected to support the sort of associativity required to learn low-level pattern-detecting agents. However, as we ascend to higher levels, the individual signals must become increasingly abstract and significant, and accordingly, the density of connection paths between agencies can become increasingly (but only relatively) smaller. Eventually, we should be able to build a sound technical theory about the connection densities required for commonsense thinking, but I don't think that we have the right foundations yet. The problem is that contemporary theories of computational complexity are still based too much on worst-case analyses or coarse statistical assumptions, neither of which suitably represents realistic heuristic conditions. The worst-case theories unduly emphasize the intractable versions of problems that, in their usual forms, present less practical difficulty. The statistical theories tend to uniformly weight all instances for lack of systematic ways to emphasize the types of situations of most practical interest. However, the AI systems of the future, like their human counterparts, will normally prefer to satisfice rather than optimize—and we don't yet have theories that can realistically portray these mundane sorts of requirements.

Limitations of Context, Segmentation, and Parsing

When we see seemingly successful demonstrations of machine learning in carefully prepared test situations, we must be careful about how we draw more general conclusions. This is because there is a large step between the ability to recognize objects or patterns when they are isolated and when they appear as components of more complex scenes. In section 6.6 of Perceptrons (Minsky and Papert 1988), we see that we must be prepared to find that even after training a certain network to recognize a certain type of pattern, we might find it unable to recognize this same pattern when embedded in a more complicated context or environment. (Some reviewers have objected that our proofs of this fact applied only to simple three-layer networks; however, most of these theorems are much more general, as these critics might see if they'd take the time to extend those proofs.) The problem is that it is usually easy to make isolated recognitions by detecting the presence of various features and then computing weighted conjunctions of them. This is easy to do in three-layer acyclic nets. But in compound scenes, this method won't work unless the separate features of all the distinct objects are somehow properly assigned to the correct objects. Similarly, we cannot expect neural networks generally to be able to parse the treelike or embedded structures found in the phrase structure of natural language.

Figure 8. Recognition in context.

How could we augment connectionist networks to make them able to do such things as analyze complex visual scenes or extract and assign the referents of linguistic expressions to the appropriate contents of short-term memories? Doing so will surely need additional architecture to represent the structural analysis of a visual scene into objects and their relationships, for example, by protecting each midlevel recognizer from seeing input derived from other objects, perhaps by arranging for the object-recognizing agents to compete in assigning each feature to itself, but denying it to competitors. This method has been successfully used in symbolic systems, and parts of it have been done in connectionist systems (for example, by Waltz and Pollack), but many conceptual missing links remain in this area, particularly in regard to how a second connectionist system could use the output of one that managed to parse the scene. In any case, we should not expect to see simple solutions to these problems. It might be an accident that so much of the brain is occupied with such functions.
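The competition scheme suggested above can be sketched very simply; this toy illustration is not the Waltz and Pollack work. Each object-recognizing agent claims the features that support it, every feature is awarded to its single strongest claimant and denied to the rest, and an agent reports its object only when it ends up owning enough features.

# Features detected somewhere in a compound scene (object identities unknown).
scene_features = ["stem", "round", "red", "handle", "cylinder", "metal"]

# Each recognizer says how strongly each feature supports its object.
recognizers = {
    "apple": {"stem": 0.9, "round": 0.8, "red": 0.7},
    "mug":   {"handle": 0.9, "cylinder": 0.8, "metal": 0.4, "red": 0.5},
}

def assign_by_competition(features, recognizers):
    """Award each feature to the single recognizer that claims it most strongly."""
    owned = {name: [] for name in recognizers}
    for f in features:
        claims = [(weights.get(f, 0.0), name) for name, weights in recognizers.items()]
        strength, winner = max(claims)
        if strength > 0.0:
            owned[winner].append(f)       # denied to all competitors
    return owned

def recognized(owned, recognizers, need=2):
    return [name for name in recognizers if len(owned[name]) >= need]

owned = assign_by_competition(scene_features, recognizers)
print(owned)                           # "red" goes to the apple, not the mug
print(recognized(owned, recognizers))  # ['apple', 'mug']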

Limitations of Opacity

Most serious of all is what we might call the problem of opacity: the knowledge embodied inside a network's numeric coefficients is not accessible outside that net. This challenge is not one we should expect our connectionists to easily solve. I suspect it is so intractable that even our own brains have evolved little such capacity over the billions of years it took to evolve from anemonelike reticulae. Instead, I suspect that our societies and hierarchies of subsystems have evolved ways to evade the problem by arranging for some of our systems to learn to model what some of our other systems do (see Minsky [1987], section 6.12). They might do this modeling in part by using information obtained from direct channels into the interiors of these other networks, but mostly, I suspect, they do it less directly—so to speak, behavioristically—by making generalizations based on external observations, as though they were like miniature scientists. In effect, some of our agents invent models of others. Regardless of whether these models might be defective or even entirely wrong (and here I refrain from directing my aim at peculiarly faulty philosophers), it suffices for these models to be useful in enough situations. To be sure, it might be feasible, in principle, for an external system to accurately model a connectionist network from outside by formulating and testing hypotheses about its internal structure. However, of what use would such a model be if it merely repeated redundantly? It would not only be simpler but also more useful for that higher-level agency to assemble only a pragmatic, heuristic model of this other network's activity based on concepts already available to that observer. (This is evidently the situation in human psychology. The apparent insights we gain from meditation and other forms of self-examination are only infrequently genuine.)

Figure 9. Numerical opacity. (The original figure contrasts a "Symbolic Apple"—a small semantic network linking an apple's origin, structure, and properties such as shape, size, color, and taste to values like round, red or green, and apple-y—with a "Connection Apple" represented only by an array of numeric weights.)
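A small sketch of the "miniature scientist" strategy described above, entirely invented for illustration: treat an opaque scorer as a black box, observe its input-output behavior, and fit a crude but lucid rule that is merely useful rather than faithful to the internals.

import random
random.seed(1)

# An "opaque network": some weighted sum we pretend we cannot inspect.
hidden_weights = [0.8, -0.2, 0.05]
def opaque_net(x):
    s = sum(w * xi for w, xi in zip(hidden_weights, x))
    return 1 if s > 0.5 else 0

# Behavioristic modeling: probe with random inputs and record observations.
observations = []
for _ in range(200):
    x = [random.random() for _ in range(3)]
    observations.append((x, opaque_net(x)))

# Fit the simplest lucid model we can: a threshold on a single input feature.
def fit_single_feature_rule(observations):
    best = None
    for feature in range(3):
        for threshold in [i / 20 for i in range(21)]:
            correct = sum((x[feature] > threshold) == bool(y)
                          for x, y in observations)
            if best is None or correct > best[0]:
                best = (correct, feature, threshold)
    return best

correct, feature, threshold = fit_single_feature_rule(observations)
print(f"model: 'active when input[{feature}] > {threshold}' "
      f"matches {correct}/200 observations")   # useful, though surely not exact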

Page 15: AI Magazine Volume 12 Number 2 (1991) (© AAAI)web.mit.edu/6.034/www/6.s966/Minsky-NeatVsScruffy.pdf · 36 AI MAGAZINE Figure 1. Conflict between theoretical extremes. bers of weaker

The problem of opacity grows more acute as representations become more distributed—that is, as we move from the symbolic toward the connectionist pole—and it becomes increasingly difficult for external systems to analyze and reason about the delocalized ingredients of the knowledge inside distributed representations. This also makes it harder to learn, past a certain degree of complexity, because it is hard to assign credit for success or to formulate new hypotheses (because the old hypotheses themselves are not "formulated"). Thus, distributed learning ultimately limits growth, no matter how convenient it might be in the short term, because "the idea of a thing with no parts provides nothing that we can use as pieces of explanation" (Minsky 1987).

For such reasons, although homogeneous, distributed learning systems may work well up to a certain point, they eventually may tend to fail when confronted with problems of larger scale—unless we find ways to compensate for the accumulation of many weak connections with some opposing mechanism that favors internal simplification and localization. Many connectionist writers positively seem to rejoice in the holistic opacity of representations within which even they are unable to discern the significant parts and relationships. However, unless a distributed system has enough ability to crystallize its knowledge into lucid representations of its new subconcepts and substructures, its ability to learn will eventually slow, and it will be unable to solve problems beyond a certain degree of complexity. In addition, although this situation suggests that homogeneous network architectures might not work well past a certain size, this restriction should be bad news only for those ideologically committed to minimal architectures. For all we currently know, the scales at which such systems crash are large enough for our purposes. Indeed, the society of mind thesis holds that most of the agents that grow in our brains need only operate on scales so small that each by itself seems no more than a toy. But when we combine enough of them—in ways that are not too delocalized—we can make them do almost anything.

In any case, we should not assume that we always can—or always should—avoid the use of opaque schemes. The circumstances of daily life compel us to make decisions based on "adding up the evidence." We frequently find (when we value our time) that even if we had the means, it wouldn't pay to analyze. The society of mind theory of human thinking doesn't suggest otherwise; on the contrary, it leads us to expect to encounter incomprehensible representations at every level of the mind. A typical agent does little more than exploit other agents' abilities; hence, most of our agents accomplish their jobs knowing virtually nothing of how they are done.



Analogous issues of opacity arise in the symbolic domain. Just as networks sometimes solve problems by using massive combinations of elements, each of which has little individual significance, symbolic systems sometimes solve problems by manipulating large expressions with similarly insignificant terms, as when we replace the explicit structure of a composite Boolean function with a locally senseless canonical form. Although this technique simplifies some computations by making them more homogeneous, it disperses knowledge about the structure and composition of the data and thus disables our ability to solve harder problems. At both extremes—in representations that are either too distributed or too discrete—we lose the structural knowledge embodied in the form of intermediate-level concepts. This loss might not be evident as long as our problems are easy to solve, but those intermediate concepts might be indispensable for solving more advanced problems. Comprehending complex situations usually hinges on discovering a good analogy or variation on a theme. However, it is virtually impossible to do this with a representation, such as a logical form, a linear sum, or a holographic transformation, whose elements seem meaningless because they are either too large or too small—leaving no way to represent significant parts and relationships.
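
A small, invented illustration of that dispersal: a Boolean function whose composite structure is explicit, expanded into its canonical sum-of-minterms. The canonical form is uniform and easy to manipulate, but each term is locally senseless and the original grouping of the variables has vanished.

```python
from itertools import product

# A composite Boolean function with explicit structure: two meaningful
# sub-conditions, OR-ed together.
def f(a, b, c, d):
    return (a and b) or (c and d)

# Canonical sum-of-minterms form: one full conjunction over all four
# variables for every satisfying row of the truth table.
names = ["a", "b", "c", "d"]
minterms = [row for row in product([0, 1], repeat=4) if f(*row)]
terms = [" & ".join(n if v else "~" + n for n, v in zip(names, row)) for row in minterms]

print(len(terms), "canonical terms, for example:")
print("  " + "  +  ".join(terms[:2]) + "  +  ...")
# The two-term structure (a & b) + (c & d) has dispersed into 7 minterms,
# none of which mentions the original grouping.
```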

Many other problems invite the synthesis of symbolic and connectionist architectures. How can we find ways for nodes to refer to other nodes or to represent knowledge about the roles of particular coefficients? To see the difficulty, imagine trying to represent the structure of the arch in Patrick Winston's thesis—without simply reproducing its topology. Another critical issue is how to enable nets to make comparisons. This problem is more serious than it might seem. Section 23.1 of The Society of Mind discusses the importance of differences and goals, and section 23.2 points out that connectionist networks deficient in short-term memory will find it peculiarly difficult to detect differences between patterns (Minsky 1987). Networks with weak architectures will also find it difficult to detect or represent (invariant) abstractions; this problem was discussed as early as the Pitts-McCulloch paper of 1947. Still another important problem for memory-weak, bottom-up mechanisms is controlling search: To solve hard problems, one might have to consider different alternatives, explore their subalternatives, and then make comparisons among them—yet still be able to return to the initial situation without forgetting what was accomplished. This kind of activity, which we call thinking, requires facilities for temporarily storing partial states of the system without confusing these memories. One answer is to provide, along with the required memory, some systems for learning and executing control scripts, as suggested in section 13.5 of The Society of Mind (Minsky 1987). To do this effectively, we must have some insulationism to counterbalance our connectionism. Smart systems need both of these components, so the symbolic-connectionist antagonism is not a valid technical issue but only a transient concern in contemporary scientific politics.
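
The point about storing partial states can be made with a few lines of invented code: a searcher that keeps explicit snapshots of the alternatives it explores, so that it can compare them and still return to the initial situation with nothing forgotten. The toy problem and all names are made up for the example.

```python
# Sketch: search that remembers its partial states instead of overwriting them.

def search(initial_state, expand, score, depth):
    """Explore alternatives to a fixed depth. Every alternative is kept as its
    own saved state, so comparisons are possible and the initial situation
    survives untouched."""
    best_state, best_score = initial_state, score(initial_state)
    frontier = [(initial_state, 0)]           # explicitly stored partial states
    while frontier:
        state, d = frontier.pop()
        if d == depth:
            continue
        for alternative in expand(state):
            if score(alternative) > best_score:
                best_state, best_score = alternative, score(alternative)
            frontier.append((alternative, d + 1))
    return best_state, best_score

# Toy problem: build a three-letter string with as many a's as possible.
expand = lambda s: [s + ch for ch in "abc"]
score = lambda s: s.count("a")
print(search("", expand, score, depth=3))     # ('aaa', 3)
```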

Mind Sculpture

The future work of mind design will not be much like what we do today. Some programmers will continue to use traditional languages and processes. Other programmers will turn toward new kinds of knowledge-based expert systems. However, eventually all this work will be incorporated into systems that exploit two new kinds of resources. On one side, we will use huge preprogrammed reservoirs of commonsense knowledge. On the other side, we will have powerful, modular learning machines equipped with no knowledge at all. Then, what we know as programming will change its character entirely—to an activity that I envision to be more like sculpturing. To program today, we must describe things very carefully because nowhere is there any margin for error. But once we have modules that know how to learn, we won't have to specify nearly so much—and we'll program on a grander scale, relying on learning to fill in details.

This doesn’t mean, I hasten to add, thatthings will be simpler than they are now.Instead, we’ll make our projects more ambi-tious. Designing an artificial mind will bemuch like evolving an animal. Imagine your-self at a terminal, assembling various parts ofa brain. You’ll be specifying the sorts ofthings that were only heretofore described intexts about neuroanatomy. “Here,” you’ll findyourself thinking, “we’ll need two similar networks that can learn to shift time signalsinto spatial patterns so that they can be com-pared by a feature extractor sensitive to a con-text about this wide.” Then, you’ll have tosketch the architectures of organs that canlearn to supply appropriate input to thoseagencies and draft the outlines of intermedi-ate organs for learning to suitably encode theoutput to suit the needs of other agencies.Section 31.3 of The Society of Mind (Minsky1987) suggests how a genetic system mightmold the form of an agency that is predes-


A functional sketch is only the start. Whenever you use a learning machine, you must specify more than just the sources of input and the destinations of output. The machine must also somehow be impelled toward the sorts of things you want it to learn—what sorts of hypotheses it should make, how it should compare alternatives, how many examples should be required, how to decide when enough has been done, when to decide that things have gone wrong, and how to deal with bugs and exceptions. It is all very well for theorists to speak about "spontaneous learning and generalization," but there are too many contingencies in real life for such words to mean anything by themselves. Should this agency be an adventurous risk taker or a careful, conservative reductionist? One person's intelligence is another's stupidity. And how should that learning machine divide and budget its resources of hardware, time, and memory?
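
One might imagine such a specification looking something like the following fragment; every field name and value is hypothetical, invented here only to show how many decisions precede any "spontaneous" learning.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LearnerSpec:
    """Hypothetical specification for one learning module; each field is a
    design decision that cannot be left to the module itself."""
    input_sources: List[str]                   # where its training signals come from
    output_destinations: List[str]             # which agencies consume its results
    hypothesis_space: str = "decision rules over the supplied features"
    compare_alternatives: str = "prefer the simpler rule at equal accuracy"
    max_examples: int = 10_000                 # how many examples it may demand
    stop_when: Callable[[float], bool] = lambda accuracy: accuracy >= 0.95
    on_persistent_failure: str = "report to the supervising agency"
    risk_posture: str = "careful reductionist"  # versus "adventurous risk taker"

face_learner = LearnerSpec(
    input_sources=["retina-map", "context-agency"],
    output_destinations=["person-memory"],
)
print(face_learner.risk_posture, face_learner.max_examples)
```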

How will we build such grand machines when so many design constraints are involved? No one will be able to track all the details because, just as a human brain is constituted by interconnecting hundreds of different kinds of highly evolved subarchitectures, so will these new kinds of thinking machines be. Each new design will have to be assembled by using libraries of already developed, off-the-shelf subsystems known to be able to handle particular kinds of representations and processes. Also, the designer will be less concerned with what happens inside these units and more concerned with their interconnections and interrelationships. Because most components will be learning machines, the designer will have to specify not only what each one will learn but also which agencies should provide what incentives and rewards for which others. Every such decision about one agency imposes additional constraints and requirements on several others and, in turn, on how to train those others. In addition, as in any society, there must be watchers to watch each watcher, lest any one or a few of them get too much control of the rest.

Each agency will need nerve bundle–like connections to certain other ones for sending and receiving signals about representations, goals, and constraints, and we'll have to make decisions about the relative size and influence of every such parameter. Consequently, I expect that the future art of brain design will have to be more like sculpturing than like our current craft of programming. It will be much less concerned with the algorithmic details of the submachines than with the balancing of their relationships; perhaps this situation better resembles politics, sociology, or management than present-day engineering.

Some neural network advocates might hope that all this will be superfluous. Why not seek to find, instead, how to build one single, huge net that can learn to do all these things by itself? Did not our own human brains come about as the outcome of one great learning search? We could only regard this as feasible by ignoring the facts—the unthinkable scale of that billion-year venture and the octillions of lives of our ancestors. Remember, too, that even so, in all that evolutionary search, not all the problems have yet been solved. What will we do when our sculptures don't work? Consider a few of the wonderful bugs that still afflict even our own grand human brains: obsessive preoccupation with inappropriate goals; inattention and inability to concentrate; bad representations; excessively broad or narrow generalizations; excessive accumulation of useless information; superstition; defective credit-assignment schema; unrealistic cost-benefit analyses; unbalanced, fanatical search strategies; formation of defective categorizations; inability to deal with exceptions to rules; improper staging of development, or living in the past; unwillingness to acknowledge loss; depression or maniacal optimism; and excessive confusion from cross-coupling.



Seeing this list, one has to wonder, "Can people think?" I suspect there is no simple and magical way to avoid such problems in our new machines; it will require a great deal of research and engineering. I suspect that it is no accident that human brains contain so many different and specialized brain centers. To suppress the emergence of serious bugs, both those natural systems and the artificial ones we shall construct will probably need intricate arrangements of interlocking checks and balances, in which each agency is supervised by several others. Furthermore, each of these other agencies must itself learn when and how to use the resources available to it. How, for example, should each learning system balance the advantages of immediate gain against those of conservative, long-term growth? When should it favor the accumulation of competence over comprehension? In the large-scale design of our human brains, we still don't know much of what all the different organs do, but I'm willing to bet that many of them are largely involved in regulating others, keeping the system as a whole from falling prey to the sorts of bugs mentioned above. Until we start building brains ourselves to learn which bugs are most probable, it will remain hard for us to guess the actual functions of much of that hardware.

There are countless wonders yet to be discovered in these exciting new fields of research. We can still learn a great many things from experiments on even the simplest nets. We'll learn even more from trying to make theories about what we observe. And surely, soon we'll start to prepare for that future art of mind design by experimenting with societies of nets that embody more structured strategies—and, consequently, make more progress on the networks that make up our own human minds. And in doing all these experiments, we'll discover how to make symbolic representations that are more adaptable and connectionist representations that are more expressive.

It is amusing how persistently people express the view that machines based on symbolic representations (as opposed, presumably, to connectionist representations) could never achieve much or ever be conscious and self-aware. I maintain that it is precisely because our brains are still mostly connectionist that we humans have so little consciousness! It's also why we're capable of so little parallelism of thought—and why we have such limited insight into the nature of our own machinery.

Acknowledgment

This research was funded over a period of years by the Computer Science Division of the Office of Naval Research. Illustrations by Juliana Minsky.

Notes

1. Adapted from Logical versus Analogical or Symbolic versus Connectionist or Neat versus Scruffy. In AI at MIT: Expanding Frontiers, eds. Patrick H. Winston, with S. A. Shellard, 219–243. Cambridge, Mass.: MIT Press.

Bibliography

Minsky, M. 1988. Preface. In Connectionist Models and Their Implications: Readings from Cognitive Science, eds. D. L. Waltz and J. Feldman, vii–xvi. Norwood, N.J.: Ablex.

Minsky, M. 1987. The Society of Mind. New York: Simon and Schuster.

Minsky, M. 1974. A Framework for Representing Knowledge, Report AIM-306, Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Reprinted in The Psychology of Computer Vision, ed. P. H. Winston, 211–277. New York: McGraw-Hill.

Minsky, M., and Papert, S. 1988. Perceptrons, 2d ed. Cambridge, Mass.: MIT Press.

Marvin Minsky's work in machine cognition has involved not only heuristic programming, cognitive psychology, and connectionist learning networks but also the mathematical foundations of computer science and the practical technology of mechanical robotics. He has contributed to the domains of symbolic description, knowledge representation, symbolic applied mathematics, computational semantics, and machine perception. His main concern over the past decade has been to discover how to make machines do commonsense reasoning. The foundations of his new conception of human psychology are described in his book The Society of Mind, which proposes revolutionary concepts about thinking, learning, memory, language, and conceptual development as well as the administrative structures that underlie the functions of the brain. Minsky is a 1990 recipient of the prestigious Japan Prize, which recognizes original and outstanding achievements in science and technology.
