
Functional Constraints on Architectural Mechanisms

Christian Lebiere ([email protected]), Carnegie Mellon University

Bradley Best ([email protected]), Adaptive Cognitive Systems

Introduction

• Goal: Strong Cogsci – single integrated model of human abilities that is robust, adaptive and general

• Not just an architecture that supports it (Newell test evaluation) but a system that actually does it

• Not strong AI (means matter), weak Cogsci (general)

• Plausible strategies:
– Build a single model from scratch – traditional AI strategy
– Incremental assembly – successful CS system strategy

• But little/no reuse of models limits complexity!

7/29/09 – 2009 ACT-R Workshop

Model Fitting Constraint

• Fitting computational models to human data is the “coin of the realm” of cognitive modeling

• Is it a sufficient constraint to achieve convergence toward the goal of model integration and robustness?

• Good news: cognitive architectures are increasingly converging toward a common modular organization

• Bad news: still very little model reuse – almost every task results in a new model developed tabula rasa

• Question: have we gotten right the tradeoff between precision (fitting data) and generality (reuse/integration)?


You Can’t Play 20 Models…

• 35 years ago Newell raised a similar issue with convergence in experimental psychology

• He attributed much of the problem to a lack of emphasis on the control structure used to solve a problem

• He offered 3 prognoses for “putting it together”:
– Complete processing models (and PS suggestion) – check!
– Analyze a complex task (chess suggestion) – progress but…
– One program for many tasks (integration, e.g. WAIS) – fail?

• What have been the obstacles to putting it together?


Obstacles to Integration

• Models tend to be highly task-specific – they usually cannot be used directly even for closely related tasks

• They tend to represent the final point of the process from initial task discovery to final asymptotic behavior

• Modeler’s meta-cognitive knowledge of the task gets hardwired into the model
– Experience with High-Level Language (HLSR) compilation

• Task discovery processes, including metacognitive processes, should be part of the model/architecture

• Tackles broader category of tasks through adaptation


Forcing Functions for Integration

• Model comparison challenges (e.g. DSF) that feature:
– Breadth of applicability (e.g. multiple conditions)
– Unknown conditions and/or data (tuning vs testing sets)
– Integration of multiple functionalities (control, prediction)

• Unpredictable domains, e.g. adversarial behavior:
– Breadth and variability of behavior
– Constant push for adaptivity and unpredictability
– Strong incentive to maximize functionality

• Architectural implications to model integration?
– Focus on both control and representation structure


A Tour of Four Modules

[Figure: architecture diagram showing the Procedural Module, the Declarative Module (Retrieval Buffer), the Intentional Module (Goal Buffer), the Vision Module (Visual Buffer), the Motor Module (Manual Buffer), the Working Memory Module (Imaginal Buffer), and the Environment.]


• All modules have shortcomings in robustness and generality

• Ability to craft models for lab tasks does not guarantee plausible behavior in open-ended situations

Module 1: Declarative

• Base-level learning can lead to looping if unchecked
– Most active chunk is retrieved, then its activation boosted…

• Very hard to control if compiling higher-level model
– Many logical conditions require repeated retrieval loops

• Old solution: tag chunk on retrieval (e.g. list learning)

• New solution: declarative finsts to perform tagging


Old solution (tag chunk on retrieval):

   +retrieval>
      isa        item
      index      =index
    - retrieved  =goal

   =retrieval>
      retrieved  =goal

New solution (declarative finsts):

   (sgp :declarative-num-finsts 5 :declarative-finst-span 10)

   +retrieval>
      isa        item
      index      =index
      :recently-retrieved nil
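The finst idea can be illustrated outside ACT-R. The following is a minimal Python sketch, not ACT-R's implementation: the class `FinstTracker` and the function `retrieve` are hypothetical names, and the two constructor arguments merely mirror the `:declarative-num-finsts` and `:declarative-finst-span` settings shown above. It shows how excluding recently retrieved chunks breaks the "most active chunk wins again" loop.

```python
from collections import OrderedDict

class FinstTracker:
    """Hypothetical stand-in for declarative finsts: remembers the most
    recently retrieved chunks, limited in both number and age."""

    def __init__(self, num_finsts=5, span=10.0):
        self.num_finsts = num_finsts   # mirrors :declarative-num-finsts
        self.span = span               # mirrors :declarative-finst-span
        self._marks = OrderedDict()    # chunk name -> time of retrieval

    def mark(self, chunk, now):
        """Record a retrieval; drop the oldest finst when over capacity."""
        self._marks.pop(chunk, None)
        self._marks[chunk] = now
        while len(self._marks) > self.num_finsts:
            self._marks.popitem(last=False)

    def recently_retrieved(self, chunk, now):
        """True while the chunk holds a finst that has not yet expired."""
        marked_at = self._marks.get(chunk)
        return marked_at is not None and (now - marked_at) <= self.span

def retrieve(activations, finsts, now):
    """Return the most active chunk that is not recently retrieved, or None."""
    candidates = [c for c in activations
                  if not finsts.recently_retrieved(c, now)]
    if not candidates:
        return None
    best = max(candidates, key=activations.get)
    finsts.mark(best, now)
    return best

# Fixed activations (an assumption: no learning between retrievals).
activations = {"a": 1.0, "b": 0.8, "c": 0.5}
finsts = FinstTracker(num_finsts=2, span=10.0)
order = [retrieve(activations, finsts, now=t) for t in range(4)]
print(order)
```

With only two finsts available, the third retrieval pushes out the mark on the first chunk, so that chunk becomes eligible again on the fourth request. This is the hard, fixed round-robin that the later slides contrast with base-level inhibition.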

Base-Level Inhibition (BLI)

Also in other domains: arithmetic, web navigation, physical environments

[Figure: Odds by Quintile – Britannica. Log-log plot of Odds (0.0001 to 1) against Lag (1 to 100), one curve per quintile Q1–Q5.]

$$B_i = \log \sum_{j=1}^{n} t_j^{-d} \;-\; \log\left(1 + \left(\frac{t_n}{t_s}\right)^{-d_s}\right)$$

[Figure: curves labeled BLL, PL(0.75;10), PL(1.0;10), PL(1.0;5.0), PL(3;1.0;10) and PL(2;1.0;10), plotted on a logarithmic x-axis from 1 to 100 with y-values from -3 to 0.]

BLI provides inhibition of return, resulting in a soft, adaptive round-robin in free-recall procedures without requiring any additional constraints
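A worked example may make the inhibition term concrete. The sketch below evaluates the base-level inhibition equation in Python, assuming that t_j is the time since the j-th reference to a chunk, that t_n is the time since its most recent reference, and that d_s and t_s are the decay and time scale of the inhibition term. The parameter values (d = 0.5, d_s = 1.0, t_s = 10) and the two lag lists are chosen for illustration only, and the function names are hypothetical.

```python
import math

def base_level(lags, d=0.5):
    """Base-level learning term: log of the summed power-law decays.
    `lags` holds the time since each past reference (all > 0)."""
    return math.log(sum(t ** -d for t in lags))

def inhibition(lags, ds=1.0, ts=10.0):
    """Short-term inhibition term, driven by the most recent reference."""
    t_n = min(lags)                      # lag of the latest reference
    return math.log(1.0 + (t_n / ts) ** -ds)

def bli_activation(lags, d=0.5, ds=1.0, ts=10.0):
    """Base-level term minus the inhibition term."""
    return base_level(lags, d) - inhibition(lags, ds, ts)

just_retrieved = [1.0, 50.0, 100.0]      # referenced one time unit ago
rested = [20.0, 50.0, 100.0]             # same history, but not touched lately

# Under plain base-level learning the just-retrieved chunk is stronger...
print(base_level(just_retrieved) > base_level(rested))          # True
# ...while with inhibition the rested chunk is preferred.
print(bli_activation(just_retrieved) < bli_activation(rested))  # True
```

Because the inhibition term shrinks as t_n grows, the just-retrieved chunk regains its advantage after a delay, which is what makes the resulting round-robin soft rather than fixed.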


Emergent Robustness

• Running the retrieval mechanism unsupervised leads to the gradual emergence of an internal power law distribution

• It differs from both the pathological behavior of the default BLL, and from the hard and fixed round-robin of the tag/finst version

[Figure: Frequencies of Free Recall as a Function of Item Rank. Log-log plot with one curve each for n=100, n=1000 and n=10000.]
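The contrast between the default looping behavior and the inhibited version can be explored with a small simulation. This is a rough Python sketch under several assumptions: every item starts with a single reference at a distinct past time, one retrieval happens per time step, each retrieval adds a reference, and the optional noise is Gaussian. All function names and parameter values are hypothetical, and the sketch makes no claim to reproduce the figure.

```python
import math
import random
from collections import Counter

def activation(lags, inhibit, d=0.5, ds=1.0, ts=10.0):
    """Base-level activation, optionally minus short-term inhibition."""
    value = math.log(sum(t ** -d for t in lags))
    if inhibit:
        value -= math.log(1.0 + (min(lags) / ts) ** -ds)
    return value

def free_recall(n_items, n_steps, inhibit, noise_sd=0.0, seed=0):
    """Repeatedly retrieve the most active item and reinforce it."""
    rng = random.Random(seed)
    # Item i starts with one reference at time -(i + 1).
    refs = {i: [-(i + 1.0)] for i in range(n_items)}
    sequence = []
    for now in range(1, n_steps + 1):
        scores = {}
        for item, times in refs.items():
            lags = [now - t for t in times]
            noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
            scores[item] = activation(lags, inhibit) + noise
        best = max(scores, key=scores.get)
        refs[best].append(float(now))    # retrieval counts as a reference
        sequence.append(best)
    return sequence

def rank_frequencies(sequence):
    """Retrieval counts sorted from most to least frequent item."""
    return [count for _, count in Counter(sequence).most_common()]

bll_only = free_recall(n_items=10, n_steps=200, inhibit=False)
with_bli = free_recall(n_items=10, n_steps=200, inhibit=True)
print(rank_frequencies(bll_only))
print(rank_frequencies(with_bli))
```

Without inhibition and without noise, the first item retrieved is reinforced on every step and is never displaced, so the run collapses onto a single item. With inhibition the retrievals spread over several items, and the rank-frequency counts can then be inspected for different run lengths.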


Module 2: Procedural

• Procedural module
– Production rule set needs careful crafting to cover all cases
– Degenerate behavior in real environments (stuck, loop, etc.)
– Esp. difficult in continuous domains (ad hoc thresholds, etc.)

• Generalization of production applicability
– Often need to use declarative module to leverage semantic generalization through partial matching mechanism

• Unification between symbolic (matching) and subsymbolic (selection) processes is desirable for robustness, adaptivity and generalization


Production Partial Matching (PPM)

• Same principle as partial matching in declarative memory
– Unification is good and logical given representation (neural models)

• Matching Utility

• Dynamic generalization: production condition defines ideal “prototype” situation, not range of application conditions

• Adaptivity: generalization expands with success as utility rises, contracts with failure as production over-generalizes

• Safe version: explicit ~ test modifier similar to -, <, >, etc.

• Learning new productions can collapse across range and learn differential sensitivity to individual buffer slot values

$$MU_p = U_p + BMP \cdot \sum_{i=1}^{n} Sim(p_i, b_i)$$
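The matching utility can be read as the production's learned utility plus a scaled sum of similarities between each condition value p_i and the corresponding buffer value b_i. The Python sketch below assumes similarities are 0 for identical values and grow more negative with distance, and treats the scaling constant as a plain multiplier called `penalty`. The slot name `distance`, the utilities, and the competitor value are made up for illustration.

```python
def similarity(a, b, scale=100.0):
    """Assumed similarity: 0 for identical values, more negative with distance."""
    return -abs(a - b) / scale

def matching_utility(utility, condition, buffer, penalty=1.0):
    """Utility plus the scaled sum of slot-by-slot similarities."""
    total = sum(similarity(value, buffer[slot])
                for slot, value in condition.items())
    return utility + penalty * total

# The condition describes an ideal "prototype" situation.
prototype = {"distance": 50.0}
competitor = 1.7                     # utility of a rival, exactly matching rule

for utility in (2.0, 2.2):
    wins = [d for d in (50.0, 60.0, 90.0)
            if matching_utility(utility, prototype, {"distance": d}) > competitor]
    print(utility, wins)
```

At the lower utility the production beats its rival only near the prototype. Once its utility rises it also wins in the more distant situation, which is the sense in which generalization expands with success and contracts with failure.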


Building Sticks Task

Standard Production Model (Lovett, 1996)

• 4 productions
– Force-over
– Force-under
– Decide-over
– Decide-under

• Hardwired range

• Utility Learning

Instance-based Model (Lebiere, 1997)

• Chunks: under, over, target & choice slots

• Partial matching on closeness of over and under to target

• Base-level learning w/ degree of match

New Partial-Matching Production Model

• 2 productions
– Over: match over stick against target
– Under: match under stick against target

• Utility learning mixed with degree of match
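How two partially matching productions might trade off degree of match against learned utility can be sketched as follows. This is an illustrative Python sketch, not the model from the slides: the stick lengths, the similarity scale, the starting utilities, the reward of 0 for a failure, and the learning rate are all assumptions. The utility update is the difference-learning rule, U ← U + α(R − U).

```python
def stick_similarity(stick, target, scale=100.0):
    """Assumed similarity: 0 when the stick equals the target length."""
    return -abs(stick - target) / scale

def choose(under, over, target, utilities, penalty=1.0):
    """Pick the production with the higher matching utility."""
    scores = {
        "over": utilities["over"] + penalty * stick_similarity(over, target),
        "under": utilities["under"] + penalty * stick_similarity(under, target),
    }
    return max(scores, key=scores.get)

def learn(utilities, production, reward, alpha=0.2):
    """Difference-learning update of one production's utility."""
    utilities[production] += alpha * (reward - utilities[production])

utilities = {"over": 2.0, "under": 2.0}
problem = dict(under=50, over=200, target=170)

# With equal utilities the closer stick decides the choice.
first = choose(utilities=utilities, **problem)

# Repeated failures of that strategy lower its utility until the
# other production wins despite its poorer match.
for _ in range(3):
    learn(utilities, "over", reward=0.0)
later = choose(utilities=utilities, **problem)
print(first, later)
```

No range thresholds are hardwired here: the point at which the choice flips depends on both the stick lengths and the learned utilities.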

Procedural or Instance-based?

• One of Newell’s decried “oppositions” reappeared in the computational modeling context

• Neuroscience (e.g., fMRI) might provide arbitrating data between modules but likely not within module

• The correct solution is likely a combination: initial declarative retrieval that gives way to procedural selection

• Need a smooth transition from declarative to procedural mechanism without modeler-induced discontinuity in terms of arbitrary control structure


Module 3: Working Memory

• Current WM: Named, fixed buffers, types, slots
– Pros
  • Precise reference makes complex information processing not only possible but relatively easy
  • Familiar analogy to traditional programming
– Cons
  • Substantial modeling effort required
    – Modeling often time-consuming and error-prone
  • Hard limit on flexibility of representation
    – Fine in laboratory tasks, more problematic in open-ended, dynamic, unpredictable environments

Representation Implications

• Explicit slot (and also type, buffer) management
– Add more slots to represent all information needed
  • Pro: slots have clear semantics
  • Con: profligate, dilution of spreading activation
– Reuse slots for different purposes over time
  • Pro: keep structures relatively compact
  • Con: uncertain semantics (what is in this slot right now?)
– Use different (goal) types over time
  • Pro: cleaner semantics, hierarchical control
  • Con: increase management of context transfer
– More buffers or reuse buffers as storage
  • Less of that for now but same general drawbacks as slot, type
  • Integration issues (episodic memory)

Working Memory Module

• Replace chunk structures in buffers with sets of values associated with fast decaying short-term activation
– Faster decay rate than LTM and no reinforcement

• Generalize pattern matching to ordered set of values
– Double match of semantic and position content

• Assumptions about context permanence
– Short-term maintenance w/ quick decay (sequence learning)
– Explicit rehearsal possible but impact on strength and ordering
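One way to picture such a store is as an ordered list of values, each tagged with its entry time, queried by both content and position. The Python sketch below is speculative: the class `ShortTermStore`, its methods, the decay rate, and the position weight are hypothetical, and position is taken to mean recency rank (1 = latest entry).

```python
import math

class ShortTermStore:
    """Hypothetical ordered store of values with decaying activation."""

    def __init__(self, decay=1.0):
        self.decay = decay
        self.entries = []            # (value, time_added), oldest first

    def add(self, value, now):
        self.entries.append((value, now))

    def activation(self, index, now):
        """Pure decay since entry; no reinforcement from later use."""
        _, added = self.entries[index]
        return -self.decay * math.log(now - added)

    def best_match(self, value, position, semantic_sim, position_weight=0.5):
        """Return the stored value that best fits both the requested
        content and the requested position (a double match)."""
        def score(index):
            stored, _ = self.entries[index]
            rank = len(self.entries) - index     # 1 = most recent entry
            return (semantic_sim(value, stored)
                    - position_weight * abs(rank - position))
        index = max(range(len(self.entries)), key=score)
        return self.entries[index][0]

def exact_or_not(a, b):
    """Assumed semantic similarity: 0 for a match, -1 otherwise."""
    return 0.0 if a == b else -1.0

store = ShortTermStore()
for t, letter in enumerate("ABC"):
    store.add(letter, now=float(t))

# The cue asks for "B" in the latest position; "B" is actually second to last.
print(store.best_match("B", position=1, semantic_sim=exact_or_not))
# Older entries have decayed further than recent ones.
print(store.activation(0, now=3.0) < store.activation(2, now=3.0))
```

In this example the semantic match outweighs the one-step positional mismatch, so the store returns "B". Changing `position_weight` shifts that balance, which is where positional confusions would come from.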

N-Back Task

• N-back working memory task: is the current stimulus the same as the one n back?

• Default ACT-R model holds and shifts items in buffer: perfect recall!

• Working memory model adds each item to WM, then relies on decay and partial matching to recall it

• Performance decreases with noise and n up to plateau – good fit to data

;; Version matching an ordered set of values in working memory
(p back4
   =goal>
      isa        nback
      stimulus   =stimulus
      match      nil
   +intentional>
      =back1 =back2 =back3 =back4
==>
   !output! (Stimulus =stimulus retrieving 4-back =back4)
   =goal>
      match      =back4)

;; Version holding the items in a four-back chunk in the imaginal buffer
(p back4
   =goal>
      isa        nback
      stimulus   =stimulus
      match      nil
   =imaginal>
      isa        four-back
      back1      =back1
      back2      =back2
      back3      =back3
      back4      =back4
==>
   !output! (Stimulus =stimulus matching 4-back =back4)
   =goal>
      match      =back4)
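The effect of noise on n-back performance can be explored with a toy simulation. The Python sketch below is not the model from the slides: it assumes that the item n back is recalled by a noisy positional match on log age (so that older positions are harder to tell apart, standing in for decay-driven confusion), that noise is Gaussian, and that only the last `span` items are available. All names and parameter values are hypothetical.

```python
import math
import random

def run_nback(n, trials=500, noise_sd=0.0, penalty=1.0, span=8, seed=0):
    """Return the proportion of correct same/different responses."""
    rng = random.Random(seed)
    history = []                      # all stimuli so far, oldest first
    correct = scored = 0
    for _ in range(trials):
        stimulus = rng.choice("BCDFGH")
        if len(history) >= n:
            window = history[-span:]  # items still available in WM

            def score(age):
                # Positional mismatch on a log scale, plus optional noise.
                mismatch = abs(math.log(age) - math.log(n))
                noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
                return -penalty * mismatch + noise

            # age 1 is the previous stimulus, age 2 the one before, ...
            retrieved_age = max(range(1, len(window) + 1), key=score)
            response = window[-retrieved_age] == stimulus
            truth = history[-n] == stimulus
            correct += response == truth
            scored += 1
        history.append(stimulus)
    return correct / scored

for n in (1, 2, 3, 4):
    print(n, run_nback(n), run_nback(n, noise_sd=0.5))
```

With `noise_sd=0.0` the sketch always recalls the right position, like the buffer-shifting model with perfect recall. With noise, the accuracy it reports depends on n and on the chosen parameters.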

Module 4: Episodic Memory

• Need integration of information in LTM across modalities

• Main role of episodic memory is to support goal management

• Store snapshots of working memory
– Concept of chunk slot is replaced with activation
– Similar to connectionist temporal synchrony binding

• Straightforward matching of WM context to episodic chunks
– Double, symmetrical match of semantic and activation content

• Issues:
– Creation signal: similar to current chunk switch in buffer
– Reinforcement upon rehearsal?
– Relation to traditional LTM? Similar to role of HC in training PC?
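The snapshot-and-match idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: a snapshot is represented as a mapping from values to activations, a value absent from a snapshot is treated as having activation 0, and the match score is the negative sum of activation differences. The class `EpisodicStore` and the example values are hypothetical.

```python
def context_similarity(a, b):
    """Symmetric match score: negative summed activation difference
    over every value present in either snapshot (0 is a perfect match)."""
    values = set(a) | set(b)
    return -sum(abs(a.get(v, 0.0) - b.get(v, 0.0)) for v in values)

class EpisodicStore:
    """Hypothetical store of working-memory snapshots."""

    def __init__(self):
        self.episodes = []

    def snapshot(self, working_memory):
        """Copy the current value -> activation mapping as an episode."""
        self.episodes.append(dict(working_memory))

    def recall(self, context):
        """Return the episode closest to the current context.
        Raises ValueError if nothing has been stored yet."""
        return max(self.episodes,
                   key=lambda episode: context_similarity(context, episode))

memory = EpisodicStore()
memory.snapshot({"goal-a": 1.0, "red": 0.6})
memory.snapshot({"goal-b": 1.0, "red": 0.2})

# A partial, weaker version of the first context retrieves the first episode.
print(memory.recall({"goal-a": 0.3, "red": 0.5}))
```

Because values carry activations instead of occupying named slots, the same score handles both what is in the context and how strongly it is represented, and it gives the same result whichever of the two snapshots is treated as the cue.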

List Memory

• Pervasive task requires multi-level indexing representation
– “micro-chunks” vs traditional representation

• Captures positional confusion and failures

• Is it strategy choice or architectural feature?

• How best to provide this function pervasively?


   +retrieval>
      isa        item
      parent     =group
      position   fourth
      :recently-retrieved nil

Related Work

• Instruction following (Anderson and Taatgen)
– General model for simple step-following tasks

• Minimal control principle (Taatgen)
– Limit modeler-imposed control structure

• Threading and multitasking (Salvucci and Taatgen)
– Combine independent models and reduce interference

• Metacognition (Anderson)
– Enable model to discover original solution to new problem

• Call for new thinking on “an increasingly watered down set of principles for the representation of knowledge” (Anderson)


Conclusion

• Available data is often not enough to discriminate between competing models of single tasks
– Newell might have been too optimistic about the ability to uniquely infer the control method given data and system

• More data can help but often leads to more specialized and complex models and away from integration

• Focus on functionality, esp. Newell’s 2nd (complex tasks) and 3rd (multiple tasks) criteria for further discrimination

• Focusing on tasks that require open-ended behavior can enhance the robustness and generality of cognitive architectures without compromising their fidelity

