
Dragos Calitoiu, Bank of America

[email protected]

Self-optimization and self-organization with Goore Game

(a distributed non-cooperative non-zero-sum N-person game):

Theoretical Results, Applications and Open Problems

CORS – Ottawa section, March 26, 2009


CONTENT OF PRESENTATION

Goore Game – Introduction

Learning Automata and Goore Game implemented with LA

Applications

My Research

Open Problems

Objective: To introduce the methodology of distributed control with GG for future research

GG - BACKGROUND

Goore Game - Description:

An example of a self-organization and self-optimization game studied in the field of AI.

Presented by Tsetlin in 1963 [1] and analyzed in detail in [2] and [3].

[1] M.L. Tsetlin, “Finite automata and the modeling of the simplest forms of behavior,” Uspekhi Matem Nauk, vol. 8, pp. 1-26, 1963.

[2] K.S. Narendra and M.A.L. Thathachar, Learning Automata, Prentice-Hall, 1989.

[3] M.A.L. Thathachar and M.T. Arvind, “Solution of Goore game using models of stochastic learning automata,” Journal of Indian Institute of Science, no. 76, pp. 47-61, 1997.

GG - BACKGROUND

Goore Game’s features:

* Imagine a large room with N cubicles and a raised platform. A voter sits in each cubicle and a Referee stands on the platform. The Referee conducts a series of voting rounds:

- On each round the voters vote either “Yes” or “No” (the issue is unimportant) simultaneously and independently (they do not see each other) and the Referee counts the fraction, f, of “Yes” votes.

* The Referee has a unimodal performance criterion G(f), which is optimized when the fraction of “Yes” votes is exactly f0.

* The voting round ends with the Referee awarding a dollar with probability G(f) and assessing a dollar with probability 1 - G(f) to every voter independently.

* On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

* No matter how many players there are, after enough trials, the number of “Yes” votes will approximate N*f0.

THE GAME – STEP BY STEP

[Figure: a Referee on a raised platform and ten cubicles numbered 1 to 10.]

Imagine a large room containing N (N=10 in our picture) cubicles and a raised platform.

A voter sits in each cubicle and a Referee stands on the platform.

[Figure: the ten voters' ballots for this round: six “Yes” and four “No”.]

The Referee conducts a series of voting rounds as follows: On each round the voters vote either “Yes” or “No” (the issue is unimportant) simultaneously and independently (they do not see each other)

The Referee counts how many “Yes” votes there are: 6 out of 10.

The Referee has a unimodal performance criterion G(f).

The Referee awards a dollar with probability G(f) and assesses a dollar with probability 1 - G(f) to every voter independently. On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

The Referee has a unimodal performance criterion:

$G(x) = 0.9\, e^{-(0.7 - x)^2 / 0.0625}$

With 6 “Yes” votes out of 10: $G(0.6) = 0.76692941$

[Figure: plot of G(X) against X, for X from 0 to 1.]
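The criterion quoted on this slide can be evaluated directly. The sketch below is only an illustration: the function name `performance` and its keyword parameters are hypothetical, while the constants 0.9, 0.7 and 0.0625 are the ones shown on the slide.

```python
import math


def performance(x: float, peak: float = 0.7, height: float = 0.9,
                width: float = 0.0625) -> float:
    """Unimodal criterion G(x) = height * exp(-(peak - x)^2 / width)."""
    return height * math.exp(-((peak - x) ** 2) / width)


if __name__ == "__main__":
    # Tabulate G for every possible number of "Yes" votes among 10 voters.
    for yes_votes in range(11):
        x = yes_votes / 10
        print(f"{yes_votes:2d} Yes votes: G({x:.1f}) = {performance(x):.4f}")
```

With these constants the criterion peaks at x = 0.7, that is, at 7 “Yes” votes out of 10.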

REWARD/PENALTY

The Referee has a unimodal performance criterion: G(0.6) = 0.7669

Generate a random variable R1 for voter 1.

IF R1 < 0.7669 THEN Reward ELSE Penalize
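The reward rule above can be sketched as a small function. This is a minimal illustration under two assumptions that the slides do not state explicitly: each R_i is drawn uniformly from [0, 1), and votes are encoded as booleans (True for “Yes”). The name `referee_round` is hypothetical.

```python
import random


def referee_round(votes, g, rng):
    """Return one reward flag per voter for a single voting round.

    votes: list of booleans, True meaning "Yes".
    g:     performance criterion, mapping the fraction of "Yes" votes
           to a reward probability in [0, 1].
    rng:   a random.Random instance supplying the draws R_i.
    """
    fraction_yes = sum(votes) / len(votes)
    reward_probability = g(fraction_yes)
    # Every voter gets an independent draw R_i and is rewarded when
    # R_i < G(f). The voter's own vote plays no role in the comparison.
    return [rng.random() < reward_probability for _ in votes]


if __name__ == "__main__":
    # Six "Yes" and four "No", as in the walkthrough; G(0.6) = 0.7669.
    ballots = [True, False, True, True, False, False, True, True, False, True]
    print(referee_round(ballots, lambda f: 0.7669, random.Random(1)))
```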


G(0.6) = 0.7669

Generate a random variable R2 for voter 2.

Player 2 can be rewarded regardless of how he voted (“No” or “Yes”). “Yes” is not a better decision than “No”.

Generate a random variable R10 for voter 10.



On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

[Figure: the ten voters' ballots after learning: seven “Yes” and three “No”.]

With implementations based on Learning Automata, after enough iterations the number of players who say “Yes” settles at the value that maximizes G(f). If the maximum occurs at 7 players, we will find that 7 players say “Yes” and 3 players say “No”.

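LEARNING AUTOMATA – LEARNING LOOP

$\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ - r actions

$\{c_1, c_2, \ldots, c_r\}$ - action penalty probabilities

$\beta = \{0, 1\}$ - response from the Environment: reward and penalty

[Figure: feedback loop between the Learning Automaton (LA) and the Random Environment (RE). The LA sends an action from $\alpha$ to the RE, which has penalty probabilities $\{c_1, c_2, \ldots, c_r\}$ and returns a response from $\beta$ to the LA.]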

LEARNING AUTOMATA – LEARNING LOOP

The LA chooses one action from the set $\{\alpha_1, \ldots, \alpha_r\}$ offered by the Random Environment (RE).

The chosen action $\alpha(t)$ is given as input to the RE.

The RE rewards or penalizes the chosen action based on the penalty probabilities.

The RE's response is the input to the automaton, and the LA chooses the next action.
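The loop described above might be sketched as follows. The class and function names are hypothetical, and the encoding of the response (0 for reward, 1 for penalty) is an assumption based on the usual learning-automata convention; actions are indexed from 0 for convenience.

```python
import random


class RandomEnvironment:
    """Environment with one penalty probability c_i per action."""

    def __init__(self, penalty_probabilities, rng):
        self.c = list(penalty_probabilities)
        self.rng = rng

    def respond(self, action):
        """Return the response: 1 (penalty) with probability c[action], else 0 (reward)."""
        return 1 if self.rng.random() < self.c[action] else 0


def run_loop(choose_action, learn, environment, steps):
    """Run the LA/RE feedback loop and return the (action, response) history."""
    history = []
    for _ in range(steps):
        action = choose_action()             # the LA picks an action
        beta = environment.respond(action)   # the RE rewards or penalizes it
        learn(action, beta)                  # the response feeds back into the LA
        history.append((action, beta))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    env = RandomEnvironment([0.8, 0.3], rng)  # action 1 is penalized less often
    # A trivial automaton that picks at random and never learns.
    print(run_loop(lambda: rng.randrange(2), lambda a, b: None, env, 5))
```

Any learning scheme can be plugged in through the `choose_action` and `learn` callbacks.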

LEARNING AUTOMATA – LRI SCHEME

Reward – Inactive Scheme (only reward / no penalty)

If $\alpha_2$ is the best action: $p_2$ is increased and $p_1$ is decreased, so that $p_1 \to 0$ and $p_2 \to 1$.

Example: $\alpha_2$ is chosen and rewarded, and the probabilities shift by 0.1:

$P(t) = [0.4, 0.6]^T$, $P(t+1) = [0.3, 0.7]^T$

LEARNING AUTOMATA – LRI SCHEME

Action Probability Updating Scheme:

If $\alpha_1$ is rewarded: $p_1(t+1) = p_1(t) + \lambda\,(1 - p_1(t))$ and $p_2(t+1) = 1 - p_1(t+1)$

If $\alpha_2$ is rewarded: $p_2(t+1) = p_2(t) + \lambda\,(1 - p_2(t))$ and $p_1(t+1) = 1 - p_2(t+1)$

When $\alpha_1$ or $\alpha_2$ is penalized: DO NOT MODIFY $p_1$ or $p_2$

Here $\lambda$, with $0 < \lambda < 1$, is the reward step size.
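For two actions, the updating scheme above can be written as one small function that tracks only the probability of “Yes”. This is a sketch: the name `lri_update` and the parameter name `rate` (standing for the reward step size) are hypothetical, and the step size 0.2 is the one used in the worked example later in the deck.

```python
def lri_update(p_yes: float, voted_yes: bool, rewarded: bool, rate: float) -> float:
    """Return the new probability of voting "Yes" under the LRI scheme.

    On a penalty the probabilities are left untouched. On a reward, the
    probability of the action just taken moves toward 1 by a fraction
    `rate` of the remaining distance.
    """
    if not rewarded:
        return p_yes                       # penalty: do not modify
    if voted_yes:
        return p_yes + rate * (1.0 - p_yes)
    p_no = 1.0 - p_yes
    p_no = p_no + rate * (1.0 - p_no)      # reinforce "No"
    return 1.0 - p_no                      # probabilities always sum to 1


if __name__ == "__main__":
    print(lri_update(0.4, voted_yes=True, rewarded=True, rate=0.2))
    print(lri_update(0.1, voted_yes=False, rewarded=True, rate=0.2))
    print(lri_update(0.45, voted_yes=False, rewarded=False, rate=0.2))
```

The first two calls correspond to the deck's worked updates for voters 1 and 3 (P1 moving from 0.4 to 0.52 and from 0.1 to 0.08), up to floating-point rounding.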

IMPLEMENTATION WITH LRI

Initial action probabilities of the ten voters (P1 = probability of “Yes”, P2 = probability of “No”):

Voter 1: P1=0.4; P2=0.6
Voter 2: P1=0.45; P2=0.55
Voter 3: P1=0.1; P2=0.9
Voter 4: P1=0.3; P2=0.7
Voter 5: P1=0.45; P2=0.55
Voter 6: P1=0.8; P2=0.2
Voter 7: P1=0.3; P2=0.7
Voter 8: P1=0.4; P2=0.6
Voter 9: P1=0.5; P2=0.5
Voter 10: P1=0.85; P2=0.15

Voter 1 (P1=0.4; P2=0.6): generate a random number Q1: 0.2344

IF Q1 < P1 THEN “YES” (Action 1) ELSE “NO” (Action 2)

Q1 (=0.2344) < P1 (=0.4): “YES”
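The voting rule can be sketched as two short helpers. The names are hypothetical, and drawing Q uniformly from [0, 1) is an assumption; the deck only shows the sampled values.

```python
import random


def vote_from_draw(p_yes: float, q: float) -> bool:
    """Apply the rule IF Q < P1 THEN "YES" ELSE "NO"; True means "YES"."""
    return q < p_yes


def cast_vote(p_yes: float, rng: random.Random) -> bool:
    """Draw Q uniformly from [0, 1) and apply the rule."""
    return vote_from_draw(p_yes, rng.random())


if __name__ == "__main__":
    # The (P1, Q) pairs shown in the walkthrough for voters 1, 2, 3 and 5.
    for p1, q in [(0.4, 0.2344), (0.45, 0.6798), (0.1, 0.448), (0.45, 0.8976)]:
        print(p1, q, "YES" if vote_from_draw(p1, q) else "NO")
```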

Random Q1: 0.2344; 0.2344 < 0.4: YES

Random Q2: 0.6798; 0.6798 < 0.45: NO

Random Q3: 0.448; 0.448 < 0.1: NO

Random Q5: 0.8976; 0.8976 < 0.45: NO






- There are 5 YES votes out of 10.
- We compute G(5/10) = 0.4745.
- The Referee rewards/penalizes each player independently.

[Figure: plot of G(X) against X, for X from 0 to 1.]

- IF Mi < G(0.5) THEN Reward ELSE Penalize, for i = 1..10

We are using an LRI scheme:
- Reward: if voter i said Yes, increase P1 and decrease P2; if voter i said No, increase P2 and decrease P1.
- Penalize: do not act!

Voter 1: Random M1: 0.3255; 0.3255 < 0.4745? True – Reward

He said YES: P1 := P1 + 0.2(1-P1) = 0.52; P2 := 1 - 0.52 = 0.48

Voter 2: Random M2: 0.7789; 0.7789 < 0.4745? Not true - Penalize

Do not modify P1 and P2 (voter 2 keeps P1=0.45; P2=0.55).

Voter 3 is rewarded. He said NO: P2 := P2 + 0.2(1-P2) = 0.92; P1 := 1 - P2 = 0.08

Action probabilities after the updates for voters 1 to 3:

Voter 1: P1=0.52; P2=0.48
Voter 2: P1=0.45; P2=0.55
Voter 3: P1=0.08; P2=0.92
Voter 4: P1=0.3; P2=0.7
Voter 5: P1=0.45; P2=0.55
Voter 6: P1=0.8; P2=0.2
Voter 7: P1=0.3; P2=0.7
Voter 8: P1=0.4; P2=0.6
Voter 9: P1=0.5; P2=0.5
Voter 10: P1=0.85; P2=0.15

- The maximum of G occurs at 0.7 (7 YES out of 10).
- After enough iterations, 7 players will always say YES (P1=1 and P2=0) and 3 players will always say NO (P1=0 and P2=1).
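The whole procedure can be put together in a short simulation. This is a sketch under stated assumptions, not the presenter's implementation: the function names, the learning rate 0.05, the number of rounds and the seed are all hypothetical choices, while the criterion G and the initial probabilities are the ones from the walkthrough. Because LRI is a stochastic scheme, a particular run is not guaranteed to end at exactly 7 “Yes” voters; smaller rates generally make that outcome more likely at the cost of slower convergence.

```python
import math
import random


def performance(x: float) -> float:
    """Criterion from the walkthrough: G(x) = 0.9 * exp(-(0.7 - x)^2 / 0.0625)."""
    return 0.9 * math.exp(-((0.7 - x) ** 2) / 0.0625)


def simulate(p_yes, rounds, rate, rng):
    """Play `rounds` rounds; return final "Yes" probabilities and per-round counts."""
    p = list(p_yes)
    yes_counts = []
    for _ in range(rounds):
        votes = [rng.random() < pi for pi in p]        # each voter votes independently
        yes = sum(votes)
        g = performance(yes / len(p))                   # the Referee evaluates G(f)
        for i, voted_yes in enumerate(votes):
            if rng.random() < g:                        # independent reward draw
                if voted_yes:
                    p[i] += rate * (1.0 - p[i])         # reinforce "Yes"
                else:
                    p[i] -= rate * p[i]                 # reinforce "No" (P2 grows)
            # on a penalty nothing changes (reward-inaction)
        yes_counts.append(yes)
    return p, yes_counts


if __name__ == "__main__":
    initial = [0.4, 0.45, 0.1, 0.3, 0.45, 0.8, 0.3, 0.4, 0.5, 0.85]
    final_p, counts = simulate(initial, rounds=5000, rate=0.05, rng=random.Random(42))
    print("Yes votes in the last round:", counts[-1])
    print("Final P1 values:", [round(x, 3) for x in final_p])
```

Note that the players never see G, the number of players, or each other's votes; only the reward draws couple them.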


Final state:

Voter 1: P1=1; P2=0 (Yes)
Voter 2: P1=0; P2=1 (No)
Voter 3: P1=0; P2=1 (No)
Voter 4: P1=1; P2=0 (Yes)
Voter 5: P1=1; P2=0 (Yes)
Voter 6: P1=1; P2=0 (Yes)
Voter 7: P1=1; P2=0 (Yes)
Voter 8: P1=0; P2=1 (No)
Voter 9: P1=1; P2=0 (Yes)
Voter 10: P1=1; P2=0 (Yes)


GOORE GAME


Each player plays solely in a greedy fashion, voting each time the way that seems to give the player the best payoff.

This is somewhat unexpected. Greed affects outcomes in an unpredictable manner: the player does not attempt to predict the behavior of other players.

Instead, each player performs by trial and error and simply preferentially repeats those actions that produce the best result for that player.

GOORE GAME


The game is a non-zero-sum game.

Unlike the games traditionally studied in the AI literature (Chess, Checkers, etc.) GG is essentially a distributed game.

The players of the game are ignorant of all of the parameters of the game. All they know is that they have to make a choice, for which they are either rewarded or penalized. They have no clue as to how many other players there are, how they are playing, or even how or why they are rewarded or penalized.

The stochastic function used to reward or penalize the players, after measuring their performance as a whole, can be completely arbitrary, as long as it is unimodal.

The game can achieve a globally optimal state with N-players without having to explicitly dictate the action to each player. The players self-organize and self-optimize based on the reward function.

GENERAL PROBLEM

The general problem:

Coordination of decentralized decision makers

Many military applications!

APPLICATIONS

Applications of Goore Game:

Telecom

Mobots

Flight control

My research:

D. Calitoiu, B. John Oommen (Carleton U.) and Ole-Christoffer Granmo (Agder University, Norway): Identify Traitors.

D. Calitoiu : Search algorithms.

APPLICATIONS: SENSOR NETWORKS

QoS Control in Sensor Networks (adapted from [4])

Consider a basic sensor network that consists of a number of sensors and a single base station.

Each sensor can be either powered-down, powered-up, or damaged.

The base station receives packets only from powered-up sensors.

[4] R. Iyer and L. Kleinrock, “QoS control for sensor networks,” in IEEE International Conference on Communications, 2003, vol. 1, pp. 517-521.

Question: How can the base station control the sensors so that exactly Q of them are powered-up at any given time?

More problems:
- Decentralization: (1) the sensors cannot communicate with each other, and (2) the base station cannot address sensors individually.
- Unknown Environment: the number of sensors is unknown to the base station.
- Stochastic Environment: communication is noisy in the sense that messages may be lost.
- Dynamic Environment: the number of available functioning sensors changes with time.
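One way the Goore Game could be mapped onto this question is for the base station to play the Referee: it counts the packets it receives and broadcasts a single reward probability that peaks when exactly Q packets arrive. The sketch below is hypothetical throughout. The Gaussian-shaped criterion, the `width` parameter, the `loss_rate` channel model and all function names are illustrative assumptions and are not taken from [4].

```python
import math
import random


def reward_probability(packets_received: int, target_q: int, width: float = 2.0) -> float:
    """Hypothetical unimodal criterion, maximal when exactly target_q packets arrive."""
    return math.exp(-((packets_received - target_q) / width) ** 2)


def base_station_round(powered_up, target_q, loss_rate, rng):
    """Count the packets that survive a noisy channel and compute the
    single reward probability that would be broadcast to every sensor.

    The base station needs neither the total number of sensors nor
    their identities: it only sees how many packets arrived.
    """
    received = sum(1 for up in powered_up if up and rng.random() >= loss_rate)
    return received, reward_probability(received, target_q)


if __name__ == "__main__":
    sensors = [True] * 6 + [False] * 4   # six powered-up sensors, four powered-down
    print(base_station_round(sensors, target_q=5, loss_rate=0.1, rng=random.Random(3)))
```

Each sensor would then treat the broadcast value as its own reward probability and update its power-up decision locally, exactly as a voter does.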

APPLICATIONS: MOBOTS

B. Tung and L. Kleinrock, “Using finite state automata to produce self-optimization and self-control,” IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 4, 1996.

Mobots

Question: Is it possible to get the mobots to complete a complex task (one that requires the cooperation of many) without individually directing each one through every subtask?

Response: Yes! GG can produce such cooperation:

Consider a landscape containing pieces of ore (minerals). It is desired that the ore be collected and sorted by type (e.g., by color).

This is a task that can be completed correctly by one mobot, but it is faster to utilize more than one.

Suppose that we have a population of 6 mobots.

The mobots have access to a single shared access communication channel with the base station.

APPLICATIONS: MOBOTS

Actions:

Collecting (Searching out ore, retrieving it with the mobot arm, and placing it in a sorting bin.)

Sorting (Retrieving ore from the sorting bin, sorting it based on its color, and placing it in the correct finished bin).

The mobots can communicate their action. A base station rewards/penalizes them.

After a few iterations, mobots 1-4 settle on collecting behavior and mobots 5-6 choose sorting behavior (the entire population consists of 6 mobots).

APPLICATIONS: FLIGHT CONTROL

S. Ho, N. Nassef, N. Pornsin-Sirirak, Y-C. Tai, C-M. Ho, “Flight dynamics of small vehicles,” ICAS CONGRESS, 2002.

Small payload carrying flight vehicle – for remote sensing missions where access is restricted due to various hazards

These vehicles have a typical wingspan of 15 cm, with a weight restriction of less than 100g.

The goal is to consider a flapping wing design: flow control technique.

Solution: microvalve actuator

Fabricated on wing membrane (thin layers of parylene and gold)

Electrically actuated

Add virtually no inertia load (few microns thick)

Flight control with GG

APPLICATIONS: FLIGHT CONTROL

Goore Game with microvalve actuator. The reward function was based on CL/CT (aerodynamic lift and thrust coefficients).

GG proved capable of significantly altering the aerodynamic performance of the wings.

Changes of over 300% in the CL/CT ratio were achieved using single and double variable optimization and control.


APPLICATIONS: IDENTIFY TRAITORS

Joint work: Dragos Calitoiu and B. John Oommen (Carleton U.) and Ole-Christoffer Granmo (Agder University, Norway).

One or many players deliberately decide to use a different rule to respond to the teacher or to learn: a traitor or many traitors in a group with honest players.

Our task is to discover the conditions under which a teacher is able to realize that there are traitors in the group, to estimate their number and also, if it is possible, to identify (to label) them.

Investigating the Goore Game with Traitor Players


There are many algorithms to model the Traitors.

The traitors’ main characteristic is that they behave differently from honest players.

The Teacher is able to discover their contribution in the collective response!

APPLICATIONS: SEARCH ALGORITHMS

New search algorithm for randomly located objects: a non-cooperative agent based approach

The main application: anti-personnel mine detection. This research can be extended to any type of exploration by ground or aerial vehicle (on Earth or for conducting planetary science missions).

APPLICATIONS: SEARCH ALGORITHMS

[Figure: three search trajectories. Panel 1: movement step = 1. Panel 2: movement step from a Levy-flight distribution with $\mu = 2$ and length = 3. Panel 3: movement step from a Levy-flight distribution with $\mu = 2$ and length = 4.]

Levy-flight distribution: $p(\text{length}) = \text{length}^{-\mu}$
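Step lengths with a power-law tail of this kind can be drawn by inverse-transform sampling. The sketch below makes several assumptions that the slide does not state: the exponent is called `mu`, lengths are continuous with a lower cut-off `min_length` (needed for the density to be normalizable), and `mu` is greater than 1. The function name is hypothetical.

```python
import random


def levy_step_length(mu: float, rng: random.Random, min_length: float = 1.0) -> float:
    """Draw a step length whose density is proportional to length**(-mu)
    on [min_length, infinity), using inverse-transform sampling.

    CDF: F(l) = 1 - (l / min_length)**(-(mu - 1)), so
    l = min_length * (1 - u)**(-1 / (mu - 1)) for u uniform in [0, 1).
    """
    if mu <= 1.0:
        raise ValueError("mu must be greater than 1 for a normalizable density")
    u = rng.random()
    return min_length * (1.0 - u) ** (-1.0 / (mu - 1.0))


if __name__ == "__main__":
    rng = random.Random(7)
    steps = [levy_step_length(2.0, rng) for _ in range(10)]
    print([round(s, 2) for s in steps])
```

Most draws are short, with occasional very long steps, which is the behavior that distinguishes a Levy flight from a fixed-step search.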

OPEN PROBLEMS

Q1 – Can G vary in time? What type of variability is acceptable if convergence is still to be obtained?

Q2 – Can G be a multimodal criterion?

Q3 – Can G be a discontinuous criterion?

Q4 – Co-operation and competition between two teams?

Q5 – Search algorithms with adaptive steps

OPEN FLOOR

QUESTIONS AND THOUGHTS?

The presentation will be uploaded on www.corsottawa.org
