Dragos Calitoiu, Bank of America, dragos.calitoiu@mbna.com

Self-optimization and self-organization with Goore Game (a distributed non-cooperative non-zero-sum N-person game): Theoretical Results, Applications and Open Problems

CORS – Ottawa section, March 26, 2009


CONTENT OF PRESENTATION

Goore Game – Introduction

Learning Automata and Goore Game implemented with LA

Applications

My Research

Open Problems

Objective: To introduce the methodology of distributed control with GG for future research

GG - BACKGROUND

Goore Game - Description:

An example of a self-organizing and self-optimizing game studied in the field of AI

Presented by Tsetlin in 1963 [1] and analyzed in detail in [2] and [3].

[1] M.L. Tsetlin, “Finite automata and the modeling of the simplest forms of behavior,” Uspekhi Matem Nauk, vol. 8, pp. 1-26, 1963.

[2] K.S. Narendra and M.A.L. Thathachar, Learning Automata, Prentice-Hall, 1989.

[3] M.A.L. Thathachar and M.T. Arvind, “Solution of Goore game using models of stochastic learning automata,” Journal of Indian Institute of Science, no. 76, pp. 47-61, 1997.

GG - BACKGROUND

Goore Game’s features:

* Imagine a large room with N cubicles and a raised platform. A voter sits in each cubicle and a Referee stands on the platform. The Referee conducts a series of voting rounds:

- On each round the voters vote either “Yes” or “No” (the issue is unimportant) simultaneously and independently (they do not see each other) and the Referee counts the fraction, f, of “Yes” votes.

* The Referee has a unimodal performance criterion G(f), which is optimized when the fraction of “Yes” votes is exactly f0.

* The voting round ends with the Referee awarding a dollar with probability G(f) and charging a dollar with probability 1 - G(f), to every voter independently.

* On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

* No matter how many players there are, after enough trials, the number of “Yes” votes will approximate N*f0.

THE GAME – STEP BY STEP

[Figure: a Referee on a raised platform and ten voters, numbered 1 to 10, each in a cubicle.]

Imagine a large room containing N (N=10 in our picture) cubicles and a raised platform.

A voter sits in each cubicle and a Referee stands on the platform.

THE GAME – STEP BY STEP

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee conducts a series of voting rounds as follows: on each round the voters vote either “Yes” or “No” (the issue is unimportant) simultaneously and independently (they do not see each other).

THE GAME – STEP BY STEP

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

THE GAME – STEP BY STEP

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

The Referee has a unimodal performance criterion G(f).

The Referee awards a dollar with probability G(f) and assesses a dollar with probability 1 - G(f) to every voter independently. On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

THE GAME – STEP BY STEP

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

The Referee has a unimodal performance criterion:

$G(x) = 0.9\,e^{-(0.7 - x)^2 / 0.0625}$

so that G(0.6) = 0.76692941.

[Figure: plot of G(X) against X, with both axes running from 0 to 1.]
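The criterion on this slide is easy to evaluate directly. The short Python sketch below tabulates it for every possible number of Yes votes among 10 voters; the function name `performance` is only an illustrative choice, not something taken from the presentation.

```python
import math


def performance(x: float) -> float:
    """Referee's criterion from the slide: G(x) = 0.9 * exp(-(0.7 - x)^2 / 0.0625)."""
    return 0.9 * math.exp(-((0.7 - x) ** 2) / 0.0625)


if __name__ == "__main__":
    # Tabulate G for 0..10 Yes votes out of 10 voters
    for yes_votes in range(11):
        fraction = yes_votes / 10
        print(f"{yes_votes:2d} Yes votes: G({fraction:.1f}) = {performance(fraction):.4f}")
```

The table peaks at 7 Yes votes, where G equals 0.9, and the entry for 6 Yes votes agrees with the 0.7669 quoted on the slide.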

REWARD/PENALTY

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

The Referee has a unimodal performance criterion: G(0.6) = 0.7669

Generate a random variable R1

IF R1 < 0.7669 THEN Reward ELSE Penalize

REWARD/PENALTY

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

The Referee has a unimodal performance criterion: G(0.6) = 0.7669

Generate a random variable R2

IF R2 < 0.7669 THEN Reward ELSE Penalize

Player 2 can be rewarded regardless of what he said (No or Yes). “Yes” is not a better decision than “No”.

REWARD/PENALTY

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, No, Yes.]

The Referee counts how many Yes votes there are: 6 out of 10.

The Referee has a unimodal performance criterion: G(0.6) = 0.7669

Generate a random variable R10

IF R10 < 0.7669 THEN Reward ELSE Penalize
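The reward/penalty step shown for R1, R2 and R10 is the same test repeated once per voter, each with its own random draw. A minimal Python sketch follows, assuming the draws are uniform on [0, 1); the function name `referee_feedback` and the use of a seeded generator are illustrative choices, not part of the presentation.

```python
import random


def referee_feedback(num_voters: int, g_value: float, rng: random.Random) -> list[bool]:
    """Return one flag per voter: True means Reward, False means Penalize."""
    # Each voter gets a separate draw R_i, so the outcomes are independent
    # of one another and of how the individual voter actually voted.
    return [rng.random() < g_value for _ in range(num_voters)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(referee_feedback(10, 0.7669, rng))
```

Note that the vote itself never enters the test: only the shared value G(f) does, which is why “Yes” is not a better decision than “No”.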

REWARD/PENALTY

[Figure: the Referee and voters 1 to 10.]

On the basis of their individual gains and losses, the voters then decide, again independently, how to cast their votes on the next round.

REWARD/PENALTY

[Figure: the Referee and voters 1 to 10. The ten votes shown are Yes, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes.]

Using implementations with Learning Automata, after enough iterations, the number of players that say “Yes” settles at the value that maximizes G(f). If the maximum occurs for 7 players, we will find that 7 players say “Yes” and 3 players say “No”.

LEARNING AUTOMATA – LEARNING LOOP

α = {α1, α2, …, αr} - r actions

{c1, c2, …, cr} - action penalty probabilities

β = {0, 1} - response from the Environment: reward and penalty

[Figure: feedback loop between the Learning Automaton (LA) and the Random Environment (RE). The LA sends an action from α = {α1, α2, …, αr} to the RE, which has penalty probabilities {c1, c2, …, cr} and returns a response from β = {0, 1}.]

LEARNING AUTOMATA – LEARNING LOOP

LA chooses one action from the set of possible actions {α1, …, αr} offered by the Random Environment RE

The chosen action α(t) is given as input to the RE

RE rewards or penalizes the chosen action based on the penalty probabilities

The RE's response is the input to the automaton: LA chooses the next action
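The environment side of this loop can be sketched in a few lines of Python. The sketch assumes the common convention that a response of 0 means reward and 1 means penalty; the class name `RandomEnvironment`, the penalty probabilities 0.8 and 0.3, and the uniformly random action picker are all hypothetical and serve only to exercise the loop.

```python
import random


class RandomEnvironment:
    """Environment with one penalty probability c_i per action."""

    def __init__(self, penalty_probs, seed=None):
        self.penalty_probs = list(penalty_probs)
        self.rng = random.Random(seed)

    def respond(self, action: int) -> int:
        """Return 1 (penalty) with probability c_action, otherwise 0 (reward)."""
        return 1 if self.rng.random() < self.penalty_probs[action] else 0


if __name__ == "__main__":
    env = RandomEnvironment([0.8, 0.3], seed=0)
    chooser = random.Random(1)  # stand-in for an automaton: picks actions blindly
    counts, penalties = [0, 0], [0, 0]
    for _ in range(10_000):
        action = chooser.randrange(2)
        response = env.respond(action)
        counts[action] += 1
        penalties[action] += response
    # Observed penalty rates should be close to the configured c_i
    print([p / c for p, c in zip(penalties, counts)])
```

A learning automaton differs from the blind chooser only in that it uses each response to adjust how it picks the next action.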

LEARNING AUTOMATA – LRI SCHEME

Reward – Inaction Scheme (only reward / no penalty)

If α2 is the best action:

• p2 increased

• p1 decreased

In the limit, p1 tends to 0 and p2 tends to 1.

• Example: α2 chosen and rewarded, with a change of 0.1

P(t) = [0.4, 0.6] becomes P(t+1) = [0.3, 0.7]

LEARNING AUTOMATA – LRI SCHEME

• Action Probability Updating Scheme, with learning parameter λ (0 < λ < 1):

If α1 is rewarded:

$p_1(t+1) = p_1(t) + \lambda\,(1 - p_1(t))$

$p_2(t+1) = 1 - p_1(t+1)$

If α2 is rewarded:

$p_2(t+1) = p_2(t) + \lambda\,(1 - p_2(t))$

$p_1(t+1) = 1 - p_2(t+1)$

When α1 or α2 is penalized:

DO NOT MODIFY p1 or p2
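The updating scheme can be written as a single function for the two-action case. In this Python sketch the learning parameter is called `lam` (an assumed name, since the symbol is not legible in the slide text), and actions are indexed 0 and 1 rather than 1 and 2.

```python
def lri_update(p, chosen, rewarded, lam):
    """Return the new [p1, p2] after one reward-inaction step.

    p        -- current action probabilities, a list of two floats summing to 1
    chosen   -- index (0 or 1) of the action that was played
    rewarded -- True if the environment rewarded the action
    lam      -- learning parameter, assumed to satisfy 0 < lam < 1
    """
    if not rewarded:
        # Inaction on penalty: probabilities are left exactly as they were
        return list(p)
    new_p = list(p)
    new_p[chosen] = p[chosen] + lam * (1.0 - p[chosen])
    new_p[1 - chosen] = 1.0 - new_p[chosen]
    return new_p


if __name__ == "__main__":
    print(lri_update([0.4, 0.6], chosen=0, rewarded=True, lam=0.2))
    print(lri_update([0.4, 0.6], chosen=0, rewarded=False, lam=0.2))
```

Because a rewarded action can only gain probability and a penalty changes nothing, the states p = [1, 0] and p = [0, 1] are absorbing: once reached, the automaton stays there.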

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, each labelled with its action probabilities.]

Voter 1: P1=0.4; P2=0.6
Voter 2: P1=0.45; P2=0.55
Voter 3: P1=0.1; P2=0.9
Voter 4: P1=0.3; P2=0.7
Voter 5: P1=0.45; P2=0.55
Voter 6: P1=0.8; P2=0.2
Voter 7: P1=0.3; P2=0.7
Voter 8: P1=0.4; P2=0.6
Voter 9: P1=0.5; P2=0.5
Voter 10: P1=0.85; P2=0.15

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, with voter 1 highlighted.]

Voter 1: P1=0.4; P2=0.6

Generate a random number Q1: 0.2344

IF Q1 < P1 THEN “YES” (Action 1) ELSE “NO” (Action 2)

Q1 (=0.2344) < P1 (=0.4): “YES”

IMPLEMENTATION WITH LRI

[Figure: voters 1 to 10, each labelled with its action probabilities, its random number and its vote.]

Voter 1: P1=0.4; P2=0.6. Random Q1: 0.2344. 0.2344 < 0.4: YES
Voter 2: P1=0.45; P2=0.55. Random Q2: 0.6798. 0.6798 < 0.45: NO
Voter 3: P1=0.1; P2=0.9. Random Q3: 0.448. 0.448 < 0.1: NO
Voter 4: P1=0.3; P2=0.7. Random Q4: 0.1388. 0.1388 < 0.3: YES
Voter 5: P1=0.45; P2=0.55. Random Q5: 0.8976. 0.8976 < 0.45: NO
Voter 6: P1=0.8; P2=0.2. Random Q6: 0.6887. 0.6887 < 0.8: YES
Voter 7: P1=0.3; P2=0.7. Random Q7: 0.2983. 0.2983 < 0.3: YES
Voter 8: P1=0.4; P2=0.6. Random Q8: 0.5543. 0.5543 < 0.4: NO
Voter 9: P1=0.5; P2=0.5. Random Q9: 0.6235. 0.6235 < 0.5: NO
Voter 10: P1=0.85; P2=0.15. Random Q10: 0.349. 0.349 < 0.85: YES
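The ten comparisons on this slide can be replayed mechanically. The Python sketch below uses the P1 values and the random numbers Q1 to Q10 exactly as they appear on the slide; the names `p_yes`, `draws` and `cast_votes` are illustrative.

```python
# P1 (probability of voting Yes) for voters 1..10, as listed on the slide
p_yes = [0.4, 0.45, 0.1, 0.3, 0.45, 0.8, 0.3, 0.4, 0.5, 0.85]
# Random numbers Q1..Q10, as listed on the slide
draws = [0.2344, 0.6798, 0.448, 0.1388, 0.8976, 0.6887, 0.2983, 0.5543, 0.6235, 0.349]


def cast_votes(p_yes, draws):
    """Voter i votes Yes (True) when its draw Q_i is below its P1."""
    return [draw < p for p, draw in zip(p_yes, draws)]


votes = cast_votes(p_yes, draws)
print(["YES" if v else "NO" for v in votes])
print(votes.count(True), "YES votes out of", len(votes))
```

Counting the result gives 5 YES votes out of 10, which is the fraction the Referee evaluates next.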

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, with the same action probabilities as on the previous slide.]

- There are 5 YES votes out of 10.

- We compute G(5/10)=0.4745.

- The Referee rewards/penalizes each player independently.

[Figure: plot of G(X) against X, with both axes running from 0 to 1.]

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, with the same action probabilities as on the previous slide.]

- IF Mi < G(0.5) THEN Reward ELSE Penalize, i=1..10

We are using an LRI Scheme:

- Reward: If voter i said Yes: increase P1 and decrease P2

If voter i said No: increase P2 and decrease P1

- Penalize: Don’t act!

Voter 1: Random M1: 0.3255. 0.3255 < 0.4745? True – Reward

He said YES: P1 := P1 + 0.2(1-P1) = 0.52

P2 := 1 - 0.52 = 0.48

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10. Voter 1 now has P1=0.52; P2=0.48; the other action probabilities are unchanged.]

- IF Mi < G(0.5) THEN Reward ELSE Penalize, i=1..10

We are using an LRI Scheme:

- Reward: increase the probability of the action the voter chose and decrease the other

- Penalize: Don’t act!

Voter 2: Random M2: 0.7789. 0.7789 < 0.4745? Not true – Penalize

Do not modify P1 and P2

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10. Voter 1 has P1=0.52; P2=0.48; the other action probabilities are unchanged.]

- IF Mi < G(0.5) THEN Reward ELSE Penalize, i=1..10

We are using an LRI Scheme:

- Reward: increase the probability of the action the voter chose and decrease the other

- Penalize: Don’t act!

Voter 3: Random M3: 0.126. 0.126 < 0.4745? True – Reward

He said NO: P2 := P2 + 0.2(1-P2) = 0.92

P1 := 1 - P2 = 0.08
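The three cases worked through for voters 1, 2 and 3 can be checked with a short Python sketch. It tracks only P1 for each voter (P2 is always 1 - P1), uses the learning parameter 0.2 that appears in the slide arithmetic, and takes G(5/10) = 0.4745 as quoted; the function name `apply_feedback` is an illustrative choice.

```python
def apply_feedback(p_yes, voted_yes, m, g_value, lam=0.2):
    """Return the voter's new P1 after the Referee's draw m is compared with G."""
    if m >= g_value:
        # Penalize: the reward-inaction scheme leaves P1 and P2 untouched
        return p_yes
    if voted_yes:
        # Reward for a Yes vote: P1 moves toward 1
        return p_yes + lam * (1.0 - p_yes)
    # Reward for a No vote: P2 moves toward 1, then P1 = 1 - P2
    p_no = 1.0 - p_yes
    p_no = p_no + lam * (1.0 - p_no)
    return 1.0 - p_no


G_HALF = 0.4745  # G(5/10) as quoted on the slides

print(round(apply_feedback(0.40, True, 0.3255, G_HALF), 4))   # voter 1, said YES
print(round(apply_feedback(0.45, False, 0.7789, G_HALF), 4))  # voter 2, said NO
print(round(apply_feedback(0.10, False, 0.1260, G_HALF), 4))  # voter 3, said NO
```

After rounding, the three printed values match the slide results: P1 = 0.52 for voter 1, an unchanged 0.45 for voter 2, and 0.08 for voter 3.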

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, each labelled with its action probabilities.]

Voter 1: P1=0.52; P2=0.48
Voter 2: P1=0.45; P2=0.55
Voter 3: P1=0.08; P2=0.92
Voter 4: P1=0.3; P2=0.7
Voter 5: P1=0.45; P2=0.55
Voter 6: P1=0.8; P2=0.2
Voter 7: P1=0.3; P2=0.7
Voter 8: P1=0.4; P2=0.6
Voter 9: P1=0.5; P2=0.5
Voter 10: P1=0.85; P2=0.15

- The maximum of G occurs at 0.7 (7 YES out of 10)

- After enough iterations: 7 players will always say YES (P1=1 and P2=0) and 3 players will always say NO (P1=0 and P2=1)
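Putting the pieces together, a whole game can be simulated in a few lines. The Python sketch below is only an illustration under stated assumptions: it reuses the criterion G and the starting probabilities from the slides, but the number of rounds, the learning parameter 0.05 and the seed are arbitrary choices, and an individual run is not guaranteed to settle on exactly 7 YES voters.

```python
import math
import random


def performance(x):
    """Referee's criterion from the slides, maximal at x = 0.7."""
    return 0.9 * math.exp(-((0.7 - x) ** 2) / 0.0625)


def play_goore_game(p_yes, rounds, lam, rng):
    """Run the voting rounds and return the final P1 of every voter."""
    p_yes = list(p_yes)  # work on a copy so the caller's list is untouched
    n = len(p_yes)
    for _ in range(rounds):
        votes = [rng.random() < p for p in p_yes]
        g = performance(sum(votes) / n)
        for i, voted_yes in enumerate(votes):
            if rng.random() < g:  # this voter is rewarded
                if voted_yes:
                    p_yes[i] += lam * (1.0 - p_yes[i])  # P1 toward 1
                else:
                    p_yes[i] -= lam * p_yes[i]          # P2 toward 1, so P1 toward 0
            # on penalty nothing changes (reward-inaction)
    return p_yes


if __name__ == "__main__":
    start = [0.4, 0.45, 0.1, 0.3, 0.45, 0.8, 0.3, 0.4, 0.5, 0.85]
    final = play_goore_game(start, rounds=5000, lam=0.05, rng=random.Random(2009))
    print([round(p, 3) for p in final])
    print("voters leaning YES:", sum(p > 0.5 for p in final))
```

Smaller values of the learning parameter make the run slower but generally make it more likely that the group ends near the maximum of G, which is the usual trade-off with this scheme.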

IMPLEMENTATION WITH LRI

[Figure: the Referee and voters 1 to 10, each labelled with its final action probabilities and its vote.]

Voter 1: P1=1; P2=0. Yes
Voter 2: P1=0; P2=1. No
Voter 3: P1=0; P2=1. No
Voter 4: P1=1; P2=0. Yes
Voter 5: P1=1; P2=0. Yes
Voter 6: P1=1; P2=0. Yes
Voter 7: P1=1; P2=0. Yes
Voter 8: P1=0; P2=1. No
Voter 9: P1=1; P2=0. Yes
Voter 10: P1=1; P2=0. Yes

GOORE GAME

Goore Game’s features:

Each player plays solely in a greedy fashion, voting each time the way that seems to give the player the best payoff.

This is somewhat unexpected. Greed affects outcomes in an unpredictable manner: the player does not attempt to predict the behavior of other players.

Instead, each player performs by trial and error and simply preferentially repeats those actions that produce the best result for that player.

GOORE GAME

Goore Game’s features:

The game is a non-zero-sum game.

Unlike the games traditionally studied in the AI literature (Chess, Checkers, etc.) GG is essentially a distributed game.

The players of the game are ignorant of all of the parameters of the game. All they know is that they have to make a choice, for which they are either rewarded or penalized. They have no clue as to how many other players there are, how they are playing, or even how/why they are rewarded/ penalized.

The stochastic function used to reward or penalize the players, after measuring their performance as a whole, can be completely arbitrary, as long as it is uni-modal.

The game can achieve a globally optimal state with N-players without having to explicitly dictate the action to each player. The players self-organize and self-optimize based on the reward function.

GENERAL PROBLEM

The general problem:

Coordination of decentralized decision makers

A lot of military applications!

APPLICATIONS

Applications of Goore Game:

Telecom

Mobots

Flight control

My research:

D. Calitoiu, B. John Oommen (Carleton U.) and Ole-Christoffer Granmo (Agder University, Norway): Identify Traitors.

D. Calitoiu : Search algorithms.

APPLICATIONS: SENSOR NETWORKS

QoS Control in Sensor Networks (adapted from [4])

Consider a basic sensor network that consists of a number of sensors and a single base station.

Each sensor can be either powered-down, powered-up, or damaged.

The base station receives packets only from powered-up sensors.

[4] R. Iyer and L. Kleinrock, “QoS control for sensor networks,” in IEEE International Conference on Communications, 2003, vol. 1, pp. 517-521.

APPLICATIONS: SENSOR NETWORKS

Question: How can the base station control the sensors so that exactly Q of them are powered-up at any given time?

APPLICATIONS: SENSOR NETWORKS

More problems:

- Decentralization: (1) The sensors cannot communicate with each other, and (2) the base station cannot address sensors individually.

- Unknown Environment: The number of sensors is unknown to the base station.

- Stochastic Environment: Communication is noisy in the sense that messages may be lost.

- Dynamic Environment: The number of available functioning sensors is changing with time.
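One way to read these constraints in Goore Game terms is that the base station plays the Referee: it only counts the packets it receives and broadcasts a single reward probability to everyone, so it needs neither the number of sensors nor individual addresses. The Python sketch below shows one possible unimodal criterion over the packet count. The Gaussian shape, the `width` parameter and the function name `reward_probability` are assumptions made for illustration, not details taken from the presentation or from [4].

```python
import math


def reward_probability(packets_received: int, target_q: int, width: float = 2.0) -> float:
    """Hypothetical unimodal criterion over the packet count, peaked at target_q."""
    return math.exp(-((packets_received - target_q) / width) ** 2)


if __name__ == "__main__":
    target = 12  # desired number of powered-up sensors (illustrative value)
    for received in (6, 10, 12, 14, 18):
        print(received, round(reward_probability(received, target), 4))
```

Each sensor would then act as a voter whose two actions are powered-up and powered-down, updating its own behavior from the broadcast feedback alone.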

APPLICATIONS: MOBOTS

B. Tung and L. Kleinrock, “Using finite state automata to produce self-optimization and self-control,” IEEE Transactions on parallel and distributed systems vol. 7, no. 4, 1996.

Mobots

Question: Is it possible to get the mobots to complete a complex task (one that requires the cooperation of many) without individually directing each one through every subtask?

Response: Yes! GG can produce such cooperation:

Consider a landscape containing pieces of ore (minerals). It is desired that the ore be collected and sorted by type (e.g., by color).

This is a task that can be completed correctly by one mobot, but it is faster to utilize more than one.

Suppose that we have a population of 6 mobots.

The mobots have access to a single shared access communication channel with the base station.

APPLICATIONS: MOBOTS

B. Tung and L. Kleinrock, “Using finite state automata to produce self-optimization and self-control,” IEEE Transactions on parallel and distributed systems vol. 7, no. 4, 1996.

Actions:

Collecting (Searching out ore, retrieving it with the mobot arm, and placing it in a sorting bin.)

Sorting (Retrieving ore from the sorting bin, sorting it based on its color, and placing it in the correct finished bin).

The mobots can communicate their action. A base station rewards/penalizes them.

After a few iterations, mobots 1-4 settle on collecting behavior and mobots 5-6 choose sorting behavior (the entire population consists of 6 mobots).

APPLICATIONS: FLIGHT CONTROL

S. Ho, N. Nassef, N. Pornsin-Sirirak, Y-C. Tai, C-M. Ho, “Flight dynamics of small vehicles,” ICAS CONGRESS, 2002.

Small payload carrying flight vehicle – for remote sensing missions where access is restricted due to various hazards

These vehicles have a typical wingspan of 15 cm, with a weight restriction of less than 100g.

The goal is to consider a flapping wing design: flow control technique.

Solution: microvalve actuator

Fabricated on wing membrane (thin layers of parylene and gold)

Electrically actuated

Add virtually no inertia load (few microns thick)

Flight control with GG

APPLICATIONS: FLIGHT CONTROL

S. Ho, N. Nassef, N. Pornsin-Sirirak, Y-C. Tai, C-M. Ho, “Flight dynamics of small vehicles,” ICAS CONGRESS, 2002.

Goore Game with microvalve actuator. The reward function was based on CL/CT (aerodynamic lift and thrust coefficients).

GG proved capable of significantly altering the aerodynamic performance of the wings.

Changes of over 300% in the CL/CT ratio were achieved using single and double variable optimization and control.

APPLICATIONS

Applications of Goore Game:

Telecom

Mobots

Flight control

My research:

D. Calitoiu, B. John Oommen (Carleton U.) and Ole-Christoffer Granmo (Agder University, Norway): Identify Traitors.

D. Calitoiu : Search algorithms.

APPLICATIONS: IDENTIFY TRAITORS

Joint work: Dragos Calitoiu and B. John Oommen (Carleton U.) and Ole-Christoffer Granmo (Agder University, Norway).

One or many players deliberately decide to use a different rule to respond to the teacher or to learn: a traitor or many traitors in a group with honest players.

Our task is to discover the conditions under which a teacher is able to realize that there are traitors in the group, to estimate their number and also, if it is possible, to identify (to label) them.

Investigating the Goore Game with Traitor Players

APPLICATIONS: IDENTIFY TRAITORS

[Figure: the Referee and ten players, numbered 1 to 10.]

There are many algorithms to model the Traitors.

The traitors’ main characteristic is that they behave differently from honest players.

The Teacher is able to discover their contribution in the collective response!

APPLICATIONS: SEARCH ALGORITHMS

New search algorithm for randomly located objects: a non-cooperative agent based approach

The main application: anti-personnel mine detection. This research can be extended to any type of exploration by ground or aerial vehicle (on Earth or for conducting planetary science missions).

APPLICATIONS: SEARCH ALGORITHMS

Movement step = 1

Movement step from a Lévy-flight distribution: μ = 2 and length = 3

Movement step from a Lévy-flight distribution: μ = 2 and length = 4

Lévy-flight distribution: $p(\mathrm{length}) = \mathrm{length}^{-\mu}$
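Step lengths with this kind of heavy tail can be drawn by inverse-transform sampling. The Python sketch below assumes the slide's formula is the usual power law, with density proportional to length raised to a negative exponent; the exponent name `mu`, the minimum length of 1 and the function name `levy_step_length` are assumptions introduced for the example.

```python
import random


def levy_step_length(mu: float, rng: random.Random, min_length: float = 1.0) -> float:
    """Draw a length with density proportional to length**(-mu), length >= min_length."""
    if mu <= 1.0:
        raise ValueError("mu must exceed 1 for the density to be normalizable")
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    # Inverse of the cumulative distribution 1 - (length / min_length)**(-(mu - 1))
    return min_length * (1.0 - u) ** (-1.0 / (mu - 1.0))


if __name__ == "__main__":
    rng = random.Random(7)
    steps = [levy_step_length(2.0, rng) for _ in range(10)]
    print([round(s, 2) for s in steps])
```

Most draws are short, but an occasional very long step appears, which is what lets a searcher relocate to unexplored regions.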

OPEN PROBLEMS

Q1 – Can G vary in time? What type of variability is acceptable while still obtaining convergence?

Q2 – Can G be a multimodal criterion?

Q3 – Can G be a discontinuous criterion?

Q4 – Co-operation and competition between two teams?

Q5 – Search algorithms with adaptive steps

OPEN FLOOR

QUESTIONS AND THOUGHTS?

The presentation will be uploaded on www.corsottawa.org