
Reinforcement Learning in Strategy Selection for a Coordinated Multirobot System

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 6, NOVEMBER 2007

Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen, and Ching-Huang Lee

Advisor : Ming-Yuan Shieh

Student : Ching-Chih Wen

S/N : M9820108

PPT production rate: 100%

1

Abstract
Introduction
SYSTEM FORMATION
    Basic Behavior
    Role Assignment
    Strategies
    Learning System
    Dispatching System
EXPERIMENTS
CONCLUSION

OUTLINE

2

This correspondence presents a multi-strategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of a game.

The responsibility of each player varies as its role changes across state transitions. Therefore, the system uses several strategies, such as an offensive strategy, a defensive strategy, and so on, for a variety of scenarios.

Under each strategy, the major task assigned to the robots is simply to occupy good positions.

Utilizing the Hungarian method, each robot can be dispatched to its designated spot with minimal total cost.

ABSTRACT

3

Reinforcement learning has recently attracted increasing interest in the fields of machine learning and artificial intelligence, since it promises a way to achieve a specific task using only reward and punishment [1].

Fig.1

INTRODUCTION(1/3)

4

Traditional reinforcement-learning algorithms are mostly concerned with single-agent problems; in a multi-agent environment, however, no agent acts alone, since it must interact with the other agents to achieve a specific task [3].

Therefore, we focus here on high-level learning rather than basic-behavior learning.

The main objective of this correspondence is to develop a reinforcement-learning architecture for multiple coordinated strategies in a robot soccer system.

INTRODUCTION(2/3)

5

In this correspondence, we utilize the robot soccer system as our test platform, since it fully realizes a multi-agent system.

Fig.2

INTRODUCTION(3/3)

6

Fig.3

SYSTEM FORMATION

7

1) Go to a Position
2) Go to a Position With Avoidance
3) Kick a Ball to a Position
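The paper gives no code for these behaviors; as a rough illustration, the sketch below (a Python assumption, not the authors' implementation) expresses each basic behavior as a waypoint generator for a lower-level motion controller. The function names, the single-obstacle avoidance rule, and the numeric offsets are all hypothetical.

```python
# Minimal sketch (assumptions, not the paper's controller): each basic behavior
# returns a waypoint (x, y) for a lower-level motion controller to track.
import math

def go_to_position(robot_xy, target_xy):
    """Go to a Position: head straight for the target point."""
    return target_xy

def go_to_position_with_avoidance(robot_xy, target_xy, obstacle_xy, clearance=0.15):
    """Go to a Position With Avoidance: detour if an obstacle lies near the straight path."""
    rx, ry = robot_xy
    tx, ty = target_xy
    ox, oy = obstacle_xy
    seg = math.hypot(tx - rx, ty - ry) or 1e-9
    # Perpendicular distance from the obstacle to the robot-target line.
    d = abs((ty - ry) * ox - (tx - rx) * oy + tx * ry - ty * rx) / seg
    if d > clearance:
        return target_xy
    # Otherwise aim at a waypoint offset sideways from the obstacle.
    nx, ny = -(ty - ry) / seg, (tx - rx) / seg
    return (ox + clearance * nx, oy + clearance * ny)

def kick_ball_to_position(ball_xy, target_xy, offset=0.10):
    """Kick a Ball to a Position: approach from behind the ball along the ball-target line."""
    bx, by = ball_xy
    tx, ty = target_xy
    dist = math.hypot(tx - bx, ty - by) or 1e-9
    return (bx - offset * (tx - bx) / dist, by - offset * (ty - by) / dist)
```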

Fig.4 Fig.5

SYSTEM FORMATION-Basic Behavior

8

1) Attacker position

Fig.6 Fig.7

Fig.8

SYSTEM FORMATION-Role Assignment(1/3)

9

2) Sidekick position

Fig.9 Fig.10

3) Backup position

4) Defender position

SYSTEM FORMATION-Role Assignment(2/3)

10

5) Goalkeeper position

Fig.11

SYSTEM FORMATION-Role Assignment(3/3)

11

1) Primary part:

The attacker's weighting is W_a.

2) Offensive part:

The weightings of the sidekick and the backup are W_s and W_b, respectively.

3) Defensive part:

The weightings of the defender and the goalkeeper are W_d and W_g, respectively.

SYSTEM FORMATION- STRATEGIES(1/2)

12

According to the different weightings, different strategies can be developed. We can develop three strategies as follows (a small lookup-table sketch follows the list):

1) Normal strategy: W_a = W_s = W_b = W_d = W_g; (W_a, W_s, W_b, W_d, W_g) = (1, 1, 1, 1, 1) is an example used in our simulations.

2) Offensive strategy: W_a > max(W_s, W_b) and min(W_s, W_b) > max(W_d, W_g); (W_a, W_s, W_b, W_d, W_g) = (2, 1.5, 1.5, 1, 1) is an example used in our simulations.

3) Defensive strategy: W_a > max(W_d, W_g) and min(W_d, W_g) > max(W_s, W_b); (W_a, W_s, W_b, W_d, W_g) = (2, 1, 1, 1.5, 1.5) is an example used in our simulations.

SYSTEM FORMATION- STRATEGIES(2/2)
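For illustration, the three weighting sets above can be kept in a small lookup table; the numeric values are the ones listed on this slide, while the Python layout and key names are assumptions.

```python
# Strategy weighting sets from the slide above; the dict layout is an assumption.
STRATEGIES = {
    # Normal: all role weightings equal.
    "normal":    {"Wa": 1.0, "Ws": 1.0, "Wb": 1.0, "Wd": 1.0, "Wg": 1.0},
    # Offensive: Wa > max(Ws, Wb) and min(Ws, Wb) > max(Wd, Wg).
    "offensive": {"Wa": 2.0, "Ws": 1.5, "Wb": 1.5, "Wd": 1.0, "Wg": 1.0},
    # Defensive: Wa > max(Wd, Wg) and min(Wd, Wg) > max(Ws, Wb).
    "defensive": {"Wa": 2.0, "Ws": 1.0, "Wb": 1.0, "Wd": 1.5, "Wg": 1.5},
}

def role_weights(strategy: str) -> dict:
    """Look up the weightings (attacker, sidekick, backup, defender, goalkeeper)."""
    return STRATEGIES[strategy]
```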

13

Fig.12

SYSTEM FORMATION- LEARNING SYSTEM(1/3)

14

1) States:

Fig.13

2) Actions: The actions of Q-learning are the decisions on the strategy taken in each learning cycle. Each action is represented by a set of weights.

SYSTEM FORMATION- LEARNING SYSTEM(2/3)

15

3) Reward Function:

— Gain a point: r = 1.

— Lose a point: r = −1.

— Others: r = 0.

4) Q-Learning: Based on the states, actions, and reward function, we can fully implement the Q-learning method.

Here, the ε-greedy method is chosen as the action-selection policy, with exploration probability ε = 0.1. The learning rate is α = 0.8, and the discount factor is γ = 0.9.
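A minimal tabular sketch of the strategy-selection learner with the parameters stated above (ε = 0.1, α = 0.8, γ = 0.9); the state encoding, the Q-table layout, and the reuse of the three strategy names as actions are assumptions rather than the authors' code.

```python
# Sketch of strategy selection by epsilon-greedy Q-learning (parameters from the slide).
import random
from collections import defaultdict

ACTIONS = ["normal", "offensive", "defensive"]   # one action per strategy
EPSILON, ALPHA, GAMMA = 0.1, 0.8, 0.9

Q = defaultdict(float)                           # Q[(state, action)] -> value

def select_strategy(state):
    """Epsilon-greedy choice over the available strategies."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning backup; reward is +1 for a goal scored, -1 for a goal conceded, else 0."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```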

SYSTEM FORMATION- LEARNING SYSTEM(3/3)

16

First, we introduce the method for computing the cost.

Since the cost of each robot reaching each target is known, we can compute the summed cost of dispatching all the robots to their target positions.
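The minimum-cost dispatch itself is what the Hungarian method mentioned in the abstract solves. Below is an illustrative sketch that assumes Euclidean distance as the cost (the paper's cost function may differ) and uses SciPy's linear_sum_assignment as an off-the-shelf assignment solver in place of the authors' own implementation.

```python
# Sketch of the dispatching step: build the robot-to-target cost matrix and solve
# the assignment problem. Euclidean cost and SciPy usage are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(robot_positions, target_positions):
    """Return (robot, target) pairs minimizing the summed travel cost."""
    robots = np.asarray(robot_positions, dtype=float)    # shape (n, 2)
    targets = np.asarray(target_positions, dtype=float)  # shape (n, 2)
    cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)             # minimum-cost assignment
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

# Example: three robots dispatched to three target spots.
pairs, total_cost = dispatch([(0, 0), (1, 2), (3, 1)], [(2, 2), (0, 1), (3, 3)])
```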

SYSTEM FORMATION- DISPATCHING SYSTEM

17

Multiple Strategy Versus the Benchmark

Fig.14 Fig.15

EXPERIMENTS(1/4)

18

Multiple Strategy Versus Each Fixed Strategy

Fig.16 Fig.17

EXPERIMENTS(2/4)

19

Multiple Strategy Versus Defensive Strategy

Fig.18 Fig.19

EXPERIMENTS(3/4)

20

Multiple Strategy Versus Normal Strategy

Fig.20 Fig.21

EXPERIMENTS(4/4)

21

1) Hierarchical architecture: The system is designed hierarchically, from basic behaviors up to strategies. The basic behaviors can also be reused in other vehicle systems.

2) A general learning-system platform: If another strategy is designed, it can easily be added to our learning system without much alteration. Through the learning process, each state is mapped to the best strategy.

3) Dynamic and quick role assignment: In this system, the role of each robot can change dynamically. We use a linear-programming method to speed up the computation and to find the best dispatch under a given strategy.

CONCLUSION

22

[1] F. Ivancic, "Reinforcement learning in multiagent systems using game theory concepts," Univ. Pennsylvania, Philadelphia, Tech. Rep., Mar. 2001. [Online]. Available: http://citeseer.ist.psu.edu/531873.html

[2] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.

[3] Y. Shoham, R. Powers, and T. Grenager, "On the agenda(s) of research on multi-agent learning," in Artificial Multiagent Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004, pp. 89–95.

[4] M. Kaya and R. Alhajj, "Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr. 2004.

[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, "Cooperative, hybrid agent architecture for real-time traffic signal control," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep. 2003.

[6] K. S. Hwang, S. W. Tan, and C. C. Chen, "Cooperative strategy based on adaptive-learning for robot soccer systems," IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug. 2004.

[7] K. H. Park, Y. J. Kim, and J. H. Kim, "Modular Q-learning based multiagent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, no. 2, pp. 109–122, May 2001.

[8] H. P. Huang and C. C. Liang, "Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree," Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr. 2002.

[9] M. Asada and H. Kitano, "The RoboCup Challenge," Robot. Auton. Syst., vol. 29, no. 1, pp. 3–12, 1999.

[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill, 2001.

[11] V. Chvátal, Linear Programming. San Francisco, CA: Freeman, 1983.

[12] Accessed 22 Mar. 2003. [Online]. Available: http://www.fira.net/soccer/simurosot/overview.html

[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, "Q-learning with FCMAC in multi-agent cooperation," in Proc. Int. Symp. Neural Netw., 2006, vol. 3971, pp. 599–602.

REFERENCES

23