
Reinforcement Learning in Strategy Selection for a Coordinated Multirobot System

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 6, NOVEMBER 2007

Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen, and Ching-Huang Lee

Advisor : Ming-Yuan Shieh

Student : Ching-Chih Wen

S/N : M9820108

PPT production rate: 100%

1

Abstract
Introduction
SYSTEM FORMATION
    Basic Behavior
    Role Assignment
    Strategies
    Learning System
    Dispatching System
EXPERIMENTS
CONCLUSION

OUTLINE

2

This correspondence presents a multi-strategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of a game.

The responsibility of each player varies as its role changes across state transitions. Therefore, the system uses several strategies, such as an offensive strategy, a defensive strategy, and so on, for a variety of scenarios.

Under each strategy, the major task assigned to the robots is simply to occupy good positions.

Utilizing the Hungarian method, each robot can be dispatched to its designated spot with minimal total cost.

ABSTRACT

3

Reinforcement learning has recently attracted increasing interest in the fields of machine learning and artificial intelligence, since it promises a way to achieve a specific task using only reward and punishment [1].

Fig.1

INTRODUCTION(1/3)

4

Traditional reinforcement-learning algorithms are mostly concerned with single-agent problems; in a multi-agent environment, however, no agent acts alone, since it must interact with the other agents to achieve a specific task [3].

Therefore, we focus here on high-level learning rather than basic-behavior learning.

The main objective of this correspondence is to develop a reinforcement-learning architecture for multiple coordinated strategies in a robot soccer system.

INTRODUCTION(2/3)

5

In this correspondence, we utilize the robot soccer system as our test platform, since it fully realizes a multi-agent system.

Fig.2

INTRODUCTION(3/3)

6

Fig.3

SYSTEM FORMATION

7

1) Go to a Position
2) Go to a Position With Avoidance
3) Kick a Ball to a Position
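The paper gives no code for these behaviors; as a rough illustration, the sketch below (a Python assumption, not the authors' implementation) expresses each basic behavior as a waypoint generator for a lower-level motion controller. The function names, the single-obstacle avoidance rule, and the numeric offsets are all hypothetical.

```python
# Minimal sketch (assumptions, not the paper's controller): each basic behavior
# returns a waypoint (x, y) for a lower-level motion controller to track.
import math

def go_to_position(robot_xy, target_xy):
    """Go to a Position: head straight for the target point."""
    return target_xy

def go_to_position_with_avoidance(robot_xy, target_xy, obstacle_xy, clearance=0.15):
    """Go to a Position With Avoidance: detour if an obstacle lies near the straight path."""
    rx, ry = robot_xy
    tx, ty = target_xy
    ox, oy = obstacle_xy
    seg = math.hypot(tx - rx, ty - ry) or 1e-9
    # Perpendicular distance from the obstacle to the robot-target line.
    d = abs((ty - ry) * ox - (tx - rx) * oy + tx * ry - ty * rx) / seg
    if d > clearance:
        return target_xy
    # Otherwise aim at a waypoint offset sideways from the obstacle.
    nx, ny = -(ty - ry) / seg, (tx - rx) / seg
    return (ox + clearance * nx, oy + clearance * ny)

def kick_ball_to_position(ball_xy, target_xy, offset=0.10):
    """Kick a Ball to a Position: approach from behind the ball along the ball-target line."""
    bx, by = ball_xy
    tx, ty = target_xy
    dist = math.hypot(tx - bx, ty - by) or 1e-9
    return (bx - offset * (tx - bx) / dist, by - offset * (ty - by) / dist)
```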

Fig.4 Fig.5

SYSTEM FORMATION-Basic Behavior

8

1) Attacker position

Fig.6 Fig.7

Fig.8

SYSTEM FORMATION-Role Assignment(1/3)

9

2) Sidekick position

Fig.9 Fig.10

3) Backup position

4) Defender position

SYSTEM FORMATION-Role Assignment(2/3)

10

5) Goalkeeper position

Fig.11

SYSTEM FORMATION-Role Assignment(3/3)

11

1) Primary part:

The attacker's weighting is W_a.

2) Offensive part:

The weightings of the sidekick and the backup are W_s and W_b, respectively.

3) Defensive part:

The weightings of the defender and the goalkeeper are W_d and W_g, respectively.

SYSTEM FORMATION- STRATEGIES(1/2)

12

According to the different weightings, different strategies can be developed. We can develop three strategies as follows (a small lookup-table sketch follows the list):

1) Normal strategy: W_a = W_s = W_b = W_d = W_g; (W_a, W_s, W_b, W_d, W_g) = (1, 1, 1, 1, 1) is an example used in our simulations.

2) Offensive strategy: W_a > max(W_s, W_b) and min(W_s, W_b) > max(W_d, W_g); (W_a, W_s, W_b, W_d, W_g) = (2, 1.5, 1.5, 1, 1) is an example used in our simulations.

3) Defensive strategy: W_a > max(W_d, W_g) and min(W_d, W_g) > max(W_s, W_b); (W_a, W_s, W_b, W_d, W_g) = (2, 1, 1, 1.5, 1.5) is an example used in our simulations.

SYSTEM FORMATION- STRATEGIES(2/2)
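For illustration, the three weighting sets above can be kept in a small lookup table; the numeric values are the ones listed on this slide, while the Python layout and key names are assumptions.

```python
# Strategy weighting sets from the slide above; the dict layout is an assumption.
STRATEGIES = {
    # Normal: all role weightings equal.
    "normal":    {"Wa": 1.0, "Ws": 1.0, "Wb": 1.0, "Wd": 1.0, "Wg": 1.0},
    # Offensive: Wa > max(Ws, Wb) and min(Ws, Wb) > max(Wd, Wg).
    "offensive": {"Wa": 2.0, "Ws": 1.5, "Wb": 1.5, "Wd": 1.0, "Wg": 1.0},
    # Defensive: Wa > max(Wd, Wg) and min(Wd, Wg) > max(Ws, Wb).
    "defensive": {"Wa": 2.0, "Ws": 1.0, "Wb": 1.0, "Wd": 1.5, "Wg": 1.5},
}

def role_weights(strategy: str) -> dict:
    """Look up the weightings (attacker, sidekick, backup, defender, goalkeeper)."""
    return STRATEGIES[strategy]
```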

13

Fig.12

SYSTEM FORMATION- LEARNING SYSTEM(1/3)

14

1) States:

Fig.13

2) Actions: The actions of Q-learning are the decisions on the strategy taken in each learning cycle. Each action is represented by a set of weights.

SYSTEM FORMATION- LEARNING SYSTEM(2/3)

15

3) Reward Function:

— Gain a point: r = 1.

— Lose a point: r = −1.

— Others: r = 0.

4) Q-Learning: Based on the states, actions, and reward function, we can fully implement the Q-learning method.

Here, the ε-greedy method is chosen as the action-selection policy, with exploration probability ε = 0.1. The learning rate is α = 0.8, and the discount factor is γ = 0.9.
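A minimal tabular sketch of the strategy-selection learner with the parameters stated above (ε = 0.1, α = 0.8, γ = 0.9); the state encoding, the Q-table layout, and the reuse of the three strategy names as actions are assumptions rather than the authors' code.

```python
# Sketch of strategy selection by epsilon-greedy Q-learning (parameters from the slide).
import random
from collections import defaultdict

ACTIONS = ["normal", "offensive", "defensive"]   # one action per strategy
EPSILON, ALPHA, GAMMA = 0.1, 0.8, 0.9

Q = defaultdict(float)                           # Q[(state, action)] -> value

def select_strategy(state):
    """Epsilon-greedy choice over the available strategies."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning backup; reward is +1 for a goal scored, -1 for a goal conceded, else 0."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```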

SYSTEM FORMATION- LEARNING SYSTEM(3/3)

16

First, we introduce the method for computing the cost.

Since the cost of each robot reaching each target is known, we can compute the summed cost of dispatching all the robots to their target positions.
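The minimum-cost dispatch itself is what the Hungarian method mentioned in the abstract solves. Below is an illustrative sketch that assumes Euclidean distance as the cost (the paper's cost function may differ) and uses SciPy's linear_sum_assignment as an off-the-shelf assignment solver in place of the authors' own implementation.

```python
# Sketch of the dispatching step: build the robot-to-target cost matrix and solve
# the assignment problem. Euclidean cost and SciPy usage are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(robot_positions, target_positions):
    """Return (robot, target) pairs minimizing the summed travel cost."""
    robots = np.asarray(robot_positions, dtype=float)    # shape (n, 2)
    targets = np.asarray(target_positions, dtype=float)  # shape (n, 2)
    cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)             # minimum-cost assignment
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

# Example: three robots dispatched to three target spots.
pairs, total_cost = dispatch([(0, 0), (1, 2), (3, 1)], [(2, 2), (0, 1), (3, 3)])
```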

SYSTEM FORMATION- DISPATCHING SYSTEM

17

Multiple Strategy Versus the Benchmark

Fig.14 Fig.15

EXPERIMENTS(1/4)

18

Multiple Strategy Versus Each Fixed Strategy

Fig.16 Fig.17

EXPERIMENTS(2/4)

19

Multiple Strategy Versus Defensive Strategy

Fig.18 Fig.19

EXPERIMENTS(3/4)

20

Multiple Strategy Versus Normal Strategy

Fig.20 Fig.21

EXPERIMENTS(4/4)

21

1) Hierarchical architecture: The system is designed hierarchically, from basic behaviors up to strategies. The basic behaviors can also be reused in other vehicle systems.

2) A general learning-system platform: If another strategy is designed, it can easily be added to our learning system without much alteration. Through the learning process, each state is mapped to the best strategy.

3) Dynamic and quick role assignment: In this system, the role of each robot can change dynamically. We use a linear-programming method to speed up the computation and to find the best dispatch under a given strategy.

CONCLUSION

22

[1] F. Ivancic, "Reinforcement learning in multiagent systems using game theory concepts," Univ. Pennsylvania, Philadelphia, Tech. Rep., Mar. 2001. [Online]. Available: http://citeseer.ist.psu.edu/531873.html

[2] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.

[3] Y. Shoham, R. Powers, and T. Grenager, "On the agenda(s) of research on multi-agent learning," in Artificial Multiagent Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004, pp. 89–95.

[4] M. Kaya and R. Alhajj, "Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr. 2004.

[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, "Cooperative, hybrid agent architecture for real-time traffic signal control," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep. 2003.

[6] K. S. Hwang, S. W. Tan, and C. C. Chen, "Cooperative strategy based on adaptive-learning for robot soccer systems," IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug. 2004.

[7] K. H. Park, Y. J. Kim, and J. H. Kim, "Modular Q-learning based multiagent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, no. 2, pp. 109–122, May 2001.

[8] H. P. Huang and C. C. Liang, "Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree," Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr. 2002.

[9] M. Asada and H. Kitano, "The RoboCup Challenge," Robot. Auton. Syst., vol. 29, no. 1, pp. 3–12, 1999.

[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill, 2001.

[11] V. Chvátal, Linear Programming. San Francisco, CA: Freeman, 1983.

[12] Accessed 22 Mar. 2003. [Online]. Available: http://www.fira.net/soccer/simurosot/overview.html

[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, "Q-learning with FCMAC in multi-agent cooperation," in Proc. Int. Symp. Neural Netw., 2006, vol. 3971, pp. 599–602.

REFERENCES

23