The Reinforcement Learning Competition 2014


Transcript of The Reinforcement Learning Competition 2014


Copyright © 2014, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

The Reinforcement Learning Competition 2014

Christos Dimitrakakis, Guangliang Li, Nikolaos Tziortziotis

Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling, and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test bed for the unbiased evaluation of algorithms.

Reinforcement learning (RL) is the problem of learning how to act only from interaction and limited reinforcement. An agent takes actions in an unknown environment, observes their effects, and obtains rewards. The agent's aim is to learn how the environment works in order to maximize the total reward obtained during its lifetime. RL problems are quite general. They can subsume almost any artificial intelligence problem, through suitable choices of environment and reward. For example, learning to play games can be formalized as a problem where the environment is adversarial and there is a positive (negative) reward when the agent wins (loses). An introduction to RL is given by the book of Sutton and Barto (1998), while a good overview of recent algorithms is given by Szepesvári (2010).

Most RL problems are formalized by modeling the interaction between the agent and the environment as a discrete-time process. At each time t, the agent takes an action a_t from some action set A and obtains an observation x_{t+1} ∊ X and a scalar reward r_{t+1} ∊ R. Part of the problem is estimating a model for the environment, that is, learning how the observations and rewards depend upon the actions taken. However, the agent's actual objective is to maximize the total reward

U = Σ_{t=1}^{T} r_t




during its lifetime. Thus, it must not waste a lot of time learning about environment details that are not relevant for obtaining rewards. It should properly balance exploitation of what is already known with exploration of unknown parts of the environment. This exploration-exploitation dilemma is at the core of the reinforcement learning problem, and it is due to the problem's interactive nature.
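To make the discrete-time interaction and the exploration-exploitation trade-off concrete, the following minimal Python sketch runs a tabular ε-greedy Q-learning agent in a toy chain environment and accumulates the total reward U. The environment, its parameters, and the hyperparameter values are illustrative assumptions and are not part of the competition software.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy episodic environment: walk right along a chain to reach a rewarding goal."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0: move left, action 1: move right
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state >= self.length
        reward = 1.0 if done else -0.01          # small cost per step, bonus at the goal
        return self.state, reward, done

def run_agent(env, episodes=200, epsilon=0.1, alpha=0.5, gamma=0.95):
    """Epsilon-greedy tabular Q-learning; returns the total reward U collected over the run."""
    Q = defaultdict(float)                       # Q[(state, action)] -> value estimate
    actions = [0, 1]
    total_reward = 0.0                           # U = sum over t of r_t, the agent's lifetime return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # exploration-exploitation balance: act randomly with probability epsilon
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            total_reward += r
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return total_reward

print(run_agent(ChainEnv()))
```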

Comparisons between different RL algorithms are inherently difficult. Authors of algorithms may spend a lot of time tuning them to work well in specific types of domains, even when they are ostensibly learning from scratch. What is needed is a way to perform unbiased comparisons. This is offered by the RL competition. While unbiased comparisons for static data sets are a well-established practice, the interactive nature of RL means that no fixed data sets are possible. However, the same principle applies to complete environments. A designer can tune algorithms on a small set of representative environments, and then use another set of similar, but nonidentical, environments to test them on.

This is the principle used by the reinforcement learning competition, through the use of generalized domains. Each domain corresponds to a different type of problem, such as helicopter control. Training and testing are performed on specific domain instances of the problem, for example, helicopters with different physical characteristics. The competition format ensures an unbiased comparison between algorithms that truly tests their ability to learn. The next section gives an overview of the competition's history and its evolution to the current format. The goals and format of the upcoming competition are detailed in the 2014 competition section.
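As an illustration of how generalized domains separate tuning from evaluation, the sketch below draws helicopter-like instances with randomized physical parameters, keeps one sample for training, and reserves a fresh, nonidentical sample for testing. The parameter names and ranges are hypothetical; the actual competition domains define their own instance distributions.

```python
import random

def sample_instance(rng):
    """Draw one hypothetical domain instance with randomized physical characteristics."""
    return {"mass": rng.uniform(3.0, 8.0),        # assumed parameter ranges, for illustration only
            "rotor_gain": rng.uniform(0.5, 1.5),
            "wind": rng.uniform(0.0, 2.0)}

def make_instance_sets(train_seed=0, test_seed=1, n=10):
    """Training and testing instances come from the same distribution but are never identical."""
    train_rng, test_rng = random.Random(train_seed), random.Random(test_seed)
    train_set = [sample_instance(train_rng) for _ in range(n)]
    test_set = [sample_instance(test_rng) for _ in range(n)]
    return train_set, test_set

train_set, test_set = make_instance_sets()
# A competitor may tune freely on train_set; the reported score comes only from test_set.
print(train_set[0], test_set[0])
```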

History of the Competition

The competition aims to foster development of general RL algorithms and to promote their systematic comparison. It has generated a set of benchmark domains and results, which were used as a basis of comparisons in further work. The competition and the related open source code have been used in teaching and research projects (Whiteson, Tanner, and White 2010) and have greatly increased the interest and focus in RL by clarifying its objectives and challenges, making the area more exciting and enjoyable, especially for novices in the RL community.

The competition history is summarized in figure 1, starting with a workshop at the 2004 Neural Information Processing Systems (NIPS) conference, organized by Rich Sutton and Michael Littman. This event discussed a standard set of benchmarks and a series of competitive events to enhance reinforcement learning research. The next event was held at NIPS 2005, with more than 30 teams participating. This event featured a new RL framework (later called RL-Glue) developed at the University of Alberta, which offers a language-independent API for evaluating agents in different environments and experimental setups (Tanner and White 2009). RL-Glue was used by all later competitions and events.

Figure 1. History of Reinforcement Learning Competition and Collocated Events.

After another event at the International Conference on Machine Learning (ICML) in 2006, organized by Shie Mannor and Doina Precup, the first annual RL competition was held at NIPS 2006. This was the first official competition with prizes and winner certificates. Eight teams participated, and the winners presented their work at the workshop.

The second annual RL competition was held by Shimon Whiteson, Brian Tanner, and Adam White in 2008 at an ICML workshop, in which 21 teams participated (Whiteson, Tanner, and White 2010). After this success, the third reinforcement learning competition was held in 2009 with a joint workshop at ICML, Uncertainty in Artificial Intelligence (UAI), and Conference on Learning Theory (COLT). The competition attracted 51 teams comprising 90 members. Prizes included electronic devices and gift certificates, while travel scholarships were available for participants to defray the cost of attendance.

The 2008 and 2009 competitions had higher levels of participation than previous events. However, the competition then lapsed and was not revived until 2013, with an associated ICML workshop, to ensure its continuing existence.

Domains

The competition includes domains with various characteristics and complexity to encourage researchers to develop methods for challenging problems. The 2005 event featured three continuous domains (Mountain Car, Puddle World, and Cart Pole) and three discrete domains (Blackjack, Sensor Network, and Taxi). In addition, there were two triathlon events, in which the same agent competed on all three continuous or discrete domains.

The 2006 competition added the adversarial cat-mouse domain and octopus arm control. The latter is a large-scale domain with high-dimensional continuous observation and action spaces (Yekutieli et al. 2005). However, the proliferation of domains meant that many domains had very few entrants. For that reason, the total number was reduced steadily, but new domains were introduced every year. The 2008 competition featured Helicopter Hovering (Bagnell and Schneider 2001; Ng et al. 2006), Keepaway Soccer (Stone et al. 2006; Stone, Sutton, and Kuhlmann 2005), and Real-Time Strategy. Helicopter Hovering and Keepaway were high-dimensional domains, which were problematic for traditional RL methods. The 2009 competition introduced the classic Acrobot problem (DeJong and Spong 1994) and the Infinite Mario Game. Finally, the 2013 competition introduced the Invasive Species Problem (Muneepeerakul et al. 2007), a very high-dimensional structured domain.

Polyathlon domains are a special characteristic of the RL competition, starting with the triathlon events of 2006. In those, agents are tested in a series of tasks, which could be completely unrelated and are not known in advance. Consequently, entrants must submit very general agents. They must be able to learn to act optimally in essentially arbitrary environments, while exploiting whatever structure they can discover.

Evaluation of Agent Performance

In the competition, agent performance is based on the total reward the agent received. At the first benchmarking event in 2005, there was no separation between training and testing, which consequently allowed considerable prior knowledge to be used (Whiteson, Tanner, and White 2010).

To promote agents that really learn, the 2006 event separated agent evaluation into training and testing phases. During training, participants were allowed unlimited runs on a fixed training set of problems. The testing phase used a different but similar set of problems. Agents were allowed a period of free exploration in the test set, which did not count toward the final score. Later events separated this out into an explicit proving phase, where entrants had a limited number of attempts to try out their agents in new environments. The combination of this with a leaderboard, which compared the results of different teams, motivated participants to continually improve their agents throughout the competition.

The 2014 Competition

The organizers of the 2014 competition are Christos Dimitrakakis (Chalmers University of Technology), Guangliang Li (University of Amsterdam), and Nikolaos Tziortziotis (University of Ioannina). The competition server will open this fall, and the competition will take place in November–December 2014. Competitors will have the opportunity to compete in domains with various characteristics (such as complexity, difficulty, and others) in an unbiased manner. At the end of the competition, authors will be requested to submit a paper to the competition workshop, collocated with a major conference in the AI field, to present their methods, and to receive their prizes.

Competition Format

The competition consists of three main phases: training, proving, and testing. During the first two phases the participants will have the opportunity to refine their agents. Only the results of the testing phase will be taken into account for evaluation. As in the last four competitions, our evaluation criterion will be the online performance of the agents: the total reward collected. As the testing instances will differ from both the training and proving instances, this format benefits agents that can learn efficiently online.

In the training phase, competitors receive a set of training environments, which are specific instances of the generalized domain. For the helicopter example, this would mean that the physical characteristics of the helicopter may vary across instances. The competitors can use the training environments to make their algorithms as robust as possible. This local development allows an unlimited number of attempts.

In the proving phase of the competition, competitors can try their algorithms in preliminary experiments on new instances on the competition server. They can thus verify that their algorithms can generalize well outside the training set. The performance of the various competitors in this phase is visible on a leaderboard, and helps to ensure that competitors can perform well. The number of attempts per week in this phase is limited. However, the scores during this phase do not count toward the final competition score.

The final competition phase is testing. During this phase, competitors test their algorithms on new domain instances on the competition server. The score during this phase is used for the agent's evaluation. The scores across different domains are treated independently, and every team is allowed to compete in multiple domains.
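The scoring rule described above can be summarized in a short sketch: only testing-phase runs contribute to the final score, and each domain is scored independently. The data structures and function names below are hypothetical illustrations, not the competition software.

```python
from collections import defaultdict

def final_scores(phase_results):
    """phase_results: list of (team, domain, phase, total_reward) tuples.

    Only 'testing' runs count; scores are aggregated per (team, domain) pair,
    so each domain produces its own independent ranking.
    """
    scores = defaultdict(float)
    for team, domain, phase, total_reward in phase_results:
        if phase == "testing":                       # training and proving runs are ignored
            scores[(team, domain)] += total_reward
    return dict(scores)

results = [
    ("team_a", "helicopter", "proving", 120.0),      # visible on the leaderboard, but not scored
    ("team_a", "helicopter", "testing", 95.0),
    ("team_b", "helicopter", "testing", 88.5),
    ("team_a", "invasive_species", "testing", 40.0),
]
print(final_scores(results))
```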

The competition platform is based on the RL-Glue API. This is a standard interface for connecting RL agents and environments, and it allows participants to create their agents in their language of choice. Apart from the training phase, participants connect to the central competition server over the Internet to obtain new environments and report results. During proving and testing, the competition software updates the leaderboards on the web. In this way, participants can see how their agents compare with others.
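As an illustration of the kind of agent structure RL-Glue expects, the sketch below defines a minimal random agent with the usual lifecycle callbacks (initialize, start an episode, step, end an episode). It is a self-contained stand-in, not actual RL-Glue codec code; the real class names, method signatures, and types come from the RL-Glue codec for your chosen language (Tanner and White 2009).

```python
import random

class RandomAgent:
    """Stand-in for an RL-Glue-style agent: the framework drives these callbacks,
    while the environment runs on the other side of the RL-Glue interface."""

    def init(self, task_spec):
        # task_spec would describe observation/action spaces; here we assume 2 discrete actions
        self.num_actions = 2

    def start(self, observation):
        # called at the start of each episode; returns the first action
        return random.randrange(self.num_actions)

    def step(self, reward, observation):
        # called on every subsequent time step with the reward for the previous action
        return random.randrange(self.num_actions)

    def end(self, reward):
        # called when the episode terminates
        pass

# A hypothetical harness standing in for the RL-Glue experiment loop:
agent = RandomAgent()
agent.init(task_spec="2 discrete actions")
action = agent.start(observation=0)
for t in range(5):
    action = agent.step(reward=-0.01, observation=t + 1)
agent.end(reward=1.0)
```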

Getting Involved

We are actively looking for volunteers to help with various aspects of the competition. First, we welcome the donation of prizes to foster additional involvement in the competition. These can include travel scholarships, monetary prizes, or trophies. Second, we would like to invite volunteers to help with technical issues. To ensure the continuity of the competition, we are looking for enthusiastic individuals who can help with the technical side of the competition for 2014. Third, we request submissions of new domains. You are welcome to email us your suggestions. Of particular interest for the competition this year will be (1) adversarial problems and (2) game problems with changing tasks. Finally, and most importantly, we encourage your participation with agent submissions. We hope to have a vibrant and exciting 2014 competition.

References

Bagnell, J. A., and Schneider, J. G. 2001. Autonomous Helicopter Control Using Reinforcement Learning Policy Search Methods. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 2, 1615–1620. Piscataway, NJ: Institute of Electrical and Electronics Engineers.

DeJong, G., and Spong, M. W. 1994. Swinging Up the Acrobot: An Example of Intelligent Control. In Proceedings of the American Control Conference, volume 2, 2158–2162. Piscataway, NJ: Institute of Electrical and Electronics Engineers.

Muneepeerakul, R.; Weitz, J. S.; Levin, S. A.; Rinaldo, A.; and Rodriguez-Iturbe, I. 2007. A Neutral Metapopulation Model of Biodiversity in River Networks. Journal of Theoretical Biology 245(2): 351–363. dx.doi.org/10.1016/j.jtbi.2006.10.005

Ng, A. Y.; Coates, A.; Diel, M.; Ganapathi, V.; Schulte, J.; Tse, B.; Berger, E.; and Liang, E. 2006. Autonomous Inverted Helicopter Flight Via Reinforcement Learning. In Experimental Robotics IX, 363–372. Berlin: Springer.

Stone, P.; Kuhlmann, G.; Taylor, M. E.; and Liu, Y. 2006. Keepaway Soccer: From Machine Learning Testbed to Benchmark. In RoboCup 2005: Robot Soccer World Cup IX, 93–105. Berlin: Springer.

Stone, P.; Sutton, R. S.; and Kuhlmann, G. 2005. Reinforcement Learning for RoboCup Soccer Keepaway. Adaptive Behavior 13(3): 165–188.

Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press.

Szepesvári, C. 2010. Algorithms for Reinforcement Learning. In Synthesis Lectures on Artificial Intelligence and Machine Learning, volume 4. San Rafael, CA: Morgan & Claypool Publishers.

Tanner, B., and White, A. 2009. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments. The Journal of Machine Learning Research 10: 2133–2136.

Whiteson, S.; Tanner, B.; and White, A. 2010. Report on the 2008 Reinforcement Learning Competition. AI Magazine 31(2): 81.

Yekutieli, Y.; Sagiv-Zohar, R.; Aharonov, R.; Engel, Y.; Hochner, B.; and Flash, T. 2005. Dynamic Model of the Octopus Arm. I. Biomechanics of the Octopus Reaching Movement. Journal of Neurophysiology 94(2): 1443–1458. dx.doi.org/10.1152/jn.00684.2004

Christos Dimitrakakis is a researcher and lecturer at Chalmers University of Technology. His research interests include decision theory and reinforcement learning, security, and privacy. He obtained his Ph.D. in 2006 from EPFL, Switzerland.

Guangliang Li is a Ph.D. candidate at the University of Amsterdam. His research focuses on reinforcement learning and human-computer interaction.

Nikolaos Tziortziotis is a Ph.D. candidate at the University of Ioannina. His research interests include machine learning and reinforcement learning.

