Hidden Information, Teamwork, and Prediction in Trick-Taking Card Games
Hadi Elzayn, Mikhail Hayhoe, Harshat Kumar, Mohammad Fereydounian
[email protected], [email protected], [email protected], [email protected]
Results

Win percentages in head-to-head play:

                  Random   Greedy   Heuristic   Over400
Heuristic Win%     100      100        -         10.5
Over400 Win%       100      100       89.5        -
Rules of Four Hundred

• Players sit across from their team members and are dealt 13 cards each
• Players bet the expected number of 'tricks' they plan to take
• In each round after bets are placed, players take turns choosing a card to play in order, starting with the player who took the previous trick
• The suit of the first card played determines the 'lead suit'
• Players must play the 'lead suit' if they have it
• The winner of the trick is determined by the highest card of the 'lead suit', or the highest card of the 'trump suit' (Hearts) if any were played
• After 13 rounds, the score of each player is increased by the number of tricks they bet if they met or exceeded it
• A player's score is decreased by their bet if they were unable to meet it
• The game concludes when one team member has 41 points or more, and the other player has positive points
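The trick-winning rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `(rank, suit)` card representation and the `trick_winner` helper are assumptions for the example.

```python
# Sketch of the trick-winning rule: highest trump (Hearts) wins if any trump
# was played; otherwise the highest card of the lead suit wins.
TRUMP = "hearts"

def trick_winner(plays):
    """plays: list of (player, (rank, suit)) in play order, rank 2..14 (Ace high).
    Returns the player who takes the trick."""
    lead_suit = plays[0][1][1]
    # Trump cards beat everything; otherwise only lead-suit cards can win.
    trumps = [p for p in plays if p[1][1] == TRUMP]
    pool = trumps if trumps else [p for p in plays if p[1][1] == lead_suit]
    return max(pool, key=lambda p: p[1][0])[0]
```

Note that a high off-suit, non-trump card (e.g. an Ace of Clubs under a Spades lead) can never win the trick, which is what makes following the lead suit mandatory rather than merely strategic.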
Figures (plots omitted from the transcript):
(a) Tricks won over 1,000 games, based on bet made; no bets were higher than 6
(b) Histogram of the difference between tricks won and bet made
(c) Average score showing the performance of the self-play NN model
(d) Loss function for the betting NN, trained in tandem with the playing NN
Neural Network Architecture for Card Play
Neural Network Architecture for Betting
Card Play

Learning Problem: find an optimal playing policy π^(i,*) that maximizes the expected reward for player i, given their hand and the play of the other players.
Reinforcement Learning Approach:
• We define a reward at the end of each round
• The approach is informed by Neural Fitted Q-iteration [2], in which the reward is provided as a label for the state representation
• We include an indicator term as reward shaping [3] to speed up convergence
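A Neural Fitted Q-iteration target of the kind described can be sketched as below. The poster's exact reward and shaping terms are not reproduced here; `nfq_labels` and the discount factor are illustrative assumptions showing only the standard label construction.

```python
# Illustrative NFQ-style label: each state gets a regression target
#   y_t = r_t + gamma * max_a Q(s_{t+1}, a),
# with the final transition of the round treated as terminal.
import numpy as np

def nfq_labels(rewards, next_q_values, gamma=0.99):
    """rewards: shape (T,); next_q_values: shape (T, n_actions).
    Returns the (T,) array of regression targets for the Q-network."""
    targets = rewards.astype(float).copy()
    # Bootstrap from the next state's best Q-value, except at the terminal step.
    targets[:-1] += gamma * next_q_values[:-1].max(axis=1)
    return targets
```

In NFQ these labels are recomputed from the current network and fitted in a batch supervised step, rather than updated online transition by transition.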
The state representation:
• Similar to AlphaGo [4], we encode the state by a matrix representation where the location and value indicate the card, the time it was played, and by whom it was played
• Initial bets and the number of tricks won are also encoded in the state representation
Betting

Learning Problem: find an optimal betting policy β^(i,*) that maximizes the expected reward for player i, given their initial hand and the play of the other players.
Supervised Learning Approach: given some particular card-playing strategy and initial hand, the player can expect to win some number of tricks. We therefore develop a model which predicts the tricks taken by the end of the round given the observed initial hand composition.

Generate Data: initial hands serve as the input data; the number of tricks won serves as the label.

We implement a neural network for regression to minimize a loss defined by the squared difference between the score received and the best possible score.

The loss function is asymmetric: it penalizes bets that are higher than the number of tricks obtained more heavily.
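An asymmetric loss of this kind can be sketched as below. The overbet penalty weight is an assumption for illustration; the poster does not state the exact form.

```python
# Sketch of an asymmetric regression loss: squared error, scaled up when the
# bet exceeds the tricks actually won (overbetting loses points under the
# scoring rules, while underbetting merely forgoes some gain).
import numpy as np

def asymmetric_loss(bet, tricks_won, over_penalty=2.0):
    """Squared error, multiplied by over_penalty when bet > tricks_won."""
    err = np.asarray(bet, float) - np.asarray(tricks_won, float)
    weight = np.where(err > 0, over_penalty, 1.0)
    return weight * err ** 2
```

Overbetting by two tricks thus costs twice as much as underbetting by two, pushing the betting network toward conservative bets it can actually meet.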
Baselines

Random Play - Random Bet: random bet selects a random value from {2, 3, 4, 5} during the betting stage, and in each round chooses a random card to play from the set of valid cards.

Greedy Play - Model Bet: during play, greedy simply selects the highest card in its hand from the set of valid cards, without consideration of its partner.

Heuristic Play - Model Bet: a heuristic strategy was defined based on human knowledge of the game. This heuristic player takes into account the available knowledge of the team member's actions as well as the opponents' actions.
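The Random and Greedy baselines are simple enough to sketch directly; these helpers are illustrative, using the same `(rank, suit)` card tuples as above, and are not the authors' code.

```python
# Minimal sketches of the Random and Greedy baseline players described above.
import random

def random_bet():
    """Random baseline bet: uniform over {2, 3, 4, 5}."""
    return random.choice([2, 3, 4, 5])

def random_play(valid_cards):
    """Random baseline play: any valid card, uniformly at random."""
    return random.choice(valid_cards)

def greedy_play(valid_cards):
    """Greedy baseline play: the highest-rank valid card, ignoring the partner."""
    return max(valid_cards, key=lambda card: card[0])
```

The heuristic player is omitted here since its hand-crafted rules (tracking the partner's and opponents' actions) are not spelled out on the poster.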
Future Work

• Examine the approach on similar games such as Spades, Hearts, and Tarneeb
• Consider invariance discovery for generalization
References

[1] M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
[2] M. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. European Conference on Machine Learning, 05:317-328, 2005.
[3] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
[4] D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.