Reinforcement learning for stochastic cooperative multi-agent-systems

Author(s): Lauer, M.
Riedmiller, M.
Editor(s): Jennings, N.R.
Sierra, C.
Sonenberg, L.
Tambe, M.
Keywords: Algorithms; Convergence of numerical methods; Decision theory; Learning systems; Markov processes; Reinforcement; Vectors; Markov decision process (MDP); Multi-agent domains; Optimal policy; Q-learning; Multi-agent systems
Publication date: 2004
Journal: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
Volume: 3
Start page: 1516
End page: 1517
Abstract: 
A distributed variant of Q-learning is presented that allows the optimal cost-to-go function to be learned in stochastic cooperative multi-agent domains without communication between the agents. The framework considered is a standard Markov decision process (MDP) in which actions are vectors of individual agent decisions. The goal is to find an optimal policy that maximizes the sum of discounted rewards. Since each agent observes only its own decision and not the full joint action, a value estimation based on that information alone will mix the rewards of several different joint action vectors and become meaningless.
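To make the setting concrete, the following is a minimal, hedged sketch of a communication-free per-agent learner of the kind the abstract describes. The class name, state/action encoding, and the discount factor are illustrative assumptions, not taken from the paper; the optimistic max-update shown here is the rule known from the deterministic case of distributed Q-learning, which the paper's stochastic extension refines rather than uses verbatim.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor; illustrative choice, not from the paper


class DistributedQAgent:
    """One agent's local learner (hypothetical sketch).

    The agent keeps Q-values over its OWN actions only and never observes
    the other agents' choices (no communication). The optimistic update
    below only ever raises an estimate, so in deterministic domains each
    agent tracks the best return achievable jointly with SOME action
    choice of the others, instead of mixing rewards of different joint
    action vectors.
    """

    def __init__(self, actions):
        self.actions = list(actions)
        # (state, own_action) -> value estimate, defaulting to 0.0
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next):
        # Bootstrapped target from the agent's own next-state estimates.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = r + GAMMA * best_next
        # Optimistic: keep the larger of the old estimate and the target.
        self.q[(s, a)] = max(self.q[(s, a)], target)

    def greedy(self, s):
        # The agent's locally greedy action for state s.
        return max(self.actions, key=lambda b: self.q[(s, b)])
```

In a stochastic domain this pure max-update would latch onto lucky transitions, which is precisely the difficulty the paper addresses; the sketch is only meant to illustrate the "individual decisions, no communication" framework of the abstract.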
Description: 
Conference: Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004; Conference date: 19 July 2004 through 23 July 2004; Conference code: 63521
ISBN: 9781581138641
External URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-4544226982&partnerID=40&md5=d49582e0e8f20f6490ad47801cf86913
