Reducing policy degradation in neuro-dynamic programming

Author(s): Gabel, T.; Riedmiller, M.
Keywords: Machine learning; Neural networks; Reinforcement learning; Function approximation; Learning process; Neuro dynamic programming; Reinforcement learning method; Value functions; Dynamic programming
Publication date: 2006
Publisher: d-side publication
Journal: ESANN 2006 Proceedings - European Symposium on Artificial Neural Networks
Start page: 653
End page: 658
Abstract:
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect on whether it is better to cease learning, and thus obtains more stable learning results. © 2006 i6doc.com publication. All rights reserved.
Description:
Conference: 14th European Symposium on Artificial Neural Networks, ESANN 2006; Conference date: 26 April 2006 through 28 April 2006; Conference code: 149251
ISBN: 9782930307060
External URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-79952421861&partnerID=40&md5=2da04247afa6ac939ef369fc5e9474c1
