Reducing policy degradation in neuro-dynamic programming
Author(s): | Gabel, T.; Riedmiller, M. |
Keywords: | Machine learning; Neural networks; Reinforcement learning; Function approximation; Learning process; Neuro dynamic programming; Reinforcement learning method; Value functions; Dynamic programming |
Publication date: | 2006 |
Publisher: | d-side publication |
Journal: | ESANN 2006 Proceedings - European Symposium on Artificial Neural Networks |
Start page: | 653 |
End page: | 658 |
Abstract: | We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect whether it is better to cease learning, and thus obtains more stable learning results. © 2006 i6doc.com publication. All rights reserved. |
Description: | Conference of 14th European Symposium on Artificial Neural Networks, ESANN 2006; Conference Date: 26 April 2006 through 28 April 2006; Conference Code: 149251 |
ISBN: | 9782930307060 |
External URL: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-79952421861&partnerID=40&md5=2da04247afa6ac939ef369fc5e9474c1 |