An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

Authored by
Mohammad Salahaldeen Ahmad Alsalti, Victor Gabriel Lopez Mejia, Matthias Müller
Abstract

In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm relies only on persistently exciting input-output data measured offline. No model knowledge or state measurements are needed, and the resulting optimal policy uses only past input-output information. Moreover, our formulation of the algorithm makes it computationally efficient. We provide conditions that guarantee convergence of the algorithm to the optimal solution. Finally, we compare the performance of our method with existing algorithms in the literature.
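
The abstract does not state which order of excitation the algorithm requires. As an illustrative sketch only, the standard rank test for persistency of excitation on a recorded input sequence could look as follows in Python with NumPy. The function names `hankel_matrix` and `is_persistently_exciting` are hypothetical and are not taken from the paper; the test used is the common one (full row rank of the depth-`order` Hankel matrix of the signal).

```python
import numpy as np


def hankel_matrix(signal, depth):
    """Build the block-Hankel matrix whose columns are windows of `depth` samples."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a 1-D array as N samples of a scalar signal
    n_samples = signal.shape[0]
    n_cols = n_samples - depth + 1
    if n_cols < 1:
        raise ValueError("signal too short for the requested depth")
    # Column i stacks the samples signal[i], ..., signal[i + depth - 1].
    return np.column_stack(
        [signal[i:i + depth].reshape(-1) for i in range(n_cols)]
    )


def is_persistently_exciting(signal, order):
    """Return True if the Hankel matrix of depth `order` has full row rank."""
    H = hankel_matrix(signal, order)
    return np.linalg.matrix_rank(H) == H.shape[0]


# Illustrative data (assumed, not from the paper): a random input and a constant input.
rng = np.random.default_rng(0)
random_input = rng.standard_normal(50)
constant_input = np.ones(50)
```

A generic random sequence passes this test for moderate orders, whereas a constant input fails already at order 2 because all rows of its Hankel matrix are identical.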

Organisational unit(s)
Institut für Regelungstechnik
Type
Article in conference proceedings
Pages
312-323
Number of pages
12
Publication date
14.07.2024
Publication status
Published
ASJC Scopus subject areas
Artificial Intelligence, Software, Control and Systems Engineering, Statistics and Probability
Electronic version(s)
https://doi.org/10.48550/arXiv.2312.03451 (Access: Open)
https://proceedings.mlr.press/v242/alsalti24a.html (Access: Open)