User:Turingsk/Books/RL

Bellman equation
Dopamine
Hamilton–Jacobi–Bellman equation
Hidden Markov model
Leabra
Marcus Hutter
Markov decision process
Multi-armed bandit
Optimal control
Partially observable Markov decision process
Predictive state representation
PVLV
Q-learning
Reinforcement learning
Rescorla–Wagner model
Reward system
SARSA
Temporal difference learning
Dynamic treatment regime