User:Ob7/sandbox/Off-policy learning
Appearance
This is not a Wikipedia article: It is an individual user's work-in-progress page, and may be incomplete and/or unreliable. For guidance on developing this draft, see Wikipedia:So you made a userspace draft. Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
In reinforcement learning, off-policy learning refers to the problem of learning about a policy while data is generated using a different policy. A typical case is policy evaluation, where the objective is to estimate the value function for a given stationary policy. In off-policy policy evaluation, the objective is then to estimate that function while a different policy is followed.