Talk:Q-learning


lacking citations[edit]

The article makes several claims about Markov decision processes, such as "It has been proven that for any finite MDP", but offers no proof, explanation, or citation for them. — Preceding unsigned comment added by 69.191.178.36 (talk) 02:47, 2 June 2015 (UTC)[reply]

Convergence[edit]

The article says that "The convergence proof was presented later by Watkins and Dayan"; however, it does not explain what exactly is meant by 'convergence' - what converges, to what, and under what conditions? --Erel Segal (talk) 16:49, 23 December 2012 (UTC)[reply]


Convergence means convergence to an optimal policy. Many details of the reinforcement learning model are not currently included in the article, but this is a class of learning algorithms that are unsupervised in the sense that the learning agent has a set of states it can exist in, a set of actions it can take in each state, and a set of rewards it earns after each action that tell it how good the action was. The policy is the function that gives the action to take in each state. When there are several possible actions in a state, the policy is a probability function on the actions, i.e. the learning agent might select any of the actions available in the current state, but some with higher probability than others. In turn, the rewards associated with a given state-action pair might be probabilistic as well. A policy is optimal if, for any state the agent finds itself in, the expected total reward over all time, starting from that state, is the maximum achievable.
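
For reference, here is a rough sketch of what the Watkins-Dayan result asserts, in standard textbook notation (not quoted from the article). The one-step update

    Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_{t+1} + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]

converges to the optimal action-value function Q^* with probability 1 in any finite MDP, provided every state-action pair is updated infinitely often and the learning rates satisfy \sum_t \alpha_t = \infty and \sum_t \alpha_t^2 < \infty. The greedy policy \pi^*(s) = \arg\max_a Q^*(s, a) is then optimal in the sense described above.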


Some of the citations, such as [4], point to dead links. I couldn't find a working link to replace them with. 108.35.116.197 (talk) 17:11, 28 November 2013 (UTC)[reply]


Variants[edit]

It would be good to have more details on variants of the Q-learning algorithm (with references). Dm1911 (talk) 20:52, 26 May 2015 (UTC)[reply]

  • I agree. R-learning, for example, is a discount-free variant of Q-learning; a rough sketch of its update is below. —Kri (talk) 17:54, 17 September 2016 (UTC)[reply]
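
As a rough sketch of R-learning (from memory of Schwartz's 1993 average-reward formulation; the notation here is mine, not from the article): the discounted return is replaced by an estimate \rho of the average reward per step, with updates

    R(s, a) \leftarrow R(s, a) + \beta \left[ r - \rho + \max_{a'} R(s', a') - R(s, a) \right]

and, whenever the greedy action was taken,

    \rho \leftarrow \rho + \alpha \left[ r + \max_{a'} R(s', a') - \max_{a} R(s, a) - \rho \right]

so no discount factor \gamma appears anywhere.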

Citations[edit]

External links modified[edit]

Hello fellow Wikipedians,

I have just modified 3 external links on Q-learning. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 16:15, 21 July 2016 (UTC)[reply]

When was Q-learning first described?[edit]

There should be some reference to the paper in which Q-learning was first described, if such exists. So that it's possible to see how old the method is, and read more details about it. —Kri (talk) 17:57, 17 September 2016 (UTC)[reply]

Such a reference is already present, in the section titled "Early study". Perhaps this short, two-sentence section should be merged into the introduction? —50.181.176.188 (talk) 02:58, 30 December 2016 (UTC)[reply]

Patent on Deep-Q-Learning[edit]

Should this page not mention that some aspects of Deep-Q-Learning are patented by Google? Otherwise this might be a problem for some people ... see https://patents.google.com/patent/US20150100530 and https://www.reddit.com/r/MachineLearning/comments/3c5f5j/google_patented_deep_qlearning/ — Preceding unsigned comment added by 84.63.193.140 (talk) 11:43, 4 April 2018 (UTC)[reply]

Usage of abbreviation DQN[edit]

The article uses the abbreviation "DQN" but doesn't explain it. — Preceding unsigned comment added by 131.188.3.226 (talk) 19:09, 24 September 2019 (UTC)[reply]

Selecting Actions[edit]

I think it would be helpful to give some information on techniques for selecting actions. As it stands, no Q-learning algorithm is fully specified, since there is no explanation of how actions are selected during learning; see the sketch below for one common choice. — Preceding unsigned comment added by 205.132.0.41 (talk) 20:16, 31 May 2020 (UTC)[reply]
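
For illustration, a minimal sketch of epsilon-greedy selection, probably the most common choice during learning (the dictionary Q keyed by (state, action) pairs and the function name are my assumptions, not something from the article):

    import random

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # With probability epsilon, explore: pick a uniformly random action.
        if random.random() < epsilon:
            return random.choice(actions)
        # Otherwise exploit: pick the action with the highest current Q-value,
        # treating unseen (state, action) pairs as having value 0.
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

A larger epsilon means more exploration; in practice epsilon is often decayed over time so that the agent exploits more as its Q-values become reliable.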