Talk:Vanishing gradient problem

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Mid-importance on the project's importance scale.

This article is supported by WikiProject Computer science.

Uh... what is the problem itself?

Shouldn't the article define what the problem is? --Doradus (talk) 02:29, 23 January 2015 (UTC)

I made an attempt. It is difficult to explain this in a non-technical way. Bhny (talk) 17:12, 23 January 2015 (UTC)

Well, I am an ML student and I understand everything the article says, but it says nothing about what the problem actually is. Linguiloce (talk) 14:04, 1 October 2016 (UTC)

I just came back to this article, and found this quote: "The problem is that in some cases, the gradient will be vanishingly small, effectively preventing the weight from changing its value." Works for me. --Doradus (talk) 16:35, 2 December 2018 (UTC)
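For anyone who wants to see the quoted sentence in numbers, here is a minimal sketch. It assumes a toy chain of scalar sigmoid units that all share the same weight and pre-activation (the names `chain_gradient`, `weight`, and `pre_activation` are made up for illustration and are not from the article). The sigmoid derivative is at most 0.25, so with weight 1 each extra layer multiplies the backpropagated gradient by at most 0.25.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chain_gradient(depth, weight=1.0, pre_activation=0.0):
    """Gradient of the last unit's output with respect to the first unit's input.

    Assumption (illustration only): every unit in the chain has the same
    weight and the same pre-activation, so each layer contributes the
    same factor weight * sigmoid'(pre_activation) via the chain rule.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= weight * sigmoid_grad(pre_activation)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(depth, chain_gradient(depth))
```

With the default values this is the best case for a sigmoid at weight 1, and the factor still shrinks as 0.25 to the power of the depth, which is why a gradient-descent update on an early weight can become too small to change it.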

Size of Problem?

How many nodes in an unfolded RNN are viable without LSTM? I.e., where is the practical cut-off point up to which the gradient hasn't vanished? There must be some rule of thumb: if your patterns in time occur in fewer than N samples, you can use an RNN; if in more than M samples, you are better off with LSTM? robertbowerman (talk) 04:30, 9 February 2017 (UTC)
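This is not the rule of thumb asked for, only a rough sketch of how one might reason about it. It assumes a scalar tanh RNN with a single recurrent weight w, which is a strong simplification. Since the tanh derivative is at most 1, the gradient carried back through T time steps is bounded in magnitude by |w|^T, and the hypothetical helper `steps_until_below` below counts how many steps it takes for that bound to drop under a chosen threshold. The threshold of 1e-6 is an arbitrary illustrative choice.

```python
def steps_until_below(recurrent_weight, threshold=1e-6):
    """Smallest T such that |w|**T < threshold.

    Assumptions (illustration only): a scalar tanh RNN, so the gradient
    through T unrolled steps is bounded by |w|**T; the bound only decays
    when 0 < |w| < 1.
    """
    w = abs(recurrent_weight)
    if not 0.0 < w < 1.0:
        raise ValueError("the bound only decays when 0 < |w| < 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    bound, steps = 1.0, 0
    while bound >= threshold:
        bound *= w  # one more unrolled time step
        steps += 1
    return steps


if __name__ == "__main__":
    for w in (0.5, 0.9, 0.99):
        print(w, steps_until_below(w))
```

The count is very sensitive to w, and real networks have weight matrices and saturating activations rather than a single scalar, so this should be read as an upper-bound illustration and not as a practical cut-off.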

Suggested rename: extreme gradient problem

I really don't see the point of having both vanishing gradient and exploding gradient pages. We could just have two inbound redirects and bold both inbound terms in the lead. Should be fine IMO. — MaxEnt 00:10, 21 May 2017 (UTC)

It is a well-known problem in ML and pretty much everyone calls it the vanishing gradient problem. Sometimes they'll say vanishing/exploding gradient problem, but even that is rare. I've never heard it called the extreme gradient problem. Themumblingprophet (talk) 02:21, 15 April 2020 (UTC)
