Talk:Variational autoencoder

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Low-importance on the project's importance scale.

Hi, the article is really interesting and well detailed. I believe it will be a really helpful starting point for those who are willing to study this topic. I just fixed some minor things, such as adding a missing comma or replacing a term with a synonym. It would be nice if you could add a paragraph with some applications of this neural network :) --Lavalec (talk) 14:00, 18 June 2021 (UTC)

Hi, I confirm that the article is interesting and detailed. I'm not an expert in this field, but I understood the basics. --Beatrice Lotesoriere (talk) 14:32, 18 June 2021 (UTC)

Very well written article. I just made some minor language changes in a few sections. The only thing I would probably do is add some citations in the Formulation section. --Wario93 (talk) 15:40, 18 June 2021 (UTC)

Good article, but I had to get rid of a bunch of unnecessary fluff in the Architecture section which obscured the point (diff : https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&type=revision&diff=1040705234&oldid=1039806485 ). 26 August 2021

I disagree: the article really needs attention, and the "Formulation" part is now very hard to understand. I propose the following changes for the first paragraphs, but subsequent ones need revision as well:

From a formal perspective, given an input ~~dataset~~ vector $\mathbf {x}$ ~~characterized by~~ from an unknown probability ~~function~~ distribution $P(\mathbf {x} )$ ~~and a multivariate latent encoding vector $\mathbf {z}$~~ , the objective is to model ~~the data~~ $P(\mathbf {x} )$ as a parametric distribution with density $p_{\theta }(\mathbf {x} )$ , where $\theta$ is a vector of parameters to be learned. ~~defined as the set of the network parameters.~~

For the parametric model we assume that each $\mathbf {x}$ is associated with (arises from) a latent encoding vector $\mathbf {z}$ , and we write $p_{\theta }(\mathbf {x} ,\mathbf {z} )$ to denote their joint density.

~~It is possible to formalize this distribution as~~ We can then write

$$p_{\theta }(\mathbf {x} )=\int _{\mathbf {z} }p_{\theta }(\mathbf {x,z} )\,d\mathbf {z}$$

where $p_{\theta }(\mathbf {x} )$ is the evidence of the model's data, with marginalization performed over the unobserved variables, and $p_{\theta }(\mathbf {x,z} )$ represents the joint distribution of the input data and its latent representation according to the network parameters $\theta$ .

193.219.95.139 (talk) 10:18, 2 October 2021 (UTC)
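
To make the marginalization in the proposed text concrete, here is a minimal numerical sketch. It assumes a toy one-dimensional model that is not taken from the article: $z\sim {\mathcal {N}}(0,1)$ and $x\mid z\sim {\mathcal {N}}(az,s^{2})$ , with hypothetical values $a=2$ and $s=0.5$ . For this particular model the integral has the known closed form ${\mathcal {N}}(0,a^{2}+s^{2})$ , so the quadrature can be checked against it. All function names are illustrative only.

```python
import math


def normal_pdf(v, mean, std):
    """Density of a univariate normal distribution evaluated at v."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def joint_density(x, z, a=2.0, s=0.5):
    """Toy joint p(x, z) = p(z) * p(x | z), with z ~ N(0, 1) and x | z ~ N(a*z, s^2)."""
    return normal_pdf(z, 0.0, 1.0) * normal_pdf(x, a * z, s)


def marginal_density(x, a=2.0, s=0.5, z_min=-8.0, z_max=8.0, steps=4000):
    """Approximate p(x) = integral of p(x, z) dz with the trapezoidal rule."""
    h = (z_max - z_min) / steps
    total = 0.5 * (joint_density(x, z_min, a, s) + joint_density(x, z_max, a, s))
    for i in range(1, steps):
        total += joint_density(x, z_min + i * h, a, s)
    return total * h


def closed_form_marginal(x, a=2.0, s=0.5):
    """For this linear-Gaussian toy model, x ~ N(0, a^2 + s^2) exactly."""
    return normal_pdf(x, 0.0, math.sqrt(a * a + s * s))


if __name__ == "__main__":
    x = 1.0
    print("quadrature :", marginal_density(x))
    print("closed form:", closed_form_marginal(x))
```

In a real VAE the latent space is high-dimensional and the decoder is a neural network, so a grid integral like this is not feasible; the sketch only illustrates what the formula computes.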

Observations and suggestions for improvements

The following observations and suggestions for improvements were collected following expert review of the article within the Science, Technology, Society and Wikipedia course at the Politecnico di Milano, in June 2021.

"Minor corrections:

- single layer perceptron => single-layer perceptron

- higher level representations => higher-level representations

- applied with => applied to

- composed by => composed of

- Information retrieval benefits => convoluted sentence

- modelling the relation between => modelling the relationship between

- predicting popularity => predicting the popularity"

Ettmajor (talk) 10:06, 11 July 2021 (UTC)

Does the prior $p(z)$ depend on $\theta$ or not?

In a vanilla Gaussian VAE, the prior is a standard Gaussian with zero mean and unit variance, i.e., the prior $p(z)$ of the latent representations is not parametrized (by $\theta$ or anything else). On the other hand, both the article and [Kingma&Welling2014] parametrize the prior as $p_{\theta }(z)$ , just like the likelihood $p_{\theta }(x\mid z)$ . Clearly, the latter makes sense, since the very goal is to learn $\theta$ through the probabilistic decoder as the generative model for the likelihood $p_{\theta }(x\mid z)$ . So is there a deeper meaning in parametrizing the prior as $p_{\theta }(z)$ as well, with the very same parameters $\theta$ as the likelihood, or is it in fact a typo/mistake? — Preceding unsigned comment added by 46.223.162.38 (talk) 22:11, 11 October 2021 (UTC)

The prior is not dependent on the parameters $\theta$ , but rather on a different set of parameters $\phi$ . — Preceding unsigned comment added by 134.106.109.104 (talk) 12:22, 14 September 2022 (UTC)

I also found this incredibly confusing, as the prior on z is usually fixed and doesn't depend on any parameter. EitanPorat (talk) 00:16, 19 March 2023 (UTC)
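
For reference, here is one way to reconcile the two notations, offered as a sketch rather than as proposed article text. In the general formulation the generative model factorizes as

$$p_{\theta }(\mathbf {x} ,\mathbf {z} )=p_{\theta }(\mathbf {z} )\,p_{\theta }(\mathbf {x} \mid \mathbf {z} ),$$

where $\theta$ collects all parameters of the generative model. Writing $p_{\theta }(\mathbf {z} )$ therefore allows for a learnable prior but does not require one. In the vanilla Gaussian VAE described in the question above, the prior is fixed,

$$p_{\theta }(\mathbf {z} )=p(\mathbf {z} )={\mathcal {N}}(\mathbf {z} ;\mathbf {0} ,\mathbf {I} ),$$

so only the likelihood $p_{\theta }(\mathbf {x} \mid \mathbf {z} )$ actually depends on $\theta$ . Under the usual convention (assumed here, following the notation of the Kingma & Welling paper cited above), $\phi$ denotes the parameters of the encoder $q_{\phi }(\mathbf {z} \mid \mathbf {x} )$ rather than parameters of the prior.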

The image shows just a normal autoencoder, not a variational autoencoder

There is an image with a caption saying it is a variational autoencoder, but it is showing just a plain autoencoder.

In a different section, there is something described as a "trick", which seems to be the central point that distinguishes autoencoders from variational autoencoders.

I'm not sure whether that image should just be removed, or whether it makes sense in the section anyway. Volker Siegel (talk) 14:18, 24 January 2022 (UTC)
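
For readers of this thread, the difference that the image misses can be sketched in a few lines. This is a minimal illustration, assuming a diagonal-Gaussian encoder that outputs a mean and a log-variance per latent dimension; the function names and the numeric values are hypothetical and are not taken from the article.

```python
import math
import random


def plain_autoencoder_latent(code):
    """A plain autoencoder passes its deterministic code straight to the decoder."""
    return list(code)


def reparameterize(mu, log_var, eps):
    """Return z = mu + sigma * eps elementwise, with sigma = exp(0.5 * log_var).

    The noise eps is drawn outside this function, so z is a deterministic,
    differentiable function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def sample_latent(mu, log_var, rng):
    """Draw eps ~ N(0, I) and map it to a sample z ~ N(mu, diag(sigma^2))."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return reparameterize(mu, log_var, eps)


if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed, for repeatability only
    mu = [1.0, -2.0]                       # hypothetical encoder means
    log_var = [0.0, math.log(0.25)]        # hypothetical encoder log-variances
    print("plain autoencoder code:", plain_autoencoder_latent(mu))
    print("VAE latent sample     :", sample_latent(mu, log_var, rng))
```

A figure for the variational case would presumably need to show the encoder producing the two outputs (mean and variance) and the sampling step between encoder and decoder.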

This is a highly technical topic

In the past, users have removed much of the technical detail from this article. Wikipedia does not limit the depth of technicality; Simple Wikipedia does. If you find yourself wanting to remove technical depth from the article, please edit the Simple Wikipedia article instead. 2A01:C23:7C81:1A00:2B9B:EB91:3CC5:3222 (talk) 10:31, 19 November 2022 (UTC)

Overview section is poorly written

The architecture section is filled with unclear phrases and undefined terms. For example, "noise distribution", "q-distributions or variational posteriors", "p-distributions", "amortized approach", "which is usually intractable" (what is intractable?), "free energy expression". None of these are defined. It is unclear if this section of the article is useful to anyone who is not already familiar with how variational autoencoders work. Joshuame13 (talk) 15:14, 31 January 2023 (UTC)

The ELBO section needs more derivation

"The form given is not very convenient for maximization, but the following, equivalent form, is:"

There should be more steps to explain how the equivalent form is obtained from the "given" one. Also, the dot placeholder notation is inconsistent, changing from $p_{\theta }(\cdot |x)$ to $p_{\theta }(\cdot )$ . PromethiumL (talk) 18:08, 12 February 2023 (UTC)

I agree $p_{\theta }(z)$ doesn't make sense. EitanPorat (talk) 00:17, 19 March 2023 (UTC)
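
As a possible starting point for the missing steps, here is a sketch of the derivation. It assumes that the "given" form is the usual definition of the ELBO as an expectation of a log-ratio; the symbol $L_{\theta ,\phi }(x)$ is used here only for illustration. Factorizing the joint density as $p_{\theta }(x,z)=p_{\theta }(x|z)\,p_{\theta }(z)$ gives

$$\begin{aligned}L_{\theta ,\phi }(x)&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln p_{\theta }(x|z)\right]+\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(z)}{q_{\phi }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln p_{\theta }(x|z)\right]-D_{KL}\left(q_{\phi }(\cdot |x)\parallel p_{\theta }(\cdot )\right).\end{aligned}$$

Factorizing the same joint density the other way, as $p_{\theta }(x,z)=p_{\theta }(z|x)\,p_{\theta }(x)$ , gives

$$L_{\theta ,\phi }(x)=\ln p_{\theta }(x)-D_{KL}\left(q_{\phi }(\cdot |x)\parallel p_{\theta }(\cdot |x)\right).$$

In both expressions the dot stands for the latent variable $z$ : $p_{\theta }(\cdot |x)$ is the posterior over $z$ and $p_{\theta }(\cdot )$ is the prior over $z$ . The change of notation is therefore a change of distribution (posterior versus prior), which the article could state explicitly.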

Rating this article C-class

This article has great potential. Excellent technical content. But I just rated it "C" because it seems to have gained both content and noise over the past six months. I've tried for a couple of hours to improve the clarity of the central idea of a VAE, but I'm not satisfied with my efforts. In particular, it is still unclear to me whether both the encoder and decoder are technically random, whether any randomness should be added in the decoder, or what (beyond Z) is modeled with a multimodal Gaussian in the basic VAE. I see no reason why this article should not be accessible both to casual readers and to the technically proficient, but we are far from there yet.

In particular, the introductory figure shows x being mapped to a Gaussian figure and back to x'. It would be good to explicitly state how the encoder and decoder in this figure relate to the various distributions used throughout the article, but I'm not confident about how to do so. Yoderj (talk) 19:25, 15 March 2024 (UTC)
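
One common reading of the basic Gaussian VAE, offered as a hedged sketch and not as a statement about what the article's figure intends: the encoder network deterministically outputs the parameters (mean and log-variance) of $q_{\phi }(z|x)$ , the randomness enters when $z$ is sampled from that distribution, and the decoder network deterministically outputs the parameters of $p_{\theta }(x|z)$ , with x' in such figures typically being the decoder mean. Under those assumptions, plus a standard-normal prior and a unit-variance Gaussian decoder, the per-example training loss can be written as below. The function names are illustrative only.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ): the encoder-side term."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def gaussian_nll(x, x_mean, sigma=1.0):
    """-ln p(x | z) for an assumed decoder p(x | z) = N(x; x_mean, sigma^2 I)."""
    d = len(x)
    squared_error = sum((a - b) ** 2 for a, b in zip(x, x_mean))
    return (0.5 * squared_error / sigma ** 2
            + d * math.log(sigma)
            + 0.5 * d * math.log(2.0 * math.pi))


def negative_elbo_estimate(x, mu, log_var, x_mean):
    """Single-sample estimate of the negative ELBO.

    mu, log_var: encoder outputs for x, i.e. the parameters of q(z | x).
    x_mean:      decoder output for one z sampled from q(z | x).
    """
    return gaussian_nll(x, x_mean) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    x = [0.2, -0.1]                 # hypothetical data point
    mu, log_var = [0.5], [-1.0]     # hypothetical encoder outputs
    x_mean = [0.1, 0.0]             # hypothetical decoder mean for a sampled z
    print(negative_elbo_estimate(x, mu, log_var, x_mean))
```

With a unit-variance decoder, the reconstruction term reduces to half the squared error plus a constant, which may be why the decoder often looks deterministic in diagrams.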