Draft:Covariate shift

Covariate shift is a phenomenon in machine learning and statistics in which the distribution of input features (covariates) changes between the training and test datasets, typically degrading the performance of a machine learning model.[1] It is a common challenge in real-world applications, as models are often trained on historical data and expected to generalize to new, unseen data.[2] Covariate shift can lead to decreased model performance or even model failure,[3] as it violates the assumption that training and test data follow the same distribution.

Covariate shift is also referred to as domain shift and is a special case of dataset shift in which only the distribution of the covariates (inputs) changes. That is, only P(x) changes, while P(y | x) remains fixed. This is distinct from both label shift (where P(y) changes) and concept drift (where P(y | x) changes).[4]

Mathematical definition

Pure covariate shift occurs when the distribution of input features changes between the training and test data, while the conditional distribution of the target variable given the input features remains the same.[5] Let P_train(x) denote the distribution of input features in the training data and P_test(x) denote the distribution in the test data. Covariate shift is defined as:

P_train(y | x) = P_test(y | x) for all x ∈ X, while P_train(x) ≠ P_test(x),

where x represents the input features, y represents the target variable, and X is the feature space.[6]
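
These conditions can be made concrete with a minimal Python sketch. The distributions and the helper function sample_target below are hypothetical, chosen only for illustration: the training and test inputs are drawn from different distributions, while the conditional relationship P(y | x) is identical in both datasets.

  import numpy as np

  rng = np.random.default_rng(0)

  def sample_target(x, rng):
      # The same conditional relationship P(y | x) is used for both datasets.
      return 2.0 * x + rng.normal(scale=0.1, size=x.shape)

  # Training inputs: P_train(x) is N(0, 1).
  x_train = rng.normal(loc=0.0, scale=1.0, size=1000)
  y_train = sample_target(x_train, rng)

  # Test inputs: P_test(x) is N(2, 1), so only P(x) has shifted.
  x_test = rng.normal(loc=2.0, scale=1.0, size=1000)
  y_test = sample_target(x_test, rng)

  print(x_train.mean(), x_test.mean())  # the input means differ noticeably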

Measuring covariate shift

Covariate shift is usually measured using statistical distances, divergences, and two-sample tests. Some measurement methods work on continuous features, others on categorical features, and some on both. Additionally, some methods measure univariate drift (one feature at a time), while others measure multivariate drift (several features jointly).

Statistical distances

  • Maximum mean discrepancy (MMD) (Continuous): MMD is a kernel-based method that measures the distance between two probability distributions by comparing the means of their samples in a reproducing kernel Hilbert space.[7] MMD provides a symmetric, non-negative measure of the difference between the training and test distributions, with higher values indicating a greater degree of covariate shift.
  • Wasserstein distance (Continuous and Categorical): Also known as the Earth mover's distance, the Wasserstein distance quantifies the difference between two probability distributions by measuring the minimum cost required to transform one distribution into the other.[8] This metric provides a symmetric and non-negative measure of the divergence between the training and test distributions, with higher values indicating a more substantial degree of covariate shift.
  • Hellinger distance (Continuous and Categorical): The Hellinger distance is another symmetric measure of the difference between two probability distributions, closely related to the Bhattacharyya coefficient, which measures the similarity of two distributions. For discrete distributions, the squared Hellinger distance is half the sum of the squared differences between the square roots of corresponding probabilities, so the distance always lies between 0 and 1. Like other statistical distances, the Hellinger distance is non-negative, with higher values indicating a more significant divergence between the training and test distributions.
  • Jensen-Shannon distance (Continuous and Categorical): The Jensen-Shannon distance is derived from the JS divergence and is a true distance metric, satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Specifically, it is defined as the square root of the JS divergence: JSD(P, Q) = √(JS(P, Q)). (A Python sketch computing two of these distances follows this list.)
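
A minimal illustration of two of these distances, using SciPy on hypothetical one-dimensional samples: scipy.stats.wasserstein_distance works directly on raw samples, while scipy.spatial.distance.jensenshannon expects (binned) probability vectors and returns the Jensen-Shannon distance.

  import numpy as np
  from scipy.stats import wasserstein_distance
  from scipy.spatial.distance import jensenshannon

  rng = np.random.default_rng(42)
  x_train = rng.normal(loc=0.0, scale=1.0, size=5000)  # training feature
  x_test = rng.normal(loc=0.5, scale=1.2, size=5000)   # shifted test feature

  # Wasserstein (earth mover's) distance is computed from the samples directly.
  w = wasserstein_distance(x_train, x_test)

  # The Jensen-Shannon distance needs probability vectors, so bin both samples
  # on a shared set of histogram edges first.
  edges = np.histogram_bin_edges(np.concatenate([x_train, x_test]), bins=30)
  p, _ = np.histogram(x_train, bins=edges)
  q, _ = np.histogram(x_test, bins=edges)
  js = jensenshannon(p, q, base=2)  # inputs are normalized to sum to 1 internally

  print(f"Wasserstein: {w:.3f}, Jensen-Shannon distance: {js:.3f}")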

Divergences

  • Kullback-Leibler (KL) Divergence (Continuous and Categorical): KL divergence is a measure of the difference between two probability distributions. It can be used to compare the training distribution q(x) and test distribution p(x), providing a non-negative value that quantifies the dissimilarity between the two distributions. A higher KL divergence value indicates a more significant degree of covariate shift. However, it is important to note that KL divergence is not symmetric, meaning the divergence from q(x) to p(x) may not be equal to the divergence from p(x) to q(x).
  • Jensen-Shannon (JS) Divergence (Continuous and Categorical): The JS divergence is a symmetric measure of the difference between two probability distributions, derived from the Kullback-Leibler (KL) divergence. It can be interpreted as the average of the KL divergences between each distribution and the mixture of the two. The JS divergence is non-negative, with higher values indicating a greater degree of dissimilarity between the training and test distributions. Unlike the KL divergence, it is symmetric, so the result does not depend on which distribution is treated as the reference. (Both divergences are illustrated in the sketch below.)
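
A minimal sketch of both divergences on hypothetical binned samples, using SciPy: scipy.stats.entropy returns the KL divergence when given two probability vectors, and squaring the value returned by scipy.spatial.distance.jensenshannon gives the JS divergence.

  import numpy as np
  from scipy.stats import entropy
  from scipy.spatial.distance import jensenshannon

  rng = np.random.default_rng(0)
  edges = np.linspace(-4.0, 6.0, 31)
  p, _ = np.histogram(rng.normal(0.0, 1.0, 5000), bins=edges)
  q, _ = np.histogram(rng.normal(1.0, 1.0, 5000), bins=edges)

  # Smooth with a small constant so no bin has zero probability (KL would be
  # infinite there), then normalize to probability vectors.
  p = (p + 1e-9) / (p + 1e-9).sum()
  q = (q + 1e-9) / (q + 1e-9).sum()

  kl_pq = entropy(p, q)            # KL(p || q)
  kl_qp = entropy(q, p)            # KL(q || p); generally not equal to KL(p || q)
  js = jensenshannon(p, q) ** 2    # JS divergence (square of the JS distance)

  print(kl_pq, kl_qp, js)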

Two-sample tests

  • Kolmogorov-Smirnov test (Continuous): The Kolmogorov-Smirnov test is a non-parametric statistical hypothesis test used to assess whether two samples of a continuous, one-dimensional variable come from the same underlying distribution. The test provides a p-value, which can be used to determine the presence of covariate shift: a small p-value (typically below a predetermined significance level, such as 0.05) indicates that the training and test distributions differ significantly, suggesting the presence of covariate shift.
  • Chi-Squared Test (Categorical): The Chi-Squared Test is a statistical method for detecting covariate shift in categorical features. It compares the observed category frequencies in the training and test sets, arranged in a contingency table, against the frequencies expected under the null hypothesis that the two sets share the same categorical distribution. If the null hypothesis is rejected, this suggests the presence of covariate shift. The test is applicable only to categorical variables and requires a sufficient sample size and minimum expected frequencies in the contingency table. (Both tests are illustrated in the sketch below.)
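
A minimal sketch of both tests with SciPy (scipy.stats.ks_2samp for a continuous feature, scipy.stats.chi2_contingency for a categorical one), on hypothetical data:

  import numpy as np
  from scipy.stats import ks_2samp, chi2_contingency

  rng = np.random.default_rng(7)

  # Kolmogorov-Smirnov test on a continuous feature.
  x_train = rng.normal(0.0, 1.0, 2000)
  x_test = rng.normal(0.3, 1.0, 2000)
  ks_stat, ks_p = ks_2samp(x_train, x_test)

  # Chi-Squared test on a categorical feature: build a contingency table of
  # category counts in the training and test sets.
  categories = ["a", "b", "c"]
  cats_train = rng.choice(categories, size=2000, p=[0.5, 0.3, 0.2])
  cats_test = rng.choice(categories, size=2000, p=[0.3, 0.4, 0.3])
  table = np.array([[np.sum(cats_train == c) for c in categories],
                    [np.sum(cats_test == c) for c in categories]])
  chi2, chi2_p, dof, expected = chi2_contingency(table)

  # Small p-values suggest the training and test distributions differ.
  print(ks_p, chi2_p)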

Software for measuring covariate shift

Python

  • SciPy: SciPy is an open-source library for the Python programming language, widely used for scientific computing and data analysis. It provides statistical tests such as the Chi-Squared Test and the Kolmogorov-Smirnov Test, as well as functions for calculating statistical distances and divergences, all of which can be used to detect covariate shift between training and test distributions.
  • NannyML: An open-source Python library for model monitoring that has functionality for detecting univariate and multivariate distribution drift and estimating machine learning model performance without ground truth labels. NannyML offers statistical tests, statistical distances and divergences.

Univariate vs. multivariate covariate shift

Covariate shift can occur in different forms depending on the number of features involved. Univariate covariate shift involves a single feature experiencing a change in distribution, whereas multivariate covariate shift can involve multiple features changing simultaneously or alterations in the correlation structure between features.

Univariate covariate shift

Univariate covariate shift occurs when the distribution of a single feature changes between the training and test datasets. As it involves only one dimension, univariate covariate shift is generally simpler to detect and address compared to its multivariate counterpart. Common techniques for detecting univariate covariate shift include statistical distances such as the Jensen-Shannon distance and Wasserstein (earth mover's) distance.

Multivariate covariate shift

Multivariate covariate shift arises when the distributions of multiple features change simultaneously between the training and test datasets or when the correlation structure between features is altered. The latter case, where the marginal distributions of individual features remain unchanged but the dependencies among them change, can be particularly challenging to detect and handle. In multivariate covariate shift, the complexity of the distribution shift and potential interactions between features require more advanced techniques for detection.

To address multivariate covariate shift, techniques such as Maximum Mean Discrepancy (MMD) with appropriate kernel functions that consider the relationships between multiple features can be employed.
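
A minimal NumPy sketch of a (biased) squared-MMD estimate with an RBF kernel, on hypothetical two-dimensional data in which the marginal distributions stay the same but the correlation between the two features changes (the helper rbf_mmd2 is illustrative, not a library function):

  import numpy as np

  def rbf_mmd2(x, y, gamma=1.0):
      # Biased estimate of the squared MMD between samples x and y
      # using a radial basis function (RBF) kernel.
      def kernel(a, b):
          sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
          return np.exp(-gamma * sq_dists)
      return kernel(x, x).mean() - 2.0 * kernel(x, y).mean() + kernel(y, y).mean()

  rng = np.random.default_rng(1)
  cov_train = [[1.0, 0.8], [0.8, 1.0]]    # positively correlated features
  cov_test = [[1.0, -0.8], [-0.8, 1.0]]   # negatively correlated features
  x_train = rng.multivariate_normal([0.0, 0.0], cov_train, size=500)
  x_test = rng.multivariate_normal([0.0, 0.0], cov_test, size=500)

  print(rbf_mmd2(x_train, x_test))  # larger values indicate a larger shift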

Internal covariate shift

The term internal covariate shift was introduced in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift."[9] Internal covariate shift occurs when the distribution of the inputs to a given hidden layer in a neural network shifts because the parameters of earlier layers change during training. It has been hypothesized that batch normalization reduces internal covariate shift;[9] however, this claim is contested.[10]
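
As a rough illustration of the operation batch normalization applies to a layer's inputs (a sketch on hypothetical activations, not the paper's reference implementation), the following standardizes each unit's activations across a mini-batch and then rescales and shifts them with learnable parameters:

  import numpy as np

  def batch_norm(h, gamma, beta, eps=1e-5):
      # Standardize each unit's activations over the mini-batch, then apply
      # the learnable scale (gamma) and shift (beta) parameters.
      mean = h.mean(axis=0)
      var = h.var(axis=0)
      h_hat = (h - mean) / np.sqrt(var + eps)
      return gamma * h_hat + beta

  rng = np.random.default_rng(0)
  h = rng.normal(loc=3.0, scale=2.0, size=(64, 16))  # hidden-layer activations
  out = batch_norm(h, gamma=np.ones(16), beta=np.zeros(16))
  print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # approximately zero mean, unit std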

Difference between covariate shift and concept drift

Covariate shift and concept drift are two related but distinct phenomena in machine learning, both of which involve changes in the underlying data distribution. Covariate shift and concept drift can occur independently or simultaneously, and both can negatively impact the performance of machine learning models.

The main difference between covariate shift and concept drift is that covariate shift refers to changes in the distribution of input features between the training and test datasets, while concept drift involves changes in the relationship between input features and the target variable over time. In covariate shift, the underlying relationship between the features and the target remains constant, whereas, in concept drift, this relationship itself changes due to evolving processes or external factors.

References

  1. ^ Huyen, Chip (2022-02-07). "Data Distribution Shifts and Monitoring". Chip Huyen. Retrieved 2024-02-27.
  2. ^ Sugiyama, Masashi; Kawanabe, Motoaki (2012-03-30). Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. The MIT Press. doi:10.7551/mitpress/9780262017091.003.0007. ISBN 978-0-262-01709-1.
  3. ^ Babic, Boris; Cohen, I. Glenn; Evgeniou, Theodoros; Gerke, Sara (2021-01-01). "When Machine Learning Goes Off the Rails". Harvard Business Review. ISSN 0017-8012. Retrieved 2024-03-02.
  4. ^ Ataei, Erdogdu, Kocak, Ben-David, Saleh, Pesaranghader, Alberts-Scherer, Sanchez, Ghazi, Nguyen, Khayrat, Zhao. "Understanding Dataset Shift and Potential Remedies" (PDF). Retrieved March 2, 2024.
  5. ^ Y, Geeta Dharani.; Nair, Nimisha G; Satpathy, Pallavi; Christopher, Jabez (October 2019). "Covariate Shift: A Review and Analysis on Classifiers". 2019 Global Conference for Advancement in Technology (GCAT). IEEE. pp. 1–6. doi:10.1109/GCAT47503.2019.8978471. ISBN 978-1-7281-3694-3. S2CID 211058700.
  6. ^ Quiñonero-Candela, Joaquin, ed. (2009). Dataset shift in machine learning. Neural information processing series. Cambridge, Mass.: MIT Press. ISBN 978-0-262-17005-5.
  7. ^ Gretton, Arthur; Borgwardt, Karsten M.; Rasch, Malte J. M.; Scholkopf, Bernhard; Smola, Alexander (2012). "A Kernel Two-Sample Test" (PDF). The Journal of Machine Learning Research. 13: 723–773.
  8. ^ Rüschendorf, Ludger (1985-03-01). "The Wasserstein distance and approximation theorems". Probability Theory and Related Fields. 70 (1): 117–129. doi:10.1007/BF00532240. ISSN 1432-2064.
  9. ^ a b Ioffe, Sergey; Szegedy, Christian (2015-03-02), Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167
  10. ^ Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2019-04-14), How Does Batch Normalization Help Optimization?, arXiv:1805.11604
