
Wikipedia:Reference desk/Archives/Mathematics/2022 August 27

Mathematics desk


August 27

Multiple same data points in scatter plot

Hi! Is there any way to represent two identical data points in a scatter plot? E.g., if I have the data set [0,1 ; 0,1 ; 3,4 ; 5,6], how would I represent the two [0,1]'s? Cheers, 🥒 EpicPickle (they/them | talk) 21:37, 27 August 2022 (UTC)

At a glance, to preserve all the information you'd probably want the degeneracy (the number of times a point is repeated) to be a third coordinate: [0,1,2 ; 3,4,1 ; 5,6,1]. But depending on how you're using the data, there are better ways to do this. I'm assuming the x-coordinates (abscissas) are required to be integers, or equivalent to a subset of them, since degeneracy is very unlikely when sampling from the real or rational numbers (and if it does happen, there's no problem with simply offsetting the points by an arbitrarily small fraction). If this is a large dataset, it might not matter whether each x-coordinate is strictly an integer, in which case you can merely offset the degenerate point's abscissa by an arbitrarily small fraction (so you have [0,1 ; 0.0001,1 ; 3,4 ; ...]) and then process your data as normal. This is also fine if you're planning on doing any kind of smoothing or other binned processing on your dataset. You only really need to be careful when you're doing analysis that is restricted to integer fields (necessary in some types of statistics and algorithms, though practical implementations on big datasets are often pretty forgiving anyway if your rounding is off); in that case, if you've stored your abscissas as fractions, you can just round them to get the original integer values. That would probably be the most efficient means of storage and processing for everything else you do anyway, unless of course a significant portion of your data are degenerate, in which case you should use a third coordinate. Hopefully that all made some sense. SamuelRiv (talk) 21:56, 27 August 2022 (UTC)
Another option could be to make the dot bigger. --116.86.4.41 (talk) 16:12, 30 August 2022 (UTC)
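
For anyone who wants to try the suggestions above, here is a minimal Python sketch of all three: a count as a third coordinate, a bigger dot for repeated points, and a small offset on the abscissa. The function names, the base marker size of 20, the offset scale of 1e-4 and the fixed seed are illustrative assumptions, not anything specified in the replies.

```python
from collections import Counter
import random

# The data set from the question, as (x, y) pairs.
points = [(0, 1), (0, 1), (3, 4), (5, 6)]


def with_multiplicity(pts):
    """Collapse repeats into (x, y, count) triples, in first-seen order."""
    return [(x, y, n) for (x, y), n in Counter(pts).items()]


def marker_sizes(triples, base=20):
    """Marker size proportional to the count; `base` is an arbitrary choice."""
    return [base * n for _, _, n in triples]


def jitter_repeats(pts, scale=1e-4, seed=0):
    """Nudge the x value of every repeated point by a small positive offset.

    The first occurrence is left alone. As long as scale < 0.5, round(x)
    recovers the original integer abscissa.
    """
    rng = random.Random(seed)
    seen = set()
    out = []
    for x, y in pts:
        if (x, y) in seen:
            # Random offset in [scale/2, scale], so it is never zero.
            out.append((x + rng.uniform(scale / 2, scale), y))
        else:
            seen.add((x, y))
            out.append((x, y))
    return out


triples = with_multiplicity(points)   # [(0, 1, 2), (3, 4, 1), (5, 6, 1)]
sizes = marker_sizes(triples)         # [40, 20, 20]
jittered = jitter_repeats(points)     # second (0, 1) gets a slightly larger x
```

If you plot with matplotlib (an assumption; any plotting tool with per-point marker sizes would do), the `sizes` list can be passed as the `s` argument of `scatter`. Since the offset is far smaller than 0.5, rounding the jittered abscissas gives back the original integers.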

Arrow on top of "y" in scatter plots

Hi! I've frequently seen the Y axis or Y coordinates in scatter plots labeled with an arrow on top (which looks like a circumflex). This doesn't seem to appear on top of the x's; is there a reason? Cheers, 🥒 EpicPickle (they/them | talk) 22:16, 27 August 2022 (UTC)

This is ringing my intro statistics bell, but I cannot remember, so could you post an example? If it's one of these two, then it's probably not exactly meant to define the y-axis as a whole (that would be y), but rather the values of the y coordinates from the function being plotted specifically (I'm not phrasing that well, I know; just read the site link, and if that's not it, post an example). SamuelRiv (talk) 23:33, 27 August 2022 (UTC)