User:Omegatron/Binary prefix rationale

My opinion on the use of binary prefixes in Wikipedia.

I'm sick of the unending, unproductive debate, the gross incivility, the armies of rampant sockpuppets, and the childish edit warring that surround this topic, and I hope typing this all up saves me from having to repeat the same things over and over:

SI prefixes

SI prefixes are used in a great number of Wikipedia's articles. In the vast majority of cases, the prefixes are used in the common decimal way. This is the way these prefixes have been used for centuries, the way they are written in standards, and the way they are commonly used everywhere in the real world. Although some will try to tell you otherwise, this is also the convention originally used in computing, and subsequently used in many of our computing-related articles, like Hard disk drive, List of device bandwidths, FireWire, USB, Tape drive, Quarter inch cartridge, DVD, HD DVD, Blu-ray, etc.

Using these same SI prefixes to mean two different things (sometimes multiples of 1000, sometimes multiples of 1024) is ambiguous, confusing, and unfamiliar to our readers.[1] Even within the fields in which the binary sense is used, it is used inconsistently. It is discouraged by computer scientists such as Donald Knuth[2] and Markus Kuhn,[3] and officially deprecated by every relevant standards organization.[4] Whenever we use SI prefixes, they should have the normal decimal meaning, as per the standards and overwhelmingly common usage on Wikipedia.
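To put a number on the ambiguity, here is a minimal Python sketch of how far the binary reading of a prefix drifts from the decimal one as the prefixes get larger. The function name `discrepancy_percent` and the prefix list are my own illustrative choices, not taken from any standard.

```python
# Illustrative sketch: how much larger the binary reading (1024**n) of a
# prefix is than its decimal SI reading (1000**n).
# The names below are hypothetical, chosen only for this example.
PREFIXES = ["kilo", "mega", "giga", "tera"]


def discrepancy_percent(n: int) -> float:
    """Percent by which 1024**n exceeds 1000**n."""
    return (1024**n / 1000**n - 1) * 100


for n, name in enumerate(PREFIXES, start=1):
    # Prints roughly 2.4%, 4.9%, 7.4% and 10.0% for kilo through tera.
    print(f"{name}: {discrepancy_percent(n):.1f}%")
```

The gap is small at kilo but reaches about a tenth at tera, so the two readings cannot be treated as interchangeable for large capacities.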

So what do we do about things like memory, where it makes the most sense to describe things as binary multiples? We use binary multiples, of course. A number of proposals have been made over the years for representing these (κ/κ², bK/bK², KKB/MMB, K₂B/M₂B), but the only one that's achieved widespread adoption is the standard created by the IEC (KiB/MiB). This system has been endorsed by all of the major standards organizations, such as IEEE, CIPM, NIST, SAE, and CENELEC,[5][6][7][8][9][10] and is increasingly being adopted in software (Linux kernel, The Pirate Bay, Mozilla Firefox, and many more) and academia.[11][12][13][14][15]
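As a rough illustration of how the two conventions can sit side by side without ambiguity, the following Python sketch formats a byte count with either decimal SI prefixes or binary IEC prefixes. The function `format_bytes` and the unit tables are hypothetical helpers written for this page; they are not the behaviour of any particular piece of software.

```python
# Illustrative sketch: the same byte count labelled unambiguously in either
# convention. SI prefixes step by 1000; IEC prefixes step by 1024.
# format_bytes and the unit tables are hypothetical, for this example only.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_bytes(n: int, binary: bool = False) -> str:
    """Format n bytes using IEC (binary=True) or SI (binary=False) prefixes."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(n)
    index = 0
    # Scale down until the value fits under one step of the chosen base.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_bytes(65536, binary=True))     # 64.00 KiB
print(format_bytes(65536))                  # 65.54 kB
print(format_bytes(10**9))                  # 1.00 GB
print(format_bytes(10**9, binary=True))     # 953.67 MiB
```

Because the unit symbol itself says which base was used, a reader never has to guess whether "64" or "65.54" is the right number for the same quantity.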

Though some are stubbornly and vocally resistant to their use, many other Wikipedians have found these units useful in articles that discuss computing concepts, such as the comparisons of the different floppy disk formats (which have been measured with several conflicting conventions over the years), explaining DVD, CD and CD-R speeds and capacities, the size limitations of disk formatting schemes, the ATA interface, etc. They have already been adopted by editors in many Wikipedia articles[16] and this usage is only growing. They make our articles more professional and precise, and readers don't need to wonder which convention is meant from one paragraph to the next. If they're not familiar with a unit, they're only a click away from an explanation.

I'm not married to the IEC prefixes. The abbreviated form is the best of all the proposals I've seen, though I agree that the written-out form is unfortunate, and I would prefer to write "kilobinary byte". (Though it can't honestly be said that "gibba-bite" is funnier-sounding than "gigga-bite"; one is just more familiar.) But my primary concern has always been the misuse of SI prefixes. I'm not dedicated to IEC, but I am dedicated to not using SI prefixes in a sloppy, deprecated, ambiguous way. We should never use kilo- to mean anything but 1,000. IEC prefixes are just the simplest, most widely used way to avoid this.

The only argument that can be made for continued misuse of the deprecated units is that some computer scientists working in certain fields have a long tradition of using them this way, so we should use the units they are familiar with. But Wikipedia is not a computer science textbook. Wikipedia is not PC World. Most of our readers are not computer scientists. Inconsistent, ambiguous, trade-specific jargon has no place in a general reference work. Yes, the majority of our readers are unfamiliar with the IEC prefixes, but they are equally unfamiliar with the binary convention.[1] What they are familiar with is the standard decimal meaning of SI. Every country in the world uses the metric system to some degree. Even in the US we are familiar with kilowatt-hours and millimeters. The argument that we shouldn't use units our readers are unfamiliar with leads to the conclusion that we should use the standardized units everywhere. Consistently following this convention is simple, reduces confusion, and increases the reliability of our articles.

References

  1. ^ a b Examples of common knowledge taken from Yahoo! Answers
  2. ^ "Q: A kilobyte (kB or KB) is 1000 bytes, and a megabyte (MB) is 1000 kB. What are the official names and abbreviations for the larger numbers of bytes? A: 1000 MB = 1 gigabyte (GB), 1000 GB = 1 terabyte" - The Art of Computer Programming Volume 1, Donald Knuth, pp. 24 and 94
  3. ^ "The units defined here can be used together with other SI units and SI prefixes. As in the SI, the prefixes denote powers of ten." - Standardized units for use in information technology, Markus Kuhn, 1996-12-29
  4. ^ "§3.1 SI prefixes". The International System of Units (SI) (PDF) (in French and English) (8th ed.). Paris: STEDI Media. 2006. p. 127. ISBN 92-822-2213-6. Retrieved 2007-02-25. [Side note:] These SI prefixes refer strictly to powers of 10. They should not be used to indicate powers of 2 (for example, one kilobit represents 1000 bits and not 1024 bits).
  5. ^ IEEE Trial-Use Standard for Prefixes for Binary Multiples (PDF). New York. 2003-02-12. ISBN 0-7381-3386-8. Retrieved 2007-02-25. This standard is prepared with two goals in mind: (1) to preserve the SI prefixes as unambiguous decimal multipliers and (2) to provide alternative prefixes for those cases where binary multipliers are needed. The first goal affects the general public, the wide audience of technical and nontechnical persons who use computers without much concern for their construction or inner working. These persons will normally interpret kilo, mega, etc., in their proper decimal sense. The second goal speaks to specialists—the prefixes for binary multiples make it possible for persons who work in the information sciences to communicate with precision.
  6. ^ "§3.1 SI prefixes". The International System of Units (SI) (PDF) (in French and English) (8th ed.). Paris: STEDI Media. 2006. p. 127. ISBN 92-822-2213-6. Retrieved 2007-02-25. [Side note:] The IEC has adopted prefixes for binary powers in the international standard IEC 60027-2: 2005, third edition, Letter symbols to be used in electrical technology — Part 2: Telecommunications and electronics. The names and symbols for the prefixes corresponding to 2¹⁰, 2²⁰, 2³⁰, 2⁴⁰, 2⁵⁰, and 2⁶⁰ are, respectively: kibi, Ki; mebi, Mi; gibi, Gi; tebi, Ti; pebi, Pi; and exbi, Ei. Thus, for example, one kibibyte would be written: 1 KiB = 2¹⁰ B = 1024 B, where B denotes a byte. Although these prefixes are not part of the SI, they should be used in the field of information technology to avoid the incorrect usage of the SI prefixes.
  7. ^ Prefixes for Binary Multiples — The NIST Reference on Constants, Units, and Uncertainty
  8. ^ Rules for SAE Use of SI (Metric) Units — Section C.1.12 — SI prefixes
  9. ^ HD 60027-2:2003 Information about the harmonization document (obtainable on order)
  10. ^ prEN 60027-2:2006 Information about the EN standardization process
  11. ^ "On older, modified 2.4.21 kernels, we could not achieve much more than 300MiB/s on parallel buffered write loads. Now, on patched 2.6.5 kernels, customers are seeing higher than 1GiB/s under the same loads." - Exploring High Bandwidth Filesystems on Large Systems - Dave Chinner and Jeremy Higdon, Silicon Graphics, Inc.
  12. ^ "The distributor of some content runs an application which splits the content into small blocks, usually 256 KiB to 1 MiB in size." - Investigation of Swarming Content Delivery Systems - Pramod Korathota, University of Technology, Sydney (Bachelor's thesis)
  13. ^ "a Texas Instruments (TI) TMS320C6701 32-bit DSP which operates at 167 MHz, and up to 128 MiB of RAM to name but a few of its features." - A Reconfigurable Four-Channel Transceiver Testbed with Signalling-Wavelength-Spaced Antennas - J. Andy Harriman, University of New Brunswick (Master's thesis)
  14. ^ "With the optimized mesh objects, the visualization calculation reads a much smaller amount of data from disk: 15.7 MiB instead of 56.9 MiB." - Accelerating Large Data Analysis By Exploiting Regularities - David Ellsworth, Patrick J. Moran. Advanced Management Technologies Incorporated, NASA Ames Research Center
  15. ^ "All tests for the evaluation were performed on a Xeon dual processor system running at 2.2 GHz with 4 GiB RAM (1 GiB= 1024 MiB, 1 MiB=1024 KiB, 1 Kib= 1024 bytes)." - Integration of the FreeBSD TCP/IP-Stack into the Discrete Event Simulator Omnet++ - R. G. Ingalls, M. D. Rossetti, et al. Proceedings of the 2004 Winter Simulation Conference. Institute of Telematics, University of Karlsruhe
  16. ^ Google en.wikipedia.org search