Talk:Binary prefix/Archive 5


More Logical?

The section titled "Usage Notes" on hard disk drives notes the following:

“When a stream of data is stored, it's more logical to indicate how many thousands, millions, or billions of bytes have been stored versus how many multiples of 1024, 1,048,576, or 1,073,741,824 bytes have been.”

According to whom? The basic unit on all modern systems is 512 bytes (see LBA). It's more logical to indicate how many blocks (units of 512 bytes) have been stored. If you didn't already know, 512 is half of 1024.—Kbolino 04:12, 5 July 2006 (UTC)

Paper punch cards and magnetic tape have inherent 512 byte block sizes? — Omegatron 16:13, 5 July 2006 (UTC)
I think you mean the common unit of storage in all modern hard drives is 512 bytes, because CD-ROM and DVD drives have larger blocks and some tape drives have block sizes that can be set by the user to 32K, 64K, 128K, or even more. All that makes it even less logical to indicate how many blocks have been stored. But that's only assuming we're talking about the actual stored or streamed data, not the capacity. IMO, capacity is best expressed in decimal units and stored data is best expressed in binary units. Maybe the wording should make the distinction between capacity and the total size of the stored data. --JJLatWiki 21:14, 5 July 2006 (UTC)
This serves merely as a reflection of my own ignorance (and, judging by the wording of the last sentence, arrogance). I was evaluating the situation from the limited perspective of hard disk drives. However, I will postulate that it is still more logical to use units of 1024 rather than 1000 because blocks (however they are defined) are multiples of 512 (32K = 32,768 bytes = 512 × 64, for example). I will similarly disagree with the assertion that stored data and capacity should be represented with different units, as that would be akin to representing the total distance of a journey and the distance already travelled with different units (i.e., units not derived from the same basic definition, such as miles and kilometers). And finally, punch cards are no longer in use and so should have little bearing on this argument.—Kbolino 07:47, 8 July 2006 (UTC)
The sentence in question is referring to punch cards. To paraphrase, "the decimal measurements for hard drives are used because they were first used for serially accessed storage like punch cards, and there is nothing about a continuous surface of magnetic material that lends itself to a certain block size". — Omegatron 14:08, 8 July 2006 (UTC)
Hmm. How 'bout we forget I said anything and I will work on controlling spontaneous, unjustified outbursts that result from selective reading?—Kbolino 16:54, 8 July 2006 (UTC)
Wow, you are the first wise person I see on Wikipedia. You have my deepest respect! 84.58.178.53 08:43, 20 April 2007 (UTC)
I agree with you that we should represent capacity and stored data size with the same standard, with one caveat: if Microsoft will agree to start reporting drive statistics in decimal terms. As long as the operating system reports drive stats in binary terms, we are stuck with the confusing standards. It's not logical to represent a file that is exactly 1,000,000 bytes in size as 976KB or .95MB, but that is in fact what we have and it's too late to go back. It's unfortunate that consumers trust Microsoft more than the hard drive manufacturers. But unless the hard drive guys rate drive size based on the worst-case-scenario (longest file name permitted by most liberal OS with a drive filled only with files smaller than the largest cluster size option), the consumer will ALWAYS get less storage than the capacity rating. --JJLatWiki 22:57, 10 July 2006 (UTC)
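To make the arithmetic above concrete, here is a minimal sketch (only the one-million-byte figure comes from the comment above; everything else is purely illustrative) showing how the same file size comes out under decimal and 1024-based divisors:

```c
#include <stdio.h>

int main(void)
{
    double bytes = 1000000.0;   /* a file of exactly one million bytes */

    /* SI (decimal) interpretation of the prefixes */
    printf("decimal: %.2f kB, %.3f MB\n", bytes / 1000.0, bytes / 1000000.0);

    /* 1024-based interpretation, as an OS reporting binary "KB"/"MB" shows it */
    printf("binary:  %.2f KB, %.3f MB\n", bytes / 1024.0, bytes / 1048576.0);

    return 0;
}
```

With the binary divisors the million-byte file prints as roughly 976.56 KB and 0.95 MB, which is where the figures quoted above come from.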
Some facts to perhaps put an historical perspective on this:
  • The first disk drive, the IBM 350 (1950's), had 5,000,000 six-bit characters organized in 100-character sectors. This predates the SI system.
  • In the 1960's virtually all disk drives used IBM's variable block length format (called Count Key Data or "CKD"). Any block size could be specified up to the maximum track length. Blocks ("records" in IBM's terminology) of 88, 96, 880 and 960 bytes were often used for obvious reasons. The drive capacity was usually stated in full track record blocking; for example, the 100 Megabyte 3330 disk pack only achieved that capacity with a full track block size of 13,030 bytes.
  • CKD continued into the 1990's and perhaps to this day. In the 1970's and 1980's most drives were offered with unformatted tracks (the unformatted capacity), with the particular block size and formatted capacity a function of the controller design. For example, the ST412 of IBM PC/XT fame had an unformatted capacity of 12.75 MB (not MiB) and with the Xebec controller and 512 byte sectors it formatted down to 10.0 MB (not MiB). Other controllers supported other block sizes, resulting in other formatted capacities.
  • The advent of intelligent interfaces (SCSI and IDE) in the early 1990's took the block size decision into the drive and virtually all chose 512 bytes, for no reason other than that was what IBM had chosen when they picked the Xebec controller for the PC/XT.
So, until relatively recently, it was very logical to measure in decimal numbers with decimal prefixes because God gave us 10 fingers and that's the system we learned and used, until some bad GUI's took the system-reported binary number and misrepresented it in the mixed decimal number/binary prefix mess we are in. IMO, this history shows that there is no reason, other than sloppy programming, for the current misuse of SI prefixes and therefore there is no reason to use 512 byte blocks as a measure. As it turns out, this is a particularly bad size for current HDD technology and the industry will likely move to a larger size in the near future. While it is likely to be a binary size, this imminent change will further make measuring in 512 byte blocks quite arbitrary.--Tom94022 04:34, 16 July 2006 (UTC)
That should be in the article. — Omegatron 15:54, 16 July 2006 (UTC)
If you think the sole reason for using a power-of-two sector size is that it's what IBM happened to use, then your knowledge of low-level programming is severely lacking. Power-of-two block sizes make sense all over computing because they allow the use of efficient bitwise operations rather than expensive modulo and integer division operations (a quick sketch of this follows below). Also, standard paging-based virtual memory systems would suffer very badly under a non-power-of-two disk block size.
If and when drives move to a larger sector size (which I doubt will happen any time soon) I STRONGLY suspect it will again be a power of two. Plugwash 18:59, 6 January 2007 (UTC)
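To illustrate the point about bitwise operations, here is a minimal C sketch (the names and the example byte offset are invented for illustration): with a power-of-two block size, the block number and the offset within a block can be computed with a shift and a mask, whereas an arbitrary block size needs a real division and modulo.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE  512u              /* a power of two: 2^9 */
#define BLOCK_SHIFT 9u                /* log2(BLOCK_SIZE) */
#define BLOCK_MASK  (BLOCK_SIZE - 1u) /* 0x1FF */

int main(void)
{
    uint64_t byte_offset = 1000000;   /* an arbitrary position on the disk */

    /* Power-of-two block size: cheap shift and mask. */
    uint64_t block  = byte_offset >> BLOCK_SHIFT;
    uint64_t offset = byte_offset & BLOCK_MASK;

    /* A non-power-of-two block size (say 500 bytes) would need
       an actual integer division and modulo instead. */
    uint64_t block500  = byte_offset / 500;
    uint64_t offset500 = byte_offset % 500;

    printf("512-byte blocks: block %llu, offset %llu\n",
           (unsigned long long)block, (unsigned long long)offset);
    printf("500-byte blocks: block %llu, offset %llu\n",
           (unsigned long long)block500, (unsigned long long)offset500);
    return 0;
}
```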
Supporting Plugwash's point, it's worth noting that the sectors on a CD-ROM drive are larger and are still a power of 2: 2048 bytes. It's pretty clear that as long as computers operate on binary principles, all "multiples" of storage units in the system are likely to remain powers of two for reasons of economical processing of these addresses. I do note, though, that some historical disk/drum systems didn't implement sectors at all; you just had a collection of tracks and could read any contiguous set of storage units (words) (or even bits?) from a given track.
Atlant 19:24, 6 January 2007 (UTC)
Not entirely true. A CD sector contains 2352 bytes of data (plus about 96 bytes of subcode data) that aren't too difficult to access. "Normal" CDs use the extra bytes mainly for error correction, but they're commonly used for video data in VCDs, at the expense of a higher error rate. Of course, last time I checked, OS X can't actually read data in those sectors. Elektron 20:11, 22 August 2007 (UTC)

It's all Marketing's fault. Here is the logic. Let's say that in 1981 I am using a Vic-20 with 3.5 KiB RAM, I have typed in a 2 KiB program, and I want to save it to a mythical (in the day) 2K flash drive. My 2048 bytes won't fit on the drive, will they? That is why the OSes give you binary bytes; the Marketing guys used to blame the difference between what you bought (250 MB) and what you got (238 MiB) on overhead lost to drivers and formatting. Really they just want to say they have the bigger capacity, even if it doesn't match up with real-world usage of RAM. Seanm9 16:06, 3 September 2007 (UTC)

Distinguish common use and scientific use

Just as nonmetric measures never disappeared despite any standards (horsepower, calories, inches, feet, gallons, ...), so the current common use of binary measures will never disappear nor get more precisely defined on a broader basis. It's another thing in scientific and engineering use - KiB, MiB, GiB, ... are easy, exact notations and hopefully their use will increase. What I'm missing in the discussion is the fact that the terms KB, MB, GB are looked at here as combinations of an SI prefix (k, M, G, ...) and an information unit (byte). Why not treat them as atomic terms with the common meaning they have had in the field for decades? Exactly as B stands for 8 bits, KB stands for 1024 bytes, etc. - Wmk 08:09, 3 August 2006 (UTC)

Approximately-equal-to

Can someone fix it so that the approximately-equal-to symbols in this section show up better? Like, increase the font size or something? They're not recognizable over on my end (a pretty standard WinXP and Firefox). OzLawyer 17:00, 3 August 2006 (UTC)

Same in mine, but that's not our problem. That's a bug in Firefox and should be reported to them. — Omegatron 17:36, 3 August 2006 (UTC)

Decimal bits, binary bytes

Apple may be ducking this controversy by counting decimal bits. Relevant to the article's claim "Operating systems usually report disk space using the binary version" is the reference "Apple Publications Style Guide" of January 2006, available in Google. That guide claims:

  • "kilobyte (1024 bytes) KB computer memory"
  • "kilobit (1000 bits) kbit computer memory"

—Preceding unsigned comment added by 192.42.249.130 (talkcontribs) 02:08, 11 August 2006

Wow, much talk

Wikipedia EditThisPage complains:

This page is 127 kilobytes long. This may be longer than is preferable; see article size.

I very nearly didn't contribute, once I had decided I would not read that much for context.

192.42.249.130 01:11, 11 August 2006 (UTC)

Wear Leveling

People have removed my {{Fact}} tag from the section that states that the difference between the binary capacity and the decimal capacity is used for wear-leveling. I've never heard that this is the case, and as such would like to know where people are getting this information since it's (apparently) common knowledge.

Any flash memory invariably has extra cells to keep the yield at a manageable level, so there's really no reason that manufacturers couldn't add extra cells for wear levelling too, other than economics (indeed, some flash memories DO match their binary capacities).

Until somebody shows evidence that wear-levelling is responsible for the "lost" capacity, I'm adding the {{Fact}} tag again. 129.128.213.126 20:40, 4 October 2006 (UTC)

I rephrased the entire paragraph to exclude wear-levelling as a reason for describing flash drives in binary multiples. Wear-leveling is NOT the reason or a factor in the "decision" to call them 256MB, but I didn't state that in the article. Flash drives almost universally offer the full decimal megabyte capacity of their designation, but my data is original research that isn't published anywhere, so I left the article more vague. Either way, the {{Fact}} tag is much less important, I think. --JJLatWiki 00:56, 5 October 2006 (UTC)

Legal Disputes, part deux

I just added another lawsuit. And I find it even more interesting that now 3 of the 4 cases were brought by the same legal team of Gutride and Safier. That's 2 for 3 for Gutride Safier: a half million dollars plus $2.4 million. Not a bad deal just for screwing over a few million consumers who will never even know Gutride Safier screwed them. That's just my personal opinion. All of it is personal opinion. Every word was opinion. I was claiming no facts. Please don't sue me, the company I work for, my internet service provider, or Wikipedia for my personal opinion. --JJLatWiki 19:20, 6 November 2006 (UTC)

Why aren't they suing software manufacturers for misrepresenting the size of files? :-) Or maybe the hard drive manufacturers should sue Microsoft for getting them into a lawsuit by reporting the sizes of their drives wrong. — Omegatron 21:00, 6 November 2006 (UTC)
ZACKLY! Microsoft needlessly sucks up more of my drive space than anything else and they're the ones giving consumers the wrong capacity. But I would really like to file a class action lawsuit against the lawyers and the plaintiffs who brought these suits. Because of them, the cost of all hard drives, MP3 players, and flash drives is higher than it would have been. Other than the lawyers getting millions of dollars, and the plaintiffs getting their $1000, no one was helped by any of these lawsuits. (all entries made by me are my own personal opinion, I am not stating any facts or making assertions of any kind) --JJLatWiki 22:17, 6 November 2006 (UTC)

question

Is there even, in the entire internet, or all the computers in the world, one yottabyte of storage overall? In bandwidth it may be possible, but in hard disk storage? —Preceding unsigned comment added by 64.26.148.160 (talkcontribs) 21:55, 17 December 2006

Windows File Sizes

Under Windows' File Properties window, it uses the ambiguous "KB"; is that referring to 1000 or 1024? --Wulf 06:14, 21 December 2006 (UTC)

1024 Plugwash 12:10, 21 December 2006 (UTC)
I don't know for sure, but I bet it is KiB, that is, 1024. Why don't you make up several text files of varying lengths and see how they are reported, say 999, 1001, 1023, and 1025 bytes? Tom94022 18:28, 6 January 2007 (UTC)
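For anyone who wants to try that experiment, here is a rough sketch (the file names are arbitrary) that writes test files of exactly those sizes, whose reported sizes can then be checked in the Properties dialog:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t sizes[] = { 999, 1001, 1023, 1025 };
    char name[32];

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        snprintf(name, sizeof name, "test_%zu.txt", sizes[i]);
        FILE *f = fopen(name, "wb");
        if (!f) { perror(name); return EXIT_FAILURE; }
        for (size_t j = 0; j < sizes[i]; j++)
            fputc('x', f);                 /* fill with a known byte count */
        fclose(f);
        printf("wrote %s (%zu bytes)\n", name, sizes[i]);
    }
    return 0;   /* then check each file's reported size in Explorer */
}
```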
Outside of Wikipedia and wikipedia-based sources, there is no confusion or ambiguity on this point. Any reference in Windows to "KB" or "KiloBytes" in the Microsoft Windows operating system indicates 1,024 Bytes. (I submit to you that the wikipedia articles, as they are now, are a greater source of confusion and ambiguity than the industry standard is.) -Libertas 19:59, 7 January 2007 (UTC)
There is plenty of real customer confusion - why do most HDD manufacturer websites (e.g. Seagate) have a FAQ explaining this problem? Since at least Windows 3 (probably Windows 1) Microsoft has consistently and without prominent explanation used the SI system of prefixes in a manner that is incorrect according to the published industry standard for kilo, etc. On the other hand, most of the HDD industry has consistently used the SI system as specified since it was standardized in the early 1960's. Microsoft has the perfect right to define its own system of units, but because they are virtually identical to the industry standard I submit Microsoft is obligated to prominently display its deviant definition, say by putting information lines, e.g.
1 GB = 1,073,741,824 Bytes
in appropriate displays. Microsoft's failure to prominently display such information has caused real customer confusion, such as when things don't fit or HDD's are not reported at their specified size. Furthermore, this has resulted in several lawsuits. I don't think Wikipedia even comes close on any confusion scale. Tom94022 02:38, 8 January 2007 (UTC)
Screenshot of a Windows file Properties dialog; note the number of bytes in parentheses.
Tom94022 - Please note that the "point" I was referring to was the question about the "Windows' File Properties window"; in other words, in the Windows operating system (as well as other operating systems, like Mac OS), the values have always been consistently expressed. Likewise, virtually all code written since the "early 1960's" has used the industry standard values; which came first, the software or the hard drive manufacturers? ;-) To answer your question, though, the HDD manufacturers have FAQs because they are the ones who are confusing people (note that the HDD makers get sued for misrepresenting the values, not the software companies) :-)
As for your comment "Microsoft is obligated to prominently display its ... definition": please see the Properties window of any file or filesystem (see image to right). Regards, -Libertas 11:47, 8 January 2007 (UTC)
Liberty Miller, you keep referring to an "industry standard". To which industry are you referring? The general purpose operating system industry? Or the spinning magnetic disk industry? But I agree with you that all present-day OS's and their forefathers have consistently expressed file sizes using the inaccurate and technically incorrect definition of kilo, mega, giga, etc. But because of that, it is they who are the source of the confusion. Prior to spinning disks, there were streaming tapes, and even paper punch cards. Such storage media had capacities expressed with decimal prefixes, and the expression of those capacities could predate the software reporting of the capacity.
You also said, "the HDD manufacturers have FAQs because they are the ones who are confusing people (note that the HDD makers get sued for misrepresenting the values, not the software companies)". That is a logical fallacy. Wheelbarrow manufacturers "get sued" when some moron hurts someone because he strapped his wheelbarrow to his truck and used it as a trailer. Plastic bag manufacturers "get sued" when someone's child suffocates in their plastic bag. Ford "gets sued" when someone gets hurt or killed because an uninsured drunk driver runs a red light at 90 MPH and T-bones some innocent family in their Ford Escape. Using lawsuits to decide on whom to blame the binary prefix confusion is worse than those examples, and not just because they ARE frivolous and no one got harmed in ANY way. Try finding a lawyer willing to sue Microsoft (which would be truly entertaining) because its OS misrepresents the true capacity of a new hard drive, not to mention the amount of drive space it pirates and doesn't report for things other than the storage of the actual data. If the industries reversed their respective definitions of Kilo and Giga, do you think anyone would sue the OS guys for making it seem as if a 10MB file only used 9MB of disk space? --JJLatWiki 17:10, 8 January 2007 (UTC)
You say "all present day OS's and their forefathers have consistently expressed file sizes using the inaccurate and technically incorrect definition of kilo, mega, giga, etc." FWIW, I'm pretty sure DOS DIR never used prefixes at all and they were also not in the original FDISK's. Likewise, I am pretty sure that many if not all UNIX variants have switches on commands which allow you to get capacity in decimal or binary strings, but I also think these are without prefixes. Furthermore, at the OS level I believe all you get is a string of digits (binary in most cases, decimal if switched), so it is when the OS-reported capacity is converted to a GUI representation that the confusion occurs. To me it feels like sloppy programming - sort of like the meters-to-feet problem that caused a Mars probe to crash. Tom94022 17:34, 8 January 2007 (UTC)
I concede that the MS DOS FDISK and DIR commands probably never used prefixes, but I don't think that is a sufficient test to disqualify my statement (on which I would not have bet money). In the Windows lineage, other commands have reported sizes and capacities like the CHKDSK command. I only have bootable floppies going back to a DOS 3.x, and I have no interest in climbing through boxes of crap to verify it. Since the need to report KB diminishes as you go back in time, I'll change my earlier posit slightly to say that WHEN they used prefixes, all present day OS's and their forefathers used the inaccurate and technically incorrect definition for those prefixes. I qualify my definition of "OS" to include all the standard OS utilities like CHKDSK, SMARTDRV, SCANDISK, du, df, CATALOG, and the GUI that the OS publisher created for that OS, which obviously opens a large gray area, especially for the various Unixii. --JJLatWiki 17:17, 9 January 2007 (UTC)
Some of the later FDISK's did include an information line (1M = 1,048,576 bytes) but I think you will have to concede many CHKDSK's and SCANDISK's did not use binary prefixes. For intellectual curiosity, a while ago I went through various DOS's (MS & PC) on this very subject, and I think I found most DOS command line utilities displayed decimal bytes without commas or prefixes. It was painful, so I don't want to repeat the experience, but I did pull out my trusty MS-DOS Encyclopedia, (c) 1988 by Microsoft, and indeed the CHKDSK command reports in decimal digits without commas or prefixes; this corresponds to MS/PC DOS 3.3. I also ran SCANDISK in Win2k and got a decimal-digits-without-commas presentation. Note that both Int 13H (hex) function 08H, BIOS "Get Current Drive Parameters", and Int 21H function 1CH, System "Get Drive Data", return disk capacity as the product of several binary strings, so some programmer had to convert binary strings into decimal numbers in order to generate the CHKDSK and other displays at the DOS level. Why they then chose to divide by binary numbers rather than just shift the decimal point is beyond me :-) Tom94022 20:26, 9 January 2007 (UTC)
Ahah. Maybe the blame goes a little deeper. Who were the BIOS writers who created the Int 13H/08H call and why did they output binary strings? Who decided on 512 byte sectors and why? It was probably simpler and less computationally expensive to use 512 or 1024, and binary in general, instead of 500 or 1000 because of the nature of binary processors. And maybe the conversion to Ki (bit or byte) took place at an earlier point in the execution sequence, before the conversion to decimal. And no one was ever interested in correcting those first mistakes for fear of causing some serious cascade of subsequent required changes. "Why fix it if it ain't broken?" was probably spoken proportionally as often back then as it is today. --JJLatWiki 21:03, 9 January 2007 (UTC)
It is likely that the decisions were made by Xebec controller firmware engineers and Microsoft BIOS / device driver firmware engineers, all under contract to and direction of IBM for the PC/XT (I seem to recall that the IBM PC/XT BIOS was written by Microsoft, not IBM). I suspect most of this programming was done in assembly language, so Hex notation made sense. Sectors sized in binary increments of bytes also make sense for the same reason; a 512 byte sector size was probably, in someone's estimation, the Goldilocks number: not too big (1024) because of truncation of small files, not too small (256) because of sector overhead, just right :-). BUT, at the DOS level, for the most part Microsoft went through the process of calculating disk size by doing a series of hex computations AND then translating the result into decimal units - that's what you see in CHKDSK, DIR, etc. No prefixes, no problems (other than the lack of commas made reading hard). However, with the advent of the Windows GUI, someone decided to take the same calculation of drive capacity, move the binary point (a shift operation) and then convert to decimal for display, using decimal prefixes to account for the shift. This is mathematically improper and the source of the confusion. The whole idea of SI was to eliminate the conversion factors that are required when you have non-integrated systems of units, but that's what these guys created when they did this. The only possible reason is that shifting is faster (and easier) than dividing, but we are talking about fairly high-level programming by the time we get to Windows GUI's - IMHO, there is simply no reason, other than sloppy programming, why they couldn't have converted the drive capacity calculated in hex to decimal and then divided by the proper SI values to move the decimal point. I suspect they were oblivious to the problem they were creating, simply thinking that no one would care about the difference between 1,000,000 and 1,048,576 - it's only 4%. Tom94022 20:36, 11 January 2007 (UTC)
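To make the shift-versus-divide point above concrete, here is a hedged sketch (the CHS geometry is invented purely for illustration) that computes a capacity as cylinders × heads × sectors × 512 and then renders it both with proper SI divisors and with the 1024-based divisors to which the GUIs attach SI symbols:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Invented geometry, roughly in the range of an early IDE drive. */
    uint64_t cylinders = 4092, heads = 16, sectors = 63, sector_size = 512;
    uint64_t bytes = cylinders * heads * sectors * sector_size;

    printf("raw capacity:      %llu bytes\n", (unsigned long long)bytes);

    /* SI style: divide by powers of ten, as the drive label does. */
    printf("decimal prefixes:  %.2f MB / %.2f GB\n", bytes / 1e6, bytes / 1e9);

    /* GUI style: divide by powers of 1024 but keep the SI symbols,
       which is the mixed usage being complained about above. */
    printf("binary \"MB\"/\"GB\":  %.2f MB / %.2f GB\n",
           bytes / 1048576.0, bytes / 1073741824.0);
    return 0;
}
```

The two renderings of the same byte count differ by the few-percent gap discussed above, which grows from about 5% at the megabyte level to about 7% at the gigabyte level.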
Libertas - The HDD manufacturers have consistently and properly used the SI prefixes since they were defined in the 1960's, well before Microsoft and Apple even existed. Furthermore, as near as I can tell the originally dominant software company, IBM, correctly used SI prefixes in describing its DASD (the IBM term for what we now call HDD). Ditto for Digital HDD's (they may have fallen from grace with some early floppy products). Finally, Apple bought its disk drives from disk drive companies using SI prefixes in their specifications and contracts (still do :-). I bet Microsoft does the same in its game line. Therefore, I suggest:
1) Microsoft and Apple are responsible for the confusion, and
2) because they caused the problem, they have an obligation to inform the consumer of their deviant usage.
You and I know the problem caused by mixing unit systems; the casual user does not. Therefore the screens cited are not sufficient to stop the confusion. The only way for them to stop this is to include a statement something like:
1 GB = 1,073,741,824 Bytes not 1,000,000,000 Bytes
every place they use KB, MB or GB in a deviant manner. BTW, Microsoft File Properties has NOT always displayed both the full decimal value along with its deviant prefix, see, e.g. Windows 3.11. Tom94022 17:23, 8 January 2007 (UTC)
Hear, hear! --JJLatWiki 17:17, 9 January 2007 (UTC)
No! The "1 GB = 1,073,741,824 Bytes not 1,000,000,000 Bytes" statement sounds like "Don't listen to others, we tell you the truth!". Some people might think, "Oh, according to the writing on my HDD box it should be 1000000000 Bytes, but if Microsoft says it isn't, I should believe them." I think that they should write "We don't care about the SI standard, which says that 1 GB is 1000000000 Bytes. Instead we like to define it as the unmemorable number of 1073741824 Bytes". But if they wrote that they could as well just use the standard...

Once More On HDD

An anonymous user added text regarding HDD manufacturer usage of hybrid GB prefixes. I added a {{Fact}} because I am not sure of any such current usage. Can anyone cite such current usage? If not, I'm likely to revert this. Tom94022 18:28, 6 January 2007 (UTC)
I did the fact checking and was unable to find hybrid usage by any of the significant HDD manufacturers, so I rewrote the HDD portion of the article, including eliminating the hybrid usage comment. Tom94022 19:50, 6 January 2007 (UTC)

0x prefixes before base 16 numbers

This prefix should probably be omitted, because it may confuse some people who don't know C/C++ but know math. --Yonkie 16:26, 20 January 2007 (UTC)
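For readers who know math but not C/C++, a tiny illustrative example of the notation under discussion: the 0x prefix marks a base-16 (hexadecimal) literal, under which the common binary-prefix values read as round numbers.

```c
#include <stdio.h>

int main(void)
{
    /* 0x introduces a hexadecimal (base-16) literal in C and C++. */
    printf("0x400      = %d (1 KiB)\n", 0x400);       /* 1024          */
    printf("0x100000   = %d (1 MiB)\n", 0x100000);    /* 1,048,576     */
    printf("0x40000000 = %d (1 GiB)\n", 0x40000000);  /* 1,073,741,824 */
    return 0;
}
```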