Wikipedia talk:Manual of Style/Dates and numbers/Archive 88

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Unit symbols are never italic

It will be damn hard to get the articles to comply with basic rules such as italicizing variables, and having mathematical operators (such as ln, cos, etc.) upright and unit symbols upright, if the Manual of style doesn't follow the basic rules.

For example, here is NIST Special Publication 811, Guide for the Use of the International System of Units (SI), 1999, section 6.1.1:[1]

6.1.1. Unit symbols are printed in roman (upright) type regardless of the type used in the surrounding text.

Gene Nygaard 16:30, 12 October 2007 (UTC)

I would support the adoption of this (SI) rule. It is simple to follow and avoids unnecessary ambiguities. Thunderbird2 18:59, 12 October 2007 (UTC)

Strongly oppose; it makes utter nonsense of italicized material to anyone who is not intimately familiar with the SI style, especially if the non-italicized item comes at the end of the italicized passage. Look in WP:MOSNUM edit history for Gene Nygaard's recent versions where he changed all of the italicized examples to suit this curious style; the result was confusing and essentially unreadable, losing the syntactic functionality of italicization in the first place. See the archives here for extensive debates on various topics that always end up resolved as "WP is not SI, and we don't have to do everything that SI recommends; we are writing a general-audience encyclopedia, not a paper for Nature." Using the SI style would also in many cases necessarily entail altering quoted material, which we studiously avoid. Lastly, it just violates the basic logic of italicization of passages, which follows that of quotation, parentheticalization, etc., of passages - if it belongs inside, it goes inside, and if it doesn't, it doesn't. — SMcCandlish [talk] [cont] ‹(-¿-)› 19:30, 14 October 2007 (UTC)

PS: The passage Gene quotes is not from SI, it is from NIST, and we don't know whether NIST (a strange organization I won't get into) got this from SI documents or simply made it up for their own internal purposes. And the recommendation is not mirrored elsewhere, e.g. in the Chicago Manual of Style. — SMcCandlish [talk] [cont] ‹(-¿-)› 19:33, 14 October 2007 (UTC)

A quibble. Nothing is "from SI". SI is a system of units. Not a person, not even in the corporate person sense. Various organizations are involved in its development, such as the CGPM (which deals with the fundamental issues and makes a framework fleshed out by others), CIPM, BIPM, ISO, NIST, NPL, ASTM International, IEC, etc. Gene Nygaard 05:43, 15 October 2007 (UTC)

Right; I should have said CGPM, which appears to be the SI standardization body/process. Doesn't affect the overall point I'm trying to get across. WP isn't CGPM, and isn't bound to do what CGPM recommends for its target audience, which is not our target audience. — SMcCandlish [talk] [cont] ‹(-¿-)› 05:52, 15 October 2007 (UTC)

The CGPM deals, as I said, with the essential framework; the details of everyday use are fleshed out by various international, professional, and national standards organizations. Especially in the case of a rule that is not SI-specific, but rather borrowed from the general rules of typography, the CGPM isn't going to bother with it. Furthermore, when the CGPM does decide something, it is often couched in legalese. It didn't tell us, for example, to "stop using the term micron to refer to the micrometre". Rather, it "abrogated" (the French term for that is actually the official one) its decision of such-and-such date which included a definition of the micron. Gene Nygaard 15:48, 15 October 2007 (UTC)

And when I make statements about the non-existence of something, I often turn out to be wrong. In fact, the CGPM did address it, way back twelve years before the International System of Units even existed, Resolution 7 of the 9th CGPM (1948):[2]

Principles
Roman (upright) type, in general lower case, is used for symbols of units; if, however, the symbols are derived from proper names, capital roman type is used. These symbols are not followed by a full stop.

Gene Nygaard 23:33, 15 October 2007 (UTC)

It isn't "SI" style. It is basic notation general in scientific literature. It doesn't make a damn bit of difference if it is only ft and lb or whatever. Unit symbols are upright. Variables are italic, or bold or with an arrow on top for vectors. Mathematical operators such as log and sin are upright, not italic. Gene Nygaard 22:07, 14 October 2007 (UTC)

Here, for example, is the AIP Style Manual, 4th ed. 1990:

2. Roman versus italic type
. . .
(3) Some latin letters, considered abbreviations of words, are properly roman instead of italic—for example, chemical symbols (O, Ne), most multiletter abbreviations (fcc, ESR, exp, sin) and most units of measure (K, Hz). But the editorial staff of the journal is trained to spot these, and authors need not mark them for roman type unless confusion is especially likely:

Gene Nygaard 22:30, 14 October 2007 (UTC)

It is supported by, but not totally unambiguous in, U.S. Government Printing Office Style Manual, section 11.12:[3]

11.12. All letters (caps, small caps, lowercase, superiors, and inferiors) used as symbols are italicized. In italic matter roman letters are used. Chemical symbols (even in italic matter) and certain other standardized symbols are set in roman.

Gene Nygaard 22:43, 14 October 2007 (UTC)

Furthermore, if SMcCandlish doesn't know enough to understand that the NIST SP 811 is a widely recognized authority in this regard, he need look no further than the BIPM current 8th revision of its SI brochure in section 4.2:[4]

There are many more non-SI units, which are too numerous to list here, which are either of historical interest, or are still used but only in specialized fields (for example, the barrel of oil) or in particular countries (the inch, foot, and yard). The CIPM can see no case for continuing to use these units in modern scientific and technical work. However, it is clearly a matter of importance to be able to recall the relation of these units to the corresponding SI units, and this will continue to be true for many years. The CIPM has therefore decided to compile a list of the conversion factors to the SI for such units and to make this available on the BIPM website at

www.bipm.org/en/si/si_brochure/chapter4/conversion_factors.html.

[end of quote] Then from that page, click on that last offset link. Then note carefully the URL on the page that link takes you to. It isn't in fact on the bipm.org site. Rather, it takes you to nothing other than the appendix of that very same NIST SP811, whose veracity SMcCandlish has had the temerity to challenge. If you don't want to go to the trouble to go to the BIPM site first to click on the link, here is that same URL made into a link you can access here, intentionally left visible in its full form, and pay attention to where it takes you: http://www.bipm.org/en/si/si_brochure/chapter4/conversion_factors.html

Now, while I have repeatedly emphasized above that this rule is not anything specific to the SI, what do you suppose the BIPM has to say about the italic symbols? Ibid., section 3.1, table 5:[5]

Prefix symbols are printed in roman (upright) type, as are unit symbols, regardless of the type used in the surrounding text, and are attached to unit symbols without a space between the prefix symbol and the unit symbol.

In other words, the BIPM if anything is even more straightforward and direct than NIST is about this. Gene Nygaard 01:32, 15 October 2007 (UTC)

Please do not engage in straw man arguments with me. I did not challenge the veracity of NIST SP811, as you allege; I challenge its relevance here. (I also noted that the source of NIST's "always upright" recommendation wasn't clear from what you'd written at the time, and as a side point I noted that I'm not particularly impressed with NIST as an organization.) — SMcCandlish [talk] [cont] ‹(-¿-)› 06:31, 15 October 2007 (UTC)

To go along with SMcCandlish's bashing of the U.S. national standards laboratory, maybe she/he would like to take on the national standards laboratory of the UK as well. Here is the National Physical Laboratory, UK page on "SI Conventions":[6]

"Unit symbols are in roman type, and quantity symbols are in italic type with superscripts and subscripts in roman or italic type as appropriate."

N.B. While the NPL does include this short list of conventions on their website, note who they defer to as the experts on the subject at the bottom of the page:

NIST have produced a complete 80+ page document covering all aspects of this which can be downloaded at: http://physics.nist.gov/cuu/Units/rules.html

Gene Nygaard 02:12, 15 October 2007 (UTC)

Gene, you can cite this stuff for 100 years, but the response will remain the same: Wikipedia is not NIST or SI or BIPM or AIP, it is Wikipedia. WP is not bound to use the style preferred by physics PhDs just because they are physics PhDs and they prefer it. Please note that this page has almost 100 archives. Don't you think that this issue has come up before? It's been proposed and rejected here before that SI recommendations be followed to the letter, that the CMOS recommendations be followed to the letter, and so on, and there isn't a compelling reason to do so in any case. This is a general-public work, and we can't adopt every style convention preferred by the science community (but no one else); MOS has adopted as many of them as we collectively think the readership can handle, generally with very specific rationales for every such adoption, and not without opposition even for some of those baseline adoptions. Likewise, we generally do what the CMOS says, modulo conflicts it has with other general-usage style guides, but happily abandon it where it does not make sense for the Wikipedia medium and context.

One major issue I have with the style you are advocating here is that it was devised for a visual, typographic medium, and does not mesh at all with digital semantic markup. It's just fine and dandy for books and PDF files, and jars radically with XMLish thinking. If you can't see what is conceptually off-kilter, with regard to this medium, with specifying that examples (for example) are italicized and then contradictorily saying one should write examples with units in forms like for example ''a width of 4.2'' cm, not ''a width of 4.2''cm, then I really don't know what else to say to you. This is not NISTopedia. PS: I'm sorry I gored your sacred cow in being unimpressed with NIST (for reasons that honestly don't relate to this discussion at all, so my dig at them was a non sequitur, I admit), but boosting their support of this "always upright" style by citing other NIST-like sources is missing the point entirely; there could be 100 such sources and it doesn't affect the issue here. — SMcCandlish [talk] [cont] ‹(-¿-)› 05:45, 15 October 2007 (UTC)

One possible compromise on this would be creating a template, for use only in hard-core science articles where this convention could perhaps arguably be reasonably insisted upon, called {{itunit}} for units in italics, the entire transcluded source code of which would read <span style="font-style:normal;">{{{1}}}</span>, used as in for example ''a width of 4.2 {{itunit|cm}}'', not ''a width of 4.2{{itunit|cm}}''. This would preserve the semantic integrity of the italics (which, remember, are an XHTML construct by the time they reach the reader's browser, enclosing the italicized phrase/passage); note the placement of the italics, which differs from the similar example given above. — SMcCandlish [talk] [cont] ‹(-¿-)› 06:24, 15 October 2007 (UTC)
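As a rough illustration of what the proposed template would do by the time the page reaches the browser, here is a minimal Python sketch. It is not MediaWiki's parser: the `render` function and both regular expressions are assumptions made up for this example, and they handle only the two constructs discussed above (paired `''` italics and a one-argument `{{itunit|...}}` call).

```python
import re

# Hypothetical stand-ins for the two pieces of markup under discussion.
ITUNIT = re.compile(r"\{\{itunit\|([^{}|]+)\}\}")  # {{itunit|cm}}
ITALIC = re.compile(r"''(.+?)''")                  # ''italic passage''


def render(wikitext: str) -> str:
    """Expand {{itunit|X}} to an upright span, then turn ''...'' into <i>...</i>."""
    expanded = ITUNIT.sub(r'<span style="font-style:normal;">\1</span>', wikitext)
    return ITALIC.sub(r"<i>\1</i>", expanded)


print(render("''a width of 4.2 {{itunit|cm}}''"))
```

The point of the sketch is that the unit stays inside the `<i>` element, so the italic passage remains one semantic unit, while the inner span only overrides the visual style.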

I believe it is wrong to apply a significantly lower (or different) standard to articles in Wikipedia than those in Nature, except where it is about required existing knowledge. Every WP reader can be expected to understand basic units and symbols, which encompasses most if not all of the SI.

One of the major pros of the International System is the strict and simple style guide it comes with. This is most useful if followed by everyone – the goal of standards is consistency, and consistency is good. The particular case of italics, however, should not be interpreted as strict as, for instance, the composition of the symbols themselves. In mathematic text, formulas and the like, their font style should be selected the same as for the symbols of trigonometric functions etc., i.e. upright, whereas variables are usually italic. In normal text, on the other hand, they should, in my opinion, not be treated special; this includes examples and quotations. Christoph Päper 12:58, 28 October 2007 (UTC)

The simple style guides don't make such an exception.

The "do as I do" principle will make it impossible to enforce those guidelines elsewhere, if our own Manual of Style doesn't follow them.

We can achieve our desired emphasis in a number of ways which will not entail violating these rules. Some possibilities include:

Using quotes rather than italics
Setting examples off from the text (bulleted or otherwise)
Reevaluating the way we use emphasis in the manual:
- Using emphasis on Examples:
- Using emphasis on instructions such as "use this, not that", X is preferred, but Y is acceptable
A combination of both offset examples and italicized instructions, as in Wikipedia:Manual of Style#Inside or outside.

There are other areas besides units of measures where this would apply. If anyone put a title of a book or a binomial name of the genus and species of a plant in one of our italicized examples, that title or scientific name would be set upright by some of our editors in pretty quick order, I would bet. But if a symbol for a variable is used in the MoS, it would be set in italics, whether it appeared in normal text or in italic text. The rules are different in the various cases: 1) italic in normal text, upright in italic text 2) always italic, or 3) always upright. Gene Nygaard 13:27, 28 October 2007 (UTC)

Whatever we actually do in our MoS becomes a rule by example. Gene Nygaard 14:08, 28 October 2007 (UTC)

But the manual of style is, like all other guidelines, a codification of what is already done at large on the encyclopedia (it is intended mainly so that new articles are not wildly different from existing ones, and to advise where practice outside Wikipedia might follow conflicting styles). As the vast majority of articles use upright characters, changing it before practice changes is simply pointless and won't work, and isn't the way it is supposed to work anyway. Circeus 03:55, 29 October 2007 (UTC)

Circeus, I don't entirely agree with your take on MOS. It's influenced by a number of factors, one of which may be what is already done on WP (that comes especially into play when there are back-compatibility issues), what the experts here think is best for an online encyclopedia, with all of its unique conditions (and gain consensus for), and what is done "out there". I think you're oversimplifying these factors. Tony (talk) 14:25, 29 October 2007 (UTC)

That doesn't mean the point I make (virtually no article uses italics unless the unit is supposed to) isn't an important element to take into account. Primum non nocere: I utterly fail to see why a sudden switch to italics can do anything but cause widespread controversy. Circeus 21:24, 29 October 2007 (UTC)

Geez. It seems clear to me that, with the context "Unit symbols are upright. Variables are italic, or bold or with an arrow on top for vectors. Mathematical operators such as log and sin are upright, not italic.", this is intended to apply _within formulas_ - i.e. of course, unit symbols should be upright, for the same reason digits are upright. But in running text, italic doesn't mean the same as it means within a formula. —Random832 16:51, 2 November 2007 (UTC)

In other words, in saying 1) italic in normal text, upright in italic text 2) always italic, or 3) always upright. we ignore the trivial case: 0) upright in normal text, italic in italic text. (even if the "italic text" consists solely of the unit symbol itself, which is italicized for emphasis) —Random832 16:54, 2 November 2007 (UTC)

I have to entirely agree with Random832 that this "always upright for units" thing applies to formulas, not general prose. Regardless, the {{itunit}} template I've provided solves this problem from both Gene's and my perspective: He gets his forced non-italicization of units when they are part of an italicized unit, and I get my semantic integrity of italicized unit. Everybody wins! — SMcCandlish [talk] [cont] ‹(-¿-)› 16:05, 24 November 2007 (UTC)

Date-formatting tool for lists of people

I have developed an external-editor-based tool for standardizing dates of birth and death for lists of people. An example of its successful use is List of jazz musicians. It also flags entries where the dates are exceptionally peculiar, as (b. ce. 1822) [I'm still trying to figure out what that one means], so I can try to edit those by hand. The jazz list took 5 minutes total, including the manual fiddling. Since it relies on cut-and-paste, some special characters get clobbered, especially where the names are Japanese, Scandinavian, Polish or Turkish, so I select lists where these are few and I can restore them by hand. I invite opinions, suggestions and requests. I watch this page, or you can dump them on my talk page. Chris the speller 01:27, 14 October 2007 (UTC)
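The flagging step described above could look roughly like the following Python sketch. This is not Chris the speller's tool, whose internals are not described here; the accepted date forms, the `RECOGNISED` pattern, and the `needs_manual_review` helper are all assumptions chosen for illustration.

```python
import re

# Assumed "normal" forms: (1901–1971), (1901-71), (b. 1950), (d. ca. 1822).
# Anything else is treated as peculiar and left for hand editing.
RECOGNISED = re.compile(
    r"""^\(
        (?:
            (?:ca\.\s)?\d{3,4}\s?[–-]\s?(?:ca\.\s)?\d{2,4}   # year range
          | (?:b|d)\.\s(?:ca\.\s)?\d{3,4}                    # birth or death year only
        )
        \)$""",
    re.VERBOSE,
)


def needs_manual_review(date_text: str) -> bool:
    """Return True when a parenthesised date does not match an assumed standard form."""
    return RECOGNISED.match(date_text.strip()) is None


for sample in ["(1901–1971)", "(b. ca. 1822)", "(b. ce. 1822)"]:
    print(sample, needs_manual_review(sample))
```

Under these assumptions an entry such as (b. ce. 1822) falls through every recognised form and is flagged rather than silently rewritten.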

Looks good, Chris, except that it would be swimmingly good to make the closing year two digits alone (Louis Armstrong (1901–71) rather than Louis Armstrong (1901–1971)). But is that going to be complicated programming to chart a course around changed centuries, and to make it CE only? Tony (talk) 03:40, 14 October 2007 (UTC)

The "b. ce. 1822" thing: Typo for "b. ca. 1822", surely. — SMcCandlish [talk] [cont] ‹(-¿-)› 19:24, 14 October 2007 (UTC)

One might think so, but there were two separate entries like this in List of preachers; one could not be easily traced (no article, and searching the Web did not provide an easy answer), and the other led to an article which had an exact year, so who knows what was meant? I left them alone, go have fun.

It could also, of course, be born Christian Era/Common Era 1822, with no indication of uncertainty. As anybody who was around for the BC/BCE debates should be well aware of. Gene Nygaard 22:54, 14 October 2007 (UTC)

That doesn't resolve the uncertainty at all, simply shifts it. We'd still have to convert to "ca." because we do not know, short of tracking down the editor responsible for it and asking, whether it meant "Christian Era", "Common Era" or "circa"; thus "ca." would still work just fine pending someone else coming along later with a reliable source showing that the exact year was meant. I find it hard to credit that any editor here would have put "ce. 1822" when they really meant "1822 Christian Era" because they would really write "1822 A.D." or "1822 AD"; meanwhile people who use Common Era know that it goes at the end and is not rendered "ce." So, this is a non-issue - "ca. 1822" includes "exactly 1822" as a subset, and we cannot be certain that "exactly 1822" was intended. — SMcCandlish [talk] [cont] ‹(-¿-)› 05:59, 15 October 2007 (UTC)

Two-digit years

The 2-digit year thing would be a snap, but I don't favor that format as much in lists as I do in running prose ("coached the Detroit Lions 1943–47 before moving to Bora Bora"). Before Tony's huge overhaul, I think the guideline more or less tolerated 2-digit years at the end of a range, but now it seems to downright favor 2 digits over 4, where allowable. Have I misremembered it? Is this now a case of clear consensus? I think it would be easier to scan a list and find a person who died in 1956 if the list did not look like this:

Herman Dwizzlebot (1811–56)
Elmer Fringthorpe (1798–1856)
Marvin Krummblatt (1922–56)

But if there is clear agreement that 2-digit years are preferred, even in lists, I can easily handle it. Chris the speller 20:52, 14 October 2007 (UTC)
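For readers wondering how the century check raised above might be handled, here is a minimal Python sketch. The function name `format_year_range` and its `abbreviate` switch are hypothetical, and the rule it encodes (abbreviate only when both four-digit CE years share their first two digits) is one reading of the proposal, not settled guidance.

```python
def format_year_range(start: int, end: int, abbreviate: bool = True) -> str:
    """Format a CE year range, shortening the closing year only within one century."""
    if start < 1 or end < start:
        raise ValueError("expected CE years with start <= end")
    same_leading_digits = start // 100 == end // 100
    if abbreviate and start >= 1000 and same_leading_digits:
        # Keep two digits so 1901-1905 becomes 1901–05, not 1901–5.
        return f"{start}–{end % 100:02d}"
    return f"{start}–{end}"


for years in [(1811, 1856), (1798, 1856), (1922, 1956)]:
    print(format_year_range(*years))
```

Passing `abbreviate=False` gives the uniform four-digit style that several editors above prefer for lists of lifespans.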

Abbreviating the years for some people would be ludicrous in any list where some of the people span centuries and could not be abbreviated. And the MoS doesn't require doing so now, and I'm glad of that. As it is, I strongly suspect that the current project page, perhaps because of some of the recent changes in this area, goes far beyond what is actually agreed upon as far as using two-digit years. Gene Nygaard 22:54, 14 October 2007 (UTC)

Concur with Gene. And to me it simply looks lazy and imprecise. I never (that I know of) use "1983-87" but "1983-1987". — SMcCandlish [talk] [cont] ‹(-¿-)› 06:05, 15 October 2007 (UTC)

To me, it is so much easier to read when only two digits are used, and as a reading psychologist, I'm pretty sure that 1983–1987 makes the readers work harder to determine that it's a five-year range than 1983–87. The two-digit closing year also more clearly indicates that the range lies within a single century. Since there is no clear vertical alignment in a list—the names are all of different lengths—there's no issue of neatness/alignment in using both nine- and seven-character ranges, as you see above (Fringthorpe). Repeating the century (18) helps to focus my eyes on the fact that the range occurs in two centuries. I think there was consensus for the wording on two-digit closing years at the time. I'd prefer that it was extended to include page ranges, which are ridiculous when big closing numbers are rendered in full: 12686–12689 versus 12686–89. The former is not uncommon in reference lists of scholarly/scientific journal articles and conference proceedings, and does no one a favour. Tony (talk) 06:58, 15 October 2007 (UTC)

Call me crazy, but to me it's much harder to read ranges such as 1983–87. My thought process goes something like this: "1983 to 87? Hm, that's a range of almost -2000! Wait, this is abbreviated... OK, let's see, the first number starts with 19; 19 times 100 plus 87 is 1987. What was the original number again? Oh, 1983. Ok, they really meant 1983-1987". Sure, it takes only a moment to go through this process, but the momentary confusion it causes makes me severely uncomfortable. I never use abbreviated ranges like this. The same goes for page numbers even if they have five digits. --Itub 14:05, 17 October 2007 (UTC)

Um ... ok, you're crazy. Tony (talk) 14:30, 17 October 2007 (UTC)

Crazy doesn't quite fit Tony, however—just a little strange, perhaps weird, not processing information the way others do. Most people aren't going to see his span as five years in any case, whether it is written as 1983-1987 or 1983-87. They'll just subtract the numbers and get four years. Furthermore, while if you are talking about spans covering the complete years (or including seasons in a sport and the like), it is indeed the five separate years 1983, 1984, 1985, 1986, and 1987, when it comes to lifespans most people's intuitive subtraction is more correct. In that case, it can vary from just over 3 years to just under 5 years (3 yr 1 d to 4 yr 365 d), and will average out at about four years. Gene Nygaard 14:43, 17 October 2007 (UTC)

After processing dozens and dozens of lists, I am slightly surprised that I have not seen a single case where an editor had compressed the date of death to 2 digits (out of respect for the deceased, perhaps?), so I think that for all intents and purposes the question is answered; out of probably hundreds of editors, nobody is using 2 digits for death years in lists, as Tony recommends. But I wholeheartedly agree with his feelings about huge numbers in page ranges. Chris the speller 01:21, 17 October 2007 (UTC)

I agree. I think there's an unconscious formal standard when addressing lifespans. After all, those were two of the most significant years in any person's life. Outside of that, I think two-digit "end" years are fine, if slightly informal. Askari Mark (Talk) 02:25, 17 October 2007 (UTC)

Bits and bytes

The MOSNUM page includes guidelines for use of the byte and its multiples kilobyte, megabyte etc. Isn't there a similar need for guidance with the bit? For example, is a kilobit to be abbreviated kb or kbit? (The kilobit article contradicts itself in this respect, so I suspect the problem is widespread). Thunderbird2 10:42, 14 October 2007 (UTC)

I would say "yes". — SMcCandlish [talk] [cont] ‹(-¿-)› 19:23, 14 October 2007 (UTC)

According to IEEE 1541 the IEC symbol for the bit is b, so presumably a kilobit is kb, a megabit is Mb and so on. Have I got that right? Thunderbird2 19:36, 14 October 2007 (UTC)

That sounds right to me, but we can probably find an actual source on this. PS: Can someone refresh my memory on the reason for the "k" (lower-case) vs. "M" (upper-case) inconsistency? I suspect this is another of those SI things that is only used in science writing; I certainly detect no such convention in general usage (e.g. computer magazines and such, not that any of them mention kilo-anything very often these days; even RAM is measured in gigabytes now). — SMcCandlish [talk] [cont] ‹(-¿-)› 06:13, 15 October 2007 (UTC)

The reason for the lower case k, as you rightly guess, is the SI. The exception is made because the symbol for kilogram (kg) was lower case before the SI laid down its rules. I'm not sure of the precise history, but perhaps it was considered too sacred to change. But this is not a "science writing" issue, it is an "unambiguous writing" issue. To be unambiguous, WP simply needs to adopt a convention and stick to it. And where there is a widely accepted (universally accepted in the scientific community) convention already in place, there seems little to be gained in departing from it. Does anyone out there have a copy of IEEE 1541 to check the correct symbol for the bit? Thunderbird2 10:31, 15 October 2007 (UTC)

According to the bit article, the IEEE prefers b while the IEC prefers bit, which is consistent also with Sec 6.1 of binary prefix. The NIST follows IEC, while BIPM (i.e. SI) remains silent on the issue. Although I don't have access to either of the two named standards (IEC 60027 or IEEE 1541), I can confirm that IEEE uses b in the related standard IEEE Standard Letter Symbols for Units of Measurement. On balance I propose we follow IEC and adopt bit, kbit, Mbit etc. Thunderbird2 11:58, 15 October 2007 (UTC)

I agree. The term 'bit' is unambiguous. It is ironic that binary is inherently precise yet the prefix and the unit symbols cause so much discussion. I have seen lots of misuse of 'b' for byte and 'B' for bit. I support your proposal. Lightmouse 17:27, 18 October 2007 (UTC)

I am also in favor of the unambiguous 'bit'. Odo Benus 18:52, 24 October 2007 (UTC)

I have obtained a copy of IEC 60027-2. On p115 it defines the symbol for the bit as bit, and on p120 states (verbatim)

one kibibit: 1 Kibit = 2^10 bit = 1 024 bit
one kilobit: 1 kbit = 10^3 bit = 1 000 bit

Thunderbird2 15:52, 20 October 2007 (UTC)

I’ve composed some text for Sec 4.3, intended to replace the existing bullet "The symbol for the byte is B". I’m sure it can be improved, and is deliberately lacking in advice about the kilobyte, megabyte etc (I thought it wise to steer clear of the megabyte vs mebibyte controversy by sticking mainly to the bit.) The proposed text reads:

The symbol for the byte is 'B'. The symbol for the bit is 'bit'. Unless explicitly stated otherwise, 1 byte is 8 bits (1 B = 8 bit). Decimal or binary prefix symbols may be added to either unit symbol ~~but not to the unit names~~. Thus
- the quantity 1000 bits (one kilobit) is represented by 1 kbit (and not 1 kb);
- the quantity 8,000,000 bits (eight megabits) is represented by 8 Mbit (and not 8 Mb or 8 Mbits);
- the quantity 8,192 bits (eight kibibits) is represented by 8 Kibit (and not 8 kbit ~~or 1 Mbyte~~);
- the quantity 1,048,576 bits (one mebibit, or 1024 * 1024 bits) is represented by 1 Mibit.

Thunderbird2 08:45, 21 October 2007 (UTC)
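The four examples in the proposed text can be checked with a short Python sketch. The prefix tables and the `bits_to_symbol` helper are assumptions for illustration only; the symbols follow the IEC 60027-2 forms quoted above ('bit', 'kbit', 'Kibit'), and the caller has to say whether a decimal or binary prefix is wanted.

```python
# Multipliers for the decimal (SI) and binary (IEC) prefixes discussed above.
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9}
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def bits_to_symbol(bits: int, binary: bool = False) -> str:
    """Write a whole number of bits with the largest prefix that divides it exactly."""
    prefixes = BINARY if binary else DECIMAL
    for prefix, factor in sorted(prefixes.items(), key=lambda kv: kv[1], reverse=True):
        if bits >= factor and bits % factor == 0:
            # No plural 's' is added to the symbol: 8 Mbit, not 8 Mbits.
            return f"{bits // factor} {prefix}bit"
    return f"{bits} bit"


print(bits_to_symbol(1000))                     # one kilobit
print(bits_to_symbol(8_000_000))                # eight megabits
print(bits_to_symbol(8192, binary=True))        # eight kibibits
print(bits_to_symbol(1_048_576, binary=True))   # one mebibit
```

The sketch keeps the two prefix families in separate tables so that 8,192 bits is only ever written as 8 Kibit when a binary prefix is explicitly requested.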

I agree that we should have a single symbol 'bit' for bit. Permitting an unambiguous bit symbol solves half of the problem of b/B, we should also permit an unambiguous byte symbol for the other half of the problem. We should tolerate two symbols: either 'B' or 'byte' (e.g. kbyte or kB, Mbyte or MB). For example, the following is the sort of error with bytes: "a 10-megabyte (Mb) hard drive". Lightmouse 09:11, 21 October 2007 (UTC)

The proposal doesn't explicitly rule out Mbyte to mean 10^6 bytes, but I take your point. How about this instead (see minor edits) Thunderbird2 09:34, 21 October 2007 (UTC)

The first sentence still implies that 'Mbyte' is forbidden in Wikipedia and must be replaced with 'MB'. The authors that make the error with byte are unlikely to be influenced by what we say here. I would be quite happy if you said nothing about the symbol for byte. You could say "Do not use 'b' for byte." Or you could have some other wording like "The byte can be shown as an upper case 'B' or unchanged (e.g. 'MB' or 'Mbyte')." Or something like that. Lightmouse 21:51, 22 October 2007 (UTC)

But if you are going to use "byte" as a symbol, at least say "never Mbytes". Gene Nygaard 01:00, 23 October 2007 (UTC)

How about this:

The symbol for the bit is 'bit'. The byte may be represented by either one of the symbols ‘B’ and ‘byte’. Unless explicitly stated otherwise, one byte is eight bits (1 B = 8 bit), and no 's' is added to either symbol in the case of a plural. Decimal or binary prefix symbols may be added to either unit symbol. Thus
- the quantity ~~1,000 bits (~~one kilobit) is represented by 1 kbit (and not 1 kb);
- the quantity ~~8,000,000 bits (~~eight megabits) is represented by 8 Mbit (and not 8 Mb or 8 Mbits);
- ~~the quantity 8,192 bits (eight kibibits) is represented by 8 Kibit (and not 8 kbit);~~
- ~~the quantity 1,048,576 bits (one mebibit, or 1024 * 1024 bits) is represented by 1 Mibit;~~
- the quantity ten megabytes is represented by either 10 MB or 10 Mbyte (and not 10 Mb or 10 Mbytes).

Thunderbird2 15:49, 23 October 2007 (UTC)

The prefixes Kibi~ Mebi~ etc (and their unit representations) are not widely used by most reliable sources whereas the traditional Kbit and Mbit (and other similar variations) are widely used and understood by looking at their context to mean the power of two values or base ten values. I suggest having the established style of the article and the style of the reliable sources used in the article dictate what symbols to use. If there really needs to be disambiguation then place that (once only as per other disambiguation examples) as a specific number in the article in parenthesis after the symbols used. Then if at some later date the majority of relevant reliable sources changes to use the new prefixes the article can be changed to follow their example. This has parity with other similar MOS guidelines such as English and American spelling where for example the reliable sources use British English style but an American English editor suddenly decides to "spell check" copy-edit. Wikipedia isn't a place for one body to decide how to spell "Colour" for example. Fnag aton 16:12, 23 October 2007 (UTC)

If the use of these prefixes is controversial, perhaps it is best to omit explicit mention of them. Is it OK like this? Thunderbird2 06:42, 24 October 2007 (UTC)

Thank you yes, much better with those bits removed. Fnag aton 00:23, 25 October 2007 (UTC)

This discussion is not complete without thinking about how to represent "bits per second", which is the most common way of specifying data transfer rates. This is usually abbreviated to "bps", which is in conflict with declaring "bit" as the only way of writing bit. I cannot see "bitps" being advised. To me it seems more logical to reverse the case and state that the full name is "bit", the standard abbreviation (symbol) is "b". −Woodstone 07:41, 24 October 2007 (UTC)

I prefer to get consensus on the bit first. Compound units can be added later. I think the objection to using 'b' as a symbol for bit would be that this symbol is too often used (abused?) to mean byte. How would you resolve that conflict? Further, I see no conflict between use of 'bit' for bit and 'bps' for bit/s. Thunderbird2 08:45, 24 October 2007 (UTC)

I like the version you have now. The 'b/B' confusion exists in compound units too. For example: "Japan’s 30 Gbps (giga-bytes per second) of Internet circuit capacity".

The unambiguous forms (bit/s, kbit/s, Mbit/s, and Gbit/s) are in significant use outside Wikipedia (perhaps 10% to 50%). They are dominant on Wikipedia (over 90%). Lightmouse 11:15, 24 October 2007 (UTC)

How sure are you that the Japanese capacity is really in bytes, not bits? It might be a misinterpretation by the editor. I would be for standardising on b=bit, B=byte. Disambiguation can be achieved by a unit conversion like 30 Gbps (=28 Gib/s) or, if the editor is mistaken, by 30 Gbps (=28 GiB/s). This is actually a double disambiguation. It indicates whether the prefix is binary or decimal and how to interpret "b" (or "B"), assuming that the conversion always follows the agreed style rules. −Woodstone 12:06, 24 October 2007 (UTC)

I do not know what units the Japanese capacity is in. I take it that you don't either. The text does not parse. We have to do more work to check or rely on an educated guess. That is why I gave the example. As you say, somebody has misunderstood the use of the single letter. That is the point. At least the error in the text is visible in this case. It is the tip of the iceberg. Most errors are invisible. You and I can be trusted to use b/B correctly, but I don't 100% trust text from others. You have done sufficient work with units of measurement to know that people can be sloppy. I hope that you can support an unambiguous form. Perhaps that is why many editors here use it. Lightmouse 13:12, 24 October 2007 (UTC)

To summarise:

- 8 bit equal 1 byte except in certain historic contexts where it must be noted explicitly
- bits take the symbol ‘bit’, bytes are either ‘B’ or ‘byte’
- consequently bits or bytes per second are abbreviated ‘bit/s’ and ‘B/s’ or ‘byte/s’
- the use of binary prefixes (Ki) is not mandatory, but not prohibited either, therefore decimal prefixes (k) may be used with a binary meaning, if the context makes this obvious

Some standards bodies – I think I read about this in a text by DIN – deprecate the use of ‘B’ for byte, because the symbol is already taken by the bel, as in ‘dB’. I begin to like the unambiguous French ‘o’ for octet, although it may look too similar to a digit zero in those rare cases where it appears without a prefix. (This is not a proposal of its adoption for WP of course.) Christoph Päper 14:00, 28 October 2007 (UTC)
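
The conversion suggested earlier in this thread (30 Gbps given as roughly 28 Gib/s) can be checked with a short calculation. This is only a sketch of the arithmetic, assuming the decimal prefix giga means 10^9 and the binary prefix gibi means 2^30; the function name `to_binary_prefix` is made up for illustration and does not come from any guideline.

```python
def to_binary_prefix(value, decimal_exponent, binary_exponent):
    """Re-express value * 10**decimal_exponent as a multiple of 2**binary_exponent."""
    return value * 10**decimal_exponent / 2**binary_exponent


# One mebibit, as in the struck-out list item: 1024 * 1024 bits.
bits_per_mebibit = 2**20
print(bits_per_mebibit)

# 30 Gbit/s (giga = 10**9) expressed in Gibit/s (gibi = 2**30).
gibit_per_s = to_binary_prefix(30, 9, 30)
print(f"30 Gbit/s is about {gibit_per_s:.1f} Gibit/s")
```

The same factor applies whether the unit is the bit or the byte, which is why the conversion alone distinguishes the prefix but the written symbol (Gib/s versus GiB/s) is still needed to distinguish the unit.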

British meaning of "billion"

The text on large numbers contains the text

"Where the British meaning [of billion] is required for some reason, a footnote or inline comment is appropriate. For instance, the budget item was several billion pounds (note this is billion in the common British usage).... "

I find this confusing because in my experience the old British use of this word to mean 10^12 is no longer common. In other words there is no longer any difference between common British usage and common American usage. How about rephrasing it along the lines of

"Where the old British meaning (10^12) is required for some reason, a footnote or inline comment is appropriate. For instance, the budget item was several billion pounds (note this is billion in the traditional British usage).... "

Thunderbird2 20:49, 15 October 2007 (UTC)

Perhaps "British" should be replaced by long scale ([[long and short scales|long scale]])? After all, it's still used in French.... — Arthur Rubin | (talk) 20:57, 15 October 2007 (UTC)

Why is the long scale ever required? The only circumstance I can think of is in direct quotations, where an editorial note would be required (I guess the inline replacement by the modern scale in square brackets would also be acceptable). BTW, a budget of several billion pounds, old-speak (= trillion), would be several times more than a British budget has ever been. Tony (talk) 01:36, 16 October 2007 (UTC)

Indeed. I don't think the long billion has been used commonly for about the last 50 years - certainly not in my memory, which is nearly as long - and its use other than in direct quotes should be discouraged. -- Arwel (talk) 06:36, 17 October 2007 (UTC)

Outside of money, the short billion isn't used in British usage either. They generally avoid it, using "thousand million" and "29,000 million" and the like. Furthermore, there is usually one very simple alternative we can all generally use to replace these "billions" in most cases:

2.35 billion can be replaced with
2,350,000,000

Just using digits takes up about the same space. Let people call it whatever they like, in their own heads as they read it. That is often the best choice when such numbers only occur once or twice or some other small number of times in an article. Then you don't have to waste a lot of time explaining it, either in the text or in a footnote. Gene Nygaard 11:14, 17 October 2007 (UTC)

Seems to take up significantly more space here, and besides, is much harder to read. Tony (talk) 14:28, 17 October 2007 (UTC)

The British government adopted American usage for the billion in the 1970s (as noted in Long and short scales#History); it has been used by the British financial industry and the British chapter of the dismal science ever since to mean 1,000 million. I think there is no reason to note the British usage for the short scale, as that is the norm in the financial industry and government. It should be noted if for some reason the long scale is used (perhaps in EU comparison tables between France and Italy, for example). --Philip Baird Shearer 21:06, 4 November 2007 (UTC)

I agree entirely with PBS. Let's leave behind the fuss about old-speak. Tony (talk) 23:30, 4 November 2007 (UTC)
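
For reference, the gap between the two meanings of "billion" discussed in this thread, and the digits-only alternative for 2.35 billion, can be sketched as follows. This assumes the short-scale billion is 10^9 and the long-scale billion is 10^12; the constant and function names are illustrative only.

```python
from decimal import Decimal

SHORT_SCALE_BILLION = 10**9   # modern British and American usage
LONG_SCALE_BILLION = 10**12   # traditional British usage, "old-speak"


def in_digits(amount, billion=SHORT_SCALE_BILLION):
    """Write an amount given in billions as plain digits with comma separators."""
    # Decimal avoids binary floating-point rounding on inputs such as "2.35".
    return f"{int(Decimal(amount) * billion):,}"


print(in_digits("2.35"))                      # short scale
print(in_digits("2.35", LONG_SCALE_BILLION))  # long scale
```

Written out in digits, the number needs no note on which scale was meant, at the cost of a longer string.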