Wikipedia:Reference desk/Archives/Language/2006 August 5

From Wikipedia, the free encyclopedia



Think you know the English language? (British and American) __Take the RETF challenge__

RegExTypoFix (Regular Expression Typographical error Fixer, or RETF) is a set of over 1600 regular expressions used to automatically fix common typos and misspellings. The only human involvement needed is to verify that the replacements are correct and hit Save.
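The workflow described above (rules applied automatically, a human approving each change) can be sketched roughly as follows. This is a minimal Python illustration, not RETF's actual code: the two sample rules are hypothetical, written in the find/replace style of the rules quoted further down. Those rules use `$1`-style backreferences (the syntax of the .NET and JavaScript regex engines), so the sketch converts them to Python's `\g<1>` form before use.

```python
import re

# Hypothetical sample rules in the <Typo find="..." replace="..." /> style.
RULES = [
    (r"\b(C|c)onneticut\b", "Connecticut"),
    (r"\b(C|c)ritisism\b", "$1riticism"),
]


def to_python_repl(repl: str) -> str:
    """Turn $N backreferences into Python's \\g<N> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def propose_fixes(text: str, rules):
    """Apply each rule in turn; return the new text plus a list of
    (old, new) pairs for a human to check before saving."""
    changes = []
    for find, repl in rules:
        pattern = re.compile(find)
        py_repl = to_python_repl(repl)
        # Record every match before substituting, so nothing changes unseen.
        changes.extend((m.group(0), m.expand(py_repl)) for m in pattern.finditer(text))
        text = pattern.sub(py_repl, text)
    return text, changes


new_text, changes = propose_fixes("The critisism in conneticut grew.", RULES)
for old, new in changes:
    print(f"{old!r} -> {new!r}")
print(new_text)
```

Because the sketch only proposes changes, the final judgement (is this really a typo here?) stays with the editor.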

However, when I started I used a list of misspellings that were utter crap. They included many "misspellings" that were actually words. Now I know better and check multiple sources for every word I add.

That's where you come in. There are 1600 lines of words. All need to be checked.
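One way to speed up a first pass over the rules is to run every `find` pattern against a list of words already known to be valid, however rare, and flag any rule that would "correct" one of them. A rough Python sketch; the rules and the word list are hypothetical, with the words taken from replies in this thread (tenacle, persue, humerous). A flagged rule still needs a person to decide whether the rare word is worth protecting.

```python
import re

# Hypothetical rules (find, replace) to be checked.
RULES = [
    (r"\b(T|t)enacle\b", "$1entacle"),
    (r"\b(P|p)ersue\b", "$1ursue"),
    (r"\b(C|c)ritisism\b", "$1riticism"),
]

# Real (if rare or archaic) words that must not be "fixed".
VALID_WORDS = ["tenacle", "persue", "humerous"]


def flag_false_positives(rules, valid_words):
    """Return (pattern, word) pairs where a rule would rewrite a valid word."""
    flagged = []
    for find, _repl in rules:
        pattern = re.compile(find)
        for word in valid_words:
            if pattern.fullmatch(word):
                flagged.append((find, word))
    return flagged


for find, word in flag_false_positives(RULES, VALID_WORDS):
    print(f"{find} would change the valid word {word!r}")
```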

I know they're kind of lame prizes, but I wanted to give something to anyone who could help me out. I have a deep passion for RETF and have made over 18,000 typo fixing edits with its assistance. I have 733 articles to fix right now.

Help Wikipedia out and start checking
Thank you!!! --mboverload@ 00:02, 5 August 2006 (UTC)

Appart from the many rare and indangered variant forms you are trying to cull, there is:
tenacle - a holder, forceps, a thing you carry a flag in.
persue - the trail of blood left by prey
and where would the world be without humerous - "that hath great shoulders". MeltBanana 01:56, 5 August 2006 (UTC)
I went through a bunch. Check your talk page. --ColourBurst 01:57, 5 August 2006 (UTC)
OK I admit it I am just trying to make life difficult for you:
dispair - un-pair
discribe - un-write
belive - quickly, lively
curch - hanky hat MeltBanana 02:32, 5 August 2006 (UTC)

Wow, thanks! I guess estimating that you would only be able to find one error was quite a misjudgement on my part! --mboverload@ 04:24, 5 August 2006 (UTC)

If anyone has any more, that would be awesome =D --mboverload@ 06:15, 5 August 2006 (UTC)
You mean to say you assumed you'd never have to hand out the barnstar? You deceiving rascal you! DirkvdM 07:01, 5 August 2006 (UTC)

<Typo find="\b(C|c)onneticut\b" replace="$1onnecticut" />: it's always capitalized. What you want is: <Typo find="\b(C|c)onneticut\b" replace="Connecticut" /> - Nunh-huh 07:13, 5 August 2006 (UTC)

"Completely automatic"? Wouldn't that require a rather advanced grammatical AI? 惑乱 分からん 12:30, 5 August 2006 (UTC)
It replaces the words automatically, but a human has to verify the replacements are correct before hitting save. Sorry, should have made that clear. --mboverload@ 12:37, 5 August 2006 (UTC)
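Nunh-huh's Connecticut point above is easy to check in Python (which writes the backreference as `\g<1>` rather than `$1`; the sample sentence is made up): a rule that re-inserts the captured first letter keeps a lowercase "c", while a literal replacement always yields the capitalized proper noun.

```python
import re

PATTERN = r"\b(C|c)onneticut\b"
text = "He moved to conneticut last year."

# Case-preserving replacement: the captured C/c is copied back unchanged.
preserved = re.sub(PATTERN, r"\g<1>onnecticut", text)
# Literal replacement: a proper noun is always capitalized.
literal = re.sub(PATTERN, "Connecticut", text)

print(preserved)  # He moved to connecticut last year.
print(literal)    # He moved to Connecticut last year.
```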

<Typo find="\b(V|v)etween\b" replace="$1etween" />

Wouldn't this produce "vetween"? Perhaps it should be between? Road Wizard 12:41, 5 August 2006 (UTC)
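Because `$1` copies back the captured `V` or `v`, this rule rewrites "vetween" as "vetween" and fixes nothing. A Python sketch of the problem and one possible fix: a replacement function that swaps the letter while keeping its case (the function names are illustrative only).

```python
import re

VETWEEN = re.compile(r"\b(V|v)etween\b")


def vetween_as_written(text: str) -> str:
    # Equivalent of replace="$1etween": the typo is reproduced unchanged.
    return VETWEEN.sub(r"\g<1>etween", text)


def vetween_fixed(text: str) -> str:
    # Map V -> B and v -> b, so the case of the first letter is kept.
    return VETWEEN.sub(lambda m: ("B" if m.group(1) == "V" else "b") + "etween", text)


sample = "Vetween friends, and vetween enemies."
print(vetween_as_written(sample))  # Vetween friends, and vetween enemies.
print(vetween_fixed(sample))       # Between friends, and between enemies.
```

In the plain find/replace format, the same effect would need two rules, one per case (`Vetween` to `Between`, `vetween` to `between`), since a backreference can only copy the letter it captured.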
<Typo find="\b(D|d)ieing\b" replace="$1ying" />
Possible variants dyeing and dying. Road Wizard 13:18, 5 August 2006 (UTC)
<Typo find="\b(C|c)overted\b" replace="$1onverted" />
Possible variants coveted and converted. Road Wizard 13:18, 5 August 2006 (UTC)
<Typo find="\b(C|c)opywrit(e|ed|es)\b" replace="$1opyright$2" />
For option 1, wouldn't this produce "copyrighte"? Road Wizard 13:18, 5 August 2006 (UTC)
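Tracing the rule confirms the problem: with `$2` copied verbatim, "copywrite" becomes "copyrighte" and "copywrites" becomes "copyrightes"; only "copywrited" comes out right. In Python (where `$1`/`$2` are written `\g<1>`/`\g<2>`), one possible fix is to map each captured ending to the correct one. The mapping below is an illustration, not part of RETF, and since "copywrite" may also be meant as "copywriter", a human still has to confirm each change.

```python
import re

COPYWRIT = re.compile(r"\b(C|c)opywrit(e|ed|es)\b")

# Correct ending of "copyright" for each captured misspelled ending.
ENDINGS = {"e": "", "ed": "ed", "es": "s"}


def copywrit_as_written(text: str) -> str:
    # Equivalent of replace="$1opyright$2".
    return COPYWRIT.sub(r"\g<1>opyright\g<2>", text)


def copywrit_fixed(text: str) -> str:
    # Keep the captured C/c, then append the mapped ending.
    return COPYWRIT.sub(lambda m: m.group(1) + "opyright" + ENDINGS[m.group(2)], text)


for word in ["copywrite", "copywrited", "Copywrites"]:
    print(word, "->", copywrit_as_written(word), "/", copywrit_fixed(word))
```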
<Typo find="\b(C|c)opywrite\b" replace="$1opyright" />
Possible variants copywriter and copyright. Road Wizard 13:18, 5 August 2006 (UTC)
<Typo find="\b(D|d)almatio(n|ns)\b" replace="$1almatia$2" />
<Typo find="\b(D|d)almation\b" replace="$1almatian" />
Isn't this a double post of the same spell check? Road Wizard 13:18, 5 August 2006 (UTC)
<Typo find="\b(C|c)ritisi(m|ms)\b" replace="$1riticis$2" />
<Typo find="\b(C|c)ritisism\b" replace="$1riticism" />
<Typo find="\b(C|c)ritisisms\b" replace="$1riticisms" />
A triple post? Road Wizard 13:18, 5 August 2006 (UTC)
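Whether rules really duplicate each other can be tested by running them all over a few sample misspellings and seeing which words more than one rule matches. A small Python sketch using the five find patterns quoted above and made-up sample words. It suggests the Dalmatian rules do overlap ("Dalmation" is caught by both), while the three criticism rules do not: `critisi(m|ms)` targets "critisim" and "critisims", which the other two never match.

```python
import re
from collections import defaultdict

# The find patterns quoted above.
RULES = [
    r"\b(D|d)almatio(n|ns)\b",
    r"\b(D|d)almation\b",
    r"\b(C|c)ritisi(m|ms)\b",
    r"\b(C|c)ritisism\b",
    r"\b(C|c)ritisisms\b",
]

SAMPLES = ["Dalmation", "Dalmations", "critisim", "critisism", "critisisms"]


def overlapping_rules(patterns, samples):
    """Return {word: [patterns]} for words matched by more than one pattern."""
    hits = defaultdict(list)
    for find in patterns:
        pattern = re.compile(find)
        for word in samples:
            if pattern.search(word):
                hits[word].append(find)
    return {word: found for word, found in hits.items() if len(found) > 1}


print(overlapping_rules(RULES, SAMPLES))
```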
<Typo find="\b(D|d)ecyphe(r|red)\b" replace="$1eciphe$2" />
Isn't 'decypher' an acceptable variant (as in cypher)? Skittle 19:27, 6 August 2006 (UTC)
<Typo find="\b(D|d)rummless\b" replace="$1rumless" />
And is 'Drumless' a word, however you spell it? I can't find it in a dictionary or on the web, except as a sort of made-up thing. Does that count? If so, can it have a set spelling? Or have I missed a usage? Skittle 19:31, 6 August 2006 (UTC)
<Typo find="\b(E|e)conomics(t|ts)\b" replace="$1economis$2" />
Have I misunderstood, or would this return 'eeconomist'? And people might mean 'economics'. Skittle 19:36, 6 August 2006 (UTC)
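The rule does produce "eeconomist": the replacement re-inserts the captured `E`/`e` and then adds "economis". Dropping the duplicated letter from the replacement (`$1conomis$2`) fixes it. A quick Python check with made-up sample words (backreferences written `\g<1>`/`\g<2>`); whether a writer meant "economist" or "economics" is still for a human to judge.

```python
import re

PATTERN = r"\b(E|e)conomics(t|ts)\b"


def apply_rule(repl: str, text: str) -> str:
    # Run the quoted find pattern with a given replacement template.
    return re.sub(PATTERN, repl, text)


as_written = apply_rule(r"\g<1>economis\g<2>", "an economicst")
corrected = apply_rule(r"\g<1>conomis\g<2>", "an economicst")
plural = apply_rule(r"\g<1>conomis\g<2>", "Economicsts")

print(as_written)  # an eeconomist
print(corrected)   # an economist
print(plural)      # Economists
```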

Irregular Plurals

I read that there are only 13 irregular plurals in English. Is this true? I found

  • foot/feet
  • goose/geese
  • louse/lice
  • man/men
  • mouse/mice
  • tooth/teeth
  • woman/women
  • die/dice

What are some more? Reywas92 00:57, 5 August 2006 (UTC)

There are a lot more than 13, especially those words of Latin origin whose endings change from -us to -i in the plural. Also there are words like hypothesis and axis whose plural forms are respectively hypotheses and axes. --Chris S. 01:09, 5 August 2006 (UTC)
Moose/moose, deer/deer, child/children... zafiroblue05 | Talk 01:22, 5 August 2006 (UTC)
Any loanword from Japanese is most likely irregular; sushi/sushi, nunchaku/nunchaku, etc. --ColourBurst 01:25, 5 August 2006 (UTC)

Forms such as cactuses, instead of the more proper cacti, are becoming accepted. After all, who uses hippopotami these days? I call them hippos. People have started using plurals like criteria as a singular form, in place of criterion.

There are those odd words whose plurals don't mean quite what non-native speakers sometimes assume - if the plural of noodle is noodles, then what is a spaghetti, and what happens if you have more than one? Perhaps with breads, cheeses and wines.... and maybe even sheeps.... --TheMadBaron 03:11, 5 August 2006 (UTC)

Since spaghetti is from Italian, isn't the singular form spaghetto? —Bkell (talk) 04:50, 5 August 2006 (UTC)
Let's be a little more careful. Spaghetti, bread, cheese, wine (and, I'm almost certain, sushi) are uncountable nouns in English; sheep and nunchaku are countable nouns which have identical singular and plural forms. HenryFlower 09:27, 5 August 2006 (UTC)

But those are words from other languages. We have our own native irregular plurals. See English plural and ablaut. Adam Bishop 05:21, 5 August 2006 (UTC)

Even sticking just to words of Germanic origin, the above list omits child/children, man/men, ox/oxen. User:Angr 06:59, 5 August 2006 (UTC)


If we take into account all words that have migrated into English, there are many variations such as gateau/x, kibbutz/im, radius/radii. Then there are the remnants of ye olde English, such as brother/brethren, and some archaic terms such as cow/kine. Perhaps it might be a useful addition to the plural article to have a list of plural endings. Lemon martini 14:44, 5 August 2006 (UTC)

So to sum up, there's well over thirteen. I'm thinking there's about a hundred of 'em. - THE GREAT GAVINI {T-C} 18:18, 5 August 2006 (UTC)

By irregular, I meant those with a vowel change in the root of the word, as in my examples. Reywas92 16:58, 7 August 2006 (UTC)

Meaning and Origin of the expression "dum de dum."

I am interested in finding out the origin, meaning, and correct spelling of the expression "dum de dum". To me, it means a repetitive drum beat or sound, and, more generally, an expression used to denote boredom, perhaps because something is repetitive. Am I close? Thanks for any help you can give. --24.8.231.168 02:46, 5 August 2006 (UTC)

It comes from the Bible. It's what Jesus said to himself before beginning the Sermon on the Mount.--Teutoberg 03:09, 5 August 2006 (UTC)

Actually, Jesus was simply quoting Genesis 2:2: "And on the seventh day God said 'dum de dum, tum te tum,' and he rested from all the work that he had done." Ashibaka tock 06:02, 5 August 2006 (UTC)
How do you spell that in Hebrew and Greek? —Keenan Pepper 06:36, 5 August 2006 (UTC)
Hebrew: דם די דם, טם טי טם and Greek: δυμ δι δυμ, τυμ τι τυμ. —Daniel (‽) 09:23, 5 August 2006 (UTC)
Haha, cool. δαμ δι δαμ is closer to the English pronunciation, but δυμ δι δυμ sounds much funnier, especially when I imagine God saying it. —Keenan Pepper 09:41, 5 August 2006 (UTC)
No, δαμ δι δαμ is closer to the American pronunciation. I don't think the Southern English pronunciation can be represented in Greek, but δυμ δι δυμ is exactly right for Northern dialects of English. — Haeleth Talk 17:16, 8 August 2006 (UTC)

In all seriousness, I always associated it with quietly singing a song to yourself to pass the time. Sort of in the way young children do; just kind of making up a tune. I've heard it used to denote boredom ("Well, next bus is in an hour... dum de dum..."), along with a certain sort of naive stupidity in the manner of Forrest Gump. ("So he just crosses the street without looking - dum de dum de dum - and gets hit by a bus.") I've also heard it in a "do de do" type form, particularly for the latter; that's probably the same thing. --ByeByeBaby 06:34, 5 August 2006 (UTC)

No, it's from Genesis 5:2 (5th album "Selling England by the Pound", 2nd song "I Know What I Like (In Your Wardrobe)"), which starts with "It's one o'clock and time for lunch, (dum de dum de dum)". The gardener takes a break and starts humming. So it's a sound of being content. DirkvdM 07:24, 5 August 2006 (UTC)
Huh. I always figured it was just the theme from Jaws, you know, the music that plays right before the shark is about to chew someone up. I've heard it used in terms of boredom before, but where I am from we usually just hum the Jeopardy music that they play during the final question: "Dooh, do, dooh, do, dooh, do, dooh, DOOH! Dah-do-do-do, dooh...dooh...dooh". In terms of slowness/stupidity, people are starting to say "dee dee dee" a lot nowadays because of a comedian they have on television now, Carlos Mencia, I think is his name. Never really liked his acts much though. --69.138.61.168 08:08, 8 August 2006 (UTC)

English names ending in "ett"

What, if any, is the common origin of English names ending in "ett", like Barnett, Bennett, Burnett, Crockett and Hewlett? Thank you. (August 5 2006, 08:10 UTC)

I believe this was asked before and I believe the answer was that the suffix has a French origin, 'ette' being a French diminutive (or feminine, as in 'Jeanette'). DirkvdM 10:49, 5 August 2006 (UTC)
Seemingly, but apparently not for all surnames. 惑乱 分からん 12:26, 5 August 2006 (UTC)
What about "ott", as in Abbott, Arnott, Harnott & Arrott? jonocorry 04:00, 5 February 2007 (UTC)

Naughties

If the last decade was the nineties, then what is this decade called? I just read the term 'naughties', which I found amusing enough to start using it myself. But I doubt that's the official term. DirkvdM 10:46, 5 August 2006 (UTC)

Isn't it simply "the 2000s"?
There is no official term. Unlike some other languages, English does not have a language academy. --Ptcamn 11:28, 5 August 2006 (UTC)
The naughties/noughties sounds nice enough... ;) 惑乱 分からん 12:27, 5 August 2006 (UTC)
Does not knowing that any language was regulated make me a moron? --mboverload@ 12:41, 5 August 2006 (UTC)
No, but don't worry, I'm sure there are loads of other things that do. :) DirkvdM 17:12, 5 August 2006 (UTC)
lol, that was cold =D --mboverload@ 05:33, 6 August 2006 (UTC)
Back in the 1990's, there was a lot of speculation on alt.usage.english and other forums as to what the decade would be called, and now we're halfway through it, and we still don't know. "The 2000's" will be ambiguous as soon as the 2010s become relevant... AnonMoos 15:03, 5 August 2006 (UTC)
Anyway, the preferred spelling seems to be "Noughties" -- see 2000s#Names of the decade AnonMoos 15:05, 5 August 2006 (UTC)

Come on, DirkvdM, with all your free-society rhetoric I know you know the nuances of the definitive/prescriptive language debate.

Are you talking to this DirkvdM? Am I guilty of 'free-society rhetoric'? What does it mean anyway? Or 'the definitive/prescriptive language debate' for that matter? DirkvdM 17:17, 5 August 2006 (UTC)
You're anti-xenophobic, you're into open source, dollar voting, democracy, things like that. I would call that "free society rhetoric"... nothing wrong with it... that's just the perception I get of you. If you're American and I had to wager, I'd bet you've voted at least once for a libertarian.
In any case, I was also betting that you would be very much on the side of definitive linguistics, wherein a language is defined as how it's used in reality, as opposed to prescriptive linguistics, where a language is defined by an institution (dictionary publisher or government academy).
Have you been keeping an eye on me? That first sentence is quite right. However, I'm very Dutch. I suppose you mean by 'libertarian' what I call 'liberal' (in both the social and the economic sense - I have voted VVD and D66, but I now vote GroenLinks for the sake of the climate).
But I'm also very much into standardisation and logic. In technology (and clothing and whatever), but also in language. So I suppose I'm on the prescriptive side. Who should prescribe those rules is a different matter. For example, I try to stick to English English because that's closest to me. So circumstances picked the standard for me. The main thing is to be consistent. Ultimately, everyone should preferably use the same words for the same things. Whatever that is, but also preferably in a logical context. So if previous decades were called the eighties and nineties and such, then this decade should be called the 'zeroties', but since that sounds awful 'noughties' will do quite well. Although 'naughties' sounds a bit tempting. :)
You edited here as 74.227.197.63, but that IP address hasn't done many edits, so did you forget to sign in or do you have a rotating IP address? Who goes there? DirkvdM 07:00, 6 August 2006 (UTC)
I propose we cease referring to any decade as "the ____ies" in order to solve this problem. We can simply buy records called "Best of the Period Between 1960 and 1969". So we would call this "the period between 2000 and 2009". Taiq 13:59, 6 August 2006 (UTC)
There is no problem when you can't define a very young person by the number of his/her decades. A man in his forties, yes; a teenager... exists also: why not "teenies" for the next 10 years? In the meantime, some Bush era, Google era or other [era] may prevail one day, like les années folles ("the crazy years") of the 1920s in France. --DLL 18:18, 6 August 2006 (UTC)
In Dutch we do have words for people at other 'age-decades' - after 'tiener' we've got 'twintiger', 'dertiger', etc. In English that would sound a bit strange - twentager, thirtager. They are almost the same as the Dutch words, but somehow they don't 'work' in English. Btw, funny, my spellchecker recognises 'twintiger' as an English word, and so it is. :) DirkvdM 06:37, 7 August 2006 (UTC)
How about using scientific notation, so that we have "the 2.00 × 10³s" for the years 2000 to 2009, "the 2.0 × 10³s" for the years 2000 to 2099, and "the 2 × 10³s" for the years 2000 to 2999? ;-) —Bkell (talk) 05:48, 12 August 2006 (UTC)
No matter what you decide, you're just gonna have to argue about it again when we get to the tens. freshofftheufoΓΛĿЌ 17:51, 12 August 2006 (UTC)

Limits of Historical Linguistics

Hi. I have a somewhat technical question. There seems to be an assumption (I have read it in several places, including on Wikipedia I think) that one can't work out what languages sounded like more than 10,000 years ago. However, surely the only justification for this belief is glottochronology, which seems to be discredited, as otherwise one could just go through more sound changes (obviously they would be harder to be certain of, but still...) In fact, one of the things that strikes me about some core vocab is how stable it is. To use a particularly stable example, the PIE word for name has an n and an m and keeps them in the same order in nearly every daughter language (see List_of_common_Indo-European_roots#h.E2.82.81.2C_e.2Fo). So why do some people think that all languages started out as one, but it is impossible to reconstruct it?--Estrellador* 18:12, 5 August 2006 (UTC)

As far as I know, what's discredited about glottochronology is the claim that it can reliably indicate how long ago two related languages parted ways. The idea that historical reconstruction can only take us so far and no further is not discredited, although I suspect most historical linguists would be reluctant to put a specific figure on the number of years it can take us. The reason most say it's impossible to reconstruct "Proto-World" is simply that our knowledge of "existing" (if you will) proto-languages is already spotty enough. We have a fairly good idea of what Proto-Indo-European was like, but there are still gaps. We also have a fairly good idea of what Proto-Semitic was like, but although we know it has to have descended from Proto-Afro-Asiatic, our knowledge of that is even spottier. But what we do know about PIE and PAA is enough to show us that we can't reconcile the two and reconstruct a proto-proto-language to be the parent of both of them. In other cases, it's not so clear; there's still a debate as to whether Proto-Turkic and Proto-Mongolic can be reconciled and derived from a Proto-Altaic or not, and so there is correspondingly little information about what Proto-Altaic may have looked like if it did exist. Trying to reconcile Proto-Altaic with PIE or PAA is even more hopeless. And so on. That answers the second part of your final question. The answer to the first part of the question, "why do some people think that all languages started out as one", is that if they didn't, then that means language arose independently in different populations of early humans at different times, and yet managed to reach every single human society, which is just too unlikely to be believed. All human societies have grammatically complex languages that follow certain linguistic universals; the most parsimonious and plausible assumption is that this facility arose only once during the evolution of human beings.
It would just be too weird if it arose multiple times in multiple locations around the world, and yet still had as much in common as human languages do. User:Angr 18:35, 5 August 2006 (UTC)
It's very unlikely that an IE language has many common roots with other families, and if such roots were found in other families, it would be something very vague, like one vowel or something. There's just far too much variation in languages. - THE GREAT GAVINI {T-C} 18:48, 5 August 2006 (UTC)

Thanks for the extremely prompt answers. However, I am not quite sure I understand how the argument works. Surely either all languages began as one, in which case we would be able to see some similarity, such as shared vocabulary, or they didn't, in which case we wouldn't. Maybe the shared features (which are fundamental, true - all langs, or so I have read, have pronouns, nouns and verbs, and I have read about certain IF-THEN features) are really a system of categorisation, else why would there be so little similarity in anything except grammar - which appears to be heavily dependent on a few set assumptions like S, O and V order for all its smaller quirks? Really, IMHO, spontaneous generation in one place seems as likely as that in multiple places, as once is odd enough that several times makes no difference, particularly as the langs don't seem to share vocab, which is what has been used to reconstruct all the proto-languages so far. Surely if one person is necessary to generate the language, the others all have to have the capacity to absorb it, and are thus just as capable of spontaneous generation, given enough time? Maybe I don't understand linguistic universals well enough, in which case I would be happy to be referred to some good sources. It just seems, as I say, odd that languages are all postulated to come from one common ancestor when no vocabulary, which as I say seems moderately stable, survives. --Estrellador* 20:46, 5 August 2006 (UTC)

Have a look at the extensive article at Zompist, which explains very well the problems in finding linguistic roots... basically, because of the changes that languages have suffered in the past few thousand years or even less, words with a common ancestor can be as distinct as (Hindi) "chah" and (English) "six", whereas words without a common ancestor can be as similar as (Japanese) "so" and (German) "so". Junesun 21:16, 5 August 2006 (UTC)
Or possibly Japanese miru and Spanish mirar... although I think that is just a coincidence... - THE GREAT GAVINI {T-C} 07:00, 6 August 2006 (UTC)

Thanks for all the responses. They have been helpful. --Estrellador* 16:13, 13 August 2006 (UTC)

I recall something about the similarity of the word for "water" in many languages. Someone told me that there are only six words reconstructed for Proto-World. One of them is finger: *tik AEuSoes1 21:48, 13 August 2006 (UTC)

Is Emperor capitalized?

It was built by the German emperor Geiliom II who visited the city in 1898.

This question looks familiar - I'm experiencing déjà lu... - THE GREAT GAVINI {T-C} 19:35, 5 August 2006 (UTC)
There seem to be different conventions on this. I capitalize titles when referring to a specific person but not when referring to the position in general: "Of the many German emperors, Emperor Gailiom was the only one who visited the city in 1898." StuRat 22:17, 5 August 2006 (UTC)
There was no German Emperor Geiliom II or Gailiom. There was, however, a Wilhelm II. User:Zoe|(talk) 22:37, 5 August 2006 (UTC)
Geiliom looks like somebody's unusual idea of how to spell the French equivalent Guillaume. JackofOz 01:27, 6 August 2006 (UTC)