Talk:Grapheme


Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Ephemeralives.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 22:38, 16 January 2022 (UTC)

Grapheme & Letter Case

I was confused after reading this article about graphemes and letter case. I searched the web a bit and ended up more confused. Maybe someone who knows about this can clarify the point in the article. Is the letter 'a' a single grapheme, or is the lowercase 'a' one grapheme and the uppercase 'A' another grapheme? PlaysWithLife (talk) 08:43, 6 June 2009 (UTC)

PWL - I'm not absolutely sure, but I believe neither version is a grapheme; it's the "fundamental" letter which is a grapheme, and this can be written in various different ways. A and a are both versions of the same letter, which is "ae". So I suspect A and a are different glyphs of the grapheme, just like it will look different in two different fonts. -- Shimmin Beg (talk) 15:19, 19 February 2010 (UTC)
Further to your question:
-- Shimmin Beg (talk) 15:32, 19 February 2010 (UTC)

But "the times" and "the Times" don't mean the same thing, nor do "polish" and "Polish", so I would say caps and lowers are different graphemes. But I don't have a source to refer to. Linguistatlunch (talk) 18:10, 28 December 2012 (UTC)

Grapheme vs character

In computing and telecommunications, a "character" is either a representation of a grapheme or is another grapheme-like unit as required for text processing, and it often manifests in encoded form. There is often not a one-to-one relationship between characters and graphemes. Unicode has many non-graphic/'control' characters, for example, and it has a half-dozen hyphen (or overloaded hyphen/dash/minus-sign) characters, each with different behaviors and histories, whereas the Latin script has only one hyphen grapheme.
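To make the character/grapheme mismatch concrete, here is a minimal Python sketch using the standard unicodedata module. The selection of code points is an illustrative pick, not a complete list of Unicode hyphens, and the helper name describe is made up for this example.

```python
import unicodedata

# A few hyphen-, dash- and minus-like code points.
# Illustrative selection only (an assumption of this sketch), not a complete list.
HYPHEN_LIKE = ["\u002D", "\u00AD", "\u2010", "\u2011", "\u2012", "\u2013", "\u2212"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),  # control characters have no name
        unicodedata.category(ch),
    )


for ch in HYPHEN_LIKE:
    print(*describe(ch))

# A control character: encoded, but not a grapheme of any script.
print(*describe("\u0009"))
```

Each hyphen-like entry is a separate encoded character with its own name and general category, even though a reader of Latin script would probably see most of them as the same mark. The tab at the end is a character in the computing sense without being a grapheme at all.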

The writing system article says that "character" is synonymous with grapheme. That article uses the term character — referring to a grapheme, not a computing/telecom character — quite a bit. I would like to know if experts in the study of written languages use the terms character and grapheme interchangeably like this. If so, it should be mentioned here in the grapheme article. Any comments/info appreciated. Thanks — mjb 9 July 2005 03:31 (UTC)

Of the two terms, grapheme is the one with a more rigorous and technical definition (base or atomic unit of a writing system), and which can be more readily used in descriptions of writing systems in general, since it was coined for that purpose (as a parallel to phoneme, in linguistic study of spoken languages). By contrast, the term character seems to be in use mainly for specific sub-fields of writing system studies, most notably perhaps in Asian writing systems (cf. Chinese character). When used this way character may be (nearly) synonymous with grapheme, although it is likely that this is not adhered to exactly, and furthermore when used in the study of an individual writing system such as Chinese, that character has a particular meaning ascribed to it which has come about to meet the specific needs of that discipline, and may not be completely transferable in that sense to studies of other writing systems. I have further expanded the terminology section in the writing system article to try and capture these notions.--cjllw | TALK 08:26, 2005 July 11 (UTC)
In typography circles, typographers, type designers, typesetters and graphic designers refer to typographic characters as letters, glyphs or characters. The term "grapheme" has been used in the font article as a synonym for glyph or character.
"Grapheme" is certainly a technical and rigorous term, but whoever put it into the font article is coming from a writing-system viewpoint alien to typographers, designers and lay users of typography, and quite inappropriate for written discourse on typography.
Wikipedia is supposed to be a general knowledge resource that demystifies, rather than obscures, its subjects for the benefit of lay readers. On that basis "grapheme" is inappropriate for any of the typography articles. I am replacing instances with glyph or the next most appropriate word, wherever appropriate. Arbo 06:19, 18 April 2006 (UTC)

First, about the grapheme article. If it is true that «Different glyphs can represent the same grapheme», then it is obvious that «Not all glyphs are graphemes». I cannot understand the meaning of the following phrase: «Not all glyphs are graphemes in the phonological sense». What is the «phonological sense» of the «grapheme»? Do «Chinese characters, numerals, punctuation marks, and all the individual symbols of any of the world’s writing systems» have any phonological sense?

Second, about Grapheme vs Character. In Linguistics, character is the only general and neutral term for referring to any graphic element of writing systems. Grapheme — like lexeme, phoneme, semanteme, sememe, and so on — is evocative of certain structuralist-functionalist schools; therefore, there are a lot of linguists who do not use it at all. Walad1913 (talk) 13:07, 2 March 2008 (UTC)

Remove EB pronunciation link

I have removed the Merriam-Webster pronunciation link. We can easily provide our own free version and host it here. Superm401 - Talk 09:58, 8 November 2007 (UTC)

Boks egzample

It says that box, using three graphemes, represents a word containing four phonemes, /b/, /o/, /k/ and /s/. But isn't /ks/, being an affricate, one phoneme? So box would represent /b/, /o/ and /ks/? —Preceding unsigned comment added by 67.212.110.120 (talk) 22:28, 29 April 2009 (UTC)

Good question. This is certainly not a good example. Moreover, it is not in the reference given. I had just tagged this as such, along with a number of other statements that appear to be original research. After reading your question, and seeing that it hasn't been answered for over 2 years, I will simply remove these. (See also below) — Sebastian 23:03, 28 November 2011 (UTC)

Characters (letters) versus graphemes

I understand a grapheme to be a letter (character) or group of letters that correspond to one sound (one phoneme). I understand that some use 'character' and 'grapheme' interchangeably, but I don't think this is a useful definition.

Using my preferred definition, in a word like SHIP, there are three graphemes: SH, I, and P. These correspond to the three phonemes "sh", "i" and "p".

  • Where single letters correspond to single phonemes (e.g. the letters in BAT are also three separate graphemes that correspond to the three phonemes "b", "a", and "t"), I call them single-letter graphemes.
  • SH I call a two-letter grapheme (or a digraph).
  • TCH (as in the word WATCH) I call a three-letter grapheme (or trigraph).
  • The A_E in a word like BAKE, where the A and E together correspond to the long A vowel phoneme, I call a split-grapheme.
  • X is a grapheme, but it is something of an exception - it is the only grapheme that corresponds to more than one sound: X corresponds to the two phonemes "k" and "s" (which I understand to be two separate phonemes).

This website uses a similar definition to the one I use: http://www.literacytrust.org.uk/Database/Primary/phonics_definitions.html

I would like to see a change to the wiki page to reflect this, though I wanted to discuss it first to see what others think. SCPritchard (talk) 23:08, 19 January 2010 (UTC)
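For anyone who wants to try out the phonics-style definition proposed above, here is a minimal Python sketch of a greedy longest-match segmenter. The inventory MULTI_LETTER and the function name segment are hypothetical, chosen only for illustration; they do not come from any source cited on this page. The sketch does not attempt split graphemes such as A_E in BAKE.

```python
# Hypothetical inventory of multi-letter graphemes, for illustration only.
MULTI_LETTER = {"TCH", "SH", "CH", "TH"}


def segment(word, inventory=MULTI_LETTER):
    """Greedy longest-match split of a word into graphemes (phonics sense)."""
    word = word.upper()
    longest = max(map(len, inventory), default=1)
    graphemes, i = [], 0
    while i < len(word):
        # Try the longest candidate first; fall back to a single letter.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in inventory:
                graphemes.append(chunk)
                i += size
                break
    return graphemes


print(segment("SHIP"))   # ['SH', 'I', 'P']
print(segment("WATCH"))  # ['W', 'A', 'TCH']
print(segment("BAT"))    # ['B', 'A', 'T']
```

With an empty inventory the same function returns one grapheme per letter, which corresponds to the "grapheme = letter" reading, so the two counts for SHIP (3 versus 4) come down to which inventory is assumed.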

A couple of points to add here, SCPritchard. The first one is a gentle reminder; Wikipedia is supposed to be an encyclopaedia, and so changing a page to your own preferred definition isn't acceptable. If you can find a reputable source that supports this explanation of graphemes, please do add it.
Secondly, I think you've misunderstood something here. Admittedly, the page isn't that clear at the moment. The grapheme is basically the smallest independently meaningful unit within a writing system. In English, which is alphabetic, one grapheme typically corresponds to one phoneme (though the actual phoneme might vary, as s in slug vs. lugs, or o in lock vs. look), or several graphemes can correspond to a phoneme (as in sh, tch and so on). However, this depends on the writing system in question. Graphemes in a syllabary correspond to a whole syllable (so Japanese カ corresponds to ka, Korean 글 corresponds to geul). Hieroglyphs and other logograms correspond to a whole word. -- Shimmin Beg (talk) 15:19, 19 February 2010 (UTC)
You're right about Wikipedia, Shimmin Beg - but that goes both ways. Turns out, there is currently no source for the claim that "SHIP" has 4 graphemes, either. I guess one can see it either way, depending on what one wants to research. To me at least, both SCPritchard's view and yours seem plausible. (BTW, even English can be rendered in a way such that "SH" becomes one grapheme, see i.t.a.) So, ideally, we might be able to say that some people count "SHIP" as 4, and others as 3, but we would need a reference for that, too. Failing this, I will simply remove this sentence, along with the one about "BOX" (see above). I'll leave the other original research for now, since it hasn't been questioned until now. — Sebastian 23:17, 28 November 2011 (UTC)


I realize this section hasn't been active in years, but it interested me, particularly the statement that "x" is unique in being the only letter that represents more than one sound. This isn't strictly true, so I thought I would share my thoughts in case anyone else is curious like me, and in case someone decides to add the statement to the main article.
For consonants it appears rarer, but there are some.
"G" or "J" as in "George" or "Jam" is technically "dʒ", which is more than one sound, as in /dʒɔːdʒ/ and /dʒæm/.
It seems more common in vowels, depending on what you class as one sound of course, but using the phonetic symbols we use here:
Y or I can be two sounds. Fly is /flaɪ/. Pi is /paɪ/.
O can be "oʊ" as in go /goʊ/.
Most words are shorter to write phonetically than normally, but this makes some longer, like biology = /baɪ'ɒlədʒɪ/ and unit = /'ju:nɪt/. There are probably more examples of letters I can't think of.  Carlwev  14:50, 17 June 2015 (UTC)

Different symbols which are the same grapheme in one language but separate graphemes in others

To illustrate what this is about, could we (should we) mention cases where different symbols are the same grapheme in one language but separate graphemes in others? This is similar to the situation with phonemes not being equivalent across different languages. An example I know is that a crossed Z is usually an alternative to a normal Z, and understood as equivalent, in English or other languages (it's commonly used, particularly by mathematicians, to distinguish it from the number 2). In Polish a crossed Z is equivalent to one of their "zh" sounds (either the dotted or acute-accent Z - don't remember which), and therefore distinct from an uncrossed Z. I have seen printed advertisements using fonts with both symbols. —Preceding unsigned comment added by 88.212.36.193 (talk) 19:17, 22 May 2011 (UTC)

Brackets for graphemes

I brought some order to the mess of different notations, which included single quotes, double quotes, italics, and slashes. Now the article uses:

  • Angle brackets <> for graphemes (as per the Cambridge Encyclopedia of Language);
  • Slashes // for phonemes;
  • Double quotes "" for quoted letters.

Actually, for angle brackets I was using less-than and greater-than signs <>. Another user used real angle brackets ⟨⟩ for the same purpose here, which I like, because it looks good on my monitor and doesn't require Nowiki, as e.g. <s> does. The reason why I didn't do that was that I'm not sure if it displays well for all readers. What do others think - should we use those instead? — Sebastian 00:24, 29 November 2011 (UTC)

I made de:Vorlage:Graphem in the German Wikipedia, which uses the preferred brackets by default. Nobody has complained yet, although I know of fonts which produce bad spacing around them. I haven’t yet had the time to put the same amount of work into the English articles that I have put into de:Graphem, although they certainly need it – very much. — Christoph Päper 08:10, 29 November 2011 (UTC)
Wow, that looks like you put in a lot of work! And btw, I'm impressed with de:Graphem, too; I think we all here could learn from that article, in particular in light of the previous two discussions. Now I have one question about the template: When are the optional parameters actually used? (Or not used, as when even the first parameter is omitted, resulting in a gamma.) If I may have a wish, it would be that the test&demo section contained some text for each example, explaining what it's good for. — Sebastian 18:15, 29 November 2011 (UTC)
I’ve expanded the documentation since. — Christoph Päper 20:54, 29 August 2013 (UTC)

Has any progress been made on this? I've noticed on several pages that there are some issues when selecting text or clicking links on lines immediately above and below lines containing angled brackets. I haven't had a chance to test this on another computer, but I can't imagine that mine is the only one affected. This problem does not exist on the German page. Jaxcp3 (talk) 15:59, 29 August 2013 (UTC)

Definition

I don't think this is a very well established concept. Or at least there's a fair amount of confusion in the lit. David Crystal (A Dict. of Linguistics and Phonetics, 2008, Blackwell) states that A, a, A, a, A, a etc. are allographs of "the letter A", which is the grapheme ⟨A⟩. He says this because capitalization is context-dependent: you capitalize proper nouns and the beginning of a sentence. (And indeed, the choice of which words to capitalize may vary with convention, and the word ⟨WORLD⟩ in a headline is the same word as ⟨world⟩ or ⟨World⟩.) However, R.L. Trask (Language and Linguistics, 2007, Routledge) states that ⟨A⟩ and ⟨a⟩ are distinct graphemes, and that a "more sophisticated" analysis "might prefer" to set up additional graphemes, such as digraphs (⟨sh⟩, ⟨ea⟩, etc.) – Meyer (2009) says "The word though ... has six graphemes but only two phonemes", but he's quite unsophisticated, equating graphemes with letters; Evans writing on Kayardild, on the other hand, speaks of digraphs such as rd and th. Trask even says that the positional variants of Arabic letters are distinct graphemes, which strikes me as odd.

Bussmann (1996) & Malmkjær (2002) aren't detailed enough to be useful; ELL2 articles use the term, but it isn't defined anywhere. The OED even says that the allographs of ⟨f⟩ are f, ff, F, Ff, gh, ph, Ph, but I think they're confusing grapheme with phoneme.

Collinge (ed, An Encyc. of Language, 1990) notes that there is a fair amount of variation in use: for example, some linguists equate "grapheme" with "letter", which the author thinks is not a useful definition as it excludes digits, punctuation, etc. and thus would not constitute a writing system. (They mention Stetson (1937) 'The Phoneme and the Grapheme' in Mélanges de linguistique et de philologie offerts à Jacques van Ginneken, which might be worth checking out.) There may be some inconsistency there too, though: They speak of Amharic as having 231 CV graphemes, since the vowel marks are diacritics analogous to ç, ø, etc, yet they then say that the fidel are arranged according to the order of the consonant graphemes, which serve as a filing alphabet: do they mean zero-marked consonant letters? Or each of the 33 rows in the syllabic chart? For Nagari, they speak of conjunct "graphs", suggesting they do not analyze these as distinct graphemes, but as positional variants. For English, they speak of the 26 spelling graphemes (as opposed to digits, punct, space, etc. graphemes), suggesting they see caps as allographs. They certainly don't mention Arabic and Hebrew positional variants as distinct graphemes.

Pennington (ed, 2007) says "Spanish and Serbo-Croatian, for example, have a highly consistent and reliable set of grapheme-to-phoneme correspondences, in which each letter generally corresponds to only one phoneme", and Comrie, "Some other languages with an alphabetic writing system, such as Finnish, have an almost complete correspondence between grapheme and phoneme." This would suggest they do not consider caps to be distinct graphemes, though they don't actually *say* that: they might be ignoring caps as a simplification. (You also don't have the Polish–polish problem in those languages.)*

Anyway, I don't think we can give a clear answer as to whether caps are distinct graphemes or not. It seems to depend on the analysis. Likewise digraphs. (Ligatures are, I believe, always considered graphemes, however.) On the other hand, the idea that a grapheme is a letter, while found in the lit, is roundly rejected as unhelpful. — kwami (talk) 20:11, 28 December 2012 (UTC)

* Wait, I don't think this means anything. If the New Yorker capitalized 'proper adjectives' like French and Polish, and the Atlantic Monthly decided not to (french, polish, so that the "polish" had two pronunciations, analogous to read or lead), would that mean that P and p were distinct graphemes in the New Yorker, but not in the Atlantic Monthly? And in Swift's day, both were capitalized, yet we don't consider Gulliver's Travels to be written in a different alphabet. I don't think we can call things "graphemes" when the difference depends on conventions like these that do not define the writing system. — kwami (talk) 20:24, 28 December 2012 (UTC)

Well, different usages of the term are quite expected, since there are several schools of thought. Some still consider the grapheme to be a dependent representation of the phoneme in a different medium. The more modern and better accepted definition is independent, i.e. the smallest graphic unit in linguistics distinguishing but not carrying meaning. Like with phonemes you identify these with minimal pair analysis. Since Polish and polish make a minimal pair, ‘P’ and ‘p’ – and by extension uppercase and lowercase letters in general – have to be different graphemes in English. And because there is no difference in meaning between colour and color ‘ou’ and (second) ‘o’ must be part of the same grapheme.
Some scholars abstract this a little further. They postulate functional graphemes that have no visible corpus by themselves but act on other graph(eme)s. ‘P’ and ‘p’ are allographs then – as with the dependency hypothesis in phonocentric schools – and ‘Polish’ has a functional grapheme at its beginning that changes the following letter to uppercase. (In German, for instance, this is much more relevant.) This would be an orthographic functional grapheme (sometimes dubbed orthographeme), whereas you could also have (grapho)stylistic reasons, e.g. within song titles or headlines.
Ligatures may be orthographic (encyclopedia / ~paedia / ~pædia) or stylistic / typographic (fish / ﬁsh), too. (⇒ *typographeme?)
Despite this, grapheme is often recognized to be a sloppy synonym for other terms. Daniels/Bright (1995), for instance, acknowledge in their glossary that a grapheme was a “term intended to designate a unit of a writing system, parallel to phoneme and morpheme, but in practice used as a synonym for letter [= ‘a self-contained unit of an abjad, alphabet, or abugida’], diacritic [= ‘a mark added to a character to indicate a modified pronunciation (or sometimes to distinguish homophonous words)’], character [= ‘conventional term for a unit of the Chinese writing system in East Asian scripts’], or sign [= ‘conventional term for a self-contained unit of cuneiform script’]” — Christoph Päper 11:12, 30 December 2012 (UTC)
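The minimal pair test described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: LEXICON is a made-up four-entry word list with invented glosses, minimal_pairs is a hypothetical helper name, and the check only covers same-length spellings that differ in exactly one position (so it says nothing about the colour/color case, where the lengths differ).

```python
from itertools import combinations

# Toy lexicon: spelling -> gloss. Entries and glosses are illustrative assumptions.
LEXICON = {
    "Polish": "of Poland",
    "polish": "make shiny",
    "color": "hue",
    "colour": "hue",
}


def minimal_pairs(lexicon):
    """Yield (word1, word2, (unit1, unit2)) for same-length spellings with
    different meanings that differ in exactly one position."""
    for (w1, m1), (w2, m2) in combinations(lexicon.items(), 2):
        if len(w1) != len(w2) or m1 == m2:
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:
            yield w1, w2, diffs[0]


for w1, w2, contrast in minimal_pairs(LEXICON):
    print(w1, w2, contrast)
```

On this toy input the only pair found is Polish/polish, contrasting P and p. Whether that contrast counts as evidence for separate graphemes is exactly the point disputed in this thread; the code only makes the procedure explicit.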

Allography: Typography

If anyone is still watching this article, I would appreciate a review of Allography#Typography, which I wrote recently. (It may be that the citations I've added there can be used here too?) --John Maynard Friedman (talk) 17:50, 24 November 2019 (UTC)

Comments invited on new template: Linguistics notation

I don't want to upset any applecarts accidentally, so this is to alert editors to my proposal.

I have created a new sidebar [right] along the lines of {{contains special characters}} for articles that are mainly about graphemes but inevitably include information about IPA symbols and sometimes phonemes. We must assume that most visitors will not be familiar with the ⟨x⟩, /x/ and [x] notations, so the new template will tell them where to go to find out.

I have opened a discussion section at Template talk:Linguistics notation#Invitation to comment to invite any advice, comment, observations or reservations, please. --John Maynard Friedman (talk) 21:40, 27 February 2021 (UTC)

E, Е and Ε: Allographs or alloglyphs? or none of the above?

The Latin E, the Cyrillic Ε and the Greek Ε all look alike but have different meanings and code-points. So could anyone advise, please – are these the same glyph, the same grapheme, the same something else or just mere coincidence? --John Maynard Friedman (talk) 19:59, 6 December 2021 (UTC)

Those are allographs. (P.S., alloglyphs aren’t a thing.) Editmakerer (talk) 22:41, 26 April 2024 (UTC)
Wait, they’re homoglyphs. My mistake. Editmakerer (talk) 22:44, 26 April 2024 (UTC)
If you spot your error within a few minutes, you can revert but in a case such as this, you need to use strikeout, thus (and yes, you were right: homoglyph for letters, Homograph for words). --𝕁𝕄𝔽 (talk) 17:55, 27 April 2024 (UTC)
Thank you 😅 Editmakerer (talk) 00:30, 28 April 2024 (UTC)
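For reference, the "different code-points" part of the question can be checked with Python's standard unicodedata module. The sketch assumes the three letters meant are U+0045, U+0415 and U+0395; the names LOOKALIKES and names are made up for this example.

```python
import unicodedata

# Latin, Cyrillic and Greek capitals that usually render alike.
# Assumption: these are the three letters the section heading refers to.
LOOKALIKES = ["\u0045", "\u0415", "\u0395"]


def names(chars):
    """Return (code point, Unicode name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


for code_point, name in names(LOOKALIKES):
    print(code_point, name)

# Compatibility normalization does not merge them: three distinct results remain.
print(len({unicodedata.normalize("NFKC", c) for c in LOOKALIKES}))
```

The three letters stay distinct under normalization, which fits the homoglyph reading: similar shapes, separately encoded letters of different scripts.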

Glyph

Following consensus at talk:Glyph, I have edited the article Glyph so that it is exclusively about letterforms, the primary topic. It still doesn't read well so fresh eyes are welcome to improve it further, please. 𝕁𝕄𝔽 (talk) 23:51, 28 November 2022 (UTC)