Talk:Precomposed character

From Wikipedia, the free encyclopedia

Post

So to speak, Chinese characters are precomposed characters, as they can always be decomposed into their elementary radicals and strokes. --Abdull 20:48, 23 December 2005 (UTC)

This page needs a simple example, e.g. E with an acute accent (É). Richard Donkin 09:01, 29 January 2006 (UTC)
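As a minimal sketch of the requested example, assuming Python 3 and its standard `unicodedata` module: É can be encoded either as the single precomposed code point U+00C9 or as E followed by U+0301 COMBINING ACUTE ACCENT. The two sequences differ code point by code point, but they are canonically equivalent, so Unicode normalization maps one onto the other.

```python
import unicodedata

# U+00C9 LATIN CAPITAL LETTER E WITH ACUTE: a single precomposed code point
precomposed = "\u00c9"
# U+0045 LATIN CAPITAL LETTER E followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "E\u0301"

# Different code point sequences, so a plain comparison fails...
print(len(precomposed), len(decomposed), precomposed == decomposed)
# ...but they are canonically equivalent: normalization maps one to the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # NFC composes
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # NFD decomposes

# List the code points of the decomposed form by their Unicode names
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run as is, this should report lengths 1 and 2, a failed direct comparison, and successful comparisons after NFC and NFD normalization.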

Unicode

Why does Unicode assign code points to so many precomposed characters? --84.61.63.235 09:28, 17 October 2006 (UTC)

The official reason is backward compatibility with preexisting character encodings. If some encoding that Unicode is meant to be compatible with assigns a character to a certain "precomposed" letter, that letter is also included in Unicode, so that there is a one-to-one correspondence between the original encoding and its "Unicodified" form (text encoded in Unicode can then be converted back to its original encoding unambiguously). This was meant to entice users into switching to Unicode, but it actually defeats the original intention behind Unicode, which was to use decomposed characters as the standard encoding. It is also unsatisfactory because some accented letters have their own code points while others do not. Unicode fonts that cover the range of a precomposed letter are very likely to include a precomposed glyph for it, which guarantees its correct rendering. Letters without their own code point, frequently rare characters needed for the orthographies of minority languages or for newly devised orthographies of previously unwritten languages (the least likely to have been encoded in old character maps), do not enjoy that privilege: most fonts will not include a precomposed glyph for them, so they are rendered improperly on most computer displays, which currently support diacritic composition poorly at best. For example, compare the likely proper rendering of the precomposed letters Ã/ã, Ẽ/ẽ, Ĩ/ĩ, Ñ/ñ, Õ/õ, Ũ/ũ, Ỹ/ỹ with the very likely improper rendering of the non-precomposed letter G̃/g̃, which means you cannot satisfactorily type the standard orthography of Guarani on a modern computer. The absence of precomposed characters needed for the actual orthographies of some languages also needlessly complicates text editing and word processing.
All of this places a heavy burden on the use of certain minority languages in the computer age, compared with the ease of languages such as English, whose orthography poses no problem whatsoever to type correctly on a computer. That is quite the opposite of what Unicode was conceived to do, namely to facilitate, rather than hinder, the computerized use of any language, including neglected minority ones. 213.37.6.23 (talk) 15:24, 24 April 2008 (UTC)
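The encoding side of the Guarani example can be checked with a short sketch, again assuming Python 3's standard `unicodedata` module (the helper name `nfc_code_points` is hypothetical, chosen only for illustration). NFC normalization composes a base letter and U+0303 COMBINING TILDE into one code point only when Unicode defines a precomposed character for that pair, so g̃ remains a two-code-point sequence. This shows only how the text is encoded; it says nothing about how a particular font renders the sequence, which is the separate problem described above.

```python
import unicodedata

def nfc_code_points(text: str) -> list[str]:
    """Hypothetical helper: NFC-normalize text and list its code points as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

COMBINING_TILDE = "\u0303"

# Each letter is typed as base letter + combining tilde, then NFC-normalized.
for base in ["a", "n", "y", "g"]:
    print(base + COMBINING_TILDE, nfc_code_points(base + COMBINING_TILDE))
# a, n and y compose to U+00E3, U+00F1 and U+1EF9 (ã, ñ, ỹ);
# g has no precomposed form with tilde, so it stays U+0067 U+0303.
```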

reads like an advert

I removed the following line for the above reason:

The Cocoa text system that is a central component of Apple’s Mac OS X operating system and the Safari web browser has advanced Unicode support and handles such examples easily.

Its exclusive focus on a single system implies that other systems don't work as well. In any case, I don't see why this is relevant here. 195.24.29.51 08:04, 16 May 2007 (UTC)

Example with combining characters renders wrongly on Chrome

On Chrome (and I suspect on other WebKit-based browsers as well) the example with the combining characters doesn't render correctly.

The combining marks are not rendered on the base characters but slightly to the right and in a wrong horizontal position as well.

I strongly suspect this is because the diacritic marks are in their own span elements, which probably leads those browsers to render them separately. I also suspect that this behavior is correct according to the relevant specs, because I see no reason to combine characters that belong to two different logical text units.

That's unfortunate, because the color highlighting adds a lot, but the incorrect rendering gives the wrong impression that precomposed characters do not render the same as combining sequences, even on correct implementations.

Maybe images showing the correct rendering should be used instead of, or in addition to, the text. That would have the added benefit of being legible even on completely broken browsers.

Also note that Firefox 7 on Ubuntu, for example, does render the text correctly but doesn't color the combining characters differently. — Preceding unsigned comment added by 90.146.115.243 (talk) 09:25, 10 October 2011 (UTC)

Rentar (talk) 09:23, 10 October 2011 (UTC)

Agreed. The Unicode standard states that a combining character applies to the glyph immediately preceding it, which it does not if it's in a separate HTML element. Unless there's an objection, I'll take the oranginess out of the combining character examples. Sneftel (talk) 21:44, 20 August 2013 (UTC)
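The rule cited here, that a combining mark applies to the character immediately before it in logical order, can be illustrated with a minimal Python sketch. The function name `attach_marks` is hypothetical, and its grouping is a deliberate simplification of Unicode's grapheme cluster rules. It models only the character sequence, not how a browser lays out a mark placed in a separate HTML element; it does show that a mark starting its own run of text is left without a base character.

```python
import unicodedata

def attach_marks(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified sketch: a code point whose general category starts with "M"
    (Mn, Mc, Me) is treated as a mark. Real grapheme cluster segmentation
    (Unicode UAX #29) has more rules than this.
    """
    clusters: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and clusters:
            clusters[-1] += ch  # a mark belongs to the character before it
        else:
            clusters.append(ch)  # a base character, or a mark with no base
    return clusters

# k, then o + U+0301 COMBINING ACUTE ACCENT, then u + U+0308 COMBINING DIAERESIS
print(attach_marks("ko\u0301u\u0308") == ["k", "o\u0301", "u\u0308"])  # True
# A mark at the start of a run (e.g. alone in its own element) has no base
print(attach_marks("\u0301o") == ["\u0301", "o"])  # True
```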

Unicode Consortium must change their policy!

Per the arguments above, the U.C. must add precomposed characters for actually existing languages to the code table (rather than lots of mindless arrows, dingbats and emoticons, as they do now). --Jugydmort (talk) 20:19, 22 September 2012 (UTC)

Use of color

Except for the different colors, the two solutions are equivalent and should render identically.

In some situations, the precomposed green k, u and o with diacritics

Assumes that the reader is not blind, color-blind, or reading a monochrome rendering of the article. Violates WP:COLOR. Enoent (talk) 14:18, 6 August 2015 (UTC)

The colors are not necessary for understanding, since the prose describes which characters have diacritics. A more fundamental question is whether the article should contain what is essentially a conformance test of the reader's browser's text renderer. Sneftel (talk) 11:43, 7 August 2015 (UTC)