Talk:Universal Character Set characters

This article falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This article has been rated as Mid-importance on the project's importance scale.

Korea

OP has been blocked.

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Perhaps it is a bizarre coincidence, but Korea suffers the most from overconfident map-makers. Squabbles on Wikimedia Commons and the lack of updates to the legend have destroyed any value the present picture had with respect to any part of Korea. Incnis Mrsi (talk) 21:06, 15 August 2019 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

The BMP

The cited text has long since been changed

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

This article states, "The characters outside the first plane usually have very specialized or rare use."

I disagree. Does the increasingly popular use of emoji, many of which are defined outside the BMP, really constitute "specialized or rare use"? The current design of many Web tools (such as Java and JavaScript), which use UCS-2 or UTF-16 encoding, makes support of emoji awkward, sometimes requiring a clumsy Unicode feature called "surrogates", yet this fact does not seem to be reflected in Wikipedia articles as yet.

More broadly, Unicode has quirky programming solutions, few of which work in general. Unicode can construct arbitrarily complex graphemes, which cannot be recognized simply or elegantly in programming languages such as JavaScript. David Spector (talk) 13:14, 9 September 2019 (UTC)
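
For readers unfamiliar with the surrogate mechanism mentioned above, here is a minimal Python sketch of the arithmetic UTF-16 uses for code points outside the BMP. The function names `to_surrogates` and `utf16_units` are illustrative, not part of any library, and the two sample strings are just convenient examples (one emoji, and one family emoji built from three emoji joined by zero-width joiners).

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def utf16_units(text: str) -> int:
    """Count UTF-16 code units (the quantity JavaScript's String.length reports)."""
    return len(text.encode("utf-16-be")) // 2


grinning = "\U0001F600"                                   # U+1F600, outside the BMP
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"     # one visible glyph, joined with ZWJ

print([hex(u) for u in to_surrogates(ord(grinning))])     # ['0xd83d', '0xde00']
print(utf16_units(grinning))                              # 2
print(len(family), utf16_units(family))                   # 5 8
```

The last line illustrates the grapheme point as well: what a reader sees as a single symbol is five code points and eight UTF-16 code units, so neither count matches the number of user-perceived characters.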

The statement is true if you consider that while there are a few dozen emoji and emoticons in the SMP, there are tens of thousands of mostly rare hanzi/kanji in the SIP and TIP, and over a hundred historic scripts in the SMP, including thousands of Egyptian hieroglyphs, Tangut characters, etc. Nevertheless, it would be useful to qualify the statement along the lines of "The characters outside the first plane usually have very specialized or rare use, although there are some characters such as emoji and emoticons which are widely used on social media." BabelStone (talk) 13:35, 9 September 2019 (UTC)
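
As a rough illustration of the planes referred to here, the plane of a code point is simply its value divided by 0x10000. The sketch below is an assumption-laden helper, not an official API: `plane_of`, `plane_name`, and the `PLANE_NAMES` table are hypothetical names, and only the commonly abbreviated planes are listed.

```python
# Abbreviations for the planes that have well-known short names.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP", 14: "SSP"}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    return code_point >> 16              # each plane holds 0x10000 code points


def plane_name(code_point: int) -> str:
    """Return the plane abbreviation, or a generic label if it has none here."""
    plane = plane_of(code_point)
    return PLANE_NAMES.get(plane, f"plane {plane}")


print(plane_name(0x0041))    # BMP  (LATIN CAPITAL LETTER A)
print(plane_name(0x1F600))   # SMP  (an emoji)
print(plane_name(0x13000))   # SMP  (an Egyptian hieroglyph)
print(plane_name(0x20000))   # SIP  (a CJK Extension B ideograph)
```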

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Universal Coded Character Set vs. Universal Character Set

There's another Wikipedia article called Universal Coded Character Set. Which name is correct, that one or the one here (Universal Character Set)? The acronym certainly looks like it comes from Universal Character Set. Mcswell (talk) 03:12, 23 April 2021 (UTC)

Good catch - thank you, Mcswell! Indeed, it looks like they should be merged, so I'm adding merge tags to both, linking to this discussion. ◅ Sebastian 11:40, 21 June 2023 (UTC)

Welcome to the world of Character Set standards, where every decision that made sense at the time got frozen into immutability and now only makes sense to people who know the (usually tedious) historical background.

The title of the official ISO/IEC standard is
"ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)"
so the official name is "Universal Coded Character Set" and the official abbreviation for that four-word name is the Three-Letter Acronym "UCS". This leads to 10646 often (perhaps usually?) being called the "Universal Character Set".

(Mind you, this is an ISO standard, where "ISO" stands for "International Organization for Standardization". You have to stop expecting these things to make sense.)

So "Universal Coded Character Set" is the correct title for our article about ISO/IEC 10646. We could rename this article to (for example) Unicode characters (which is currently a redirect to this article), but the current name is technically valid, even if slightly confusing. FWIW, I slightly prefer the current name.

OTOH, I would strongly oppose merging this article with "Universal Coded Character Set". The two articles are both long, and they cover related but separate topics: what's in 10646 (and Unicode), vs 10646 as a whole.

Cheers, CWC 05:11, 17 January 2024 (UTC)

Closing, given the uncontested objection and the stale discussion; no merge. Klbrain (talk) 10:49, 17 February 2024 (UTC)