Template:General Category (Unicode)/sandbox

General Category (Unicode Character Property)[a]
| Value | Category (major, minor) | Basic type[b] | Character assigned[b] | Count (as of Unicode 12.1) | Remarks |
|---|---|---|---|---|---|
| Letter | | | | | |
| Lu | Letter, uppercase | Graphic | Character | 1,788 | |
| Ll | Letter, lowercase | Graphic | Character | 2,151 | |
| Lt | Letter, titlecase | Graphic | Character | 31 | Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz) |
| Lm | Letter, modifier | Graphic | Character | 259 | A modifier letter |
| Lo | Letter, other | Graphic | Character | 121,414 | An ideograph or a letter in a unicase alphabet |
| Mark | | | | | |
| Mn | Mark, nonspacing | Graphic | Character | 1,826 | |
| Mc | Mark, spacing combining | Graphic | Character | 429 | |
| Me | Mark, enclosing | Graphic | Character | 13 | |
| Number | | | | | |
| Nd | Number, decimal digit | Graphic | Character | 630 | All these, and only these, have Numeric Type = De[c] |
| Nl | Number, letter | Graphic | Character | 236 | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | 888 | E.g., vulgar fractions, superscript and subscript digits |
| Punctuation | | | | | |
| Pc | Punctuation, connector | Graphic | Character | 10 | Includes the underscore "_" |
| Pd | Punctuation, dash | Graphic | Character | 24 | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | 75 | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | 73 | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | 12 | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | 10 | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | 588 | |
| Symbol | | | | | |
| Sm | Symbol, math | Graphic | Character | 948 | Mathematical symbols (e.g., +, =, ×, ÷). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which, despite frequent use as mathematical operators, are primarily considered to be "punctuation" |
| Sc | Symbol, currency | Graphic | Character | 62 | Currency symbols |
| Sk | Symbol, modifier | Graphic | Character | 121 | |
| So | Symbol, other | Graphic | Character | 6,161 | |
| Separator | | | | | |
| Zs | Separator, space | Graphic | Character | 17 | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | 1 | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | 1 | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Other | | | | | |
| Cc | Other, control | Control | Character | 65 (will never change)[c] | No name,[d] <control> |
| Cf | Other, format | Format | Character | 161 | Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | 2,048 (will never change)[c] | No name,[d] <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | 137,468 total (will never change)[c] (6,400 in BMP, 131,068 in Planes 15–16) | No name,[d] <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | 66 (will never change)[c] | No name,[d] <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | 836,536 | No name,[d] <reserved> |
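
The two-letter values in the table can be looked up programmatically. The following is a minimal sketch, assuming Python and its standard `unicodedata` module; the helper names `general_category` and `describe` and the sample list are illustrative only. The module reflects whichever Unicode version ships with the interpreter, so per-category counts may differ from the 12.1 figures listed above.

```python
import unicodedata

# Illustrative sample characters, one or more per major class in the table.
SAMPLES = [
    "A", "a", "\u01c5",            # Lu, Ll, Lt (titlecase Dž)
    "5", "\u2163", "\u00bd",       # Nd, Nl (Roman numeral four), No (one half)
    "_", "-", "(", ")",            # Pc, Pd, Ps, Pe
    "\u00ab", "\u00bb", "!",       # Pi, Pf, Po
    "+", "$",                      # Sm, Sc
    " ", "\t",                     # Zs, Cc (TAB is a control, not a separator)
    "\u2028", "\u2029", "\u00ad",  # Zl, Zp, Cf (soft hyphen)
]


def general_category(ch: str) -> str:
    """Return the two-letter General Category value of one character."""
    return unicodedata.category(ch)


def describe(chars):
    """Map each character's code point label (U+XXXX) to its category."""
    return {f"U+{ord(c):04X}": general_category(c) for c in chars}


if __name__ == "__main__":
    for code_point, gc in describe(SAMPLES).items():
        print(code_point, gc)
```

Characters such as `!`, `*`, and `/` report `Po` rather than `Sm`, and `-` reports `Pd`, matching the remark on the Sm row.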
  a. "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. March 2019.
  b. "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. March 2019.
  c. Unicode Character Encoding Stability Policies: Property Value Stability. Stability policy: some gc groups will never change. gc=Nd corresponds with Numeric Type=De (decimal).
  d. "Table 4-9: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. March 2019. A Code Point Label may be used to identify a nameless code point, e.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a Control Name with a true Control code. Unicode also uses <not a character> for <noncharacter>.
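
The counts marked "will never change" can be cross-checked with a short script. This is a sketch under stated assumptions: it uses Python's standard `unicodedata` module, and the function names `count_categories` and `count_noncharacters` are hypothetical. Because `unicodedata` reports noncharacters and reserved code points alike as `Cn`, the noncharacter count is derived from the definition (U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes) rather than from the category value.

```python
import unicodedata
from collections import Counter


def count_categories() -> Counter:
    """Tally the General Category of every code point U+0000..U+10FFFF."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))


def count_noncharacters() -> int:
    """Count noncharacters by definition, not by category lookup."""
    total = 0
    for cp in range(0x110000):
        in_fdd0_block = 0xFDD0 <= cp <= 0xFDEF
        # Low 16 bits equal to FFFE or FFFF: last two code points of a plane.
        ends_plane = (cp & 0xFFFE) == 0xFFFE
        if in_fdd0_block or ends_plane:
            total += 1
    return total


if __name__ == "__main__":
    counts = count_categories()
    for gc in ("Cc", "Cs", "Co"):
        print(gc, counts[gc])
    print("noncharacters", count_noncharacters())
```

Counts for other categories depend on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), so only the stable groups are compared here.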

References