Chinese character structures

Chinese character structures (simplified Chinese: 汉字结构; traditional Chinese: 漢字結構; pinyin: hànzì jiégòu) are the patterns or rules by which characters are formed from their writing units.^[1] There are two aspects of Chinese character structure. External structure concerns strokes, components and whole characters, together with their structural relations, purely on the level of character form. Internal structure concerns the relationship between the forms, sounds and meanings of Chinese characters.^[2]

External structures

The external structure of a Chinese character describes how writing units are combined level by level into a complete character. There are three levels of structural units: strokes, components, and whole characters.^[3] For example, the character 字 ('character') is composed of two components, each of which is composed of three strokes:

字 = 宀(㇔㇔㇇) + 子(㇇㇚㇐).
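The three levels can be modelled as a small nested data structure. The following is a minimal sketch, not a standard representation: the class names `Component` and `Character` and the method `stroke_count` are illustrative assumptions, and the stroke sequences are copied from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """A component: a glyph plus the strokes that form it (hypothetical model)."""
    glyph: str
    strokes: List[str]


@dataclass
class Character:
    """A whole character built from one or more components (hypothetical model)."""
    glyph: str
    components: List[Component]

    def stroke_count(self) -> int:
        # Total strokes = sum of the strokes of every component.
        return sum(len(c.strokes) for c in self.components)


# 字 = 宀(㇔㇔㇇) + 子(㇇㇚㇐), as in the example above.
zi = Character(
    glyph="字",
    components=[
        Component("宀", list("㇔㇔㇇")),
        Component("子", list("㇇㇚㇐")),
    ],
)

print(len(zi.components))  # 2
print(zi.stroke_count())   # 6
```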

Strokes

Strokes (笔画; 筆劃; bǐhuà) are the smallest building units of Chinese characters. When writing a Chinese character, the trace of a dot or a line left on the writing material (such as paper) from pen-down to pen-up is called a stroke.^[4]

Strokes combine with each other in a Chinese character in different ways. There are three types of combinations between two strokes:^[5]

Separation: the strokes are separated from each other, such as 八, 三, 小.
Connection: the strokes are connected, such as 匕, 正, 厂, 弓, 凹, 凸.
Intersection: the strokes intersect, such as 十, 丈, 車.

Components

Chinese character components (部件; bùjiàn) are building blocks of Chinese characters composed of strokes.^[6] In most cases, a component is larger than a stroke (i.e., consists of more than one stroke) and smaller than the whole character (i.e., combines with other components to form a character). For example, the character "件" has two components (亻 and 牛), each with more than one stroke: 亻 (㇓㇑) and 牛 (㇓㇐㇐㇑). In the special case of one-stroke characters, such as "一" and "乙", a single stroke is both a component and a character.

Chinese character component analysis divides a character into components. There are two ways of dividing: hierarchical dividing and plane dividing. Hierarchical dividing separates the character layer by layer from larger to smaller components until the primitive components are reached. Plane dividing separates out the primitive components in one step. Hierarchical dividing can display the external structure of Chinese characters, while plane dividing can be regarded as omitting the higher levels and directly writing out the final result in primitive components.^[7]
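The difference between the two ways of dividing can be sketched with a small tree. This is an illustrative sketch only: the `Node` class and the functions `hierarchical` and `plane` are hypothetical names, and the decomposition of 想 into 相 (木 + 目) and 心 is assumed here for illustration rather than taken from the cited specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A component in a decomposition tree; leaves are primitive components."""
    glyph: str
    parts: List["Node"] = field(default_factory=list)


def hierarchical(node: Node, depth: int = 0) -> List[Tuple[int, str]]:
    """Hierarchical dividing: list every component together with its level."""
    rows = [(depth, node.glyph)]
    for part in node.parts:
        rows.extend(hierarchical(part, depth + 1))
    return rows


def plane(node: Node) -> List[str]:
    """Plane dividing: return only the primitive (leaf) components, in order."""
    if not node.parts:
        return [node.glyph]
    leaves: List[str] = []
    for part in node.parts:
        leaves.extend(plane(part))
    return leaves


# Assumed decomposition: 想 = 相 + 心, and 相 = 木 + 目.
xiang = Node("想", [Node("相", [Node("木"), Node("目")]), Node("心")])

for depth, glyph in hierarchical(xiang):
    print("  " * depth + glyph)

print(plane(xiang))  # ['木', '目', '心']
```

Plane dividing discards the intermediate node 相, which is why it cannot by itself display the layered external structure.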

Whole characters

A Chinese whole character (整字; zhěngzì) is a complete character. It lies at the final level of the stroke-component-character composition. According to their structures, Chinese characters can be divided into undecomposable characters and decomposable characters.^[8]

An undecomposable character (独体字; 獨體字) consists of one primitive component, which is formed directly by strokes and cannot be decomposed into smaller components, for example, "一, 二, 三, 止, 正".^[9]

A decomposable character (合体字; 合體字) consists of more than one component. There are two frequently used modes of component combination in the study of Chinese character structures: first-level component combination and primitive component combination.^[10]

According to first-level component combination, the structures of decomposable characters can be divided into 13 categories:^[11]^[12]

Left to right (⿰, 2FF0 ^[a]), for example: 部, 件, 結 and 構.
Left to middle and right (⿲, 2FF2): 衡, 班 and 辯.
Above to below (⿱, 2FF1): 要, 思 and 想.
Above to middle and below (⿳, 2FF3): 鼻, 曼 and 率.
Full surround (⿴, 2FF4): 圍, 國 and 囪.
Surround from above (⿵, 2FF5): 問, 同 and 風.
Surround from below (⿶, 2FF6): 凶, 画 and 函.
Surround from left (⿷, 2FF7): 匡, 匠 and 匣.
Surround from upper left (⿸, 2FF8): 廣, 居 and 病.
Surround from upper right (⿹, 2FF9): 句, 可 and 氧.
Surround from lower left (⿺, 2FFA): 這, 建 and 題.
Surround from lower right (N/A): 斗 and 头.
Overlaid (⿻, 2FFB): 巫, 爽 and 承.
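The code points and names of the ideographic description characters listed above can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `idc_table` is an assumption made for illustration, and the range is deliberately limited to U+2FF0 to U+2FFB, the characters cited in the list.

```python
import unicodedata

# Every name in this range starts with the same prefix.
PREFIX = "IDEOGRAPHIC DESCRIPTION CHARACTER "


def idc_table(first: int = 0x2FF0, last: int = 0x2FFB):
    """Return (hex code point, character, short name) for each IDC in the range."""
    rows = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp))
        rows.append((f"{cp:04X}", chr(cp), name[len(PREFIX):]))
    return rows


for code, char, label in idc_table():
    print(code, char, label)
```

The first row printed is `2FF0 ⿰ LEFT TO RIGHT`, matching the note on the first list entry.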

According to primitive component combination, the structures of decomposable characters can be divided into:^[13]

A. For characters composed of two primitive components, there are 9 different structures, as shown by the following example characters: 吕认压达勾问区凶团.
B. For characters composed of three components, there are 21 different structures, such as: 荣花型培树缠抛挺润抠捆部庶厢逞逊闾圄幽乖巫.
C. For characters composed of four components, there are 20 different structures, such as: 营蕊蓝寤嫠筐辔椁摄燃游榧额韶欧剩腐遮阔匿.
D. For characters composed of five components, there are 20 different structures, such as: 赢蒿膏寝蘧嚣篮樊搞澡缀渤漉髂齁敲酃戳魔噩.
E. For characters composed of six components, there are 10 different structures, such as: 臀翳麓瀛灌骥歌豁豌衢.
F. For characters composed of seven components, there are 3 different structures, such as: 戆麟饕.
G. For characters composed of eight components, there is 1 structure, such as: 齉.
H. For characters composed of nine components, there is 1 structure, such as: 懿.

Internal structures

In the analysis of internal structures, Chinese characters are decomposed into internal structural components in relation to the sound and meaning of the characters.^[14]

Traditional classification

In Shuowen Jiezi, Xu Shen proposed six categories (六书; 六書; liùshū) of Chinese characters:^[15]

Pictograms (象形; xiàngxíng; 'form imitation'): single-semantic-component characters that are drawings of the objects they represent.^[b]
Simple ideograms (指事; zhǐshì; 'indication'): characters that express an abstract idea with an iconic form.
Compound ideographs (会意; 會意; huìyì; 'joined meaning'): characters that combine two or more semantic components to indicate the meaning of the character.
Phono-semantic characters (形声; 形聲; xíngshēng; 'form and sound'): characters that consist of phonetic components and semantic components.
Derivative cognates (转注; 轉注; zhuǎnzhù; 'reciprocal meaning'): pairs of characters that had similar Old Chinese pronunciations and may have had the same etymological root.
Rebus (phonetic loan) characters (假借; jiǎjiè; 'borrowing', 'making use of'): characters borrowed to write other morphemes with similar pronunciations.

Modern classification

The traditional liushu presupposed that every internal component, usually called pianpang (偏旁), represents either the sound or the meaning of the character. However, after the long evolution of the Chinese writing system, quite a few components can no longer play these roles effectively and have become pure form components, or pure signs. From the point of view of internal structure, modern Chinese characters are composed of semantic components (义符; 義符; yìfú), phonetic components (音符; yīnfú) and pure form components (记号; 記號; jìhào). These combine into seven categories of modern Chinese characters:^[16]^[17]^[18]

Semantic component characters (义符字; 義符字; yìfúzì) are composed of semantic components and include:^[19]^[20]

Pictograms, such as 田 (field), 井 (well), 門 (door).
Simple ideograms, such as 一 (one), 二 (two), 刃 (blade).
Compound ideographs. For example, 拿 (take): 合 (close) 手 (hands) together to take; 掰 (break apart): 分 (separate) with two 手 (hands); 从 (follow): one 人 (person) follows another person; 泪 (tears): 氵 (water) from 目 (eyes).
Special methods, such as 叵 (cannot): turn 可 (can) to the opposite (right) side; 冇 (none, not have): 有 (have) taken away the 二 (contents).

Phonetic component characters (音符字; yīnfúzì) are composed of phonetic components.^[19] For example,

Phonetic-loan, for example, character 花 (huā, flower) was borrowed to mean 'spending' (huā).
Used in a transliterated foreign word, e.g. the characters in words 打 (dá, dozen) and 馬達 (mǎdá, motor).
Multi-phonetic component characters. For example, 新 was originally a semantic-phonetic character, but its modern meaning of "new" has nothing to do with the original semantic component 斤 (which means "0.5 kg" in modern Chinese), though the sounds are similar. As a result, 新 (xīn) now has two phonetic components: 亲 (qīn) and 斤 (jīn). As another example, the Vietnamese character 𢁋 (blăng;^[c] 'Moon') was created as a compound of 巴 (ba) and 陵 (lăng).^[21]

Pure form characters (记号字; 記號字; jìhàozì) are composed of form components, which represent neither the sound nor the meaning of the characters.^[22] For example:

日 (Sun): The 日 character in modern regular script is no longer round like the Sun.
广 (wide, broad): The phonetic component 黄 in the traditional character 廣 has been omitted for this simplified character.
鹿 (deer): The oracle form resembled a deer.

Semantic-phonetic characters (义音字; 義音字; yìyīnzì), also called "phono-semantic characters", consist of semantic components and phonetic components.^[23] There are six combinations:

Left meaning (semantic) and right sound (phonetic), such as 肝 (sound: gān, meaning: liver), 惊 (jīng, fear), 湖 (hú, lake);
Right meaning and left sound, such as 鵡 (wǔ, parrot), 剛 (gāng, firm), 甥 (shēng, nephew);
Upper meaning and lower sound: 霖 (lín, rain), 茅 (máo, cogongrass) and 竿 (gān, pole);
Lower meaning and upper sound: 盂 (yú, bowl), 岱 (dài, Mount Tai), 鯊 (shā, shark);
Outer meaning and inner sound: 癢 (yǎng, itch), 園 (yuán, garden), 衷 (zhōng, heart), 座 (zuò, seat), 旗 (qí, flag);
Inner meaning and outer sound: 辮 (biàn, braid), 悶 (mèn, dull), 摹 (mó, imitation).

Semantic-form characters (义记字; 義記字; yìjìzì) are composed of semantic components and pure form components.^[24] Many of these characters were originally semantic-phonetic characters. Due to subsequent changes in the shape or pronunciation of the phonetic components or the characters, the phonetic components could not effectively represent the pronunciation of the character and became pure form. For example: ^[25]

布 (bù, cloth): used to have semantic 巾 (scarf) and phonetic 父 (fù). In the modern form, the phonetic component no longer looks like 父.
急 (jí, urgent): used to have semantic 心 (heart) and phonetic 及 (jí). Now the upper component no longer looks like 及.
鸡 (jī, chicken): has semantic 鸟 (bird), but is not read like 又 (yòu).

Phonetic-form characters (音记字; 音記字; yīnjìzì) are composed of phonetic components and pure form components.^[26] They mostly came from ancient semantic-phonetic characters, where the semantic components lost their functions and became pure form. For example,

球 (qiú, ball): originally referred to a kind of beautiful jade, with semantic component 王 (玉, jade). Later, it was borrowed to represent a ball and then extended to any spherical object, so 王 (jade) became a pure form component, while 求 (qiú) remains a phonetic component.
笨 (bèn, stupid): originally referred to the inner white layer of bamboo, with semantic component 竹 (bamboo) and phonetic 本 (běn). Later, the character was borrowed by sound to mean 'stupid'.
华 (huá, magnificent): This is a simplified character with phonetic 化 and pure form component 十.

Semantic-phonetic-form characters (义音记字; 義音記字; yìyīnjìzì) consist of the three kinds of components. For example,^[22]

岸 (àn, bank, shore): originally had the semantic ⿱山厂 and phonetic 干 (gàn). In modern Chinese, ⿱山厂 is not a character or radical with a sound or meaning, but 山 (hill) can still express meaning, while 厂 remains a pure form component.
聽 (tīng, listen): semantic 耳 (ear) and phonetic 壬 (tǐng). In modern Chinese characters, the right part has become a pure form component.

Semantic–phonetic–form characters are very rare, and the examples above are not fully persuasive. Whether they can be justified as an internal structural category requires further study. If they are not counted as a category, the remaining six categories can also be called the "New six writings".^[18]
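The seven categories correspond to the seven non-empty combinations of the three component types. The following is a minimal sketch of that correspondence; the names `KINDS`, `CATEGORY` and `classify` are illustrative assumptions, and the component labels for 江 and 球 follow the examples given in this article.

```python
from itertools import combinations

# The three kinds of internal components described above.
KINDS = ("semantic", "phonetic", "form")

# Each non-empty set of component kinds maps to one category.
CATEGORY = {
    frozenset({"semantic"}): "semantic component character",
    frozenset({"phonetic"}): "phonetic component character",
    frozenset({"form"}): "pure form character",
    frozenset({"semantic", "phonetic"}): "semantic-phonetic character",
    frozenset({"semantic", "form"}): "semantic-form character",
    frozenset({"phonetic", "form"}): "phonetic-form character",
    frozenset(KINDS): "semantic-phonetic-form character",
}


def classify(component_kinds):
    """Return the category for the kinds of components found in a character."""
    return CATEGORY[frozenset(component_kinds)]


# Enumerate all non-empty subsets of the three kinds: 3 + 3 + 1 = 7.
subsets = [frozenset(c) for r in range(1, len(KINDS) + 1)
           for c in combinations(KINDS, r)]

print(len(subsets))                        # 7
print(classify(["semantic", "phonetic"]))  # 江: semantic 氵 + phonetic 工
print(classify(["form", "phonetic"]))      # 球: form 王 + phonetic 求
```

Dropping the last entry leaves six categories, which is the "New six writings" reading mentioned above.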

According to Yang,^[24] among the 3,500 frequently used Chinese characters in the experiment, semantic component characters are the smallest group, accounting for about 5%; pure form characters account for about 18%; semantic-form and phonetic-form characters account for about 19%. The largest group is semantic-phonetic characters, accounting for about 58%.

Differences

For most characters, dividing by internal structure gives results similar to the first-level external structure. For example, 江 (river) is divided into the components 氵 and 工 in both cases. However, the interpretations differ.^[15]

External structure of 江: external component 氵+ external component 工.
Internal structure of 江: semantic component 氵 + phonetic component 工.

In a few cases, even the physical structures differ. For example:^[27]^[28]

辯 (biàn, debate), external structure: ⿲ 辛 + 言 + 辛,
辯 (biàn, debate), internal structure: ⿴ phonetic 辡 (biàn) + semantic 言 (speak);
裹 (guǒ, wrap), external structure: ⿳ 亠 + 果 + 𧘇,
裹 (guǒ, wrap), internal structure: ⿴ semantic 衣 (cloth) + phonetic 果 (guǒ);
穎 (yǐng, ear of grain), external structure: ⿰⿱ 匕禾頁,
穎 (yǐng, ear of grain), internal structure: phonetic ⿹ 頃 (qǐng) + semantic 禾 (grain, rice plant).
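The external structures above use ideographic description characters as prefix operators, so they can be read into a tree mechanically. The following is a minimal sketch of such a reader, not a full implementation of Unicode ideographic description sequences: the function name `parse_ids` is an assumption, only the operators U+2FF0 to U+2FFB are handled, and the input strings are the external structures above rewritten without the "+" signs.

```python
# Operators U+2FF0..U+2FFB take two operands, except ⿲ and ⿳, which take three.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["⿲"] = 3
ARITY["⿳"] = 3


def parse_ids(ids: str):
    """Parse a prefix-notation description into nested tuples (operator, parts...)."""

    def parse(pos: int):
        if pos >= len(ids):
            raise ValueError("incomplete description sequence")
        ch = ids[pos]
        pos += 1
        if ch not in ARITY:
            return ch, pos  # an ordinary component is a leaf
        children = []
        for _ in range(ARITY[ch]):
            child, pos = parse(pos)
            children.append(child)
        return tuple([ch] + children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("unexpected trailing characters")
    return tree


print(parse_ids("⿲辛言辛"))    # 辯: ('⿲', '辛', '言', '辛')
print(parse_ids("⿳亠果𧘇"))    # 裹: ('⿳', '亠', '果', '𧘇')
print(parse_ids("⿰⿱匕禾頁"))  # 穎: ('⿰', ('⿱', '匕', '禾'), '頁')
```

The nested tuple for 穎 shows that its external first-level split is left to right, whereas its internal analysis groups 頃 as one phonetic unit around 禾.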

Notes

^ Unicode 2FF0, IDC (Ideographic description character) LEFT TO RIGHT
^ Examples are available in the next section.
^ This is the Middle Vietnamese pronunciation; the word is pronounced in modern Vietnamese as trăng.

References

Citations

^ National Language Commission 2009, p. 2.
^ Su 2014, pp. 73–74.
^ Peking University 2004, pp. 148–152.
^ Su 2014, pp. 74–75.
^ Su 2014, p. 82.
^ National Language Commission 2009, p. 1.
^ Su 2014, p. 86.
^ Su 2014, p. 94.
^ National Language Commission 2009a, p. 1.
^ Su 2014, p. 98.
^ Su 2014, pp. 98–99.
^ Ideographic Description Characters https://www.unicode.org/charts/PDF/U2FF0.pdf
^ Fu 1999, pp. 39–41.
^ Li 2013, pp. 122–124.
^ ^a ^b Qiu 2013, pp. 102–108.
^ Yin & Wang 2007, pp. 97–100.
^ Su 2014, pp. 102–111.
^ ^a ^b Zhang & Li 2024.
^ ^a ^b Yin & Wang 2007, p. 98.
^ Su 2014, pp. 103–105.
^ Handel 2019, pp. 145, 150.
^ ^a ^b Yin & Wang 2007, p. 100.
^ Yin & Wang 2007, p. 99.
^ ^a ^b Yang 2008, p. 147.
^ Su 2014, pp. 107–108.
^ Su 2014, p. 109.
^ Su 2014, p. 105.
^ 辯 CJK Unified Ideograph 8FAF https://en.wiktionary.org/wiki/%E8%BE%AF

Works cited

Fu, Yonghe (傅永和) (1999). 中文信息处理 (Chinese Information Processing) (in Chinese) (3rd ed.). Guangzhou: 广东教育出版社 (Guangdong Education Press). p. 84. ISBN 978-7-5406-4080-4.
Handel, Zev (2019), Sinography: The Borrowing and Adaptation of the Chinese Script, Language, Writing and Literary Culture in the Sinographic Cosmopolis, vol. 1, Brill, ISBN 978-9-004-35222-3, S2CID 189494805
Li, Dasui 李大遂 (2013). 简明实用汉字学 [Concise and Practical Chinese Characters] (in Chinese) (3rd ed.). Beijing: Peking University Press. ISBN 978-7-301-21958-4.
National Language Commission, Ministry of Education, China (2009). Specification of Common Modern Chinese Character Components and Component Names (现代常用字部件及部件名称规范) (PDF). Beijing: National Language Commission. Retrieved 3 September 2023.
National Language Commission, Ministry of Education, China (2009a). Specification of the Undecomposable Characters Commonly Used in the Modern Chinese (现代常用独体字规范) (PDF). Beijing: National Language Commission. Retrieved 8 September 2023.
Peking University, Modern Chinese Language Teaching and Research Office (2004). Modern Chinese (现代汉语) (in Chinese). Beijing: Commercial Press. ISBN 7-100-00940-5.
Qiu, Xigui 裘锡圭 (2013). 文字学概要 [Chinese Writing] (in Chinese) (2nd ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-09369-9.
Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: 商务印书馆 (Commercial Press). p. 84. ISBN 978-7-100-10440-1.
Yang Runlu (杨润陆) (2008), 现代汉字学 [Modern Chinese Characters] (in Chinese), Beijing Normal University Press, ISBN 978-7-303-09437-0
Yin, Jiming 殷寄明; Wang, Rudong 汪如东 (2007). Xiàndài hànyǔ wénzìxué 现代汉语文字学 [Modern Chinese Writing] (in Chinese). Shanghai: Fudan University Press. ISBN 978-7-309-05525-2.
Zhang, Xiaoheng; Li, Xiaotong (2024). "On the Classification of Modern Chinese Characters Based on Forms, Sounds and Meanings: A formation perspective (谈现代汉字的形音义分类——构字法篇)". The Journal of Modernization of Chinese Language Education (中文教学现代化学报). 13 (2024) (1): 12–18.
