User:Ruud Koot/Gujarati script/How To: Use Unicode for creating Gujarati script
This is a subpage for the main article - Gujarati script. Here you can find additional details and resources regarding how to use Unicode for creating Gujarati script.
Unicode Code-set for Gujarati Script
The Unicode range for Gujarati script is from U+0A80 to U+0AFF. The ISCII Code-page identifier for Gujarati script is 57010.
The table below shows the glyphs that are implemented in Unicode standard 4.0.0. Empty cells indicate the code-points that are reserved/unused.
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
U+0A8x |  | ઁ | ં | ઃ |  | અ | આ | ઇ | ઈ | ઉ | ઊ | ઋ |  | ઍ |  | એ |
U+0A9x | ઐ | ઑ |  | ઓ | ઔ | ક | ખ | ગ | ઘ | ઙ | ચ | છ | જ | ઝ | ઞ | ટ |
U+0AAx | ઠ | ડ | ઢ | ણ | ત | થ | દ | ધ | ન |  | પ | ફ | બ | ભ | મ | ય |
U+0ABx | ર |  | લ | ળ |  | વ | શ | ષ | સ | હ |  |  | ઼ | ઽ | ા | િ |
U+0ACx | ી | ુ | ૂ | ૃ | ૄ | ૅ |  | ે | ૈ | ૉ |  | ો | ૌ | ્ |  |  |
U+0ADx | ૐ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
U+0AEx | ૠ |  |  |  |  |  | ૦ | ૧ | ૨ | ૩ | ૪ | ૫ | ૬ | ૭ | ૮ | ૯ |
U+0AFx |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
- For further details regarding Unicode Code-points and standards, you may refer to Unicode Code-chart — Standard 4.1.
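The range stated in this section can be checked programmatically. The following is a minimal Python sketch that assumes only the U+0A80 to U+0AFF range given above; the function name `is_gujarati` is illustrative and not part of any library.

```python
# Bounds of the Gujarati block, as given in the text above.
GUJARATI_START = 0x0A80
GUJARATI_END = 0x0AFF


def is_gujarati(ch: str) -> bool:
    """Return True if the single character ch lies in the Gujarati block.

    This tests the block range only, so reserved (unassigned) code points
    inside the block also return True.
    """
    return GUJARATI_START <= ord(ch) <= GUJARATI_END


print(is_gujarati("\u0A95"))  # GUJARATI LETTER KA (ક)
print(is_gujarati("A"))       # Latin capital letter A
```

A stricter check would also have to exclude the reserved code points shown as empty cells in the table.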
How To: Use Unicode for creating Gujarati script
Note: In the examples shown in the sections below, the "+" sign denotes the combination of key-strokes.
Half-form of consonants
Half-forms of consonants are used in the pre-base position. For consonants that do not have a distinct glyph for the half-form, a Halant (્) is used to create the half-form as follows:
મ +્ + ય = મ્ય | — as in રમ્ય (pleasant) |
(Note the half-form of મ, which is used here in conjunction with ય.)
Note: A half-form is not created for the base glyph, even if the syllable ends with a Halant.
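In stored text, the sequence above is just three code points: the consonant, the Halant (U+0ACD), and the next consonant. The rendering engine and font then choose the half-form glyph. A minimal Python sketch of the example, in which the helper name `conjunct` is hypothetical:

```python
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA (્)


def conjunct(*consonants: str) -> str:
    """Join the given consonants with a Halant between each pair."""
    return HALANT.join(consonants)


MA = "\u0AAE"  # મ
YA = "\u0AAF"  # ય

cluster = conjunct(MA, YA)
print(cluster)                               # shown as મ્ય by a Gujarati-capable font
print([f"U+{ord(c):04X}" for c in cluster])  # code points in stored order
```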
Application of Upper-based form of Ra – (Reph)
Placing Ra with a Halant (the half-form of Ra, as seen above) before a full-form consonant produces a Reph on that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Reph can be created as follows:
ર +્ | = Ra + Halant |
ર +્ + થ = ર્થ | — as in અર્થ (meaning) |
(Ra + Halant + થ = Reph effect on થ)
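Although the Reph is drawn above the following consonant, the stored text keeps Ra and the Halant before it. The sketch below lists the code points of the example word using Python's standard `unicodedata` module; the function name `describe` is an assumption made for illustration.

```python
import unicodedata
from typing import List, Tuple

# The example word from above (અર્થ), written with escapes so that the
# stored order of the code points is explicit.
ARTH = "\u0A85\u0AB0\u0ACD\u0AA5"


def describe(text: str) -> List[Tuple[str, str]]:
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(ARTH):
    print(code_point, name)
```

The Ra (U+0AB0) and Halant (U+0ACD) appear before the Tha (U+0AA5) in the list, which matches the key sequence shown above.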
Application of Lower-based form of Ra – (Vattu)
Placing a consonant with a Halant (the half-form of that consonant) before a full-form Ra produces a Vattu on that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Vattu can be created as follows:
પ +્ + ર = પ્ર | — as in પ્રજા (people) |
(પ + Halant + Ra = Vattu effect on પ)
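Reph and Vattu therefore differ only in where Ra sits relative to the Halant. A deliberately simplified Python sketch of that rule follows; the function name `ra_form` and its return labels are hypothetical, and it handles only three-code-point clusters.

```python
HALANT = "\u0ACD"  # ્
RA = "\u0AB0"      # ર


def ra_form(cluster: str) -> str:
    """Classify a cluster of the shape consonant + Halant + consonant.

    Returns "reph" when Ra is the first consonant, "vattu" when Ra is the
    second consonant, and "other" in every remaining case. Longer clusters
    are not analysed, and Ra + Halant + Ra is reported as "reph".
    """
    if len(cluster) != 3 or cluster[1] != HALANT:
        return "other"
    if cluster[0] == RA:
        return "reph"
    if cluster[2] == RA:
        return "vattu"
    return "other"


print(ra_form("\u0AB0\u0ACD\u0AA5"))  # Ra + Halant + Tha, as in the Reph example
print(ra_form("\u0AAA\u0ACD\u0AB0"))  # Pa + Halant + Ra, as in the Vattu example
```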
Vattu variants
Vattu variants (half and full) are formed when consonants with the vattu mark are combined. In some cases, a special glyph is required to represent the vattu when particular consonants are combined.
ડ +્ + ર = ડ્ર | — as in ડ્રમ (drum) |
(special glyph ડ્ર. Notice the two lower-based marks, as compared to only one in the previous example.)
Special Marks, Characters and Nukta
Above-based marks
All above-based marks and post-based matra are created as follows:
ક +ં = કં | — as in કંપન (vibration) |
Below-based marks
The below-based marks and post-based matra are created as follows:
ક +ુ = કુ | — as in કુતરો (dog) |
ભ +ૂ = ભૂ | — as in ભૂકંપ (earthquake) |
Characters શ્ર, ક્ષ and જ્ઞ
The following characters are part of the Gujarati alphabet but do not have their own code points in the Unicode character set. They can be generated as indicated below:
શ +્ + ર = શ્ર |
ક +્ + ષ = ક્ષ |
જ +્ + ઞ = જ્ઞ |
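Because these characters have no code point of their own, each one occupies three code points in a string even though a reader sees a single unit. The Python sketch below groups code points into rough clusters so that such conjuncts stay together. It is a simplification and not the Unicode grapheme-cluster algorithm, and the function name `clusters` is hypothetical.

```python
import unicodedata
from typing import List

HALANT = "\u0ACD"  # ્


def clusters(text: str) -> List[str]:
    """Split text into rough orthographic clusters.

    A character joins the current cluster if it is a combining mark
    (general category Mn or Mc), or if it is a letter (category Lo)
    that directly follows a Halant. Anything else starts a new cluster.
    """
    out: List[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        after_halant = bool(out) and out[-1].endswith(HALANT) and category == "Lo"
        if out and (is_mark or after_halant):
            out[-1] += ch
        else:
            out.append(ch)
    return out


SHRA = "\u0AB6\u0ACD\u0AB0"  # શ + ્ + ર
KSHA = "\u0A95\u0ACD\u0AB7"  # ક + ્ + ષ
GNYA = "\u0A9C\u0ACD\u0A9E"  # જ + ્ + ઞ

for conj in (SHRA, KSHA, GNYA):
    print(len(conj), len(clusters(conj)))  # code points vs. clusters
```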
Application of Nukta
A Nukta affects the pronunciation of the (preceding) consonant to which it is applied. A Nukta form of a consonant can be created in Unicode as follows:
ય +઼ = ય઼ |
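The Nukta should directly follow its consonant in stored text. In the Unicode Character Database the Nukta has canonical combining class 7 and the Halant has class 9, so Unicode normalization moves a Nukta that was typed after a Halant back next to the consonant. A small sketch using Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

NUKTA = "\u0ABC"   # ઼
HALANT = "\u0ACD"  # ્
YA = "\u0AAF"      # ય

# Canonical combining classes decide how adjacent marks are ordered
# when text is normalized.
print(unicodedata.combining(NUKTA), unicodedata.combining(HALANT))

# A sequence typed as consonant + Halant + Nukta ...
typed = YA + HALANT + NUKTA
# ... is reordered so that the Nukta comes straight after the consonant.
normalized = unicodedata.normalize("NFC", typed)
print([f"U+{ord(c):04X}" for c in normalized])
```

Vowel signs (matra) have combining class 0 and are not reordered in this way, so the sketch applies only to the Nukta and Halant pair.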
Substitutions for specific typography of the script
Substitution, in the context applicable here, means replacing a set or group of characters with a single resulting glyph. The following are the main substitutions required to address the complexity of the language and to generate the various character forms of the script:
Pre-base substitutions
The half-form conjunctions, one of the most common occurrences of the script, are created by pre-base substitutions.
ન +્ + ન = ન્ન | — as in પ્રસન્ન (happy) |
This substitution is also used to create the I-Matra (with its appropriately aligned shape), as shown below:
ત +િ = તિ | — as in તિર (arrow) |
Post-base substitutions
Consonants of the Gujarati script do not have post-based forms. Primarily, post-based substitution is used to create visarga out of vowels, and is also applied for "I-Matra" substitutions as follows (which will precede any above-based substitution, if applied as well):
જ +ી = જી | — as in જીવન (life) |
(Compare the special shape જી – a result of post-based substitution – with another result of a similar combination using a character like લ, which will generate: લ +ી = લી)
Above-base substitutions
Above-based substitution is mainly applied for Matra, Reph, vowel modifications and for stress and tone marks. Consider the following examples:
વ +ૈ = વૈ | — as in વૈભવ (pompousness) |
ર +્ + ગ +ે = ર્ગે | — as in સ્વર્ગે (in heaven) |
મ +ે +ં = મેં | — as in મેંઢક (frog) |
Below-base substitutions
Mainly used for below-based matra, the below-based substitution can produce a conjunction or change the whole shape of the glyph. This substitution is also used for producing special tone effects such as anudatta.
More details on Gujarati Unicode
- For further details on Gujarati Unicode, you may refer to Unicode Std 4.0.0 - Chapter 9
- TDIL: Ministry of Communication & Information Technology, India
- If you are creating a web-page while the OS language is not Gujarati, save the file as UTF-8 Unicode HTML. The code-points may be lost otherwise.
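The UTF-8 advice above can be illustrated with a short Python sketch that writes a page with an explicit encoding instead of relying on the operating system default. The file name `gujarati.html`, the function `save_page`, and the page content are assumptions made for this example.

```python
from pathlib import Path

TEXT = "\u0A95"  # ક (GUJARATI LETTER KA)

HTML = (
    "<!DOCTYPE html>\n"
    '<html lang="gu">\n'
    '<head><meta charset="utf-8"><title>Gujarati</title></head>\n'
    f"<body><p>{TEXT}</p></body>\n"
    "</html>\n"
)


def save_page(path: Path) -> None:
    """Write the page as UTF-8 regardless of the OS default encoding."""
    path.write_text(HTML, encoding="utf-8")


# Every code point in the Gujarati block takes three bytes in UTF-8.
print(TEXT.encode("utf-8"))

# A legacy single-byte encoding cannot represent the text at all.
try:
    TEXT.encode("latin-1")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

Calling `save_page(Path("gujarati.html"))` writes the file. The `<meta charset="utf-8">` declaration tells the browser to decode it with the same encoding.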