User:Ruud Koot/Gujarati script/How To: Use Unicode for creating Gujarati script

This page contains Indic text. Without proper rendering support, you may see question marks or boxes, misplaced vowels or missing conjuncts instead of Indic text.

This is a subpage of the main article, Gujarati script. Here you can find additional details and resources on how to use Unicode for creating Gujarati script.

Unicode Code-set for Gujarati Script

The Unicode range for Gujarati script is from U+0A80 to U+0AFF. The ISCII Code-page identifier for Gujarati script is 57010.
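As a minimal sketch of how the range above can be used, the following Python snippet tests whether a character falls inside U+0A80 to U+0AFF. The names `is_gujarati` and `all_gujarati` are hypothetical helpers introduced here for illustration; they are not part of any library.

```python
GUJARATI_START = 0x0A80  # first code point of the Gujarati block
GUJARATI_END = 0x0AFF    # last code point of the Gujarati block


def is_gujarati(ch: str) -> bool:
    """Return True if the single character ch lies in the Gujarati block."""
    return GUJARATI_START <= ord(ch) <= GUJARATI_END


def all_gujarati(text: str) -> bool:
    """Return True if every non-space character of text is in the Gujarati block."""
    return all(is_gujarati(ch) for ch in text if not ch.isspace())
```

Note that a range test only says that a code point belongs to the block; it does not say whether the code point is actually assigned to a character.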

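The table below shows the glyphs that are implemented in Unicode standard 4.0.0. Empty cells indicate the code-points that are reserved/unused.

| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+0A8x | | ઁ | ં | ઃ | | અ | આ | ઇ | ઈ | ઉ | ઊ | ઋ | | ઍ | | એ |
| U+0A9x | ઐ | ઑ | | ઓ | ઔ | ક | ખ | ગ | ઘ | ઙ | ચ | છ | જ | ઝ | ઞ | ટ |
| U+0AAx | ઠ | ડ | ઢ | ણ | ત | થ | દ | ધ | ન | | પ | ફ | બ | ભ | મ | ય |
| U+0ABx | ર | | લ | ળ | | વ | શ | ષ | સ | હ | | | ઼ | ઽ | ા | િ |
| U+0ACx | ી | ુ | ૂ | ૃ | ૄ | ૅ | | ે | ૈ | ૉ | | ો | ૌ | ્ | | |
| U+0ADx | ૐ | | | | | | | | | | | | | | | |
| U+0AEx | ૠ | | | | | | ૦ | ૧ | ૨ | ૩ | ૪ | ૫ | ૬ | ૭ | ૮ | ૯ |
| U+0AFx | | | | | | | | | | | | | | | | |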
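The assigned code points of the block can also be listed programmatically. The sketch below uses Python's standard `unicodedata` module; the function name `assigned_gujarati_code_points` is a hypothetical helper. Python ships a newer version of the Unicode database than 4.0.0, so the listing is likely to contain more characters than the chart on this page.

```python
import unicodedata


def assigned_gujarati_code_points():
    """Yield (code point, character name) for each assigned code point in U+0A80..U+0AFF."""
    for cp in range(0x0A80, 0x0B00):
        # unicodedata.name() returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name


for cp, name in assigned_gujarati_code_points():
    print(f"U+{cp:04X}  {name}")
```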

For further details regarding Unicode Code-points and standards, you may refer to Unicode Code-chart — Standard 4.1.

How To: Use Unicode for creating Gujarati script

Note: In the examples shown in the sections below, the "+" sign denotes the combination of key-strokes.

Half-form of consonants

Half-forms of consonants are used in pre-base position. For consonants that do not have a distinct glyph for the half-form, a Halant (્) is used to create the half-form as follows:

મ +્ + ય = મ્ય

— as in રમ્ય (pleasant)

(Note the half-form of મ, which is used here in conjunction with ય.)

Note: A half-form is not created for the base glyph even if the syllable ends with a Halant.
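In stored text, the keystroke combination above is simply a sequence of three code points; the half-form itself is drawn by the font and rendering engine. The following Python sketch builds the sequence and prints the Unicode name of each code point. The Unicode standard names the Halant VIRAMA. The helper `describe` is a hypothetical name used only for this illustration.

```python
import unicodedata

MA = "\u0AAE"      # મ
HALANT = "\u0ACD"  # ્  (called VIRAMA in the Unicode standard)
YA = "\u0AAF"      # ય
RA = "\u0AB0"      # ર

conjunct = MA + HALANT + YA  # stored as three code points, displayed as one cluster
word = RA + conjunct         # ર followed by the conjunct, as in the example word above


def describe(text):
    """Return the Unicode name of every code point in text, in stored order."""
    return [unicodedata.name(ch) for ch in text]


for name in describe(conjunct):
    print(name)
```

Whether the half-form is actually displayed depends on the font and the shaping support of the platform, not on the stored text.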

Application of Upper-based form of Ra – (Reph)

Placing Ra with a Halant (the half-form of Ra, as seen above) before a full-form consonant produces a Reph on that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Reph can be created as follows:

ર +્	= Ra + Halant
ર +્ + થ = ર્થ	— as in અર્થ (meaning)

(Ra + Halant + થ = Reph effect on થ)

Application of Lower-based form of Ra – (Vattu)

Placing a consonant with a Halant (the half-form of the consonant) before a full-form Ra produces a Vattu on that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Vattu can be created as follows:

પ +્ + ર = પ્ર

— as in પ્રજા (people)

(પ + Halant + Ra = Vattu effect on પ)
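The difference between Reph and Vattu comes down to where Ra sits relative to the Halant in the stored sequence. The sketch below scans a string for both patterns. It assumes that the consonant letters occupy U+0A95 (ક) through U+0AB9 (હ), and the function names `is_consonant` and `find_ra_forms` are hypothetical helpers, not part of any library.

```python
HALANT = "\u0ACD"  # ્
RA = "\u0AB0"      # ર


def is_consonant(ch):
    """True for Gujarati consonant letters, assumed here to be U+0A95..U+0AB9."""
    return 0x0A95 <= ord(ch) <= 0x0AB9


def find_ra_forms(text):
    """Return (index, kind) pairs for each Ra form found in text.

    'reph'  marks Ra + Halant + consonant.
    'vattu' marks consonant + Halant + Ra.
    """
    found = []
    for i in range(len(text) - 2):
        first, middle, last = text[i], text[i + 1], text[i + 2]
        if middle != HALANT or not (is_consonant(first) and is_consonant(last)):
            continue
        if first == RA:
            found.append((i, "reph"))
        elif last == RA:
            found.append((i, "vattu"))
    return found


artha = "\u0A85\u0AB0\u0ACD\u0AA5"        # અ + ર + ્ + થ
praja = "\u0AAA\u0ACD\u0AB0\u0A9C\u0ABE"  # પ + ્ + ર + જ + ા
print(find_ra_forms(artha))
print(find_ra_forms(praja))
```

This only classifies the stored sequence; the shape that finally appears on screen is chosen by the font.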

Vattu variants

Vattu variants (half and full) are formed when consonants carrying the vattu mark are combined. In some cases, a special glyph is required to represent the vattu when certain consonants are combined.

ડ +્ + ર = ડ્ર

— as in ડ્રમ (drum)

(special glyph ડ્ર. Notice the two lower-based marks, as compared to only one in the previous example.)

Special Marks, Characters and Nukta

Above-based marks

All above-based marks and post-based matra are created as follows:

ક +ં = કં

— as in કંપન (vibration)

Below-based marks

The below-based marks and post-based matra are created as below:

ક +ુ = કુ	— as in કુતરો (dog)
ભ +ૂ = ભૂ	— as in ભૂકંપ (earthquake)

Characters શ્ર, ક્ષ and જ્ઞ

The following characters, which are part of the Gujarati alphabet but have no code points of their own in the Unicode character set, can be generated as indicated below:

શ +્ + ર = શ્ર

ક +્ + ષ = ક્ષ

જ +્ + ઞ = જ્ઞ
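Because these three characters have no code points of their own, each one is stored as a three-code-point sequence. The Python sketch below builds them and checks that NFC normalization leaves them unchanged, which is expected when no precomposed character exists. The dictionary keys are romanized labels chosen here for illustration only.

```python
import unicodedata

HALANT = "\u0ACD"  # ્

CONJUNCTS = {
    "shra": "\u0AB6" + HALANT + "\u0AB0",  # શ + ્ + ર
    "ksha": "\u0A95" + HALANT + "\u0AB7",  # ક + ્ + ષ
    "gnya": "\u0A9C" + HALANT + "\u0A9E",  # જ + ્ + ઞ
}

for label, seq in CONJUNCTS.items():
    # NFC does not merge the sequence into a single code point.
    assert unicodedata.normalize("NFC", seq) == seq
    print(label, len(seq), [f"U+{ord(ch):04X}" for ch in seq])
```

One practical consequence is that `len()` reports 3 for each of these characters, so code that counts or truncates by code point can split a conjunct in the middle.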

Application of Nukta

Nukta affects the pronunciation of the (preceding) consonant to which it is applied. A Nukta form of a consonant can be created in Unicode as follows:

ય +઼ = ય઼
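The Nukta is a combining mark that is stored directly after its consonant. As a small illustration, the sketch below reads the canonical combining classes of the Nukta and the Halant from Python's `unicodedata` module and shows that normalization moves a Nukta typed after a Halant back in front of it. The variable names are illustrative only.

```python
import unicodedata

YA = "\u0AAF"      # ય
NUKTA = "\u0ABC"   # ઼
HALANT = "\u0ACD"  # ્

ya_nukta = YA + NUKTA  # the consonant followed by its Nukta

# Canonical combining classes decide the order of adjacent marks.
nukta_class = unicodedata.combining(NUKTA)    # 7
halant_class = unicodedata.combining(HALANT)  # 9

# A Nukta typed after the Halant is reordered by normalization,
# because the lower class must come first.
misordered = YA + HALANT + NUKTA
reordered = unicodedata.normalize("NFC", misordered)
print([f"U+{ord(ch):04X}" for ch in reordered])
```

Normalizing input before comparing or searching text avoids treating the two typing orders as different strings.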

Substitutions for specific typography of the script

Substitution, in the context applicable here, means replacing a set or group of characters with a single resulting glyph. The following are the main substitutions required to address the complexity of the script and to generate its various character forms:

Pre-base substitutions

The half-form conjunctions, one of the most common occurrences of the script, are created by pre-base substitutions.

ન +્ + ન = ન્ન

— as in પ્રસન્ન (happy)

Also, the special use of this substitution is in creating I-Matra (and its appropriately aligned shape) as shown below:

ત +િ = તિ

— as in તિર (arrow)
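Although the I-Matra is drawn to the left of its consonant, Unicode text keeps it in logical order, after the consonant; moving the glyph is left to the rendering engine. The short Python sketch below makes the stored order visible. The helper `code_points` is a hypothetical name used only here.

```python
TA = "\u0AA4"       # ત
I_MATRA = "\u0ABF"  # િ  (short I vowel sign)
RA = "\u0AB0"       # ર

syllable = TA + I_MATRA  # consonant first, vowel sign second
word = syllable + RA     # the example word above


def code_points(text):
    """Return the code points of text in stored (logical) order."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(syllable))  # ['U+0AA4', 'U+0ABF']
print(word.startswith(TA))    # True
```

Sorting and searching therefore operate on the consonant first, regardless of where the vowel sign appears visually.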

Post-base substitutions

Consonants of the Gujarati script do not have post-based forms. Primarily, post-based substitution is used to create visarga out of vowels, and is also applied for "I-Matra" substitutions as follows (which will precede any above-based substitution, if applied as well):

જ +ી = જી

— as in જીવન (life)

(Compare the special shape જી, a result of post-based substitution, with the result of a similar combination using a character like લ, which will generate: લ +ી = લી)

Above-base substitutions

Above-based substitution is mainly applied for Matra, Reph, vowel modifications and for stress and tone marks. Consider the following examples:

વ +ૈ = વૈ	— as in વૈભવ (pompousness)
ર +્ + ગ +ે = ર્ગે	— as in સ્વર્ગે (in heaven)
મ +ે +ં = મેં	— as in મેંઢક (frog)

Below-base substitutions

Mainly used for below-based matra, the below-based substitution can produce a conjunction or change the whole shape of the glyph. This substitution is also used for producing special tone effects such as anudatta.

More details on Gujarati Unicode

For further details on Gujarati Unicode, you may refer to Unicode Std 4.0.0 - Chapter 9
TDIL: Ministry of Communication & Information Technology, India
If you are creating a web-page while the OS language is not Gujarati, save the file as UTF-8 encoded HTML. Otherwise the Gujarati code-points may be lost.
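As one possible way to follow the advice above, the Python sketch below writes a small HTML page with an explicit UTF-8 encoding and a matching `charset` declaration, so the result does not depend on the default encoding of the operating system. The function name `write_gujarati_html` and the page title are assumptions made for this example.

```python
from html import escape
from pathlib import Path


def write_gujarati_html(path, body_text):
    """Write a minimal HTML page containing body_text, encoded as UTF-8."""
    html = (
        "<!DOCTYPE html>\n"
        '<html lang="gu">\n'
        '<head><meta charset="utf-8"><title>Gujarati sample</title></head>\n'
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # Passing encoding explicitly avoids falling back to the platform default.
    Path(path).write_text(html, encoding="utf-8")
```

A call such as `write_gujarati_html("sample.html", "\u0A97\u0AC1\u0A9C\u0AB0\u0ABE\u0AA4\u0AC0")` writes the word ગુજરાતી into the page; the file name is only a placeholder.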