ISO/IEC JTC1/SC2/WG2 N1519R
Date: 1997-05-30
This is an unofficial HTML version of a document submitted to WG2.

Title: Proposal for encoding the Thaana script in ISO/IEC 10646

Source: Michael Everson
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG2

This document contains the proposal summary (ISO/IEC JTC1/SC2/WG2 form N1352) and a complete proposal to encode the Thaana script in ISO/IEC 10646. The proposal is a minor revision of a proposal by Rick McGowan, taken from Unicode Technical Report No. 3, with input from Maldivian experts.

A. Administrative

1. Title: Proposal for encoding the Thaana script in ISO/IEC 10646
2. Requester's name: Michael Everson, Evertype (WG2 member for Ireland)
3. Requester type: Expert contribution
4. Submission date: 1997-05-30
5. Requester's reference: http://www.evertype.com/standards/dv/thaana.html
6a. Completion: This is a complete proposal.
6b. More information to be provided? No

B. Technical -- General

1a. New script? Name? Yes. Thaana.
1b. Addition of characters to existing block? Name? No
2. Number of characters: 50
3. Proposed category: Category A
4. Proposed level of implementation and rationale: Thaana requires Level 2 implementation, as Indic scripts do.
5a. Character names included in proposal? Yes
5b. Character names in accordance with guidelines? Yes
5c. Character shapes reviewable? Yes (see below)
6a. Who will provide computerized font? Michael Everson, Evertype
6b. Font currently available? Michael Everson, Evertype
6c. Font format? TrueType
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided? Yes. An 8-bit font was made available to me as a source encoding. It has de-facto use status. There appear to be no standards. (see below)
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? Not provided here. Thaana is well-known. There is an online newspaper in Dhivehi (Haveeru Medhu Haftha, http://www.haveeru.com/midweek/index.htm), which displays text with gifs.
8. Does the proposal address other aspects of character data processing? Yes (see below)

C. Technical -- Justification

1. Contact with the user community? Yes. The Maldivian Students' Association (GB), the Haveeru Daily, the National Centre for Linguistic and Historical Research (MV).
2. Information on the user community? 226,000 people live in the Maldives. Thaana is the national script.
3a. The context of use for the proposed characters? Thaana script is commonly used to write Dhivehi.
3b. Reference: Unicode Technical Report #3
4a. Proposed characters in current use? Yes
4b. Where? In the Maldives.
5a. Characters should be encoded entirely in BMP? Yes. Positions U+0700 - U+073F are proposed for the encoding.
5b. Rationale: Thaana is a Category A script.
6. Should characters be kept in a continuous range? Yes
7a. Can the characters be considered a presentation form of an existing character or character sequence? No
7b. Where?
7c. Reference:
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character? No
8b. Where?
8c. Reference:
9a. Combining characters or use of composite sequences included? Yes
9b. List of composite sequences and their corresponding glyph images provided? No
10. Characters with any special properties such as control function, etc. included? No

D. SC2/WG2 Administrative

To be completed by SC2/WG2
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition)
3. Additional contact to user communities, liaison organizations etc.
4. Assigned category and assigned priority/time frame
Other Comments


E. Proposal

User community

The Thaana script is used to write the modern Dhivehi language of the Maldives, a group of atolls in the Indian Ocean, about 650 km south-west of Sri Lanka, at roughly 4°N 73°E. 226,000 people live in the Maldives.

Processing

Thaana is written from right to left and shares features of both the Indic and Arabic scripts. Consonants have no inherent "a" vowel sound, and are always written with either a vowel sign or a null "vanishing vowel" sign (U+0730) above them. On THAANA LETTER ALIFU (U+0707) the null vowel sign indicates a glottal stop. Loanwords from Arabic are also written in the Arabic script, or are transcribed by means of dots on existing Thaana letters; the use of modified Thaana letters dates from the middle of the present century. Both Arabic and European digits are used. The ARABIC COMMA (U+060C), ARABIC SEMICOLON (U+061B), and ARABIC QUESTION MARK (U+061F) are used, but a native comma and full stop are also used (according to Nakanishi 1980). These can be unified with FULL STOP already encoded; the native comma is "." and the native full stop is "..". Transliteration of the character names follows the usual method employed in the Maldives.
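The rule that every consonant carries either a vowel sign or the null sign can be checked mechanically. The following is a minimal Python sketch, assuming the code positions proposed in this document (letters at U+0700 to U+0725, vowel signs and SUKUN at U+0726 to U+0730) and assuming that the sign is stored immediately after its letter; the function name `letters_missing_sign` is hypothetical and introduced only for illustration.

```python
# Code positions as proposed in this document (an assumption of this sketch).
LETTERS = range(0x0700, 0x0726)      # consonant letters and extended letters
VOWEL_SIGNS = range(0x0726, 0x0731)  # ten vowel signs plus SUKUN (U+0730)


def letters_missing_sign(text):
    """Return the indexes of letters not directly followed by a vowel sign or SUKUN."""
    missing = []
    for i, ch in enumerate(text):
        if ord(ch) in LETTERS:
            # Look at the next character, if there is one.
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if not nxt or ord(nxt) not in VOWEL_SIGNS:
                missing.append(i)
    return missing
```

For example, the sequence HAA + ABAFILI + ALIFU + SUKUN passes the check, while a bare letter at the end of a string is reported.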

Issues

Two other scripts were formerly used to write Dhivehi: the older Eveelaa Akuru 'ancient letters' and the later Dives Akuru 'island letters'. The earliest documents written in the Eveelaa Akuru date from the 12th century; documents written in the Dives Akuru date from the 15th century. Both of those scripts are related to the Sinhala script, and are written from left to right. The use of the Thaana script dates from the 17th century. The present shapes of the Thaana (called the Gabulhi Thaana) date from the 19th century; earlier shapes should probably be considered font variants. It is likely that the Eveelaa Akuru and Dives Akuru should be encoded separately from Thaana in ISO/IEC 10646. Faulmann 1880 gives an example of one of the older scripts, probably the Dives Akuru, since the Eveelaa Akuru was more similar to Sinhala than the example given there.

The Unicode Technical Report #3 listed 12 Extended Thaana letters; a font provided by the Maldivian Students' Association provided two additional ones, as well as the three additional signs which follow the punctuation marks below. Maldivian expert Husine Zahid has said that only one of these, THAANA REYTU SIGN, should be encoded. With regard to the two additional Extended Thaana letters, Husine Zahid said:

"The extended Thaana characters are used to write Arabic loanwords in the Thaana script. The extended characters ARABIC ZAVIYANI and ARABIC VAAVU are also in current use, although very rarely cited. These characters are not part of the formal Thaana character set. These are therefore best implemented as Thaana signs, rather than as part of the formal set of characters. When Arabic loanwords using the letter ARABIC ZAVIYANI are written, the Thaana letter ZAVIYANI is used as equivalent to ARABIC ZAVIYANI. Similarly VAAVU and ARABIC VAAVU are equivalent."

It may therefore be suitable to unify ARABIC ZAVIYANI with ZAVIYANI and ARABIC VAAVU with VAAVU. This would save a column; on the other hand, retaining the fourth column would allow for convenient expansion of the script with additional signs should they come to light, and retaining these two characters would preserve existing data, as they "are also in current use, although very rarely cited".

Four columns are required to encode Thaana. The Thaana block is divided into the following ranges:

	U+0700 -> U+0717		Consonant letters
	U+0718 -> U+0725		Extended Thaana letters
	U+0726 -> U+0730		Non-spacing vowel signs
	U+0731				Other sign
	U+0732 -> U+073F		currently unassigned
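
The division of the block can be written down as a small lookup, which also makes it easy to confirm the totals stated elsewhere in this proposal (50 characters, four columns of 16 positions). This is a hedged Python sketch based only on the ranges listed above; the names `RANGES`, `classify`, `assigned_count`, and `column_count` are hypothetical.

```python
# Ranges of the proposed Thaana block, as listed in this document.
RANGES = [
    (0x0700, 0x0717, "Consonant letters"),
    (0x0718, 0x0725, "Extended Thaana letters"),
    (0x0726, 0x0730, "Non-spacing vowel signs"),
    (0x0731, 0x0731, "Other sign"),
    (0x0732, 0x073F, "currently unassigned"),
]


def classify(code_point):
    """Return the range label for a code point, or None if outside the block."""
    for first, last, label in RANGES:
        if first <= code_point <= last:
            return label
    return None


def assigned_count():
    """Count the positions that carry a character (everything but the unassigned tail)."""
    return sum(last - first + 1
               for first, last, label in RANGES
               if label != "currently unassigned")


def column_count():
    """Number of 16-position columns spanned by the whole block."""
    return (RANGES[-1][1] - RANGES[0][0] + 1) // 16
```

Under these ranges, `assigned_count()` gives 50 and `column_count()` gives 4, which agrees with the figures given in the proposal summary form and in the text above.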

References


Names and code table

000	0700	THAANA LETTER HAA
001	0701	THAANA LETTER SHAVIYANI
002	0702	THAANA LETTER NOONU
003	0703	THAANA LETTER RAA
004	0704	THAANA LETTER BAA
005	0705	THAANA LETTER LHAVIYANI
006	0706	THAANA LETTER KAAFU
007	0707	THAANA LETTER ALIFU
008	0708	THAANA LETTER VAAVU
009	0709	THAANA LETTER MEEMU
010	070A	THAANA LETTER FAAFU
011	070B	THAANA LETTER DHAALU
012	070C	THAANA LETTER THAA
013	070D	THAANA LETTER LAAMU
014	070E	THAANA LETTER GAAFU
015	070F	THAANA LETTER GNAVIYANI
016	0710	THAANA LETTER SEENU
017	0711	THAANA LETTER DAVIYANI
018	0712	THAANA LETTER ZAVIYANI
019	0713	THAANA LETTER TAVIYANI
020	0714	THAANA LETTER YAA
021	0715	THAANA LETTER PAVIYANI
022	0716	THAANA LETTER JAVIYANI
023	0717	THAANA LETTER CHAVIYANI
024	0718	THAANA LETTER HHAA
025	0719	THAANA LETTER KHAA
026	071A	THAANA LETTER AINU
027	071B	THAANA LETTER GHAINU
028	071C	THAANA LETTER THAALU
029	071D	THAANA LETTER THAA
030	071E	THAANA LETTER THO
031	071F	THAANA LETTER ZO
032	0720	THAANA LETTER QAAFU
033	0721	THAANA LETTER SHEENU
034	0722	THAANA LETTER SAADHU
035	0723	THAANA LETTER DAADDU
036	0724	THAANA LETTER ARABIC ZAVIYANI
037	0725	THAANA LETTER ARABIC VAAVU
038	0726	THAANA ABAFILI
039	0727	THAANA AABAAFILI
040	0728	THAANA IBIFILI
041	0729	THAANA EEBEEFILI
042	072A	THAANA UBUFILI
043	072B	THAANA OOBOOFILI
044	072C	THAANA EBEFILI
045	072D	THAANA EYBEYFILI
046	072E	THAANA OBOFILI
047	072F	THAANA OABOAFILI
048	0730	THAANA SUKUN
049	0731	THAANA REYTU SIGN
050	0732	(This position shall not be used)
051	0733	(This position shall not be used)
052	0734	(This position shall not be used)
053	0735	(This position shall not be used)
054	0736	(This position shall not be used)
055	0737	(This position shall not be used)
056	0738	(This position shall not be used)
057	0739	(This position shall not be used)
058	073A	(This position shall not be used)
059	073B	(This position shall not be used)
060	073C	(This position shall not be used)
061	073D	(This position shall not be used)
062	073E	(This position shall not be used)
063	073F	(This position shall not be used)
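
The first column of the table appears to be the decimal offset of each position from the start of the block, so it can be derived from the hexadecimal code position in the second column. The Python sketch below illustrates that relationship under this assumption; `BLOCK_START`, `decimal_index`, and `column_and_row` are hypothetical names, and the zero-based column numbering is a convention of the sketch, not of ISO/IEC 10646.

```python
BLOCK_START = 0x0700  # first position of the proposed block


def decimal_index(code_point_hex):
    """Return the three-digit decimal offset for a hex code position such as '0718'."""
    return format(int(code_point_hex, 16) - BLOCK_START, "03d")


def column_and_row(code_point_hex):
    """Return (column, row) within the block, both counted from zero, 16 rows per column."""
    return divmod(int(code_point_hex, 16) - BLOCK_START, 16)
```

For instance, position 0718 has offset 024 and sits in column 1, row 8, while 073F is the last position of column 3.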

Michael Everson, Evertype, Dublin, 2001-09-21