ISO/IEC JTC1/SC2/WG2 N1519R
Date: 1997-05-30
This is an unofficial HTML version of a document submitted to WG2.
A. Administrative | |
1. Title | Proposal for encoding the Thaana script in ISO/IEC 10646 |
2. Requester's name | Michael Everson, Evertype (WG2 member for Ireland) |
3. Requester type | Expert contribution |
4. Submission date | 1997-05-30 |
5. Requester's reference | http://www.evertype.com/standards/dv/thaana.html |
6a. Completion | This is a complete proposal. |
6b. More information to be provided? | No |
B. Technical -- General | |
1a. New script? Name? | Yes. Thaana. |
1b. Addition of characters to existing block? Name? | No |
2. Number of characters | 50 |
3. Proposed category | Category A |
4. Proposed level of implementation and rationale | Thaana requires Level 2 implementation as Indic scripts do. |
5a. Character names included in proposal? | Yes |
5b. Character names in accordance with guidelines? | Yes |
5c. Character shapes reviewable? | Yes (see below) |
6a. Who will provide computerized font? | Michael Everson, Evertype |
6b. Font currently available? | Michael Everson, Evertype |
6c. Font format? | TrueType |
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided? | Yes. An 8-bit font was made available to me as a source encoding. It has de-facto use status. There appear to be no standards. (see below) |
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? | Not provided here. Thaana is well-known. There is an online newspaper in Dhivehi (Haveeru Medhu Haftha, http://www.haveeru.com/midweek/index.htm), which displays its text as GIF images. |
8. Does the proposal address other aspects of character data processing? | Yes (see below) |
C. Technical -- Justification | |
1. Contact with the user community? | Yes. The Maldivian Students' Association (GB), the Haveeru Daily, the National Centre for Linguistic and Historical Research (MV). |
2. Information on the user community? | 226,000 people live in the Maldives. Thaana is the national script. |
3a. The context of use for the proposed characters? | Thaana script is commonly used to write Dhivehi. |
3b. Reference | Unicode Technical Report #3 |
4a. Proposed characters in current use? | Yes |
4b. Where? | In the Maldives. |
5a. Characters should be encoded entirely in BMP? | Yes. Positions U+0700 - U+073F are proposed for the encoding. |
5b. Rationale | Thaana is a Category A script. |
6. Should characters be kept in a continuous range? | Yes |
7a. Can the characters be considered a presentation form of an existing character or character sequence? | No |
7b. Where? | |
7c. Reference | |
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character? | No |
8b. Where? | |
8c. Reference | |
9a. Combining characters or use of composite sequences included? | Yes |
9b. List of composite sequences and their corresponding glyph images provided? | No |
10. Characters with any special properties such as control function, etc. included? | No |
D. SC2/WG2 Administrative (To be completed by SC2/WG2) | |
1. Relevant SC 2/WG 2 document numbers: | |
2. Status (list of meeting number and corresponding action or disposition) | |
3. Additional contact to user communities, liaison organizations etc. | |
4. Assigned category and assigned priority/time frame | |
Other Comments |
Processing
Thaana is written from right to left and shares features of both the Indic and the Arabic scripts. Consonants have no inherent "a" vowel sound, and are always written with either a vowel sign or a null "vanishing vowel" sign (U+0730) above them. On THAANA LETTER ALIFU (U+0707) the null vowel sign represents a glottal stop. Loanwords from Arabic are written either in the Arabic script or in Thaana, transcribed by means of dots on existing Thaana letters; the use of modified Thaana letters dates from the middle of the present century. Both Arabic and European digits are used. The ARABIC COMMA (U+060C), ARABIC SEMICOLON (U+061B), and ARABIC QUESTION MARK (U+061F) are used, but a native comma and full stop are also used (according to Nakanishi 1980). These native marks can be unified with the FULL STOP already encoded; the native comma is "." and the native full stop is "..". Transliteration of the character names follows the usual method employed in the Maldives.
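The rule that every consonant carries a vowel sign or the null sign can be illustrated with a minimal Python sketch. It assumes the positions proposed in this document (letters at U+0700 to U+0725, vowel signs and the null sign at U+0726 to U+0730) and assumes that a sign is stored immediately after its letter in the text; the function name is hypothetical and is not part of the proposal.

```python
# Positions as proposed in this document (an assumption of the sketch).
LETTERS = range(0x0700, 0x0726)      # consonant letters and extended letters
VOWEL_SIGNS = range(0x0726, 0x0731)  # vowel signs, ending with the null sign U+0730


def consonants_missing_sign(text):
    """Return the indices of letters not immediately followed by a vowel sign.

    Assumes the sign is stored directly after its letter.
    """
    missing = []
    for i, ch in enumerate(text):
        if ord(ch) in LETTERS:
            following = text[i + 1] if i + 1 < len(text) else ""
            if not following or ord(following) not in VOWEL_SIGNS:
                missing.append(i)
    return missing


# NOONU with a vowel sign, followed by a bare RAA that carries no sign.
sample = "\u0702\u0726\u0703"
print(consonants_missing_sign(sample))  # [2]
```

A letter followed by the null sign, such as U+0707 U+0730, passes this check in the same way as a letter followed by any other vowel sign.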
Issues
Two other scripts were formerly used to write Dhivehi: the older Eveelaa Akuru 'ancient letters' and the later Dives Akuru 'island letters'. The earliest documents written in the Eveelaa Akuru date from the 12th century; documents written in the Dives Akuru date from the 15th century. Both of those scripts are related to the Sinhala script, and are written from left to right. The use of the Thaana script dates from the 17th century. The present shapes of the Thaana (called the Gabulhi Thaana) date from the 19th century; earlier shapes should probably be considered font variants. It is likely that the Eveelaa Akuru and Dives Akuru should be encoded separately from Thaana in ISO/IEC 10646. Faulmann 1880 gives an example of one of the older scripts, probably the Dives Akuru, since the Eveelaa Akuru was more similar to Sinhala than the example given there.
Unicode Technical Report #3 listed 12 Extended Thaana letters; a font supplied by the Maldivian Students' Association contained two additional ones, as well as the three additional signs which follow the punctuation marks below. Maldivian expert Husine Zahid has said that only one of these signs, THAANA REYTU SIGN, should be encoded. With regard to the two additional Extended Thaana letters, Husine Zahid said:
The extended Thaana characters are used to write Arabic loanwords in the Thaana script. The extended characters ARABIC ZAVIYANI and ARABIC VAAVU are also in current use, although very rarely cited. These characters are not part of the formal Thaana character set. These are therefore best implemented as Thaana signs, rather than as part of the formal set of characters. When Arabic loanwords using the letter ARABIC ZAVIYANI are written, the Thaana letter ZAVIYANI is used as equivalent to ARABIC ZAVIYANI. Similarly VAAVU and ARABIC VAAVU are equivalent.
It may therefore be suitable to unify ARABIC ZAVIYANI with ZAVIYANI and ARABIC VAAVU with VAAVU. This would save a column; on the other hand, retaining the fourth column would allow for convenient expansion of the script with additional signs should they come to light, and retaining these two characters would preserve existing data, as they "are also in current use, although very rarely cited".
Four columns are required to encode Thaana. The Thaana block is divided into the following ranges:
U+0700 -> U+0717 Consonant letters
U+0718 -> U+0725 Extended Thaana letters
U+0726 -> U+0730 Non-spacing vowel signs
U+0731 Other sign
U+0732 -> U+073F currently unassigned
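As a hedged illustration, the range layout above can be written out as a small lookup table in Python. The function and variable names are hypothetical; the boundaries are copied from the list of ranges, and the count confirms that the assigned positions add up to the 50 characters stated in section B.2.

```python
# Range boundaries copied from the block layout proposed in this document.
RANGES = [
    (0x0700, 0x0717, "Consonant letters"),
    (0x0718, 0x0725, "Extended Thaana letters"),
    (0x0726, 0x0730, "Non-spacing vowel signs"),
    (0x0731, 0x0731, "Other sign"),
    (0x0732, 0x073F, "currently unassigned"),
]


def classify(code_point):
    """Return the range label for a position, or None if outside the block."""
    for first, last, label in RANGES:
        if first <= code_point <= last:
            return label
    return None


# Count the positions that are actually assigned to characters.
assigned = sum(
    last - first + 1
    for first, last, label in RANGES
    if label != "currently unassigned"
)

print(assigned)          # 50
print(classify(0x0730))  # Non-spacing vowel signs
```

The five ranges together cover 64 positions, which corresponds to four columns of 16.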
Names and code table | |
000 0700 THAANA LETTER HAA
001 0701 THAANA LETTER SHAVIYANI
002 0702 THAANA LETTER NOONU
003 0703 THAANA LETTER RAA
004 0704 THAANA LETTER BAA
005 0705 THAANA LETTER LHAVIYANI
006 0706 THAANA LETTER KAAFU
007 0707 THAANA LETTER ALIFU
008 0708 THAANA LETTER VAAVU
009 0709 THAANA LETTER MEEMU
010 070A THAANA LETTER FAAFU
011 070B THAANA LETTER DHAALU
012 070C THAANA LETTER THAA
013 070D THAANA LETTER LAAMU
014 070E THAANA LETTER GAAFU
015 070F THAANA LETTER GNAVIYANI
016 0710 THAANA LETTER SEENU
017 0711 THAANA LETTER DAVIYANI
018 0712 THAANA LETTER ZAVIYANI
019 0713 THAANA LETTER TAVIYANI
020 0714 THAANA LETTER YAA
021 0715 THAANA LETTER PAVIYANI
022 0716 THAANA LETTER JAVIYANI
023 0717 THAANA LETTER CHAVIYANI
024 0718 THAANA LETTER HHAA
025 0719 THAANA LETTER KHAA
026 071A THAANA LETTER AINU
027 071B THAANA LETTER GHAINU
028 071C THAANA LETTER THAALU
029 071D THAANA LETTER THAA
030 071E THAANA LETTER THO
031 071F THAANA LETTER ZO
032 0720 THAANA LETTER QAAFU
033 0721 THAANA LETTER SHEENU
034 0722 THAANA LETTER SAADHU
035 0723 THAANA LETTER DAADDU
036 0724 THAANA LETTER ARABIC ZAVIYANI
037 0725 THAANA LETTER ARABIC VAAVU
038 0726 THAANA ABAFILI
039 0727 THAANA AABAAFILI
040 0728 THAANA IBIFILI
041 0729 THAANA EEBEEFILI
042 072A THAANA UBUFILI
043 072B THAANA OOBOOFILI
044 072C THAANA EBEFILI
045 072D THAANA EYBEYFILI
046 072E THAANA OBOFILI
047 072F THAANA OABOAFILI
048 0730 THAANA SUKUN
049 0731 THAANA REYTU SIGN
050 0732 (This position shall not be used)
051 0733 (This position shall not be used)
052 0734 (This position shall not be used)
053 0735 (This position shall not be used)
054 0736 (This position shall not be used)
055 0737 (This position shall not be used)
056 0738 (This position shall not be used)
057 0739 (This position shall not be used)
058 073A (This position shall not be used)
059 073B (This position shall not be used)
060 073C (This position shall not be used)
061 073D (This position shall not be used)
062 073E (This position shall not be used)
063 073F (This position shall not be used)
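The three-digit number at the start of each row appears to be the decimal offset of the position from U+0700. Under that assumption, a row can be checked with a short Python sketch; the helper name is hypothetical, and the sample rows are copied from the table.

```python
BLOCK_START = 0x0700  # first position of the proposed block


def check_row(row):
    """Return True if the decimal index equals the offset from BLOCK_START."""
    index, position, _name = row.split(maxsplit=2)
    return int(index) == int(position, 16) - BLOCK_START


rows = [
    "000 0700 THAANA LETTER HAA",
    "023 0717 THAANA LETTER CHAVIYANI",
    "036 0724 THAANA LETTER ARABIC ZAVIYANI",
    "049 0731 THAANA REYTU SIGN",
]

print(all(check_row(row) for row in rows))  # True
```

A row whose index and position disagree, for example a hypothetical "025 0718", would fail the check.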