Proposal to encode Javanese in the BMP of ISO/IEC 10646

	ISO/IEC JTC1/SC2/WG2 N1___
DATE: 1998-02-11

DOC TYPE:	Expert contribution
TITLE:	Proposal to encode Javanese in the BMP of ISO/IEC 10646
SOURCE:	Michael Everson, Jeroen Hellingman
PROJECT:	JTC1.02.18.01
STATUS:	Proposal.
ACTION ID:	FYI
DUE DATE:	--
DISTRIBUTION:	Worldwide
MEDIUM:	Paper and web
NO. OF PAGES:	3 (printed at 80%)

A. Administrative
1. Title	Proposal to encode Javanese in the BMP of ISO/IEC 10646-1
2. Requester's name	Michael Everson, Jeroen Hellingman
3. Requester type	Expert request
4. Submission date	1998-02-12
5. Requester's reference
6a. Completion	This is a complete proposal.
6b. More information to be provided?	No

B. Technical -- General
1a. New script? Name?	Yes. Javanese
1b. Addition of characters to existing block? Name?	No.
2. Number of characters	64
3. Proposed category	Category A
4. Proposed level of implementation and rationale	Level 2
5a. Character names included in proposal?	Yes
5b. Character names in accordance with guidelines?	Yes
5c. Character shapes reviewable?	Yes
6a. Who will provide computerized font?	Michael Everson
6b. Font currently available?	Michael Everson
6c. Font format?	TrueType
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?	Yes.
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?	No
8. Does the proposal address other aspects of character data processing?	Yes

C. Technical -- Justification
1. Contact with the user community?	Yes. Jeroen Hellingman.
2. Information on the user community?	Javanese enjoys both scholarly and some popular use.
3a. The context of use for the proposed characters?	Used to represent texts in the Javanese languages.
3b. Reference	See below.
4a. Proposed characters in current use?	Yes.
4b. Where?	In Indonesia, the Netherlands, and elsewhere.
5a. Characters should be encoded entirely in BMP?	Yes
5b. Rationale	Accordance with the Roadmap.
6. Should characters be kept in a continuous range?	Yes
7a. Can the characters be considered a presentation form of an existing character or character sequence?	No.
7b. Where?
7c. Reference
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?	No
8b. Where?
8c. Reference
9a. Combining characters or use of composite sequences included?	Yes, the usual Brahmic matras are used.
9b. List of composite sequences and their corresponding glyph images provided?	No.
10. Characters with any special properties such as control function, etc. included?	No

D. SC2/WG2 Administrative (to be completed by SC2/WG2)
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition)
3. Additional contact to user communities, liaison organizations etc.
4. Assigned category and assigned priority/time frame
Other Comments

The Javanese script was in current use in Java until about 1945. It belongs to the Brahmi family of scripts. This document proposes an encoding of the script in the range 1B00--1B5F of the BMP of Unicode/ISO 10646. It also gives a mapping to the Devanagari block, showing the relation of the Javanese script to Devanagari.

History

The Javanese script, used for writing the Javanese and Madurese languages, is of Indian origin. Two variants are in use: a standing script and a running script. The latter differs from the former only in that the letters are somewhat slanted, and that the upstroke, which in the standing script is written directly after the downstroke, is written through the downstroke. The script is closely related to Balinese and the older Kawi script. It might be possible to unify these scripts, but this needs further investigation. (Kromo [3] gives a table comparing the old and new Javanese scripts. Apart from completely different shapes, the only difference seems to be that the old script includes three more letters from the Brahmi alphabet. In a typewritten letter, which I found in a copy of his book at Leiden University, the author shows the relation between the old and the new letter shapes, and explains that the shapes changed to accommodate the transition from letters cut in stone or palm leaves to letters written on paper. To me, the shapes of the script suggest it is most closely related to the SE Asian scripts, and then to the South Indian scripts.)

Nowadays, the script has been replaced by the Latin script, and is slowly fading out of use. It is still taught at schools in East and Middle Java, but only older people can read and write it easily. Computerized usage seems to be of interest to historians and to printers who still print traditional literature in the script.

The Script

The structure of the Javanese script is basically the same as that of all scripts derived from Brahmi.

The consonants, called _aksara_, all carry an inherent a, which can be altered by adding a vowel sign. When two consonants follow each other directly, the second consonant is written in an alternative form, called _pasangan_, below the first, to indicate that no vowel should be pronounced between them. When a phrase ends with a consonant, a special sign, called _paten_, or _pangkon_ in high language (Sanskrit virama), is used to indicate the absence of the inherent a. Paten is also used when three or more consonants form a cluster, to avoid having to write three consonants below each other. A final aspirate is indicated by _wignjan_ (Sanskrit visarga), a final ng-sound by _cecak_ (Sanskrit anusvara), and a final r-sound by _layar_. Together with the secondary forms of ra (_cakra_), ri (_keret_), and ya (_pengkal_), which are treated specially, these signs and the vowel signs are referred to as _sandangan_.
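The inherent-vowel rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: consonants and signs are represented by romanized stand-ins rather than by encoded characters, and the names VOWEL_SIGNS, PATEN, read_aksara and read_word are hypothetical and not part of this proposal.

```python
# Romanized stand-ins for the signs described in the text (illustrative only).
VOWEL_SIGNS = {"e", "i", "u", "ee", "o"}  # romanizations taken from the chart names
PATEN = "paten"                           # vowel killer (Sanskrit virama)


def read_aksara(consonant, sign=None):
    """Return the romanized reading of one consonant plus an optional sign."""
    if sign is None:
        return consonant + "a"   # no sign: the inherent a is pronounced
    if sign == PATEN:
        return consonant         # paten: the inherent a is suppressed
    if sign in VOWEL_SIGNS:
        return consonant + sign  # vowel sign: replaces the inherent a
    raise ValueError(f"unknown sign: {sign!r}")


def read_word(units):
    """Read a sequence of (consonant, sign) pairs."""
    return "".join(read_aksara(c, s) for c, s in units)


# A made-up sequence, not a real word: k (inherent a), n + i, k + paten.
print(read_word([("k", None), ("n", "i"), ("k", PATEN)]))
```

The sketch leaves out pasangan, cakra, keret, pengkal, and the final signs, whose treatment is discussed under Issues.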

When a normal Javanese word starts with a vowel, this is written by applying the respective vowel sign to ha, which represents a weak aspiration. The _sastra-svara_ or independent vowels are only used in Sanskrit and Arabic loanwords that start with a vowel.

The letters that represent aspirated sounds in the Sanskrit sound system have lost their original value, because those sounds do not occur in Javanese. They are instead used in non-final position, replacing their non-aspirated counterparts, as honorific or `capital' letters in the names of persons and places that deserve respect.

Several extra letters have been created by placing three dots above some letters, to represent foreign sounds in loans from Arabic and Dutch. Normally this sign is used with ka (kha), da (da), pa (fa), ja (za), and ga (rha); it is also seen with ha, ta, sa, la, sa-gede, sha-gede, and ba. These three dots can be compared with the nukta in several North Indian scripts.

The Javanese script has its own decimal digits.

Punctuation

The Javanese script is written left to right without spaces between words that belong to the same part of a sentence.

A _pangkat_ is used to indicate a small pause, or to set numerals apart from the rest of a text. It is not used very much.

A _pada-lingsa_ indicates the end of a line of verse, a sentence, or part of a sentence.

A doubled pada-lingsa, called _lungsi_, is used when the writer wants to indicate a more important division, like the end of a full sentence or paragraph (compare danda and double danda in Devanagari).

These three signs can be omitted if the last word of a sentence or sentence part ends with paten.

At the end of a whole section, a special sign, _pancak_, is repeated as many times as needed to fill the last line.

In verse, punctuation is rather complicated. The end of a line of verse is indicated with a special sign, which depends on the last vowel of the line. Strictly speaking, these signs are not separators: they indicate the prolonged pronunciation of this last vowel, and thus are in effect vowel signs for the long vowels.

When the final vowel is ulu, _ulu-melik_ or _dirga-melik_ is used, which is an ulu with a cecak written in it.

When the final vowel is suku, _suku mendut_ is used, which is a suku with a little hook.

When the final vowel is taling or taling-tarung, then a _dirga mure_ is placed above the taling.

When the final vowel is an a, tarung is used, which is then called _ras-vadi_, or _pada vacan anglagana_.

When the final word ends with paten, this paten by itself is enough to indicate the end of the line.
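The rules above for the end of a line of verse amount to a small lookup, which might be sketched as follows. The string keys are romanized labels for the final element of the line, and the names VERSE_FINAL_SIGN and verse_final_sign are hypothetical; nothing here is an encoded character.

```python
# Sign used at the end of a line of verse, keyed by the final vowel of the line.
VERSE_FINAL_SIGN = {
    "ulu": "ulu-melik",             # also called dirga-melik
    "suku": "suku mendut",
    "taling": "dirga mure",         # placed above the taling
    "taling-tarung": "dirga mure",  # likewise placed above the taling
    "a": "tarung",                  # then called ras-vadi
}


def verse_final_sign(final):
    """Return the name of the sign that closes the line, or None if none is needed."""
    if final == "paten":
        return None  # the paten alone marks the end of the line
    return VERSE_FINAL_SIGN[final]


print(verse_final_sign("suku"))
print(verse_final_sign("paten"))
```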

In older Kawi verse, the end of a small part of verse is indicated with _dirga_, which is preceded by a tarung if the word before it does not end with paten.

A sentence is normally started with an _adeg-adeg_ (a double dirga). At the opening of a letter, however, an ornamental sign indicating the relation between the sender and the receiver is used: a _pada-luhur_ indicates that the sender is higher in rank than the receiver, a _pada-madhya_ is used between people of the same rank, and a _pada-andap_ is used when a person of lower rank addresses a person of higher rank.
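The choice of opening sign for a letter can be sketched as a simple comparison. Representing rank as an integer, with a larger number meaning a higher rank, is an assumption made only for this illustration, and the function name opening_sign is hypothetical.

```python
def opening_sign(sender_rank, receiver_rank):
    """Pick the ornamental opening sign from the relative rank of the two parties."""
    if sender_rank > receiver_rank:
        return "pada-luhur"   # sender is higher in rank
    if sender_rank == receiver_rank:
        return "pada-madhya"  # same rank
    return "pada-andap"       # sender is lower in rank


print(opening_sign(2, 1))
```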

Elaborate signs are used at the beginning and end of verse, and at its major subdivisions.

Proposed Encoding

It is possible to map Javanese following the Indic scripts, but that will leave many places empty. The current proposal encodes a total of 71 characters (compare with Tamil 60). This will cost at least 5 columns or 80 code points, whereas the Devanagari mapping costs 128 code points.

Currently, the area 1B00--1B5F of the BMP is proposed to be allocated to the Javanese script.
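The column arithmetic behind these figures can be checked with a short calculation. The sketch assumes the usual code chart layout of 16 code points per column; the names COLUMN and columns_needed are illustrative.

```python
import math

COLUMN = 16  # code points per column in the code charts


def columns_needed(characters):
    """Smallest number of whole columns that can hold the given characters."""
    return math.ceil(characters / COLUMN)


characters = 71                      # total stated in the text above
columns = columns_needed(characters)
code_points = columns * COLUMN
range_size = 0x1B5F - 0x1B00 + 1     # size of the proposed area

print(columns, code_points)            # 5 80
print(range_size, range_size // COLUMN)  # 96 6
```

The proposed area 1B00--1B5F thus spans six columns, one more than the minimum of five.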

Issues

The exact usage of cakra, cerek, layar, and pengkal is not yet completely clear. For correct contextual analysis, it may be necessary to indicate syllable boundaries in the text.

It may be necessary to encode word boundaries with ZERO WIDTH SPACE, to make sensible line-breaking possible.
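How ZERO WIDTH SPACE (U+200B) could carry the word boundaries may be sketched as follows. This is a simplified illustration, not a full line-breaking algorithm: width is counted in characters rather than in rendered glyph widths, Latin placeholders stand in for Javanese text, and the function names are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def mark_word_boundaries(words):
    """Join words with an invisible break opportunity between them."""
    return ZWSP.join(words)


def break_lines(text, width):
    """Break text into lines at ZERO WIDTH SPACE positions only."""
    lines, current = [], ""
    for word in text.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)  # line is full: start a new one
            current = word
        else:
            current += word        # no visible space is inserted
    if current:
        lines.append(current)
    return lines


text = mark_word_boundaries(["abc", "de", "fgh"])
print(break_lines(text, 5))  # ['abcde', 'fgh']
```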

It may be considered to encode pancak with its filling nature implicit; that is, the appearance of one pancak character would result in as many repetitions of the glyph as needed to fill the line. (The same approach could be followed by adding a LINE-FILLER and a DOT-FILLER character, but I think this whole idea goes beyond the scope of Unicode.)
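What such implicit filling would ask of a renderer can be sketched as below. The sketch assumes a fixed-width display in which every character occupies one cell, and uses "*" as a placeholder for the pancak glyph; the function fill_line is hypothetical.

```python
def fill_line(text, line_width, filler="*"):
    """Pad the last line of a section with repetitions of the filler glyph."""
    remaining = max(line_width - len(text), 0)  # nothing to add if already full
    return text + filler * remaining


print(fill_line("abc", 6))  # abc***
```

The stored text would contain a single filler character; the repetitions exist only in the rendered line.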

Tarung is derived from vowel sign aa.

Dirga mure is derived from length mark ai, and can be used with taling and taling-tarung only.

Nya-gede is derived from the Sanskrit conjunct jnya, but has become a distinct letter in Javanese.

The ordering follows the order given in Roorda [1]. This is the traditional alphabetical order of the script.

U+xx00	JAVANESE LETTER HA
U+xx01	JAVANESE LETTER NA
U+xx02	JAVANESE LETTER CA
U+xx03	JAVANESE LETTER RA
U+xx04	JAVANESE LETTER KA
U+xx05	JAVANESE LETTER DA
U+xx06	JAVANESE LETTER TA
U+xx07	JAVANESE LETTER SA
U+xx08	JAVANESE LETTER WA
U+xx09	JAVANESE LETTER LA
U+xx0A	JAVANESE LETTER PA
U+xx0B	JAVANESE LETTER DHA
U+xx0C	JAVANESE LETTER JA
U+xx0D	JAVANESE LETTER YA
U+xx0E	JAVANESE LETTER NYA
U+xx0F	JAVANESE LETTER MA
U+xx10	JAVANESE LETTER GA
U+xx11	JAVANESE LETTER BA
U+xx12	JAVANESE LETTER THA
U+xx13	JAVANESE LETTER NGA
U+xx14	JAVANESE LETTER PA CEREK
U+xx15	JAVANESE LETTER NGA LELET
U+xx16	JAVANESE LETTER NA GEDHE
U+xx17	JAVANESE LETTER CA GEDHE
U+xx18	JAVANESE LETTER KA GEDHE
U+xx19	JAVANESE LETTER TA GEDHE
U+xx1A	JAVANESE LETTER SA GEDHE
U+xx1B	JAVANESE LETTER SHA GEDHE
U+xx1C	JAVANESE LETTER PA GEDHE
U+xx1D	JAVANESE LETTER NYA GEDHE
U+xx1E	JAVANESE LETTER GA GEDHE
U+xx1F	JAVANESE LETTER BA GEDHE
U+xx20	JAVANESE SIGN TRIPLE CECAK
U+xx21	JAVANESE VOWEL SIGN E
U+xx22	JAVANESE VOWEL SIGN I
U+xx23	JAVANESE VOWEL SIGN U
U+xx24	JAVANESE VOWEL SIGN EE
U+xx25	JAVANESE VOWEL SIGN O
U+xx26	JAVANESE SIGN PATEN
U+xx27	JAVANESE SIGN WIGNYAN
U+xx28	JAVANESE SIGN CECAK
U+xx29	JAVANESE SIGN KERET
U+xx2A	JAVANESE LETTER A
U+xx2B	JAVANESE LETTER I
U+xx2C	JAVANESE LETTER U
U+xx2D	JAVANESE LETTER E
U+xx2E	JAVANESE LETTER O
U+xx2F	JAVANESE DIGIT ZERO
U+xx30	JAVANESE DIGIT ONE
U+xx31	JAVANESE DIGIT TWO
U+xx32	JAVANESE DIGIT THREE
U+xx33	JAVANESE DIGIT FOUR
U+xx34	JAVANESE DIGIT FIVE
U+xx35	JAVANESE DIGIT SIX
U+xx36	JAVANESE DIGIT SEVEN
U+xx37	JAVANESE DIGIT EIGHT
U+xx38	JAVANESE DIGIT NINE
U+xx39	JAVANESE PADA-LUNGSI
U+xx3A	JAVANESE PADA-LINGSA
U+xx3B	JAVANESE PANGKAT
U+xx3C	JAVANESE TARUNG
U+xx3D	JAVANESE DIRGA
U+xx3E	JAVANESE ADEG-ADEG
U+xx3F	JAVANESE ULU MELIK
U+xx40	JAVANESE SUKU MENDUT
U+xx41	JAVANESE DIRGA MURE
U+xx42	JAVANESE PANCAK
U+xx43	JAVANESE PADA LUHUR
U+xx44	JAVANESE PADA MADYA
U+xx45	JAVANESE PADA HANDHAP
U+xx46	JAVANESE GURU
U+xx47	JAVANESE PURWA PADA
U+xx48	JAVANESE MADYA PADA
U+xx4A	JAVANESE WASANA PADA
U+xx4B	JAVANESE ARCHAIC LETTER DA GEDHE
U+xx4C	JAVANESE ARCHAIC LETTER AI
U+xx4D	JAVANESE ARCHAIC LENGTH MARK
U+xx4E	(This position shall not be used)
U+xx4F	(This position shall not be used)
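The relation between the chart positions above and the proposed area can be sketched as follows. The sketch assumes that the placeholder "xx" stands for 1B, so that position xx00 falls on 1B00; these are the positions proposed in this document only, and the names BLOCK_BASE, code_point, digit_value and parse_number are hypothetical.

```python
BLOCK_BASE = 0x1B00       # assumption: "xx" in the chart stands for 1B
BLOCK_LAST_OFFSET = 0x5F  # proposed area is 1B00--1B5F
DIGIT_ZERO_OFFSET = 0x2F  # JAVANESE DIGIT ZERO is at xx2F in the chart


def code_point(offset):
    """Map a chart offset (the part after "xx") to a code point."""
    if not 0 <= offset <= BLOCK_LAST_OFFSET:
        raise ValueError(f"offset out of range: {offset:#x}")
    return BLOCK_BASE + offset


def digit_value(cp):
    """Return the numeric value of a proposed Javanese digit code point."""
    value = cp - code_point(DIGIT_ZERO_OFFSET)
    if not 0 <= value <= 9:
        raise ValueError(f"not a digit position: {cp:#x}")
    return value


def parse_number(code_points):
    """Interpret a sequence of digit code points as a decimal number."""
    number = 0
    for cp in code_points:
        number = number * 10 + digit_value(cp)
    return number


print(hex(code_point(0x00)))                   # 0x1b00
print(parse_number([0x1B30, 0x1B38, 0x1B2F]))  # 190
```

Because the digits start at xx2F rather than at the top of a column, the digit value is not simply the low four bits of the code point, which is why the sketch subtracts the position of DIGIT ZERO.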

Bibliography

Faulmann, Carl. 1990 (1880). Das Buch der Schrift. Frankfurt am Main: Eichborn. ISBN 3-8218-1720-8

Haarmann, Harald. 1990. Universalgeschichte der Schrift. Frankfurt/Main; New York: Campus. ISBN 3-593-34346-0

Hollander, J. J. de. 18??. Inleiding tot de Javaansche spraakkunst.

Raden Adipati Ario Kromodjojoadinegoro. 1922. Oud Javanaansch alphabet, [s.l.]. [pp. 4--7].

Roorda, T. 1906. Beknopte Javaansche grammatica benevens een leesboek tot oefening in de Javaansche taal, 5th impr. Zwolle: W. E. J. Tjeenk Willink. [pp. 5--55].

Unicode Consortium. 1992. Unicode Technical Report #3: exploratory proposals

Michael Everson, Evertype, Dublin, 2001-09-21
