Armenian encoding on the Macintosh

Michael Everson

The author presents some of the background of 8-bit encoded Armenian on the Macintosh, and discusses some of the technical problems involved in making Macintosh Armenian compatible with worldwide and Armenian encoding standards.

This article first appeared in the Inaugural Issue of the Journal of the Association of Armenian Information Professionals, May 1994.

Introduction

Since the beginning of the use of computers in text processing, languages with particular character needs have been ill-served by the predominance of English as a development medium. Armenian is no exception. This is surprising, because since 1984, with the advent of the first popular computer with a friendly graphical user interface – the Apple Macintosh – even languages with script requirements far more taxing than Armenian’s have had at least the potential for convenient access to their requisite characters. Early in 1994 I undertook a project to create an Apple WorldScript system for Armenian, because I sought to explore some of WorldScript’s capabilities and because I was interested in the Armenian script. WorldScript introduces a new way of handling 8-bit character sets, and enables a user to switch scripts, fonts, and keyboard layouts quickly and easily.

Character sets

Most computers nowadays use 8-bit character sets. What this means to the user is that his or her keyboard can generate 256 different characters. Usually about 32 of these are reserved for special computer functions, such as “delete”, “return” and so forth. Therefore only 224 characters can be used to display and print letters, numbers, and other symbols and signs. ASCII is a 7-bit character set, having only 128 characters; many of today’s 8-bit character sets retain ASCII for their first 128 characters, and so when discussing the differences between them, it’s usually only the last 128 characters which have to be worried about.
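The counts given above follow directly from the number of bits per character. As a quick check, here is a minimal sketch; the names are illustrative only, and the figure of 32 reserved codes is the approximate one quoted above.

```python
# Sizes of the character sets discussed above, derived from bits per character.
def charset_size(bits: int) -> int:
    """Number of distinct code positions available at the given bit width."""
    return 2 ** bits

EIGHT_BIT = charset_size(8)   # code positions in an 8-bit character set
ASCII = charset_size(7)       # code positions in 7-bit ASCII
CONTROL_CODES = 32            # approximate figure quoted in the text (assumption)

printable = EIGHT_BIT - CONTROL_CODES   # left over for letters, digits and signs
upper_half = EIGHT_BIT - ASCII          # positions where 8-bit codepages differ

print(EIGHT_BIT, printable, ASCII, upper_half)
```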
Unicode, or the Basic Multilingual Plane of ISO 10646, is a 16-bit character set, recently established but not yet widely implemented. It has 65,536 characters, and when it is universally available, many 8-bit issues will become moot.

Interchange between character sets

We are currently dealing with an 8-bit situation for encoding Armenian texts, whether on the Mac or on the PC. Different computer systems have used different 8-bit codepages for various historical reasons, mostly because developers neither communicated with one another nor agreed to use international standards for text interchange. Since many of the first codepages were pioneering implementations, it’s easy to understand how this could have happened. On a Macintosh, LATIN CAPITAL LETTER U WITH ACUTE lives at F2, while on a DOS 850 codepage it lives at E9, and on Windows at DA. In order to transfer a document from one computer to another, special translation utilities must be used which move E9 or DA to F2, as well as juggling the other characters in the second half of the codepage, for the document to be read on the new computer. Fortunately many such utilities are available.

Armenian Macs

Armenian has had the usual history of a non-Roman script on the Mac: early on, bitmaps were created, and Armenian characters mapped to the keys in a way that made sense to the font developer. This might have been phonetic, so ARMENIAN LETTER AYB was put on LATIN LETTER A, or it may have been according to a typewriter layout. Later, laser fonts were encoded in the same way. Lola Koundakjian (1991) wrote an article describing some of the problems arising from a lack of keyboard/codepage standardization. Mapping characters to the codepage in order to achieve a particular keyboard configuration creates new codepages. A text written in Raffi Kojian’s Ararat, which has a Papazian […]

WorldScript requirements

In order to create a WorldScript system for Armenian, the following things are required:
Alphabet

Everyone agrees that the standard alphabet, […]
Punctuation

Most of the characters are well-defined and there is no problem determining what they correspond to in Unicode. There are some questionable ones, however. In ArmSCII and in DIS 10585, three kinds of dash are defined: “hyphen sign” […] EN DASH to ensure that it has a unique encoding. Characters which do not seem to have much currency in Armenian, such as […] MIDDLE DOT, and dubious characters (such as ARMENIAN MODIFIER LETTER LEFT HALF RING and ARMENIAN APOSTROPHE found in ISO 10646) I did not include in the Macintosh codepage. Other characters are unified with their Roman equivalents. (The punctuation repertoire: […])

WorldScript codepage

Creation of an Apple WorldScript means that Apple’s rules must be followed. One such rule is that applications should be compatible with all scripts, and therefore can assume “that certain character codes (other than the control codes below $20) are never used”, and “that certain nonlinguistic symbols, such as numerals and punctuation marks, are always located at the same code positions” (Apple 1992:301). Examples of these characters are 7F (delete), CA (nonbreaking space), and A8 (registered sign). Because of this limitation, it is impossible to make the Macintosh codepage identical with most DOS codepages (and there is more than one). To do so would be to violate Apple’s policy and practice, and the result would be a non-conformant WorldScript. This means that an Apple Armenian codepage won’t be identical with an Armenian national standard set for DOS. It is, however, quite simple to convert texts written in one codepage to another. I have tested some of these conversion utilities, such as AIEA’s Transliterate and Jon Wind’s Add/Strip, and found that they work quite well. The advantage of having an Apple Armenian codepage is that it’s easier to translate from one Mac standard to ISO 10646 or ArmSCII than it is to translate from ten. And no inter-Mac translation will be necessary.

Arrangement of Armenian characters on a Mac codepage

There exist at least ten Macintosh codepages for Armenian already. Most of them employ a remapping of Armenian characters according to various phonetic transliterations or keyboard layouts. As such, it can hardly be said that there is a standard at all on the Macintosh.
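Converting a text from one single-byte codepage to another comes down to a 256-entry lookup table. The following is a minimal sketch, not a description of how Transliterate or Add/Strip work: it uses Python’s built-in `mac_roman`, `cp850` and `cp1252` codecs as stand-ins, since no Armenian codepage ships with Python, and `build_table` is a hypothetical helper. It reuses the U WITH ACUTE case from the discussion of interchange.

```python
def build_table(src: str, dst: str, fallback: int = 0x3F) -> bytes:
    """Build a 256-byte table mapping each byte of codepage `src` to `dst`.

    Bytes with no counterpart in `dst` map to `fallback` ('?' by default).
    """
    table = bytearray(256)
    for code in range(256):
        try:
            char = bytes([code]).decode(src)    # codepage byte -> character
            table[code] = char.encode(dst)[0]   # character -> target byte
        except UnicodeError:                    # undefined or unmappable
            table[code] = fallback
    return bytes(table)

dos_to_mac = build_table("cp850", "mac_roman")
win_to_mac = build_table("cp1252", "mac_roman")

# U WITH ACUTE: E9 in DOS 850 and DA in Windows both map to F2 on the Mac.
print(hex(dos_to_mac[0xE9]), hex(win_to_mac[0xDA]))

# Converting a whole text is then a single pass of bytes.translate().
converted = b"\xe9".translate(dos_to_mac)
print(converted == "\u00da".encode("mac_roman"))
```

With one agreed Mac codepage, a single table of this kind per target standard would suffice; with ten Mac codepages, ten would be needed for each.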
Many existing Mac fonts fail to conform to ArmSCII with regard to the ligatures. The draft Apple Armenian codepage I have developed is, I believe, conformant to ArmSCII and ISO 10646, as well as to Apple’s standard encoding practice.

Keyboard layouts

There are several types of key layouts in existence already: “Phonetic Key Layouts”, based either on Eastern or Western pronunciation and arranged according to one or another romanization of Armenian on a standard QWERTY keyboard; and “Ergonomic Key Layouts”, based presumably on some sort of assessment of the most common characters in Armenian. Some of the keyboards in use are patterned after other keyboards; for instance, the original Olympia typewriter key layout was modified by several developers, some keeping closer to the original typewriter keyboard (by using ARMENIAN CAPITAL LETTER JA […] ARMENIAN CAPITAL LETTER YI […] ARMENIAN CAPITAL LETTER OH […] OPTION- or ALT-keys to access these and other characters.
Apple keyboard requirements

For the purposes of an Apple Macintosh WorldScript package, economy of keyboards is desirable. The chief reason for this is that each additional keyboard in use adds to the amount of memory in the system heap. I have suggested that something like four keyboards be employed based on the Olympia (Eastern phonetic), Papazian (Western phonetic), and Royal (ergonomic) typewriter layouts, and one based on the Hübschmann-Meillet (linguistic transliteration) layout. However, I have seen three ergonomic key layouts, and if these are widely used in Armenia or in the Diaspora, perhaps more than one ergonomic keyboard should be supported.

Keyboard advantages for Armenian

Since an Armenian WorldScript uses only one codepage, users anywhere will, in principle, be able to transfer texts from one Mac to the next without having to change fonts or reencode files with transliteration tools as they do now. By supporting multiple keyboard layouts, the WorldScript allows users to choose their preferred method of input. A Royal typist and an Olympia typist can work on the same document, in the same font on the same machine without any difficulty – all one needs is to select the appropriate keyboard layout, which is as easy as typing OPTION-COMMAND-SPACEBAR!

To support Armenian WorldScript, once it has been finished and released, one thing will remain to be done: all existing Macintosh fonts will need to be remapped to the Apple Armenian codepage. But this is the whole point of the exercise: the Macintosh allows you to have multiple key layouts accessing a single codepage. Using the ten key layouts I have seen to date, I’ll give an example of how to type the word […]
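The principle of several key layouts feeding a single codepage can be sketched in a few lines. This is an illustration only: the sample word and the two layouts below are invented for the example and are not the actual Olympia, Papazian or any other published layout, and Unicode code points stand in for the Apple Armenian codepage, whose values are not given here.

```python
# Two hypothetical key layouts: each maps a Latin key to an Armenian letter.
# Unicode code points stand in for the shared codepage (an assumption).
HO, AYB, YI = "\u0570", "\u0561", "\u0575"   # small letters ho, ayb, yi

PHONETIC_LAYOUT = {"h": HO, "a": AYB, "y": YI}   # invented for illustration
OTHER_LAYOUT = {"j": HO, "f": AYB, "k": YI}      # invented for illustration

def type_keys(keys: str, layout: dict[str, str]) -> str:
    """Return the encoded text produced by pressing `keys` on `layout`."""
    return "".join(layout[key] for key in keys)

text_a = type_keys("hay", PHONETIC_LAYOUT)
text_b = type_keys("jfk", OTHER_LAYOUT)

# Different keystrokes, identical stored text: the layout affects input only.
print(text_a == text_b)
print([hex(ord(ch)) for ch in text_a])
```

Because the stored codes are the same whichever layout produced them, a font or spelling dictionary built for the codepage serves every typist.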
Now, if (and only if) each of these key layouts references the same codepage, then it doesn’t matter which key layout anyone types in: the text remains the same. Thus someone who prefers the Papazian layout and others who prefer the Olympia or Hübschmann-Meillet layouts can all use the same computer, font, and document, simply by switching the keyboard layout and typing away. Then the only reason to use text conversion utilities will be when switching from platform to platform (Mac to DOS, for instance). It means, for instance, that one doesn’t need to have more than one version of Raffi’s Ararat font on a Mac at any given time, saving valuable disk space and conversion time. And, most importantly, a single codepage standard would facilitate the development of a spell-check dictionary for the Macintosh, or provide a convenient platform for porting an existing PC dictionary to the Mac. Below I give a sample of the four keyboards I have proposed, in Armenian and Roman transliteration.

Continued development

Work continues apace on this project. I am interested in comments of any kind, particularly on the key layout issue. Persons interested in beta-testing can contact me via e-mail.

References

Apple Computer, Inc. 1992. Guide to Macintosh software localization. Reading, MA: Addison Wesley.

ArmSCII Standard. 1990. Published in Annual of Armenian Linguistics, from the “Armenian Standard of Information Exchange Codes” […]

Darbinyan, T. 1965. […]

ISO 10646. Information technology: universal multiple octet coded character set (UCS). ISO/IEC 10646-1, 11 March 1993.

ISO/DIS 10585. Information and documentation: Armenian alphabet coded character set for bibliographic information interchange. Draft International Standard, 1992. (There are some serious incompatibilities between this, ISO 10646, and ArmSCII.)

Kalantarian, Andrey. 1990. VGA Armenian DOS standard, Version 3.1 (2 March 1990). Keyboard and font software. Armenian Academy of Sciences. Erevan.

Koundakjian, R. H. Lola. 1991. “In search of a standard Armenian keyboard”, Armenian International Magazine (A.I.M.), March 1991, pp. 32-33.

Unicode Consortium. 1991. The Unicode standard: worldwide character encoding. Version 1.0, volume 1. Reading, MA: Addison Wesley Publishing Company.

Thanks to: Bedo Agopian, Raffi Kojian, Lola Koundakjian, Rick McGowan, Michael Stone, Dirk Van Damme, Jos Weitenberg, and Richard Youatt. E-thanks to the other members of aiea@telf.com, hayastan@usc.edu, and hye-font@sain.org as well. All responsibility for errors or infelicities is that of the author and the author alone.
HTML Michael Everson, Evertype, 48B Gleann na Carraige, Cill Fhionntain, Baile Átha Cliath 13, Éire, 2003-01-06
Copyright © 1993-2002 Evertype. All Rights Reserved