Armenian encoding on the Macintosh:
Michael Everson
The author presents some of the background of 8-bit encoded Armenian on the Macintosh, and discusses some of the technical problems involved in making Macintosh Armenian compatible with worldwide and Armenian encoding standards.
This article first appeared in the Inaugural Issue of the Journal of the Association of Armenian Information Professionals, May 1994.
Introduction
Since the beginning of the use of computers in text processing, languages with particular character needs have been ill-served by the predominance of English as a development medium. Armenian is no exception. This is surprising, because since 1984, with the advent of the first popular computer with a friendly graphical user interface – the Apple Macintosh – even languages with script requirements far more taxing than Armenian’s have had at least the potential for convenient access to their requisite characters. Early in 1994 I undertook a project to create an Apple WorldScript system for Armenian, because I sought to explore some of WorldScript’s capabilities and because I was interested in the Armenian script. WorldScript introduces a new way of handling 8-bit character sets, and enables a user to switch scripts, fonts, and keyboard layouts quickly and easily.
Character sets
Most computers nowadays use 8-bit character sets. What this means to the user is that his or her computer can represent 256 different characters, one for each code value. Usually about 32 of these are reserved for special computer functions, such as “delete”, “return” and so forth, so only about 224 can be used to display and print letters, numbers, and other symbols and signs. ASCII is a 7-bit character set, having only 128 characters; many of today’s 8-bit character sets retain ASCII for their first 128 characters, so when comparing them it’s usually only the last 128 positions that need attention. Unicode, or the Basic Multilingual Plane of ISO 10646, is a 16-bit character set, recently established but not yet widely implemented. It has room for 65,536 characters, and when it is universally available, many 8-bit issues will become moot.
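The point about a shared ASCII half and a divergent upper half can be seen with two real 8-bit codepages that Python happens to ship, `mac_roman` (Macintosh Roman) and `cp437` (the original IBM PC codepage). Neither contains Armenian; they are used here only as stand-ins, and the figure of 32 control codes is the approximation given above. This is a small illustrative sketch, not a survey of Armenian codepages.

```python
# Compare how two real 8-bit codepages decode the same byte values.
# "mac_roman" and "cp437" are standard Python codecs; neither is Armenian.

TOTAL_CODES = 2 ** 8                      # an 8-bit set has 256 code values
CONTROL_CODES = 32                        # 0x00-0x1F reserved for control functions
PRINTABLE = TOTAL_CODES - CONTROL_CODES   # about 224 left for letters and symbols


def differing_positions(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte values that the two codecs decode differently."""
    return [
        b for b in range(256)
        if bytes([b]).decode(codec_a) != bytes([b]).decode(codec_b)
    ]


if __name__ == "__main__":
    diffs = differing_positions("mac_roman", "cp437")
    print(PRINTABLE)                          # 224
    print(all(b >= 0x80 for b in diffs))      # True: the ASCII half is shared
    # The same byte 0x80 means a different letter on each system.
    print(bytes([0x80]).decode("mac_roman"), bytes([0x80]).decode("cp437"))  # Ä Ç
```

Only the upper 128 positions differ, which is exactly where Armenian letters have to be placed in any 8-bit scheme.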
Interchange between character sets
We are currently dealing with an 8-bit situation for encoding Armenian texts, whether on the Mac or on the PC. Different computer systems have used different 8-bit codepages for various historical reasons, mostly because developers neither communicated with one another nor agreed to use international standards for text interchange. Since many of the first codepages were pioneering implementations, it’s easy to understand how this could have happened. On a Macintosh,
Armenian Macs
Armenian has had the usual history of a non-Roman script on the Mac: early on, bitmaps were created, and Armenian characters were mapped to the keys in a way that made sense to the font developer. This might have been phonetic, so
WorldScript requirements
In order to create a WorldScript system for Armenian, the following things are required:
Alphabet
Everyone agrees that the standard alphabet,
(Aa Bb Gg Dd Ee Zz Ēē Əə Tʻtʻ Žž Ii Ll Xx Cc Kk Hh Jj Łł Qq Mm Yy Nn Šš Oo Čʻčʻ Pp J̌ǰ Ṙṙ Ss Vv Tt Rr Cʻcʻ Ww Pʻpʻ Kʻkʻ Ōō Ff)
Punctuation
Most of the characters are well-defined and there is no problem determining what they correspond to in Unicode. There are some questionable ones, however. In ArmSCII and in DIS 10585, three kinds of dash are defined: “hyphen sign” ( ), “direct speech sign” ( ), and “ligature” ( ). Apparently the consensus is that functions as a hyphen and should be unified with hyphen; this, however, introduces an imperfect compatibility with ArmSCII and DIS 10585, so I have unified with
WorldScript codepage
Creating an Apple WorldScript means that Apple’s rules must be followed. One such rule is that applications should be compatible with all scripts, and therefore can assume “that certain character codes (other than the control codes below $20) are never used”, and “that certain nonlinguistic symbols, such as numerals and punctuation marks, are always located at the same code positions” (Apple 1992:301). Examples of these fixed positions are $7F (delete), $CA (nonbreaking space), and $A8 (registered sign). Because of this limitation, it is impossible to make the Macintosh codepage identical with most DOS codepages (and there is more than one). To do so would violate Apple’s policy and practice, and the result would be a non-conformant WorldScript. This means that an Apple Armenian codepage won’t be identical with an Armenian national standard set for DOS. It is, however, quite simple to convert texts written in one codepage to another. I have tested some of these conversion utilities, such as AIEA’s Transliterate and Jon Wind’s Add/Strip, and found that they work quite well. The advantage of having an Apple Armenian codepage is that it’s easier to translate from one Mac standard to ISO 10646 or ArmSCII than from ten different ones, and no inter-Mac translation will be necessary.
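A minimal sketch of the kind of table-driven conversion such utilities can perform, assuming each codepage is described as a table from byte values to ISO 10646 code points. `MAC_SAMPLE`, `DOS_SAMPLE`, and every byte position in them are hypothetical, invented for illustration; they are not the draft Apple Armenian codepage, ArmSCII, or any real DOS set. Only the three Apple-fixed positions come from the text. Pivoting through ISO 10646 means each codepage needs one table rather than one per pair of codepages.

```python
# Table-driven conversion between two 8-bit codepages, pivoting through
# ISO 10646 code points. Both sample tables are HYPOTHETICAL excerpts.

# Positions Apple keeps the same in every script system (from the text).
APPLE_FIXED = {0x7F: 0x007F, 0xCA: 0x00A0, 0xA8: 0x00AE}

# Hypothetical Mac-side excerpt: byte -> Unicode code point.
MAC_SAMPLE = {**APPLE_FIXED,
              0x80: 0x0531, 0x81: 0x0561,   # Ayb (capital, small)
              0x82: 0x0532, 0x83: 0x0562}   # Ben (capital, small)

# Hypothetical DOS-side excerpt placing the same letters elsewhere
# and lacking the registered sign.
DOS_SAMPLE = {0x7F: 0x007F, 0xFF: 0x00A0,
              0xB2: 0x0531, 0xB3: 0x0561,
              0xB4: 0x0532, 0xB5: 0x0562}


def full_table(upper: dict[int, int]) -> dict[int, int]:
    """ASCII identity for 0x00-0x7F, overlaid with the given positions."""
    table = {b: b for b in range(0x80)}
    table.update(upper)
    return table


def respects_apple_fixed(codepage: dict[int, int]) -> bool:
    """True if every Apple-fixed position holds its required character."""
    return all(codepage.get(b) == cp for b, cp in APPLE_FIXED.items())


def make_converter(src: dict[int, int], dst: dict[int, int],
                   fallback: int = ord("?")) -> bytes:
    """Build a 256-byte table for use with bytes.translate()."""
    src, dst = full_table(src), full_table(dst)
    to_dst = {cp: b for b, cp in dst.items()}      # Unicode -> target byte
    return bytes(
        to_dst.get(src[b], fallback) if b in src else fallback
        for b in range(256)
    )


if __name__ == "__main__":
    mac_to_dos = make_converter(MAC_SAMPLE, DOS_SAMPLE)
    text = bytes([0x41, 0x80, 0x81, 0xCA, 0xA8])   # "A", Ա, ա, NBSP, ®
    print(text.translate(mac_to_dos))               # b'A\xb2\xb3\xff?'
    print(respects_apple_fixed(full_table(MAC_SAMPLE)))   # True
    print(respects_apple_fixed(full_table(DOS_SAMPLE)))   # False
```

Characters missing from the target (here the registered sign) fall back to `?`; a real converter would need an explicit policy for such gaps.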
Arrangement of Armenian characters on a Mac codepage
There are already at least ten Armenian codepages for the Macintosh. Most of them arrange the Armenian characters according to one or another phonetic transliteration or keyboard layout, so it can hardly be said that there is a standard at all on the Macintosh. Many existing Mac fonts also fail to conform to ArmSCII with regard to the ligatures. The draft Apple Armenian codepage I have developed is, I believe, conformant to ArmSCII and ISO 10646, as well as to Apple’s standard encoding practice.
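One way to make “conformant” testable is a small checker, sketched here under the assumption that a draft codepage is written as a table from byte values to ISO 10646 code points. It tests only two properties: the ASCII half is left untouched, and each Armenian letter of the ISO 10646 block (U+0531 to U+0556 and U+0561 to U+0586) is encoded exactly once. Treating the ligature U+0587 as required is an assumption, and the check says nothing about ArmSCII byte positions, which are not given here.

```python
from collections import Counter

# ISO 10646 Armenian letters: 38 capitals and 38 small letters.
CAPITALS = range(0x0531, 0x0557)   # U+0531 AYB .. U+0556 FEH
SMALLS = range(0x0561, 0x0587)     # U+0561 ayb .. U+0586 feh
# ASSUMPTION: the ech-yiwn ligature U+0587 is treated as required too.
REQUIRED = set(CAPITALS) | set(SMALLS) | {0x0587}


def check_codepage(codepage: dict[int, int]) -> list[str]:
    """Return a list of problems found in a byte -> code point table."""
    problems = []
    # The lower half must stay plain ASCII (missing entries count as ASCII).
    for b in range(0x80):
        if codepage.get(b, b) != b:
            problems.append(f"byte 0x{b:02X} is not ASCII")
    # Every required Armenian character must be encoded exactly once.
    counts = Counter(codepage.values())
    for cp in sorted(REQUIRED):
        if counts[cp] == 0:
            problems.append(f"U+{cp:04X} missing")
        elif counts[cp] > 1:
            problems.append(f"U+{cp:04X} encoded {counts[cp]} times")
    return problems


if __name__ == "__main__":
    # Demo table: fill the upper half in code point order, skipping
    # 0xA8 and 0xCA, which Apple reserves for fixed symbols.
    free = [b for b in range(0x80, 0x100) if b not in (0xA8, 0xCA)]
    demo = dict(zip(free, sorted(REQUIRED)))
    print(check_codepage(demo))        # []
    demo[0xF0] = 0x0532                # encode capital Ben a second time
    print(check_codepage(demo))        # ['U+0532 encoded 2 times']
```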
Keyboard layouts
There are several types of key layouts in existence already: “Phonetic Key Layouts”, based on either Eastern or Western pronunciation and arranged according to one or another romanization of Armenian on a standard QWERTY keyboard; and “Ergonomic Key Layouts”, presumably based on some sort of assessment of the most common characters in Armenian. Some of the keyboards in use are patterned on other keyboards; for instance, the original Olympia typewriter key layout was modified by several developers, some keeping closer to the original typewriter keyboard (by using
Apple keyboard requirements
For the purposes of an Apple Macintosh WorldScript package, economy of keyboards is desirable, chiefly because each additional keyboard layout in use increases the memory consumed in the system heap. I have suggested that something like four keyboards be employed: three based on the Olympia (Eastern phonetic), Papazian (Western phonetic), and Royal (ergonomic) typewriter layouts, and one based on the Hübschmann-Meillet (linguistic transliteration) layout. However, I have seen three ergonomic key layouts, and if these are widely used in Armenia or in the Diaspora, perhaps more than one ergonomic keyboard should be supported.
Keyboard advantages for Armenian
Since an Armenian WorldScript uses only one codepage, users anywhere will, in principle, be able to transfer texts from one Mac to the next without having to change fonts or reencode files with transliteration tools, as they must do now. By supporting multiple keyboard layouts, the WorldScript allows users to choose their preferred method of input. A Royal typist and an Olympia typist can work on the same document, in the same font, on the same machine without any difficulty – all one needs is to select the appropriate keyboard layout, which is as easy as typing
Now, if (and only if) each of these key layouts references the same codepage, it doesn’t matter which key layout anyone types in: the text remains the same. Thus someone who prefers the Papazian layout and others who prefer the Olympia or Hübschmann-Meillet layout can all use the same computer, font, and document, simply by switching the keyboard layout and typing away. Then the only reason to use text conversion utilities will be when moving from platform to platform (Mac to DOS, for instance). It also means that one doesn’t need more than one version of Raffi’s Ararat font on a Mac at any given time, saving valuable disk space and conversion time. And, most importantly, a single codepage standard would facilitate the development of a spell-check dictionary for the Macintosh, or provide a convenient platform for porting an existing PC dictionary to the Mac. Below I give a sample of the four keyboards I have proposed, in Armenian and Roman transliteration.
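The “if and only if” condition can be modelled in a few lines. In this sketch, assumed purely for illustration, a keyboard layout maps keys to characters and a single shared codepage maps characters to stored bytes. The layout names, key assignments, and byte values are all hypothetical; they are not the Papazian, Olympia, Royal, or Hübschmann-Meillet layouts.

```python
# Two keyboard layouts that emit bytes from ONE shared codepage.
# All key assignments and byte values are HYPOTHETICAL.

SHARED_CODEPAGE = {"Ա": 0x80, "ա": 0x81, "Բ": 0x82, "բ": 0x83}

# Hypothetical "phonetic" layout: the Roman A/B keys give Ayb/Ben.
PHONETIC_LAYOUT = {"A": "Ա", "a": "ա", "B": "Բ", "b": "բ"}
# Hypothetical "ergonomic" layout: different keys, same letters.
ERGONOMIC_LAYOUT = {"J": "Ա", "j": "ա", "F": "Բ", "f": "բ"}


def type_keys(keys: str, layout: dict[str, str]) -> bytes:
    """Map keystrokes to characters via the layout, then to bytes via the codepage."""
    return bytes(SHARED_CODEPAGE[layout[k]] for k in keys)


if __name__ == "__main__":
    a = type_keys("Aab", PHONETIC_LAYOUT)
    b = type_keys("Jjf", ERGONOMIC_LAYOUT)
    print(a == b)   # True: different keystrokes, identical stored text
```

Because the layouts differ only in which key produces which character, the stored bytes depend on the codepage alone; giving each layout its own codepage would break this equality.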
Continued development
Work continues apace on this project. I am interested in comments of any kind, particularly on the key layout issue. Persons interested in beta-testing can contact me via e-mail.
References
Apple Computer, Inc. 1992. Guide to Macintosh software localization. Reading, MA: Addison Wesley.
ArmSCII Standard. 1990. Published in Annual of Armenian Linguistics, from the “Armenian Standard of Information Exchange Codes” (Informacʻiayi kodi haykakan himnorinak). Erevan. (I have not seen the actual standard, only a photocopy of part of the AAL article.)
Darbinyan, T. 1965. (Katalog Haykakan SSṘ tparannerum gorcacvoł taṙatesakneri). Erevan.
ISO 10646. Information technology: universal multiple octet coded character set (UCS). ISO/IEC 10646-1, 11 March 1993.
ISO/DIS 10585. Information and documentation: Armenian alphabet coded character set for bibliographic information interchange. Draft International Standard, 1992. (There are some serious incompatibilities between this, ISO 10646, and ArmSCII.)
Kalantarian, Andrey. 1990. VGA Armenian DOS standard, Version 3.1 (2 March 1990). Keyboard and font software. Armenian Academy of Sciences. Erevan.
Koundakjian, R. H. Lola. 1991. “In search of a standard Armenian keyboard”, Armenian International Magazine (A.I.M.), March 1991, pp. 32-33.
Unicode Consortium. 1991. The Unicode standard: worldwide character encoding. Version 1.0, volume 1. Reading, MA: Addison Wesley Publishing Company.
Thanks to: Bedo Agopian, Raffi Kojian, Lola Koundakjian, Rick McGowan, Michael Stone, Dirk Van Damme, Jos Weitenberg, and Richard Youatt. E-thanks as well to the other members of three e-mail lists. All responsibility for errors or infelicities is the author’s alone.
HTML Michael Everson, Evertype, 48B Gleann na Carraige, Cill Fhionntain, Baile Átha Cliath 13, Éire, 2003-01-06
Copyright © 1993-2002 Evertype. All Rights Reserved