1998-10-31

Dear colleagues,

I have reviewed your character sets implementation document. I found many features of the document confusing and unclear. Some of these had to do with the use of the English language, which, with some thought, I was able to puzzle out. I have presented my comments in HTML format here so that I can include corrections to the English of the document, since HTML is a convenient way of showing things that should be deleted and things that should be added.

My greatest concern is your committee's understanding and use of the concept "unification". Apart from this, the whole section on basic principles is problematic, especially because the document itself appears to indicate that none of the character sets standardized in Armenia follow these basic principles. Frankly, the discussion on principles is confusing, theoretical, and unnecessary. It also seems to differ from the principles used by ISO/IEC JTC1/SC2. There is a difference between the theory of character sets and the practice of implementing them in the real world. As a member of JTC1/SC2, the committee responsible for character set encoding, I am troubled by the number of instances in which Armenian 8-bit code tables, standardized in Armenia, violate SC2's basic principles of character identification (what a given character is) and code table implementation (where a given character goes).

It seems clear that all of the Armenian code tables will cause problems with conversion to UCS (Unicode and ISO/IEC 10646) implementations. For this reason I oppose the acceptance of the Implementation Guide document as an RFC.

  1. A number of characters described as "Armenian" are not Armenian at all. Your armsection, armparenright, armparenleft, armquotright, armquotleft, armemdash, armdot, armcomma, armendash, and armellipsis are all international characters, and it is a grave mistake to describe them otherwise. There is no difference between "Armenian digit five" and "Armenian right parenthesis". Both of these characters are used in Latin, Greek, Cyrillic, Armenian, Georgian, Chinese, Cherokee, Canadian Syllabics, etc., etc., etc. Armenian standards describing the characters listed above as Armenian are in error.
  2. You can use whatever names you want in your Armenian national standards, of course. However, in the English language, the standard names found in ISO/IEC 10646 should always be used, whether or not you like them. Doing otherwise causes confusion to implementors. In some instances, of course, it could be useful to have a pronunciation variant listed, just as ISO/IEC 10646 does with Tibetan. I have edited the table below to present Armenian character names correctly in English (using UCS names), adding pronunciation notes based on what is in your document.
  3. The short alias names should be abandoned unless there is a good reason to use them. What are they for? Are they intended to be entity names in HTML and SGML? In that case, I suggest that you make them more useful with regard to the case of Armenian letters, by naming them Hyayb/HyAyb or Hyeayb/HyeAyb. Note I am using Hy and Hye here, which are the two-letter and three-letter codes proposed to identify the Armenian script (not language) in ISO NP 15924. You should also identify them as HTML identifiers and register them with the relevant agency. For all I know, Armenian entities have already been defined in SGML. If that is the case, the ones already standardized should be used.
  4. In any 8-bit environment, the left half of the code table shall (must!) be no more and no less than identical to ASCII, identical in content with Table 1 of ISO/IEC 10646. Despite this, ArmSCII-8A identifies eleven characters in the range x20-x7F as uniquely Armenian. If we allow for those characters which aren't really Armenian, but are international characters used in Armenian, then you have still replaced the following characters:

    1. NAK (this is a control character!) with SECTION
    2. APOSTROPHE with ARMENIAN EMPHASIS MARK
    3. HYPHEN-MINUS with EM DASH
    4. COLON with ARMENIAN FULL STOP
    5. LOW LINE with EN DASH
    6. GRAVE ACCENT with ARMENIAN COMMA
    7. TILDE with ARMENIAN EXCLAMATION MARK

    This practice is a mistaken one, and it is dangerous. Unique mappings between ArmSCII-8A and the UCS will cause you problems, I am sure. This is the reason SC2 resolved never to change ASCII -- to ensure interoperability. The ArmSCII-8A standard should be immediately withdrawn because it is defective in the context of reliable international text interchange.

    Regarding ArmSCII-7 I have nothing to say because I don't really care about 7-bit implementations. In an interoperability environment you are doubtless switching with unaltered ASCII, so you are probably OK with ArmSCII-7.

    Regarding ArmSCII-16U, it should be removed from the draft RFC.
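
The replacements listed under point 4 can be checked mechanically. The following is a minimal sketch in Python, assuming the ArmSCII-8A byte positions and UCS values shown in Table 2 of the guide; the names ARMSCII_8A_LOW_HALF and ascii_violations are hypothetical and are not part of any standard or library.

```python
# Positions below x80 that ArmSCII-8A reassigns, taken from Table 2
# (byte value -> UCS code point). Illustrative subset, not a full codec.
ARMSCII_8A_LOW_HALF = {
    0x15: 0x00A7,  # NAK          -> SECTION SIGN
    0x27: 0x055B,  # APOSTROPHE   -> ARMENIAN EMPHASIS MARK
    0x2D: 0x2014,  # HYPHEN-MINUS -> EM DASH
    0x3A: 0x0589,  # COLON        -> ARMENIAN FULL STOP
    0x5F: 0x2013,  # LOW LINE     -> EN DASH
    0x60: 0x055D,  # GRAVE ACCENT -> ARMENIAN COMMA
    0x7E: 0x055C,  # TILDE        -> ARMENIAN EXCLAMATION MARK
}


def ascii_violations(low_half_overrides):
    """Return the byte values in x00-x7F whose meaning differs from ASCII."""
    return sorted(
        byte
        for byte, code_point in low_half_overrides.items()
        if byte < 0x80 and code_point != byte
    )


violations = ascii_violations(ARMSCII_8A_LOW_HALF)
print([f"x{b:02X}" for b in violations])
```

A table that leaves the left half untouched would yield an empty list, which is the property SC2 expects of an 8-bit coded character set.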


ARMENIAN CHARACTER SETS
IMPLEMENTATION GUIDE


Document version 005.en.html
1998-10-31

Abstract

This document presents the set of Armenian characters that are used in information systems in accordance with AST 34.001 and AST 34.002, standards of the State Standards Commission of the Republic of Armenia. It also provides information on the classification and sorting of Armenian characters and recommendations for implementation of basic algorithms of text processing.

Table of Contents

1. Introduction
2. Basic Character Set

2.1. Naming
2.2. Classification and Sorting
2.3. Ligatures

3. Encoding

3.1. Basic principles
3.2. Cross Reference of Coding Tables

4. Character Set and Language Tags

4.1. Character Set Tags
4.2. Language Tags

5. Acknowledgements
6. Author's Address
7. References

1. Introduction

The publication of comments in reference to the standards

ME: This phrase doesn't make sense to me.

is due to the following considerations:

1. Armenian character sets have been used in different computer systems since at least 1982, although a national standard was established only in 1997. This time lag resulted in the emergence of incompatible coding systems. Some of the existing discrepancies are also due to grammatical differences between the two major dialects of Armenian.

2. The emergence of internationalized operating systems and an important number of multilingual applications result in difficulties when the national language support is implemented by programmers who are not familiar with Armenian.

The present memo is a recommendation rather than a binding standard.

The recommendations set forth herein are elaborated on the basis of the national standards AST 34.001 (reg.no. 166-97) and AST 34.002 (reg.no. 167-97), as well as the ArmSCII-7 standard.


2. Basic Character Set

2.1. Naming

The Armenian character set presented below follows the standard AST 34.001. The first column contains a glyph; the second contains the UCS code position; the third contains the character name; and the last contains an alias, an abbreviation that can be used in systems confined to the Latin character set. The detailed classification of the characters follows in the points below.

In spite of the fact that the space, numbers and Latin script are also part of the Armenian character set, these were not included in the AST 34.001 standard since these are present in all systems.

ME: Note that the AST 34.001 standard is probably incorrect, if it includes "Armenian" parentheses, "Armenian" quotation marks, and so on. A proper Armenian standard should have, as an 8-bit standard, given unaltered ASCII on the left (x00-x7F) and Armenian characters on the right (x80-xFF).

Table 1. Basic Character Set

UCS   UCS Name   Alias
--   ARMENIAN ETERNITY SIGN   Hyeternity
0587   ARMENIAN LIGATURE ECH YIWN (ew)   Hyechyiwn
0589   ARMENIAN FULL STOP (verjaket)   Hyfullstop
055D   ARMENIAN COMMA (but)   Hycomma
058A   ARMENIAN HYPHEN (yentamna)   Hyhyphen
055A   ARMENIAN APOSTROPHE   Hyapostrophe
055C   ARMENIAN EXCLAMATION MARK (amanak)   Hyexclam
055B   ARMENIAN EMPHASIS MARK (shesht)   Hyemphasis
055E   ARMENIAN QUESTION MARK (paruyk)   Hyquestion
0531   ARMENIAN CAPITAL LETTER AYB   HyAyb
0561   ARMENIAN SMALL LETTER AYB   Hyayb
0532   ARMENIAN CAPITAL LETTER BEN   HyBen
0562   ARMENIAN SMALL LETTER BEN   Hyben
0533   ARMENIAN CAPITAL LETTER GIM   HyGim
0563   ARMENIAN SMALL LETTER GIM   Hygim
0534   ARMENIAN CAPITAL LETTER DA   HyDa
0564   ARMENIAN SMALL LETTER DA   Hyda
0535   ARMENIAN CAPITAL LETTER ECH (yech)   HyEch
0565   ARMENIAN SMALL LETTER ECH (yech)   Hyech
0536   ARMENIAN CAPITAL LETTER ZA   HyZa
0566   ARMENIAN SMALL LETTER ZA   Hyza
0537   ARMENIAN CAPITAL LETTER EH (e)   HyEh
0567   ARMENIAN SMALL LETTER EH (e)   Hyeh
0538   ARMENIAN CAPITAL LETTER ET (at)   HyEt
0568   ARMENIAN SMALL LETTER ET (at)   Hyet
0539   ARMENIAN CAPITAL LETTER TO   HyTo
0569   ARMENIAN SMALL LETTER TO   Hyto
053A   ARMENIAN CAPITAL LETTER ZHE   HyZhe
056A   ARMENIAN SMALL LETTER ZHE   Hyzhe
053B   ARMENIAN CAPITAL LETTER INI   HyIni
056B   ARMENIAN SMALL LETTER INI   Hyini
053C   ARMENIAN CAPITAL LETTER LIWN (lyun)   HyLiwn
056C   ARMENIAN SMALL LETTER LIWN (lyun)   Hyliwn
053D   ARMENIAN CAPITAL LETTER XEH (khe)   HyXeh
056D   ARMENIAN SMALL LETTER XEH (khe)   Hyxeh
053E   ARMENIAN CAPITAL LETTER CA (tsa)   HyCa
056E   ARMENIAN SMALL LETTER CA (tsa)   Hyca
053F   ARMENIAN CAPITAL LETTER KEN   HyKen
056F   ARMENIAN SMALL LETTER KEN   Hyken
0540   ARMENIAN CAPITAL LETTER HO   HyHo
0570   ARMENIAN SMALL LETTER HO   Hyho
0541   ARMENIAN CAPITAL LETTER JA (dza)   HyJa
0571   ARMENIAN SMALL LETTER JA (dza)   Hyja
0542   ARMENIAN CAPITAL LETTER GHAD (ghat)   HyGhad
0572   ARMENIAN SMALL LETTER GHAD (ghat)   Hyghad
0543   ARMENIAN CAPITAL LETTER CHEH (tche)   HyCheh
0573   ARMENIAN SMALL LETTER CHEH (tche)   Hycheh
0544   ARMENIAN CAPITAL LETTER MEN   HyMen
0574   ARMENIAN SMALL LETTER MEN   Hymen
0545   ARMENIAN CAPITAL LETTER YI (hi)   HyYi
0575   ARMENIAN SMALL LETTER YI (hi)   Hyyi
0546   ARMENIAN CAPITAL LETTER NOW (nu)   HyNow
0576   ARMENIAN SMALL LETTER NOW (nu)   Hynow
0547   ARMENIAN CAPITAL LETTER SHA   HySha
0577   ARMENIAN SMALL LETTER SHA   Hysha
0548   ARMENIAN CAPITAL LETTER VO   HyVo
0578   ARMENIAN SMALL LETTER VO   Hyvo
0549   ARMENIAN CAPITAL LETTER CHA   HyCha
0579   ARMENIAN SMALL LETTER CHA   Hycha
054A   ARMENIAN CAPITAL LETTER PEH (pe)   HyPeh
057A   ARMENIAN SMALL LETTER PEH (pe)   Hypeh
054B   ARMENIAN CAPITAL LETTER JHEH (je)   HyJheh
057B   ARMENIAN SMALL LETTER JHEH (je)   Hyjheh
054C   ARMENIAN CAPITAL LETTER RA   HyRa
057C   ARMENIAN SMALL LETTER RA   Hyra
054D   ARMENIAN CAPITAL LETTER SEH (se)   HySeh
057D   ARMENIAN SMALL LETTER SEH (se)   Hyseh
054E   ARMENIAN CAPITAL LETTER VEW (vev)   HyVew
057E   ARMENIAN SMALL LETTER VEW (vev)   Hyvew
054F   ARMENIAN CAPITAL LETTER TIWN (tyun)   HyTiwn
057F   ARMENIAN SMALL LETTER TIWN (tyun)   Hytiwn
0550   ARMENIAN CAPITAL LETTER REH (re)   HyReh
0580   ARMENIAN SMALL LETTER REH (re)   Hyreh
0551   ARMENIAN CAPITAL LETTER CO (tso)   HyCo
0581   ARMENIAN SMALL LETTER CO (tso)   Hyco
0552   ARMENIAN CAPITAL LETTER YIWN (vyun)   HyYiwn
0582   ARMENIAN SMALL LETTER YIWN (vyun)   Hyyiwn
0553   ARMENIAN CAPITAL LETTER PIWR (pyur)   HyPiwr
0583   ARMENIAN SMALL LETTER PIWR (pyur)   Hypiwr
0554   ARMENIAN CAPITAL LETTER KEH (ke)   HyKeh
0584   ARMENIAN SMALL LETTER KEH (ke)   Hykeh
0555   ARMENIAN CAPITAL LETTER OH (o)   HyOh
0585   ARMENIAN SMALL LETTER OH (o)   Hyoh
0556   ARMENIAN CAPITAL LETTER FEH (fe)   HyFeh
0586   ARMENIAN SMALL LETTER FEH (fe)   Hyfeh

The characters are hereinafter referred to by the abbreviated forms given in the Alias column.

ME: In general I don't see the point of the abbreviated aliases. You could simply use the standard UCS names, omitting the words "ARMENIAN" and "CAPITAL" or "SMALL" and "LETTER".
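
The suggestion of deriving short names from the UCS names can be illustrated as follows. This is only a sketch: it relies on Python's standard unicodedata module for the character names, and the function alias_from_ucs_name and the "Hy" prefix convention are assumptions for illustration. It reproduces the letter aliases of Table 1 but not the shortened punctuation aliases such as Hyexclam.

```python
import unicodedata


def alias_from_ucs_name(char, prefix="Hy"):
    """Build a short alias such as 'HyAyb' or 'Hyayb' from the UCS name."""
    words = unicodedata.name(char).split()
    # Drop the script name and the words the alias does not need.
    dropped = ("ARMENIAN", "CAPITAL", "SMALL", "LETTER")
    base = "".join(w for w in words if w not in dropped).lower()
    # Capital letters are distinguished by an initial capital in the alias.
    if "CAPITAL" in words:
        base = base.capitalize()
    return prefix + base


print(alias_from_ucs_name("\u0531"))  # ARMENIAN CAPITAL LETTER AYB
print(alias_from_ucs_name("\u0561"))  # ARMENIAN SMALL LETTER AYB
```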

2.2. Classification and Sorting

The basic character set can be divided into the following functional subsets:

unclassified-symbols ::= {Hyeternity, Hyechyiwn, section}

punctuation-signs ::= {Hyfullstop, parenright, parenleft, quotright, quotleft, emdash, middot, hyphen, comma, endash}

pseudo-letters ::= {Hyhyphen, ellipsis, Hyapostrophe}

ME: These are punctuation signs. They are not letters. The term "pseudo-letter" doesn't really mean anything to me. In the UCS there are "modifier letters", which are treated as letters but which look like punctuation. Is this what you mean? If so, you could use that term.

combining-punctuation ::= {Hyexclam, Hyemphasis, Hyquestion}

letters ::= {capital-letters, small-letters}

capital-letters ::= {HyAyb, HyBen, HyGim, HyDa, HyEch, HyZa, HyEh, HyEt, HyTo, HyZhe, HyIni, HyLiwn, HyXeh, HyCa, HyKen, HyHo, HyJa, HyGhad, HyCheh, HyMen, HyYi, HyNow, HySha, HyVo, HyCha, HyPeh, HyJheh, HyRa, HySeh, HyVew, HyTiwn, HyReh, HyCo, HyYiwn, HyPiwr, HyKeh, HyOh, HyFeh}

small-letters ::= {Hyayb, Hyben, Hygim, Hyda, Hyech, Hyza, Hyeh, Hyet, Hyto, Hyzhe, Hyini, Hyliwn, Hyxeh, Hyca, Hyken, Hyho, Hyja, Hyghad, Hycheh, Hymen, Hyyi, Hynow, Hysha, Hyvo, Hycha, Hypeh, Hyjheh, Hyra, Hyseh, Hyvew, Hytiwn, Hyreh, Hyco, Hyyiwn, Hypiwr, Hykeh, Hyoh, Hyfeh}

The sorting order is important for alphabetic characters only and should follow the order presented in Table 1.

Capitalization applies to alphabetic characters only. The shift from upper case to lower case replaces the capital letter with the following character as per Table 1. Accordingly, the shift from lower case to upper case replaces the small letter with the preceding character as per Table 1.
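
The "following character" rule holds for a coded character set that interleaves capitals and small letters, as ArmSCII-8 does in Table 2 (capitals at even positions xB2-xFC, each followed by its small letter). In the UCS the two cases sit in separate runs, x30 apart. A minimal sketch of both, with hypothetical function names:

```python
def armscii8_to_lower(byte):
    """Lower-case one ArmSCII-8 byte; non-letters are returned unchanged."""
    # Letters occupy xB2-xFD; each capital (even) is followed by its small (odd).
    if 0xB2 <= byte <= 0xFD and byte % 2 == 0:
        return byte + 1
    return byte


def armscii8_to_upper(byte):
    """Upper-case one ArmSCII-8 byte; non-letters are returned unchanged."""
    if 0xB2 <= byte <= 0xFD and byte % 2 == 1:
        return byte - 1
    return byte


def ucs_to_lower(code_point):
    """Lower-case one UCS code point in the Armenian block."""
    # Capitals are U+0531-U+0556; the matching small letter is x30 higher.
    if 0x0531 <= code_point <= 0x0556:
        return code_point + 0x30
    return code_point


print(hex(armscii8_to_lower(0xB2)), hex(ucs_to_lower(0x0531)))
```

This shows that the case rule depends on the coding table in use, so an implementation should not assume the interleaved layout when working with UCS text.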

Text search and dictionary applications should take into account the following factors: (1) in the Armenian language, a word is a sequence of letters, combining punctuation, and modifier letters; (2) in comparison of words in the text or dictionary, the combining punctuation and modifier letters may be ignored.
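
A comparison that ignores these marks could look like the sketch below. It assumes UCS text and uses the code positions listed in Table 1 for the three combining punctuation marks, the Armenian hyphen and the Armenian apostrophe, plus U+2026 for the ellipsis as given in Table 2; the names IGNORABLE, comparison_key and same_word are hypothetical.

```python
# Marks that may be ignored when comparing words (code points from the tables).
IGNORABLE = {
    "\u055C",  # ARMENIAN EXCLAMATION MARK
    "\u055B",  # ARMENIAN EMPHASIS MARK
    "\u055E",  # ARMENIAN QUESTION MARK
    "\u058A",  # ARMENIAN HYPHEN
    "\u055A",  # ARMENIAN APOSTROPHE
    "\u2026",  # HORIZONTAL ELLIPSIS
}


def comparison_key(word):
    """Strip the ignorable marks, leaving only the letters."""
    return "".join(ch for ch in word if ch not in IGNORABLE)


def same_word(a, b):
    """Compare two words while ignoring the marks listed above."""
    return comparison_key(a) == comparison_key(b)


# A three-letter word (ini, now, cha), without and with a question mark
# placed after its vowel.
plain = "\u056B\u0576\u0579"
marked = "\u056B\u055E\u0576\u0579"
print(same_word(plain, marked))
```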

In reference to the combining punctuation, the following factors are important: (1) the combining punctuation mark follows the letter to which it applies (which can only be a vowel in Armenian), (2) a letter can be followed by more than one combining punctuation mark.

ME: You should give guidance here as to permissible combinations of combining punctuation.

2.3. Ligatures

A ligature is a traditional or convenient graphical presentation of a sequence of letters, e.g. the Latin ligature "fi", the German ligature "ss", the Armenian ligature "Hymen+Hynow", etc. Ligatures can be officially registered and codified (as in the UCS), but the systems supporting ligatures may? should? substitute them automatically only on the screen, printer, or other graphical devices.

The Armenian ligature armew, which is a combination of armyech and armvyun, was included in the AST 34.001 standard in view of the following considerations: (1) armew is a "ligature symbol" rather than a ligature, and (2) armew carries an "and" denotation similar to the "&" character.


3. Encoding

3.1. Basic Principles

ME: As far as I can tell, this entire section is unnecessary and its theoretical background has led your committee to create character set standards which diverge significantly from the principles used by ISO/IEC JTC1/SC2. I believe the problem may be terminology. "Character set" is not the same as "character repertoire". Character set usually means "coded character set", that is, a set of code positions to which the repertoire is mapped. A character repertoire is like a basket of letters and other signs used, in this case to write Armenian. This misunderstanding may be the principal problem with this section. By mapping unique Armenian punctuation characters to ASCII characters, the AST has not followed the basic principles of character identification and coding used by ISO/IEC JTC1/SC2.

The Coded Character Set is a mapping of a set of characters into a set of integer numbers, e.g. ArmSCII-7, ArmSCII-8 and ArmSCII-8A tables.

ME: Where is the UCS?

The term "unification" is used in the following denotation: as a rule, the mapping of an Armenian character set takes place in operating environments where other character sets are already available; thus, certain characters, in particular punctuation marks, may have identical graphical mapping and similar functions. In such cases, some characters of the Armenian character set may be mapped into already existing codified characters. The details of unification of Armenian punctuation marks are reviewed below.

ME: This "principle" has enabled you to replace ASCII characters (in the range x20-x7F) which is precisely the problem. You should not have made these unifications.

The mapping of characters in coding tables has several aspects (in order of priority): (1) scope of the character mapping, (2) sequence of mapping, (3) character unification requirements, (4) general requirements of a given operating environment.

ME: In my view, the priorities you give are mistaken with regard to implementation in the real world. (4) is the most important; (2) is the least.

The encoding in every new operating environment should, to the extent possible, use the already existing coding tables (see the next section). Should this be impossible, the newly created coding tables should follow as much as possible the following general principles:

ME: I suppose this is fine as far as it goes, except for the problem that your basic character set is flawed, so following it will just cause problems for people.

1. The Armenian character set should be comprehensive (with due regard to the unification)

2. The Armenian character set should be mapped into a continual sequence of codes in the order these are presented in the Table 1. The unified character codes should be left absolute, i.e. should not be used for other purposes. The most important is the letter sequence.

ME: The order of the characters is totally irrelevant. All ordering in the real world is, and will be, table-based, not coding based. There are no significant savings in speed or efficiency of ordering which should lead to the requirement stated here. Further, it causes problems for implementation on some platforms.
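
As an illustration of table-based ordering, the sketch below sorts by weights looked up in a table rather than by raw code values. The weights are built from the UCS Armenian block purely for convenience; the two-level scheme (letter first, case as a tie-breaker) and the names WEIGHTS and sort_key are assumptions, not something the guide or SC2 prescribes.

```python
# Weights for the 38 letters: capitals U+0531-U+0556, small letters
# U+0561-U+0586. Both cases of a letter share the same primary weight.
WEIGHTS = {}
for index in range(0x26):
    WEIGHTS[chr(0x0531 + index)] = (index, 0)  # capital
    WEIGHTS[chr(0x0561 + index)] = (index, 1)  # small


def sort_key(word):
    """Table-based key: letter order first, case only as a tie-breaker."""
    unknown = (len(WEIGHTS), 0)  # anything not in the table sorts last
    primary = [WEIGHTS.get(ch, unknown)[0] for ch in word]
    secondary = [WEIGHTS.get(ch, unknown)[1] for ch in word]
    return (primary, secondary)


words = ["\u0562", "\u0531", "\u0561", "\u0532"]  # ben, Ayb, ayb, Ben
by_code = sorted(words)
by_table = sorted(words, key=sort_key)
print([f"{ord(w):04X}" for w in by_code])
print([f"{ord(w):04X}" for w in by_table])
```

Sorting by code value puts every capital before every small letter, whereas the table keeps the two cases of each letter together, whatever positions the coded character set happens to assign.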

3. The unification implies both graphical and functional identity of characters. For example, mapping of the parenthesis (armparenleft and armparenright) into the parenthesis existing in the ASCII is not an error.

ME: Yes, it is. There is no such thing as an Armenian parenthesis different from an ordinary parenthesis.

On the other hand, the similarity of the Armenian full stop (armfullstop) and the colon is purely graphical. The armdot and armsep bear functions different from the Latin dot and the grave accent character respectively. Another important factor of character unification is the use of the Latin alphabet and punctuation marks in formal languages. It should be borne in mind, for example, that a comma is often used as a separator in lists (e.g. in a keyword list in an HTML document header), and in order to avoid confusion, the armcomma character may be mapped into a Latin comma.

ME: This is where you have made a grave error. You don't want dynamic mapping, you want reliable mapping. You have in ArmSCII-8A replaced x2D HYPHEN-MINUS with "ARMENIAN" EM DASH, and replaced x5F LOW LINE with "ARMENIAN" EN DASH. You will surely run into interoperability problems with UCS implementations. One example of the problem would be in internet addresses and URLs: an example address like hovik_melikyan@physics-university.am will be corrupted by the mappings your Armenian standard has specified for ArmSCII-8A.
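
The corruption of the example address can be reproduced with a few lines. This is a sketch under stated assumptions: only the two ArmSCII-8A reassignments named above (x2D and x5F, with the UCS values from Table 2) are modelled, and decode_armscii_8a_sketch is a hypothetical helper, not an existing codec.

```python
# Two of the ArmSCII-8A reassignments from Table 2 (byte -> UCS code point).
ARMSCII_8A_OVERRIDES = {0x2D: 0x2014, 0x5F: 0x2013}


def decode_armscii_8a_sketch(data):
    """Decode bytes below x80, applying only the overrides listed above."""
    return "".join(chr(ARMSCII_8A_OVERRIDES.get(b, b)) for b in data)


address = b"hovik_melikyan@physics-university.am"
decoded = decode_armscii_8a_sketch(address)
print(decoded == address.decode("ascii"))
print([f"U+{ord(c):04X}" for c in decoded if ord(c) > 0x7F])
```

The decoded string is no longer a valid ASCII address: the low line has become an EN DASH and the hyphen-minus an EM DASH.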

4. It may often happen that the requirements of a given operating environment contradict the above principles. For example, the pseudo-graphical characters in DOS that were supported by video adapters (the "ninth pixel" factor) resulted in the creation of an alternative 8-bit coding table, ArmSCII-8A. Another example is the Macintosh OS, where codes like ellipsis, nbsp and soft hyphen are recognized and interpreted in a special way by numerous applications, which rendered meaningful use of the ArmSCII standard in this system impossible (the ArmSCII-8A table is used on the Macintosh).

ME: The ArmSCII-8A table is unsuitable for use on the Macintosh. Although you have said that characters like ELLIPSIS, NO-BREAK SPACE, and SOFT HYPHEN are recognized and interpreted in certain ways by Mac software, the following errors have been made.

  1. Mac doesn't have a SOFT HYPHEN. Applications interpret HYPHEN-MINUS with some control character as a SOFT HYPHEN.
  2. You have not given NO-BREAK SPACE in any of your coded character sets.
  3. ArmSCII-8A codes LEFT-POINTING DOUBLE ANGLE QUOTATION MARK at xAE, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK at xAF, and ELLIPSIS at xDE, but on the Macintosh these are found at xC7, xC8, and xC9 respectively. Double coding them is a grave error. Further, you have coded EN DASH and EM DASH at x5F and x2D respectively, while they are supposed to be at xD0 and xD1. See the note on this above. Also the SECTION SIGN is coded at x15 (!) when it is supposed to be at xA4 on the Mac.

    The WorldScript software I have released for Armenian on the Mac, which has been used by Armenian scholars in Israel and the Netherlands, is conformant with Apple Macintosh practice and with SC2's principles of character set encoding.

The ArmSCII coding table does not fully correspond to the above principles, and the Armenian block in the current version of Unicode (2.1) corresponds to neither (1), (2), nor (3).

3.2. Cross Reference of Coding Tables

Table 2. Cross reference

1 - Short name
2 - ArmSCII-7
3 - ArmSCII-8 (AST 34.002-97, Basic coding table)
4 - ArmSCII-8A (AST 34.002-97, Alternative coding table)
5 - ArmSCII-16U
6 - Unicode Version 2.1

1   2   3   4   5   6
armeternity   21   A1   DC   0521   -
armew   -   -   -   -   0587
armsection   22   A2   15   0522   00A7
armfullstop   23   A3   3A   0523   0589
armparenright   24   A4   29   0524   0029
armparenleft   25   A5   28   0525   002A
armquotright   26   A6   AF   0526   00BB
armquotleft   27   A7   AE   0527   00AB
armemdash   28   A8   2D   0528   2014
armdot   29   A9   2E   0529   002E
armsep   2A   AA   60   052A   055D
armcomma   2B   AB   2C   052B   002C
armendash   2C   AC   5F   052C   2013
armyentamna   2D   AD   DD   052D   058A
armellipsis   2E   AE   DE   052E   2026
armapostrophe   7E   FE   FE   057E   02BC
armexclam   2F   AF   7E   052F   055C
armaccent   30   B0   27   0530   055B
armquestion   31   B1   DF   0531   055E
Armayb   32   B2   80   0532   0531
armayb   33   B3   81   0533   0561
Armben   34   B4   82   0534   0532
armben   35   B5   83   0535   0562
Armgim   36   B6   84   0536   0533
armgim   37   B7   85   0537   0563
Armda   38   B8   86   0538   0534
armda   39   B9   87   0539   0564
Armyech   3A   BA   88   053A   0535
armyech   3B   BB   89   053B   0565
Armza   3C   BC   8A   053C   0536
armza   3D   BD   8B   053D   0566
Arme   3E   BE   8C   053E   0537
arme   3F   BF   8D   053F   0567
Armat   40   C0   8E   0540   0538
armat   41   C1   8F   0541   0568
Armto   42   C2   90   0542   0539
armto   43   C3   91   0543   0569
Armzhe   44   C4   92   0544   053A
armzhe   45   C5   93   0545   056A
Armini   46   C6   94   0546   053B
armini   47   C7   95   0547   056B
Armlyun   48   C8   96   0548   053C
armlyun   49   C9   97   0549   056C
Armkhe   4A   CA   98   054A   053D
armkhe   4B   CB   99   054B   056D
Armtsa   4C   CC   9A   054C   053E
armtsa   4D   CD   9B   054D   056E
Armken   4E   CE   9C   054E   053F
armken   4F   CF   9D   054F   056F
Armho   50   D0   9E   0550   0540
armho   51   D1   9F   0551   0570
Armdza   52   D2   A0   0552   0541
armdza   53   D3   A1   0553   0571
Armghat   54   D4   A2   0554   0542
armghat   55   D5   A3   0555   0572
Armtche   56   D6   A4   0556   0543
armtche   57   D7   A5   0557   0573
Armmen   58   D8   A6   0558   0544
armmen   59   D9   A7   0559   0574
Armhi   5A   DA   A8   055A   0545
armhi   5B   DB   A9   055B   0575
Armnu   5C   DC   AA   055C   0546
armnu   5D   DD   AB   055D   0576
Armsha   5E   DE   AC   055E   0547
armsha   5F   DF   AD   055F   0577
Armvo   60   E0   E0   0560   0548
armvo   61   E1   E1   0561   0578
Armcha   62   E2   E2   0562   0549
armcha   63   E3   E3   0563   0579
Armpe   64   E4   E4   0564   054A
armpe   65   E5   E5   0565   057A
Armje   66   E6   E6   0566   054B
armje   67   E7   E7   0567   057B
Armra   68   E8   E8   0568   054C
armra   69   E9   E9   0569   057C
Armse   6A   EA   EA   056A   054D
armse   6B   EB   EB   056B   057D
Armvev   6C   EC   EC   056C   054E
armvev   6D   ED   ED   056D   057E
Armtyun   6E   EE   EE   056E   054F
armtyun   6F   EF   EF   056F   057F
Armre   70   F0   F0   0570   0550
armre   71   F1   F1   0571   0580
Armtso   72   F2   F2   0572   0551
armtso   73   F3   F3   0573   0581
Armvyun   74   F4   F4   0574   0552
armvyun   75   F5   F5   0575   0582
Armpyur   76   F6   F6   0576   0553
armpyur   77   F7   F7   0577   0583
Armke   78   F8   F8   0578   0554
armke   79   F9   F9   0579   0584
Armo   7A   FA   FA   057A   0555
armo   7B   FB   FB   057B   0585
Armfe   7C   FC   FC   057C   0556
armfe   7D   FD   FD   057D   0586
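
For the letters, the ArmSCII-8 and Unicode columns of Table 2 are related by simple arithmetic, which a converter can exploit. The following sketch covers the 76 letter positions only (punctuation would need an explicit lookup table) and uses hypothetical function names.

```python
def armscii8_letter_to_ucs(byte):
    """Map an ArmSCII-8 letter byte (xB2-xFD) to its UCS code point."""
    if not 0xB2 <= byte <= 0xFD:
        raise ValueError(f"x{byte:02X} is not an ArmSCII-8 letter")
    # Even offsets are capitals, odd offsets the matching small letters.
    index, is_small = divmod(byte - 0xB2, 2)
    # Capitals start at U+0531, small letters at U+0561.
    return (0x0561 if is_small else 0x0531) + index


def ucs_to_armscii8_letter(code_point):
    """Inverse mapping for the 38 capital and 38 small letters."""
    if 0x0531 <= code_point <= 0x0556:
        return 0xB2 + 2 * (code_point - 0x0531)
    if 0x0561 <= code_point <= 0x0586:
        return 0xB3 + 2 * (code_point - 0x0561)
    raise ValueError(f"U+{code_point:04X} is not an Armenian letter")


print(f"{armscii8_letter_to_ucs(0xC4):04X}")  # Armzhe row of Table 2
```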


4. Character Set and Language Tags

4.1. Coded Character Set Tags

In the systems and protocols using mnemonic tags for coded character sets, the following tags should be used (name, official source, optional alias):

Name:   armscii-8
Source:   Armenian State Standard AST 34.002 Basic 8-bit coded character set
Alias:   AST_34.002
     
Name:   armscii-8a
Source:   Armenian State Standard AST 34.002 Alternative 8-bit coded character set
Alias:   AST_34.002-A

4.2. Language Tags

Dictionaries, spelling checkers and other linguistic systems, as well as operating environments distinguishing human languages and locale identification should take into consideration the existence of 4 mutually incomprehensible forms (dialects) of the Armenian language: Eastern Armenian, Western Armenian, Grabar Armenian and Middle Armenian. Table 3 presents two forms of suggested mnemonic tags: MIME-style (RFC-1766) and Windows-style 3-letter abbreviations.

Table 3. Language tags

Mime-style name   3-letter code   Full name
hy-eastern   AME   Armenian Eastern
hy-western   AMW   Armenian Western
hy-grabar   AMG   Armenian Grabar
hy-middle   AMM   Armenian Middle

5. Acknowledgements

This document is the result of long and intensive consultations and cooperation with the staff of the Standards Working Group of the Armenian Computer Center. Special thanks for most valuable inputs and comments go to (in alphabetical order):

Hovhannes Gizoghian
Tigran Haroutunian
Aram Hayrapetian
Ivan Lulukian
Vahram Mekhitarian
Rouben Taroumian-Hakobian
Hovhannes Zakarian

ME: Without prejudice to the hard work done by this committee, consultation with experts outside of Armenia would, in my opinion, have been a good idea.

6. Author's Address

Hovik Melikyan
Center of Humane Technologies "Armenian Computer"
Yerevan, Republic of Armenia
hovik@moon.yerphi.am


7. References

[AST 34.001-97]

Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997

[AST 34.002-97]

Information Technologies -- Character Set And Information Encoding: 8-bit Coded Character Sets -- State Standardization Committee of the Republic of Armenia, July 1997

[ArmSCII]

Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991

[RFC-1766]

Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

[Unicode]

The Unicode Consortium, "The Unicode Standard -- Version 2.0", Addison-Wesley, 1996.

[Unicode Version 2.1]

Unicode Technical Report #8, The Unicode Standard, Version 2.1 -- http://www.unicode.org/unicode/reports/tr8.html.