ARMENIAN CHARACTER SETS

1998-10-31

Dear colleagues,

I have reviewed your character sets implementation document. There are many features in the document which I found confusing and unclear. Some of these had to do with the use of the English language, which, with some thought, I was able to puzzle out. I have presented my comments in HTML format here to include corrections to the English of the document, since HTML is a convenient way of showing ~~things that should be deleted~~ and things which should be added.

My greatest concern is your committee's understanding and use of the concept "unification". Apart from this, the whole section on basic principles is problematic, especially because the document itself appears to indicate that none of the character sets standardized in Armenia follow these basic principles. Frankly, the discussion of principles is confusing, theoretical, and unnecessary. It also seems to differ from the principles used by ISO/IEC JTC1/SC2. There is a difference between the theory of character sets and the practice of implementing them in the real world. As a member of JTC1/SC2, the committee responsible for character set encoding, I am troubled by the number of instances in which Armenian 8-bit code tables, standardized in Armenia, violate SC2's basic principles of character identification (what a given character is) and code table implementation (where a given character goes).

It seems clear that all of the Armenian code tables will cause problems with conversion to UCS (Unicode and ISO/IEC 10646) implementations. For this reason I oppose the acceptance of the Implementation Guide document as an RFC.

A number of characters described as "Armenian" are not Armenian at all. Your armsection, armparenright, armparenleft, armquotright, armquotleft, armemdash, armdot, armcomma, armendash, and armellipsis are all international characters, and it is a grave mistake to describe them otherwise. There is no difference between "Armenian digit five" and "Armenian right parenthesis". Both of these characters are used in Latin, Greek, Cyrillic, Armenian, Georgian, Chinese, Cherokee, Canadian Syllabics, etc., etc., etc. Armenian standards describing the characters listed above as Armenian are in error.
You can use whatever names you want in your Armenian national standards, of course. However, in the English language, the standard names found in ISO/IEC 10646 should always be used, whether or not you like them. Doing otherwise causes confusion to implementors. In some instances, of course, it could be useful to have a pronunciation variant listed, just as ISO/IEC 10646 does with Tibetan. I have edited the table below to present Armenian character names correctly in English (using UCS names), adding pronunciation notes based on what is in your document.
The short alias names should be abandoned unless there is a good reason to use them. What are they for? Are they intended to be entity names in HTML and SGML? In that case, I suggest that you make them more useful with regard to the case of Armenian letters, by naming them Hyayb/HyAyb or Hyeayb/HyeAyb. Note I am using Hy and Hye here, which are the two-letter and three-letter codes proposed to identify the Armenian script (not language) in ISO NP 15924. You should also identify them as HTML identifiers and register them with the relevant agency. For all I know, Armenian entities have already been defined in SGML. If that is the case, the ones already standardized should be used.
In any 8-bit environment, the left half of the code table shall (must!) be no more and no less than identical to ASCII, identical in content with Table 1 of ISO/IEC 10646. Despite this, ArmSCII-8A identifies eleven characters in the range x00-x7F as uniquely Armenian. If we allow for those characters which aren't really Armenian, but are international characters used in Armenian, then you have still replaced the following characters:
1. NAK (this is a control character!) with SECTION
2. APOSTROPHE with ARMENIAN EMPHASIS MARK
3. HYPHEN-MINUS with EM DASH
4. COLON with ARMENIAN FULL STOP
5. LOW LINE with EN DASH
6. GRAVE ACCENT with ARMENIAN COMMA
7. TILDE with ARMENIAN EXCLAMATION MARK
This practice is a mistaken one, and it is dangerous. Unique mappings between ArmSCII-8A and the UCS will cause you problems, I am sure. This is the reason SC2 resolved never to change ASCII -- to ensure interoperability. The ArmSCII-8A standard should be immediately withdrawn because it is defective in the context of reliable international text interchange.
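To make the risk concrete, here is a minimal Python sketch. It assumes only the seven replacements listed above, with UCS values taken from the cross-reference table later in this document. The names `ARMSCII_8A_OVERRIDES` and `decode_low_half` are illustrative and do not belong to any standard or library.

```python
# Hypothetical illustration: bytes in the ASCII half of the table that
# ArmSCII-8A reinterprets, following the seven replacements listed above.
ARMSCII_8A_OVERRIDES = {
    0x15: "\u00A7",  # NAK          -> SECTION SIGN
    0x27: "\u055B",  # APOSTROPHE   -> ARMENIAN EMPHASIS MARK
    0x2D: "\u2014",  # HYPHEN-MINUS -> EM DASH
    0x3A: "\u0589",  # COLON        -> ARMENIAN FULL STOP
    0x5F: "\u2013",  # LOW LINE     -> EN DASH
    0x60: "\u055D",  # GRAVE ACCENT -> ARMENIAN COMMA
    0x7E: "\u055C",  # TILDE        -> ARMENIAN EXCLAMATION MARK
}


def decode_low_half(data: bytes) -> str:
    """Decode bytes below x80 the way a strict ArmSCII-8A converter would."""
    out = []
    for b in data:
        if b >= 0x80:
            raise ValueError("this sketch covers only the range x00-x7F")
        out.append(ARMSCII_8A_OVERRIDES.get(b, chr(b)))
    return "".join(out)


ascii_text = b"file_name-v1.txt"
converted = decode_low_half(ascii_text)
# The LOW LINE and HYPHEN-MINUS no longer survive the round trip.
print(converted == ascii_text.decode("ascii"))
```

Any plain ASCII file name, address, or program text containing one of these seven bytes comes out of such a conversion altered, which is the interoperability problem described above.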
Regarding ArmSCII-7 I have nothing to say because I don't really care about 7-bit implementations. In an interoperability environment you are doubtless switching with unaltered ASCII, so you are probably OK with ArmSCII-7.
Regarding ArmSCII-16U, it should be removed from the draft RFC.

ARMENIAN CHARACTER SETS
IMPLEMENTATION GUIDE

Document version 005.en.html
~~June 12, 1998~~ 1998-10-31

Abstract

~~The~~ This document presents the set of Armenian characters ~~that are~~ used in ~~the~~ information systems in accordance ~~to~~ with AST 34.001 and AST 34.002, standards of the State Standards Commission of the Republic of Armenia~~, as well as~~. It also provides information on the classification and sorting ~~thereof~~ of Armenian characters and recommendations for implementation of basic algorithms of text processing.

Table of Contents

1. Introduction

2. Basic Character Set

2.1. Naming
2.2. Classification and Sorting
2.3. Ligatures

3. Encoding

3.1. Basic principles
3.2. Cross Reference of Coding Tables

4. Character Set and Language Tags

4.1. Character Set Tags
4.2. Language Tags

5. Acknowledgements

6. Author's Address

7. References

1. Introduction

The publication of comments in reference to the standards
ME: This phrase doesn't make sense to me.
is due to the following considerations:

1. ~~The~~ Armenian character sets have been used in different computer systems ~~approx.~~ since at least 1982, ~~whereas the state~~ although a national standard was established only in 1997. This time lag resulted in the emergence of incompatible coding systems. ~~The~~ Some of the existing discrepancies are also due to ~~the existence of two different grammars of the Armenian language.~~ grammatical differences between the two major dialects of Armenian.

2. The emergence of internationalized operating systems and an important number of ~~multi-lingual~~ multilingual applications result in ~~situations~~ difficulties when ~~the~~ national language support is implemented by programmers ~~that~~ who are not familiar with ~~the given language.~~ Armenian.

The present memo is a recommendation rather than a binding standard.

The recommendations set forth herein are elaborated on the basis of the ~~state~~ national standards AST 34.001 (reg.no. 166-97) and AST 34.002 (reg.no. 167-97), as well as ~~ArmSCII standard.~~ the ArmSCII-7 standard.

2. Basic Character Set

2.1. Naming

The Armenian character set presented below follows the standard AST 34.001. The first column contains a glyph; the second contains the UCS code position; the third contains the character name; and the last contains an alias. ~~full naming of the characters, and the second column provides abbreviations thereof that can be used in the systems confined to the Latin character set.~~ The detailed classification of the characters follows in the points below.

In spite of the fact that the space, numbers and Latin script are also part of the Armenian character set, these were not included in the AST 34.001 standard since these are present in all systems.
ME: Note that the AST 34.001 standard is probably incorrect, if it includes "Armenian" parentheses, "Armenian" quotation marks, and so on. A proper Armenian standard should have, as an 8-bit standard, given unaltered ASCII on the left (x00-x7F) and Armenian characters on the right (x80-xFF).

Table 1. Basic Character Set

UCS UCS Name Alias
-- ARMENIAN ETERNITY SIGN Hyeternity
0587 ARMENIAN LIGATURE ECH YIWN (ew) Hyechyiwn
0589 ARMENIAN FULL STOP (verjaket) Hyfullstop
055D ARMENIAN COMMA (but) Hycomma
058A ARMENIAN HYPHEN (yentamna) Hyhyphen
055A ARMENIAN APOSTROPHE Hyapostrophe
055C ARMENIAN EXCLAMATION MARK (amanak) Hyexclam
055B ARMENIAN EMPHASIS MARK (shesht) Hyemphasis
055E ARMENIAN QUESTION MARK (paruyk) Hyquestion
0531 ARMENIAN CAPITAL LETTER AYB HyAyb
0561 ARMENIAN SMALL LETTER AYB Hyayb
0532 ARMENIAN CAPITAL LETTER BEN HyBen
0562 ARMENIAN SMALL LETTER BEN Hyben
0533 ARMENIAN CAPITAL LETTER GIM HyGim
0563 ARMENIAN SMALL LETTER GIM Hygim
0534 ARMENIAN CAPITAL LETTER DA HyDa
0564 ARMENIAN SMALL LETTER DA Hyda
0535 ARMENIAN CAPITAL LETTER ECH (yech) HyEch
0565 ARMENIAN SMALL LETTER ECH (yech) Hyech
0536 ARMENIAN CAPITAL LETTER ZA HyZa
0566 ARMENIAN SMALL LETTER ZA Hyza
0537 ARMENIAN CAPITAL LETTER EH (e) HyEh
0567 ARMENIAN SMALL LETTER EH (e) Hyeh
0538 ARMENIAN CAPITAL LETTER ET (at) HyEt
0568 ARMENIAN SMALL LETTER ET (at) Hyet
0539 ARMENIAN CAPITAL LETTER TO HyTo
0569 ARMENIAN SMALL LETTER TO Hyto
053A ARMENIAN CAPITAL LETTER ZHE HyZhe
056A ARMENIAN SMALL LETTER ZHE Hyzhe
053B ARMENIAN CAPITAL LETTER INI HyIni
056B ARMENIAN SMALL LETTER INI Hyini
053C ARMENIAN CAPITAL LETTER LIWN (lyun) HyLiwn
056C ARMENIAN SMALL LETTER LIWN (lyun) Hyliwn
053D ARMENIAN CAPITAL LETTER XEH (khe) HyXeh
056D ARMENIAN SMALL LETTER XEH (khe) Hyxeh
053E ARMENIAN CAPITAL LETTER CA (tsa) HyCa
056E ARMENIAN SMALL LETTER CA (tsa) Hyca
053F ARMENIAN CAPITAL LETTER KEN HyKen
056F ARMENIAN SMALL LETTER KEN Hyken
0540 ARMENIAN CAPITAL LETTER HO HyHo
0570 ARMENIAN SMALL LETTER HO Hyho
0541 ARMENIAN CAPITAL LETTER JA (dza) HyJa
0571 ARMENIAN SMALL LETTER JA (dza) Hyja
0542 ARMENIAN CAPITAL LETTER GHAD (ghat) HyGhad
0572 ARMENIAN SMALL LETTER GHAD (ghat) Hyghad
0543 ARMENIAN CAPITAL LETTER CHEH (tche) HyCheh
0573 ARMENIAN SMALL LETTER CHEH (tche) Hycheh
0544 ARMENIAN CAPITAL LETTER MEN HyMen
0574 ARMENIAN SMALL LETTER MEN Hymen
0545 ARMENIAN CAPITAL LETTER YI (hi) HyYi
0575 ARMENIAN SMALL LETTER YI (hi) Hyyi
0546 ARMENIAN CAPITAL LETTER NOW (nu) HyNow
0576 ARMENIAN SMALL LETTER NOW (nu) Hynow
0547 ARMENIAN CAPITAL LETTER SHA HySha
0577 ARMENIAN SMALL LETTER SHA Hysha
0548 ARMENIAN CAPITAL LETTER VO HyVo
0578 ARMENIAN SMALL LETTER VO Hyvo
0549 ARMENIAN CAPITAL LETTER CHA HyCha
0579 ARMENIAN SMALL LETTER CHA Hycha
054A ARMENIAN CAPITAL LETTER PEH (pe) HyPeh
057A ARMENIAN SMALL LETTER PEH (pe) Hypeh
054B ARMENIAN CAPITAL LETTER JHEH (je) HyJheh
057B ARMENIAN SMALL LETTER JHEH (je) Hyjheh
054C ARMENIAN CAPITAL LETTER RA HyRa
057C ARMENIAN SMALL LETTER RA Hyra
054D ARMENIAN CAPITAL LETTER SEH (se) HySeh
057D ARMENIAN SMALL LETTER SEH (se) Hyseh
054E ARMENIAN CAPITAL LETTER VEW (vev) HyVew
057E ARMENIAN SMALL LETTER VEW (vev) Hyvew
054F ARMENIAN CAPITAL LETTER TIWN (tyun) HyTiwn
057F ARMENIAN SMALL LETTER TIWN (tyun) Hytiwn
0550 ARMENIAN CAPITAL LETTER REH (re) HyReh
0580 ARMENIAN SMALL LETTER REH (re) Hyreh
0551 ARMENIAN CAPITAL LETTER CO (tso) HyCo
0581 ARMENIAN SMALL LETTER CO (tso) Hyco
0552 ARMENIAN CAPITAL LETTER YIWN (vyun) HyYiwn
0582 ARMENIAN SMALL LETTER YIWN (vyun) Hyyiwn
0553 ARMENIAN CAPITAL LETTER PIWR (pyur) HyPiwr
0583 ARMENIAN SMALL LETTER PIWR (pyur) Hypiwr
0554 ARMENIAN CAPITAL LETTER KEH (ke) HyKeh
0584 ARMENIAN SMALL LETTER KEH (ke) Hykeh
0555 ARMENIAN CAPITAL LETTER OH (o) HyOh
0585 ARMENIAN SMALL LETTER OH (o) Hyoh
0556 ARMENIAN CAPITAL LETTER FEH (fe) HyFeh
0586 ARMENIAN SMALL LETTER FEH (fe) Hyfeh

The characters are hereinafter referred to by the abbreviated forms given in the last column.

ME: In general I don't see the point of the abbreviated aliases. You could simply use the standard UCS names, omitting the words "ARMENIAN" and "CAPITAL" or "SMALL" and "LETTER".

2.2. Classification and Sorting

The basic character set can be divided into the following functional subsets:

unclassified-symbols ::= {Hyeternity, Hyechyiwn, section}

punctuation-signs ::= {Hyfullstop, parenright, parenleft, quotright, quotleft, emdash, middot, hyphen, comma, endash}

pseudo-letters ::= {Hyhyphen, ellipsis, Hyapostrophe}

ME: These are punctuation signs. They are not letters. The term "pseudo-letter" doesn't really mean anything to me. In the UCS there are "modifier letters", which are treated as letters but which look like punctuation. Is this what you mean? If so, you could use that term.

~~diacritic-signs~~ combining-punctuation ::= {Hyexclam, Hyemphasis, Hyquestion}

letters ::= {capital-letters, small-letters}

capital-letters ::= {HyAyb, HyBen, HyGim, HyDa, HyEch, HyZa, HyEh, HyEt, HyTo, HyZhe, HyIni, HyLiwn, HyXeh, HyCa, HyKen, HyHo, HyJa, HyGhad, HyCheh, HyMen, HyYi, HyNow, HySha, HyVo, HyCha, HyPeh, HyJheh, HyRa, HySeh, HyVew, HyTiwn, HyReh, HyCo, HyYiwn, HyPiwr, HyKeh, HyOh, HyFeh}

small-letters ::= {Hyayb, Hyben, Hygim, Hyda, Hyech, Hyza, Hyeh, Hyet, Hyto, Hyzhe, Hyini, Hyliwn, Hyxeh, Hyca, Hyken, Hyho, Hyja, Hyghad, Hycheh, Hymen, Hyyi, Hynow, Hysha, Hyvo, Hycha, Hypeh, Hyjheh, Hyra, Hyseh, Hyvew, Hytiwn, Hyreh, Hyco, Hyyiwn, Hypiwr, Hykeh, Hyoh, Hyfeh}

The sorting order is important for ~~letter~~ alphabetic characters only and ~~is made in~~ should follow the order presented in ~~the~~ Table 1.
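One way to obtain the Table 1 order without depending on where the letters sit in any code table is a weight table. The following Python sketch assumes UCS text and builds the weights from the Table 1 sequence (capital, small, capital, small, ...). The names `TABLE1_LETTERS`, `WEIGHT`, and `sort_key` are illustrative only.

```python
# Letters in Table 1 order: capital AYB, small AYB, capital BEN, small BEN, ...
# UCS capitals run U+0531..U+0556 and small letters U+0561..U+0586.
TABLE1_LETTERS = [
    chr(base + k) for k in range(38) for base in (0x0531, 0x0561)
]

# Weight of each letter is simply its position in Table 1.
WEIGHT = {ch: i for i, ch in enumerate(TABLE1_LETTERS)}


def sort_key(word: str) -> list:
    """Map a word to a list of Table 1 weights; other characters sort last."""
    return [WEIGHT.get(ch, len(WEIGHT) + ord(ch)) for ch in word]


words = ["\u0532\u0561", "\u0561\u0562"]  # capital BEN + ayb, small ayb + ben
by_table = sorted(words, key=sort_key)    # small ayb sorts before capital BEN
by_code = sorted(words)                   # UCS code order puts capital BEN first
print(by_table != by_code)
```

Because the order lives in the weight table, the same `sort_key` works whatever coded character set the text was converted from.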

~~The case shift~~ Capitalization applies ~~for letter~~ to alphabetic characters only. The shift from ~~the~~ upper case to ~~the~~ lower case replaces the capital letter character with the ~~subsequent~~ following character as per ~~the~~ Table 1. Accordingly, the shift from lower case to the upper case replaces the small letter character with the preceding character as per the Table 1.
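As a minimal sketch of this rule, the Python below applies it to ArmSCII-8 bytes, where (per the cross-reference table in section 3.2) capital letters occupy the even codes xB2..xFC and each small letter is the next code. For comparison it also shows the UCS, where the small letter lies x30 positions after the capital. The function names are illustrative assumptions.

```python
def armscii8_to_lower(b: int) -> int:
    """Capital letters sit at even codes xB2..xFC; the small letter follows."""
    if 0xB2 <= b <= 0xFC and b % 2 == 0:
        return b + 1
    return b


def armscii8_to_upper(b: int) -> int:
    """Small letters sit at odd codes xB3..xFD; the capital precedes."""
    if 0xB3 <= b <= 0xFD and b % 2 == 1:
        return b - 1
    return b


def ucs_to_lower(ch: str) -> str:
    """In the UCS the small letter is x30 positions after the capital."""
    cp = ord(ch)
    if 0x0531 <= cp <= 0x0556:
        return chr(cp + 0x30)
    return ch


print(hex(armscii8_to_lower(0xB2)))                  # capital AYB -> small AYB
print(ucs_to_lower("\u0531") == "\u0531".lower())    # agrees with str.lower()
```

Punctuation codes such as xB0, xB1, and xFE fall outside both ranges and are returned unchanged.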

~~The text~~ Text search and dictionary applications should take into account the following factors: (1) in the Armenian language, a word is a sequence of ~~letter characters diacritic-signs, and pseudo-letters~~ letters, combining punctuation, and modifier letters; (2) in comparison of words in the text or dictionary, the ~~diacritic-signs and pseudo-letters~~ combining punctuation and modifier letters may be ignored.
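Factor (2) can be sketched in Python as follows, assuming UCS text. The set of ignorable characters is taken from the subsets defined above (Hyexclam, Hyemphasis, Hyquestion, Hyhyphen, ellipsis, Hyapostrophe) using the UCS values in Table 1. `IGNORABLE` and `search_key` are hypothetical names, and whether comparison should also fold case is left open here.

```python
# Characters that may be ignored when comparing words, per factor (2).
IGNORABLE = {
    "\u055C",  # ARMENIAN EXCLAMATION MARK (Hyexclam)
    "\u055B",  # ARMENIAN EMPHASIS MARK (Hyemphasis)
    "\u055E",  # ARMENIAN QUESTION MARK (Hyquestion)
    "\u058A",  # ARMENIAN HYPHEN (Hyhyphen)
    "\u2026",  # HORIZONTAL ELLIPSIS (ellipsis)
    "\u055A",  # ARMENIAN APOSTROPHE (Hyapostrophe)
}


def search_key(word: str) -> str:
    """Strip ignorable marks so marked and unmarked spellings compare equal."""
    return "".join(ch for ch in word if ch not in IGNORABLE)


marked = "\u056B\u055E\u0576\u0579"  # ini + question mark + now + cha
plain = "\u056B\u0576\u0579"         # the same word without the mark
print(search_key(marked) == search_key(plain))
```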

In reference to the ~~diacritic-signs~~ combining punctuation, the following factors are important: (1) the ~~diacritic-sign refers to the preceding letter~~ combining punctuation mark follows the letter to which it applies (~~only~~ which can only be a vowel in Armenian), (2) a letter can be followed by more than one diacritic sign.

ME: You should give guidance here as to permissible combinations of combining punctuation.

2.3. Ligatures

~~Ligature~~ A ligature is a traditional or ~~convenience~~ convenient graphical presentation of a sequence of letters, e.g. the Latin ligature "fi", the German ligature "ss", the Armenian ligature "Hymen+Hynow", etc. The ligatures can be officially registered and codified (~~like in UNICODE standard~~ as in the UCS), but the systems supporting ligatures may? should? substitute them automatically only on the screen, printer, or other graphical devices.

The Armenian ligature armew that is a combination of armyech and armvyun was included in the AST 34.001 standard in view of the following considerations: (1) armew is a "ligature symbol" rather than a ligature, and (2) armew carries an "and" denotation similar to the "&" character.

3. Encoding

3.1. Basic Principles

ME: As far as I can tell, this entire section is unnecessary and its theoretical background has led your committee to create character set standards which diverge significantly from the principles used by ISO/IEC JTC1/SC2. I believe the problem may be terminology. "Character set" is not the same as "character repertoire". Character set usually means "coded character set", that is, a set of code positions to which the repertoire is mapped. A character repertoire is like a basket of letters and other signs used, in this case, to write Armenian. This misunderstanding may be the principal problem with this section. By mapping unique Armenian punctuation characters to ASCII characters, the AST has not followed the basic principles of character identification and coding used by ISO/IEC JTC1/SC2.

The Coded Character Set is a mapping of a set of characters into a set of integer numbers, e.g. ArmSCII-7, ArmSCII-8 and ArmSCII-8A tables.

ME: Where is the UCS?

The term "unification" is used in the following denotation: as a rule, the mapping of an Armenian character set takes place in operating environments where other character sets are already available; thus, certain characters, in particular punctuation marks, may have identical graphical mapping and similar functions. In such cases, some characters of the Armenian character set may be mapped into already existing codified characters. The details of unification of Armenian punctuation marks are reviewed below.

ME: This "principle" has enabled you to replace ASCII characters (in the range x20-x7F) which is precisely the problem. You should not have made these unifications.

The mapping of characters in coding tables has several aspects (in order of priority): (1) scope of the character mapping, (2) sequence of mapping, (3) character unification requirements, (4) general requirements of a given operating environment.

ME: In my view, the priorities you give are mistaken with regard to implementation in the real world. (4) is the most important; (2) is the least.

The encoding in every new operating environment should, to the extent possible, use the already existing coding tables (see the next section). Should this be impossible, the newly created coding tables should follow as much as possible the following general principles:

ME: I suppose this is fine as far as it goes, except for the problem that your basic character set is flawed, so following it will just cause problems for people.

1. The Armenian character set should be comprehensive (with due regard to the unification)

2. The Armenian character set should be mapped into a continual sequence of codes in the order these are presented in the Table 1. The unified character codes should be left absolute, i.e. should not be used for other purposes. The most important is the letter sequence.

ME: The order of the characters is totally irrelevant. All ordering in the real world is, and will be, table-based, not coding based. There are no significant savings in speed or efficiency of ordering which should lead to the requirement stated here. Further, it causes problems for implementation on some platforms.

3. The unification implies both graphical and functional identity of characters. For example, mapping of the parenthesis (armparenleft and armparenright) into the parenthesis existing in the ASCII is not an error.

ME: Yes, it is. There is no such thing as an Armenian parenthesis different from an ordinary parenthesis.

On the other hand, the similarity of the Armenian full stop (armfullstop) and the colon is purely graphical. The armdot and armsep bear functions different from the Latin dot and the grave accent character respectively. Another important factor of character unification is the use of the Latin alphabet and punctuation marks in formal languages. It should be borne in mind, for example, that a comma is often used as a separator in lists (e.g. in a keyword list in HTML document header), and in order to avoid confusion, the armcomma character may be mapped into a Latin comma.

ME: This is where you have made a grave error. You don't want dynamic mapping, you want reliable mapping. You have in ArmSCII-8A replaced x2D HYPHEN-MINUS with "ARMENIAN" EM DASH, and replaced x5F LOW LINE with "ARMENIAN" EN DASH. You will surely run into interoperability problems with UCS implementations. One example of the problem would be in internet addresses and URLs: an example address like hovik_melikyan@physics-university.am will be corrupted by the mappings your Armenian standard has specified for ArmSCII-8A.

4. It may often happen that the requirements of a given operating environment may contradict the above principles. For example, the pseudo-graphical characters in DOS that were supported by video-adapters ("ninth pixel" factor), resulted in the creation of an alternative 8-bit coding table ArmSCII-8A. Another example is Macintosh OS where codes like ellipsis, nbsp and soft hyphen are recognized and interpreted in a special way by numerous applications, which rendered the meaningful use of the ArmSCII standard in this system impossible (the ArmSCII-8A table is used in OS Macintosh).

ME: The ArmSCII-8A table is unsuitable for use on the Macintosh. Although you have said that characters like ELLIPSIS, NO-BREAK SPACE, and SOFT HYPHEN are recognized and interpreted in certain ways by Mac software, the following errors have been made.

Mac doesn't have a SOFT HYPHEN. Applications interpret HYPHEN-MINUS with some control character as a SOFT HYPHEN.
You have not given NO-BREAK SPACE in any of your coded character sets.
ArmSCII-8A codes « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK at xAE, » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK at xAF, and ELLIPSIS at xDE, but on the Macintosh these are found at xC7, xC8, and xC9 respectively. Double coding them is a grave error. Further, you have coded EM DASH and EN DASH at x2D and x5F respectively, while they are supposed to be at xD1 and xD0. See the note on this above. Also the SECTION SIGN is coded at x15 (!) when it is supposed to be at xA4 on the Mac.
The WorldScript software I have released for Armenian on the Mac, which has been used by Armenian scholars in Israel and the Netherlands, is conformant with Apple Macintosh practice and with SC2's principles of character set encoding.

ArmSCII coding table does not fully correspond to the above principles, and the Armenian block in the current version of ~~UNICODE~~ Unicode (2.1) corresponds to neither (1), (2), nor (3).

3.2. Cross Reference of Coding Tables

Table 2. Cross reference

1 - Short name
2 - ArmSCII-7
3 - ArmSCII-8 (AST 34.002-97, Basic coding table)
4 - ArmSCII-8A (AST 34.002-97, Alternative coding table)
5 - ArmSCII-16U
6 - ~~UNICODE~~ Unicode Version 2.1

1 2 3 4 5 6
armeternity 21 A1 DC 0521 -
armew - - - - 0587
armsection 22 A2 15 0522 00A7
armfullstop 23 A3 3A 0523 0589
armparenright 24 A4 29 0524 0029
armparenleft 25 A5 28 0525 0028
armquotright 26 A6 AF 0526 00BB
armquotleft 27 A7 AE 0527 00AB
armemdash 28 A8 2D 0528 2014
armdot 29 A9 2E 0529 002E
armsep 2A AA 60 052A 055D
armcomma 2B AB 2C 052B 002C
armendash 2C AC 5F 052C 2013
armyentamna 2D AD DD 052D 058A
armellipsis 2E AE DE 052E 2026
armapostrophe 7E FE FE 057E 02BC
armexclam 2F AF 7E 052F 055C
armaccent 30 B0 27 0530 055B
armquestion 31 B1 DF 0531 055E
Armayb 32 B2 80 0532 0531
armayb 33 B3 81 0533 0561
Armben 34 B4 82 0534 0532
armben 35 B5 83 0535 0562
Armgim 36 B6 84 0536 0533
armgim 37 B7 85 0537 0563
Armda 38 B8 86 0538 0534
armda 39 B9 87 0539 0564
Armyech 3A BA 88 053A 0535
armyech 3B BB 89 053B 0565
Armza 3C BC 8A 053C 0536
armza 3D BD 8B 053D 0566
Arme 3E BE 8C 053E 0537
arme 3F BF 8D 053F 0567
Armat 40 C0 8E 0540 0538
armat 41 C1 8F 0541 0568
Armto 42 C2 90 0542 0539
armto 43 C3 91 0543 0569
Armzhe 44 C4 92 0544 053A
armzhe 45 C5 93 0545 056A
Armini 46 C6 94 0546 053B
armini 47 C7 95 0547 056B
Armlyun 48 C8 96 0548 053C
armlyun 49 C9 97 0549 056C
Armkhe 4A CA 98 054A 053D
armkhe 4B CB 99 054B 056D
Armtsa 4C CC 9A 054C 053E
armtsa 4D CD 9B 054D 056E
Armken 4E CE 9C 054E 053F
armken 4F CF 9D 054F 056F
Armho 50 D0 9E 0550 0540
armho 51 D1 9F 0551 0570
Armdza 52 D2 A0 0552 0541
armdza 53 D3 A1 0553 0571
Armghat 54 D4 A2 0554 0542
armghat 55 D5 A3 0555 0572
Armtche 56 D6 A4 0556 0543
armtche 57 D7 A5 0557 0573
Armmen 58 D8 A6 0558 0544
armmen 59 D9 A7 0559 0574
Armhi 5A DA A8 055A 0545
armhi 5B DB A9 055B 0575
Armnu 5C DC AA 055C 0546
armnu 5D DD AB 055D 0576
Armsha 5E DE AC 055E 0547
armsha 5F DF AD 055F 0577
Armvo 60 E0 E0 0560 0548
armvo 61 E1 E1 0561 0578
Armcha 62 E2 E2 0562 0549
armcha 63 E3 E3 0563 0579
Armpe 64 E4 E4 0564 054A
armpe 65 E5 E5 0565 057A
Armje 66 E6 E6 0566 054B
armje 67 E7 E7 0567 057B
Armra 68 E8 E8 0568 054C
armra 69 E9 E9 0569 057C
Armse 6A EA EA 056A 054D
armse 6B EB EB 056B 057D
Armvev 6C EC EC 056C 054E
armvev 6D ED ED 056D 057E
Armtyun 6E EE EE 056E 054F
armtyun 6F EF EF 056F 057F
Armre 70 F0 F0 0570 0550
armre 71 F1 F1 0571 0580
Armtso 72 F2 F2 0572 0551
armtso 73 F3 F3 0573 0581
Armvyun 74 F4 F4 0574 0552
armvyun 75 F5 F5 0575 0582
Armpyur 76 F6 F6 0576 0553
armpyur 77 F7 F7 0577 0583
Armke 78 F8 F8 0578 0554
armke 79 F9 F9 0579 0584
Armo 7A FA FA 057A 0555
armo 7B FB FB 057B 0585
Armfe 7C FC FC 057C 0556
armfe 7D FD FD 057D 0586
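The regular layout of columns 3 and 6 makes a conversion from ArmSCII-8 to the UCS easy to sketch. The Python below is an illustration built only from Table 2, not a reference implementation. The names `ARMSCII8_TO_UCS` and `decode_armscii8` are hypothetical, and mapping codes that have no UCS equivalent in the table (such as the eternity sign at xA1) to U+FFFD REPLACEMENT CHARACTER is an assumption made for this sketch.

```python
# Punctuation and symbols: ArmSCII-8 code (column 3) -> UCS (column 6).
ARMSCII8_TO_UCS = {
    0xA2: 0x00A7,  # armsection
    0xA3: 0x0589,  # armfullstop
    0xA4: 0x0029,  # armparenright
    0xA5: 0x0028,  # armparenleft (LEFT PARENTHESIS is U+0028 in the UCS)
    0xA6: 0x00BB,  # armquotright
    0xA7: 0x00AB,  # armquotleft
    0xA8: 0x2014,  # armemdash
    0xA9: 0x002E,  # armdot
    0xAA: 0x055D,  # armsep
    0xAB: 0x002C,  # armcomma
    0xAC: 0x2013,  # armendash
    0xAD: 0x058A,  # armyentamna
    0xAE: 0x2026,  # armellipsis
    0xAF: 0x055C,  # armexclam
    0xB0: 0x055B,  # armaccent
    0xB1: 0x055E,  # armquestion
    0xFE: 0x02BC,  # armapostrophe
}

# Letters: 38 capital/small pairs, interleaved in ArmSCII-8 from xB2 upward.
for k in range(38):
    ARMSCII8_TO_UCS[0xB2 + 2 * k] = 0x0531 + k  # capital letter
    ARMSCII8_TO_UCS[0xB3 + 2 * k] = 0x0561 + k  # small letter


def decode_armscii8(data: bytes) -> str:
    """Decode ArmSCII-8 bytes; x00-x7F is taken to be unaltered ASCII."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            # Assumption: codes with no UCS equivalent become U+FFFD.
            out.append(chr(ARMSCII8_TO_UCS.get(b, 0xFFFD)))
    return "".join(out)


print(decode_armscii8(bytes([0xB2, 0xB3, 0xA3])) == "\u0531\u0561\u0589")
```

Because ArmSCII-8 leaves the range x00-x7F alone, ASCII text passes through this decoder unchanged, which is not the case for ArmSCII-8A.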

4. Character Set and Language Tags

4.1. Coded Character Set Tags

In the systems and protocols using mnemonic tags for coded character sets, the following tags should be used (name, official source, optional alias):

Name:		armscii-8
Source:		Armenian State Standard AST 34.002 Basic 8-bit coded character set
Alias:		AST_34.002

Name:		armscii-8a
Source:		Armenian State Standard AST 34.002 Alternative 8-bit coded character set
Alias:		AST_34.002-A

4.2. Language Tags

Dictionaries, spelling checkers and other linguistic systems, as well as operating environments distinguishing human languages and locale identification should take into consideration the existence of 4 mutually incomprehensible forms (dialects) of the Armenian language: Eastern Armenian, Western Armenian, Grabar Armenian and Middle Armenian. Table 3 presents two forms of suggested mnemonic tags: MIME-style (RFC-1766) and Windows-style 3-letter abbreviations.

Table 3. Language tags

Mime-style name	3-letter code	Full name
hy-eastern	AME	Armenian Eastern
hy-western	AMW	Armenian Western
hy-grabar	AMG	Armenian Grabar
hy-middle	AMM	Armenian Middle

5. Acknowledgements

This document is the result of long and intensive consultations and cooperation with the staff of the Standards Working Group of the Armenian Computer Center. Special thanks for most valuable inputs and comments go to (in alphabetical order):

Hovhannes Gizoghian
Tigran Haroutunian
Aram Hayrapetian
Ivan Lulukian
Vahram Mekhitarian
Rouben Taroumian-Hakobian
Hovhannes Zakarian

ME: Without prejudice to the hard work done by this committee, consultation with experts outside of Armenia would, in my opinion, have been a good idea.

6. Author's Address

Hovik Melikyan
Center of Humane Technologies "Armenian Computer"
Yerevan, Republic of Armenia
hovik@moon.yerphi.am

7. References

[AST 34.001-97]

Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997

[AST 34.002-97]

Information Technologies -- Character Set And Information Encoding: 8-bit Coded Character Sets -- State Standardization Committee of the Republic of Armenia, July 1997

[ArmSCII]

Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991

[RFC-1766]

Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

[~~UNICODE~~ Unicode]

The Unicode Consortium, "The Unicode Standard -- Version 2.0", Addison-Wesley, 1996.

[~~UNICODE~~ Unicode Version 2.1]

Unicode Technical Report #8, The Unicode Standard, Version 2.1 -- http://www.unicode.org/unicode/reports/tr8.html.
