1998-10-31
Dear colleagues,
I have reviewed your character sets implementation document. I found many features of the document confusing and unclear. Some of these had to do with the use of the English language; with some thought, I was able to puzzle these out. I have presented my comments in HTML format here, including corrections to the English of the document, since HTML is a convenient way of showing things that should be deleted and things that should be added.
My greatest concern is your committee's understanding and use of the concept "unification". Apart from this, the whole section on basic principles is problematic, especially because the document itself appears to indicate that none of the character sets standardized in Armenia follow these basic principles. Frankly, the discussion on principles is confusing, theoretical, and unnecessary. It also seems to differ from the principles used by ISO/IEC JTC1/SC2. There is a difference between the theory of character sets and the practice of implementing them in the real world. As a member of JTC1/SC2, the committee responsible for character set encoding, I am troubled by the number of instances in which Armenian 8-bit code tables standardized in Armenia violate SC2's basic principles of character identification (what a given character is) and code table implementation (where a given character goes).
It seems clear that all of the Armenian code tables will cause problems with conversion to UCS (Unicode and ISO/IEC 10646) implementations. For this reason I oppose the acceptance of the Implementation Guide document as an RFC.
The practice, followed in ArmSCII-8A, of replacing ASCII characters with Armenian ones is mistaken, and it is dangerous. The unique mappings between ArmSCII-8A and the UCS will cause you problems, I am sure. This is the reason SC2 resolved never to change ASCII: to ensure interoperability. The ArmSCII-8A standard should be withdrawn immediately because it is defective in the context of reliable international text interchange.
Regarding ArmSCII-7, I have nothing to say, because I don't really care about 7-bit implementations. In an interoperability environment you are doubtless switching to and from unaltered ASCII, so you are probably OK with ArmSCII-7.
Regarding ArmSCII-16U, it should be removed from the draft RFC.
ARMENIAN CHARACTER SETS
IMPLEMENTATION GUIDE
Document version 005.en.html
June 12, 19981998-10-31
Abstract
This document presents the set of Armenian characters used in information systems in accordance with AST 34.001 and AST 34.002, standards of the State Standards Commission of the Republic of Armenia. It also provides information on the classification and sorting of Armenian characters and recommendations for implementation of basic algorithms of text processing.
Table of Contents
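1. Introduction
2. Basic Character Set
   2.1. Naming
   2.2. Classification and Sorting
   2.3. Ligatures
3. Encoding
   3.1. Basic Principles
   3.2. Cross Reference of Coding Tables
4. Character Set and Language Tags
   4.1. Coded Character Set Tags
   4.2. Language Tags
5. Acknowledgements
6. Author's Address
7. References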
1. Introduction
The publication of comments in reference to the standards [ME: This phrase doesn't make sense to me.] is due to the following considerations:
1. Armenian character sets have been used in different computer systems since at least 1982, although a national standard was established only in 1997. This time lag resulted in the emergence of incompatible coding systems. Some of the existing discrepancies are also due to grammatical differences between the two major dialects of Armenian.
2. The emergence of internationalized operating systems and a significant number of multilingual applications results in difficulties when national language support is implemented by programmers who are not familiar with Armenian.
The present memo is a recommendation rather than a binding standard.
The recommendations set forth herein are based on the national standards AST 34.001 (reg. no. 166-97) and AST 34.002 (reg. no. 167-97), as well as the ArmSCII-7 standard.
2. Basic Character Set
2.1. Naming
The Armenian character set presented below follows the standard AST 34.001. For each character, Table 1 gives the UCS code position, the UCS character name, and an alias that can be used in systems confined to the Latin character set. The detailed classification of the characters follows in the sections below.
Although the space, digits, and Latin script are also part of the Armenian character set, they were not included in the AST 34.001 standard because they are present in all systems.
ME: Note that the AST 34.001 standard is probably incorrect, if it includes "Armenian" parentheses, "Armenian" quotation marks, and so on. A proper Armenian standard should have, as an 8-bit standard, given unaltered ASCII on the left (x00-x7F) and Armenian characters on the right (x80-xFF).
Table 1. Basic Character Set
| UCS | UCS Name | Alias | |
| -- | ARMENIAN ETERNITY SIGN | Hyeternity | |
| 0587 | ARMENIAN LIGATURE ECH YIWN (ew) | Hyechyiwn | |
| 0589 | ARMENIAN FULL STOP (verjaket) | Hyfullstop | |
| 055D | ARMENIAN COMMA (but) | Hycomma | |
| 058A | ARMENIAN HYPHEN (yentamna) | Hyhyphen | |
| 055A | ARMENIAN APOSTROPHE | Hyapostrophe | |
| 055C | ARMENIAN EXCLAMATION MARK (amanak) | Hyexclam | |
| 055B | ARMENIAN EMPHASIS MARK (shesht) | Hyemphasis | |
| 055E | ARMENIAN QUESTION MARK (paruyk) | Hyquestion | |
| 0531 | ARMENIAN CAPITAL LETTER AYB | HyAyb | |
| 0561 | ARMENIAN SMALL LETTER AYB | Hyayb | |
| 0532 | ARMENIAN CAPITAL LETTER BEN | HyBen | |
| 0562 | ARMENIAN SMALL LETTER BEN | Hyben | |
| 0533 | ARMENIAN CAPITAL LETTER GIM | HyGim | |
| 0563 | ARMENIAN SMALL LETTER GIM | Hygim | |
| 0534 | ARMENIAN CAPITAL LETTER DA | HyDa | |
| 0564 | ARMENIAN SMALL LETTER DA | Hyda | |
| 0535 | ARMENIAN CAPITAL LETTER ECH (yech) | HyEch | |
| 0565 | ARMENIAN SMALL LETTER ECH (yech) | Hyech | |
| 0536 | ARMENIAN CAPITAL LETTER ZA | HyZa | |
| 0566 | ARMENIAN SMALL LETTER ZA | Hyza | |
| 0537 | ARMENIAN CAPITAL LETTER EH (e) | HyEh | |
| 0567 | ARMENIAN SMALL LETTER EH (e) | Hyeh | |
| 0538 | ARMENIAN CAPITAL LETTER ET (at) | HyEt | |
| 0568 | ARMENIAN SMALL LETTER ET (at) | Hyet | |
| 0539 | ARMENIAN CAPITAL LETTER TO | HyTo | |
| 0569 | ARMENIAN SMALL LETTER TO | Hyto | |
| 053A | ARMENIAN CAPITAL LETTER ZHE | HyZhe | |
| 056A | ARMENIAN SMALL LETTER ZHE | Hyzhe | |
| 053B | ARMENIAN CAPITAL LETTER INI | HyIni | |
| 056B | ARMENIAN SMALL LETTER INI | Hyini | |
| 053C | ARMENIAN CAPITAL LETTER LIWN (lyun) | HyLiwn | |
| 056C | ARMENIAN SMALL LETTER LIWN (lyun) | Hyliwn | |
| 053D | ARMENIAN CAPITAL LETTER XEH (khe) | HyXeh | |
| 056D | ARMENIAN SMALL LETTER XEH (khe) | Hyxeh | |
| 053E | ARMENIAN CAPITAL LETTER CA (tsa) | HyCa | |
| 056E | ARMENIAN SMALL LETTER CA (tsa) | Hyca | |
| 053F | ARMENIAN CAPITAL LETTER KEN | HyKen | |
| 056F | ARMENIAN SMALL LETTER KEN | Hyken | |
| 0540 | ARMENIAN CAPITAL LETTER HO | HyHo | |
| 0570 | ARMENIAN SMALL LETTER HO | Hyho | |
| 0541 | ARMENIAN CAPITAL LETTER JA (dza) | HyJa | |
| 0571 | ARMENIAN SMALL LETTER JA (dza) | Hyja | |
| 0542 | ARMENIAN CAPITAL LETTER GHAD (ghat) | HyGhad | |
| 0572 | ARMENIAN SMALL LETTER GHAD (ghat) | Hyghad | |
| 0543 | ARMENIAN CAPITAL LETTER CHEH (tche) | HyCheh | |
| 0573 | ARMENIAN SMALL LETTER CHEH (tche) | Hycheh | |
| 0544 | ARMENIAN CAPITAL LETTER MEN | HyMen | |
| 0574 | ARMENIAN SMALL LETTER MEN | Hymen | |
| 0545 | ARMENIAN CAPITAL LETTER YI (hi) | HyYi | |
| 0575 | ARMENIAN SMALL LETTER YI (hi) | Hyyi | |
| 0546 | ARMENIAN CAPITAL LETTER NOW (nu) | HyNow | |
| 0576 | ARMENIAN SMALL LETTER NOW (nu) | Hynow | |
| 0547 | ARMENIAN CAPITAL LETTER SHA | HySha | |
| 0577 | ARMENIAN SMALL LETTER SHA | Hysha | |
| 0548 | ARMENIAN CAPITAL LETTER VO | HyVo | |
| 0578 | ARMENIAN SMALL LETTER VO | Hyvo | |
| 0549 | ARMENIAN CAPITAL LETTER CHA | HyCha | |
| 0579 | ARMENIAN SMALL LETTER CHA | Hycha | |
| 054A | ARMENIAN CAPITAL LETTER PEH (pe) | HyPeh | |
| 057A | ARMENIAN SMALL LETTER PEH (pe) | Hypeh | |
| 054B | ARMENIAN CAPITAL LETTER JHEH (je) | HyJheh | |
| 057B | ARMENIAN SMALL LETTER JHEH (je) | Hyjheh | |
| 054C | ARMENIAN CAPITAL LETTER RA | HyRa | |
| 057C | ARMENIAN SMALL LETTER RA | Hyra | |
| 054D | ARMENIAN CAPITAL LETTER SEH (se) | HySeh | |
| 057D | ARMENIAN SMALL LETTER SEH (se) | Hyseh | |
| 054E | ARMENIAN CAPITAL LETTER VEW (vev) | HyVew | |
| 057E | ARMENIAN SMALL LETTER VEW (vev) | Hyvew | |
| 054F | ARMENIAN CAPITAL LETTER TIWN (tyun) | HyTiwn | |
| 057F | ARMENIAN SMALL LETTER TIWN (tyun) | Hytiwn | |
| 0550 | ARMENIAN CAPITAL LETTER REH (re) | HyReh | |
| 0580 | ARMENIAN SMALL LETTER REH (re) | Hyreh | |
| 0551 | ARMENIAN CAPITAL LETTER CO (tso) | HyCo | |
| 0581 | ARMENIAN SMALL LETTER CO (tso) | Hyco | |
| 0552 | ARMENIAN CAPITAL LETTER YIWN (vyun) | HyYiwn | |
| 0582 | ARMENIAN SMALL LETTER YIWN (vyun) | Hyyiwn | |
| 0553 | ARMENIAN CAPITAL LETTER PIWR (pyur) | HyPiwr | |
| 0583 | ARMENIAN SMALL LETTER PIWR (pyur) | Hypiwr | |
| 0554 | ARMENIAN CAPITAL LETTER KEH (ke) | HyKeh | |
| 0584 | ARMENIAN SMALL LETTER KEH (ke) | Hykeh | |
| 0555 | ARMENIAN CAPITAL LETTER OH (o) | HyOh | |
| 0585 | ARMENIAN SMALL LETTER OH (o) | Hyoh | |
| 0556 | ARMENIAN CAPITAL LETTER FEH (fe) | HyFeh | |
| 0586 | ARMENIAN SMALL LETTER FEH (fe) | Hyfeh |
Characters are hereinafter referred to by the aliases given in the Alias column.
ME: In general I don't see the point of the abbreviated aliases. You could simply use the standard UCS names, omitting the words "ARMENIAN" and "CAPITAL" or "SMALL" and "LETTER".
2.2. Classification and Sorting
The basic character set can be divided into the following functional subsets:
unclassified-symbols ::= {Hyeternity, Hyechyiwn, section}
punctuation-signs ::= {Hyfullstop, parenright, parenleft, quotright, quotleft, emdash, middot, hyphen, comma, endash}
pseudo-letters ::= {Hyhyphen, ellipsis, Hyapostrophe}
ME: These are punctuation signs. They are not letters. The term "pseudo-letter" doesn't really mean anything to me. In the UCS there are "modifier letters", which are treated as letters but which look like punctuation. Is this what you mean? If so, you could use that term.
combining-punctuation ::= {Hyexclam, Hyemphasis, Hyquestion}
letters ::= {capital-letters, small-letters}
capital-letters ::= {HyAyb, HyBen, HyGim, HyDa, HyEch, HyZa, HyEh, HyEt, HyTo, HyZhe, HyIni, HyLiwn, HyXeh, HyCa, HyKen, HyHo, HyJa, HyGhad, HyCheh, HyMen, HyYi, HyNow, HySha, HyVo, HyCha, HyPeh, HyJheh, HyRa, HySeh, HyVew, HyTiwn, HyReh, HyCo, HyYiwn, HyPiwr, HyKeh, HyOh, HyFeh}
small-letters ::= {Hyayb, Hyben, Hygim, Hyda, Hyech, Hyza, Hyeh, Hyet, Hyto, Hyzhe, Hyini, Hyliwn, Hyxeh, Hyca, Hyken, Hyho, Hyja, Hyghad, Hycheh, Hymen, Hyyi, Hynow, Hysha, Hyvo, Hycha, Hypeh, Hyjheh, Hyra, Hyseh, Hyvew, Hytiwn, Hyreh, Hyco, Hyyiwn, Hypiwr, Hykeh, Hyoh, Hyfeh}
The sorting order is important for alphabetic characters only and should follow the order presented in Table 1.
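Ordering is specified by table position rather than by code value, so a sort key can be computed from Table 1 directly. The sketch below is a minimal Python illustration and assumes the text is already in the UCS. `LETTERS`, `RANK` and `sort_key` are hypothetical names. Skipping non-letters is an assumption, drawn from the statement that ordering concerns alphabetic characters only.

```python
# Table 1 letter order, built from the UCS: each capital letter
# (U+0531..U+0556) is immediately followed by its small form (+0x30).
LETTERS = [chr(cp + off) for cp in range(0x0531, 0x0557) for off in (0, 0x30)]
RANK = {ch: i for i, ch in enumerate(LETTERS)}


def sort_key(word):
    """Return the Table 1 positions of the letters in `word`.

    Assumption: characters that are not Armenian letters (punctuation,
    the armew ligature, digits, Latin) are skipped, because the sorting
    order is specified for alphabetic characters only.
    """
    return tuple(RANK[ch] for ch in word if ch in RANK)


if __name__ == "__main__":
    words = [
        "\u0562\u0561\u0580\u0565\u0582",  # barev
        "\u0561\u0575\u0578",              # ayo
        "\u0533\u056B\u0580\u0584",        # Girk (capital gim)
    ]
    for w in sorted(words, key=sort_key):
        print(w)
```

Taken literally, this key places each capital letter before its small form, because Table 1 interleaves them. A practical collator would probably compare letters case-insensitively first and break ties on case, but that refinement is not specified here.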
Capitalization applies to alphabetic characters only. The shift from upper case to lower case replaces the capital letter with the following character in Table 1. Accordingly, the shift from lower case to upper case replaces the small letter with the preceding character in Table 1.
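The case-shift rule can be sketched the same way. In the interleaved Table 1 sequence, capitals occupy even positions and small letters occupy odd ones. This Python sketch assumes UCS input, and the helper names are hypothetical. In the UCS the rule amounts to a fixed offset of 0x30 between U+0531..U+0556 and U+0561..U+0586. For that reason the results agree with Python's built-in `str.lower()` and `str.upper()` for these 76 letters.

```python
# Table 1 order again: capital, small, capital, small, ...
LETTERS = [chr(cp + off) for cp in range(0x0531, 0x0557) for off in (0, 0x30)]
INDEX = {ch: i for i, ch in enumerate(LETTERS)}


def _shift(ch, want_even, step):
    """Move `ch` by `step` positions in Table 1 if it sits at the wanted parity."""
    i = INDEX.get(ch)
    if i is None or (i % 2 == 0) != want_even:
        return ch  # not a Table 1 letter, or already in the target case
    return LETTERS[i + step]


def to_lower(text):
    # Capitals sit at even positions; the small form is the next entry.
    return "".join(_shift(ch, True, +1) for ch in text)


def to_upper(text):
    # Small letters sit at odd positions; the capital is the previous entry.
    return "".join(_shift(ch, False, -1) for ch in text)
```

One difference is deliberate. The ligature armew (U+0587) is not in the letters subset, so these helpers leave it unchanged. By contrast, `str.upper()` applies Unicode's full case mapping and expands it into two capital letters.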
Text search and dictionary applications should take into account the following factors: (1) in the Armenian language, a word is a sequence of letters, combining punctuation, and modifier letters; (2) when comparing words in the text or dictionary, combining punctuation and modifier letters may be ignored.
In reference to combining punctuation, the following factors are important: (1) the combining punctuation mark follows the letter to which it applies (which can only be a vowel in Armenian); (2) a letter can be followed by more than one combining punctuation mark.
ME: You should give guidance here as to permissible combinations of combining punctuation.
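The two comparison rules can be sketched as follows, assuming UCS text. The sets are the combining-punctuation and pseudo-letter (modifier letter) subsets of section 2.2. The regular expression, the function names, and the decision to treat U+0587 as outside words are assumptions for illustration. The constraint that a combining punctuation mark may follow only a vowel is not enforced.

```python
import re

# Subsets from section 2.2, with UCS code points from Table 1.
COMBINING_PUNCTUATION = {"\u055C", "\u055B", "\u055E"}  # exclam, emphasis, question
MODIFIER_LETTERS = {"\u058A", "\u2026", "\u055A"}       # hyphen, ellipsis, apostrophe
IGNORABLE = COMBINING_PUNCTUATION | MODIFIER_LETTERS

# A word is a run of Armenian letters (capitals U+0531..U+0556, small letters
# U+0561..U+0586) mixed with the ignorable marks above.
WORD_RE = re.compile("[\u0531-\u0556\u0561-\u0586" + "".join(sorted(IGNORABLE)) + "]+")


def words(text):
    """Split text into Armenian words as defined above."""
    return WORD_RE.findall(text)


def comparison_form(word):
    """Drop combining punctuation and modifier letters before comparing."""
    return "".join(ch for ch in word if ch not in IGNORABLE)


def same_word(a, b):
    return comparison_form(a) == comparison_form(b)
```

With these rules, a question word carrying U+055E ARMENIAN QUESTION MARK on its vowel compares equal to the same word without the mark.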
2.3. Ligatures
A ligature is a traditional or convenient graphical presentation of a sequence of letters, e.g. the Latin ligature "fi", the German ligature "ss", or the Armenian ligature "Hymen+Hynow". Ligatures can be officially registered and encoded (as in the UCS), but systems supporting ligatures may? should? substitute them automatically only on the screen, printer, or other graphical devices.
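Text sometimes arrives with encoded ligatures. A search or comparison step can fold them back into their letters and leave ligature formation to the display layer, as recommended above. The following is a minimal Python sketch using the standard `unicodedata` module. It relies on the UCS compatibility decompositions of U+0587 (ECH YIWN) and U+FB13 (MEN NOW). `fold_ligatures` is a hypothetical helper name.

```python
import unicodedata


def fold_ligatures(text):
    """Fold encoded ligatures back into letter sequences for search/comparison.

    NFKC applies every compatibility decomposition, not only ligatures
    (e.g. full-width Latin letters are folded too).
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for lig in ("\u0587", "\uFB13"):  # ECH YIWN (armew), MEN NOW
        letters = fold_ligatures(lig)
        print(unicodedata.name(lig), "->", [unicodedata.name(c) for c in letters])
```

Folding armew into ech + yiwn suits spelling-based search. An application that treats armew as an "and" symbol may prefer to exclude it from folding.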
The Armenian ligature armew, a combination of armyech and armvyun, was included in the AST 34.001 standard for the following reasons: (1) armew is a "ligature symbol" rather than a ligature, and (2) armew carries the meaning "and", similar to the "&" character.
3. Encoding
3.1. Basic Principles
ME: As far as I can tell, this entire section is unnecessary, and its theoretical background has led your committee to create character set standards which diverge significantly from the principles used by ISO/IEC JTC1/SC2. I believe the problem may be terminology. "Character set" is not the same as "character repertoire". Character set usually means "coded character set", that is, a set of code positions onto which the repertoire is mapped. A character repertoire is like a basket of letters and other signs used, in this case, to write Armenian. This misunderstanding may be the principal problem with this section. By mapping unique Armenian punctuation characters to ASCII characters, the AST has not followed the basic principles of character identification and coding used by ISO/IEC JTC1/SC2.
A coded character set is a mapping of a set of characters onto a set of integers, e.g. the ArmSCII-7, ArmSCII-8, and ArmSCII-8A tables.
ME: Where is the UCS?
The term "unification" is used in the following sense. As a rule, an Armenian character set is implemented in operating environments where other character sets are already available. In such environments, certain characters, in particular punctuation marks, may have identical glyphs and similar functions. In such cases, some characters of the Armenian character set may be mapped to characters that are already encoded. The details of the unification of Armenian punctuation marks are discussed below.
ME: This "principle" has enabled you to replace ASCII characters (in the range x20-x7F) which is precisely the problem. You should not have made these unifications.
The mapping of characters in coding tables has several aspects (in order of priority): (1) scope of the character mapping, (2) sequence of mapping, (3) character unification requirements, (4) general requirements of a given operating environment.
ME: In my view, the priorities you give are mistaken with regard to implementation in the real world. (4) is the most important; (2) is the least.
Encoding in every new operating environment should, to the extent possible, use the existing coding tables (see the next section). Where this is impossible, newly created coding tables should follow these general principles as closely as possible:
ME: I suppose this is fine as far as it goes, except for the problem that your basic character set is flawed, so following it will just cause problems for people.
1. The Armenian character set should be comprehensive (with due regard to the unification)
2. The Armenian character set should be mapped onto a contiguous sequence of codes in the order presented in Table 1. The code positions of unified characters should be left reserved, i.e. they should not be used for other purposes. The letter sequence is the most important.
ME: The order of the characters is totally irrelevant. All ordering in the real world is, and will be, table-based, not coding based. There are no significant savings in speed or efficiency of ordering which should lead to the requirement stated here. Further, it causes problems for implementation on some platforms.
3. Unification implies both graphical and functional identity of characters. For example, mapping the Armenian parentheses (armparenleft and armparenright) to the parentheses already present in ASCII is not an error.
ME: Yes, it is. There is no such thing as an Armenian parenthesis different from an ordinary parenthesis.
On the other hand, the similarity of the Armenian full stop (armfullstop) and the colon is purely graphical. The armdot and armsep have functions different from those of the Latin dot and the grave accent character, respectively. Another important factor in character unification is the use of the Latin alphabet and punctuation marks in formal languages. It should be borne in mind, for example, that a comma is often used as a separator in lists (e.g. in a keyword list in an HTML document header). To avoid confusion, the armcomma character may be mapped to the Latin comma.
ME: This is where you have made a grave error. You don't want dynamic mapping, you want reliable mapping. You have in ArmSCII-8A replaced x2D HYPHEN-MINUS with "ARMENIAN" EM DASH, and replaced x5F LOW LINE with "ARMENIAN" EN DASH. You will surely run into interoperability problems with UCS implementations. One example of the problem would be in internet addresses and URLs: an example address like hovik_melikyan@physics-university.am will be corrupted by the mappings your Armenian standard has specified for ArmSCII-8A.
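The corruption described above can be reproduced in a few lines. This Python sketch decodes the low half of ArmSCII-8A using only the positions that Table 2 reassigns away from ASCII (column 4 compared with column 6). The dictionary and function names are hypothetical. All other bytes below 0x80 are assumed to be plain ASCII.

```python
# ArmSCII-8A positions below 0x80 whose Table 2 UCS value differs from the
# ASCII character at the same position. Assumption: every other byte in
# 0x00-0x7F is plain ASCII.
ARMSCII_8A_LOW_OVERRIDES = {
    0x15: "\u00A7",  # armsection
    0x27: "\u055B",  # armaccent (ASCII apostrophe position)
    0x2D: "\u2014",  # armemdash (ASCII HYPHEN-MINUS position)
    0x3A: "\u0589",  # armfullstop (ASCII colon position)
    0x5F: "\u2013",  # armendash (ASCII LOW LINE position)
    0x60: "\u055D",  # armsep (ASCII grave accent position)
    0x7E: "\u055C",  # armexclam (ASCII tilde position)
}


def decode_armscii8a_low(data):
    """Decode bytes 0x00-0x7F as ArmSCII-8A would, per the overrides above."""
    if any(b >= 0x80 for b in data):
        raise ValueError("this sketch only covers bytes 0x00-0x7F")
    return "".join(ARMSCII_8A_LOW_OVERRIDES.get(b, chr(b)) for b in data)


if __name__ == "__main__":
    address = "hovik_melikyan@physics-university.am"
    decoded = decode_armscii8a_low(address.encode("ascii"))
    print(decoded == address)  # prints False: the address is corrupted
    print(decoded)
```

The hyphen and underscore in the address are not the only casualties. Under the same table, the colon in any `http:` URL would decode as U+0589 ARMENIAN FULL STOP.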
4. The requirements of a given operating environment may often contradict the above principles. For example, the pseudo-graphical characters in DOS that were supported by video adapters (the "ninth pixel" factor) resulted in the creation of an alternative 8-bit coding table, ArmSCII-8A. Another example is the Macintosh OS, where codes such as ellipsis, nbsp, and soft hyphen are recognized and interpreted in a special way by numerous applications. This made meaningful use of the ArmSCII standard on that system impossible (the ArmSCII-8A table is used on the Macintosh).
ME: The ArmSCII-8A table is unsuitable for use on the Macintosh. Although you have said that characters like ELLIPSIS, NO-BREAK SPACE, and SOFT HYPHEN are recognized and interpreted in certain ways by Mac software, the following errors have been made.
The WorldScript software I have released for Armenian on the Mac, which has been used by Armenian scholars in Israel and the Netherlands, is conformant with Apple Macintosh practice and with SC2's principles of character set encoding.
The ArmSCII coding table does not fully correspond to the above principles, and the Armenian block in the current version of Unicode (2.1) corresponds to none of principles (1), (2), or (3).
3.2. Cross Reference of Coding Tables
Table 2. Cross reference
1 - Short name
2 - ArmSCII-7
3 - ArmSCII-8 (AST 34.002-97, Basic coding table)
4 - ArmSCII-8A (AST 34.002-97, Alternative coding table)
5 - ArmSCII-16U
6 - Unicode Version 2.1
| 1 | 2 | 3 | 4 | 5 | 6 |
| armeternity | 21 | A1 | DC | 0521 | -
| armew | - | - | - | - | 0587
| armsection | 22 | A2 | 15 | 0522 | 00A7
| armfullstop | 23 | A3 | 3A | 0523 | 0589
| armparenright | 24 | A4 | 29 | 0524 | 0029
| armparenleft | 25 | A5 | 28 | 0525 | 0028
| armquotright | 26 | A6 | AF | 0526 | 00BB
| armquotleft | 27 | A7 | AE | 0527 | 00AB
| armemdash | 28 | A8 | 2D | 0528 | 2014
| armdot | 29 | A9 | 2E | 0529 | 002E
| armsep | 2A | AA | 60 | 052A | 055D
| armcomma | 2B | AB | 2C | 052B | 002C
| armendash | 2C | AC | 5F | 052C | 2013
| armyentamna | 2D | AD | DD | 052D | 058A
| armellipsis | 2E | AE | DE | 052E | 2026
| armapostrophe | 7E | FE | FE | 057E | 02BC
| armexclam | 2F | AF | 7E | 052F | 055C
| armaccent | 30 | B0 | 27 | 0530 | 055B
| armquestion | 31 | B1 | DF | 0531 | 055E
| Armayb | 32 | B2 | 80 | 0532 | 0531
| armayb | 33 | B3 | 81 | 0533 | 0561
| Armben | 34 | B4 | 82 | 0534 | 0532
| armben | 35 | B5 | 83 | 0535 | 0562
| Armgim | 36 | B6 | 84 | 0536 | 0533
| armgim | 37 | B7 | 85 | 0537 | 0563
| Armda | 38 | B8 | 86 | 0538 | 0534
| armda | 39 | B9 | 87 | 0539 | 0564
| Armyech | 3A | BA | 88 | 053A | 0535
| armyech | 3B | BB | 89 | 053B | 0565
| Armza | 3C | BC | 8A | 053C | 0536
| armza | 3D | BD | 8B | 053D | 0566
| Arme | 3E | BE | 8C | 053E | 0537
| arme | 3F | BF | 8D | 053F | 0567
| Armat | 40 | C0 | 8E | 0540 | 0538
| armat | 41 | C1 | 8F | 0541 | 0568
| Armto | 42 | C2 | 90 | 0542 | 0539
| armto | 43 | C3 | 91 | 0543 | 0569
| Armzhe | 44 | C4 | 92 | 0544 | 053A
| armzhe | 45 | C5 | 93 | 0545 | 056A
| Armini | 46 | C6 | 94 | 0546 | 053B
| armini | 47 | C7 | 95 | 0547 | 056B
| Armlyun | 48 | C8 | 96 | 0548 | 053C
| armlyun | 49 | C9 | 97 | 0549 | 056C
| Armkhe | 4A | CA | 98 | 054A | 053D
| armkhe | 4B | CB | 99 | 054B | 056D
| Armtsa | 4C | CC | 9A | 054C | 053E
| armtsa | 4D | CD | 9B | 054D | 056E
| Armken | 4E | CE | 9C | 054E | 053F
| armken | 4F | CF | 9D | 054F | 056F
| Armho | 50 | D0 | 9E | 0550 | 0540
| armho | 51 | D1 | 9F | 0551 | 0570
| Armdza | 52 | D2 | A0 | 0552 | 0541
| armdza | 53 | D3 | A1 | 0553 | 0571
| Armghat | 54 | D4 | A2 | 0554 | 0542
| armghat | 55 | D5 | A3 | 0555 | 0572
| Armtche | 56 | D6 | A4 | 0556 | 0543
| armtche | 57 | D7 | A5 | 0557 | 0573
| Armmen | 58 | D8 | A6 | 0558 | 0544
| armmen | 59 | D9 | A7 | 0559 | 0574
| Armhi | 5A | DA | A8 | 055A | 0545
| armhi | 5B | DB | A9 | 055B | 0575
| Armnu | 5C | DC | AA | 055C | 0546
| armnu | 5D | DD | AB | 055D | 0576
| Armsha | 5E | DE | AC | 055E | 0547
| armsha | 5F | DF | AD | 055F | 0577
| Armvo | 60 | E0 | E0 | 0560 | 0548
| armvo | 61 | E1 | E1 | 0561 | 0578
| Armcha | 62 | E2 | E2 | 0562 | 0549
| armcha | 63 | E3 | E3 | 0563 | 0579
| Armpe | 64 | E4 | E4 | 0564 | 054A
| armpe | 65 | E5 | E5 | 0565 | 057A
| Armje | 66 | E6 | E6 | 0566 | 054B
| armje | 67 | E7 | E7 | 0567 | 057B
| Armra | 68 | E8 | E8 | 0568 | 054C
| armra | 69 | E9 | E9 | 0569 | 057C
| Armse | 6A | EA | EA | 056A | 054D
| armse | 6B | EB | EB | 056B | 057D
| Armvev | 6C | EC | EC | 056C | 054E
| armvev | 6D | ED | ED | 056D | 057E
| Armtyun | 6E | EE | EE | 056E | 054F
| armtyun | 6F | EF | EF | 056F | 057F
| Armre | 70 | F0 | F0 | 0570 | 0550
| armre | 71 | F1 | F1 | 0571 | 0580
| Armtso | 72 | F2 | F2 | 0572 | 0551
| armtso | 73 | F3 | F3 | 0573 | 0581
| Armvyun | 74 | F4 | F4 | 0574 | 0552
| armvyun | 75 | F5 | F5 | 0575 | 0582
| Armpyur | 76 | F6 | F6 | 0576 | 0553
| armpyur | 77 | F7 | F7 | 0577 | 0583
| Armke | 78 | F8 | F8 | 0578 | 0554
| armke | 79 | F9 | F9 | 0579 | 0584
| Armo | 7A | FA | FA | 057A | 0555
| armo | 7B | FB | FB | 057B | 0585
| Armfe | 7C | FC | FC | 057C | 0556
| armfe | 7D | FD | FD | 057D | 0586
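To illustrate how Table 2 defines a coded character set, the following Python sketch builds an ArmSCII-8 to UCS decoder from columns 3 and 6. It rests on these assumptions:
- bytes 0x00-0x7F are unaltered ASCII;
- 0xA1 (armeternity) is left unmapped, because the table gives it no Unicode 2.1 value;
- 0xA5 is mapped to U+0028 LEFT PARENTHESIS, since armparenleft is a left parenthesis;
- the names `DECODE`, `ENCODE`, `decode` and `encode` are hypothetical.

```python
# ArmSCII-8 punctuation positions and their UCS values (Table 2, columns 3 and 6).
PUNCTUATION = {
    0xA2: 0x00A7, 0xA3: 0x0589, 0xA4: 0x0029,
    0xA5: 0x0028,  # armparenleft: LEFT PARENTHESIS
    0xA6: 0x00BB, 0xA7: 0x00AB, 0xA8: 0x2014, 0xA9: 0x002E,
    0xAA: 0x055D, 0xAB: 0x002C, 0xAC: 0x2013, 0xAD: 0x058A,
    0xAE: 0x2026, 0xAF: 0x055C, 0xB0: 0x055B, 0xB1: 0x055E,
    0xFE: 0x02BC,
}


def build_decode_table():
    table = {b: chr(b) for b in range(0x80)}  # assumed: unaltered ASCII
    table.update({b: chr(u) for b, u in PUNCTUATION.items()})
    for i in range(38):                        # 38 capital/small pairs
        table[0xB2 + 2 * i] = chr(0x0531 + i)  # capital letter
        table[0xB3 + 2 * i] = chr(0x0561 + i)  # small letter
    return table


DECODE = build_decode_table()

# Reverse table. Unified characters (e.g. 0x2C and 0xAB both decode to
# U+002C) make the mapping many-to-one, so the lowest byte wins here.
ENCODE = {}
for byte in sorted(DECODE):
    ENCODE.setdefault(DECODE[byte], byte)


def decode(data):
    try:
        return "".join(DECODE[b] for b in data)
    except KeyError as exc:
        raise ValueError(f"byte 0x{exc.args[0]:02X} has no UCS mapping") from None


def encode(text):
    try:
        return bytes(ENCODE[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"{exc.args[0]!r} is not in ArmSCII-8") from None
```

Unified punctuation decodes to the same UCS character as its ASCII counterpart, so the reverse table is not unique. For example, `encode(decode(b"\xab"))` yields `b","`, not the original byte.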
4. Character Set and Language Tags
4.1. Coded Character Set Tags
In systems and protocols that use mnemonic tags for coded character sets, the following tags should be used (name, official source, optional alias):
| Name: | armscii-8 | |
| Source: | Armenian State Standard AST 34.002 Basic 8-bit coded character set | |
| Alias: | AST_34.002 | |
| Name: | armscii-8a | |
| Source: | Armenian State Standard AST 34.002 Alternative 8-bit coded character set | |
| Alias: | AST_34.002-A |
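A consumer that accepts both names and aliases might normalize labels before use. This Python sketch is hypothetical. The registry contains only the entries listed above, and case-insensitive matching follows the usual MIME convention for charset names.

```python
from typing import Optional

# Names and aliases from section 4.1; matching is case-insensitive.
# The registry itself is a hypothetical illustration.
CHARSET_TAGS = {
    "armscii-8": "armscii-8",
    "ast_34.002": "armscii-8",
    "armscii-8a": "armscii-8a",
    "ast_34.002-a": "armscii-8a",
}


def resolve_charset(label: str) -> Optional[str]:
    """Map a tag or alias to its canonical name, or None if unknown."""
    return CHARSET_TAGS.get(label.strip().lower())
```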
4.2. Language Tags
Dictionaries, spelling checkers, and other linguistic systems, as well as operating environments that distinguish human languages and identify locales, should take into account the existence of four mutually incomprehensible forms (dialects) of the Armenian language: Eastern Armenian, Western Armenian, Grabar Armenian, and Middle Armenian. Table 3 presents two forms of suggested mnemonic tags: MIME-style (RFC-1766) and Windows-style 3-letter abbreviations.
Table 3. Language tags
| MIME-style name | 3-letter code | Full name |
| hy-eastern | AME | Armenian Eastern |
| hy-western | AMW | Armenian Western |
| hy-grabar | AMG | Armenian Grabar |
| hy-middle | AMM | Armenian Middle |
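Table 3 can be applied as a simple lookup. The Python sketch below is hypothetical. It follows RFC 1766 in treating tags case-insensitively. As an assumption, it ignores any subtags after the dialect.

```python
from typing import Optional, Tuple

# Table 3 as a lookup table. RFC 1766 treats tags as case-insensitive.
LANGUAGE_TAGS = {
    "hy-eastern": ("AME", "Armenian Eastern"),
    "hy-western": ("AMW", "Armenian Western"),
    "hy-grabar": ("AMG", "Armenian Grabar"),
    "hy-middle": ("AMM", "Armenian Middle"),
}


def lookup_language(tag: str) -> Optional[Tuple[str, str]]:
    """Return (3-letter code, full name) for a Table 3 tag, else None.

    Assumption: subtags after the dialect are ignored, and a bare "hy"
    is rejected because it does not name a dialect.
    """
    subtags = tag.strip().lower().split("-")
    if len(subtags) < 2 or subtags[0] != "hy":
        return None
    return LANGUAGE_TAGS.get(f"{subtags[0]}-{subtags[1]}")
```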
5. Acknowledgements
This document is the result of long and intensive consultations and cooperation with the staff of the Standards Working Group of the Armenian Computer Center. Special thanks for most valuable inputs and comments go to (in alphabetical order):
ME: Without prejudice to the hard work done by this committee, consultation with experts outside of Armenia would, in my opinion, have been a good idea.
Hovhannes Gizoghian
Tigran Haroutunian
Aram Hayrapetian
Ivan Lulukian
Vahram Mekhitarian
Rouben Taroumian-Hakobian
Hovhannes Zakarian
6. Author's Address
Hovik Melikyan
Center of Humane Technologies "Armenian Computer"
Yerevan, Republic of Armenia
hovik@moon.yerphi.am
7. References
[AST 34.001-97]
Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997
[AST 34.002-97]
Information Technologies -- Character Set And Information Encoding: 8-bit Coded Character Sets -- State Standardization Committee of the Republic of Armenia, July 1997
[ArmSCII]
Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991
[RFC-1766]
Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.
[Unicode]
The Unicode Consortium, "The Unicode Standard -- Version 2.0", Addison-Wesley, 1996.
[Unicode Version 2.1]
Unicode Technical Report #8, The Unicode Standard, Version 2.1 -- http://www.unicode.org/unicode/reports/tr8.html.