ConScript Unicode Registry

How to propose Unicode character names

Every Unicode character, and so every ConScript Unicode character, must have a unique, distinct name. Names must consist only of CAPITAL LETTERS of the English alphabet (A-Z), plus HYPHEN-MINUS ("-") and SPACE (" "). Try to avoid using the hyphen. These rules must be strictly adhered to.
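The repertoire and uniqueness rules above lend themselves to a mechanical check. What follows is only a minimal sketch in Python; the helper names `uses_allowed_characters` and `duplicated_names` are hypothetical and are not part of any ConScript tooling.

```python
import re

# Only A-Z, HYPHEN-MINUS, and SPACE are permitted, as stated above.
_ALLOWED = re.compile(r"[A-Z \-]+")


def uses_allowed_characters(name: str) -> bool:
    """Return True if name is non-empty and uses only A-Z, '-', and ' '."""
    return _ALLOWED.fullmatch(name) is not None


def duplicated_names(names):
    """Return the set of names that occur more than once in names."""
    seen, dupes = set(), set()
    for name in names:
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return dupes
```

A proposal could run both checks over its full list of names before submission; a name in lower case, or one containing a digit such as "1", would be rejected by the first check.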

Character names consist of several words, as follows:

Script name

The first word is always the name of the script, such as LATIN, GREEK, DEVANAGARI, or TENGWAR. This is not necessarily the same as the name of the language for which the script is used, particularly when the same script is used for more than one language: thus, the Latin script can represent languages as diverse as English, Swahili, and Vietnamese. (Some Unicode symbols don't begin with a script name, but this is not allowed in ConScript Unicode.)

Character type

The next word or two represent the general type of the character. The standard alternatives are:
  • CAPITAL LETTER
  • SMALL LETTER
  • LETTER (for scripts that don't distinguish between capital and small letters)
  • MODIFIER LETTER (a letter that is always used in conjunction with some preceding or following letter, but isn't attached to it, like the triangular-colon that means "lengthen preceding sound" in the International Phonetic Alphabet)
  • COMBINING (an accent mark that appears above, below, or otherwise combined with one or more regular characters, whether physically attached or not)
  • VOWEL SIGN (used in place of COMBINING for marks representing vowels, as used in Hebrew, Devanagari, or Tengwar)
  • DIGIT (a decimal digit; OCTAL DIGIT or DUODECIMAL DIGIT is permissible where required)
  • NUMBER (a character representing an entire number or part of one)
  • SYLLABLE
  • IDEOGRAPH (representing a word or concept)
  • SYMBOL FOR
Other possibilities may need to be invented for the needs of particular scripts. Note that names of punctuation marks don't have a character type.
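To illustrate how the script name and character type lead a name, here is a rough sketch in Python that splits a name into those parts. The function `split_name` and the tuple `STANDARD_TYPES` are hypothetical names; the sketch assumes the script name is exactly one word, and it treats any name without a listed type (a punctuation mark, or a script-specific type invented later) as having no type.

```python
# The standard character types listed above, plus the two DIGIT variants.
STANDARD_TYPES = (
    "CAPITAL LETTER", "SMALL LETTER", "LETTER", "MODIFIER LETTER",
    "COMBINING", "VOWEL SIGN", "DIGIT", "OCTAL DIGIT", "DUODECIMAL DIGIT",
    "NUMBER", "SYLLABLE", "IDEOGRAPH", "SYMBOL FOR",
)


def split_name(name):
    """Split a name into (script, character type or None, remainder)."""
    # Assumption: the script name is the first word only.
    script, _, remainder = name.partition(" ")
    for char_type in STANDARD_TYPES:
        if remainder == char_type or remainder.startswith(char_type + " "):
            return script, char_type, remainder[len(char_type):].strip()
    # No standard type found, e.g. the name of a punctuation mark.
    return script, None, remainder
```

For example, `split_name("TENGWAR LETTER ROMEN")` yields the script `TENGWAR`, the type `LETTER`, and the remainder `ROMEN`.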

Language

The next word is the name of the language in which this character is used. This word does not appear unless two different languages use different characters which most naturally have the same individual name. Thus CIRTH LETTER N is used for the sound "n" when writing Sindarin, but for the character specific to writing Khuzdul, CIRTH LETTER KHUZDUL N is appropriate. There is no need to specify CIRTH LETTER SINDARIN N, as Sindarin is the "default language" for the script.
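The placement of the optional language word can be sketched as follows. This is an illustration only: `compose_name` and its parameters are hypothetical, and the sketch simplifies the rule by leaving the language out whenever it equals the script's default language; deciding whether two characters really share an individual name remains a matter for the proposer.

```python
def compose_name(script, char_type, individual, language=None,
                 default_language=None):
    """Join the parts of a name, omitting the default language."""
    parts = [script]
    if char_type:  # punctuation marks have no character type
        parts.append(char_type)
    if language and language != default_language:
        parts.append(language)
    parts.append(individual)
    return " ".join(parts)


# The Cirth example from the text, with Sindarin as the default language.
sindarin_n = compose_name("CIRTH", "LETTER", "N",
                          language="SINDARIN", default_language="SINDARIN")
khuzdul_n = compose_name("CIRTH", "LETTER", "N",
                         language="KHUZDUL", default_language="SINDARIN")
```

Here `sindarin_n` is `CIRTH LETTER N` and `khuzdul_n` is `CIRTH LETTER KHUZDUL N`.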

Individual name

For LETTERs, this should be the traditional name of the letter in the principal language of use, as in TENGWAR LETTER ROMEN. If there are no traditional names, or if they conflict wildly between languages, a string derived from the usual romanization may be used instead. Some kludging (such as doubling a long vowel after removing the acute accent) will often be needed to ensure uniqueness.
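As one possible reading of the kludge mentioned above, the sketch below removes an acute accent and doubles the letter that carried it. The function name `romanization_to_name` is hypothetical, and the sketch handles only the acute accent: any other diacritic or non-English letter would pass through and would still need to be resolved by hand.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def romanization_to_name(text):
    """Upper-case a romanization, doubling each acute-accented letter."""
    # NFD separates a precomposed accented letter into base + accent.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if ch == ACUTE and out:
            out.append(out[-1])  # drop the accent, repeat the base letter
        else:
            out.append(ch)
    return "".join(out).upper()
```

Under this sketch a romanized "á" becomes `AA` while a plain "a" becomes `A`, so the two remain distinct within the permitted character set.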

For DIGITs, spell out, in English, ZERO, ONE, TWO, etc.; similarly for NUMBERs.
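A short sketch of the digit rule, assuming a script whose digits run from zero up to one less than the base. The names `ENGLISH_NUMBER_WORDS` and `digit_names` are hypothetical, and the script name EXAMPLE in the usage line is a placeholder, not a registered script.

```python
# English number words; twelve entries cover bases up to duodecimal.
ENGLISH_NUMBER_WORDS = (
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE",
    "SIX", "SEVEN", "EIGHT", "NINE", "TEN", "ELEVEN",
)


def digit_names(script, base=10, digit_type="DIGIT"):
    """Return the names of the digits 0 .. base-1 for the given script."""
    if not 1 <= base <= len(ENGLISH_NUMBER_WORDS):
        raise ValueError("base must be between 1 and 12")
    return [f"{script} {digit_type} {word}"
            for word in ENGLISH_NUMBER_WORDS[:base]]


duodecimal = digit_names("EXAMPLE", base=12, digit_type="DUODECIMAL DIGIT")
```

With the default arguments the list starts at `<SCRIPT> DIGIT ZERO` and ends at `<SCRIPT> DIGIT NINE`; the duodecimal call above ends at `EXAMPLE DUODECIMAL DIGIT ELEVEN`.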

In other cases, a name describing the meaning of a character is preferred to a name that describes the appearance of the character, which in turn is preferred to a name describing the usage of the character. Note that the Unicode name for the single dot normally used at the ends of sentences in English is FULL STOP, not PERIOD. Spelling and terminology of English words in character names follow the practice of the Oxford English Dictionary.
 
Michael Everson, Evertype, Cnoc na Sceiche, Leac an Anfa, Cathair na Mart, Co. Mhaigh Eo, Éire, 2006-05-28

Copyright © 1993-2006 Evertype. All Rights Reserved