Tengwar: U+E000 - U+E07F
NOTE: This is still a proposed encoding and has not been standardized. Discussion papers are available.
The Tengwar script was invented by the philologist and author J. R. R. Tolkien as part of the mythological world he created, and was widely popularized through his works The Lord of the Rings, The Silmarillion, and others. Together with a family of artificial languages and a large corpus of etymological data describing their relationships, the Tengwar script has attracted a large community of linguists and other enthusiasts interested in this expression of Tolkien's expertise in historical and comparative linguistics. The Tengwar should be treated as a Category D (Attested Extinct) alphabet: there is a relatively limited corpus, and a relatively small (but existing) scholarly body studying it. To provide a standard Tengwar character encoding for such scholars and enthusiasts, it has been suggested that this character set be included in the Unicode Standard and ISO 10646.
Eight columns are reserved to encode the Tengwar. The last column is currently unused and is reserved for future discoveries in the Tolkien manuscripts. Character names derive from Tolkien's published writings; as usual in character names, long vowels are written double (for example, NUUMEN for númen).
General Principles of the Tengwar script
The Tengwar script is a system of consonantal signs without strictly fixed values; their glyph structure forms a matrix of potential phonetic relationships rather than a set of fixed correspondences between sound and character. The primary letters (U+E000 - U+E017) are formed of a telco 'stem' and a lúva 'bow'; raising the stem might indicate spirantization of a consonant, and doubling the bow might indicate voicing. Consonants are modified by tehtar 'signs', described below.
A series of "stemless consonants" has been encoded. STEMLESS OORE is used as DIGIT ZERO, STEMLESS VILYA as DIGIT ONE, and STEMLESS ANNA as a vowel in the mode of Beleriand. STEMLESS VALA is as yet unattested, but is included because of the inherent structure of the script.
Tengwar are written from left to right, but Tengwar numerals are written from right to left (the least significant digit is on the left). The DECIMAL BASE MARK and DUODECIMAL BASE MARK are applied to the digits to indicate which arithmetic base is used; the DUODECIMAL LEAST SIGNIFICANT DIGIT MARK is placed on the least significant digit of a duodecimal number. The numeric marks are not generally considered optional.
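The following is a minimal Python sketch of the digit ordering and marking rules above. It works with digit values and symbolic mark names only, because this section gives neither the code points of the marks nor the full mapping from digit values to characters (by the text above, values 0 and 1 would be STEMLESS OORE and STEMLESS VILYA). It assumes that the backing store holds the least significant digit first, so that it displays leftmost in left-to-right text, and that the base mark is applied to every digit; both are assumptions, not statements of this proposal. The function name is hypothetical.

```python
from typing import List, Tuple

# Symbolic names only; the code points of these marks are not assumed here.
DECIMAL_BASE_MARK = "DECIMAL BASE MARK"
DUODECIMAL_BASE_MARK = "DUODECIMAL BASE MARK"
DUODECIMAL_LSD_MARK = "DUODECIMAL LEAST SIGNIFICANT DIGIT MARK"


def tengwar_numeral(n: int, base: int = 12) -> List[Tuple[int, List[str]]]:
    """Return (digit value, marks) pairs, least significant digit first.

    Assumption: the least significant digit comes first in the backing
    store, so it is displayed leftmost, and every digit carries the base mark.
    """
    if base not in (10, 12):
        raise ValueError("Tengwar numerals are decimal or duodecimal")
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")

    digits: List[int] = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)  # collected least significant first
        if n == 0:
            break

    base_mark = DUODECIMAL_BASE_MARK if base == 12 else DECIMAL_BASE_MARK
    numeral: List[Tuple[int, List[str]]] = []
    for position, d in enumerate(digits):
        marks = [base_mark]
        if base == 12 and position == 0:
            # Only the least significant duodecimal digit gets this mark.
            marks.append(DUODECIMAL_LSD_MARK)
        numeral.append((d, marks))
    return numeral
```

For example, `tengwar_numeral(144)` yields the digit values 0, 0, 1 from left to right, with the least-significant-digit mark on the leftmost 0; `tengwar_numeral(1970, 10)` yields 0, 7, 9, 1, each with the decimal base mark.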
No positional variants of the letters exist. Like Arabic, the script is founded on calligraphic handwriting, and high-quality rendering may require many ligatures, though unligatured forms are often acceptable. No ligatures are encoded here.
Vowels and Other Marks of Pronunciation
Non-spacing marks, generically called tehtar 'signs', indicate vowels or other modifications of consonantal letters. Tehtar are placed above or below consonants, or atop "carriers" when no consonant is present in the required position. The occurrence of a character from the tehtar range (shown in the chart relative to a dashed circle) constitutes an assertion that the character is to be applied, by whatever rendering process, to the consonantal character that precedes it in the text stream. General rules for applying non-spacing marks are given in Section 2.5 of the Unicode Standard. In ISO 10646, Level 2 encoding is intended. See the remarks on Modes below.
The SHORT CARRIER simply bears a vowel tehta; the LONG CARRIER indicates that the vowel is long. Vowel length can also be shown by doubling the vowel sign.
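To illustrate how each tehta attaches to the base character before it in the text stream, here is a minimal Python sketch that groups a backing-store sequence into clusters and tests for the two ways of marking vowel length described above. Characters are represented by their names, and TEHTAR is a hypothetical two-item subset; storing a doubled vowel sign as two identical tehtar in a row is also an assumption, not something the proposal specifies. The function names are hypothetical.

```python
from typing import List

# Hypothetical subset of tehtar names; a full version would cover the
# whole vowel-sign range of the block.
TEHTAR = {"ACUTE", "AMATICSE"}


def clusters(sequence: List[str]) -> List[List[str]]:
    """Group each base character (consonant or carrier) with the tehtar after it."""
    result: List[List[str]] = []
    for name in sequence:
        if name in TEHTAR:
            if not result:
                raise ValueError(f"{name} has no preceding base character")
            result[-1].append(name)  # a tehta applies to the preceding base
        else:
            result.append([name])  # a consonant or carrier starts a new cluster
    return result


def is_long(cluster: List[str]) -> bool:
    """True if the cluster marks a long vowel by either method in the text."""
    base, marks = cluster[0], cluster[1:]
    if base == "LONG CARRIER" and marks:
        return True  # long carrier bearing a vowel tehta
    # Assumption: a doubled vowel sign is stored as the same tehta twice.
    return any(marks.count(m) >= 2 for m in set(marks))
```

For example, the Quenya sequence NUUMEN ACUTE ALDA ACUTE groups into two clusters, each consonant carrying one ACUTE.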
Modes
The morphological structure of a language determines the "mode" in which the Tengwar script is used for it. For instance, the tehtar are placed above or below the preceding consonant in languages in which words tend to end in a vowel, but above or below the following consonant in languages in which words tend to end in a consonant (compare Quenya nelde 'three' and neltildi 'triangle' with Sindarin neled and nelthil). In accordance with Unicode specifications, however, the tehtar are encoded as non-spacing characters, and so must follow the consonant over which they appear. For Sindarin, this means that the logical order of the backing store does not reflect the true syllabic structure. For instance, the Quenya examples above are encoded NUUMEN-ACUTE-ALDA-ACUTE (n-e-ld-e) and NUUMEN-ACUTE-LAMBE-TINCO-AMATICSE-ALDA-AMATICSE (n-e-l-t-i-ld-i); the Sindarin examples are encoded NUUMEN-LAMBE-ACUTE-ANDO-ACUTE (n-l-e-d-e) and NUUMEN-LAMBE-ACUTE-THUULE-LAMBE-AMATICSE (n-l-e-th-l-i). English is generally written in a Sindarin-type mode; Italian would be written in a Quenya-type mode. This mismatch between phonetic order and backing-store order is a consequence of the script's modes and must be handled apart from the character set itself. Smart input methods, like those used for some Southeast Asian Brahmic scripts, could solve the problem when entering text in a Sindarin-type mode. In the mode of Beleriand, which uses full vowel letters instead of tehtar, the Sindarin examples are written OORE-YANTA-LAMBE-YANTA-ANDO (n-e-l-e-d) and OORE-YANTA-LAMBE-THUULE-SHORT CARRIER-LAMBE (n-e-l-th-i-l). Mapping software for converting between standard-mode and Beleriand-mode Sindarin will be required.
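The two placement rules can be sketched as a small Python function, assuming words are already split into phoneme tokens and using a hypothetical mapping that covers only the four example words; it reproduces the backing-store orders given for nelde, neltildi, neled and nelthil. Placing a word-initial, word-final, or otherwise stranded vowel on a SHORT CARRIER is an assumption beyond what this section states, and the mode of Beleriand is not handled. The function and table names are hypothetical.

```python
from typing import List, Optional

# Hypothetical, deliberately tiny mappings covering only the example words.
CONSONANTS = {"n": "NUUMEN", "l": "LAMBE", "ld": "ALDA", "t": "TINCO",
              "d": "ANDO", "th": "THUULE"}
VOWELS = {"e": "ACUTE", "i": "AMATICSE"}
TEHTAR = set(VOWELS.values())


def encode(phonemes: List[str], mode: str) -> List[str]:
    """Return tengwa names in backing-store order for a Quenya- or Sindarin-type mode."""
    if mode not in ("quenya", "sindarin"):
        raise ValueError(f"unknown mode: {mode}")
    out: List[str] = []
    pending: Optional[str] = None  # Sindarin-type: tehta waiting for its consonant
    for p in phonemes:
        if p in VOWELS:
            tehta = VOWELS[p]
            if mode == "quenya":
                # The tehta sits on the preceding consonant; use a carrier
                # if there is none (word-initial vowel or vowel after vowel).
                if not out or out[-1] in TEHTAR:
                    out.append("SHORT CARRIER")
                out.append(tehta)
            else:
                # An earlier vowel still waiting gets a carrier of its own.
                if pending is not None:
                    out.extend(["SHORT CARRIER", pending])
                pending = tehta
        else:
            out.append(CONSONANTS[p])
            if pending is not None:
                out.append(pending)  # stored after the consonant it sits on
                pending = None
    if pending is not None:
        out.extend(["SHORT CARRIER", pending])  # word-final vowel
    return out
```

In the Sindarin-type branch the vowel is held back until the next consonant has been emitted, which is exactly why the backing store reads n-l-e-d-e for neled.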
Punctuation
Tengwar punctuation characters are considered unique to the script and are encoded in the Tengwar block. Some punctuation is composed in Tengwar: DOUBLE PUSTA can be followed by SECTION MARK, LONG SECTION MARK, PUSTA, or DOUBLE PUSTA.
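One way to check composed punctuation is sketched below in Python. It takes the stricter reading that the combinations listed above are the only permitted ones, which the text does not actually say, and it represents characters by name. The function name is hypothetical.

```python
from typing import List

PUNCTUATION = {"PUSTA", "DOUBLE PUSTA", "SECTION MARK", "LONG SECTION MARK"}
# Characters listed in the text as able to follow DOUBLE PUSTA.
AFTER_DOUBLE_PUSTA = {"SECTION MARK", "LONG SECTION MARK", "PUSTA", "DOUBLE PUSTA"}


def is_valid_punctuation_run(run: List[str]) -> bool:
    """Check a run of adjacent punctuation against the listed compositions.

    Assumption: any composition other than DOUBLE PUSTA + allowed mark is rejected.
    """
    if not run or any(p not in PUNCTUATION for p in run):
        return False
    # Every adjacent pair must be DOUBLE PUSTA followed by an allowed mark.
    return all(prev == "DOUBLE PUSTA" and nxt in AFTER_DOUBLE_PUSTA
               for prev, nxt in zip(run, run[1:]))
```

A single mark on its own is always accepted, since it involves no composition.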
Sometimes word spaces are not used; in that case, words may be separated with U+200B ZERO WIDTH SPACE. Hyphenation is not used; words may be broken before any LETTER.
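A minimal Python sketch of the line-breaking rule: a break is allowed before any letter and after a ZERO WIDTH SPACE, and never before a tehta, so marks stay with their base. LETTERS and TEHTAR are hypothetical subsets, characters other than ZERO WIDTH SPACE are represented by name, and the function name is hypothetical.

```python
from typing import List

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE
# Hypothetical subsets; a full version would derive these from the block.
LETTERS = {"NUUMEN", "LAMBE", "ALDA", "TINCO", "ANDO", "THUULE",
           "SHORT CARRIER", "LONG CARRIER"}
TEHTAR = {"ACUTE", "AMATICSE"}


def break_opportunities(tokens: List[str]) -> List[int]:
    """Indices i such that a line may break between tokens[i-1] and tokens[i]."""
    breaks: List[int] = []
    for i in range(1, len(tokens)):
        if tokens[i] in TEHTAR:
            continue  # never separate a tehta from its base
        if tokens[i] in LETTERS or tokens[i - 1] == ZWSP:
            breaks.append(i)  # before a letter, or after a zero width space
    return breaks
```

For the Sindarin encoding of neled (NUUMEN LAMBE ACUTE ANDO ACUTE), this allows breaks before LAMBE and before ANDO, and no hyphen is inserted at either point.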
Encoding Structure
The Tengwar block is divided into the following ranges:
U+E000 -> U+E017 Consonants
U+E018 -> U+E033 Miscellaneous letters
U+E040 -> U+E04F Vowel signs
U+E050 -> U+E053 Punctuation
U+E054 -> U+E055 unassigned
U+E056 -> U+E057 Additional vowel signs
U+E058 -> U+E059 unassigned
U+E05A Additional vowel sign
U+E05B unassigned
U+E05C -> U+E05D Miscellaneous letters
U+E05E -> U+E05F unassigned
U+E060 -> U+E061 Punctuation
U+E062 -> U+E06B Numerals
U+E06C -> U+E06E Numeric modifiers
U+E06F -> U+E07F unassigned
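The range table can be turned into a simple lookup, sketched here in Python. The consonant range is taken to start at U+E000, as stated under General Principles; code points inside the block that the table does not list (U+E034 to U+E03F) return None. The function name is hypothetical.

```python
from typing import Optional

BLOCK_START, BLOCK_END = 0xE000, 0xE07F

# (first, last, description), transcribed from the Encoding Structure table.
RANGES = [
    (0xE000, 0xE017, "Consonants"),
    (0xE018, 0xE033, "Miscellaneous letters"),
    (0xE040, 0xE04F, "Vowel signs"),
    (0xE050, 0xE053, "Punctuation"),
    (0xE054, 0xE055, "unassigned"),
    (0xE056, 0xE057, "Additional vowel signs"),
    (0xE058, 0xE059, "unassigned"),
    (0xE05A, 0xE05A, "Additional vowel sign"),
    (0xE05B, 0xE05B, "unassigned"),
    (0xE05C, 0xE05D, "Miscellaneous letters"),
    (0xE05E, 0xE05F, "unassigned"),
    (0xE060, 0xE061, "Punctuation"),
    (0xE062, 0xE06B, "Numerals"),
    (0xE06C, 0xE06E, "Numeric modifiers"),
    (0xE06F, 0xE07F, "unassigned"),
]

# Sanity check: the ranges are sorted and do not overlap.
for (_, end1, _), (start2, _, _) in zip(RANGES, RANGES[1:]):
    assert end1 < start2


def tengwar_range(cp: int) -> Optional[str]:
    """Describe which part of the Tengwar block a code point falls in."""
    if not BLOCK_START <= cp <= BLOCK_END:
        raise ValueError(f"U+{cp:04X} is outside the Tengwar block")
    for first, last, description in RANGES:
        if first <= cp <= last:
            return description
    return None  # inside the block but not listed in the table
```

The block spans 128 code points, which is the eight 16-code-point columns mentioned at the start.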