Tengwar: U+E000 - U+E07F

Proposals 1993-05-09, 1996-09-15; revision 1998-01-10

NOTE: This is still a proposed encoding and has not been standardized. Three discussion papers are available.

The Tengwar script was invented by the philologist and author J. R. R. Tolkien as part of the mythological world he created, and was widely popularized through his works The Lord of the Rings, The Silmarillion, etc. Along with a family of artificial languages and a large corpus of etymological data describing their relationships, the Tengwar script has attracted the attention of a large community of linguists and other enthusiasts interested in this expression of Tolkien's expertise in historical and comparative linguistics. The Tengwar should be treated as a Category D (Attested Extinct) alphabet: there is a relatively limited corpus, and a relatively small (but existent) scholarly body studying it. In order to provide a standard Tengwar character coding for such scholars and enthusiasts, it has been suggested that this character set be included in the Unicode standard and ISO 10646.

Eight columns are reserved to encode the Tengwar. The last column is currently unused, and is reserved for future discoveries in the Tolkien manuscripts. Character names derive from Tolkien's published writings; as usual, long vowels are written double.

General Principles of the Tengwar script

The Tengwar script is a system of consonantal signs without strictly fixed values; their glyphic structure comprises a matrix of potential phonetic relationships, rather than a set of fixed relationships between sound and character. The primary letters (U+E000 - U+E017) are formed of a telco 'stem' and a lúva 'bow'; raising the stem might indicate spirantization of a consonant, or doubling the bow might indicate voicing. Consonants are modified by tehtar 'signs', described below.

A series of "stemless consonants" have been encoded. STEMLESS OORE is used as DIGIT ZERO; STEMLESS VILYA is used as DIGIT ONE; STEMLESS ANNA is used as a vowel in the mode of Beleriand; STEMLESS VALA is as yet unattested, but is included here because of the inherent structure of the script.
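
The regular layout of the primary letters and of the forms derived from them can be read off the code chart as simple arithmetic. The following sketch is only an illustration: the function names and the row/column terminology are assumptions made here, and the offsets are inferred from the chart positions rather than stated by the proposal.

```python
PRIMARY_FIRST = 0xE000  # TENGWAR LETTER TINCO
PRIMARY_LAST = 0xE017   # TENGWAR LETTER VILYA


def primary_letter(row: int, column: int) -> int:
    """Code point of a primary letter, reading the 24 primary
    letters in chart order as 6 rows of 4 (zero-based indices)."""
    if not (0 <= row < 6 and 0 <= column < 4):
        raise ValueError("row must be 0..5 and column 0..3")
    return PRIMARY_FIRST + 4 * row + column


def extended_form(cp: int) -> int:
    """EXTENDED THUULE..UNQUE (U+E028-E02F) sit 0x20 above
    THUULE..UNQUE (U+E008-E00F) in the chart."""
    if not 0xE008 <= cp <= 0xE00F:
        raise ValueError("no EXTENDED form is charted for this letter")
    return cp + 0x20


def stemless_form(cp: int) -> int:
    """STEMLESS OORE..VILYA (U+E030-E033) sit 0x1C above
    OORE..VILYA (U+E014-E017) in the chart."""
    if not 0xE014 <= cp <= 0xE017:
        raise ValueError("no STEMLESS form is charted for this letter")
    return cp + 0x1C
```

For example, primary_letter(1, 0) is ANDO (U+E004), directly below TINCO in the same column, and stemless_form(0xE017) is STEMLESS VILYA (U+E033).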

Tengwar are written from left to right. Tengwar numerals are written from right to left (the least significant digit is on the left). The DECIMAL BASE MARK and DUODECIMAL BASE MARK are applied to the digits to indicate which arithmetic base is used; the DUODECIMAL LEAST SIGNIFICANT DIGIT MARK is used on the least significant digit in a duodecimal expression. The numeric marks are not generally considered optional.
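
The digit order and the numeric marks can be illustrated with a small sketch. Several details below are assumptions rather than part of the proposal: that digits are stored least significant first, that each mark is stored immediately after the digit it applies to, and that in a duodecimal number the least significant digit carries the LEAST SIGNIFICANT DIGIT MARK in place of the base mark. The function name encode_number is hypothetical.

```python
# Digit values 0..11 in order. Zero and one reuse the stemless letters;
# two..eleven are U+E062..U+E06B.
DIGITS = [0xE030, 0xE033] + list(range(0xE062, 0xE06C))

DECIMAL_BASE_MARK = 0xE06C
DUODECIMAL_BASE_MARK = 0xE06D
DUODECIMAL_LSD_MARK = 0xE06E


def encode_number(n: int, base: int = 10) -> str:
    """Return the digits of n, least significant first, each followed
    by a numeric mark (mark placement is an assumption, see lead-in)."""
    if base not in (10, 12):
        raise ValueError("only decimal and duodecimal are charted")
    if n < 0:
        raise ValueError("negative numbers are not covered")

    # Collect digit values starting with the least significant one.
    values = []
    while True:
        n, remainder = divmod(n, base)
        values.append(remainder)
        if n == 0:
            break

    out = []
    for position, value in enumerate(values):
        out.append(chr(DIGITS[value]))
        if base == 10:
            out.append(chr(DECIMAL_BASE_MARK))
        elif position == 0:
            out.append(chr(DUODECIMAL_LSD_MARK))
        else:
            out.append(chr(DUODECIMAL_BASE_MARK))
    return "".join(out)
```

Under these assumptions, decimal 12 is stored as DIGIT TWO, DECIMAL BASE MARK, STEMLESS VILYA, DECIMAL BASE MARK, while the same value in duodecimal is stored as STEMLESS OORE, DUODECIMAL LEAST SIGNIFICANT DIGIT MARK, STEMLESS VILYA, DUODECIMAL BASE MARK.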

No positional variants of the letters exist. Like Arabic, the script is founded on calligraphic handwriting, and many ligatures may be required for high-quality rendering -- though unligatured forms may often be acceptable. No ligatures are encoded here.

Vowels and Other Marks of Pronunciation

Non-spacing marks, generically called tehtar 'signs', indicate vowels or other modifications of consonantal letters. Tehtar are placed above or below consonants, or atop "carriers" when no consonant is present in the required position. The occurrence of a character in the tehtar range, depicted with relation to a dashed circle, constitutes an assertion that this character is intended to be applied via some process to the consonantal character that precedes it in the text stream. General rules for applying non-spacing marks are given in Section 2.5 of the Unicode Standard. In ISO 10646, Level 2 encoding is intended. See the remarks on Modes below.
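
The rule that a tehta applies to the character preceding it in the text stream can be checked mechanically. The sketch below groups a string into a base letter plus its following signs. It assumes that every chart entry named TENGWAR SIGN behaves as a non-spacing mark and that a sign must follow a TENGWAR LETTER (U+E000 - U+E033); the function names are hypothetical.

```python
# Code points whose chart names begin with TENGWAR SIGN (assumed here
# to be non-spacing marks).
SIGNS = set(range(0xE040, 0xE050)) | {0xE056, 0xE057, 0xE058,
                                      0xE05A, 0xE05C, 0xE05D}


def is_sign(ch: str) -> bool:
    return ord(ch) in SIGNS


def is_letter(ch: str) -> bool:
    # Chart entries named TENGWAR LETTER occupy U+E000..U+E033.
    return 0xE000 <= ord(ch) <= 0xE033


def clusters(text: str) -> list[str]:
    """Split text into units of one character plus the signs applied
    to it. A sign that does not follow a letter raises ValueError."""
    result: list[str] = []
    for ch in text:
        if is_sign(ch):
            if not result or not is_letter(result[-1][0]):
                raise ValueError("sign without a preceding letter")
            result[-1] += ch
        else:
            result.append(ch)
    return result
```

A sign at the start of a string is rejected, which is exactly the situation in which a carrier is needed.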

The SHORT CARRIER simply bears the vowel tehta; the LONG CARRIER indicates that the vowel is long, which can also be shown by doubling the vowel sign.

Modes

The morphological structure of a language determines the "mode" in which the Tengwar script is used for it. For instance, the tehtar are placed above or below the preceding consonant in languages in which words tend to end in a vowel; but they are placed above or below the following consonant in languages in which words tend to end in a consonant (compare Quenya nelde 'three', neltildi 'triangle' with Sindarin neled and nelthil).

In accordance with Unicode specifications, however, the tehtar are encoded as non-spacing characters, and so must follow the consonant over which they appear. For Sindarin, this means that the logical order of the backing store does not reflect the true syllabic structure. For instance, the Quenya examples here are encoded NUUMEN-ACUTE-ALDA-ACUTE (n-e-ld-e), and NUUMEN-ACUTE-LAMBE-TINCO-AMATICSE-ALDA-AMATICSE (n-e-l-t-i-ld-i); the Sindarin examples are encoded NUUMEN-LAMBE-ACUTE-ANDO-ACUTE (n-l-e-d-e), and NUUMEN-LAMBE-ACUTE-THUULE-LAMBE-AMATICSE (n-l-e-th-l-i). English is generally written according to a Sindarin-type mode; Italian would be written according to a Quenya-type mode. This mismatch between phonetic order and the order of the backing store is a function of the script's unique representation of modalities, which must be reckoned with apart from the character set itself. Smart input methods, such as are used for some Southeast Asian Brahmic scripts, could solve the problem for Sindarin-type mode input.

In the mode of Beleriand, which uses full vowel letters rather than tehtar, the Sindarin examples are written: OORE-YANTA-LAMBE-YANTA-ANDO (n-e-l-e-d) and OORE-YANTA-LAMBE-THUULE-SHORT CARRIER-LAMBE (n-e-l-th-i-l). Mapping software for conversion between standard-mode and Beleriand-mode Sindarin will be required.
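
The difference between the Quenya-type and Sindarin-type storage orders can be sketched as a reordering step of the kind a smart input method might perform. This is a minimal illustration, not a proposed algorithm: the token format, the function name encode and the use of SHORT CARRIER for a vowel with no consonant in the required position are assumptions, and character names stand in for code points.

```python
def encode(tokens: list[tuple[str, str]], mode: str) -> list[str]:
    """tokens is a phonetic-order list of ("C", letter name) or
    ("V", sign name). Returns names in backing-store order."""
    out: list[str] = []
    if mode == "quenya":
        # The vowel sign follows the consonant that precedes it in speech.
        after_consonant = False
        for kind, name in tokens:
            if kind == "C":
                out.append(name)
                after_consonant = True
            else:
                if not after_consonant:
                    out.append("SHORT CARRIER")
                out.append(name)
                after_consonant = False
    elif mode == "sindarin":
        # The vowel sign sits on the consonant that follows it in speech,
        # so it is stored after that consonant.
        pending = None
        for kind, name in tokens:
            if kind == "V":
                if pending is not None:
                    out += ["SHORT CARRIER", pending]
                pending = name
            else:
                out.append(name)
                if pending is not None:
                    out.append(pending)
                    pending = None
        if pending is not None:
            out += ["SHORT CARRIER", pending]
    else:
        raise ValueError("mode must be 'quenya' or 'sindarin'")
    return out


nelde = [("C", "NUUMEN"), ("V", "ACUTE"), ("C", "ALDA"), ("V", "ACUTE")]
neled = [("C", "NUUMEN"), ("V", "ACUTE"), ("C", "LAMBE"),
         ("V", "ACUTE"), ("C", "ANDO")]
```

With these inputs, encode(nelde, "quenya") gives NUUMEN-ACUTE-ALDA-ACUTE and encode(neled, "sindarin") gives NUUMEN-LAMBE-ACUTE-ANDO-ACUTE, matching the sequences quoted above.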

Punctuation

Tengwar punctuation characters are considered to be unique to the script and are coded in the Tengwar block. Some composition of punctuation occurs in Tengwar: DOUBLE PUSTA can be followed by SECTION MARK, LONG SECTION MARK, PUSTA, and DOUBLE PUSTA.

Sometimes word space is not used; word separation may be achieved in that case with U+200B, ZERO WIDTH SPACE. Hyphenation is not used; words may be broken before any LETTER.
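
The word separation and line breaking rules above can be sketched as follows. The helper names are hypothetical, and LETTER is taken here to mean the chart entries named TENGWAR LETTER (U+E000 - U+E033), which is an assumption about the intended scope of the rule.

```python
ZERO_WIDTH_SPACE = "\u200B"


def join_words(words: list[str]) -> str:
    """Join words without visible spacing, keeping word boundaries
    recoverable by means of U+200B."""
    return ZERO_WIDTH_SPACE.join(words)


def break_opportunities(text: str) -> list[int]:
    """Indices at which a line may be broken: before any TENGWAR
    LETTER except the first character of the text."""
    return [i for i, ch in enumerate(text)
            if i > 0 and 0xE000 <= ord(ch) <= 0xE033]
```

Because signs are never break points under this rule, a letter and the tehtar that follow it always stay on the same line.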

Encoding Structure

The Tengwar block is divided into the following ranges:

	U+E000 -> E017 Consonants
	U+E018 -> E033 Miscellaneous letters
	U+E040 -> E04F Vowel signs
	U+E050 -> E055 Punctuation
	U+E056 -> E058 Additional vowel signs
	U+E059         unassigned
	U+E05A         Additional vowel sign
	U+E05B         unassigned
	U+E05C -> E05D Miscellaneous letters
	U+E05E -> E061 unassigned
	U+E062 -> E06B Numerals
	U+E06C -> E06E Numeric modifiers
	U+E06F -> E07F unassigned

U+E000	TENGWAR LETTER TINCO
U+E001	TENGWAR LETTER PARMA
U+E002	TENGWAR LETTER CALMA
U+E003	TENGWAR LETTER QUESSE
U+E004	TENGWAR LETTER ANDO
U+E005	TENGWAR LETTER UMBAR
U+E006	TENGWAR LETTER ANGA
U+E007	TENGWAR LETTER UNGWE
U+E008	TENGWAR LETTER THUULE (suule)
U+E009	TENGWAR LETTER FORMEN
U+E00A	TENGWAR LETTER HARMA (aha)
U+E00B	TENGWAR LETTER HWESTA
U+E00C	TENGWAR LETTER ANTO
U+E00D	TENGWAR LETTER AMPA
U+E00E	TENGWAR LETTER ANCA
U+E00F	TENGWAR LETTER UNQUE
U+E010	TENGWAR LETTER NUUMEN
U+E011	TENGWAR LETTER MALTA
U+E012	TENGWAR LETTER NOLDO (ngoldo)
U+E013	TENGWAR LETTER NWALME (ngwalme)
U+E014	TENGWAR LETTER OORE
U+E015	TENGWAR LETTER VALA
U+E016	TENGWAR LETTER ANNA
U+E017	TENGWAR LETTER VILYA (wilya)
U+E018	TENGWAR LETTER ROOMEN
U+E019	TENGWAR LETTER ARDA
U+E01A	TENGWAR LETTER LAMBE
U+E01B	TENGWAR LETTER ALDA
U+E01C	TENGWAR LETTER SILME
U+E01D	TENGWAR LETTER SILME NUQUERNA
U+E01E	TENGWAR LETTER AARE (aaze, esse)
U+E01F	TENGWAR LETTER AARE NUQUERNA (aaze n., esse n.)
U+E020	TENGWAR LETTER HYARMEN
U+E021	TENGWAR LETTER HWESTA SINDARINWA
U+E022	TENGWAR LETTER YANTA
U+E023	TENGWAR LETTER UURE
U+E024	TENGWAR LETTER HALLA
U+E025	TENGWAR LETTER SHORT CARRIER
U+E026	TENGWAR LETTER LONG CARRIER
U+E027	TENGWAR LETTER ANNA SINDARINWA
U+E028	TENGWAR LETTER EXTENDED THUULE
U+E029	TENGWAR LETTER EXTENDED FORMEN
U+E02A	TENGWAR LETTER EXTENDED HARMA
U+E02B	TENGWAR LETTER EXTENDED HWESTA
U+E02C	TENGWAR LETTER EXTENDED ANTO
U+E02D	TENGWAR LETTER EXTENDED AMPA
U+E02E	TENGWAR LETTER EXTENDED ANCA
U+E02F	TENGWAR LETTER EXTENDED UNQUE
U+E030	TENGWAR LETTER STEMLESS OORE (digit zero)
U+E031	TENGWAR LETTER STEMLESS VALA
U+E032	TENGWAR LETTER STEMLESS ANNA
U+E033	TENGWAR LETTER STEMLESS VILYA (digit one)
U+E034	(This position shall not be used)
U+E035	(This position shall not be used)
U+E036	(This position shall not be used)
U+E037	(This position shall not be used)
U+E038	(This position shall not be used)
U+E039	(This position shall not be used)
U+E03A	(This position shall not be used)
U+E03B	(This position shall not be used)
U+E03C	(This position shall not be used)
U+E03D	(This position shall not be used)
U+E03E	(This position shall not be used)
U+E03F	(This position shall not be used)
U+E040	TENGWAR SIGN THREE DOTS ABOVE
U+E041	TENGWAR SIGN THREE DOTS BELOW
U+E042	TENGWAR SIGN TWO DOTS ABOVE
U+E043	TENGWAR SIGN TWO DOTS BELOW
U+E044	TENGWAR SIGN AMATICSE (dot above)
U+E045	TENGWAR SIGN NUNTICSE (dot below)
U+E046	TENGWAR SIGN ACUTE (andaith, long mark)
U+E047	TENGWAR SIGN DOUBLE ACUTE
U+E048	TENGWAR SIGN RIGHT CURL
U+E049	TENGWAR SIGN DOUBLE RIGHT CURL
U+E04A	TENGWAR SIGN LEFT CURL
U+E04B	TENGWAR SIGN DOUBLE LEFT CURL
U+E04C	TENGWAR SIGN NASALIZER
U+E04D	TENGWAR SIGN DOUBLER
U+E04E	TENGWAR SIGN TILDE
U+E04F	TENGWAR SIGN BREVE
U+E050	TENGWAR PUSTA (putta, stop)
U+E051	TENGWAR DOUBLE PUSTA (putta)
U+E052	TENGWAR EXCLAMATION MARK
U+E053	TENGWAR QUESTION MARK
U+E054	TENGWAR SECTION MARK
U+E055	TENGWAR LONG SECTION MARK
U+E056	TENGWAR SIGN LONG CARRIER BELOW
U+E057	TENGWAR SIGN DOUBLE ACUTE BELOW
U+E058	TENGWAR SIGN RIGHT CURL BELOW
U+E059	(This position shall not be used)
U+E05A	TENGWAR SIGN LEFT CURL BELOW
U+E05B	(This position shall not be used)
U+E05C	TENGWAR SIGN LEFT FOLLOWING SILME
U+E05D	TENGWAR SIGN RIGHT FOLLOWING SILME
U+E05E	(This position shall not be used)
U+E05F	(This position shall not be used)
U+E060	(This position shall not be used)
U+E061	(This position shall not be used)
U+E062	TENGWAR DIGIT TWO
U+E063	TENGWAR DIGIT THREE
U+E064	TENGWAR DIGIT FOUR
U+E065	TENGWAR DIGIT FIVE
U+E066	TENGWAR DIGIT SIX
U+E067	TENGWAR DIGIT SEVEN
U+E068	TENGWAR DIGIT EIGHT
U+E069	TENGWAR DIGIT NINE
U+E06A	TENGWAR DUODECIMAL DIGIT TEN
U+E06B	TENGWAR DUODECIMAL DIGIT ELEVEN
U+E06C	TENGWAR DECIMAL BASE MARK
U+E06D	TENGWAR DUODECIMAL BASE MARK
U+E06E	TENGWAR DUODECIMAL LEAST SIGNIFICANT DIGIT MARK
U+E06F	(This position shall not be used)
U+E070	(This position shall not be used)
U+E071	(This position shall not be used)
U+E072	(This position shall not be used)
U+E073	(This position shall not be used)
U+E074	(This position shall not be used)
U+E075	(This position shall not be used)
U+E076	(This position shall not be used)
U+E077	(This position shall not be used)
U+E078	(This position shall not be used)
U+E079	(This position shall not be used)
U+E07A	(This position shall not be used)
U+E07B	(This position shall not be used)
U+E07C	(This position shall not be used)
U+E07D	(This position shall not be used)
U+E07E	(This position shall not be used)
U+E07F	(This position shall not be used)
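
The chart above is regular enough to be read mechanically. The following sketch parses lines of this form into a mapping from code point to name and optional alias; the function name parse_chart and the treatment of a trailing parenthesis as an alias list are assumptions made for illustration.

```python
import re

CHART_LINE = re.compile(r"^U\+([0-9A-F]{4})\s+(.*)$")
UNUSED = "(This position shall not be used)"


def parse_chart(lines):
    """Map each assigned code point to (name, alias or None).
    Lines that are not chart entries and unused positions are skipped."""
    table = {}
    for line in lines:
        match = CHART_LINE.match(line.strip())
        if not match:
            continue
        code_point = int(match.group(1), 16)
        rest = match.group(2).strip()
        if rest == UNUSED:
            continue
        # Split "NAME (alias)" into its two parts; alias may be absent.
        name, _, alias = rest.partition(" (")
        table[code_point] = (name, alias.rstrip(")") or None)
    return table


sample = [
    "U+E000\tTENGWAR LETTER TINCO",
    "U+E008\tTENGWAR LETTER THUULE (suule)",
    "U+E034\t(This position shall not be used)",
]
chart = parse_chart(sample)
```

Here chart contains entries for U+E000 and U+E008 only, with "suule" recorded as the alias of THUULE.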

HTML Michael Everson, Evertype, Cnoc na Sceiche, Leac an Anfa, Cathair na Mart, Co. Mhaigh Eo, Éire, 2006-05-28