CEN/TC304 N634R FIFTH DRAFT
Date: 1997-02-01


Title: Repertoires of characters used for writing the indigenous languages of Europe - Fifth Draft for CEN/TC304 Project 11

Source: Michael Everson, Everson Gunn Teoranta (IE)
Status: Expert Contribution
Action: For consideration by CEN/TC304/WG2


TECHNICAL REPORT
REPORT TECHNIQUE
TECHNISCHER BERICHT

Descriptors: Data processing, information interchange, text processing, text communication, graphic characters, character sets, representation of characters, coded character sets, natural language.

English version

Repertoires of characters used for writing the indigenous languages of Europe


NOTE: THIS DOCUMENT IS A PRELIMINARY DRAFT AND IS A WORK IN PROGRESS ONLY. COMMENT IS INVITED BUT THE DATA CONTAINED HEREIN MAY CHANGE AND SHOULD NOT BE USED FOR DEVELOPMENT AT THIS TIME.

CEN members are the national bodies of Austria, Belgium, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom.

CEN

European Committee for Standardization
Comité Européen de Normalisation
Europäisches Komitee für Normung

Central Secretariat: rue de Stassart 36, B-1050 Brussels
©1997 Michael Everson. May be copied by CEN/TC304 members.
Ref. No. prTR XXX:1996 X


Contents

Foreword
0 Introduction
1 Scope
1.1 The geographical area of Europe
1.2 The languages of Europe
1.3 The scripts of Europe
2.0 Writing systems
2.1 Numbers
2.2 Punctuation
2.3 Alphabetic repertoires
Annex A Genetic classification of European languages
Annex B Names of European languages
Annex C Unwritten European languages
Annex D Administrative units of Europe
Annex E Mapping of European letters to ISO/IEC 10646-1
Annex F Two-letter language codes for European languages
Annex G Bibliography

Foreword.

This Technical Report was prepared by CEN/TC304/WG2 "Cultural Elements" to provide information on the letters used to write the indigenous languages of Europe in an Information Technology context.


0. Introduction.

This Technical Report provides a source of linguistic data for the indigenous languages of Europe. The use of the term "indigenous" (or "autochthonous") indicates that this report covers the languages native to the European geographical area. Other languages, more recently imported to Europe (Vietnamese or Bengali, for instance), are not covered by this report. Their exclusion is not intended to imply any bias whatsoever against such "immigrant" languages. The fact remains, however, that it is relatively easy to get information on the orthography of Vietnamese and Bengali, while many of Europe's indigenous minority languages have been poorly served -- if acknowledged at all -- in the area of Information Technology; this Technical Report serves to remedy that oversight.

The main function of this Technical Report is to give a practical guide to European writing systems. The characters which are, and in some cases were, used to write each of the languages of Europe are included here, as far as it has been possible to find information on them. Some of Europe's languages (particularly in the Caucasus) still have no tradition of writing and so are not represented in the main body of the report, though other information on them is provided in Annex C for the sake of completeness. Likewise, some languages have used, or continue to use, more than one writing system, and this too is reflected here.

Personal acknowledgements are rarely presented in CEN Technical Reports. But this Technical Report could not have been compiled without the input of many, many people, and the difficult nature of the material presented here begs for acknowledgement of the abundant expertise which contributed to the final document. The following people are gratefully acknowledged by the editor, Michael Everson (EGT): Baldur Jónsson (Íslensk málstöð), Elzbieta Broma-Wrzesień (PET), Gerhard Budin (ÖNORM), Borka Jerman-Blazic (IJS), Evángelos Melagrákis (ELOT), Klaas Ruppel (KKT), Alexandra Stătescu (RNORM), Trond Trosterud (Barentssekretariat), Johan van Wingen (NNI), Þorvarður Kári Ólafsson (STRÍ). Very special thanks are due to Judy Nye (University Research Library, University of California, Los Angeles), for granting the editor a week's indulgence as he photocopied a mountain of material.


1. Scope. This Technical Report gives information on characters used in Europe's indigenous languages. To do so, it must first define the geographical area covered. It is often said that "everybody knows where Europe is", but in practice the boundaries are less clear than one would think. The nations of the Caucasus, for instance, are sometimes counted as part of Europe and sometimes not.


1.1 The geographical area of Europe. For the purposes of this Technical Report, we have used the following geographical definition of Europe: Languages covered in this report also include languages found in the following areas: Information concerning the administrative units covered by this geographical definition can be found in Annex D. It is important to note that this is a geolinguistic survey. It is not a political survey. The area defined here may be seen on page xiv, "Geographical Comparisons", in The Times Atlas of the World, 1990 edition.


1.2 The languages of Europe. A convenient way of enumerating the languages of Europe is to do so by linguistic family. The classification used in this Technical Report is based on that found in Ruhlen 1992, but differs from it in some respects. (Ruhlen 1992 may not be perfect or universally accepted, in the field of linguistics at least, but it is reasonably comprehensive and its bibliography is helpful for further study.)

Most Europeans speaking indigenous European languages speak Indo-European or Uralic languages, but there are six other language families represented in Europe. This report is intended to be neutral with respect to language; its task is to document languages, not to rank them in any particular way. Accordingly, languages are listed by family and subfamily. A fuller genetic classification for these languages is given in Annex A. The names of these languages are listed in English in this part of the Technical Report, but can be found in their original spellings (with Latin transliterations) in Annex B. An asterisk (*) following a language name indicates that the language has no standard literary orthography and will be found in Annex C. Inclusion in Annex C means only that a standard literary alphabet does not exist. Some of these languages have rather large populations of speakers, while some of the languages with standard orthographies have very few.


1.3. The scripts of Europe. The languages listed in §1.2 employ the Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, and Georgian scripts. Some of the scripts have unique stylistic variants: Latin has Roman, Gaelic, and Fraktur variants (of which Roman is the most commonly employed); Cyrillic has XXX and Slavonic variants (of which XX is the most commonly employed); Georgian has Mxedruli and Xucuri variants (of which Mxedruli is the most commonly employed).


2.0. Writing systems. European writing systems are alphabetic, not syllabic (like Ethiopic or Cherokee) or logographic (like Chinese). Together with its own alphabet, each of Europe's languages uses a set of punctuation marks which, in general, is standardized between the scripts. (Exceptions to this are discussed in the relevant sections below.) European alphabetic scripts have a fixed number of basic letters, to which additional letters are appended for use with particular languages. Some of these additional letters are also basic letters which cannot be identified with any other letter; others are derived letters created either by some deformation of the basic letter itself or by adding some diacritic mark or sign to it. The Latin, Greek, Cyrillic, and Armenian alphabets have case, that is, almost all letters have both a capital and a small form. Hebrew, Arabic, and Georgian do not share this feature. Latin, Greek, Cyrillic, Armenian, and Georgian are written from left to right; Hebrew and Arabic are written from right to left. Hebrew and Arabic are non-European scripts used to write some European languages. Some of the indigenous languages of Europe were formerly written with other scripts, such as Ogham, Runic, Linear B, and Iberian.
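
The directionality described above can be checked against the bidirectional classes recorded in the Unicode character database. The following is a minimal sketch only: it assumes Python's standard unicodedata module, and the one sample letter chosen per script is our own selection, not data from this report.

```python
import unicodedata

# One sample letter per script named in this section. The samples are an
# illustrative choice of ours, written as escapes to avoid display problems.
SAMPLES = {
    "Latin": "a",
    "Greek": "\u03b1",     # GREEK SMALL LETTER ALPHA
    "Cyrillic": "\u044f",  # CYRILLIC SMALL LETTER YA
    "Armenian": "\u0561",  # ARMENIAN SMALL LETTER AYB
    "Georgian": "\u10d0",  # GEORGIAN LETTER AN
    "Hebrew": "\u05d0",    # HEBREW LETTER ALEF
    "Arabic": "\u0628",    # ARABIC LETTER BEH
}


def writing_direction(ch: str) -> str:
    """Classify a letter by its Unicode bidirectional class."""
    # "R" (Hebrew) and "AL" (Arabic letters) are the right-to-left classes.
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        return "right-to-left"
    return "left-to-right"


directions = {script: writing_direction(ch) for script, ch in SAMPLES.items()}
for script, direction in directions.items():
    print(f"{script:9} {direction}")
```

Case is deliberately left out of this sketch, because the case mappings reported for Georgian depend on the version of the character database in use.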


2.1 Numbers. Decimal digits are in standard use in European languages, though the Arabic script has unique glyphs for these. All European scripts also have a system of non-decimal numbers derived by giving numeric values to the letters. Decimal numerals are used for calculations; alphabetic numerals are often used for pagination and indexing and so forth. (This Technical Report does not give further information on alphabetic numeral systems; it can be found in works like Haarmann 1993 and Nakanishi 19xx.)
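
This report does not describe any alphabetic numeral system in detail. Purely as an illustration of letters carrying numeric values, the sketch below assumes the familiar Roman numerals of the Latin script (our choice of example, not a system taken from this report) and formats a number in lower case, as is common for the pagination of front matter. The function name to_roman is hypothetical.

```python
# Value/letter pairs in descending order, including the subtractive forms.
ROMAN_VALUES = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n: int) -> str:
    """Format 1..3999 as a lower-case Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("Roman numerals here cover 1..3999 only")
    parts = []
    for value, letters in ROMAN_VALUES:
        count, n = divmod(n, value)  # how many times this value fits
        parts.append(letters * count)
    return "".join(parts)


print(to_roman(14))    # xiv
print(to_roman(1997))  # mcmxcvii
```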


2.2 Punctuation. Punctuation marks to indicate major breaks in text are relatively ancient. In the oldest texts punctuation and word spacing were often not used; the use of various dots and dashes developed into the "standard" European repertoire of punctuation marks in use today. Most computers nowadays supply the following repertoire:

,	COMMA
.	FULL STOP
;	SEMICOLON
:	COLON
!	EXCLAMATION MARK
?	QUESTION MARK
/	SOLIDUS
\	REVERSE SOLIDUS
"	DOUBLE QUOTATION MARK (and its curly variants)
#	NUMBER SIGN
%	PERCENT SIGN
&	AMPERSAND
¶	PILCROW SIGN
@	COMMERCIAL AT SIGN
§	SECTION SIGN
'	APOSTROPHE
(	LEFT PARENTHESIS
)	RIGHT PARENTHESIS
*	ASTERISK
[	LEFT SQUARE BRACKET
]	RIGHT SQUARE BRACKET
«	LEFT-POINTING DOUBLE ANGLED QUOTATION MARK
»	RIGHT-POINTING DOUBLE ANGLED QUOTATION MARK
<	LESS-THAN SIGN
>	GREATER-THAN SIGN
-	HYPHEN-MINUS
–	EN DASH
—	EM DASH
“	LEFT CURLY QUOTATION MARK
”	RIGHT CURLY QUOTATION MARK
‘	LEFT QUOTATION MARK
’	RIGHT QUOTATION MARK
·	MIDDLE DOT
‚	LOW SINGLE QUOTATION MARK
„	LOW DOUBLE QUOTATION MARK
{	LEFT CURLY BRACKET
|	VERTICAL BAR
}	RIGHT CURLY BRACKET
_	LOW LINE
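
The names used in this list are close to, but not always identical with, the character names recorded in the Unicode character database, which is aligned with ISO/IEC 10646. As a hedged sketch, assuming Python's standard unicodedata module, the recorded name and code position of a few of these marks can be looked up as follows; the selection of marks is ours.

```python
import unicodedata

# A hand-picked subset of the punctuation marks listed in this section.
marks = [",", ".", "/", "\\", "@", "\u00ab", "\u00bb", "_"]

# Map each mark to the character name recorded in the character database.
names = {mark: unicodedata.name(mark) for mark in marks}

for mark, name in names.items():
    print(f"U+{ord(mark):04X}\t{mark}\t{name}")
```

Where the looked-up name differs from the one given in the list, for example COMMERCIAL AT rather than COMMERCIAL AT SIGN, the database name is the one to use as an identifier.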

2.3 Alphabetic repertoires. This section lists each language and gives its "alphabet", including digraphs and alphabetical order, when this information is available. Some languages, such as Welsh, treat a string of two characters as a single letter for alphabetizing: thus, for Welsh "a b c ch d...", all words beginning with "ch" follow all words beginning with "cy" and precede words beginning with "da". In the repertoires, parentheses are used to indicate either equivalences or basic letters not used natively in the orthography. An example of the first instance: Irish Gaelic sorts "A a (Á á), B b, C c..." while Icelandic sorts "A a, Á á, B b, C c...". This indicates that "Á" is a kind of "A" and is sorted with "A" in Irish Gaelic, but is a separate letter and is sorted as a separate letter in Icelandic. An example of the second instance: in Esperanto "... V v, (W w), (X x), (Y y), Z z". This indicates that in Esperanto the letters "W", "X", and "Y" are not used, except to write the "foreign" names of people and places.

The placement of the comma with respect to the parentheses is therefore especially to be noted in these repertoires.
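
The sorting behaviour described in this section can be sketched in code. The example below is a simplified illustration under stated assumptions, not a full collation algorithm: the function make_sort_key is hypothetical, the Welsh alphabet is the conventional one supplied by us, the Irish Gaelic and Icelandic alphabets are cut down to a few letters, and the strings sorted with them are arbitrary test strings rather than real words.

```python
def make_sort_key(alphabet, equivalents=None):
    """Build a sort key from an ordered list of letters (digraphs allowed).

    `equivalents` maps a letter to the basic letter it is sorted with,
    mirroring parenthesised repertoire entries such as "A a (Á á)".
    """
    equivalents = equivalents or {}
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)

    def key(word):
        word = word.lower()
        ranks, pos = [], 0
        while pos < len(word):
            # Try the longest candidate first so "ch" wins over "c" + "h".
            for size in range(longest, 0, -1):
                raw = word[pos:pos + size]
                if len(raw) < size:
                    continue  # not enough characters left for this size
                letter = equivalents.get(raw, raw)
                if letter in rank:
                    ranks.append(rank[letter])
                    pos += size
                    break
            else:
                ranks.append(len(alphabet))  # unknown symbols sort last
                pos += 1
        return ranks

    return key


# Conventional Welsh alphabet (our assumption; the report quotes only its start).
WELSH = "a b c ch d dd e f ff g ng h i j l ll m n o p ph r rh s t th u w y".split()
welsh_key = make_sort_key(WELSH)

# Reduced alphabets: "á" is folded into "a" for Irish, separate for Icelandic.
irish_key = make_sort_key(["a", "b", "c"], equivalents={"á": "a"})
icelandic_key = make_sort_key(["a", "á", "b", "c"])

print(sorted(["dafad", "chwaer", "cyw", "cath"], key=welsh_key))
print(sorted(["ba", "áb", "ac"], key=irish_key))
print(sorted(["ba", "áb", "ac"], key=icelandic_key))
```

The greedy longest-match rule is a simplification: it cannot tell a true Welsh digraph from two adjacent letters that happen to spell one (n followed by g, for instance), and it does not break ties between a letter and its folded equivalent.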

To find a language in the repertoire, look up its name in the alphabetical list given in Annex F, and go to its corresponding number in the list below.


Coptic

Assyrian. No data available to the editor.

Maltese


Basque
Georgian

Judeo-Georgian. No data available to the editor.

Svan

Laz

Mingrelian. No data available to the editor.

Abaza

Abxaz

Adyge

Kabardian

Chechen

Ingush

Avar

Lak

Xinalug

Lezgian

Tabasaran

Rutul

Kryts. No data available to the editor.

Udi


Greenlandic
Armenian

Romani
The state of affairs of Romani orthography is complex but the International Romani Union has been making recommendations and apparently decisions with regard to the alphabet were taken in 1990 (Liégeois 1994).
Romani

Talysh. No data available to the editor.

Ossetian

Kirmanji. No data available to the editor.

Kurdish

Judeo-Kurdish. No data available to the editor.

Tati

Albanian

Arvanite. No data available to the editor. Possibly identical to Albanian.

Greek
Monotonikó orthography is given here.

Tsakonian

Latin

Sardinian. No data available to the editor.

Esperanto

Istro-Romanian

Romanian

Moldavian was written in the Cyrillic script until ca. 1993.
Moldavian

Arumanian

Megleno-Romanian

Corsican. No data available to the editor.

Italian

Friulian

Ladin

Romansh

Franco-Provençal (Duraffour 1969)
Duraffour gives a scientific transcription of Franco-Provençal sounds in his dictionary, but this is not a practical alphabet. Ordinary texts written in Franco-Provençal probably conform to French orthographic habits.

French

Walloon. No data available to the editor.

Occitan (Provençal). No data available to the editor.

Catalan

Spanish

Aragonese. No data available to the editor.

Ladino

Asturian. No data available to the editor.

Galician

Portuguese

Irish Gaelic

Scottish Gaelic

Manx Gaelic

Breton

Welsh

Cornish

Danish

Swedish

Älvdalska
As Swedish, but it is possible that V and W are distinguished as separate letters in sorting.

Norwegian

Icelandic

Faroese

German

Yiddish

Luxemburgish

Dutch/Flemish

Low German. No data available to the editor.

West Frisian

Frisian, East. No data available to the editor.

Frisian, North. No data available to the editor.

English

Latvian

Lithuanian

Belarussian

Russian

Ukrainian

Rusinian. No data available to the editor.

Polish

Kashubian (Sychta 1967, Lorentz 1958)
Sychta and Lorentz give two scientific transcriptions for Kashubian.

Upper Sorbian

Lower Sorbian

Czech

Slovak

Old Church Slavonic

Slovenian

Croatian

Serbian

Macedonian

Bulgarian

Pomak. No data available to the editor.


Kalmyk
Chuvash

Gagauz

Turkish

Crimean Turkish. No data available to the editor.

Azerbaijani

Bashkir

Karachay

Balkar

Karaim

Kumyk

Kazan Tatar

Crimean Tatar

Qazaq

Nogai


Nenets

Hungarian

Udmurt

Komi

Komi-Permyak. No data available to the editor.

Mari, Meadow

Mari, Mountain

Erzya Mordvin

Moksha Mordvin

Northern Sami

Lule Sami

Inari Sami

Skolt Sami

Kildin Sami

South Sami

Finnish

Ingrian

Karelian. No data available to the editor.

Olonets. No data available to the editor.

Ludic. No data available to the editor.

Veps

Votic

Estonian

Livonian. No data available to the editor.


Annex A. Genetic classification of European languages

Afro-Asiatic languages

Afro-Asiatic: Ancient Egyptian: Coptic
Afro-Asiatic: Semitic: West: Central: Aramaic: Assyrian
Afro-Asiatic: Semitic: West: Central: Arabo-Canaanite: Arabic: Maltese

Basque

Basque

Caucasian languages

Caucasian: South: Georgian
Caucasian: South: Judeo-Georgian
Caucasian: South: Svan
Caucasian: South: Zan: Mingrelian
Caucasian: South: Zan: Laz
Caucasian: North: Northwest: Ubyx *
Caucasian: North: Northwest: Abxaz-Abaza: Abaza
Caucasian: North: Northwest: Abxaz-Abaza: Abxaz
Caucasian: North: Northwest: Circassian: Adyge
Caucasian: North: Northwest: Circassian: Kabardian
Caucasian: North: Northeast: Nax: Bats *
Caucasian: North: Northeast: Nax: Chechen-Ingush: Chechen
Caucasian: North: Northeast: Nax: Chechen-Ingush: Ingush
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Avar: Avar
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Andi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Botlix
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Godoberi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Chamalal
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Bagulal *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Tindi
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Karata *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Axvax *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Xvarshi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Dido-Hinux: Tsez *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Dido-Hinux: Hinux *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Bezhta-Hunzib: Bezhta *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Bezhta-Hunzib: Hunzib *
Caucasian: North: Northeast: Dagestan: Lak-Dargwa: Lak
Caucasian: North: Northeast: Dagestan: Lak-Dargwa: Dargwa *
Caucasian: North: Northeast: Dagestan: Lezgian: Archi *
Caucasian: North: Northeast: Dagestan: Lezgian: Xinalug
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Lezgian
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Tabasaran
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Agul *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Rutul
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Tsaxur *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Kryts
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Budux *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Udi

Eskimo-Aleut languages

Eskimo-Aleut: Eskimo: Inuit: Greenlandic

Indo-European languages

Indo-European: Armenian
Indo-European: Indo-Iranian: Indic: Romany: Romani
Indo-European: Indo-Iranian: Iranian: East: Northeast: West Scythian: Ossetian
Indo-European: Indo-Iranian: Iranian: West: Northwest: Talysh: Talysh
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Kirmanji
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Kurdi
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Judeo-Kurdish
Indo-European: Indo-Iranian: Iranian: West: Southwest: Tati: Tati
Indo-European: Albanian: Albanian
Indo-European: Albanian: Arvanite
Indo-European: Greek: Greek
Indo-European: Greek: Tsakonian
Indo-European: Italic: Latino-Faliscan: Latin
Indo-European: Italic: Latino-Faliscan: Romance: Sardinian: Sardinian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: Artificial: Esperanto
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: North: Istro-Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: North: Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: South: Arumanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: South: Megleno-Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Italic: Italian: Corsican
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Italic: Italian: Italian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Friulian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Ladin
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Romansh
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: Franco-Provençal
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: French
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: Walloon
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: South: Occitan
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: East: Catalan
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Spanish
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Aragonese
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Ladino
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Asturian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Galician
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Portuguese
Indo-European: Celtic: Insular: Goidelic: Irish Gaelic
Indo-European: Celtic: Insular: Goidelic: Scottish Gaelic
Indo-European: Celtic: Insular: Goidelic: Manx Gaelic
Indo-European: Celtic: Insular: Brythonic: Breton
Indo-European: Celtic: Insular: Brythonic: Welsh
Indo-European: Celtic: Insular: Brythonic: Cornish
Indo-European: Germanic: North: East: Danish
Indo-European: Germanic: North: East: Swedish
Indo-European: Germanic: North: East: Älvdalska
Indo-European: Germanic: North: West: Norwegian
Indo-European: Germanic: North: West: Icelandic
Indo-European: Germanic: North: West: Faroese
Indo-European: Germanic: West: Continental: East: German
Indo-European: Germanic: West: Continental: East: Yiddish
Indo-European: Germanic: West: Continental: East: Luxemburgish
Indo-European: Germanic: West: Continental: West: Dutch/Flemish
Indo-European: Germanic: West: Continental: West: Low German
Indo-European: Germanic: West: North Sea: Frisian: West Frisian
Indo-European: Germanic: West: North Sea: Frisian: East Frisian
Indo-European: Germanic: West: North Sea: Frisian: North Frisian
Indo-European: Germanic: West: North Sea: English: English
Indo-European: Germanic: West: North Sea: English: Scots
Indo-European: Balto-Slavic: Baltic: East: Latvian
Indo-European: Balto-Slavic: Baltic: East: Lithuanian
Indo-European: Balto-Slavic: Slavic: East: North: Belarussian
Indo-European: Balto-Slavic: Slavic: East: North: Russian
Indo-European: Balto-Slavic: Slavic: East: South: Ukrainian
Indo-European: Balto-Slavic: Slavic: East: South: Rusinian
Indo-European: Balto-Slavic: Slavic: West: North: Polish
Indo-European: Balto-Slavic: Slavic: West: North: Kashubian
Indo-European: Balto-Slavic: Slavic: West: Central: Upper Sorbian
Indo-European: Balto-Slavic: Slavic: West: Central: Lower Sorbian
Indo-European: Balto-Slavic: Slavic: West: South: Czech
Indo-European: Balto-Slavic: Slavic: West: South: Slovak
Indo-European: Balto-Slavic: Slavic: South: Old Church Slavonic
Indo-European: Balto-Slavic: Slavic: South: West: Slovene
Indo-European: Balto-Slavic: Slavic: South: West: Serbo-Croatian
Indo-European: Balto-Slavic: Slavic: South: East: Macedonian
Indo-European: Balto-Slavic: Slavic: South: East: Bulgarian
Indo-European: Balto-Slavic: Slavic: South: East: Pomak

Mongolian languages

Mongolian: East: Oirat-Khalkha: Oirat-Kalmyk: Kalmyk

Turkic languages

Turkic: Bolgar: Chuvash
Turkic: Common Turkic: South: Gagauz
Turkic: Common Turkic: South: Turkish
Turkic: Common Turkic: South: Crimean Turkish
Turkic: Common Turkic: South: Azerbaijani: Azerbaijani
Turkic: Common Turkic: West: Bashkir
Turkic: Common Turkic: West: Kumyk-Karachay: Karachay
Turkic: Common Turkic: West: Kumyk-Karachay: Balkar
Turkic: Common Turkic: West: Kumyk-Karachay: Karaim
Turkic: Common Turkic: West: Kumyk-Karachay: Kumyk
Turkic: Common Turkic: West: Tatar: Kazan Tatar
Turkic: Common Turkic: West: Tatar: Baraba Tatar *
Turkic: Common Turkic: West: Tatar: Crimean Tatar
Turkic: Common Turkic: Central: Nogai
Turkic: Common Turkic: Central: Qazaq

Uralic languages

Uralic: Samoyed: North: Tundra Nenets
Uralic: Finno-Ugric: Ugric: Hungarian
Uralic: Finno-Ugric: Finno-Permic: Permic: Udmurt
Uralic: Finno-Ugric: Finno-Permic: Permic: Komi
Uralic: Finno-Ugric: Finno-Permic: Permic: Komi-Permyak
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mari: Hill Mari
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mari: Meadow Mari
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mordvin: Erza
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mordvin: Moksha
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: North Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: Lule Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: Pite Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Inari Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Skolt Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Kildin Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Ter Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: South: South Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: South: Ume Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Finnish
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Ingrian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Karelian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Olonets
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Ludic
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Veps
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Votic
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Estonian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Livonian
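
The colon-separated entries of this annex lend themselves to mechanical processing. The following sketch assumes only the layout visible above, namely a family, zero or more branches, and a language name, with a trailing asterisk marking a language listed in Annex C. The function names and the four sample lines copied from the annex are our own selection.

```python
def parse_classification(line):
    """Split 'Family: Branch: ...: Language [*]' into its parts."""
    unwritten = line.rstrip().endswith("*")
    # Drop the trailing marker, then split the path on colons.
    parts = [part.strip() for part in line.rstrip(" *").split(":")]
    return {
        "family": parts[0],
        "branches": parts[1:-1],
        "language": parts[-1],
        "unwritten": unwritten,
    }


def group_by_family(lines):
    """Collect language names under their top-level family."""
    families = {}
    for line in lines:
        if line.strip():
            entry = parse_classification(line)
            families.setdefault(entry["family"], []).append(entry["language"])
    return families


SAMPLE = [
    "Afro-Asiatic: Semitic: West: Central: Arabo-Canaanite: Arabic: Maltese",
    "Caucasian: North: Northwest: Ubyx *",
    "Uralic: Finno-Ugric: Ugric: Hungarian",
    "Uralic: Samoyed: North: Tundra Nenets",
]

print(parse_classification(SAMPLE[1]))
print(group_by_family(SAMPLE))
```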

Annex B. Names of European languages

(To be supplied.)

Annex C. Unwritten European languages

Ubyx
Tevfik Esenç, the last speaker of Ubyx, died in 1992. The repertoire for Ubyx is included in this Technical Report in honour of his lifelong work to help specialists record his language. (Repertoire to be provided.)

Bats

Andi. No data available to the editor.

Botlix. No data available to the editor.

Godoberi

Chamalal. No data available to the editor.

Bagulal

Tindi. No data available to the editor.

Karata

Axvax

Xvarshi. No data available to the editor.

Tsez

Hinux

Bezhta

Hunzib

Dargwa

Archi

Agul

Tsaxur

Budux

Baraba Tatar

Pite Sámi. No data available to the editor.

Ter Sámi. No data available to the editor.

Ume Sámi. No data available to the editor.


Annex D. Administrative units of Europe

The following list enumerates the administrative units corresponding to the geographical definition of Europe in §1.1. This list was valid at the time of its compilation (1995-03-01).

The following countries and self-governing dependencies: Albania, Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, Bosnia and Hercegovina, Bulgaria, the Channel Islands, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, the Faroe Islands, Finland (including Åland), France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, the former Yugoslav Republic of Macedonia, Malta, the Isle of Man, Moldova, Monaco, the Netherlands, Norway, Poland, Portugal, Qazaqstan (west of the Ural River), Romania, San Marino, Serbia and Montenegro, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey (excluding Anatolia), Ukraine, the United Kingdom, the Vatican City.

The following Republics in the Russian Federation: Adygea, Bashkortostan, Chechenia, Chuvashia, Dagestan, Ingushetia, Kabardino-Balkaria, Kalmykia, Karachay-Cherkessia, Karelia, Komi, Mari-El, Mordvinia, North Ossetia, Tatarstan, Udmurtia.

The following Oblasty in the Russian Federation: Arkhangel'sk (including the Nenets Autonomous Okrug), Astrakhan', Belgorod, Bryansk, Ivanovo, Kaliningrad, Kaluga, Kirov, Kostroma, Kursk, Leningrad, Lipetsk, Moskva, Murmansk, Nizhniy Novgorod, Novgorod, Orël, Orenburg, Penza, Perm' (including the Komi-Permyak Autonomous Okrug), Pskov, Rostov, Ryazan', Samara, Saratov, Smolensk, Tambov, Tula, Tver', Ul'yanovsk, Vladimir, Volgograd, Vologda, Voronezh, Yaroslavl'.

The following Krai in the Russian Federation: Krasnodar, Stavropol'.

The following Republic in Azerbaijan: Naxçivan.

The following Autonomous Region in Azerbaijan: Nagorno-Karabakh.

The following Republics in Georgia: Abkhazia, Ajaria.

The following Autonomous Region in Georgia: South Ossetia.


Annex E. Mapping of European letters to ISO/IEC 10646-1

This Annex comprises a list of character identifiers in ISO/IEC 10646 which are used to write European languages, followed by the two-letter language code (Annex F) used in this report for the languages which use the letter. Where a character does not occur in ISO 10646, a long name is given for the character. (To be supplied.)
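
Since the list itself is still to be supplied, the following is only a sketch of the shape such an entry might take. It assumes Python's standard unicodedata module for the character identifiers; the three letters and the language codes attached to them are a small hand-picked illustration using codes from Annex F, not the annex data, and the function name identify is hypothetical.

```python
import unicodedata


def identify(ch: str) -> str:
    """Return a 'U+XXXX NAME' identifier for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Illustrative, non-exhaustive usage lists keyed by letter (our assumption).
USAGE = {
    "\u00de": ["is"],              # capital thorn
    "\u00f0": ["fo", "is"],        # small eth
    "\u00c5": ["da", "no", "sv"],  # capital A with ring above
}

entries = [f"{identify(ch)}: {', '.join(codes)}" for ch, codes in USAGE.items()]
for entry in entries:
    print(entry)
```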

Annex F. Two-letter language codes for European languages

The following two-letter language codes, taken from ISO 639-1 where possible, are used in this Technical Report.

No.	Language	639-1	639-2	Population	Pop. Source
001	Abaza	aj		8,400	AG
002	Abxaz	ab	abk	105,300	HH
003	Adyge	ad		124,800	HH
	Agul *			6,700	AM
	Albanian	sq	sqi	5,054,000	HH
	Älvdalska
	Andi *			10,000	ET
	Aragonese	an		11,000	ET
	Archi *			860	ET
	Armenian	hy	hye	4,615,000	HH
	Aromanian	vl		222,000	HH
	Arvanite	ae		140,000	ET
002	Assyrian
	Asturian	au		1,159,000	ET
	Avar	av	ava	601,000	HH
	Axvax *			5,000	ZM
	Azerbaijani	az	aze	6,571,000	HH
	Bagulal *			4,000	TG
	Balkar	bq
	Bashkir	ba	bak	920,000	HH
	Basque	eu	eus	680,000	HH
	Bats *			3,000	JD
	Belarussian	be	bel	9,698,700	HH
	Bezhta *			2,500	EB
	Botlix *			3,500	ET
	Breton	br	bre	850,000	HH
	Budux *			1,000	JD
	Bulgarian	bg	bul	8,831,000	HH
	Catalan	ca	cat	7,306,000	HH
	Chamalal *			5,500	ET
	Chechen	nx	che	416,000	JD
	Chuvash	cv	chv	1,796,300	HH
001	Coptic		cop
	Cornish	kw	cor	150	ET
	Corsican	co	cos	240,000	HH
	Croatian	hr	hrv
	Czech	cs	ces	9,839,100	HH
	Danish	da	dan	4,990,000	HH
	Dargwa *	dg		365,000	HH
	Dutch/Flemish	nl	nld	20,230,000	HH
	English	en	eng	56,390,000+	HH
	Esperanto	eo	epo
	Estonian	et	est	1,046,300	HH
	Faroese	fo	fao	48,000	HH
	Finnish	fi	fin	4,895,600	HH
	Franco-Provençal			70,000	ET
	French	fr	fre	58,119,600	HH
	Frisian, East	fs		800
	Frisian, North	fn		9,500	HH
	Frisian, West	fy	fry	460,300	HH
	Friulian		fur	520,000	HH
	Gaelic, Irish	ga	gai	1,050,000	HH
	Gaelic, Manx	gv*	max
	Gaelic, Scottish	gd	gdh	79,300	HH
	Gagauz	gg		223,000	HH
	Galician	gl	glg	2,350,000	HH
	Georgian	ka	kat	3,954,500	HH
	Georgian, Judeo-			10,000	ET
	German	de	deu	91,473,450	HH
	Godoberi *
	Greek	el	ell	10,074,500	HH
	Greenlandic	kl	kal	55,000	HH
	Hinux *			200	EB
	Hungarian	hu	hun	12,425,000	HH
	Hunzib *			600	EB
	Icelandic	is	isl	255,000	HH
	Ingrian			300	HH
	Ingush	ng		106,000	JD
	Istro-Romanian	rx		555	ET
	Italian	it	ita	55,437,000	HH
	Kabardian	qb		390,800	HH
	Kalmyk	xl		173,000	HH
	Karachay	qc		156,000	HH
	Karaim	qm		530	HH
	Karata *			6,000	ZM
	Karelian	kj		72,000	HH
	Kashubian			4,500	HH
	Kirmanji			7,000,000	ET
	Komi	kv	kom	339,900	HH
	Komi-Permyak			117,100	HH
	Kryts			6,000	ET
	Kumyk	qm	kum	282,000	HH
	Kurdish	ku	kur	152,700	HH
	Kurdish, Judeo-			4,180	ET
	Ladin	ld		30,000	HH
	Ladino	ly	lad	9,000	HH
	Lak	lk		118,000	HH
	Latin	la	lat
	Latvian	lv	lav	1,446,330	HH
	Laz			33,000	ET
	Lezgian	le	lez	466,000	HH
	Lithuanian	lt	lit	3,040,200	HH
	Livonian	li		200
	Low German
	Ludic
	Luxemburgish	lb	ltz	335,520	ET
	Macedonian	mk	mkd	1,472,000	HH
	Maltese	mt	mlt	340,000	HH
	Mari, Hill	mm		66,000	ET
	Mari, Meadow	mj		535,000	ET
	Megleno-Romanian
	Mingrelian			500,000	ET
	Moldavian
	Mordvin, Erza	er		800,000	MM
	Mordvin, Moksha	mh		430,000	ET
	Nenets, Tundra	nt		28,000	HH
	Nogai	nh		67,500	ET
	Norwegian	no	nor	4,150,000	HH
	Nynorsk	nn*		% of above
	Occitan	oc	oci	2,700,000	HH
	Old Church Slavonic	sj	chu
	Olonets
	Ossetian	ir	oss	572,300	HH
	Polish	pl	pol	38,231,100	HH
	Pomak
	Portuguese	pt	por	10,100,000	HH
	Qazaq	kk
	Romani	ry	rom	3,246,000	HH
	Romanian	ro	ron	23,741,000	HH
	Romansh	rm	roh	51,100	HH
	Rusinian			23,300	HH
	Russian	ru	rus	135,772,000	HH
	Rutul			12,000	GI
	Sámi, Inari	sy		400	ET
	Sámi, Kildin	sz		1,000	ET
	Sámi, Lule	sx		2,000	ET
	Sámi, Northern	sb		16,600	ET
	Sámi, Pite *			1,000	ET
	Sámi, Skolt	xs		1,000	ET
	Sámi, Southern	so		5,000	ET
	Sámi, Ter *			500	ET
	Sámi, Ume *			500	ET
	Sardinian	sc	srd	1,400,000	HH
	Scots	ll	sco
	Serbian	sr	srp
	Slovak	sk	slk	5,115,000	HH
	Slovenian	sl	slv	1,859,000	HH
	Sorbian, Lower	sf	wen	20,000	HH
	Sorbian, Upper	se	wen	40,000	HH
	Spanish	es	spa	28,616,000	HH
	Svan			35,000	ET
	Swedish	sv	swe	?8,196,700	HH
	Tabasaran	tb		35,000	BX
	Talysh			165,000	ET
	Tatar, Baraba *
	Tatar, Crimean			47,000	HH
	Tatar, Kazan	tt	tat	7,000,000	ET
	Tati
	Tindi *			5,000	ET
	Tsakonian			1,200	ET
	Tsaxur			20,000	GI
	Tsez *			7,000	EB
	Turkish	tr	tur	56,000,000	ET
	Turkish, Crimean			300,000	ET
	Ubyx *			0	JG
	Udi			5,841	ET
	Udmurt	um		723,500	HH
	Ukrainian	ua	ukr	43,235,100	HH
	Veps	vp		12,140	HH
	Votic		vot	25	ET
	Walloon	wl
	Welsh	cy	cym	503,000	HH
	Xinalug			1,000	JD
	Xvarshi *			1,000	EB
	Yiddish	yi	yid	265,000	HH

AG Genko 1955
AM Magometov 1970
BX Xanmagomedov 1967
EB Bokarëv 1959, 1967
ET SIL's Ethnologue -- figures not necessarily reliable
GI Ibragimov 1968, 1978
HH Haarmann 1993
JD Desheriev 1967
JG Gippert 1994
MM Mosin 1994
TG Godava 1971
VS Vestnik Statistiki 1990/91 (http://www.indigo.ie/egt/xxxx.html)
ZM Magomedbekova 1967, 1971


Annex G. Bibliography

The following sources, but not only these, were instrumental in the preparation of this report.
(To be supplied.)

HTML Michael Everson, everson@indigo.ie, Dublin, 1997-02-01