CEN/TC304 N634R FIFTH DRAFT
Date: 1997-02-01
Title: Repertoires of characters used for writing the indigenous languages of Europe - Fifth Draft for CEN/TC304 Project 11
Source: Michael Everson, Everson Gunn Teoranta (IE)
Status: Expert Contribution
Action: For consideration by CEN/TC304/WG2
TECHNICAL REPORT
REPORT TECHNIQUE
TECHNISCHER BERICHT
Descriptors: Data processing, information interchange, text processing, text communication, graphic characters, character sets, representation of characters, coded character sets, natural language.
English version
Repertoires of characters used for writing the indigenous languages of Europe
NOTE: THIS DOCUMENT IS A PRELIMINARY DRAFT AND IS A WORK IN PROGRESS ONLY. COMMENT IS INVITED BUT THE DATA CONTAINED HEREIN MAY CHANGE AND SHOULD NOT BE USED FOR DEVELOPMENT AT THIS TIME.
CEN members are the national bodies of Austria, Belgium, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom.
CEN
European Committee for Standardization
Comité Européen de Normalisation
Europäisches Komitee für Normung
Central Secretariat: rue de Stassart 36, B-1050 Brussels
©1997 Michael Everson. May be copied by CEN/TC304 members.
Ref. No. prTR XXX:1996 X
Contents
Foreword
0 Introduction
1 Scope
1.1 The geographical area of Europe
1.2 The languages of Europe
1.3 The scripts of Europe
2.0 Writing systems
2.1 Numbers
2.2 Punctuation
2.3 Alphabetic repertoires
Annex A Genetic classification of European languages
Annex B Names of European languages
Annex C Unwritten European languages
Annex D Administrative units of Europe
Annex E Mapping of letters used in European languages to ISO 10646
Annex F Index of European languages
Annex G Bibliography
Foreword.
This Technical Report was prepared by CEN/TC304/WG2 "Cultural Elements" to provide information on the letters used to write the indigenous languages of Europe in an Information Technology context.
This Technical Report provides a source of linguistic data for the indigenous languages of Europe. The term "indigenous" (or "autochthonous") indicates that the report covers the languages native to the European geographical area. Languages brought to Europe more recently (Vietnamese or Bengali, for instance) are not covered, and their exclusion is not intended to imply any bias whatsoever against such "immigrant" languages. It remains the case, however, that information on the orthography of Vietnamese and Bengali is relatively easy to obtain, while many of Europe's indigenous minority languages have been poorly served (if acknowledged at all) in the area of Information Technology; this Technical Report is intended to remedy that oversight.
The main function of this Technical Report is to give a practical guide to European writing systems. It includes the characters that are used, and in some cases were used, to write each of the languages of Europe, as far as it has been possible to find information on them. Some of Europe's languages (particularly in the Caucasus) still have no tradition of writing and so are not represented in the main body of the report, though other information on them is provided in Annex C for the sake of completeness. Likewise, some languages have used, or continue to use, more than one writing system; this too is reflected here.
Personal acknowledgements are rarely presented in CEN Technical Reports. But this Technical Report could not have been compiled without the input of many, many people, and the difficult nature of the material presented here calls for acknowledgement of the abundant expertise that contributed to the final document. The following people are gratefully acknowledged by the editor, Michael Everson (EGT): Baldur Jónsson (Íslensk málstöð), Elzbieta Broma-Wrzesień (PET), Gerhard Budin (ÖNORM), Borka Jerman-Blazic (IJS), Evángelos Melagrákis (ELOT), Klaas Ruppel (KKT), Alexandra Stătescu (RNORM), Trond Trosterud (Barentssekretariat), Johan van Wingen (NNI), Þorvarður Kári Ólafsson (STRÍ). Very special thanks are due to Judy Nye (University Research Library, University of California, Los Angeles) for granting the editor a week's indulgence as he photocopied a mountain of material.
1. Scope. This Technical Report gives information on the characters used in Europe's indigenous languages. To do so, it must first define the geographical area it covers. It might be said that "everybody knows where Europe is", but in practice the boundary is less clear than one would think: the nations of the Caucasus, for instance, are sometimes counted as part of Europe and sometimes not.
1.1 The geographical area of Europe. For the purposes of this Technical Report, we have used the following geographical definition of Europe:
"Europe" extends from the Arctic and the Atlantic southeastwards to the Mediterranean (including the islands in it); its eastern and southern borders are the Ural Mountains, the Ural River, the Caspian Sea, and Anatolia, and Transcaucasia is included.
Languages covered in this report also include languages found in the following areas:
Anatolian Turkey, Greenland
Information concerning the administrative units covered by this geographical definition can be found in Annex D. It is important to note that this is a geolinguistic survey. It is not a political survey. The area defined here may be seen on page xiv, "Geographical Comparisons", in The Times Atlas of the World, 1990 edition.
1.2 The languages of Europe. A convenient way of enumerating the languages of Europe is by linguistic family. The classification used in this Technical Report is based on that found in Ruhlen 1992 but differs from it in some respects. (Ruhlen 1992 may not be perfect or universally accepted, in the field of linguistics at least, but it is reasonably comprehensive and its bibliography is helpful for further study.)
Most speakers of Europe's indigenous languages speak Indo-European or Uralic languages, but five other language families are also represented in Europe. This report is intended to be neutral with respect to language; its task is to document languages, not to rank them in any way. Accordingly, languages are listed by family and subfamily. A fuller genetic classification of these languages is given in Annex A. The names of the languages are given in English in this part of the Technical Report; their original spellings (with Latin transliterations) can be found in Annex B. An asterisk (*) following a language name indicates that the language has no standard literary orthography and will be found in Annex C. Inclusion in Annex C means only that no standard literary alphabet exists: some of these languages have rather large populations of speakers, while some languages with standard orthographies have very few.
- Caucasian languages
  - South Caucasian
    - Georgian
    - Judeo-Georgian
    - Svan
    - Mingrelian
    - Laz
  - North Caucasian
    - Ubyx *
    - Abaza
    - Abxaz
    - Adyge
    - Kabardian
    - Bats *
    - Chechen
    - Ingush
    - Avar
    - Andi *
    - Botlix *
    - Godoberi *
    - Chamalal *
    - Bagulal *
    - Tindi *
    - Karata *
    - Axvax *
    - Xvarshi *
    - Tsez *
    - Hinux *
    - Bezhta *
    - Hunzib *
    - Lak
    - Dargwa *
    - Archi *
    - Xinalug
    - Lezgian
    - Tabasaran
    - Agul *
    - Rutul
    - Tsaxur *
    - Kryts
    - Budux *
    - Udi
- Indo-European languages
  - Indo-Iranian
    - Romani
    - Talysh
    - Ossetian
    - Kirmanji
    - Kurdish
    - Judeo-Kurdish
    - Tati
  - Italic
    - Latin
    - Sardinian
    - Esperanto
    - Istro-Romanian
    - Romanian
    - Arumanian
    - Megleno-Romanian
    - Corsican
    - Italian
    - Friulian
    - Ladin
    - Romansh
    - Franco-Provençal
    - French
    - Walloon
    - Occitan
    - Catalan
    - Spanish
    - Aragonese
    - Ladino
    - Asturian
    - Galician
    - Portuguese
  - Celtic
    - Irish Gaelic
    - Scottish Gaelic
    - Manx Gaelic
    - Breton
    - Welsh
    - Cornish
  - Germanic
    - Danish
    - Swedish
    - Älvdalska
    - Norwegian
    - Icelandic
    - Faroese
    - German
    - Yiddish
    - Luxemburgish
    - Dutch/Flemish
    - Low German
    - West Frisian
    - East Frisian
    - North Frisian
    - English
    - Scots
  - Slavic
    - Belarussian
    - Russian
    - Ukrainian
    - Rusinian
    - Polish
    - Kashubian
    - Upper Sorbian
    - Lower Sorbian
    - Czech
    - Slovak
    - Old Church Slavonic
    - Slovenian
    - Croatian
    - Serbian
    - Macedonian
    - Bulgarian
    - Pomak
- Turkic languages
  - Common Turkic
    - Gagauz
    - Turkish
    - Crimean Turkish
    - Azerbaijani
    - Bashkir
    - Karachay
    - Balkar
    - Karaim
    - Kumyk
    - Kazan Tatar
    - Baraba Tatar *
    - Crimean Tatar
    - Nogai
    - Qazaq
- Uralic languages
  - Finno-Ugric
    - Hungarian
    - Udmurt
    - Komi
    - Komi-Permyak
    - Hill Mari
    - Meadow Mari
    - Erza Mordvin
    - Moksha Mordvin
    - Northern Sámi
    - Lule Sámi
    - Pite Sámi *
    - Inari Sámi
    - Skolt Sámi
    - Kildin Sámi
    - Ter Sámi *
    - Southern Sámi
    - Ume Sámi *
    - Finnish
    - Ingrian
    - Karelian
    - Olonets
    - Ludic
    - Veps
    - Votic
    - Estonian
    - Livonian
1.3. The scripts of Europe. The languages listed in §1.2 employ the Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, and Georgian scripts. Some of the scripts have unique stylistic variants: Latin has Roman, Gaelic, and Fraktur variants (of which Roman is the most commonly employed); Cyrillic has XXX and Slavonic variants (of which XX is the most commonly employed); Georgian has Mxedruli and Xucuri variants (of which Mxedruli is the most commonly employed).
2.0. Writing systems. European writing systems are alphabetic, rather than syllabic (like Ethiopic or Cherokee) or logographic (like Chinese). Together with its alphabet, each of Europe's languages uses a set of punctuation marks which is, in general, shared across the scripts. (Exceptions to this are discussed in the relevant sections below.)
European alphabetic scripts have a fixed number of basic letters, to which additional letters are appended for use with particular languages. Some of these additional letters are themselves basic letters that cannot be identified with any other letter; others are derived letters, created either by some deformation of a basic letter or by adding a diacritical mark or sign to it. The Latin, Greek, Cyrillic, and Armenian alphabets have case; that is, almost all letters have both a capital and a small form. Hebrew, Arabic, and Georgian do not share this feature. Latin, Greek, Cyrillic, Armenian, and Georgian are written from left to right; Hebrew and Arabic are written from right to left. Hebrew and Arabic are non-European scripts used to write some European languages. Some of the indigenous languages of Europe were formerly written with other scripts, such as Ogham, Runic, Linear B, and Iberian.
2.1 Numbers. Decimal digits are in standard use in European languages, though the Arabic script has its own glyphs for them. All European scripts also have a non-positional system of numerals derived by giving numeric values to the letters. Decimal numerals are used for calculation; alphabetic numerals are often used for pagination, indexing, and the like. (This Technical Report does not give further information on alphabetic numeral systems; such information can be found in works such as Haarmann 1993 and Nakanishi 19xx.)
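Because the Arabic-script digits are different glyphs for the same decimal system, software can give them the same numeric values as the European digits. The following minimal Python sketch assumes only the standard `unicodedata` module; the helper names `describe_digits` and `to_number` are hypothetical, chosen for illustration. It does not cover alphabetic numerals.

```python
import unicodedata

def describe_digits(text):
    """Return (code point, ISO/IEC 10646 name, decimal value) for each digit."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decimal(ch))
        for ch in text
    ]

def to_number(text):
    """Convert a string of decimal digits from any script to an int.

    Python's int() accepts any character with the Unicode decimal-digit
    property, so Arabic-Indic digits yield the same value as European ones.
    """
    return int(text)

if __name__ == "__main__":
    # "1997" written with ARABIC-INDIC DIGIT characters (U+0660..U+0669)
    arabic_indic = "\u0661\u0669\u0669\u0667"
    for row in describe_digits(arabic_indic):
        print(*row)
    print(to_number(arabic_indic) == to_number("1997"))  # True
```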
2.2 Punctuation. Punctuation marks indicating major breaks in text are relatively ancient. The oldest texts often used neither punctuation nor word spacing; various dots and dashes later developed into the "standard" European repertoire of punctuation marks in use today. Most computers nowadays supply the following repertoire:
, COMMA
. FULL STOP
; SEMICOLON
: COLON
! EXCLAMATION MARK
? QUESTION MARK
/ SOLIDUS
\ REVERSE SOLIDUS
" DOUBLE QUOTATION MARK (and its curly variants)
# NUMBER SIGN
% PERCENT SIGN
& AMPERSAND
¶ PILCROW SIGN
@ COMMERCIAL AT SIGN
§ SECTION SIGN
' APOSTROPHE
( LEFT PARENTHESIS
) RIGHT PARENTHESIS
* ASTERISK
[ LEFT SQUARE BRACKET
] RIGHT SQUARE BRACKET
« LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
» RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
< LESS-THAN SIGN
> GREATER-THAN SIGN
- HYPHEN-MINUS
– EN DASH
— EM DASH
“ LEFT DOUBLE QUOTATION MARK
” RIGHT DOUBLE QUOTATION MARK
‘ LEFT SINGLE QUOTATION MARK
’ RIGHT SINGLE QUOTATION MARK
· MIDDLE DOT
‚ LOW SINGLE QUOTATION MARK
„ LOW DOUBLE QUOTATION MARK
{ LEFT CURLY BRACKET
| VERTICAL BAR
} RIGHT CURLY BRACKET
_ LOW LINE
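The informal names in the list above can be checked against the formal ISO/IEC 10646 character names. A minimal Python sketch, assuming only the standard `unicodedata` module; the selection of marks and the helper name `describe` are illustrative. The characters are written as escapes so that visually similar quotation marks and dashes cannot be confused.

```python
import unicodedata

# A selection of marks from the repertoire above, written as escapes
# because several of them look alike in many fonts.
MARKS = (
    "\u00ab\u00bb"              # double angle quotation marks
    "\u2013\u2014"              # en dash, em dash
    "\u201c\u201d\u2018\u2019"  # curly double and single quotation marks
    "\u201a\u201e"              # low single and low double quotation marks
    "\u00b7"                    # middle dot
)

def describe(marks):
    """Return 'U+XXXX NAME' for each character, using ISO/IEC 10646 names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in marks]

if __name__ == "__main__":
    for line in describe(MARKS):
        print(line)
```

Running it shows, for example, that the formal name of the low single quotation mark is SINGLE LOW-9 QUOTATION MARK (U+201A).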
2.3 Alphabetic repertoires. This section lists each language and gives its "alphabet", including digraphs and alphabetical order, when this information is available. Some languages, such as Welsh, treat a string of two characters as a single letter for alphabetizing: thus, because the Welsh alphabet runs "a b c ch d...", all words beginning with "ch" follow all words beginning with "cy" and precede words beginning with "da". In the repertoires, parentheses are used to indicate either equivalences or basic letters not used natively in the orthography. An example of the first case: Irish Gaelic sorts "A a (Á á), B b, C c..." while Icelandic sorts "A a, Á á, B b, C c...". This indicates that "Á" is a kind of "A" and is sorted with "A" in Irish Gaelic, but is a separate letter, sorted after "A", in Icelandic. An example of the second case: Esperanto has "... V v, (W w), (X x), (Y y), Z z". This indicates that the letters "W", "X", and "Y" are not used in Esperanto except to write the "foreign" names of people and places.
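The placement of the comma with respect to the parentheses is therefore especially significant in these repertoires: a parenthesized letter that follows another letter without an intervening comma is an equivalent of that letter, while a parenthesized letter set off by its own commas is a non-native letter.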
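The notation and sorting rules described above can be made concrete. The following Python sketch is a hypothetical illustration, not a method prescribed by this Technical Report. It assumes repertoires written exactly in the comma-and-parenthesis notation used here, gives every comma-separated item one primary sort rank, and splits words by longest match so that digraphs such as Welsh "ch" count as one letter. It compares primary letters only: accents within an equivalence class, case, and ambiguous sequences (for instance a Welsh "n" followed by "g" that is not the digraph "ng") are not handled. The Welsh alphabet string is the conventional modern one (including J, a later addition), supplied as an assumption for illustration.

```python
import unicodedata

def parse_repertoire(spec):
    """Parse a repertoire such as "A a (Á á), B b, C c...".

    Returns (ranks, non_native): ranks maps each lower-cased letter to a
    primary sort rank; non_native holds letters whose whole item is in
    parentheses, such as "(W w)" in Esperanto.
    """
    spec = unicodedata.normalize("NFC", spec)
    ranks, non_native = {}, set()
    rank = 0
    for item in spec.split(","):
        item = item.replace("...", "").strip()
        if not item:
            continue
        foreign = item.startswith("(") and item.endswith(")")
        # All letters in one comma-separated item share a rank, so "(Á á)"
        # written after "A a" without a comma is sorted as a kind of A.
        for letter in item.replace("(", " ").replace(")", " ").split():
            ranks.setdefault(letter.lower(), rank)
            if foreign:
                non_native.add(letter.lower())
        rank += 1
    return ranks, non_native

def sort_key(word, ranks):
    """Primary sort key: split the word into repertoire letters by longest match."""
    text = unicodedata.normalize("NFC", word).lower()
    longest = max(len(letter) for letter in ranks)
    unknown_base = max(ranks.values()) + 1
    key, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ranks:
                key.append(ranks[chunk])
                i += size
                break
        else:
            # Character outside the repertoire: sort after all known letters.
            key.append(unknown_base + ord(text[i]))
            i += 1
    return key

def collate(words, ranks):
    # Ties on the primary key fall back to plain code-point order.
    return sorted(words, key=lambda w: (sort_key(w, ranks), w))

# Assumed modern Welsh alphabet, written in this report's notation.
WELSH = ("A a, B b, C c, Ch ch, D d, Dd dd, E e, F f, Ff ff, G g, Ng ng, "
         "H h, I i, J j, L l, Ll ll, M m, N n, O o, P p, Ph ph, R r, Rh rh, "
         "S s, T t, Th th, U u, W w, Y y")
```

With the Irish Gaelic repertoire "A a (Á á), B b, C c...", the word "áb" sorts before "ac"; with the Icelandic repertoire "A a, Á á, B b, C c...", it sorts after it. With the Welsh string, "cyw" sorts before "chwaer", which sorts before "dafad".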
To find a language in the repertoire, look up its name in the alphabetical list given in Annex F, and go to its corresponding number in the list below.

Assyrian. No data available to the editor.
Judeo-Georgian. No data available to the editor.
Mingrelian. No data available to the editor.
Kryts. No data available to the editor.
Romani
Romani orthography is in a complex state, but the International Romani Union has been making recommendations, and decisions on the alphabet were apparently taken in 1990 (Liégeois 1994).
Talysh. No data available to the editor.
Kirmanji. No data available to the editor.
Judeo-Kurdish. No data available to the editor.
Arvanite. No data available to the editor. Its orthography is possibly identical to that of Albanian.
Monotonikó orthography is given here.
Sardinian. No data available to the editor.
Moldavian was written in the Cyrillic script until ca. 1993.
Corsican. No data available to the editor.
Franco-Provençal (Duraffour 1969)
Duraffour gives a scientific transcription of Franco-Provençal sounds in his dictionary, but this is not a practical alphabet. Ordinary texts written in Franco-Provençal probably conform to French orthographic habits.
Walloon. No data available to the editor.
Occitan (Provençal). No data available to the editor.
Aragonese. No data available to the editor.
Asturian. No data available to the editor.
Älvdalska
As for Swedish, but V and W may be distinguished as separate letters in sorting.
Low German. No data available to the editor.
Frisian, East. No data available to the editor.
Frisian, North. No data available to the editor.
Rusinian. No data available to the editor.
Kashubian (Sychta 1967, Lorentz 1958)
Sychta and Lorentz give two scientific transcriptions for Kashubian.
Pomak. No data available to the editor.
Crimean Turkish. No data available to the editor.
Komi-Permyak. No data available to the editor.
Karelian. No data available to the editor.
Olonets. No data available to the editor.
Ludic. No data available to the editor.
Livonian. No data available to the editor.
Annex A. Genetic classification of European languages
Afro-Asiatic languages
Afro-Asiatic: Ancient Egyptian: Coptic
Afro-Asiatic: Semitic: West: Central: Aramaic: Assyrian
Afro-Asiatic: Semitic: West: Central: Arabo-Canaanite: Arabic: Maltese
Basque
Basque
Caucasian languages
Caucasian: South: Georgian
Caucasian: South: Judeo-Georgian
Caucasian: South: Svan
Caucasian: South: Zan: Mingrelian
Caucasian: South: Zan: Laz
Caucasian: North: Northwest: Ubyx *
Caucasian: North: Northwest: Abxaz-Abaza: Abaza
Caucasian: North: Northwest: Abxaz-Abaza: Abxaz
Caucasian: North: Northwest: Circassian: Adyge
Caucasian: North: Northwest: Circassian: Kabardian
Caucasian: North: Northeast: Nax: Bats *
Caucasian: North: Northeast: Nax: Chechen-Ingush: Chechen
Caucasian: North: Northeast: Nax: Chechen-Ingush: Ingush
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Avar: Avar
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Andi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Botlix *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Godoberi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Chamalal *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Bagulal *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Tindi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Karata *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Andi: Axvax *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Xvarshi *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Dido-Hinux: Tsez *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Dido-Hinux: Hinux *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Bezhta-Hunzib: Bezhta *
Caucasian: North: Northeast: Dagestan: Avaro-Andi-Dido: Dido: Bezhta-Hunzib: Hunzib *
Caucasian: North: Northeast: Dagestan: Lak-Dargwa: Lak
Caucasian: North: Northeast: Dagestan: Lak-Dargwa: Dargwa *
Caucasian: North: Northeast: Dagestan: Lezgian: Archi *
Caucasian: North: Northeast: Dagestan: Lezgian: Xinalug
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Lezgian
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Tabasaran
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Agul *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Rutul
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Tsaxur *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Kryts
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Budux *
Caucasian: North: Northeast: Dagestan: Lezgian: Lezgian Proper: Udi
Eskimo-Aleut languages
Eskimo-Aleut: Eskimo: Inuit: Greenlandic
Indo-European languages
Indo-European: Armenian
Indo-European: Indo-Iranian: Indic: Romany: Romani
Indo-European: Indo-Iranian: Iranian: East: Northeast: West Scythian: Ossetian
Indo-European: Indo-Iranian: Iranian: West: Northwest: Talysh: Talysh
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Kirmanji
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Kurdi
Indo-European: Indo-Iranian: Iranian: West: Northwest: Kurdish: Judeo-Kurdish
Indo-European: Indo-Iranian: Iranian: West: Southwest: Tati: Tati
Indo-European: Albanian: Albanian
Indo-European: Albanian: Arvanite
Indo-European: Greek: Greek
Indo-European: Greek: Tsakonian
Indo-European: Italic: Latino-Faliscan: Latin
Indo-European: Italic: Latino-Faliscan: Romance: Sardinian: Sardinian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: Artificial: Esperanto
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: North: Istro-Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: North: Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: South: Arumanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: East: South: Megleno-Romanian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Italic: Italian: Corsican
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Italic: Italian: Italian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Friulian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Ladin
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Rhaeto-Romance: Romansh
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: Franco-Provençal
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: French
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: North: Walloon
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Gallic: South: Occitan
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: East: Catalan
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Spanish
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Aragonese
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: Central: Ladino
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Asturian
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Galician
Indo-European: Italic: Latino-Faliscan: Romance: Continental: West: Gallo-Iberic: Iberic: North: West: Portuguese
Indo-European: Celtic: Insular: Goidelic: Irish Gaelic
Indo-European: Celtic: Insular: Goidelic: Scottish Gaelic
Indo-European: Celtic: Insular: Goidelic: Manx Gaelic
Indo-European: Celtic: Insular: Brythonic: Breton
Indo-European: Celtic: Insular: Brythonic: Welsh
Indo-European: Celtic: Insular: Brythonic: Cornish
Indo-European: Germanic: North: East: Danish
Indo-European: Germanic: North: East: Swedish
Indo-European: Germanic: North: East: Älvdalska
Indo-European: Germanic: North: West: Norwegian
Indo-European: Germanic: North: West: Icelandic
Indo-European: Germanic: North: West: Faroese
Indo-European: Germanic: West: Continental: East: German
Indo-European: Germanic: West: Continental: East: Yiddish
Indo-European: Germanic: West: Continental: East: Luxemburgish
Indo-European: Germanic: West: Continental: West: Dutch/Flemish
Indo-European: Germanic: West: Continental: West: Low German
Indo-European: Germanic: West: North Sea: Frisian: West Frisian
Indo-European: Germanic: West: North Sea: Frisian: East Frisian
Indo-European: Germanic: West: North Sea: Frisian: North Frisian
Indo-European: Germanic: West: North Sea: English: English
Indo-European: Germanic: West: North Sea: English: Scots
Indo-European: Balto-Slavic: Baltic: East: Latvian
Indo-European: Balto-Slavic: Baltic: East: Lithuanian
Indo-European: Balto-Slavic: Slavic: East: North: Belarussian
Indo-European: Balto-Slavic: Slavic: East: North: Russian
Indo-European: Balto-Slavic: Slavic: East: South: Ukrainian
Indo-European: Balto-Slavic: Slavic: East: South: Rusinian
Indo-European: Balto-Slavic: Slavic: West: North: Polish
Indo-European: Balto-Slavic: Slavic: West: North: Kashubian
Indo-European: Balto-Slavic: Slavic: West: Central: Upper Sorbian
Indo-European: Balto-Slavic: Slavic: West: Central: Lower Sorbian
Indo-European: Balto-Slavic: Slavic: West: South: Czech
Indo-European: Balto-Slavic: Slavic: West: South: Slovak
Indo-European: Balto-Slavic: Slavic: South: Old Church Slavonic
Indo-European: Balto-Slavic: Slavic: South: West: Slovene
Indo-European: Balto-Slavic: Slavic: South: West: Serbo-Croatian
Indo-European: Balto-Slavic: Slavic: South: East: Macedonian
Indo-European: Balto-Slavic: Slavic: South: East: Bulgarian
Indo-European: Balto-Slavic: Slavic: South: East: Pomak
Mongolian languages
Mongolian: East: Oirat-Khalkha: Oirat-Kalmyk: Kalmyk
Turkic languages
Turkic: Bolgar: Chuvash
Turkic: Common Turkic: South: Gagauz
Turkic: Common Turkic: South: Turkish
Turkic: Common Turkic: South: Crimean Turkish
Turkic: Common Turkic: South: Azerbaijani: Azerbaijani
Turkic: Common Turkic: West: Bashkir
Turkic: Common Turkic: West: Kumyk-Karachay: Karachay
Turkic: Common Turkic: West: Kumyk-Karachay: Balkar
Turkic: Common Turkic: West: Kumyk-Karachay: Karaim
Turkic: Common Turkic: West: Kumyk-Karachay: Kumyk
Turkic: Common Turkic: West: Tatar: Kazan Tatar
Turkic: Common Turkic: West: Tatar: Baraba Tatar *
Turkic: Common Turkic: West: Tatar: Crimean Tatar
Turkic: Common Turkic: Central: Nogai
Turkic: Common Turkic: Central: Qazaq
Uralic languages
Uralic: Samoyed: North: Tundra Nenets
Uralic: Finno-Ugric: Ugric: Hungarian
Uralic: Finno-Ugric: Finno-Permic: Permic: Udmurt
Uralic: Finno-Ugric: Finno-Permic: Permic: Komi
Uralic: Finno-Ugric: Finno-Permic: Permic: Komi-Permyak
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mari: Hill Mari
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mari: Meadow Mari
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mordvin: Erza
Uralic: Finno-Ugric: Finno-Permic: Finno-Volgaic: Mordvin: Moksha
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: North Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: Lule Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: Central: Pite Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Inari Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Skolt Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Kildin Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: East: Ter Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: South: South Sámi
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Sámic: South: Ume Sámi *
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Finnish
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Ingrian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Karelian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Olonets
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Ludic
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: North: Veps
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Votic
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Estonian
Uralic: Finno-Ugric: Finno-Permic: Finno-Sámic: Finnic: South: Livonian
Annex B. Names of European languages
(To be supplied.)
Annex C. Unwritten European languages
Ubyx
Tevfik Esenç, the last speaker of Ubyx, died in 1992. The repertoire for Ubyx is included in this Technical Report in honour of his lifelong work to help specialists record his language. (Repertoire to be provided.)

Andi. No data available to the editor.
Botlix. No data available to the editor.
Chamalal. No data available to the editor.
Tindi. No data available to the editor.
Xvarshi. No data available to the editor.
Pite Sámi. No data available to the editor.
Ter Sámi. No data available to the editor.
Ume Sámi. No data available to the editor.
Annex D. Administrative units of Europe
The following list enumerates the administrative units corresponding to the geographical definition of Europe in §1.1. This list was valid at the time of its compilation (1995-03-01).
The following countries and self-governing dependencies: Albania, Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, Bosnia and Hercegovina, Bulgaria, the Channel Islands, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, the Faroe Islands, Finland (including Åland), France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, the former Yugoslav Republic of Macedonia, Malta, the Isle of Man, Moldova, Monaco, the Netherlands, Norway, Poland, Portugal, Qazaqstan (west of the Ural River), Romania, San Marino, Serbia and Montenegro, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey (excluding Anatolia), Ukraine, the United Kingdom, the Vatican City.
The following Republics in the Russian Federation: Adygea, Bashkortostan, Chechenia, Chuvashia, Dagestan, Ingushetia, Kabardino-Balkaria, Kalmykia, Karachay-Cherkessia, Karelia, Komi, Mari-El, Mordvinia, North Ossetia, Tatarstan, Udmurtia.
The following Oblasty in the Russian Federation: Arkhangel'sk (including the Nenets Autonomous Okrug), Astrakhan', Belgorod, Bryansk, Ivanovo, Kaliningrad, Kaluga, Kirov, Kostroma, Kursk, Leningrad, Lipetsk, Moskva, Murmansk, Nizhniy Novgorod, Novgorod, Orël, Orenburg, Penza, Perm' (including the Komi-Permyak Autonomous Okrug), Pskov, Rostov, Ryazan', Samara, Saratov, Smolensk, Tambov, Tula, Tver', Ul'yanovsk, Vladimir, Volgograd, Vologda, Voronezh, Yaroslavl'.
The following Krai in the Russian Federation: Krasnodar, Stavropol'.
The following Republic in Azerbaijan: Naxçivan.
The following Autonomous Region in Azerbaijan: Nagorno-Karabakh.
The following Republics in Georgia: Abkhazia, Ajaria.
The following Autonomous Region in Georgia: South Ossetia.
Annex E. Mapping of European letters to ISO/IEC 10646-1
This Annex lists the ISO/IEC 10646 character identifiers of the letters used to write European languages; each identifier is followed by the two-letter codes (see Annex F) of the languages in this report that use the letter. Where a letter is not encoded in ISO/IEC 10646, a long name is given for it.
(To be supplied.)
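Pending the full annex, the intended structure (a character identifier followed by the codes of the languages that use it) can be illustrated with a small Python sketch. The input dictionary is a hypothetical sample, not the Annex E data: it uses Á á for Irish Gaelic (ga) and Icelandic (is), as in the §2.3 example, adds Icelandic Ð ð and Þ þ, and adds the Welsh digraph Ch ch (cy) to show one possible way of giving a long name to a letter that is not a single character. The output format and the helper names are assumptions.

```python
import unicodedata
from collections import defaultdict

# Hypothetical sample input: language code -> space-separated letters.
SAMPLE = {
    "ga": "Á á",
    "is": "Á á Ð ð Þ þ",
    "cy": "Ch ch",
}

def build_mapping(repertoires):
    """Invert {language code: letters} into {letter: sorted language codes}."""
    users = defaultdict(set)
    for code, letters in repertoires.items():
        for letter in letters.split():
            users[unicodedata.normalize("NFC", letter)].add(code)
    return {letter: sorted(codes) for letter, codes in users.items()}

def identifier(letter):
    """'U+XXXX NAME' for a single character; for a sequence, list every
    code point and join the component names into a long name."""
    points = " ".join(f"U+{ord(ch):04X}" for ch in letter)
    names = " + ".join(unicodedata.name(ch) for ch in letter)
    return f"{points} {names}"

def annex_lines(mapping):
    # Ordered by code point sequence, not by any language's alphabetical order.
    return [f"{identifier(letter)}: {', '.join(codes)}"
            for letter, codes in sorted(mapping.items())]

if __name__ == "__main__":
    for line in annex_lines(build_mapping(SAMPLE)):
        print(line)
```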
Annex F. Two-letter language codes for European languages
The following two-letter language codes, taken from ISO 639-1 where possible, are used in this Technical Report.
No. | Language | ISO 639-1 | ISO 639-2 | Population | Population source
001 | Abaza | aj | | 8,400 | AG
002 | Abxaz | ab | abk | 105,300 | HH
003 | Adyge | ad | | 124,800 | HH
| Agul * | | | 6,700 | AM
| Albanian | sq | sqi | 5,054,000 | HH
| Älvdalska | | | |
| Andi * | | | 10,000 | ET
| Aragonese | an | | 11,000 | ET
| Archi * | | | 860 | ET
| Armenian | hy | hye | 4,615,000 | HH
| Aromanian | vl | | 222,000 | HH
| Arvanite | ae | | 140,000 | ET
002 | Assyrian | | | |
| Asturian | au | | 1,159,000 | ET
| Avar | av | ava | 601,000 | HH
| Axvax * | | | 5,000 | ZM
| Azerbaijani | az | aze | 6,571,000 | HH
| Bagulal * | | | 4,000 | TG
| Balkar | bq | | |
| Bashkir | ba | bak | 920,000 | HH
| Basque | eu | eus | 680,000 | HH
| Bats * | | | 3,000 | JD
| Belarussian | be | bel | 9,698,700 | HH
| Bezhta * | | | 2,500 | EB
| Botlix * | | | 3,500 | ET
| Breton | br | bre | 850,000 | HH
| Budux * | | | 1,000 | JD
| Bulgarian | bg | bul | 8,831,000 | HH
| Catalan | ca | cat | 7,306,000 | HH
| Chamalal * | | | 5,500 | ET
| Chechen | nx | che | 416,000 | JD
| Chuvash | cv | chv | 1,796,300 | HH
001 | Coptic | | cop | |
| Cornish | kw | cor | 150 | ET
| Corsican | co | cos | 240,000 | HH
| Croatian | hr | hrv | |
| Czech | cs | ces | 9,839,100 | HH
| Danish | da | dan | 4,990,000 | HH
| Dargwa * | dg | | 365,000 | HH
| Dutch/Flemish | nl | nld | 20,230,000 | HH
| English | en | eng | 56,390,000+ | HH
| Esperanto | eo | epo | |
| Estonian | et | est | 1,046,300 | HH
| Faroese | fo | fao | 48,000 | HH
| Finnish | fi | fin | 4,895,600 | HH
| Franco-Provençal | | | 70,000 | ET
| French | fr | fre | 58,119,600 | HH
| Frisian, East | fs | | 800 |
| Frisian, North | fn | | 9,500 | HH
| Frisian, West | fy | fry | 460,300 | HH
| Friulian | | fur | 520,000 | HH
| Gaelic, Irish | ga | gai | 1,050,000 | HH
| Gaelic, Manx | gv* | max | |
| Gaelic, Scottish | gd | gdh | 79,300 | HH
| Gagauz | gg | | 223,000 | HH
| Galician | gl | glg | 2,350,000 | HH
| Georgian | ka | kat | 3,954,500 | HH
| Georgian, Judeo- | | | 10,000 | ET
| German | de | deu | 91,473,450 | HH
| Godoberi * | | | |
| Greek | el | ell | 10,074,500 | HH
| Greenlandic | kl | kal | 55,000 | HH
| Hinux * | | | 200 | EB
| Hungarian | hu | hun | 12,425,000 | HH
| Hunzib * | | | 600 | EB
| Icelandic | is | isl | 255,000 | HH
| Ingrian | | | 300 | HH
| Ingush | ng | | 106,000 | JD
| Istro-Romanian | rx | | 555 | ET
| Italian | it | ita | 55,437,000 | HH
| Kabardian | qb | | 390,800 | HH
| Kalmyk | xl | | 173,000 | HH
| Karachay | qc | | 156,000 | HH
| Karaim | qm | | 530 | HH
| Karata * | | | 6,000 | ZM
| Karelian | kj | | 72,000 | HH
| Kashubian | | | 4,500 | HH
| Kirmanji | | | 7,000,000 | ET
| Komi | kv | kom | 339,900 | HH
| Komi-Permyak | | | 117,100 | HH
| Kryts | | | 6,000 | ET
| Kumyk | qm | kum | 282,000 | HH
| Kurdish | ku | kur | 152,700 | HH
| Kurdish, Judeo- | | | 4,180 | ET
| Ladin | ld | | 30,000 | HH
| Ladino | ly | lad | 9,000 | HH
| Lak | lk | | 118,000 | HH
| Latin | la | lat | |
| Latvian | lv | lav | 1,446,330 | HH
| Laz | | | 33,000 | ET
| Lezgian | le | lez | 466,000 | HH
| Lithuanian | lt | lit | 3,040,200 | HH
| Livonian | li | | 200 |
| Low German | | | |
| Ludic | | | |
| Luxemburgish | lb | ltz | 335,520 | ET
| Macedonian | mk | mkd | 1,472,000 | HH
| Maltese | mt | mlt | 340,000 | HH
| Mari, Hill | mm | | 66,000 | ET
| Mari, Meadow | mj | | 535,000 | ET
| Megleno-Romanian | | | |
| Mingrelian | | | 500,000 | ET
| Moldavian | | | |
| Mordvin, Erza | er | | 800,000 | MM
| Mordvin, Moksha | mh | | 430,000 | ET
| Nenets, Tundra | nt | | 28,000 | HH
| Nogai | nh | | 67,500 | ET
| Norwegian | no | nor | 4,150,000 | HH
| Nynorsk | nn* | | % of above |
| Occitan | oc | oci | 2,700,000 | HH
| Old Church Slavonic | sj | chu | |
| Olonets | | | |
| Ossetian | ir | oss | 572,300 | HH
| Polish | pl | pol | 38,231,100 | HH
| Pomak | | | |
| Portuguese | pt | por | 10,100,000 | HH
| Qazaq | kk | | |
| Romani | ry | rom | 3,246,000 | HH
| Romanian | ro | ron | 23,741,000 | HH
| Romansh | rm | roh | 51,100 | HH
| Rusinian | | | 23,300 | HH
| Russian | ru | rus | 135,772,000 | HH
| Rutul | | | 12,000 | GI
| Sámi, Inari | sy | | 400 | ET
| Sámi, Kildin | sz | | 1,000 | ET
| Sámi, Lule | sx | | 2,000 | ET
| Sámi, Northern | sb | | 16,600 | ET
| Sámi, Pite * | | | 1,000 | ET
| Sámi, Skolt | xs | | 1,000 | ET
| Sámi, Southern | so | | 5,000 | ET
| Sámi, Ter * | | | 500 | ET
| Sámi, Ume * | | | 500 | ET
| Sardinian | sc | srd | 1,400,000 | HH
| Scots | ll | sco | |
| Serbian | sr | srp | |
| Slovak | sk | slk | 5,115,000 | HH
| Slovenian | sl | slv | 1,859,000 | HH
| Sorbian, Lower | sf | wen | 20,000 | HH
| Sorbian, Upper | se | wen | 40,000 | HH
| Spanish | es | spa | 28,616,000 | HH
| Svan | | | 35,000 | ET
| Swedish | sv | swe? | 8,196,700 | HH
| Tabasaran | tb | | 35,000 | BX
| Talysh | | | 165,000 | ET
| Tatar, Baraba * | | | |
| Tatar, Crimean | | | 47,000 | HH
| Tatar, Kazan | tt | tat | 7,000,000 | ET
| Tati | | | |
| Tindi * | | | 5,000 | ET
| Tsakonian | | | 1,200 | ET
| Tsaxur | | | 20,000 | GI
| Tsez * | | | 7,000 | EB
| Turkish | tr | tur | 56,000,000 | ET
| Turkish, Crimean | | | 300,000 | ET
| Ubyx * | | | 0 | JG
| Udi | | | 5,841 | ET
| Udmurt | um | | 723,500 | HH
| Ukrainian | ua | ukr | 43,235,100 | HH
| Veps | vp | | 12,140 | HH
| Votic | | vot | 25 | ET
| Walloon | wl | | |
| Welsh | cy | cym | 503,000 | HH
| Xinalug | | | 1,000 | JD
| Xvarshi * | | | 1,000 | EB
| Yiddish | yi | yid | 265,000 | HH
AG Genko 1955
AM Magometov 1970
BX Xanmagomedov 1967
EB Bokarëv 1959, 1967
ET SIL's Ethnologue (figures not necessarily reliable)
GI Ibragimov 1968, 1978
HH Haarmann 1993
JD Desheriev 1967
JG Gippert 1994
MM Mosin 1994
TG Godava 1971
VS Vestnik Statistiki 1990/91 (http://www.indigo.ie/egt/xxxx.html)
ZM Magomedbekova 1967, 1971
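Because each row of the table above is pipe-delimited, it can be read mechanically, for instance to look up a language by name as §2.3 suggests. The following Python sketch is a hypothetical reader that assumes the six-column order shown in the table header. It moves the asterisk that marks languages without a standard orthography (see §1.2) into a separate flag, reads a trailing "+" on a population figure as marking a lower bound (an assumption), and leaves blank or non-numeric entries such as "% of above" as unknown.

```python
def parse_population(text):
    """'8,400' -> 8400; '56,390,000+' -> 56390000 (read as a lower bound);
    blank or non-numeric entries such as '% of above' -> None."""
    digits = text.replace(",", "").rstrip("+")
    return int(digits) if digits.isdecimal() else None

def parse_rows(lines):
    """Read 'No. | Language | 639-1 | 639-2 | Population | Source' rows."""
    records = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6 or fields[1] in ("", "Language"):
            continue  # separator or header row
        number, name, iso1, iso2, population, source = fields
        unwritten = name.endswith("*")  # asterisk: no standard orthography
        name = name.rstrip("*").strip()
        records[name] = {
            "number": number or None,
            "iso639_1": iso1 or None,
            "iso639_2": iso2 or None,
            "population": parse_population(population),
            "source": source or None,
            "unwritten": unwritten,
        }
    return records
```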
Annex G. Bibliography
The following sources, among others, were instrumental in the preparation of this report.
(To be supplied.)
HTML Michael Everson, everson@indigo.ie, Dublin, 1997-02-01