Date: 1997-04-20
This is an unofficial HTML version of a document submitted to WG2.

Title: On the derivation of YOGH and EZH

Source: Michael Everson
Status: Irish national position
Action: Consideration by WG2 and UTC

This document is a response to e-mail discussion of my previous document, "Reminder about 4 medieval English Latin characters", in which it was proposed that the letter YOGH be added to ISO/IEC 10646 because the unification of YOGH and EZH was in error. There is a fair bit of the e-mail discussion quoted here. Ken Whistler disagreed, citing Pullum and Ladusaw's Phonetic Symbol Guide (Chicago: University of Chicago Press, 1978):
This Old Irish form of g was used in Old English orthography to represent, at various times, a voiced velar stop, a voiced velar fricative, and a palatal approximant. It survived into Middle English with the latter two values in the form <> and called "yogh." It is sometimes found set as <> (cf. Jones 1972). The letter was used in Scotland later than in England and English printers perceived a similarity between the <> and a form of z and substituted the latter. This led, according to Jespersen (1949, 22), to the current spelling pronunciation of Scottish names like Mackenzie. The character occurs in this form with this value in Isaac Pitman's 1845 Phonotypic alphabet (cf. Pitman and St. John 1969, 82).
It can be noted in passing that the glyph Pullum and Ladusaw use for the round-headed YOGH, , is too much like a number 3 for my tastes. Good typography, such as that employed by the Oxford University Press and the Early English Text Society, prefers a rather unique glyph, , which looks even less z-like than does .

It is my contention that Pullum & Ladusaw have made an incorrect analysis.

  1. The YOGH differs rather a lot from the "Old Irish" g, which is still in use in modern Irish. In Irish nomenclature we would say g Gaelach 'Gaelic g'. YOGH was derived from the Gaelic g, but is not identical with it.
  2. Old English had only one g, the Irish one, which was used for /ɡ/, /ɣ/, and /j/ as stated. When the Normans introduced the Carolingian g for /ɡ/ and /dʒ/, YOGH continued to be used for /ɣ/ and /j/.
  3. Foreign scribes wrote <þ> as <y> and <> as <z>. This was an error on their part, which has left us with modern erroneous pronunciations as /ji/ for ye (really þe 'the') and /makinzi/ for Mac Coinnich (I don't know how its anglicized version was spelled with a yogh, and have never seen an example of Jespersen's assertion other than this one). This is the story of the loss of THORN and YOGH in English orthography. It is not an argument for the identity of YOGH and EZH, or for the preservation of the former as the latter.
  4. I can find no evidence that Pitman derived his EZH from Medieval English character YOGH. Indeed, the evidence suggests very strongly that Pitman's interest was in working with deformations of the letter Z.
[Figures: Phonotypic alphabets No. 2, No. 3, No. 4, No. 5, No. 6, No. 7, No. 8, and No. 10]

A series of design choices is evident here; Pitman has experimented with different forms of S and Z to represent SH and ZH sounds. He has derived EZH from Z. Typographically, his EZH has a very sharp point in the middle, going down to the base line like the diagonal of a z. Its design is a logical lowercase typographical extension of the reversed SIGMA used for its capital. (YOGH has never had such a capital.)

It is interesting to note that Pitman did not make use of other common Old English characters in his Phonotypy either, preferring, again, deformations of Latin characters:

[Figure: Phonotypic alphabet]

Whistler cited Pullum and Ladusaw as authority for the unification -- but their authority is questionable in this case (however good their work is in general). The case is far stronger that the designers of the modern IPA were influenced by Pitman's work than they were by Medieval English -- and that they borrowed his EZH, not the medieval character YOGH, for the IPA:
Phonetic notation has been one of the Association's central concerns from the very beginning. The first alphabet that was employed and promulgated was a modification of the '1847 Alphabet' of Isaac Pitman and Alexander J. Ellis. (Journal of the International Phonetic Association 25.1:43, 1995)
I do not believe that the framers of the IPA gave up the Z-derived EZH for a similar YOGH-derived EZH.

Whistler wrote (1997-04-15) a long response to my proposal and other e-mail arguments. I will quote much of this here, with comments:

There is also no dispute that there are two (or more) glyphs in question here. The dispute is instead whether we are talking about one or two abstract characters, and how the glyphs are related to those characters. Secondarily, there is a dispute regarding the history of the glyphs and their usages, since that has a bearing on the identity of the character or characters they have been used to represent.
Whistler outlines the Unicode 1.0-2.0 position, using the terms RTG for round-topped glyph ("vaguely 3-like in appearance") and FTG for flat-topped glyph ("z-top with rounded tail"):
 FTG ----
        |--> EZH = YOGH
 RTG ----
I.e. two (or more) common glyphs representing the same character, which itself has two common names. (YOGH was used in 1.0, but the name was changed to EZH as part of the merger with 10646.) The Everson position, as reflected in and some contributions to this list, is:
 FTG -------> EZH (encoded as U+0292)

 RTG -------> YOGH (not encoded, and needs to be)
I.e. two distinct characters, whose appropriate glyphs differ. And to cite Everson, "never the twain do meet." Or also: "An EZH is an EZH and a YOGH is a YOGH."

Actually, there is a third position, somewhat different, also stated by Everson:

 FTG -------> EZH (encoded as U+0292)

 FTG ----
        |--> YOGH (not encoded, and needs to be)
 RTG ----
In other words, there are two characters which share the FTG, but only the YOGH gets the RTG, which is the preferred form for it. In Everson's words: "Yoghs can look like ezhes, but ezhes can't look like yoghs. And more important: real yoghs don't usually look like ezhes."
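The three positions outlined above can be restated as mappings from characters to the glyphs permitted for them. The following is a minimal sketch only: the labels "FTG" and "RTG" come from the discussion, but the dictionary layout, the position names, and the helper functions are illustrative assumptions, not part of either party's proposal.

```python
# Illustrative model of the three positions. Each position maps a character
# to the set of glyphs that may represent it. All names are hypothetical.
POSITIONS = {
    "unified (Unicode 1.0-2.0)": {"EZH=YOGH": {"FTG", "RTG"}},
    "Everson, strict": {"EZH": {"FTG"}, "YOGH": {"RTG"}},
    "Everson, permissive": {"EZH": {"FTG"}, "YOGH": {"FTG", "RTG"}},
}


def characters_for(glyph, position):
    """Return the sorted list of characters a glyph may stand for."""
    return sorted(
        char for char, glyphs in POSITIONS[position].items() if glyph in glyphs
    )


def glyph_is_unambiguous(glyph, position):
    """A glyph is unambiguous when it can stand for exactly one character."""
    return len(characters_for(glyph, position)) == 1


for name in POSITIONS:
    for glyph in ("FTG", "RTG"):
        print(f"{name}: {glyph} -> {characters_for(glyph, name)}")
```

Under the permissive reading the RTG identifies YOGH alone, while the FTG may stand for either character, which matches the statement that "yoghs can look like ezhes, but ezhes can't look like yoghs."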

Now what evidence does Everson bring to bear?

1. Difference in language usage. Sámi, for example, makes use of an /ezh/ phoneme, written with the FTG, and not with the RTG. Whereas Middle English makes use of a {yogh} grapheme (phonemic status is irrelevant to this discussion), written with the RTG, and not the FTG.

What I said was that the two characters have a superficially similar appearance, but that the RTG and FTG cannot be used indiscriminately. YOGH has a wider range of permissible glyph variants than does EZH (to include RTG and a kind of FTG), is historically derived from the Gaelic form of the lower-case letter g, and is used in Middle English texts. YOGH represents a number of velar or velar-related sounds (/ɣ/, /j/, /w/, /x/). EZH is a character used in the International Phonetic Alphabet, is used in transcriptions of languages such as Georgian and Armenian, is historically derived from the letter z, and has no RTG glyph variants; it must always be drawn with a sharp Z-like angle at the top. EZH represents palatal and apical sounds like /ʒ/ or /dz/. YOGH and EZH are different characters. In modern Skolt Sámi orthography, EZH is used for /dz/ and EZH WITH CARON is used for /dʒ/. YOGH is never found to take diacritics. Whistler argues that these distinctions are not relevant to the case of de-unification:
First, regarding the difference in language usage, I have ... pointed out the parallels to the much more extensive problem of Han character unification and language/country-specific variations in preferred glyph usage for particular characters. Granted that the principles of unification (or non-disunification) were applied more systematically to Han characters than to Latin characters, the fact that language A prefers this glyph whereas language B prefers another, for what might otherwise be considered the same character, is not sufficient basis for separating the characters.
The case for de-unifying YOGH and EZH is not based on the language-usage issue, but neither is that irrelevant to the case. Latin is not Han (and anyway there is a difference between preferring and permitting). The IPA uses a character to represent the voiced velar sound at the beginning of the word "good". In ordinary typography this would be considered a glyph variant of the letter g. But in the IPA its shape is paramount, and although LATIN SMALL LETTER G (U+0067) has variants which look like the IPA character, LATIN SMALL LETTER SCRIPT G (U+0261) admits no such variants. SCRIPT G and EZH are alike in this way. Note that the FTG EZH can be used to represent the DRAM SIGN (cf. the OUNCE SIGN, U+2125) -- but the RTG YOGH cannot be used for this. The false unification of YOGH and EZH causes needless trouble for the simple plaintext use of either of the characters.
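The code points named in the paragraph above can be checked against a Unicode character database. This is a minimal sketch using Python's standard `unicodedata` module; the names it prints come from whatever Unicode version ships with the interpreter, which is later than the versions under discussion here, and the helper name `describe` is an illustrative assumption.

```python
import unicodedata

# Code points cited in the text: plain g, IPA script g, ezh, and the ounce sign.
CODE_POINTS = [0x0067, 0x0261, 0x0292, 0x2125]


def describe(code_point):
    """Format a code point as 'U+XXXX NAME' using the bundled database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for code_point in CODE_POINTS:
    print(describe(code_point))
```

The lookup shows only that the characters carry separate identities in the database; it says nothing about which glyph variants a given font permits for each.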

Trond Trosterud of the Barentssekretariat wrote the following statement on Sámi glyphs to me:

I have nothing to say about the LATIN LETTER YOGH, but as a linguist and as a scholar of Fenno-Ugric languages by profession, and as a member of the Sámi Committee for Computer Standardization, I can confirm what Everson claims about LATIN LETTER EZH. The EZH indeed must be written with a sharp z-like angle on top, when it occurs (in the official orthography of Skolt Sámi, in an earlier orthography of Northern Sámi, and in the Fenno-Ugric Phonetic Alphabet), it always occurs in writing systems allowing also LATIN LETTER EZH WITH CARON, and the two glyphs are always rendered the same way, with the sharp z-like angle, due to their origin as glyphs representing sounds resembling the sounds represented by the letter Z. Also, U+0292 LATIN SMALL LETTER EZH is indeed in use in the IPA alphabet, which is an alphabet very concerned about its glyph shapes.

I doubt very much that any Sámis would accept a Sámi text in which the rounded YOGH glyph appeared as anything but defective. I have seen texts with the glyph for EZH with a form resembling the numeral "3", but this has clearly been due to insufficient computer utilities. Both the traditional texts (set in lead, before the first computers), and modern editions with Sámi fonts available use an EZH where the upper part of the glyph resembles the upper part of the Z.
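The pairing of EZH with EZH WITH CARON described above is visible in the Unicode character data. The sketch below assumes the code point U+01EF for LATIN SMALL LETTER EZH WITH CARON, which is not given in the text, and uses Python's standard `unicodedata` module to show that the precomposed letter decomposes into EZH plus a combining caron.

```python
import unicodedata

EZH = "\u0292"
# Assumption: U+01EF is the precomposed small ezh with caron (not cited in the text).
EZH_WITH_CARON = "\u01ef"

# Canonical decomposition (NFD) splits the letter into base + combining mark.
decomposed = unicodedata.normalize("NFD", EZH_WITH_CARON)
# Canonical composition (NFC) joins them back into the precomposed letter.
recomposed = unicodedata.normalize("NFC", decomposed)

print(unicodedata.name(EZH_WITH_CARON))
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

Because the caron form is built on U+0292, any glyph a font assigns to EZH is normally the base shape for the caron form as well, which is why the two are "always rendered the same way."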

Whistler moves on to my second argument:
2. Difference in historical source. Everson disputes Pullum & Ladusaw's claim of origins:
Refutation of Pullum & Ladusaw will show that their assertion of the derivation of the EZH is incorrect. They're two different characters with two different sources: one G, the other Z. And saying that the G turned into the YOGH which was misread in Scotland as Z and then resprouted a tail to become EZH doesn't convince.
Arguments of this refutation have been given above. Whistler may be said to have disagreed with the assertion that EZH derived from Z:
Now about the dispute in the origin of EZH in IPA. To get definitive, we'd have to go dig around in the phonological literature from 1865 to 1888, but there seems a very clear line from the use of the FTG in Old English scholarship to transcribe a letter which among other allophones, had the value [d] (as in enel 'angel', [enel]), to choice of it to represent the IPA sound [], for which no other obvious letter was available.
This is not correct. In the first place there was no Old English "enel", since there was no YOGH in Old English (see the OED below). In the second place, in the Old English reflex of Latin angelus, "enel", the is just a way of showing the Gaelic g; that word is no more nor less than engel. The sequence <ng> in Old English is pronounced // or // -- engel was pronounced /eel/. When the Normans introduced the French reflex of angelus to English, angele or aungel, it was pronounced /andel/ and the g was always written with the Carolingian g. YOGH was never used for /d/.
Certainly Jespersen, Jones, Sweet, and other phonologists of the time would have been familiar with the Old English scholarship. And the glyphic resemblance of the FTG to the letter z would have made it an obvious choice also, because it would have put resemblant glyphs into a relation of representing two voiced fricatives in close articulatory proximity to each other. That is quite different from claiming that the EZH letter was coined de novo for IPA by adding a hook to the z. Other z-derived characters were added to IPA (cf. U+0290 and U+0291), but EZH would not seem to be one of them -- it had a clear standing in English orthography that predated IPA, and that standing would have been thoroughly familiar to the inventors of IPA, who were grounded in historical linguistics, among other linguistic disciplines.
Surely Jespersen et al. were familiar with Middle English orthography; but IPA was based on Pitman's 1847 Alphabet, in which it would seem that EZH was coined "de novo" by adding a hook to the z. Pitman seems to have eschewed traditional medieval characters -- at least, they are not evidenced in the Phonotypy charts (see J. Kelly, "The 1847 alphabet: an episode of Phonotypy" in R. E. Asher & J. A. Henderson, Towards a history of phonetics, Edinburgh: Edinburgh University Press, 1981, pp. 248-264).

(Actually, I checked again after writing this. ETH Ð appears in Alphabet No. 6 with the phonetic value /b/ and a reversed ETH appears in Alphabet No. 4 with the phonetic value /t/. I hardly think Pitman can be accused of being influenced at all by medieval English characters.)

3. In response to [Joe] Becker's request for "a credible plaintext context in which both letters occur and are (= must be) distinct", Everson cites the OED:
A very simple context would be the Oxford English Dictionary, which uses YOGH to represent the medieval English character in thousands of entries, and which uses EZH in the phonetic transcription of /ʒ/, the sound of the "s" in the English word "measure".
Jim Agenbroad looked at the Oxford English Dictionary for me:
I checked the old Oxford English Dictionary (title page says 1933) under "measure", "yogh", "thought" and "sight". The pronunciation of the "zh" sound in the first is written with a different character than the "g" in the other three was written in former times. Both characters resemble an "old style" three -- one with a descender, but the top of "zh" is flat like a seven and the bottom has a bulbous end, while the other has a rounded top and the lower stroke tapers at the end. There seemed to be some difference among the latter as far as length of the lower stroke but that may be broken type or imperfect inking.... Also, the lower stroke of "zh" ends with an upward stroke that the other doesn't have -- it just tapers downward.
This is quite an accurate description of the typographical difference.

[Table of glyph images. Columns: Gaelic g, Gaelic g, Gaelic g, Old Eng. g, Yogh, Yogh, Z, Pitman]

Lloyd Anderson did a very thorough analysis of the OED situation (only some of which I will copy here). Anderson uses "OE" for 'Old English' and "ME" for 'Middle English'.

  1. The OED uses an alphabet in which <> contrasts with <>.
  2. This contrast occurs within the same typeface and style.
  3. The same contrast occurs within several different typefaces and styles.
  4. A change of typeface or style is NOT the context for the difference between <> and <>. Therefore the two should on their face be separately encoded even in order to represent a plaintext version of the OED.
  5. (One might say that the OED has CREATED this distinction. Nevertheless, it is a plaintext distinction. Of all books, the OED should not be relegated to some secondary status.)
Here is the summary for point 1. above.
It is, I believe, important to reiterate a few things here. Firstly, Old English did not use a YOGH. Old English used the insular letter g, which the Saxons had been given by the Irish. In the table below, I give the Old English word gif 'if' (pronounced "yiff") in two fonts designed for and used by Old English scholars, and in ten modern Gaelic fonts currently available from me.

[Table of glyph images. Fonts: Beowulf, Junius, Acaill, Cois Life, Corcaigh, Ceanannas]

None of these characters are YOGHs, and none of them can be construed to be YOGHs. Now for paedagogical purposes, some Old English editors, and the OED, mark the Old English g in the environment before front vowels in particular ways, sometimes with YOGH:

[Table of glyph images. Columns: Unmarked, Unmarked, Marked, Marked, Middle Eng.]

But the distinction does not obtain in Old English texts. In Middle English, the letters <g> and <> are distinct, and when "if" is written, "if" is meant. Whistler suggested that Old English editors have chosen to prefer an FTG glyph while deferring to my statement that Middle English editors prefer the RTG glyph. (All one need do is look at the EETS texts to confirm the preferred glyph for Middle English, since they are editions meant to be read.) But I would suggest that the situation is more complex still. Old English editors, endeavouring to represent the Gaelic g, and being familiar with modern linguistics and modern fonts, substitute the IPA EZH for the Gaelic g because it is handy to do so -- not because the EZH looks very much like a Gaelic g (it doesn't, really, to the discriminating eye).

The Gaelic g is neither a YOGH nor an EZH. That some Old English editors use an EZH to represent the Gaelic g is not an argument for unifying YOGH and EZH. The editors of the OED tried to explain the situation, as Lloyd Anderson shows:

Here is the summary for points 2-3-4 above:

As Ken Whistler correctly notes, dictionaries do use a range of typeface styles, point sizes, and so on. We should always consider the possibility that a distinction is caused by change of typeface. However, we should also accept the prima-facie evidence when it stares us in the face.

In the OED, there are at least the following:

PRIMARY TEXT: Larger and smaller point sizes, including italic portions for some citations.
CITATIONS: in plainstyle
Within the Citations, the contrast occurs, with no other indications that either typeface or style or anything analogous is being changed. In fact, coherence of the sections indicates precisely that all of the contrasts belong to the SAME style. We have a list like the following of forms, for example, for the word "give":

-efo, -eofu, efve, eove, eve, ife, iefe, ife, ive, yive, if, gif, gyve, geve, give

Notice that the differences <> vs. <> vs. <g> vs. <y> are all treated as exactly analogous, as like the differences between <e>, <i>, <eo>, <ie>, <y>

General discussions of this topic occur in two places in the OED, in the introduction (volume I page xxix) and in the article on the letter "G" (volume VI page 299). The first of these is actually more explicit and clear.

Introduction volume I page xxix:

In printing Old English modern scholars sometimes reproduce the contemporary ', ' (as is done by Sievers, in his Angelsachsische Grammatik), but more commonly substitute modern 'g, g'. The adoption of either course exclusively in this work would have broken the historical continuity of the forms; in the one case, we should have had the same word appearing in the eleventh century as 'old', and in the twelfth century as 'gold'; in the other, the same word written in the eleventh century 'ge' and in twelfth century 'e'. To avoid this, both forms are here used in Old English, in accordance with the Middle English distinction in their use: thus, 'gold', 'e', 'dæ'. The reader will understand that 'g' and '' represent the same Old English letter, and that the distinction made between them is purely editorial (though certainly corresponding to a distinction of sound in OE.). For ME. the form '' commonly used in reprints is employed, so that OE. 'e' becomes ME. 'e', modern 'ye'; OE. 'eno, enoh', ME. 'yno, inou', mod. 'enough'.
Article on "G", volume VI page 299:
In early ME. the continental form of G (approximately g) was used for the two sounds which the letter had in French, (g) and (d), while the OE. form was used for the sounds peculiar to native words, viz. the guttural and palatal spirants (, j). ... The symbol gradually came to assume a form indistinguishable from that used for Z in contemporary MSS.; in this Dictionary the form is employed for ME words. The symbol was commonly used in ME. for the sound of (j) initial and final, for the g guttural and palatal unvoiced spirant final or before t (as in inou, aut, nit, OE. enóh, áht, niht), and, so long as the sound remained in the language, for the guttural voiced spirant. From the 13th c., however, the was by some scribes wholly or partially discarded for y or gh; a few texts have yh. In the 15th c. vocabularies the words beginning with are at the end of the alphabet.
Getting back to the present proposal: The Middle English character YOGH is not represented in UCS. The character EZH has been made to do triple duty: for YOGH, for DRAM, and for itself. I am not a foe of unification; unifying EZH and DRAM was sensible. But EZH and DRAM can't be satisfactorily represented by the RTG. Whistler recognizes this in principle:
IPA is a fairly prescriptive system. Because IPA, from its inception in 1886, has had among its principles the use of distinct (Latin-derived) letter forms for sounds which may distinguish words in any one language, it added a fairly large number of letters beyond the typical Latin alphabet. The converse of this principle is that arbitrary glyphic variation of IPA letters would be confusing in transcription (since many new letters are created by adding small hooks and tails to existing letters). IPA transcription thus tends to disallow glyphic variation and to follow quite rigidly the forms for IPA letters published in the official charts from the International Phonetic Association.

So from 1886 we have a tradition of the FTG being specified for the phone [ʒ] in IPA. Languages which have orthographies derived from IPA-influenced phonology will, understandably, follow the IPA pattern in representation of the glyph. This explains the Sámi situation for EZH.

Actually, as shown above, EZH was in use by 1847.
But what about the Old and Middle English tradition? I don't think there is any dispute that the yogh in Old English is graphically derived from the g in Old Irish.
Whistler is right to say that Gaelic g became YOGH. He also believes that YOGH was borrowed into the IPA as EZH. But we have seen that EZH derives from Z and not from g, and so it is clear that there are two (admittedly not dissimilar) characters -- to be encoded, not to be unified.
Middle English may also be transcribed in various ways, but I defer to Everson's claim that the "correct" way to represent the Middle English yogh is with the RTG. I suspect that the RTG may be established in the typography of canonical editions of Middle English sources, and certainly may have been derived from medieval preferences for the written form of the letter.
Just so.
More research is invited. My immediate sources just show y's and g's, having dropped all the Old-English derived special letters.
There were in fact numerous Middle English dialects which didn't make use of letters like YOGH or THORN.
Finally, Whistler drew up a list of pros and cons regarding disunification of U+0292 LATIN SMALL LETTER EZH and coding a separate character for YOGH:
Do nothing (leave things as they are).
  Pros:
  • Easy to accomplish.
  • Corresponds to what the standard has been claiming since 1991.
  • Does not disturb any encoded data which may have been relying on U+0292 to encode the yogh in particular, as for Middle English or Old English.
  Cons:
  • Makes it impossible in a monofont to distinguish between the (claimed) Middle English preferred form of yogh (RTG) from the IPA prescribed form of ezh (FTG). This makes it difficult to mix, e.g. Middle English and IPA in plain text, and have both look "correct" in a single font.

Encode a separate YOGH (follow the Ireland proposal).
  Pros:
  • Solves the monofont problem for mixing Middle English and IPA (or IPA-derived orthographies -- say, Sámi) in plain text while getting best presentation for both yogh and ezh.
  Cons:
  • Creates a new problem for Old English yogh.
  • What if I now try to mix Old English and Middle English, and I expect the Old English to look as it does in a definitive source such as Diamond? I would certainly want to encode the Old English yogh with the standard's YOGH character, but the glyph is wrong. The correct glyph is instead associated with the disunified IPA EZH character. I still have a monofont problem, and this one has no clear solution.
  • Also, creation of the new character invalidates the interpretation of any existing U+0292 characters which were intended for yogh.

Responses (in the same order as the points above):

Do nothing (leave things as they are).
  Response to pros:
  • True enough.
  • The standard has been in error since 1991.
  • In reality today, hardly anyone has UCS implementations, and the average medieval English scholar is most likely using the Macintosh and PC fonts available on the Internet, which do not have UCS values associated with them. (Probably more ASCII data where the digit 3 has been used to encode YOGH exists than Middle English data in UCS encoding in 1997.) Indeed, at least one medievalist font, available on the internet, encodes G, G WITH DOT ABOVE, YOGH, and EZH as separate characters (see the table below). Adding YOGH to UCS will preserve the integrity of data encoded with such a font.
  Response to cons:
  • This is compelling enough, but is compounded by the unification of EZH with DRAM, which cannot be correctly represented by the YOGH's RTG.

Encode a separate YOGH (follow the Ireland proposal).
  Response to pros:
  • This is compelling; it is what the standard should do. The example given by Lloyd Anderson from the OED (-efo, -eofu, efve, eove, eve, ife, iefe, ife, ive, yive, if, gif, gyve, geve, give) can be easily and correctly represented if both a YOGH and an EZH are encoded in UCS.
  Response to cons:
  • It cannot possibly create a problem for the Old English YOGH, as there is no Old English YOGH.
  • Some editors reencode Old English g before front vowels as either G WITH DOT ABOVE or EZH (others use a non-plaintext Gaelic g font solution, preserving the g encoding). This is an editorial choice which is irrelevant to the present proposal. In any case, if you wish to present a multilingual text in which one OE scholar writes <gif>, another <if>, a third <if>, and a ME scholar <if>, you can do this in plaintext in UCS only if you encode a YOGH.
  • This is a fact of font implementation which font designers (and I am one of these) have to live with. Better to correct the error now than to let it perpetuate for decades and decades to come. The unification of YOGH with EZH is a false unification.

[Table: Edlund Medievalist font]

De-unification of YOGH and EZH will be of benefit to all who wish to use either character, or both characters. By making this correct character distinction, problems of character identity and presentation can be comprehensively solved.

    Michael Everson, Evertype, Dublin, 2002-03-31