ISO SC22/WG20 N482
Date: 1996-08-10


Source: Michael Everson
Status: Expert Contribution
Action: For consideration by CEN/TC304/WG1 and SC22/WG20
Distribution: CEN/TC304/WG1, SC22/WG20

Section 3 of the "Tutorial on problems solved by this standard" makes the claim that "in English and German dictionaries... lower case always precedes upper case in homographs". I maintain not only that this is UNTRUE for the English language, but also that such an ordering is unsuitable for a default Latin sorting specification.

The Concise Oxford Dictionary of Current English (Oxford: Clarendon Press 1990) consistently gives its articles with capital letters preceding small letters:

AS (Anglo-Saxon) precedes As (arsenic) precedes as (thus); August (the month) precedes august (impressive); Jock (a Scotsman) precedes jock (a jockey); LA (Los Angeles) precedes La (lanthanum) precedes la (a note to follow so...); March (the month) precedes march (walk); May (the month) precedes may (be permitted); Min. (Minster, Ministry) precedes min. (minutes, minimum, minim); Scotch (whisky) precedes scotch (put an end to).

Even in the list of abbreviations: F. (French) precedes f. (from), Hist. (History) precedes hist. (with historical reference), Ind. (of the subcontinent comprising India, Pakistan, and Bangladesh) precedes ind. (indirect), Med. (Medicine) precedes med. (medieval), Pers. (Persian) precedes pers. (person(al)), Rhet. (Rhetoric) precedes rhet. (rhetorical(ly)). There are other examples.
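The ordering seen in these entries can be sketched in code. What follows is only an illustration, not the mechanism of any standard: it assumes a two-level comparison in which words are first compared without regard to case, and case is consulted only to break ties between homographs, capitals first. The function name capital_first_key is hypothetical.

```python
def capital_first_key(word):
    """Sort key: compare ignoring case first, then capitals before small letters.

    Illustrative assumption only: a two-level key, not a full collation algorithm.
    """
    # First level: the letters themselves, with case differences removed.
    primary = word.casefold()
    # Second level: 0 for a capital letter, 1 for anything else, so that
    # among homographs the form with the earlier capital sorts first.
    case_level = tuple(0 if ch.isupper() else 1 for ch in word)
    return (primary, case_level)


words = ["as", "march", "As", "august", "AS", "March", "August"]
print(sorted(words, key=capital_first_key))
# ['AS', 'As', 'as', 'August', 'august', 'March', 'march']
```

Because case is consulted only after the case-insensitive comparison, "august" still follows "as" and precedes "March"; capitals win only between homographs, as in the dictionary entries above.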

"English practice" cannot be said to support small-before-capital ordering. Oxford is authoritative. But there are other things which we can point to which show the naturalness of capital-before-small ordering (and the unnaturalness of small-before-capital ordering).

There is a kind of "order of honour" which is also a kind of expectation. It's hard to find this explicitly, but in general the "feeling" is that August should precede august; Bishop should precede bishop; God should precede god; Dieu should precede dieu.

There is the general naturalness of ordinary writing. We write Aachen. We don't write aAchen. Take a pen and paper and try it. Write out Ab Cd Ef Gh Ij Kl Mn Op Qr St Uv Wx Yz. Then try writing out aB cD eF gH iJ kL mN oP qR sT uV wX yZ. Difficult, isn't it? Why shouldn't the same expectation apply to Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz? The reverse, aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ, is hard even to type. And it certainly looks wrong, doesn't it, given this context?

NOTE: In Irish Gaelic, forms like Gaeilge na hÉireann 'Irish Gaelic' and Ár nAthair 'our Father' do occur -- but in sorting these, the lower case character (which indicates a particular grammatical mutation) is meant to be ignored. When the mutated noun is not capitalized, a hyphen is used (ár n-athair 'our father').

Historically, CAPITAL LETTERS existed before the small letters. (Well, they DID.)

For Greek, Cyrillic, and Armenian, the specification is that capital letters precede small letters. It is logical to do the same for Latin.

These are reasons why capital letters should precede small ones. I can think of no reason why small letters should precede capital letters in a default, multilingual sorting specification.

Michael Everson, Evertype, Dublin, 2001-09-21