The other week I worked on a project to “rehabilitate” two already-encoded letters that are badly specified and that cause problems for people using Cyrillic in the UCS: problems not just for end users, but for implementers as well. The characters in question are U+0478 CYRILLIC CAPITAL LETTER UK, U+0479 CYRILLIC SMALL LETTER UK, U+047C CYRILLIC CAPITAL LETTER OMEGA WITH TITLO, and U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO. The exciting story is found in this document.
My idea was to come up with practical solutions that avoid ambiguity. Theoretical perfection, on the other hand, is a luxury we don’t have. We are doing damage control on bad choices made more than a decade ago! I am sure we would not have made those mistakes were we encoding Cyrillic for the first time today.
Today, I think we would have encoded a BROAD OMEGA and used diacritics for the beautiful omega and the like, and we would have encoded MONOGRAPH UK and left digraph UK to be represented as a string of two characters, Cyrillic о and у. Solutions 2b and 3b in my document were attempts to reach that situation, which would have been ideal, in my view.
The UTC erred on the side of stability, and more or less chose solutions 2a and 3a. (It's not done till it's published, of course.) My concern with 2a was that it makes it possible to represent beautiful omega both as 047D and as BROAD OMEGA with two diacritics, and those two representations will not be equivalent, which would cause ambiguity in text representation. (Of course, we have this now with OMEGA WITH TITLO, so the situation would not be worse than it is today.)
I thought that the case against 3a was a good deal stronger. A number of vendors are happy shipping monograph glyphs for 0479, and this poses no security issues. Looking at the Cyrillic fonts shipping with Windows XP, however, I found that all but one of them avoid encoding this character at all. My guess is that this is a question of security. So... we still have a problem here, since digraph UK can be represented by two letters, or (in principle) by this UK. I am thinking that the best solution for security's sake is to recommend that the reference glyphs for 0479 be drawn with half-width letters, to make the character visually distinct and unappealing to use at all. This is tantamount to deprecation: if everyone did this in their fonts, it would be a real solution.
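To make the ambiguity concrete, here is a small Python sketch using the standard `unicodedata` module. It assumes that U+0479 and U+047D carry no decomposition mapping in the Unicode Character Database, which is my understanding of the data; the helper name `equivalent_under` is mine and purely illustrative. If that assumption holds, no normalization form will unify U+0479 with the two-letter sequence о + у, which is exactly why the two spellings can coexist in text and be confused.

```python
import unicodedata

UK_SMALL = "\u0479"             # CYRILLIC SMALL LETTER UK
DIGRAPH = "\u043E\u0443"        # CYRILLIC SMALL LETTER O + CYRILLIC SMALL LETTER U
OMEGA_TITLO_SMALL = "\u047D"    # CYRILLIC SMALL LETTER OMEGA WITH TITLO

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equivalent_under(form, a, b):
    """Return True if strings a and b normalize to the same string under `form`.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def uk_equivalence_report():
    """Map each normalization form to whether U+0479 equals the o + u sequence."""
    return {form: equivalent_under(form, UK_SMALL, DIGRAPH) for form in FORMS}


if __name__ == "__main__":
    # An empty decomposition string means the character is atomic for normalization.
    for ch in (UK_SMALL, OMEGA_TITLO_SMALL):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              "decomposition:", repr(unicodedata.decomposition(ch)))
    for form, same in uk_equivalence_report().items():
        print(form, "treats U+0479 and U+043E U+0443 as equivalent:", same)
```

Since normalization cannot fold the two spellings together, any matching or spoof detection would have to be done by other means, for example by a font or guideline convention such as the half-width reference glyph suggested above.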