ISO/IEC JTC1/SC2/WG2 N____
Date: 1997-06-26
This is an unofficial HTML version of a document submitted to WG2.

Title: Proposal to add Latin characters required by Latinized Taiwanese languages to ISO/IEC 10646

Source: Te Khai-su, Taiwan Protocol (TW), and Michael Everson (IE)
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG2

A. Administrative

1. Title Proposal to add Latin characters required by Latinized Taiwanese languages to ISO/IEC 10646
2. Requester's name Te Khai-su, Taiwan Protocol, and Michael Everson, Everson Typography (WG2 member for Ireland)
3. Requester type Expert contribution
4. Submission date 1997-06-26
5. Requester's reference http://www.taiwanese.com/tp/10646/latin.html
6a. Completion This is a complete proposal.
6b. More information to be provided? No

B. Technical -- General

1a. New script? Name? No
1b. Addition of characters to existing block? Name? Yes. One character to be added to Combining Diacritical Marks and 44 characters to be added to Latin Extended-B
2. Number of characters 45
3. Proposed category Latin is a Category A script.
4. Proposed level of implementation and rationale Implementation level 1 for the precomposed characters, as they can be decomposed. The exception is COMBINING RIGHT DOT ABOVE, which is a combining character.
5a. Character names included in proposal? Yes
5b. Character names in accordance with guidelines? Yes
5c. Character shapes reviewable? Yes (see below)
6a. Who will provide computerized font? Te Khai-su, Taiwan Protocol
6b. Font currently available? Te Khai-su, Taiwan Protocol (fonts)
6c. Font format? TrueType
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided? Yes. An 8-bit font has been in common use and has de facto standard status. A standard (TP 1) was drafted to describe the encoding used by the font. (see below)
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? Yes.
  • Hak-ngi Siin-kin: Taiwanese Hakka New Testament of 1993. This is I Corinthians ("Ko-lim-to") 12:1--10.
  • Seng-keng: The Holo ("Amoy Romanized") Bible of 1933. This is the end of the Book of Job ("Iok-pek-ki") and the beginning few Psalms ("Si-phian").
  • More examples can be found at http://www.taiwanese.com/tp/tpsurvey/ref/

    8. Does the proposal address other aspects of character data processing? No

    C. Technical -- Justification

    1. Contact with the user community? Yes. The World Conference on Taiwanese Languages (TW, US), the Bible Society (TW), HOTSYS-HAKSYS (US), 5% Taiwanese Translation Project (TW), Taiwanese Writing Forum (TW, US), and users elsewhere (CA, DE, JP).
    2. Information on the user community? Taiwanese Holo, more commonly known as Min Nan, is used by about 14,345,000 Taiwanese, or 67% of the population in Taiwan. Taiwanese Hakka is used by about 2,366,000 Taiwanese, or 11% of the population in Taiwan.
    3a. The context of use for the proposed characters? Peh-oe-ji/Phak-fa-sii (colloquial writing), as shown here, is the most common Latin script and the only Latin script in which the Bible has been published. Modern journals such as Taiwanese Writing Forum use a mixture of Han and Latin characters. (see below)
    3b. Reference (see below)
    4a. Proposed characters in current use? Yes
    4b. Where? In Taiwan.
    5a. Characters should be encoded entirely in BMP? Yes
    5b. Rationale Latin is a Category A script.
    6. Should characters be kept in a continuous range? Not necessarily.
    7a. Can the characters be considered a presentation form of an existing character or character sequence? Yes for some of the precomposed characters.
    7b. Where? Some of the precomposed characters can be composed of existing Latin characters and existing combining diacritical marks.
    7c. Reference
    8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character? Yes
    8b. Where? The proposed COMBINING RIGHT DOT ABOVE and the existing COMBINING DOT ABOVE are similar in appearance and function. (see below)
    8c. Reference
    9a. Combining characters or use of composite sequences included? No
    9b. List of composite sequences and their corresponding glyph images provided? No
    10. Characters with any special properties such as control function, etc. included? No

    D. SC2/WG2 Administrative

    To be completed by SC2/WG2

    1. Relevant SC 2/WG 2 document numbers:
    2. Status (list of meeting number and corresponding action or disposition)
    3. Additional contact to user communities, liaison organizations etc.
    4. Assigned category and assigned priority/time frame
    Other Comments


    E. Proposal

    User community

    The Latin script Peh-oe-ji is used to write the modern Holo language of Taiwan; the Latin script Phak-fa-sii is used to write the modern Hakka language of Taiwan. Taiwan is an island in the western Pacific Ocean, north of the Philippines and off the southeastern coast of China, at about 23°30'N 121°00'E. About 21 million people live in Taiwan. Taiwanese Holo, more commonly known as Min Nan, is used by about 14,345,000 Taiwanese, or 67% of the population in Taiwan. Taiwanese Hakka is used by about 2,366,000 Taiwanese, or 11% of the population in Taiwan (Ethnologue).

    Issues

    Other Latin scripts for these two languages also exist in Taiwan, though they are far less prevalent. Most modern publications use a mixture of Han and Latin scripts, with the proportion of each differing from author to author.

    The proposed COMBINING RIGHT DOT ABOVE and the existing COMBINING DOT ABOVE are similar in appearance and probably serve the same function, marking an open vowel "o". However, COMBINING DOT ABOVE has not appeared in place of COMBINING RIGHT DOT ABOVE in any publication. It is therefore inappropriate to treat the proposed COMBINING RIGHT DOT ABOVE as a presentation form of COMBINING DOT ABOVE.

    The precomposed characters are proposed to ensure compatibility with the existing fonts "HoloWin" and "HakkaWin", used with the word-processing software HOTSYS and HAKSYS, which is widely employed in the user community.
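
As noted under C.7b, some of the precomposed characters can be expressed as an existing Latin letter followed by existing combining diacritical marks, while any character involving COMBINING RIGHT DOT ABOVE cannot. The following Python sketch illustrates that split. It is not part of the proposal: the names `EXISTING_MARKS`, `decompose` and `as_sequence` are hypothetical, and the pairing of each diacritic term with an existing combining mark (in particular VERTICAL BAR with COMBINING VERTICAL LINE ABOVE) is an assumption made for illustration.

```python
import unicodedata

# Assumed correspondence between the diacritic terms used in the proposed
# names and existing combining marks. The proposal does not list these
# pairings; VERTICAL BAR -> COMBINING VERTICAL LINE ABOVE is a guess.
EXISTING_MARKS = {
    "GRAVE": "COMBINING GRAVE ACCENT",
    "ACUTE": "COMBINING ACUTE ACCENT",
    "CIRCUMFLEX": "COMBINING CIRCUMFLEX ACCENT",
    "MACRON": "COMBINING MACRON",
    "VERTICAL BAR": "COMBINING VERTICAL LINE ABOVE",
    "DIAERESIS BELOW": "COMBINING DIAERESIS BELOW",
}

# The one mark that is newly proposed and therefore has no code point here.
PROPOSED_MARK = "COMBINING RIGHT DOT ABOVE"


def decompose(name):
    """Split a proposed name into (base letter, list of combining mark names)."""
    head, _, tail = name.partition(" WITH ")
    base = unicodedata.lookup(head)  # e.g. "LATIN SMALL LETTER M" -> "m"
    marks = []
    for term in tail.split(" AND "):
        if term == "RIGHT DOT ABOVE":
            marks.append(PROPOSED_MARK)
        else:
            marks.append(EXISTING_MARKS[term])
    return base, marks


def as_sequence(name):
    """Return the composite sequence, or None if the proposed mark is needed."""
    base, marks = decompose(name)
    if PROPOSED_MARK in marks:
        return None
    return base + "".join(unicodedata.lookup(mark) for mark in marks)


if __name__ == "__main__":
    for example in (
        "LATIN SMALL LETTER M WITH GRAVE",
        "LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND ACUTE",
        "LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND MACRON",
    ):
        print(example, "->", decompose(example), repr(as_sequence(example)))
```

Under these assumptions, `as_sequence` returns a string only for names built entirely from existing marks, and `None` for the O WITH RIGHT DOT ABOVE family, which depends on the newly proposed combining character.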


    References


    Names

    COMBINING RIGHT DOT ABOVE
    
    LATIN CAPITAL LETTER A WITH VERTICAL BAR
    LATIN SMALL LETTER A WITH VERTICAL BAR
    LATIN CAPITAL LETTER E WITH VERTICAL BAR
    LATIN SMALL LETTER E WITH VERTICAL BAR
    LATIN CAPITAL LETTER I WITH VERTICAL BAR
    LATIN SMALL LETTER I WITH VERTICAL BAR
    LATIN CAPITAL LETTER M WITH GRAVE
    LATIN SMALL LETTER M WITH GRAVE
    LATIN CAPITAL LETTER M WITH CIRCUMFLEX
    LATIN SMALL LETTER M WITH CIRCUMFLEX
    LATIN CAPITAL LETTER M WITH MACRON
    LATIN SMALL LETTER M WITH MACRON
    LATIN CAPITAL LETTER M WITH VERTICAL BAR
    LATIN SMALL LETTER M WITH VERTICAL BAR
    LATIN CAPITAL LETTER N WITH CIRCUMFLEX
    LATIN SMALL LETTER N WITH CIRCUMFLEX
    LATIN CAPITAL LETTER N WITH MACRON
    LATIN SMALL LETTER N WITH MACRON
    LATIN CAPITAL LETTER N WITH VERTICAL BAR
    LATIN SMALL LETTER N WITH VERTICAL BAR
    LATIN CAPITAL LETTER O WITH VERTICAL BAR
    LATIN SMALL LETTER O WITH VERTICAL BAR
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND ACUTE
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND ACUTE
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND GRAVE
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND GRAVE
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND CIRCUMFLEX
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND CIRCUMFLEX
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND MACRON
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND MACRON
    LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND VERTICAL BAR
    LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND VERTICAL BAR
    LATIN CAPITAL LETTER U WITH VERTICAL BAR
    LATIN SMALL LETTER U WITH VERTICAL BAR
    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND ACUTE
    LATIN SMALL LETTER U WITH DIAERESIS BELOW AND ACUTE
    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND GRAVE
    LATIN SMALL LETTER U WITH DIAERESIS BELOW AND GRAVE
    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND CIRCUMFLEX
    LATIN SMALL LETTER U WITH DIAERESIS BELOW AND CIRCUMFLEX
    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND VERTICAL BAR
    LATIN SMALL LETTER U WITH DIAERESIS BELOW AND VERTICAL BAR
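
The list above can be checked against the counts given in B.1b and B.2 (one combining character, 44 Latin letters, 45 in total). The following Python sketch is only a consistency check on the names as listed; the variable names are hypothetical and nothing here assigns code positions.

```python
NAMES = """\
COMBINING RIGHT DOT ABOVE
LATIN CAPITAL LETTER A WITH VERTICAL BAR
LATIN SMALL LETTER A WITH VERTICAL BAR
LATIN CAPITAL LETTER E WITH VERTICAL BAR
LATIN SMALL LETTER E WITH VERTICAL BAR
LATIN CAPITAL LETTER I WITH VERTICAL BAR
LATIN SMALL LETTER I WITH VERTICAL BAR
LATIN CAPITAL LETTER M WITH GRAVE
LATIN SMALL LETTER M WITH GRAVE
LATIN CAPITAL LETTER M WITH CIRCUMFLEX
LATIN SMALL LETTER M WITH CIRCUMFLEX
LATIN CAPITAL LETTER M WITH MACRON
LATIN SMALL LETTER M WITH MACRON
LATIN CAPITAL LETTER M WITH VERTICAL BAR
LATIN SMALL LETTER M WITH VERTICAL BAR
LATIN CAPITAL LETTER N WITH CIRCUMFLEX
LATIN SMALL LETTER N WITH CIRCUMFLEX
LATIN CAPITAL LETTER N WITH MACRON
LATIN SMALL LETTER N WITH MACRON
LATIN CAPITAL LETTER N WITH VERTICAL BAR
LATIN SMALL LETTER N WITH VERTICAL BAR
LATIN CAPITAL LETTER O WITH VERTICAL BAR
LATIN SMALL LETTER O WITH VERTICAL BAR
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND ACUTE
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND ACUTE
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND GRAVE
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND GRAVE
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND CIRCUMFLEX
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND CIRCUMFLEX
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND MACRON
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND MACRON
LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE AND VERTICAL BAR
LATIN SMALL LETTER O WITH RIGHT DOT ABOVE AND VERTICAL BAR
LATIN CAPITAL LETTER U WITH VERTICAL BAR
LATIN SMALL LETTER U WITH VERTICAL BAR
LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND ACUTE
LATIN SMALL LETTER U WITH DIAERESIS BELOW AND ACUTE
LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND GRAVE
LATIN SMALL LETTER U WITH DIAERESIS BELOW AND GRAVE
LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND CIRCUMFLEX
LATIN SMALL LETTER U WITH DIAERESIS BELOW AND CIRCUMFLEX
LATIN CAPITAL LETTER U WITH DIAERESIS BELOW AND VERTICAL BAR
LATIN SMALL LETTER U WITH DIAERESIS BELOW AND VERTICAL BAR
""".splitlines()

# Split the list by the block each character is proposed for.
combining = [n for n in NAMES if n.startswith("COMBINING ")]
latin = [n for n in NAMES if n.startswith("LATIN ")]

# Strip the case prefix so capital and small forms can be compared directly.
capitals = {n.replace("LATIN CAPITAL LETTER ", "") for n in latin if " CAPITAL " in n}
smalls = {n.replace("LATIN SMALL LETTER ", "") for n in latin if " SMALL " in n}

print("total:", len(NAMES))
print("combining:", len(combining), "latin:", len(latin))
print("every capital has a matching small letter:", capitals == smalls)
```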

    Glyphs

    8-bit 2022-compatible code table
    8-bit de-facto font code table

    Michael Everson, everson@evertype.com, Dublin, and Te Khai-su, khaisu@formosa.org, Pasadena, 1997-06-26