ISO/IEC JTC1/SC2/WG3 N362 FIRST DRAFT
Date: 1996-06-19

Title: Proposal for a new part of ISO/IEC 8859: Latin alphabet No. 9 (Sámi)

Source: Michael Everson, Everson Gunn Teoranta (WG3 member for Ireland)
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG3


Latin alphabet No. 9 (Sámi)

preCOMMITTEE DRAFT INTERNATIONAL STANDARD
ISO/IEC 8859-15 (E)
Draft revision 1996-06-19

Information technology -
8-bit single-byte coded graphic character sets

Part 15:
Latin alphabet No. 9 (Sámi)


Contents

0 Introduction
1 Scope
2 Conformance
3 Normative references
4 Definitions
5 Notation, code table and character names
6 Specification of the coded character set
7 Identification of the character set

0 Introduction

ISO/IEC 8859 consists of several parts. Each part specifies a set of up to 191 graphic characters and the coded representation of these characters by means of a single 8-bit byte. Each set is intended for use for a particular group of languages.

Control functions may be used in conjunction with the coded characters specified in the parts of ISO/IEC 8859. However, control functions are not used to create composite graphic symbols from two or more graphic characters (see 6).

1 Scope

This part of ISO/IEC 8859 specifies a set of 191 coded graphic characters identified as Latin alphabet No. 9 (Sámi).

This set of coded graphic characters is intended for use in data and text processing applications and may also be used for information interchange.

The set contains graphic characters used for general purpose applications in typical office environments in at least the following languages: Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Estonian, Faroese, Finnish, French, German, Icelandic, Irish Gaelic, Italian, Latin, Luxembourgish, Manx Gaelic, Norwegian, Portuguese, Inari Sámi, Northern Sámi, Skolt Sámi, Scottish Gaelic, Slovenian, Swedish.

This set of coded graphic characters may be regarded as a version of an 8-bit code according to ISO/IEC 2022 or ISO/IEC 4873 at level 1.

This part of ISO/IEC 8859 may not be used in conjunction with any other parts of ISO/IEC 8859. If coded characters from more than one part are to be used together, by means of code extension techniques, the equivalent coded character sets from ISO/IEC 10367 should be used instead within a version of ISO/IEC 4873 at level 2 or level 3.

The coded characters in this set may be used in conjunction with coded control functions selected from ISO/IEC 6429.

NOTE: ISO/IEC 8859 is not intended for use with Telematic services defined by ITU-T. If information coded according to ISO/IEC 8859 is to be transferred to such services, it will have to conform to the requirements of those services at the access-point.

2 Conformance

2.1 Conformance of information interchange

A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with this part of this International Standard if all the coded representations of graphic characters within that CC-data-element conform to the requirements of clause 6.

2.2 Conformance of devices

A device is in conformance with this International Standard if it conforms to the requirements of 2.2.1, and either or both of 2.2.2 and 2.2.3. A claim of conformance shall identify the document which contains the description specified in 2.2.1.

2.2.1 Device description

A device that conforms to this International Standard shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.2.2 and 2.2.3.

2.2.2 Originating devices

An originating device shall allow its user to supply any sequence of characters from those specified in clause 6, and shall be capable of transmitting their coded representations within a CC-data-element.

2.2.3 Receiving devices

A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to clause 6, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those specified there, and can distinguish them from each other.

3 Normative references

ISO/IEC 2022, Information technology - Character code structure and extension techniques.

ISO/IEC 4873, Information technology - ISO 8-bit code for information interchange - Structure and rules for implementation.

ISO/IEC 8824, Information technology - Open systems interconnection - Abstract Syntax Notation One (ASN.1).

4 Definitions

4.1 bit combination: An ordered set of bits used for the representation of characters.

4.2 byte: A bit string that is operated upon as a unit.

4.3 character: A member of a set of elements used for the organization, control, or representation of data.

4.4 code table: A table showing the characters allocated to each bit combination in a code.

4.5 coded character set; code: A set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations.

4.6 graphic character: A character, other than a control function, that has a visual representation normally handwritten, printed or displayed, and that has a coded representation consisting of one or more bit combinations.

NOTE: In ISO/IEC 8859 a single bit combination is used to represent each character.

4.7 graphic symbol: A visual representation of a graphic character or of a control function.

4.8 position: That part of a code table identified by its column and row coordinates.

5 Notation, code table and character names

5.1 Notation

The bits of the bit combinations of the 8-bit code are identified by b8, b7, b6, b5, b4, b3, b2, and b1, where b8 is the highest-order, or most-significant bit and b1 is the lowest-order, or least-significant bit.

The bit combinations may be interpreted to represent numbers in binary notation by attributing the following weights to the individual bits:

Bit     b8   b7   b6   b5   b4   b3   b2   b1
Weight  128  64   32   16   8    4    2    1

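Using these weights, the bit combinations are identified by notations of the form xx/yy, where xx and yy are numbers in the range 00 to 15. The correspondence between the notations of the form xx/yy and the bit combinations consisting of the bits b8 to b1 is as follows:

xx is the number represented by b8, b7, b6 and b5, where these bits are given the weights 8, 4, 2 and 1 respectively;
yy is the number represented by b4, b3, b2 and b1, where these bits are given the weights 8, 4, 2 and 1 respectively.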
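As an illustration of the notation in 5.1, the following Python sketch converts between a byte value, its bit string b8 to b1, and its xx/yy notation. The function names are hypothetical and chosen for this example only; the sketch simply applies the weights given above.

```python
# Weights of bits b8 ... b1, as given in 5.1.
WEIGHTS = (128, 64, 32, 16, 8, 4, 2, 1)


def from_bits(bits: str) -> int:
    """Value of a bit string written b8 first, e.g. '11011111'."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected eight '0'/'1' characters, b8 first")
    return sum(w for w, bit in zip(WEIGHTS, bits) if bit == "1")


def to_notation(byte: int) -> str:
    """xx/yy notation: xx from b8-b5, yy from b4-b1 (each weighted 8, 4, 2, 1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    return f"{byte >> 4:02d}/{byte & 0x0F:02d}"


def from_notation(notation: str) -> int:
    """Inverse of to_notation, e.g. '13/15' -> 0xDF."""
    xx, yy = (int(part) for part in notation.split("/"))
    if not (0 <= xx <= 15 and 0 <= yy <= 15):
        raise ValueError("xx and yy must be in the range 00 to 15")
    return (xx << 4) | yy
```

For example, the bit combination 11011111 has the value 223, hexadecimal DF, and the notation 13/15.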


5.2 Layout of the code table

An 8-bit code table consists of 256 positions arranged in 16 columns and 16 rows. The columns and the rows are numbered 00 to 15.

The code table positions are identified by notations of the form xx/yy, where xx is the column number and yy is the row number.

The positions of the code table are in one-to-one correspondence with the bit combinations of the code. The notation of a code table position, of the form xx/yy, is the same as that of the corresponding bit combination.

5.3 Names and meanings

This part of ISO/IEC 8859 assigns a unique name to each graphic character. These names have been taken from ISO/IEC 10646-1 (E). This part of ISO/IEC 8859 also specifies an acronym for each of the characters SPACE, NO-BREAK SPACE and SOFT HYPHEN. For acronyms only Latin capital letters A to Z are used. It is intended that the acronyms be retained in all translations of the text.

The names chosen to denote graphic characters are intended to reflect their customary meaning. However, except for SPACE (SP), NO-BREAK SPACE (NBSP) and SOFT HYPHEN (SHY), this part of ISO/IEC 8859 does not define and does not restrict the meanings of graphic characters. Neither does it specify a particular style or font design for imaging graphic characters.

This part of ISO/IEC 8859 specifies a graphic symbol for each graphic character. This symbol is shown in the corresponding position of the code table.

5.3.1 SPACE (SP)

A graphic character the visual representation of which consists of the absence of a graphic symbol.

5.3.2 NO-BREAK SPACE (NBSP)

A graphic character the visual representation of which consists of the absence of a graphic symbol, for use when a line break is to be prevented in the text as presented.

5.3.3 SOFT HYPHEN (SHY)

A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing HYPHEN-MINUS, for use when a line break has been established within a word.
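The following minimal sketch, assuming text that has already been decoded to Unicode code points, illustrates the intended difference between SPACE and NO-BREAK SPACE in a simple greedy line-breaking routine. This part of ISO/IEC 8859 does not define any line-breaking algorithm; the routine and its name `wrap` are hypothetical, and SOFT HYPHEN is deliberately left untouched.

```python
SP = "\u0020"    # SPACE (02/00)
NBSP = "\u00a0"  # NO-BREAK SPACE (10/00)


def wrap(text: str, width: int) -> list[str]:
    """Greedy line breaking: break only at SPACE, never at NO-BREAK SPACE."""
    lines: list[str] = []
    current = ""
    # split(SP) splits on U+0020 only, so an NBSP stays inside a "word".
    for word in text.split(SP):
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += SP + word
        else:
            lines.append(current)
            current = word
    lines.append(current)
    return lines
```

With a width of 6, the text "10", NBSP, "kg of salt" is broken as "10 kg" / "of" / "salt": the number and its unit are never separated.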

6 Specification of the coded character set

This part of ISO/IEC 8859 specifies 191 characters allocated to the bit combinations of the code table (table 2).

NOTE: None of these characters are combining characters in the sense of ISO/IEC 2022, clause 6.3.3.

Control functions, such as BACKSPACE or CARRIAGE RETURN, shall not be used to create composite graphic symbols, which are graphic symbols made up from the graphic representations of two or more characters.
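A receiving or validating device might flag the most common form of composite graphic symbol, BACKSPACE (00/08) placed between two graphic characters, with a check along the following lines. This is a hypothetical heuristic, not a procedure specified in this part of ISO/IEC 8859: it detects only the BACKSPACE pattern and does not detect overprinting by means of CARRIAGE RETURN.

```python
BS = 0x08  # BACKSPACE, 00/08


def is_graphic(b: int) -> bool:
    """True for positions 02/00-07/14 and 10/00-15/15 (tables 1A and 1B)."""
    return 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF


def find_overstrikes(data: bytes) -> list[int]:
    """Offsets of BACKSPACE bytes that sit between two graphic characters."""
    return [
        i
        for i in range(1, len(data) - 1)
        if data[i] == BS and is_graphic(data[i - 1]) and is_graphic(data[i + 1])
    ]
```

For example, the bytes 65 08 B4 (e, BACKSPACE, ACUTE ACCENT) are reported at offset 1, because they attempt to build é as a composite rather than using the coded character at 14/09.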

6.1 Characters of the set and their coded representation

Table 1A - Name and coded representation of the characters in Columns 02 to 07

Bit combination  Hex  ISO 10646  Name
02/00            20   0020       SPACE
02/01            21   0021       EXCLAMATION MARK
02/02            22   0022       QUOTATION MARK
02/03            23   0023       NUMBER SIGN
02/04            24   0024       DOLLAR SIGN
02/05            25   0025       PERCENT SIGN
02/06            26   0026       AMPERSAND
02/07            27   0027       APOSTROPHE
02/08            28   0028       LEFT PARENTHESIS
02/09            29   0029       RIGHT PARENTHESIS
02/10            2A   002A       ASTERISK
02/11            2B   002B       PLUS SIGN
02/12            2C   002C       COMMA
02/13            2D   002D       HYPHEN-MINUS
02/14            2E   002E       FULL STOP
02/15            2F   002F       SOLIDUS
03/00            30   0030       DIGIT ZERO
03/01            31   0031       DIGIT ONE
03/02            32   0032       DIGIT TWO
03/03            33   0033       DIGIT THREE
03/04            34   0034       DIGIT FOUR
03/05            35   0035       DIGIT FIVE
03/06            36   0036       DIGIT SIX
03/07            37   0037       DIGIT SEVEN
03/08            38   0038       DIGIT EIGHT
03/09            39   0039       DIGIT NINE
03/10            3A   003A       COLON
03/11            3B   003B       SEMICOLON
03/12            3C   003C       LESS-THAN SIGN
03/13            3D   003D       EQUALS SIGN
03/14            3E   003E       GREATER-THAN SIGN
03/15            3F   003F       QUESTION MARK
04/00            40   0040       COMMERCIAL AT
04/01            41   0041       LATIN CAPITAL LETTER A
04/02            42   0042       LATIN CAPITAL LETTER B
04/03            43   0043       LATIN CAPITAL LETTER C
04/04            44   0044       LATIN CAPITAL LETTER D
04/05            45   0045       LATIN CAPITAL LETTER E
04/06            46   0046       LATIN CAPITAL LETTER F
04/07            47   0047       LATIN CAPITAL LETTER G
04/08            48   0048       LATIN CAPITAL LETTER H
04/09            49   0049       LATIN CAPITAL LETTER I
04/10            4A   004A       LATIN CAPITAL LETTER J
04/11            4B   004B       LATIN CAPITAL LETTER K
04/12            4C   004C       LATIN CAPITAL LETTER L
04/13            4D   004D       LATIN CAPITAL LETTER M
04/14            4E   004E       LATIN CAPITAL LETTER N
04/15            4F   004F       LATIN CAPITAL LETTER O
05/00            50   0050       LATIN CAPITAL LETTER P
05/01            51   0051       LATIN CAPITAL LETTER Q
05/02            52   0052       LATIN CAPITAL LETTER R
05/03            53   0053       LATIN CAPITAL LETTER S
05/04            54   0054       LATIN CAPITAL LETTER T
05/05            55   0055       LATIN CAPITAL LETTER U
05/06            56   0056       LATIN CAPITAL LETTER V
05/07            57   0057       LATIN CAPITAL LETTER W
05/08            58   0058       LATIN CAPITAL LETTER X
05/09            59   0059       LATIN CAPITAL LETTER Y
05/10            5A   005A       LATIN CAPITAL LETTER Z
05/11            5B   005B       LEFT SQUARE BRACKET
05/12            5C   005C       REVERSE SOLIDUS
05/13            5D   005D       RIGHT SQUARE BRACKET
05/14            5E   005E       CIRCUMFLEX ACCENT
05/15            5F   005F       LOW LINE
06/00            60   0060       GRAVE ACCENT
06/01            61   0061       LATIN SMALL LETTER A
06/02            62   0062       LATIN SMALL LETTER B
06/03            63   0063       LATIN SMALL LETTER C
06/04            64   0064       LATIN SMALL LETTER D
06/05            65   0065       LATIN SMALL LETTER E
06/06            66   0066       LATIN SMALL LETTER F
06/07            67   0067       LATIN SMALL LETTER G
06/08            68   0068       LATIN SMALL LETTER H
06/09            69   0069       LATIN SMALL LETTER I
06/10            6A   006A       LATIN SMALL LETTER J
06/11            6B   006B       LATIN SMALL LETTER K
06/12            6C   006C       LATIN SMALL LETTER L
06/13            6D   006D       LATIN SMALL LETTER M
06/14            6E   006E       LATIN SMALL LETTER N
06/15            6F   006F       LATIN SMALL LETTER O
07/00            70   0070       LATIN SMALL LETTER P
07/01            71   0071       LATIN SMALL LETTER Q
07/02            72   0072       LATIN SMALL LETTER R
07/03            73   0073       LATIN SMALL LETTER S
07/04            74   0074       LATIN SMALL LETTER T
07/05            75   0075       LATIN SMALL LETTER U
07/06            76   0076       LATIN SMALL LETTER V
07/07            77   0077       LATIN SMALL LETTER W
07/08            78   0078       LATIN SMALL LETTER X
07/09            79   0079       LATIN SMALL LETTER Y
07/10            7A   007A       LATIN SMALL LETTER Z
07/11            7B   007B       LEFT CURLY BRACKET
07/12            7C   007C       VERTICAL LINE
07/13            7D   007D       RIGHT CURLY BRACKET
07/14            7E   007E       TILDE
07/15            7F   007F       (This position shall not be used)

Table 1B - Name and coded representation of the characters in Columns 10 to 15

(All positions from 10/00 (A0) to 15/15 (FF) are used for graphic characters.)

Bit combination  Hex  ISO 10646  Name
10/00            A0   00A0       NO-BREAK SPACE
10/01            A1   010C       LATIN CAPITAL LETTER C WITH CARON
10/02            A2   010D       LATIN SMALL LETTER C WITH CARON
10/03            A3   0110       LATIN CAPITAL LETTER D WITH STROKE
10/04            A4   0111       LATIN SMALL LETTER D WITH STROKE
10/05            A5   01E4       LATIN CAPITAL LETTER G WITH STROKE
10/06            A6   01E5       LATIN SMALL LETTER G WITH STROKE
10/07            A7   00A7       SECTION SIGN
10/08            A8   01E6       LATIN CAPITAL LETTER G WITH CARON
10/09            A9   00A9       COPYRIGHT SIGN
10/10            AA   01E7       LATIN SMALL LETTER G WITH CARON
10/11            AB   00AB       LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
10/12            AC   01E8       LATIN CAPITAL LETTER K WITH CARON
10/13            AD   00AD       SOFT HYPHEN
10/14            AE   01E9       LATIN SMALL LETTER K WITH CARON
10/15            AF   014A       LATIN CAPITAL LETTER ENG
11/00            B0   00B0       DEGREE SIGN
11/01            B1   014B       LATIN SMALL LETTER ENG
11/02            B2   0160       LATIN CAPITAL LETTER S WITH CARON
11/03            B3   0161       LATIN SMALL LETTER S WITH CARON
11/04            B4   00B4       ACUTE ACCENT
11/05            B5   0166       LATIN CAPITAL LETTER T WITH STROKE
11/06            B6   00B6       PILCROW SIGN
11/07            B7   00B7       MIDDLE DOT
11/08            B8   0167       LATIN SMALL LETTER T WITH STROKE
11/09            B9   017D       LATIN CAPITAL LETTER Z WITH CARON
11/10            BA   017E       LATIN SMALL LETTER Z WITH CARON
11/11            BB   00BB       RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
11/12            BC   01B7       LATIN CAPITAL LETTER EZH
11/13            BD   0292       LATIN SMALL LETTER EZH
11/14            BE   01EE       LATIN CAPITAL LETTER EZH WITH CARON
11/15            BF   01EF       LATIN SMALL LETTER EZH WITH CARON
12/00            C0   00C0       LATIN CAPITAL LETTER A WITH GRAVE
12/01            C1   00C1       LATIN CAPITAL LETTER A WITH ACUTE
12/02            C2   00C2       LATIN CAPITAL LETTER A WITH CIRCUMFLEX
12/03            C3   00C3       LATIN CAPITAL LETTER A WITH TILDE
12/04            C4   00C4       LATIN CAPITAL LETTER A WITH DIAERESIS
12/05            C5   00C5       LATIN CAPITAL LETTER A WITH RING ABOVE
12/06            C6   00C6       LATIN CAPITAL LETTER AE
12/07            C7   00C7       LATIN CAPITAL LETTER C WITH CEDILLA
12/08            C8   00C8       LATIN CAPITAL LETTER E WITH GRAVE
12/09            C9   00C9       LATIN CAPITAL LETTER E WITH ACUTE
12/10            CA   00CA       LATIN CAPITAL LETTER E WITH CIRCUMFLEX
12/11            CB   00CB       LATIN CAPITAL LETTER E WITH DIAERESIS
12/12            CC   00CC       LATIN CAPITAL LETTER I WITH GRAVE
12/13            CD   00CD       LATIN CAPITAL LETTER I WITH ACUTE
12/14            CE   00CE       LATIN CAPITAL LETTER I WITH CIRCUMFLEX
12/15            CF   00CF       LATIN CAPITAL LETTER I WITH DIAERESIS
13/00            D0   00D0       LATIN CAPITAL LETTER ETH
13/01            D1   00D1       LATIN CAPITAL LETTER N WITH TILDE
13/02            D2   00D2       LATIN CAPITAL LETTER O WITH GRAVE
13/03            D3   00D3       LATIN CAPITAL LETTER O WITH ACUTE
13/04            D4   00D4       LATIN CAPITAL LETTER O WITH CIRCUMFLEX
13/05            D5   00D5       LATIN CAPITAL LETTER O WITH TILDE
13/06            D6   00D6       LATIN CAPITAL LETTER O WITH DIAERESIS
13/07            D7   00D7       MULTIPLICATION SIGN
13/08            D8   00D8       LATIN CAPITAL LETTER O WITH STROKE
13/09            D9   00D9       LATIN CAPITAL LETTER U WITH GRAVE
13/10            DA   00DA       LATIN CAPITAL LETTER U WITH ACUTE
13/11            DB   00DB       LATIN CAPITAL LETTER U WITH CIRCUMFLEX
13/12            DC   00DC       LATIN CAPITAL LETTER U WITH DIAERESIS
13/13            DD   00DD       LATIN CAPITAL LETTER Y WITH ACUTE
13/14            DE   00DE       LATIN CAPITAL LETTER THORN
13/15            DF   00DF       LATIN SMALL LETTER SHARP S (German)
14/00            E0   00E0       LATIN SMALL LETTER A WITH GRAVE
14/01            E1   00E1       LATIN SMALL LETTER A WITH ACUTE
14/02            E2   00E2       LATIN SMALL LETTER A WITH CIRCUMFLEX
14/03            E3   00E3       LATIN SMALL LETTER A WITH TILDE
14/04            E4   00E4       LATIN SMALL LETTER A WITH DIAERESIS
14/05            E5   00E5       LATIN SMALL LETTER A WITH RING ABOVE
14/06            E6   00E6       LATIN SMALL LETTER AE
14/07            E7   00E7       LATIN SMALL LETTER C WITH CEDILLA
14/08            E8   00E8       LATIN SMALL LETTER E WITH GRAVE
14/09            E9   00E9       LATIN SMALL LETTER E WITH ACUTE
14/10            EA   00EA       LATIN SMALL LETTER E WITH CIRCUMFLEX
14/11            EB   00EB       LATIN SMALL LETTER E WITH DIAERESIS
14/12            EC   00EC       LATIN SMALL LETTER I WITH GRAVE
14/13            ED   00ED       LATIN SMALL LETTER I WITH ACUTE
14/14            EE   00EE       LATIN SMALL LETTER I WITH CIRCUMFLEX
14/15            EF   00EF       LATIN SMALL LETTER I WITH DIAERESIS
15/00            F0   00F0       LATIN SMALL LETTER ETH
15/01            F1   00F1       LATIN SMALL LETTER N WITH TILDE
15/02            F2   00F2       LATIN SMALL LETTER O WITH GRAVE
15/03            F3   00F3       LATIN SMALL LETTER O WITH ACUTE
15/04            F4   00F4       LATIN SMALL LETTER O WITH CIRCUMFLEX
15/05            F5   00F5       LATIN SMALL LETTER O WITH TILDE
15/06            F6   00F6       LATIN SMALL LETTER O WITH DIAERESIS
15/07            F7   00F7       DIVISION SIGN
15/08            F8   00F8       LATIN SMALL LETTER O WITH STROKE
15/09            F9   00F9       LATIN SMALL LETTER U WITH GRAVE
15/10            FA   00FA       LATIN SMALL LETTER U WITH ACUTE
15/11            FB   00FB       LATIN SMALL LETTER U WITH CIRCUMFLEX
15/12            FC   00FC       LATIN SMALL LETTER U WITH DIAERESIS
15/13            FD   00FD       LATIN SMALL LETTER Y WITH ACUTE
15/14            FE   00FE       LATIN SMALL LETTER THORN
15/15            FF   00FF       LATIN SMALL LETTER Y WITH DIAERESIS

6.2 Code table

For each character in the set the code table (table 2) shows a graphic symbol at the position in the code table corresponding to the bit combination specified in table 1A or 1B.

The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429.
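The correspondence given in tables 1A and 1B can be sketched as a small Python codec. This is an illustrative transcription under stated assumptions: the names `DECODE`, `ENCODE`, `decode` and `encode` are hypothetical, and every bit combination that does not represent a graphic character of this set (the shaded positions, plus 07/15) is rejected rather than passed through as a control function. Python's built-in `iso8859_15` codec implements a different repertoire (for example it decodes A4 as EURO SIGN), so it cannot be used for this draft.

```python
# Hypothetical transcription of tables 1A and 1B of this draft.
# Positions 10/00-15/15 whose ISO/IEC 10646 value differs from the byte value;
# every other position in 10/00-15/15 maps to U+00A0-U+00FF unchanged.
_OVERRIDES = {
    0xA1: 0x010C, 0xA2: 0x010D, 0xA3: 0x0110, 0xA4: 0x0111,
    0xA5: 0x01E4, 0xA6: 0x01E5, 0xA8: 0x01E6, 0xAA: 0x01E7,
    0xAC: 0x01E8, 0xAE: 0x01E9, 0xAF: 0x014A, 0xB1: 0x014B,
    0xB2: 0x0160, 0xB3: 0x0161, 0xB5: 0x0166, 0xB8: 0x0167,
    0xB9: 0x017D, 0xBA: 0x017E, 0xBC: 0x01B7, 0xBD: 0x0292,
    0xBE: 0x01EE, 0xBF: 0x01EF,
}

# Byte -> code point for all 191 graphic positions.
DECODE = {b: b for b in range(0x20, 0x7F)}  # 02/00-07/14 (table 1A)
DECODE.update({b: _OVERRIDES.get(b, b) for b in range(0xA0, 0x100)})  # table 1B
ENCODE = {cp: b for b, cp in DECODE.items()}


def decode(data: bytes) -> str:
    """Decode graphic characters only; any other bit combination raises."""
    chars = []
    for offset, b in enumerate(data):
        if b not in DECODE:
            raise ValueError(
                f"{b >> 4:02d}/{b & 0x0F:02d} at offset {offset} "
                "is not a graphic character of this set"
            )
        chars.append(chr(DECODE[b]))
    return "".join(chars)


def encode(text: str) -> bytes:
    """Encode characters of the set; characters outside it raise."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp not in ENCODE:
            raise ValueError(f"U+{cp:04X} is not in this set")
        out.append(ENCODE[cp])
    return bytes(out)
```

Decoding the bytes 53 E1 6D 69 with this sketch yields "Sámi"; encoding a character outside the set, such as CURRENCY SIGN (U+00A4), raises ValueError.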

Table 2 - Code table of Latin alphabet No. 9 (Sámi)



NOTE: The lightly shaded boxes to the left of and below the code table give the hexadecimal notation of the characters' code positions for the convenience of the user of this standard. To find the address of a character, read first the horizontal row, then the vertical column (so 13/15 or DF = ß).

7 Identification of the character set

7.1 Identification according to ISO/IEC 2022 and ISO/IEC 4873

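The graphic characters of this part of ISO/IEC 8859 constitute a single coded character set. However, in accordance with ISO/IEC 2022 and ISO/IEC 4873, the code table of this part of ISO/IEC 8859 may be considered to consist of the following components:

the character SPACE in position 02/00;
a 94-character graphic character set in positions 02/01 to 07/14;
a 96-character graphic character set in positions 10/00 to 15/15.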
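The division into components can be expressed as a classifier over the 256 bit combinations, following the positions of tables 1A and 1B: SPACE at 02/00, 94 graphic characters at 02/01 to 07/14, and 96 at 10/00 to 15/15. The function name `component` and its return labels are hypothetical.

```python
from typing import Optional


def component(byte: int) -> Optional[str]:
    """Classify a bit combination by the component it belongs to."""
    if byte == 0x20:
        return "SPACE"  # 02/00
    if 0x21 <= byte <= 0x7E:
        return "G0"     # 94-character set, 02/01-07/14
    if 0xA0 <= byte <= 0xFF:
        return "G1"     # 96-character set, 10/00-15/15
    # 00/00-01/15, 07/15 and 08/00-09/15 are not graphic characters of this set.
    return None
```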


When the identification methods of ISO/IEC 2022 or ISO/IEC 4873 are used, this part of ISO/IEC 8859 shall be identified by the following pair of designation functions:

NOTE: The corresponding escape sequences are shown in parentheses.
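Since the designation functions for this set are not given in this draft, the sketch below only shows the general form of the ISO/IEC 2022 designation escape sequences involved: ESC 02/08 F designates a 94-character set as G0, and ESC 02/13 F designates a 96-character set as G1, where F is the final byte. The G0 example uses 04/02, the final byte registered for ASCII; the G1 final byte 03/00 is a hypothetical private-use placeholder, not a registered value for this part. All function names are hypothetical.

```python
ESC = 0x1B  # ESCAPE, 01/11


def _check_final(final: int) -> None:
    # Final bytes lie in 03/00-07/14; 03/00-03/15 are reserved for private use.
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final byte must be in the range 03/00 to 07/14")


def designate_g0_94(final: int) -> bytes:
    """ESC 02/08 F: designate a 94-character graphic set as G0."""
    _check_final(final)
    return bytes([ESC, 0x28, final])


def designate_g1_96(final: int) -> bytes:
    """ESC 02/13 F: designate a 96-character graphic set as G1."""
    _check_final(final)
    return bytes([ESC, 0x2D, final])


def notation(seq: bytes) -> str:
    """Render a byte sequence in the xx/yy notation of 5.1."""
    return " ".join(f"{b >> 4:02d}/{b & 0x0F:02d}" for b in seq)
```

For example, `notation(designate_g0_94(0x42))` gives "01/11 02/08 04/02", and `notation(designate_g1_96(0x30))` gives "01/11 02/13 03/00" for the placeholder G1 final byte.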

7.2 Identification according to ISO/IEC 8824 (ASN.1)

In the terminology of ISO/IEC 8824 the character set of this part of ISO/IEC 8859 and the corresponding coded representations are distinct, and are known as the "character abstract syntax" and the "character transfer syntax" respectively.

When the identification methods of ISO/IEC 8824 are used, this part of ISO/IEC 8859 shall be identified by the following object identifiers:

The corresponding object descriptors shall be:


7.3 Identification using the ISO International register of coded character sets to be used with escape sequences

According to 7.1 above the character set of this part of ISO/IEC 8859 may be considered to consist of the character SPACE, a 94-character G0 graphic character set, and a 96-character G1 graphic character set. The G0 and G1 graphic character sets may be identified by the use of the Registration Numbers from the ISO International register of coded character sets to be used with escape sequences.

When these registration numbers are used, this part of ISO/IEC 8859 shall be identified by the following pair of registration numbers:


Annex A (informative) Coverage of languages by parts 1 to 10 of ISO/IEC 8859


A.1 Languages written in Latin script


(To be supplied)

A.2 Languages written in non-Latin scripts


(To be supplied)

Annex B (informative)


Bibliography


ISO/IEC 6429: 1988, Information processing - Control functions for 7-bit and 8-bit coded character sets.

ISO/IEC 10646-1, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane.

ISO International register of coded character sets to be used with escape sequences.


HTML Michael Everson, everson@indigo.ie, Dublin, 1996-10-22