Code Structure
ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. Escape sequences allow any of a large registry of graphic character sets to be "designated" into one of four working sets, named G0 through G3, and shorter control sequences specify the working set that is "invoked" to interpret bytes in the stream.
Character codes from the 7-bit ASCII graphic range (0x20–0x7F) are referred to as "GL" codes, being on the left side of a character code table, while codes from the "high ASCII" range (0xA0–0xFF), if available, are referred to as the "GR" codes.
By default, GL codes specify G0 characters, and GR codes specify G1 characters, but this may be modified with control codes or by prior agreement:
Code | Abbr. | Name | Effect |
---|---|---|---|
0x0F | SI, LS0 | Shift In, Locking shift zero | GL encodes G0 from now on |
0x0E | SO, LS1 | Shift Out, Locking shift one | GL encodes G1 from now on |
ESC 0x6E (n) | LS2 | Locking shift two | GL encodes G2 from now on |
ESC 0x6F (o) | LS3 | Locking shift three | GL encodes G3 from now on |
0x8E or ESC 0x4E (N) | SS2 | Single shift two | GL encodes G2 for next character only |
0x8F or ESC 0x4F (O) | SS3 | Single shift three | GL encodes G3 for next character only |
ESC 0x7E (~) | LS1R | Locking shift one right | GR encodes G1 from now on |
ESC 0x7D (}) | LS2R | Locking shift two right | GR encodes G2 from now on |
ESC 0x7C (\|) | LS3R | Locking shift three right | GR encodes G3 from now on |
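Each of the four working sets may be a 94-character set or a 94ⁿ-character set, that is, a multibyte set in which each character is coded by n bytes of 94 possible values each. Additionally, G1 through G3 may be a 96- or 96ⁿ-character set. When a 96- or 96ⁿ-character set is invoked in the GL region, the space and delete characters (codes 0x20 and 0x7F) are not available, because the set occupies those two positions as well.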
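The invocation rules in the shift table can be sketched as a small state tracker. This is a minimal illustration, not a conforming decoder: the name `trace_invocations` is hypothetical, the input is assumed to contain only shift functions and single-byte graphic characters (designation escape sequences and multibyte sets are not handled), and a single shift is applied to the next graphic byte whichever region it falls in.

```python
ESC = 0x1B

# Shift functions from the table, keyed by their byte representation.
SINGLE_BYTE_SHIFTS = {0x0F: "LS0", 0x0E: "LS1", 0x8E: "SS2", 0x8F: "SS3"}
ESCAPE_SHIFTS = {
    0x6E: "LS2", 0x6F: "LS3", 0x4E: "SS2", 0x4F: "SS3",
    0x7E: "LS1R", 0x7D: "LS2R", 0x7C: "LS3R",
}
LOCKING_LEFT = {"LS0": 0, "LS1": 1, "LS2": 2, "LS3": 3}
LOCKING_RIGHT = {"LS1R": 1, "LS2R": 2, "LS3R": 3}
SINGLE = {"SS2": 2, "SS3": 3}


def trace_invocations(data):
    """Return (byte, working_set_index) for each graphic byte in data."""
    gl, gr = 0, 1      # defaults: GL encodes G0, GR encodes G1
    pending = None     # working set selected by a single shift, if any
    result = []
    i = 0
    while i < len(data):
        byte = data[i]
        shift = None
        if byte == ESC and i + 1 < len(data) and data[i + 1] in ESCAPE_SHIFTS:
            shift = ESCAPE_SHIFTS[data[i + 1]]
            i += 2
        elif byte in SINGLE_BYTE_SHIFTS:
            shift = SINGLE_BYTE_SHIFTS[byte]
            i += 1
        else:
            i += 1

        if shift in LOCKING_LEFT:
            gl = LOCKING_LEFT[shift]
        elif shift in LOCKING_RIGHT:
            gr = LOCKING_RIGHT[shift]
        elif shift in SINGLE:
            pending = SINGLE[shift]
        elif 0x20 <= byte <= 0x7F or byte >= 0xA0:
            # Graphic byte: a pending single shift applies to it alone.
            if pending is not None:
                result.append((byte, pending))
                pending = None
            else:
                result.append((byte, gl if byte < 0x80 else gr))
    return result
```

For example, `trace_invocations(b"A\x0eA\x0fA")` reports the same byte 0x41 as belonging to G0, then G1 after Shift Out, then G0 again after Shift In.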
There are additional (rarely used) features for switching control character sets, but this is a single-level lookup: the 0x00–0x1F range is the C0 control character set, the 0x80–0x9F range is the C1 control character set, and there are escape sequences which switch in various alternatives. It is required that any C0 character set include the ESC character at position 0x1B, so that further changes are possible.
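The four byte regions described so far (C0, GL, C1 and GR) partition the 8-bit code space, which can be sketched as a simple classifier. The function name `classify_byte` is an illustrative choice, not a term from the standard.

```python
def classify_byte(byte):
    """Name the ISO/IEC 2022 region that an 8-bit byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a byte value in the range 0x00-0xFF")
    if byte <= 0x1F:
        return "C0"   # 0x00-0x1F: C0 control characters, including ESC
    if byte <= 0x7F:
        return "GL"   # 0x20-0x7F: left-hand graphic region
    if byte <= 0x9F:
        return "C1"   # 0x80-0x9F: C1 control characters
    return "GR"       # 0xA0-0xFF: right-hand graphic region
```

With this partition, ESC (0x1B) is classified as C0, which matches the requirement that every C0 set keep ESC at that position.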
As seen in the SS2 and SS3 examples above, single control characters from the C1 control character set may be invoked with only 7 bits by using the sequences ESC 0x40 (@) through ESC 0x5F (_). Additional control functions are assigned in the range ESC 0x60 (`) through ESC 0x7E (~). While this article describes escape sequences using the corresponding ASCII characters, they are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence.
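The correspondence between a C1 control byte and its 7-bit escape form is a fixed offset of 0x40, as the SS2 pair (0x8E and ESC 0x4E) shows. A minimal sketch of the conversion in both directions follows; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_7bit(c1_byte):
    """Return the two-byte 7-bit escape form of a C1 control byte."""
    if not 0x80 <= c1_byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    return bytes([ESC, c1_byte - 0x40])


def c1_from_7bit(sequence):
    """Return the C1 control byte represented by ESC 0x40..0x5F."""
    if len(sequence) != 2 or sequence[0] != ESC:
        raise ValueError("expected ESC followed by one byte")
    if not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("final byte does not represent a C1 control")
    return sequence[1] + 0x40
```

For instance, `c1_to_7bit(0x8F)` gives ESC 0x4F (O), the 7-bit form of SS3 listed in the shift table.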
Escape sequences to designate character sets take the form ESC I F, where there are one or more intermediate I bytes from the range 0x20–0x2F, and a final F byte from the range 0x40–0x7E. (The range 0x30–0x3F is reserved for private-use F bytes.) The I bytes identify the type of character set and the working set it is to be designated to, while the F byte identifies the character set itself.
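The ESC I F layout can be checked mechanically by splitting a sequence into its intermediate bytes and its final byte. The sketch below uses a hypothetical helper, `split_designation`, and takes 0x7E as the highest final byte on the assumption that 0x7F (delete) cannot end an escape sequence; it validates the byte ranges only and does not consult any registry.

```python
ESC = 0x1B


def split_designation(sequence):
    """Split ESC I... F into (intermediates, final, is_private_use)."""
    if len(sequence) < 3 or sequence[0] != ESC:
        raise ValueError("expected ESC, at least one I byte, and an F byte")
    intermediates = sequence[1:-1]
    final = sequence[-1]
    if any(not 0x20 <= b <= 0x2F for b in intermediates):
        raise ValueError("intermediate bytes must be in 0x20-0x2F")
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final byte must be in 0x30-0x7E")
    # Final bytes 0x30-0x3F are reserved for private use.
    return bytes(intermediates), final, final < 0x40
```

For example, `split_designation(b"\x1b$(B")` yields the intermediates `$(` and the final byte 0x42 (B), flagged as not private-use.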
Code | Hex | Abbr. | Name | Effect |
---|---|---|---|---|
ESC ! F | 1B 21 F | CZD | C0-designate | F selects a C0 control character set to be used. |
ESC " F | 1B 22 F | C1D | C1-designate | F selects a C1 control character set to be used. |
ESC % F | 1B 25 F | DOCS | Designate other coding system | F selects an 8-bit code; use ESC % @ to return to ISO/IEC 2022. |
ESC % / F | 1B 25 2F F | DOCS | Designate other coding system | F selects an 8-bit code; there is no standard way to return. |
ESC ( F | 1B 28 F | GZD4 | G0-designate 94-set | F selects a 94-character set to be used for G0. |
ESC ) F | 1B 29 F | G1D4 | G1-designate 94-set | F selects a 94-character set to be used for G1. |
ESC * F | 1B 2A F | G2D4 | G2-designate 94-set | F selects a 94-character set to be used for G2. |
ESC + F | 1B 2B F | G3D4 | G3-designate 94-set | F selects a 94-character set to be used for G3. |
ESC - F | 1B 2D F | G1D6 | G1-designate 96-set | F selects a 96-character set to be used for G1. |
ESC . F | 1B 2E F | G2D6 | G2-designate 96-set | F selects a 96-character set to be used for G2. |
ESC / F | 1B 2F F | G3D6 | G3-designate 96-set | F selects a 96-character set to be used for G3. |
ESC $ ( F | 1B 24 28 F | GZDM4 | G0-designate multibyte 94-set | F selects a 94ⁿ-character set to be used for G0. |
ESC $ ) F | 1B 24 29 F | G1DM4 | G1-designate multibyte 94-set | F selects a 94ⁿ-character set to be used for G1. |
ESC $ * F | 1B 24 2A F | G2DM4 | G2-designate multibyte 94-set | F selects a 94ⁿ-character set to be used for G2. |
ESC $ + F | 1B 24 2B F | G3DM4 | G3-designate multibyte 94-set | F selects a 94ⁿ-character set to be used for G3. |
ESC $ - F | 1B 24 2D F | G1DM6 | G1-designate multibyte 96-set | F selects a 96ⁿ-character set to be used for G1. |
ESC $ . F | 1B 24 2E F | G2DM6 | G2-designate multibyte 96-set | F selects a 96ⁿ-character set to be used for G2. |
ESC $ / F | 1B 24 2F F | G3DM6 | G3-designate multibyte 96-set | F selects a 96ⁿ-character set to be used for G3. |
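The graphic-set rows of the table follow a regular pattern: an optional `$` marks a multibyte set, and the next intermediate byte selects both the set size and the target working set. A minimal sketch of that decoding, using the hypothetical name `describe_designation`, is given below. It covers only the forms listed in the table, so sequences with extra intermediate bytes such as ESC ( ! F are rejected.

```python
ESC = 0x1B

# Intermediate byte -> index of the working set being designated.
TARGETS_94 = {0x28: 0, 0x29: 1, 0x2A: 2, 0x2B: 3}   # ( ) * +
TARGETS_96 = {0x2D: 1, 0x2E: 2, 0x2F: 3}            # - . /


def describe_designation(sequence):
    """Return (working_set, set_size, is_multibyte, final_char)."""
    if len(sequence) < 3 or sequence[0] != ESC:
        raise ValueError("not a graphic-set designation")
    body = sequence[1:]
    multibyte = body[0] == 0x24          # '$' introduces a multibyte set
    if multibyte:
        body = body[1:]
    if len(body) != 2:
        raise ValueError("form not covered by this sketch")
    selector, final = body
    if selector in TARGETS_94:
        return f"G{TARGETS_94[selector]}", 94, multibyte, chr(final)
    if selector in TARGETS_96:
        return f"G{TARGETS_96[selector]}", 96, multibyte, chr(final)
    raise ValueError("unknown intermediate byte")
```

For example, `describe_designation(b"\x1b$)A")` reports a multibyte 94-set with final byte A designated to G1, matching the G1DM4 row.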
Note that the registry of F bytes is independent for the different types. The 94-character graphic set designated by ESC ( A through ESC + A is not related in any way to the 96-character set designated by ESC - A through ESC / A. And neither of those is related to the 94ⁿ-character set designated by ESC $ ( A through ESC $ + A, and so on; the final bytes must be interpreted in context. (Indeed, without any intermediate bytes, ESC A is a way of specifying the C1 control code 0x81.)
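One way to picture this independence is a lookup table keyed by the pair (set type, final byte) rather than by the final byte alone. The sketch below is illustrative only: the function names are hypothetical, and the handful of entries are well-known assignments that should be checked against the official registry before being relied on.

```python
# Illustrative entries only, keyed by (set type, final byte).
REGISTRY = {
    ("94", "A"): "ISO 646 United Kingdom variant",
    ("94", "B"): "ASCII (ISO 646 US variant)",
    ("96", "A"): "ISO 8859-1 right-hand part",
    ("94^n", "A"): "GB 2312",
    ("94^n", "B"): "JIS X 0208",
}


def set_type(intermediates):
    """Derive the set type from the intermediate characters."""
    if not intermediates:
        raise ValueError("a designation needs at least one I byte")
    selector = intermediates[-1]
    if selector in ("(", ")", "*", "+"):
        size = "94"
    elif selector in ("-", ".", "/"):
        size = "96"
    else:
        raise ValueError("not a graphic-set designation")
    # A leading '$' marks a multibyte set.
    return size + "^n" if intermediates[0] == "$" else size


def lookup(intermediates, final):
    """Name the set selected by the given I bytes and F byte."""
    key = (set_type(intermediates), final)
    return REGISTRY.get(key, "not listed in this sketch")
```

Here `lookup("(", "A")`, `lookup("-", "A")` and `lookup("$(", "A")` return three different sets, while `lookup("(", "A")` and `lookup("+", "A")` return the same set, since the target working set does not affect which set the final byte names.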
Also note that C0 and C1 control character sets are independent; the C0 control character set designated by ESC ! A (which happens to be the NATS control set for newspaper text transmission) is not the same as the C1 control character set designated by ESC " A (the CCITT attribute control set for Videotex).
Additional I bytes may be added before the F byte to extend the F byte range. This is currently only used with 94-character sets, where codes of the form ESC ( ! F have been assigned. At the other extreme, no multibyte 96-sets have been registered, so the sequences above are strictly theoretical.