ISO/IEC 2022 Character Sets
Character encodings that use the ISO/IEC 2022 mechanism include:
- ISO-2022-JP. A widely used encoding for Japanese. Starts in ASCII and includes the following escape sequences:
  - ESC ( B to switch to ASCII (1 byte per character)
  - ESC ( J to switch to JIS X 0201-1976 (ISO/IEC 646:JP) Roman set (1 byte per character)
  - ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
  - ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
- ISO-2022-JP-1. The same as ISO-2022-JP with one additional escape sequence:
  - ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
- ISO-2022-JP-2. A multilingual extension of ISO-2022-JP. The same as ISO-2022-JP-1 with the following additional escape sequences:
  - ESC $ A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
  - ESC . A to switch to ISO/IEC 8859-1 high part, Extended Latin 1 set (1 byte per character)
  - ESC . F to switch to ISO/IEC 8859-7 high part, Basic Greek set (1 byte per character)
- ISO-2022-JP-3. The same as ISO-2022-JP with three additional escape sequences:
  - ESC ( I to switch to JIS X 0201-1976 Kana set (1 byte per character)
  - ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
  - ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
- ISO-2022-JP-2004. The same as ISO-2022-JP-3 with one additional escape sequence:
  - ESC $ ( Q to switch to JIS X 0213-2004 Plane 1 (2 bytes per character)
- ISO-2022-KR. An encoding for Korean.
  - ESC $ ) C to switch to KS X 1001-1992, previously named KS C 5601-1987 (2 bytes per character)
- ISO-2022-CN. An encoding for Chinese.
  - ESC $ ) A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character)
  - ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character)
- ISO-2022-CN-EXT. The same as ISO-2022-CN with six additional escape sequences:
  - ESC $ ) E to switch to ISO-IR-165 (2 bytes per character)
  - ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character)
  - ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character)
  - ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character)
  - ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character)
  - ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character)
The character after the ESC (for single-byte character sets) or ESC $ (for multi-byte character sets) specifies the type of character set and the working set to which it is designated. In the above examples, the character ( (0x28) designates a 94-character set to the G0 character set. This may be replaced by ), * or + (0x29–0x2B) to designate to the G1–G3 character sets.
Two of the codes above are 96-character codes. For these, the character - (0x2D) designates to the G1 character set; it may be replaced with . or / (0x2E or 0x2F) to designate to the G2 or G3 character sets. The two ISO-2022-JP-2 examples above use . and so designate to G2. As mentioned earlier, a 96-character set may not be designated to the G0 set.
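The rules in the two paragraphs above can be sketched as a small Python function that reads a designation sequence and reports whether it is multi-byte, whether it names a 94- or 96-character set, and which working set (G0 to G3) it targets. The function name parse_designation and its return shape are assumptions made for this example; it handles only sequences with exactly one intermediate byte and one final byte.

```python
def parse_designation(seq: bytes):
    """Decode a designation escape sequence.

    Returns (multi_byte, set_size, target, final_character).
    """
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not an escape sequence")
    body = seq[1:]
    multi_byte = body[0] == 0x24  # '$' marks a multi-byte set
    if multi_byte:
        body = body[1:]
    if len(body) != 2:
        raise ValueError("expected one intermediate byte and one final byte")
    intermediate, final = body[0], body[1]
    if 0x28 <= intermediate <= 0x2B:
        # ( ) * +  designate a 94-character set to G0..G3
        size, target = 94, intermediate - 0x28
    elif 0x2D <= intermediate <= 0x2F:
        # - . /  designate a 96-character set to G1..G3 (never G0)
        size, target = 96, intermediate - 0x2C
    else:
        raise ValueError(f"unknown intermediate byte 0x{intermediate:02X}")
    return multi_byte, size, f"G{target}", chr(final)


print(parse_designation(b"\x1b(B"))   # ASCII
print(parse_designation(b"\x1b$)C"))  # KS X 1001 in ISO-2022-KR
print(parse_designation(b"\x1b.A"))   # ISO/IEC 8859-1 high part
```

For a multi-byte set, the reported size is the number of code positions per byte, not the total number of characters.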
There are three special cases for multi-byte codes. The code sequences ESC $ @, ESC $ A, and ESC $ B were all registered before the ISO/IEC 2022 standard was finalized, so they must be accepted as synonyms for the sequences ESC $ ( @ through ESC $ ( B to designate to the G0 character set. The latter form may also be used, and may be adapted by changing the ( character to designate to the G1 through G3 character sets.
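One way a decoder might treat these special cases is to rewrite the short forms into the regular four-byte form before further processing. The Python sketch below does that and also shows the adaptation to G1 through G3; both function names are hypothetical and chosen only for this example.

```python
LEGACY_FINALS = b"@AB"  # final bytes of the three short sequences


def normalize_multibyte_designation(seq: bytes) -> bytes:
    """Rewrite ESC $ F (F one of @, A, B) as ESC $ ( F; pass others through."""
    if len(seq) == 3 and seq[:2] == b"\x1b$" and seq[2] in LEGACY_FINALS:
        return b"\x1b$(" + seq[2:]
    return seq


def retarget(seq: bytes, g: int) -> bytes:
    """Return the 94-set multi-byte designation aimed at working set G<g>."""
    if not 0 <= g <= 3:
        raise ValueError("working set must be 0..3")
    long_form = normalize_multibyte_designation(seq)
    if len(long_form) != 4 or long_form[:2] != b"\x1b$":
        raise ValueError("not a multi-byte designation sequence")
    # 0x28 '(' selects G0; 0x29..0x2B select G1..G3
    return long_form[:2] + bytes([0x28 + g]) + long_form[3:]


print(normalize_multibyte_designation(b"\x1b$B"))  # long form for G0
print(retarget(b"\x1b$A", 1))                      # GB 2312 designated to G1
```

Designating GB 2312-1980 to G1 in this way yields ESC $ ) A, the sequence listed for ISO-2022-CN.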
The standard also defines a way to specify coding systems that do not follow its own structure. Of particular interest, the sequence ESC % G designates the UTF-8 coding system, which does not reserve the range 0x80–0x9F for control characters.
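To make the last point concrete, the short Python sketch below lists the bytes of a UTF-8 encoded string that fall in the range 0x80–0x9F. In UTF-8 such bytes occur as ordinary continuation bytes of printable characters. The helper name bytes_in_c1_range is an assumption made for this example.

```python
UTF8_DESIGNATION = b"\x1b%G"  # ESC % G, as given in the text


def bytes_in_c1_range(text: str):
    """Return the UTF-8 bytes of text that lie in 0x80..0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


c1_bytes = bytes_in_c1_range("À")  # U+00C0 is encoded as C3 80
print([hex(b) for b in c1_bytes])
```

A decoder that kept treating 0x80–0x9F as control codes after ESC % G would therefore misread ordinary text.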