ISO/IEC 2022 Character Sets

Character encodings that use the ISO/IEC 2022 mechanism include:

  • ISO-2022-JP. A widely used encoding for Japanese. Starts in ASCII and includes the following escape sequences
    • ESC ( B to switch to ASCII (1 byte per character)
    • ESC ( J to switch to JIS X 0201-1976 (ISO/IEC 646:JP) Roman set (1 byte per character)
    • ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
    • ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
  • ISO-2022-JP-1. The same as ISO-2022-JP with one additional escape sequence
    • ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
  • ISO-2022-JP-2. A multilingual extension of ISO-2022-JP. The same as ISO-2022-JP-1 with the following additional escape sequences
    • ESC $ A to switch to GB 2312-1980 (2 bytes per character)
    • ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
    • ESC . A to switch to ISO/IEC 8859-1 high part, Extended Latin 1 set (1 byte per character)
    • ESC . F to switch to ISO/IEC 8859-7 high part, Basic Greek set (1 byte per character)
  • ISO-2022-JP-3. The same as ISO-2022-JP with three additional escape sequences
    • ESC ( I to switch to JIS X 0201-1976 Kana set (1 byte per character)
    • ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
    • ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
  • ISO-2022-JP-2004. The same as ISO-2022-JP-3 with one additional escape sequence
    • ESC $ ( Q to switch to JIS X 0213-2004 Plane 1 (2 bytes per character)
  • ISO-2022-KR. An encoding for Korean.
    • ESC $ ) C to switch to KS X 1001-1992, previously named KS C 5601-1987 (2 bytes per character)
  • ISO-2022-CN. An encoding for Chinese.
    • ESC $ ) A to switch to GB 2312-1980 (2 bytes per character)
    • ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character)
    • ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character)
  • ISO-2022-CN-EXT. The same as ISO-2022-CN with six additional escape sequences
    • ESC $ ) E to switch to ISO-IR-165 (2 bytes per character)
    • ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character)
    • ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character)
    • ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character)
    • ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character)
    • ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character)
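
As a quick illustration of the escape sequences listed above, the following sketch encodes a short mixed string with Python's built-in `iso2022_jp` codec and then names the designation sequences that appear in the output. The helper `find_escapes` and the lookup table are hypothetical names made up for this example, and the table covers only the four plain ISO-2022-JP sequences from the list.

```python
ESC = b"\x1b"

# Escape sequences of plain ISO-2022-JP, taken from the list above.
ISO_2022_JP_ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

def find_escapes(data: bytes) -> list[str]:
    """Return the names of the escape sequences found in data, in order."""
    found = []
    i = data.find(ESC)
    while i != -1:
        seq = data[i:i + 3]  # every sequence in the table is 3 bytes long
        found.append(ISO_2022_JP_ESCAPES.get(seq, "unknown"))
        i = data.find(ESC, i + 1)
    return found

# The encoder switches to a 2-byte set for the kanji and back to ASCII after.
encoded = "abc 日本語 xyz".encode("iso2022_jp")
print(encoded)
print(find_escapes(encoded))
```

Because the stream starts in ASCII, no escape sequence is needed before `abc`; the sequences appear only where the active character set changes.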

The character after ESC (for single-byte character sets) or ESC $ (for multi-byte character sets) specifies both the type of character set and the working set it is designated to. The character ( (0x28), as in ESC ( B, designates a 94-character set to the G0 set. It may be replaced by ), * or + (0x29–0x2B) to designate to the G1–G3 sets, as the ISO-2022-KR and ISO-2022-CN sequences above do.

Two of the character sets above, the ISO/IEC 8859-1 and ISO/IEC 8859-7 high parts, are 96-character sets. In their escape sequences, the character . (0x2E) designates to the G2 set. It may be replaced with - or / (0x2D or 0x2F) to designate to the G1 or G3 set. As mentioned earlier, a 96-character set may not be designated to the G0 set.
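
The designation rules in the two paragraphs above can be sketched as a small parser. The names `Designation` and `parse_designation` are hypothetical, and the function is a minimal sketch: it handles only the intermediate bytes described here (0x28–0x2B for 94-character sets, 0x2D–0x2F for 96-character sets) and expects the full form with an explicit intermediate byte.

```python
from typing import NamedTuple

class Designation(NamedTuple):
    multibyte: bool  # True when the sequence starts with ESC $
    set_size: int    # 94 or 96 (per byte, for multi-byte sets)
    target: str      # "G0", "G1", "G2" or "G3"
    final: str       # final byte identifying the character set, e.g. "B"

def parse_designation(seq: bytes) -> Designation:
    """Parse a designation such as ESC ( B or ESC $ ) C (ESC included)."""
    if not seq.startswith(b"\x1b"):
        raise ValueError("not an escape sequence")
    body = seq[1:]
    multibyte = body.startswith(b"$")
    if multibyte:
        body = body[1:]
    if len(body) != 2:
        raise ValueError("expected one intermediate byte and one final byte")
    intermediate, final = body[0], chr(body[1])
    if 0x28 <= intermediate <= 0x2B:   # ( ) * +  -> 94-character set, G0..G3
        return Designation(multibyte, 94, f"G{intermediate - 0x28}", final)
    if 0x2D <= intermediate <= 0x2F:   # - . /    -> 96-character set, G1..G3
        return Designation(multibyte, 96, f"G{intermediate - 0x2C}", final)
    raise ValueError("unsupported intermediate byte")

print(parse_designation(b"\x1b(B"))   # ASCII to G0
print(parse_designation(b"\x1b$)C"))  # KS X 1001 to G1
print(parse_designation(b"\x1b.A"))   # ISO/IEC 8859-1 high part to G2
```

The offset 0x2C in the 96-character branch reflects that there is no G0 form for these sets, so 0x2D maps to G1.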

There are three special cases for multi-byte codes. The sequences ESC $ @, ESC $ A, and ESC $ B were all registered before the ISO/IEC 2022 standard was finalized, so they must be accepted as synonyms for ESC $ ( @, ESC $ ( A, and ESC $ ( B, which designate to the G0 set. The longer form may also be used, and its ( character may be changed to designate to the G1 through G3 sets instead.
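
One way a decoder might handle these three special cases is to map each short sequence to its long form before any further processing. The sketch below does that and also shows the retargeting described above; `normalize_designation` and `retarget` are hypothetical helper names, and only multi-byte 94-character designations are supported.

```python
# The three short forms and their long-form synonyms (all designate to G0).
LEGACY_G0_MULTIBYTE = {
    b"\x1b$@": b"\x1b$(@",
    b"\x1b$A": b"\x1b$(A",
    b"\x1b$B": b"\x1b$(B",
}

def normalize_designation(seq: bytes) -> bytes:
    """Expand a short legacy sequence; return anything else unchanged."""
    return LEGACY_G0_MULTIBYTE.get(seq, seq)

def retarget(seq: bytes, target: int) -> bytes:
    """Return the long-form sequence designating the same set to G<target>."""
    full = normalize_designation(seq)
    if (len(full) != 4 or not full.startswith(b"\x1b$")
            or not 0x28 <= full[2] <= 0x2B):
        raise ValueError("not a multi-byte 94-character designation")
    if not 0 <= target <= 3:
        raise ValueError("target must be 0..3")
    # Intermediate bytes ( ) * + are 0x28..0x2B, selecting G0..G3.
    return full[:2] + bytes([0x28 + target]) + full[3:]

print(normalize_designation(b"\x1b$B"))  # long form of ESC $ B
print(retarget(b"\x1b$A", 1))            # GB 2312 designated to G1
```

Retargeting ESC $ A to G1 yields ESC $ ) A, which is the GB 2312-1980 sequence listed under ISO-2022-CN.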

The standard also defines a way to specify coding systems that do not follow its own structure. Of particular interest, the sequence ESC % G designates the UTF-8 coding system, which does not reserve the range 0x80–0x9F for control characters.
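
A minimal sketch of how a reader of such a stream might react to ESC % G is shown below. It assumes, purely for illustration, that everything before the sequence is ASCII and that the stream stays in UTF-8 until the end; the function name `decode_with_utf8_switch` is hypothetical, and switching back out of UTF-8 is not handled.

```python
UTF8_SWITCH = b"\x1b%G"  # ESC % G, as described above

def decode_with_utf8_switch(data: bytes) -> str:
    """Decode ASCII up to the first ESC % G, then UTF-8 for the remainder."""
    head, sep, tail = data.partition(UTF8_SWITCH)
    text = head.decode("ascii")
    if sep:
        text += tail.decode("utf-8")
    return text

stream = b"Gruss: " + UTF8_SWITCH + "Grüße".encode("utf-8")
print(decode_with_utf8_switch(stream))

# UTF-8 uses bytes in 0x80-0x9F as ordinary data: the second byte of
# U+00DF is 0x9F, which falls inside that range.
print("ß".encode("utf-8"))
```

The second print shows why UTF-8 cannot follow the ISO/IEC 2022 structure: a byte that would otherwise be read as a control character is part of an ordinary letter.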
