Character Encodings
There are several standard methods to encode Japanese characters for use on a computer, including JIS, Shift JIS, EUC, and Unicode. While mapping the set of kana is a simple matter, kanji has proven more difficult. Despite efforts, none of the encoding schemes has become the de facto standard, and multiple encoding standards are still in use today.
For example, most Japanese e-mails use JIS encoding and most web pages use Shift JIS, yet mobile phones in Japan usually use some form of Extended Unix Code (EUC). If a program fails to determine which encoding scheme was used, the result can be mojibake (文字化け, "misconverted garbled/garbage characters", literally "transformed characters"), which leaves the text unreadable.
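As a rough illustration of how mojibake arises, the sketch below encodes one string with several codecs and then decodes the Shift JIS bytes as if they were EUC. The codec names are those of Python's standard library (with "iso2022_jp" standing in for the 7-bit JIS encoding used in e-mail); the sample string and variable names are chosen here only for illustration.

```python
# Sketch: one string, four byte representations, and a deliberate mis-decode.
text = "文字化け"

# Codec names as registered in Python's standard library.
codecs_to_try = ("iso2022_jp", "shift_jis", "euc_jp", "utf_8")
encoded = {name: text.encode(name) for name in codecs_to_try}

for name, data in encoded.items():
    print(f"{name:10} {data.hex()}")

# Decode Shift JIS bytes with the wrong codec. errors="replace" turns each
# undecodable byte sequence into U+FFFD instead of raising an exception.
garbled = encoded["shift_jis"].decode("euc_jp", errors="replace")

# Decoding with the codec that actually produced the bytes restores the text.
restored = encoded["shift_jis"].decode("shift_jis")

print("wrong codec:", garbled)
print("right codec:", restored)
```

The bytes themselves carry no label saying which encoding produced them, which is why a program has to be told, or has to guess, the encoding before it can display the text.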
To understand how this state of affairs arose, it helps to know a little of the history of the encodings. The first encoding to become widely used was JIS X 0201, a single-byte encoding that covers only standard 7-bit ASCII characters plus half-width katakana extensions. It was widely used in systems that had neither the processing power nor the storage to handle kanji (including old embedded equipment such as cash registers), so these systems supported only katakana, not kanji. Much embedded equipment with a display still supports only katakana.
The development of kanji encodings was the beginning of the split. Shift JIS supports kanji and was designed to be completely backward compatible with JIS X 0201, which is why it is found in much embedded electronic equipment.
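The backward compatibility can be checked with a small sketch. It assumes the usual layout in which half-width katakana occupy bytes 0xA1 to 0xDF in JIS X 0201 and code points U+FF61 to U+FF9F in Unicode; the helper jisx0201_byte is hypothetical and written only for this example.

```python
def jisx0201_byte(ch: str) -> int:
    """Hypothetical helper: the single JIS X 0201 byte for one character.

    Simplification: the 7-bit half is treated as plain ASCII, ignoring the
    yen sign and overline that JIS X 0201 places at 0x5C and 0x7E.
    """
    cp = ord(ch)
    if cp < 0x80:
        return cp
    if 0xFF61 <= cp <= 0xFF9F:  # half-width katakana block
        return cp - 0xFF61 + 0xA1
    raise ValueError(f"{ch!r} is not representable in JIS X 0201")


# Half-width "katakana" spelled with escapes: KA, TA, KA, NA.
halfwidth = "\uff76\uff80\uff76\uff85"

legacy = bytes(jisx0201_byte(c) for c in halfwidth)
via_shift_jis = halfwidth.encode("shift_jis")

# Shift JIS keeps the single-byte values unchanged.
print(legacy.hex(), via_shift_jis.hex())

# Kanji, by contrast, take two bytes each in Shift JIS.
kanji = "漢字".encode("shift_jis")
print(len(kanji))
```

Text produced by an old katakana-only device therefore decodes unchanged under Shift JIS, while kanji are added through two-byte sequences.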
However, Shift JIS has the unfortunate property that it often breaks any parser (software that reads the encoded text) that is not specifically designed to handle it, because the second byte of a two-byte character can fall in the ASCII range. For example, a text search can return false hits if it is not designed for Shift JIS. EUC, on the other hand, is handled much better by parsers written for 7-bit ASCII, which is why EUC encodings are used on UNIX, where much of the file-handling code was historically written only for English encodings. But EUC is not backward compatible with JIS X 0201, the first major Japanese encoding. Further complications arise because the original Internet e-mail standards supported only 7-bit transfer protocols. The JIS encoding was therefore developed for sending and receiving e-mail.
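The false-hit problem can be reproduced with a byte-level search. In this sketch, the word 表示 ("display") is chosen because its first kanji is encoded in Shift JIS as 0x95 0x5C, and 0x5C is also the ASCII backslash; the variable names are illustrative only.

```python
word = "表示"    # first kanji is 0x95 0x5C in Shift JIS
needle = b"\\"   # a single 0x5C byte, the ASCII backslash

sjis = word.encode("shift_jis")
euc = word.encode("euc_jp")
jis = word.encode("iso2022_jp")

# A byte-oriented search that knows nothing about Shift JIS finds a
# backslash inside the kanji, although the text contains none.
false_hit_sjis = needle in sjis

# In EUC every byte of a kanji has its high bit set, so an ASCII byte
# can never match inside a multi-byte character.
false_hit_euc = needle in euc
euc_all_high = all(b >= 0x80 for b in euc)

# The JIS e-mail encoding stays entirely within 7 bits, using escape
# sequences to switch between ASCII and kanji.
jis_all_7bit = all(b < 0x80 for b in jis)

# Decoding first and searching the decoded text avoids the false hit.
hit_after_decoding = "\\" in sjis.decode("shift_jis")

print(false_hit_sjis, false_hit_euc, euc_all_high, jis_all_7bit, hit_after_decoding)
```

The same effect explains why path handling, escaping, and tokenizing code written for ASCII can misbehave on Shift JIS input unless it decodes the text first.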
Character set standards such as JIS do not include every character that is needed, so gaiji (外字 "external characters") are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, in which normal characters have been replaced with new characters or the new characters have been added to unused character positions. However, gaiji are not practical in Internet environments, since the font set must be transferred along with the text for the gaiji to be usable. As a result, such characters are written with similar or simpler characters in their place, or the text is written using a larger character set (such as Unicode) that supports the required character.
Unicode is intended to solve encoding problems for all of the world's languages. The UTF-8 encoding used to encode Unicode in web pages does not have the disadvantages of Shift JIS. Unicode is supported by international software, and no gaiji methods are needed. Controversies remain, however. For Japanese, the kanji have been unified with Chinese characters: a character considered to be the same in Japanese and Chinese has been given a single code number in Unicode, even if its form differs slightly between the two. This process, called Han unification, has caused controversy. The previous encodings in Japan, Taiwan Area, Mainland China and Korea each handled only one language, whereas Unicode is meant to handle all of them. The handling of kanji/Chinese characters was, however, designed by a committee composed of representatives from all four countries/areas. Use of Unicode is slowly growing because it is better supported by software from outside Japan, but (as of 2011) most web pages in Japanese still use Shift JIS. The Japanese Wikipedia uses Unicode.
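Han unification can be observed from code, at least in outline. The sketch below takes one character, 直 (U+76F4), a commonly cited example whose printed form differs between Japanese and Chinese fonts, and round-trips it through three regional legacy codecs from Python's standard library. The choice of character and codecs is an assumption made for illustration.

```python
ch = "\u76f4"  # the character 直

# One legacy codec each for Japan, Mainland China, and Taiwan.
regional_codecs = ("shift_jis", "gb2312", "big5")
legacy = {codec: ch.encode(codec) for codec in regional_codecs}

for codec, data in legacy.items():
    # Each regional encoding uses its own bytes for the character...
    back = data.decode(codec)
    # ...but all of them decode to the same single Unicode code point.
    print(f"{codec:10} bytes={data.hex()} -> U+{ord(back):04X}")

all_same_code_point = all(
    data.decode(codec) == ch for codec, data in legacy.items()
)
print(all_same_code_point)
```

Because the code point is shared, plain Unicode text alone does not record whether the Japanese or the Chinese form was intended; that choice is left to the font or to language information supplied alongside the text.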