Notepad (software) - Unicode Detection

The Windows NT version of Notepad, installed by default on Windows 2000 and Windows XP, can detect Unicode files even when they lack a byte order mark. To do this, it calls a Windows API function named IsTextUnicode. However, this function is imperfect and incorrectly identifies some all-lowercase ASCII text as UTF-16. As a result, Notepad interprets a file containing a phrase with the letter pattern "aaaa aaa aaa aaaaa" ("4-3-3-5") as a two-byte Unicode text file and attempts to display it as such, combining each pair of consecutive bytes into one 16-bit character. For the phrase "this app can break", for example, the 18 bytes become nine characters: if a font with support for Chinese is installed, nine Chinese characters (桴獩愠灰挠湡戠敲歡) are displayed; otherwise, squares are displayed in their place.
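
The misreading can be reproduced outside Notepad. The following is a minimal sketch, assuming the text is saved as plain ASCII and then decoded as little-endian UTF-16. The function name misread_as_utf16le is hypothetical, and the code mimics only the decoding step, not the IsTextUnicode heuristic that decides whether to apply it.

```python
def misread_as_utf16le(text: str) -> str:
    """Decode the ASCII bytes of `text` as if they were UTF-16LE.

    Illustrative helper (not part of any Windows API). It assumes the
    text has an even number of bytes, as the 4-3-3-5 phrases do.
    """
    raw = text.encode("ascii")      # one byte per character
    return raw.decode("utf-16-le")  # two bytes per character


garbled = misread_as_utf16le("this app can break")
print(garbled)       # 桴獩愠灰挠湡戠敲歡
print(len(garbled))  # 9
```

Each pair of ASCII bytes forms one 16-bit code unit with the second byte as the high byte, so "th" (0x74, 0x68) becomes U+6874, which falls in the CJK Unified Ideographs block.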

Some people mistook this issue for an Easter egg. Many phrases that fit the pattern (including “this app can break”, “Bush hid the facts” and “acre vai pra globo”) circulated on the web as hoaxes. Windows expert Raymond Chen correctly attributed the behavior to the Unicode detection algorithm.

This issue has been resolved in the Windows Vista and Windows 7 versions of Notepad.
