UCS (Universal Character Set)
Revision as of 14:50, 19 February 2012 by DominiqueS
From the Wikipedia entry:
- The Universal Character Set (UCS) is a character encoding defined by the international standard ISO/IEC 10646. It maps more than 100,000 abstract characters, each identified by an unambiguous name, to integers called code points.
- Since 1991, the Unicode Consortium has been working with ISO to develop the Unicode Standard and ISO/IEC 10646 in tandem. The repertoire, character names, and code points of Version 2.0 of the Unicode Standard are identical to those of ISO/IEC 10646-1:1993 with its first seven published amendments. After Unicode 3.0 was published in February 2000, the new and updated characters were brought into the UCS via ISO/IEC 10646-1:2000.
- The UCS has over 1.1 million code points (U+0000 through U+10FFFF), but only the first 65,536 (the Basic Multilingual Plane, or BMP) were commonly used before 2000. This began to change in 2000, when the People's Republic of China mandated that computer systems sold there support GB18030, which requires support for code points beyond the BMP.
- Many code points, even in the BMP, are deliberately not assigned to characters, to allow for future expansion or to minimize conflicts with other encoding forms.
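The code point arithmetic described above can be illustrated with a short Python sketch. It is not taken from the Wikipedia entry: the helper names `plane_of`, `in_bmp`, and `describe` are hypothetical, and the results of `describe` depend on the version of the Unicode data bundled with the Python interpreter. It assumes only the standard `unicodedata` module.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # highest code point in the UCS
PLANE_SIZE = 0x10000       # 65,536 code points per plane


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains the code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a UCS code point: {code_point:#x}")
    return code_point // PLANE_SIZE


def in_bmp(code_point: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane (plane 0)."""
    return plane_of(code_point) == 0


def describe(code_point: int) -> str:
    """Return the character name, or a note on why the code point has none."""
    char = chr(code_point)
    category = unicodedata.category(char)
    if category == "Cn":  # reserved for future use, or a noncharacter
        return "not assigned to a character"
    if category == "Cs":  # set aside so UTF-16 can reach beyond the BMP
        return "surrogate"
    if category == "Co":
        return "private use"
    # Control characters are assigned but have no formal name.
    return unicodedata.name(char, "(unnamed)")


if __name__ == "__main__":
    print(MAX_CODE_POINT + 1)                # total code points: 1114112
    print((MAX_CODE_POINT + 1) // PLANE_SIZE)  # number of planes: 17
    for cp in (0x41, 0xD800, 0xFFFF, 0x20000):
        print(f"U+{cp:04X} plane {plane_of(cp)}: {describe(cp)}")
```

U+20000 is in plane 2, outside the BMP, while U+D800 and U+FFFF are examples of BMP code points that are deliberately not assigned to characters.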