News Release: World Wide Web Consortium Issues Critical Internationalization Recommendation

Today, W3C publishes the first of a series of recommendations aimed at 
expanding the international reach of the Web. "Character Model for the 
World Wide Web - Fundamentals" brings a unified approach to using 
characters from the world's languages on the Web. For more information, 
please contact Janet Daly, <janet@w3.org> at +1 617 253 5884.

----------------------------------------------------------------

World Wide Web Consortium Issues Critical Internationalization
Recommendation

"Character Model for the World Wide Web - Fundamentals" Brings Unified
Approach to Using Characters on the Web

Web Resources:

This press release
   In English: http://www.w3.org/2005/02/charmod-pressrelease.html.en
   In French: http://www.w3.org/2005/02/charmod-pressrelease.html.fr
   In Japanese: http://www.w3.org/2005/02/charmod-pressrelease.html.ja

Internationalization Activity Homepage:
http://www.w3.org/International/

Character Model for the World Wide Web - Fundamentals:
http://www.w3.org/TR/2005/REC-charmod-20050215/

http://www.w3.org/ -- 15 February 2005 -- The World Wide Web Consortium
(W3C) has published the "Character Model for the World Wide Web 1.0:
Fundamentals" as a W3C Recommendation. It provides a well-defined and
well-understood way for Web applications to transmit and process the
characters of the world's languages.

This architectural Recommendation gives authors of specifications,
software developers, and content developers a common reference, enabling
interoperable text manipulation on the World Wide Web. It builds on the
Universal Character Set, defined jointly by the Unicode Standard and
ISO/IEC 10646. Topics include use of the terms 'character', 'encoding'
and 'string', a reference processing model, choice and identification of
character encodings, character escaping, and string indexing.

The goal of the Character Model for the World Wide Web is to facilitate
use of the Web by all people, regardless of their language, script,
writing system, or cultural conventions, in accordance with the W3C
goal of universal access.

Unicode Brings the Universal Character Set to the Web

At the core of the character model is the Universal Character Set (UCS).
Building on it lets Web technologies support text in the world's scripts
(and on different platforms), so that the text can be exchanged, read,
and searched by Web users around the world. Unicode was chosen because
it provides a way of referencing characters independent of the encoding
of the text, because it is being updated and completed carefully, and
because it is widely accepted and implemented by industry.
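
As an illustration of referencing characters independently of the
encoding of the text, the following minimal Python sketch stores one
character in three encodings and recovers the same Unicode code point
from each. The choice of character and encodings is an assumption made
for this example and does not come from the Recommendation itself.

```python
# One character, "é" (U+00E9), serialized in three different encodings.
# The byte sequences differ, but each decodes to the same code point.
char = "\u00e9"

latin1_bytes = char.encode("iso-8859-1")  # one byte
utf8_bytes = char.encode("utf-8")         # two bytes
utf16_bytes = char.encode("utf-16-be")    # one 16-bit code unit

decoded = [
    latin1_bytes.decode("iso-8859-1"),
    utf8_bytes.decode("utf-8"),
    utf16_bytes.decode("utf-16-be"),
]

# Collect the code point of every decoded character in U+XXXX notation.
code_points = {f"U+{ord(ch):04X}" for ch in decoded}
print(latin1_bytes, utf8_bytes, utf16_bytes, code_points)
```

Whatever the bytes on the wire, a specification can refer to the
character by its code point alone.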

W3C adopted Unicode as the document character set for HTML in HTML 4.0.
The same approach was later used for Recommendations such as XML 1.0 and
CSS Level 2. W3C specifications and applications now use Unicode as the
common reference character set.

New Specification Clarifies Character Usage on the Web

As the number of Web applications increases, the need for a shared
character model has become more critical. Unicode is the natural choice
as the basis for that shared model, especially as application
developers begin to consolidate their encoding options. However,
applying Unicode to the Web requires additional specifications; this is
the purpose of the W3C Character Model series.

Some aspects particular to the Web that receive more explanation in the
series include:

     * Choice of Unicode encoding forms (UTF-8, UTF-16, UTF-32)
     * Counting characters, measuring string length in the presence of
       variable-length character encodings and combining characters
     * Duplicate encodings of characters (e.g., precomposed vs.
       decomposed)
     * Use of escape mechanisms to represent characters
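
The aspects listed above can be made concrete with a short Python
sketch. It is only an illustration under stated assumptions: the helper
name lengths and the sample strings are hypothetical, and HTML numeric
character references stand in for "escape mechanisms" in general.

```python
import html
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
clef = "\U0001D11E"      # MUSICAL SYMBOL G CLEF, outside the BMP

def lengths(text):
    """Return the length of text under several counting units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }

# The "length" of a string depends on which unit is counted.
for sample in (precomposed, decomposed, clef):
    print(ascii(sample), lengths(sample))

# Duplicate encodings: the two forms render alike but compare unequal
# until both are brought to the same normalization form (NFC here).
same_raw = precomposed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# Escapes: a numeric character reference names the code point directly.
unescaped = html.unescape("&#xE9;")
```

The point of the sketch is that string length and string equality are
not well defined until a specification says which unit and which
normalization form it means.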

Series Documents to Be Completed in 2005

Today's Recommendation is the first in a set of three documents. In
development are "Character Model for the World Wide Web 1.0:
Normalization," specifying early uniform normalization and string
identity matching for text manipulation, and "Character Model for the
World Wide Web 1.0: Resource Identifiers," specifying IRI conventions.
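
To give a sense of what IRI conventions involve, the sketch below maps
the path of an identifier containing non-ASCII characters to an
ASCII-only form by percent-encoding its UTF-8 bytes. This shows the
general idea only; the function name iri_path_to_uri_path and the sample
path are assumptions for this example, and the sketch does not cover
host names or the other details that a full specification addresses.

```python
from urllib.parse import quote, unquote

def iri_path_to_uri_path(iri_path: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters in a path.

    "/" is kept as a separator, and "%" is kept so that sequences that
    are already percent-encoded are not encoded a second time.
    """
    return quote(iri_path, safe="/%")

iri_path = "/caf\u00e9/\u65e5\u672c"
uri_path = iri_path_to_uri_path(iri_path)
print(uri_path)

# Decoding the percent-escapes as UTF-8 recovers the original path.
round_trip = unquote(uri_path)
```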

Industry Leaders Key in Development of Character Model Series

The Character Model was developed by the W3C Internationalization
Activity's Working Group (now the W3C Internationalization Core Working
Group) with the help of the W3C Internationalization Interest Group. W3C
Members participating in the Working Group include BBC, Boeing, Ecole
Mohammadia d'Ingénieurs, IBM, Microsoft, Siemens, Sun Microsystems, and
webMethods.

Contact Americas and Australia --
     Janet Daly, <janet@w3.org>, +1.617.253.5884
Contact Europe, Africa and Middle East --
     Marie-Claire Forgue, <mcf@w3.org>, +33.492.38.75.94
Contact Asia --
     Yasuyuki Hirakawa, <chibao@w3.org>, +81.466.49.1170
(also available in French and Japanese)

About the World Wide Web Consortium [W3C]

The W3C was created to lead the Web to its full potential by developing
common protocols that promote its evolution and ensure its
interoperability. It is an international industry consortium jointly run
by the MIT Computer Science and Artificial Intelligence Laboratory
(CSAIL) in the USA, the European Research Consortium for Informatics and
Mathematics (ERCIM) headquartered in France and Keio University in
Japan. Services provided by the Consortium include a repository of
information about the World Wide Web for developers and users, and
various prototype and sample applications to demonstrate use of new
technology. More than 350 organizations are Members of W3C. To learn
more, see http://www.w3.org/

Received on Tuesday, 15 February 2005 15:03:22 UTC