
Re: Requiring UTF-8 for XML (was: RE: google sitemaps and some history of sitemaps [siteData-36])

From: Dan Connolly <connolly@w3.org>
Date: Fri, 10 Jun 2005 13:20:42 -0500
To: noah_mendelsohn@us.ibm.com
Cc: Dare Obasanjo <dareo@microsoft.com>, www-tag@w3.org, Paul Grosso <pgrosso@arbortext.com>, adamb@google.com
Message-Id: <1118427642.12287.488.camel@localhost>

On Fri, 2005-06-10 at 13:18 -0400, noah_mendelsohn@us.ibm.com wrote:
> Anyway, I think it's interesting that users of XML are voting with their 
> feet and imposing requirements for UTF-8 only XML.  Something to watch.

I think it's the consensus of the IETF that UTF-8-only protocols are OK.
Let's see... the exact text is...

3.1.  What charset to use

   All protocols MUST identify, for all character data, which charset is
   in use.

   Protocols MUST be able to use the UTF-8 charset, which consists of
   the ISO 10646 coded character set combined with the UTF-8 character
   encoding scheme, as defined in [10646] Annex R (published in
   Amendment 2), for all text.

   Protocols MAY specify, in addition, how to use other charsets or
   other character encoding schemes for ISO 10646, such as UTF-16, but
   lack of an ability to use UTF-8 is a violation of this policy; such a
   violation would need a variance procedure ([BCP9] section 9) with
   clear and solid justification in the protocol specification document
   before being entered into or advanced upon the standards track.

  -- RFC 2277, IETF Policy on Character Sets and Languages
  Best Current Practice (BCP 18)
  January 1998
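As a rough illustration of what a "UTF-8-only" XML format asks of a consumer, here is a minimal Python sketch. The function name `is_utf8_xml` and its acceptance rules are assumptions made for illustration. They are not taken from the IETF policy quoted above or from any sitemap specification. The rules are: an optional UTF-8 byte order mark, bytes that decode as UTF-8, no NUL characters (XML 1.0 forbids U+0000, and a NUL check catches BOM-less UTF-16), and any encoding declaration must name UTF-8.

```python
import re

# Hypothetical check, for illustration only: matches an encoding
# declaration such as <?xml version="1.0" encoding="UTF-8"?> at the
# very start of the document.
_ENCODING_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def is_utf8_xml(data: bytes) -> bool:
    """Return True if `data` looks like UTF-8-encoded XML (a sketch, not a validator)."""
    # A UTF-8 byte order mark is permitted; strip it before checking.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # Reject byte sequences that are not valid UTF-8
    # (e.g. Latin-1 text, or UTF-16 with a byte order mark).
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # XML 1.0 does not allow U+0000; NULs usually mean BOM-less UTF-16.
    if "\x00" in text:
        return False
    # If an encoding declaration is present, it must name UTF-8.
    match = _ENCODING_DECL.match(data)
    if match and match.group(1).lower() != b"utf-8":
        return False
    return True
```

Under these assumptions, a sitemap-style document such as `b'<?xml version="1.0" encoding="UTF-8"?><urlset/>'` is accepted, while the same document encoded as UTF-16, or declared as ISO-8859-1, is rejected. Note that this only checks the encoding. It does not check that the document is well-formed XML.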

Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Friday, 10 June 2005 18:20:52 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 22:56:09 UTC