Overview of 'charset' handling for XML

I wrote this up for a different purpose, but Dan Connolly
suggested that it might fit into the XML spec
(http://lists.w3.org/Archives/Member/w3c-html-cg/2000JanMar/0133.html).

----
There are three basic situations:

- XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):
  - Charset parameter is strongly recommended
  - If there is no charset parameter, the default is US-ASCII. The HTTP
    default of iso-8859-1 is explicitly overridden in the specification
    of the charset parameter in section 3.1 "Text/xml Registration" of
    RFC 2376 (http://www.ietf.org/rfc/rfc2376.txt)
  - No error handling provisions
  - An encoding declaration, if present, is irrelevant, but when saving a
    received resource as a file, the correct encoding declaration should
    be inserted.

- XML sent as application/xml (or equivalent):
  - Charset parameter is strongly recommended, and if present,
    it takes precedence.
  - If the charset parameter is omitted, the rules for XML in static storage
    are followed (see below).

- XML in static storage without external metainformation (e.g. file):
  - Default is UTF-8, or UTF-16 if there is a BOM
  - For any other encoding, there has to be an encoding declaration
  - There is some provision for 'error recovery'. What exactly this
    means is currently under discussion in the XML Core WG, so that
    it can be clarified.
----
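
As a rough illustration, the three situations above can be sketched as
a small decision function in Python. This is only a sketch under
stated assumptions, not a conforming implementation: the names
xml_charset and sniff_stored_xml are made up for this example,
"or equivalent" is approximated by a simple text/ prefix check, and the
sniffing only recognises a UTF-16 BOM and an encoding declaration
written in an ASCII-compatible encoding. Error recovery is left out
entirely.

```python
import re

# Matches an encoding declaration at the very start of the data.
# Assumption: the declaration itself is in an ASCII-compatible encoding.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_stored_xml(data):
    """Rules for XML without external metainformation (e.g. a file)."""
    # A UTF-16 byte order mark (either byte order) means UTF-16.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Any other encoding has to be named in an encoding declaration.
    m = _ENCODING_DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    # No BOM and no declaration: the default is UTF-8.
    return "utf-8"


def xml_charset(media_type, charset, data):
    """Pick the encoding; media_type is None for static storage."""
    if media_type is None:
        return sniff_stored_xml(data)
    if charset:
        # A charset parameter, if present, takes precedence in both the
        # text/xml and the application/xml case.
        return charset.lower()
    if media_type.startswith("text/"):
        # text/xml without charset: US-ASCII; the encoding declaration
        # inside the document is irrelevant here.
        return "us-ascii"
    # application/xml without charset: same rules as static storage.
    return sniff_stored_xml(data)
```

For example, a document sent as text/xml without a charset parameter is
treated as US-ASCII even if it declares encoding="utf-8", while the
same bytes sent as application/xml, or read from a file, are treated
as UTF-8.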

Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst

Received on Sunday, 12 March 2000 21:51:43 UTC