Re: Of encodings and HTML from Alex Milowski on 2010-06-17 (public-xml-processing-model-wg@w3.org from June 2010)

From: Alex Milowski <alex@milowski.org>
Date: Thu, 17 Jun 2010 16:15:43 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <AANLkTikfEaDgH8fnsb5Tx6QMi28fJo_2bHNODp7eDeia@mail.gmail.com>

On Thu, Jun 17, 2010 at 4:07 PM, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:
>  2) When addressed via <p:document href="..."/> or <p:load
>     href="..."/>?  Hmmm.  We sort of blew that, in that the spec. is
>     silent as to how the Content-Type header plays wrt the
>     requirement that the retrieved representation be "a well-formed XML
>     document" [1].  What if the Content-Type were image/jpeg ? Should
>     you go ahead and try to parse it as XML anyway?
>
>     Assuming the answer is 'yes', then I think the situation is clear
>     -- RFC3023 [2] says explicitly that in the case of text/xml, if
>     there is no Charset, then you _must_ assume US-ASCII:
>
>     "This example shows text/xml with the charset parameter omitted.
>      In this case, MIME and XML processors MUST assume the charset is
>      "us-ascii", the default charset value for text media types
>      specified in [RFC2046].  The default of "us-ascii" holds even if
>      the text/xml entity is transported using HTTP.

...and then it says:

(Note: There is an inconsistency between this specification and
HTTP/1.1, which uses ISO-8859-1[ISO8859] as the default for a
historical reason.  Since XML is a new format, a new default should
be chosen for better I18N.  US-ASCII was chosen, since it is the
intersection of UTF-8 and ISO-8859-1 and since it is already used by
MIME.)

So, they arm wrestle?
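
To make the disagreement concrete, here is a rough Python sketch of the
two defaults applied to a Content-Type header value. The function name
default_charsets and the dictionary keys are my own inventions for
illustration, not anything from the specs, and it only covers the
text/xml case discussed above; other media types are left undecided.

```python
from email.message import Message


def default_charsets(content_type_header):
    """Return the charset each rule would assume for a text/xml entity.

    Hypothetical helper: returns None for any media type other than
    text/xml, since this sketch does not try to cover those.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    if msg.get_content_type() != "text/xml":
        return None

    charset = msg.get_param("charset")
    if isinstance(charset, str):
        # An explicit charset parameter settles the matter for both rules.
        explicit = charset.lower()
        return {"rfc3023": explicit, "http11": explicit}

    # No charset parameter: the two specifications disagree.
    return {"rfc3023": "us-ascii", "http11": "iso-8859-1"}


if __name__ == "__main__":
    print(default_charsets("text/xml"))
    print(default_charsets('text/xml; charset="UTF-8"'))
    print(default_charsets("image/jpeg"))
```

With no charset parameter the two keys come back different, which is
exactly the arm wrestle; with an explicit charset they agree.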

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Thursday, 17 June 2010 15:16:23 UTC