Re: "solely for error recovery"

The appended mail, from Martin Duerst, provides the background to my 
mail of February 22:
   http://lists.w3.org/Archives/Member/xml-editor/2000JanMar/0036.html

Misha

[This mail was written using voice recognition software]


> Hello Makoto,
> 
> Many thanks for your mail with your clarifications. I'm sorry I was not
> able to follow up earlier on this.
> 
> At 21:23 00/02/08 +0900, MURATA Makoto wrote:
> 
> > Actually, the sentence Aaron referenced is already deleted by E48 in the 
> > errata.  
> > 
> > http://www.w3.org/XML/xml-19980210-errata#E48
> > 
> > 
> >     Modify the text from the paragraph beginning "The second possible case
> >     occurs when the XML entity..." to the end of the appendix to read:
> >     
> >         The second possible case occurs when the XML entity is
> >         accompanied by encoding information, as in some file systems
> >         and some network protocols. When multiple sources of
> >         information are available, their relative priority and the
> >         preferred method of handling conflict should be specified as
> >         part of the higher-level protocol used to deliver XML. In
> >         particular, please refer to [IETF RFC2376] "XML Media Types"
> >         which defines the text/xml and application/xml MIME types and
> >         provides some useful guidance. In the interests of
> >         interoperability, however, the following rule is recommended.
> >         
> >         If an XML entity is in a file, the Byte-Order Mark and
> >         encoding-declaration PI are used (if present) to determine the
> >         character encoding. All other heuristics and sources of information
> >         are solely for error recovery.
> >         
> >     
> > Thus, RFC 2376 is the normative document, and XML 1.0 as corrected does not 
> > say anything about media types.
> 
> This is good to know. RFC 2376, for text/xml, says:
> 
> >>>>
> Conformant with [RFC-2046], if a text/xml entity is received with the charset
> parameter omitted, MIME processors and XML processors MUST use the
> default charset value of "us-ascii". In cases where the XML entity is transmitted
> via HTTP, the default charset value is still "us-ascii".
> <<<<
> 
> The MUST for us-ascii (even in the case of HTTP), without any provisions for
> error handling, makes it very clear that the case under discussion,
> namely an HTTP header of:
>     Content-Type: text/vnd.wap.wml
> with no 'charset' parameter, and the document encoded in iso-8859-1,
> is definitely illegal, and that treating this as anything other than
> us-ascii by a recipient is also illegal.
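
A minimal sketch of the default-charset rule quoted above, for illustration only. The function name effective_charset is hypothetical, and applying the text/xml default to every text/* type (including text/vnd.wap.wml) is an assumption that follows the argument in the mail, not something the quoted RFC text spells out.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type_header: str) -> Optional[str]:
    """Return the charset a recipient should assume for a Content-Type value.

    Assumption: every text/* type without a charset parameter defaults to
    us-ascii, as the quoted RFC 2376 passage requires for text/xml.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_content_charset()  # lower-cased, or None if omitted
    if charset is not None:
        return charset
    if msg.get_content_maintype() == "text":
        return "us-ascii"
    return None  # non-text types are outside the scope of this sketch


# The case under discussion: no charset parameter, body in iso-8859-1.
header = "Content-Type value: text/vnd.wap.wml".split(": ", 1)[1]
body = "caf\u00e9".encode("iso-8859-1")
try:
    body.decode(effective_charset(header))
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False  # the non-ASCII byte cannot be read as us-ascii
```

Under these assumptions the iso-8859-1 body fails to decode, which is the conflict the mail describes.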
> 
> I would therefore like to ask the relevant persons on the WAP side
> to make sure that this conclusion is adequately reflected in the
> WAP specifications, test suites, and so on, as soon as possible,
> or to tell us who should be contacted to make this happen.
> 
> Also, on this point, it looks like no change is needed
> any more on the XML side.
> 
> 
> What remains is the following [copied from above]
> 
> >         If an XML entity is in a file, the Byte-Order Mark and
> >         encoding-declaration PI are used (if present) to determine the
> >         character encoding. All other heuristics and sources of information
> >         are solely for error recovery.
> 
> The phrase 'for error recovery', in the present wording, seems to open all
> kinds of backdoors for heuristics and so on, which I think we have very good
> reasons not to want.
> 
> 
> Could this be changed to be more precise? Can the XML Core WG take this up, please?
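
To make the file rule from E48 concrete, here is a rough sketch of a reader that consults only the Byte-Order Mark and the encoding declaration and deliberately uses no other heuristic. The name encoding_from_entity is hypothetical, and the sketch is simplified: it recognises only UTF-8 and UTF-16 BOMs and only ASCII-compatible encoding declarations, so it is not a full implementation of the autodetection appendix.

```python
import codecs
import re
from typing import Optional

# Encoding declaration inside the XML declaration at the very start of the
# entity; [^>]* keeps the search from running past the end of the declaration.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def encoding_from_entity(data: bytes) -> Optional[str]:
    """Return the encoding indicated by the entity itself, or None.

    Only the BOM and the encoding declaration are consulted. UTF-32 and
    non-ASCII-compatible declarations are ignored in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # nothing present; no further guessing is attempted here
```

Returning None instead of guessing keeps any fallback visible to the caller, so whatever counts as "error recovery" has to be decided explicitly rather than inside the detector.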
> 
> 
> Regards,    Martin.
> 
> 
> #-#-#  Martin J. Du"rst, World Wide Web Consortium
> #-#-#  mailto:duerst@w3.org   http://www.w3.org


Received on Friday, 25 February 2000 09:58:38 UTC