Re: Should we say anything on security? from Liam R E Quin on 2012-09-12 (public-microxml@w3.org from September 2012)

From: Liam R E Quin <liam@w3.org>
Date: Wed, 12 Sep 2012 13:07:13 -0400
To: James Clark <jjc@jclark.com>
Cc: public-microxml@w3.org
Message-ID: <1347469633.23437.127.camel@localhost.localdomain>

On Wed, 2012-09-12 at 12:56 +0700, James Clark wrote:
> On Wed, Sep 12, 2012 at 12:37 PM, Liam R E Quin <liam@w3.org> wrote:
> 
> > > We can also say that another factor that may make
> > > it more suitable for protocols is that it allows you to follow the
> > > long-standing IETF tradition of being liberal in what you accept.
> >
> > I'm reluctant there. XML doesn't forbid error recovery either - it only
> > forbids *silent* error recovery. If a document isn't XML you can't claim
> > it's XML, but you can turn it into XML and process the result.
> >
> 
> The XML Rec says (in the definition of fatal error):

Yes. I know well what it says :-)

However, if a document does not conform to the well-formed rules, it is
not an XML document (by definition). In that case the XML spec does not
apply to it, and a non-XML processor can process it in a way
unconstrained by the XML spec, EXCEPT it must not claim that the input
was XML.

Part of the motivation of HTML 5 was (as you know) to fix the problem
that different HTML implementations performed error recovery
differently, and, since people learn (and often teach) from examples
rather than from specs, the implementations had to do large-scale
reverse engineering to be able to accept each other's documents.

Some of this came from mis-application of Jon Postel's maxim - yes, the
Web browser should be liberal, but no, the Web server should not emit
out-of-spec data.

Telling people (e.g. in user agents) that the µXML contains errors is a
definite improvement. For a while NCSA Mosaic had a Bad HTML icon that
lit up when errors were detected, but it was very unpopular.

> The way I've interpreted this (which I think is reasonable) is that the
> parser must not continue to pass start-/end-element/character data events
> to the application after it has seen a well-formedness error.

An XML parser cannot do this, I agree.
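A minimal sketch of that reading, using Python's standard xml.sax module (a tool chosen for illustration, not one named in the thread). The handler receives events up to the point where expat detects the mismatched end tag; the parser then raises SAXParseException and no events for later content reach the application. The names RecordingHandler and parse_events are hypothetical.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that records every event delivered to the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


def parse_events(data: bytes):
    """Return (events seen, fatal error or None)."""
    handler = RecordingHandler()
    try:
        xml.sax.parseString(data, handler)
        return handler.events, None
    except xml.sax.SAXParseException as err:
        # Fatal (well-formedness) error: the parser stops here, so nothing
        # after the error is passed on as start/end/character events.
        return handler.events, err


# "</a>" does not match the open <c>, so parsing stops before <d>.
events, err = parse_events(b"<a><b>ok</b><c></a><d>never seen</d>")
```

Events that arrived before the error (here, the start of `c`) have already been delivered, which is why an application must be prepared to discard them.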

An application that starts again from the start of the document with a
different parser, or that can switch to a different parsing algorithm
mid-stream, is not out of line, provided that it makes clear that it's
doing so, both at the API level and at the human level.
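One way to picture the "start again with a different parser" approach, as a sketch under assumptions rather than anything proposed on the list: try a strict XML parse with Python's xml.etree.ElementTree, and on a ParseError rerun from the beginning with the standard library's lenient html.parser.HTMLParser. The result carries an explicit is_xml flag and the original error, so the API never claims the input was XML. ParseResult, LenientCollector and parse_with_fallback are hypothetical names.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class ParseResult:
    is_xml: bool                 # True only if the strict XML parse succeeded
    element_names: list          # element names seen, in document order
    error: Optional[str] = None  # the well-formedness error, if there was one


class LenientCollector(HTMLParser):
    """Hypothetical non-XML fallback: tolerates mismatched tags, collects names.

    Note that HTMLParser lowercases tag names.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def parse_with_fallback(text: str) -> ParseResult:
    try:
        root = ET.fromstring(text)
        # root.iter() yields the root first, then descendants in document order.
        return ParseResult(True, [el.tag for el in root.iter()])
    except ET.ParseError as err:
        # Not XML: start again from the beginning with a different parser,
        # and record at the API level that the input was not well-formed.
        collector = LenientCollector()
        collector.feed(text)
        collector.close()
        return ParseResult(False, collector.names, str(err))
```

For example, `parse_with_fallback("<a><b>ok</a>")` returns a result with `is_xml=False` and the expat error message. A user agent could surface that flag as a visible warning, which covers the "human level" disclosure.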

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/

Received on Wednesday, 12 September 2012 17:07:47 UTC