Re: XML namespaces on the Web from James Graham on 2009-11-17 (public-html@w3.org from November 2009)

From: James Graham <jgraham@opera.com>
Date: Tue, 17 Nov 2009 18:14:19 +0100
To: Krzysztof Maczyński <1981km@gmail.com>
CC: Aryeh Gregor <Simetrical+w3c@gmail.com>, public-html@w3.org, public-xml-core-wg@w3.org
Message-ID: <4B02D9EB.4040707@opera.com>

Krzysztof Maczyński wrote:
> Dear WGs, (CCing public-xml-core-wg@w3.org.)
> 
>>> Moreover, something that is appropriate for the web -- non-draconian
>>> error handling and error recovery -- is not necessarily appropriate
>>> for other domains -- if you use XML for business data interchange,
>>> draconian error handling makes much more sense.
>> Parsers could be permitted to use draconian error handling at user 
>> request.  Then groups that don't want it don't have to have it,
>> while groups that want it can have it.  The current situation gives
>> us no standardized XML-like data format without draconian
>> error-handling. This is a problem unless HTML is really the only
>> use-case for non-draconian error-handling, which I think is very
>> unlikely.  For instance, I've been told some widely-used RSS
>> readers have seen fit to implement error recovery -- which must
>> currently be completely non-interoperable because of the lack of
>> standardization here.
> Both on the Web and elsewhere there are circumstances warranting
> strict or lax parsing. This was already a highly debated point when
> XML was designed. Already then we knew that for dissenting opinions
> usually a good solution is to include both ways things can work and a
> switch. Having the experience of over 10 years, it's clear that the
> needs of both sides are valid and not going away [1]. Therefore I'd
> like to propose XML 1.2 with a pseudo-attribute parse accepting
> values strict and lax added to the XML and text declaration. strict
> would do what parsers currently do (unifying XML 1.0 Fifth Edition
> with XML 1.1 Second Edition in some sensible way) and lax would use
> an algorithm based on Anne van Kesteren's draft, but returning an
> Infoset.

One of the purposes of error recovery (for web content) is to not punish 
end users for a class of easy-to-miss (and often harmless) bugs in the 
system producing the content. Making that error recovery depend on the 
author specifically opting in misses the point somewhat.

It makes much more sense to follow the HTML5 model where the client gets 
to determine the parsing mode. Systems where a syntax-level error should
be fatal would switch the parser to strict mode, whereas systems such
as web browsers would switch the parser to graceful recovery mode.
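
To make the distinction concrete, here is a rough Python sketch of a parser whose mode is chosen by the consuming client rather than declared in the document. The names STRICT, LAX, parse_document and _recover are hypothetical, and the recovery step is a toy (it only closes unclosed elements and drops stray end tags); it is not the algorithm from Anne van Kesteren's draft or from HTML5.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mode names; nothing here comes from a specification.
STRICT = "strict"
LAX = "lax"

# Matches a start/end/empty-element tag, or a run of character data.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>|([^<]+)")


def _recover(text):
    """Toy recovery: close unclosed elements and drop stray end tags."""
    out = []
    stack = []
    for match in _TOKEN.finditer(text):
        closing, name, self_closing, chars = match.groups()
        if chars is not None:
            out.append(chars)
        elif closing:
            if name in stack:
                # Close everything opened since the matching start tag.
                while stack:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
            # An end tag with no matching start tag is silently dropped.
        else:
            out.append(match.group(0))
            if not self_closing:
                stack.append(name)
    # Close whatever is still open at end of input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


def parse_document(text, mode=STRICT):
    """Parse text in the mode the *caller* selects.

    The mode is an argument supplied by the consuming system; the
    document itself has no say in how it is parsed.
    """
    if mode == STRICT:
        return ET.fromstring(text)  # raises ET.ParseError on any error
    if mode == LAX:
        try:
            return ET.fromstring(text)
        except ET.ParseError:
            return ET.fromstring(_recover(text))
    raise ValueError(f"unknown mode: {mode!r}")
```

A browser-like consumer would call parse_document(text, mode=LAX), while a data-interchange system would keep the default and treat ET.ParseError as fatal. The same misnested input, such as "<p><b>bold<i>both</b></p>", is then rejected by one client and repaired by the other without the author opting in to anything.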

If web authors are interested in QAing their content using a strict
parser, they could do that either using a tool such as a validator or
using a UA-specific setting to change the parser mode.
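
As an illustration of the validator-style option, a minimal well-formedness check could look like the following Python sketch. The function name check_well_formed is made up for this example; it relies only on the standard library's strict parser and checks well-formedness, not validity against a schema or DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return (line, column, str(err))
    return None
```

An author could run this over generated output before publishing and fix whatever it reports, while end users' browsers continue to use the recovery mode.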

Received on Tuesday, 17 November 2009 17:13:45 UTC