Re: XML namespaces on the Web from Krzysztof Maczyński on 2009-11-17 (public-xml-core-wg@w3.org from November 2009)

From: Krzysztof Maczyński <1981km@gmail.com>
Date: Tue, 17 Nov 2009 15:29:39 +0000
To: "Aryeh Gregor" <Simetrical+w3c@gmail.com>
Cc: <public-html@w3.org>, <public-xml-core-wg@w3.org>
Message-ID: <FE442FB28B2540A9858E332DB101462D@kmPC>

Dear WGs,
(CCing public-xml-core-wg@w3.org.)

>> Moreover, something that is appropriate for the web -- non-draconian error
>> handling and error recovery -- is not necessarily appropriate for other
>> domains -- if you use XML for business data interchange, draconian error
>> handling makes much more sense.
> 
> Parsers could be permitted to use draconian error handling at user
> request.  Then groups that don't want it don't have to have it, while
> groups that want it can have it.  The current situation gives us no
> standardized XML-like data format without draconian error-handling.
> This is a problem unless HTML is really the only use-case for
> non-draconian error-handling, which I think is very unlikely.  For
> instance, I've been told some widely-used RSS readers have seen fit to
> implement error recovery -- which must currently be completely
> non-interoperable because of the lack of standardization here.
Both on the Web and elsewhere, there are circumstances warranting strict or lax parsing. This was already a highly debated point when XML was designed. Even then we knew that when opinions diverge, a good solution is usually to support both behaviours and provide a switch. After more than 10 years of experience, it's clear that the needs of both sides are valid and not going away [1]. Therefore I'd like to propose XML 1.2 with a pseudo-attribute parse, accepting the values strict and lax, added to the XML declaration and the text declaration. strict would do what parsers currently do (unifying XML 1.0 Fifth Edition with XML 1.1 Second Edition in some sensible way), and lax would use an algorithm based on Anne van Kesteren's draft, but returning an Infoset. I think the time of authoring (generating) content is usually when it is best known which parsing algorithm will be desirable. When the pseudo-attribute is absent, though, the processor can choose (possibly following a user setting).
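
To make the proposal concrete, here is a minimal sketch of how a processor might read the proposed pseudo-attribute and fall back to its own choice when it is absent. The declaration syntax (parse="lax" inside the XML declaration), the function names and the treatment of unrecognised values as "absent" are assumptions for illustration only; nothing like this is defined in any XML specification.

```python
import re
from typing import Optional

# Matches an XML (or text) declaration at the very start of the document.
_XML_DECL = re.compile(r'^<\?xml\s+(.*?)\?>', re.DOTALL)
# Matches pseudo-attributes such as version="1.2" or parse='lax'.
_PSEUDO_ATTR = re.compile(r'([A-Za-z]+)\s*=\s*("([^"]*)"|\'([^\']*)\')')


def declared_parse_mode(document: str) -> Optional[str]:
    """Return 'strict' or 'lax' if the declaration carries the proposed
    (hypothetical) parse pseudo-attribute, otherwise None."""
    decl = _XML_DECL.match(document)
    if decl is None:
        return None
    for name, _quoted, double_quoted, single_quoted in _PSEUDO_ATTR.findall(decl.group(1)):
        if name == "parse":
            value = double_quoted or single_quoted
            # Assumption: an unrecognised value is treated like an absent one.
            return value if value in ("strict", "lax") else None
    return None


def choose_mode(document: str, user_setting: str = "strict") -> str:
    """Use the declared mode if present; otherwise the processor chooses,
    here by following a user setting."""
    declared = declared_parse_mode(document)
    return declared if declared is not None else user_setting
```

For example, choose_mode('<?xml version="1.2" parse="lax"?><a/>') would select lax parsing, while a document without the pseudo-attribute would get whatever the user setting says.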

Jirka Kosek wrote:
> So instead of adding something like XML namespaces into HTML and
> implementing it in half a dozen web browsers, you are asking for changing XML
> and implementing changes in tens of XML parsers in use? It is
> an interesting plan for decreasing the unemployment rate, but I don't see why
> to add a burden to providers of XML toolchains.
That's why I think support for parse="lax" should be only a SHOULD. (An informative appendix could even specify a WSDL description of a Web Service for tidying XML, i.e. transforming lax syntax into strict syntax, which XML processors may use for preprocessing when they aren't capable of lax parsing themselves; ideally the location of the service would be configurable. Of course, efficiency requirements would keep major browser vendors from using this approach.) Perhaps processors supporting only strict still MAY attempt to parse a document with parse="lax": after all, every strict document would also be a lax document with the same meaning, except for that pseudo-attribute.

Another approach would be to add an optional MIME type parameter to XML media types for this purpose. It's hard to tell whether providing this information in band or out of band would benefit more use cases; there are probably good reasons to have both (with the pseudo-attribute winning when both are specified). An additional advantage of the parameter would be that it could also apply to XML 1.0 and 1.1.
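
The precedence described above (pseudo-attribute first, then the media type parameter, then the processor's own choice) could be sketched as follows. The parameter name parse on the media type is an assumption, since no name is proposed here, and the helper names are hypothetical. The pseudo-attribute value is assumed to have been extracted from the declaration already.

```python
from email.message import Message
from typing import Optional

VALID_MODES = ("strict", "lax")


def mime_parse_mode(content_type: str) -> Optional[str]:
    """Read a hypothetical 'parse' parameter from a Content-Type value,
    e.g. 'application/xml; parse=lax'."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("parse")  # None when the parameter is missing
    return value if value in VALID_MODES else None


def effective_mode(pseudo_attribute: Optional[str],
                   content_type: Optional[str],
                   user_setting: str = "strict") -> str:
    """Pick the parsing mode: the in-band pseudo-attribute wins, then the
    out-of-band media type parameter, then the processor's own choice."""
    if pseudo_attribute in VALID_MODES:
        return pseudo_attribute
    from_mime = mime_parse_mode(content_type) if content_type else None
    if from_mime is not None:
        return from_mime
    return user_setting
```

With this, a document served as application/xml; parse=lax but declaring parse="strict" would be parsed strictly, and an XML 1.0 or 1.1 document (which cannot carry the pseudo-attribute) could still be marked as lax through the media type.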

Best regards,

Krzysztof Maczyński
Invited Expert, HTML WG

[1] http://www.w3.org/2001/tag/issues.html#TagSoupIntegration-54

Received on Tuesday, 17 November 2009 15:35:06 UTC