Re: Suggested revised text for HTML/XML report intro from David Carlisle on 2011-08-16 (public-html-xml@w3.org from August 2011)

From: David Carlisle <davidc@nag.co.uk>
Date: Tue, 16 Aug 2011 22:03:48 +0100
To: "public-html-xml@w3.org" <public-html-xml@w3.org>
Message-ID: <4E4ADB34.9000606@nag.co.uk>

On 16/08/2011 17:32, John Cowan wrote:
> Please explain with clear and convincing examples why the XML5
> approach is superior to Siefke's algorithm.  Or vice versa, I'm not
> picky.

That was directed at Anne, but I could take a stab at answering.
Siefke's algorithm (as written) is described as essentially _repairing_
broken xml, since it is a source-level textual transformation. As noted
earlier, this is somewhat different in spirit from an xml5/html5 parsing
spec, which parses an arbitrary input stream directly into a tree
without explicitly repairing the input.

But I don't think (for this usage) it actually matters much what the
parsing rules are, so long as they have the property that well-formed
xml input parses as per xml, and non-well-formed input still produces
some document.
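
To make that property concrete, here is a minimal sketch in Python. It
is not Siefke's algorithm and not xml5: the recovery rule (wrap the raw
input as text in a made-up "unparsed" element) is deliberately crude,
and the names parse_total and "unparsed" are hypothetical. The only
point is that any rule will do so long as well-formed input goes
through the ordinary xml parser unchanged and everything else still
yields some tree.

```python
import xml.etree.ElementTree as ET


def parse_total(text: str) -> ET.Element:
    """Return a tree for any input string.

    Well-formed input is parsed exactly as the strict parser parses it.
    Anything else falls back to a placeholder element (hypothetical
    recovery rule, chosen only because it is trivially total).
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        fallback = ET.Element("unparsed")
        fallback.text = text
        return fallback


if __name__ == "__main__":
    print(ET.tostring(parse_total("<a><b/></a>"), encoding="unicode"))
    print(ET.tostring(parse_total("<a><b></a>"), encoding="unicode"))
```

A real lenient parser would obviously recover far more structure than
this; the sketch only pins down the two requirements above.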

Schema-driven alternatives like tag soup would have been (if it were not
for inconvenient legacy concerns) a cleaner way to define _html_ (rather
than xhtml) via some generic parsing methodology rather than the ad hoc
parsing rules in html5, but that (given the politics of the situation,
if nothing else) wasn't an option for defining html as implemented in
browsers.

For an xml5-style lenient xml parse (which I'd only define as part of,
or companion to, html rather than changing xml itself) I don't think a
schema-driven approach or any kind of complicated per-element
re-starting rules is appropriate. It would, I think, be open to
investigation whether such a parser is, of necessity, not an xml parser,
or whether it could be devised such that the application using the parse
tree can determine that the input was not well formed. In the latter
case it could make use of the wiggle-room in the xml spec, which leaves
the behaviour for non-well-formed input unspecified so long as the
failure is reported to the application, and such a parser might then be
a conformant (if unconventional) xml parser.
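
As a rough illustration of "deliver a tree, but report the failure",
the following Python sketch returns a result object carrying both the
tree and a well-formedness flag. The names ParseResult and
parse_and_report, and the "recovered" placeholder element, are
hypothetical. The recovery rule here (keep whatever tree had been built
before the first error) is just one simple choice and makes no claim
about what xml5 or a conformant processor would actually do.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class ParseResult:
    root: ET.Element              # always present, even for bad input
    well_formed: bool             # False if the strict parser complained
    error: Optional[str] = None   # the parser's message, if any


def parse_and_report(text: str) -> ParseResult:
    parser = ET.XMLPullParser(events=("start",))
    root = None
    try:
        parser.feed(text)
        # Events queued before an error are delivered first; the error
        # itself is raised once the queue reaches it.
        for _event, elem in parser.read_events():
            if root is None:
                root = elem       # first start event is the root element
        parser.close()            # raises if elements are left unclosed
    except ET.ParseError as exc:
        if root is None:
            # Nothing was parsed at all: hand back an empty placeholder.
            root = ET.Element("recovered")
        return ParseResult(root, False, str(exc))
    return ParseResult(root, True)


if __name__ == "__main__":
    result = parse_and_report("<a><b>text</c></a>")
    print(result.well_formed, result.root.tag, result.error)
```

An application that cares about strictness can check well_formed and
refuse the document; one that does not can simply use root.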

David

Received on Tuesday, 16 August 2011 21:04:24 UTC