- From: David Carlisle <davidc@nag.co.uk>
- Date: Tue, 16 Aug 2011 22:03:48 +0100
- To: "public-html-xml@w3.org" <public-html-xml@w3.org>
On 16/08/2011 17:32, John Cowan wrote:
> Please explain with clear and convincing examples why the XML5
> approach is superior to Siefke's algorithm. Or vice versa, I'm not
> picky.

That was directed at Anne, but I could take a stab at answering.

Siefke's algorithm (as written) is described as essentially _repairing_ broken xml, since it is a source-level textual transformation. As noted earlier, this is somewhat different in spirit from an xml5/html5 parsing spec, which parses an arbitrary input stream directly into a tree without explicitly repairing the input.

But I don't think (for this usage) it matters much what the parsing rules are, so long as they have the feature that well-formed xml input parses as per xml, and non-well-formed input produces some document.

Schema-driven alternatives like tag soup would have been (if it were not for inconvenient legacy concerns) a cleaner way to define _html_ (rather than xhtml), via some generic parsing methodology rather than the ad hoc parsing rules in html5, but that (given the politics of the situation, if nothing else) wasn't an option for defining html as implemented in browsers.

For an xml5-style lenient xml parse (which I'd only define as part of, or companion to, html rather than changing xml itself) I don't think a schema-driven approach or any kind of complicated per-element re-starting rules is appropriate.

It would, I think, be open to investigation whether such a parser is, of necessity, not an xml parser, or whether it could be devised such that the application using the parse tree could determine that the input was not well formed, and so make use of the wiggle-room in the xml spec: the behaviour for non-well-formed input isn't specified so long as the failure is reported to the application. In that case such a parser might be a conformant (if unconventional) xml parser.

David
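
The contract described in the last paragraph, a parser that always yields some tree but lets the application find out that the input was not well formed, could be sketched roughly as follows. This is only a minimal illustration in Python, and neither xml5 nor Siefke's algorithm: `ParseResult`, `lenient_parse` and `_RecoveringBuilder` are hypothetical names, the synthetic `recovered` root is an assumption, and the recovery rule (auto-close unmatched elements, ignore stray end tags) was picked purely for brevity. The fallback borrows Python's `html.parser`, which lowercases tag names, so it is not faithful to xml on the error path.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class ParseResult:
    root: ET.Element
    well_formed: bool  # False means the tree came from the recovery path


class _RecoveringBuilder(HTMLParser):
    """Hypothetical fallback: builds some tree from any tag stream."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("recovered")  # synthetic root (assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attrib = {k: v or "" for k, v in attrs}
        self.stack.append(ET.SubElement(self.stack[-1], tag, attrib))

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close up to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def lenient_parse(text: str) -> ParseResult:
    """Well-formed input parses as per xml; anything else still yields a tree."""
    try:
        return ParseResult(ET.fromstring(text), well_formed=True)
    except ET.ParseError:
        builder = _RecoveringBuilder()
        builder.feed(text)
        builder.close()
        # The failure is reported to the application via the flag.
        return ParseResult(builder.root, well_formed=False)
```

An application that wants strict xml behaviour can check `well_formed` and stop; one that prefers leniency can carry on with `root` regardless.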
Received on Tuesday, 16 August 2011 21:04:24 UTC