Re: David's less simple example

From: Norman Walsh <ndw@nwalsh.com>
Date: Tue, 12 Jun 2012 14:17:21 -0500
To: W3C XML-ER Community Group <public-xml-er@w3.org>
Message-ID: <m24nqg7326.fsf@nwalsh.com>

David Carlisle <davidc@nag.co.uk> writes:
> On 29/02/2012 00:11, Noah Mendelsohn wrote:
>> I think the most important question is: how bad would the consequences
>> be if we guessed wrong?
> I still think that viewing things in that way leads to pain. If you look
> at the output of html5/xml5/Anne's-draft/ on my example (or any example
> really) there's no sense in which markup has been fixed. It is just
> parsed with a grammar that isn't xml and produces a tree in a
> deterministic fashion. The input was correct for that result tree.
> (Some inputs may be called parse error to make humans feel better but
> from a parsing point of view, that's a side issue).

I'm inclined to agree with David on this point. 

> If you view it as fix up, then
> a) you have to worry about how good the fix was
> b) you have to worry about the consequences of getting the fix wrong.

There's a part of me that wants to view it as fixup. There are so many
examples of minor errors in pseudo-XML documents where you can so
easily say "just add quotes" or "just add the closing angle bracket"
that it's easy to think you can fix things up.

But invariably, you'll run into a couple of minor errors in close
proximity and then it's down to a value judgment. Was that meant to be
a quoted attribute value in the start tag, or was it meant to be
unquoted element content?

I can imagine countless hours spent arguing about which fixup was
correct or more important. I'd sooner not spend hours doing that.

I think our goal should be deterministic parsing rules for building a
tree from a sequence of characters. For well-formed XML (at least
without an external subset), we should build the "right" tree. For
documents that look mostly like XML but have the occasional missing
quote or end tag, we should get the "right" tree for as many of those
as we can without making the spec impractical or having arguments
about the finer points of which result is correct. For everything
else, we should get a specific answer.
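
To make "deterministic rules" concrete, here is a minimal sketch of the
idea in Python. It is not taken from any XML-ER draft: the names Node,
parse, TOKEN and ATTR, and the particular rules (unquoted attribute
values are accepted, an end tag closes the nearest open element of that
name, a stray "<" is text) are assumptions made up for illustration.
The only point is that every input string maps to exactly one tree and
nothing is ever "fixed".

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node objects or str

# Every character belongs to exactly one token: an end tag, a start tag,
# a run of text, or a lone "<" that could not start a tag.
TOKEN = re.compile(
    r"</([A-Za-z][\w.-]*)\s*>|<([A-Za-z][\w.-]*)([^<>]*)>|([^<]+|<)")
# An attribute is a name, optionally followed by "=" and a double-quoted,
# single-quoted, or bare value (tried in that order).
ATTR = re.compile(
    r"""([^\s=/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/]+)))?""")

def parse(text):
    """Build a tree from any string; there is no error state."""
    root = Node("#document")
    stack = [root]
    for m in TOKEN.finditer(text):
        end, start, raw_attrs, chars = m.groups()
        if chars is not None:
            # Text: merge with a preceding text child if there is one.
            kids = stack[-1].children
            if kids and isinstance(kids[-1], str):
                kids[-1] += chars
            else:
                kids.append(chars)
        elif start is not None:
            node = Node(start)
            for a in ATTR.finditer(raw_attrs):
                name, dq, sq, bare = a.groups()
                value = next((v for v in (dq, sq, bare) if v is not None), "")
                node.attrs.setdefault(name, value)  # first occurrence wins
            stack[-1].children.append(node)
            if not raw_attrs.rstrip().endswith("/"):
                stack.append(node)  # not self-closing, so it stays open
        else:
            # End tag: close the nearest open element with that name and
            # everything inside it; ignore the tag if nothing matches.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == end:
                    del stack[i:]
                    break
    return root
```

With these rules, '<doc><p class=note>one<p>two</doc>' always gives a
"doc" element containing one "p" with class "note", whose children are
the text "one" and a second "p". Whether the author "meant" the second
paragraph to be a sibling is exactly the argument this approach avoids:
the rule says nested, so nested is the answer.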

                                        Be seeing you,

Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676

Received on Tuesday, 12 June 2012 19:17:51 UTC