Re: David's less simple example from David Carlisle on 2012-02-29 (public-xml-er@w3.org from February 2012)

From: David Carlisle <davidc@nag.co.uk>
Date: Wed, 29 Feb 2012 12:02:42 +0000
To: Noah Mendelsohn <nrm@arcanedomain.com>
Cc: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <4F4E13E2.9010207@nag.co.uk>

On 29/02/2012 00:11, Noah Mendelsohn wrote:
> I think the most important question is: how bad would the consequences
> be if we guessed wrong.

I still think that viewing things in that way leads to pain. If you look
at the output of html5/xml5/Anne's-draft/ on my example (or any example
really) there's no sense in which the markup has been fixed. It is just
parsed with a grammar that isn't xml, and that grammar produces a tree
in a deterministic fashion. The input was correct for that result tree.
(Some inputs may be called parse errors to make humans feel better, but
from a parsing point of view that's a side issue.)
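
To make the "it is just parsing" point concrete, here is a minimal sketch
in Python. It assumes the standard library's html.parser as a stand-in
lenient parser; that module is not the html5 algorithm, xml5, or Anne's
draft, and it emits events rather than a tree. The names EventRecorder,
parse_events and SAMPLE are made up for this illustration. The only thing
it shows is that input which is not well-formed xml is still accepted
without any exception, and that the same input always gives the same
result.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record the parser's events in order (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(text):
    """Parse text with a fresh parser and return the recorded events."""
    recorder = EventRecorder()
    recorder.feed(text)
    recorder.close()  # flush any text still buffered at end of input
    return recorder.events


# Mis-nested tags, a bare ampersand and a bare "<": not well-formed xml.
SAMPLE = "<p>one <b>two</p> three</b> & a < b"

first = parse_events(SAMPLE)
second = parse_events(SAMPLE)

# Nothing was "fixed" and nothing was guessed: the grammar simply maps
# this input to one result, and does so every time.
print(first == second)
for event in first:
    print(event)
```

Whether the particular events are the ones a human would like is a
separate question from whether the parse is well defined.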

If you view it as fix up, then
a) you have to worry about how good the fix was
b) you have to worry about the consequences of getting the fix wrong.

If you just view it as parsing with a non-xml parser then all these
problems go away. The remaining problem for people (like me) with
xml/sgml backgrounds is that you have to re-wire your brain, which
hurts, but is survivable.

Ah, sadly, this conclusion means that viewing it either way leads to pain.

Incidentally, this doesn't mean that I'm opposed to tweaking the
tokenisation rules so that < starts a tag more often (which is what any
human would expect, I think), so long as we weigh the costs of diverging
from html5 (I'm assuming that html5 isn't going to change this).

David

Received on Wednesday, 29 February 2012 12:03:12 UTC