Re: What problem is this task force trying to solve and why? from John Cowan on 2010-12-24 (public-html-xml@w3.org from December 2010)

From: John Cowan <cowan@mercury.ccil.org>
Date: Thu, 23 Dec 2010 22:01:40 -0500
To: Kurt Cagle <kurt.cagle@gmail.com>
Cc: David Carlisle <davidc@nag.co.uk>, public-html-xml@w3.org
Message-ID: <20101224030140.GB29826@mercury.ccil.org>

Kurt Cagle scripsit:

> I would contend that when a web browser attempts to parse ill-formed
> HTML, it is doing precisely this kind of "kludge".

That used to be true.  But with the advent of HTML5, what we have is
no longer a kludge, but a standard.  There is no longer such a thing as
ill-formed HTML, although there is a subset of HTML called "valid HTML"
which authors and their tools are supposed to use.

The point is that HTML5 is a very complex and retroactive standard
that has a long history behind it.  Trying to invent new complex
standards prospectively tends to produce horrible messes like OSI
networking or XSD.

> Read the post again. I am positing a new parser (as opposed to rewriting
> the WHOLE of the XML canon, *plus* changing literally hundreds of
> billions of XML documents currently in circulation) that would serve to
> take XML content and attempt to intelligently discern what the intent
> of the user was. I've laid out the mechanisms by which such a parser
> would work, and tried to make the point that, yes,  you can in fact
> change the heuristics based upon a set of configuration files in those
> cases where you DID have a general idea of the provenance of the XML.

I have no problem with heuristic parsers; I have a problem when the
behavior of a heuristic parser is prematurely erected into a standard.
HTML5 is not premature (though it may be a bit postmature).  XML5 is.

> What I am arguing is simply that rather than seeing HTML5 as being
> some kind of blessed language that has its own inner workings, you
> look at HTML5 as being XML for a second, then ask what would need to
> change in that dirty-data parser to generalize this to the level of XML.

Throw it away and start over, is what you'd have to do.

> Most of the problems that people have working with XML is that there are
> rules that can seem arcane and arbitrary, and that, without a fairly
> sophisticated understanding of the language don't make sense. 

The MicroXML insight is that many of the parts that don't make sense are
there to add flexibility that isn't needed much of the time -- so strip
them out.

> However, to the vast majority of non-XML people, Listing 1 is INTENDED to
> be:
> 
> <ns1:foo xmlns:ns1="myFooNS">
>     <ns1:bar/>
>     <ns1:bat/>
> </ns1:foo>
> Listing 3. Anonymous elements map to the declared namespace.

Do you have *evidence* for the obviousness of this?

-- 
A mosquito cried out in his pain,               John Cowan
"A chemist has poisoned my brain!"              http://www.ccil.org/~cowan
        The cause of his sorrow                 cowan@ccil.org
        Was para-dichloro-
Diphenyltrichloroethane.                                (aka DDT)

Received on Friday, 24 December 2010 03:02:10 UTC