Re: HTML and XML from Anne van Kesteren on 2009-02-11 (www-tag@w3.org from February 2009)

From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 11 Feb 2009 11:08:32 +0100
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: "David Orchard" <orchard@pacificspirit.com>, "Henri Sivonen" <hsivonen@iki.fi>, www-tag@w3.org
Message-ID: <op.uo6mgijl64w2qv@annevk-t60.oslo.opera.com>

On Tue, 10 Feb 2009 21:26:46 +0100, Henry S. Thompson <ht@inf.ed.ac.uk>  
wrote:
> Anne van Kesteren writes:
>> I think that if you want to allow arbitrary tree-based markup
>> languages your only option is using XML. If you want them to be
>> usable by authors as well you need something like XML5
>
> (Let me start by emphasising that in what follows I'm not being
> critical of Anne for designing and implementing XML5, it was an
> interesting experiment.)
>
> But I think the world has already voted with its feet on the XML5
> question, in that there is a notable _lack_ of folk advocating it.

Yeah, it seems most people are ok with just advancing HTML.

> And there's good reason for that:  XML actually _is_ usable by
> authors and authoring well-formed XML is _not_ hard.

Sure, until you start dealing with anything slightly more complex. E.g.  
trying to write blog software that accepts user input, input from other  
sites, etc.

>> because even the experts fail:
>>
>>   http://diveintomark.org/archives/2004/01/14/thought_experiment
>>   http://diveintomark.org/archives/2008/03/09/no-fury-like-dracon-scorned
>>   http://annevankesteren.nl/2009/01/xml-sunday
>
> That's one article which a) confuses validity with well-formedness and

At the time of writing the validator links did point out actual  
well-formedness errors and some of those links still do. I don't think  
that naming it "invalid XML" should distract from the overall point of the  
article.

> b) points to a piece of broken _software_; one article which reports
> on one instance of HTML->XHTML upgrade failure (reading between the
> lines); one article that points to a page in which someone trying to
> introduce an _intentional_ markup error made the wrong error.  Hardly
> a compelling set of evidence that well-formed XML is too hard for
> ordinary mortals.

The point is that writing robust software is apparently not that easy. A  
subsequent point would be that writing robust software for HTML or XML5 is  
not a requirement, and that they would work well for the end user  
regardless of whether the software contains a serialization bug or not.
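
To make the serialization-bug scenario concrete, here is a minimal Python sketch. It assumes a hypothetical blog comment ("Fish & chips") pasted into markup without escaping. Python's html.parser stands in for a forgiving parser only; it is neither the HTML5 parsing algorithm nor XML5, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Collects character data; this parser recovers instead of aborting."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(markup):
    """Return the text content a tolerant HTML-style parser extracts."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


comment = "Fish & chips"                 # hypothetical user input
naive = "<p>" + comment + "</p>"         # serialization bug: bare "&"
safe = "<p>" + escape(comment) + "</p>"  # "&" written as "&amp;"

print(is_well_formed(naive))  # the XML parser rejects the whole document
print(is_well_formed(safe))   # escaping restores well-formedness
print(html_text(naive))       # the tolerant parser still yields the text
```

With a draconian parser the reader gets an error page for the one unescaped ampersand; with a recovering parser the same bug goes unnoticed.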

> I did a quick (less so than I'd hoped -- the era of free access to
> well-parameterised Web Search APIs appears to be over) web search,
> which yielded 48 .xml documents.  Of these
>
>   1 was ill-formed (said it was UTF-8, but had a Latin-1 character in
>                     it.  Intriguingly, it was served with _no_
>                     Content-encoding header)
>   1 was unretrievable
>   1 used a character encoding I couldn't immediately find a parser for
>  45 were well-formed.
>
> Conveniently, that gives us the exact opposite of Ian Hickson's
> oft-cited 97% broken HTML figure: we have 97% well-formed XML.

That a bunch of standalone XML documents is well-formed is hardly  
surprising imo. I do not think that is the interesting case.
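
For reference, the one ill-formed document in the quoted survey (declared UTF-8, but containing a Latin-1 character) is the kind of failure the following Python sketch reproduces. The sample document and the check_bytes helper are assumptions for illustration, not the survey's actual tooling or data.

```python
import xml.etree.ElementTree as ET


def check_bytes(data):
    """Classify raw bytes as 'well-formed' or 'ill-formed' XML."""
    try:
        ET.fromstring(data)
        return "well-formed"
    except ET.ParseError:
        return "ill-formed"


# Hypothetical document: the declaration says UTF-8 in both cases.
source = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'

good = source.encode("utf-8")    # bytes match the declared encoding
bad = source.encode("latin-1")   # 0xE9 is not a valid UTF-8 sequence here

print(check_bytes(good))
print(check_bytes(bad))
```

On the arithmetic: the 97% figure appears to count only the 46 documents that could be checked (45/46 is roughly 97.8%); over all 48 results it would be 45/48, roughly 93.8%.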

> So whatever else may still be discussed, I do not think there's
> much if any evidence of either demand or need for an "XML5".

I was just answering a question from David on whether such a thing existed  
and whether it could be reconciled with HTML parsing. I'm fine with people  
just using HTML instead.

-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Wednesday, 11 February 2009 10:09:44 UTC