Re: HTML and XML

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 10 Feb 2009 20:26:46 +0000
To: "Anne van Kesteren" <annevk@opera.com>
Cc: "David Orchard" <orchard@pacificspirit.com>, "Henri Sivonen" <hsivonen@iki.fi>, www-tag@w3.org
Message-ID: <f5btz729gxl.fsf@hildegard.inf.ed.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There are a number of different issues which have emerged, been
discussed, and put to one side during this thread, and in _this_
message I only want to reply to the most recent one, which in turn is
actually only one part of Anne's post:

Anne van Kesteren writes:

> I think that if you want to allow arbitrary tree-based markup
> languages your only option is using XML. If you want them to be
> usable by authors as well you need something like XML5

(Let me start by emphasising that in what follows I'm not being
critical of Anne for designing and implementing XML5; it was an
interesting experiment.)

But I think the world has already voted with its feet on the XML5
question, in that there is a notable _lack_ of folk advocating it.

And there's good reason for that:  XML actually _is_ usable by
authors and authoring well-formed XML is _not_ hard.

> because even the experts fail:
>
>   http://diveintomark.org/archives/2004/01/14/thought_experiment
>   http://diveintomark.org/archives/2008/03/09/no-fury-like-dracon-scorned
>   http://annevankesteren.nl/2009/01/xml-sunday

That's one article which a) confuses validity with well-formedness and
b) points to a piece of broken _software_; one article which reports
on one instance of HTML->XHTML upgrade failure (reading between the
lines); one article that points to a page in which someone trying to
introduce an _intentional_ markup error made the wrong error.  Hardly
a compelling set of evidence that well-formed XML is too hard for
ordinary mortals.
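
To make the validity/well-formedness distinction concrete, here is a
minimal Python sketch. It is an illustration only: the helper name
`is_well_formed` and both fragments are hypothetical, not taken from the
articles above. It uses the standard library parser, which checks
well-formedness and never consults a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments, chosen only for illustration.
# Well-formed, but not valid XHTML 1.0 (for one thing, <blink> is not
# declared in its DTDs).
invalid_but_well_formed = "<html><head><title><blink>hi</blink></title></head></html>"
# Not well-formed: the <b> and <i> elements overlap.
ill_formed = "<p><b><i>hi</b></i></p>"

print(is_well_formed(invalid_but_well_formed))
print(is_well_formed(ill_formed))
```

The first fragment passes because a well-formedness check looks only at
syntax (proper nesting, quoting, a single root); whether the elements
are permitted is a validity question.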

I did a quick (less so than I'd hoped -- the era of free access to
well-parameterised Web Search APIs appears to be over) web search,
which yielded 48 .xml documents.  Of these:

  1 was ill-formed (said it was UTF-8, but had a Latin-1 character in
                    it.  Intriguingly, it was served with _no_
                    encoding declared in its HTTP headers)
  1 was unretrievable
  1 used a character encoding I couldn't immediately find a parser for
 45 were well-formed.

Conveniently, that gives us almost exactly the opposite of Ian
Hickson's oft-cited 97% broken HTML figure: counting only the 46
documents I could retrieve and parse, 45 (just under 98%) were
well-formed.
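
For illustration only, a check of this kind might look like the
following Python sketch. It is not the procedure used for the survey
above: the function name `classify` and the two sample byte strings are
hypothetical, and it assumes the standard library's expat-based parser
is an acceptable judge of well-formedness.

```python
import xml.etree.ElementTree as ET


def classify(data: bytes) -> str:
    """Return 'well-formed' or 'ill-formed' for one document's raw bytes."""
    try:
        # Passing bytes (not str) lets the parser honour the encoding
        # named in the XML declaration.
        ET.fromstring(data)
    except ET.ParseError:
        return "ill-formed"
    return "well-formed"


# Hypothetical samples, not the documents from the survey.
good = '<?xml version="1.0" encoding="UTF-8"?><a>caf\u00e9</a>'.encode("utf-8")
# Declares UTF-8 but contains a lone Latin-1 byte (0xE9), the same kind
# of fault as the one ill-formed document described above.
bad = b'<?xml version="1.0" encoding="UTF-8"?><a>caf\xe9</a>'

results = [classify(doc) for doc in (good, bad)]
print(results)
```

Note that the mismatch between declared and actual encoding is a fatal
error for a conforming XML parser, so the second sample is rejected
rather than silently repaired.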

So whatever else may still be discussed, I do not think there's
much if any evidence of either demand or need for an "XML5".

ht
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJkeMNkjnJixAXWBoRAvDEAJwKtHxpiqDi3kk7UO3F9ut8IHhCRQCfdry/
jb5pWVu+SVRU++lEVVDzfE4=
=bNhm
-----END PGP SIGNATURE-----
Received on Tuesday, 10 February 2009 20:28:11 GMT