Re: HTML5 and XML syntax

From: James Graham <jg307@cam.ac.uk>
Date: Mon, 28 Jan 2008 11:25:44 +0000
Message-ID: <479DBBB8.9000305@cam.ac.uk>
To: temp17@staldal.nu
CC: public-html-comments@w3.org

Note: this is my own opinion and I do not speak for the HTML-WG.

temp17@staldal.nu wrote:
>>> Why is this syntax [the traditional non-XML syntax] recommended?
>>
>> AIUI, because of wider support in UAs, because the syntax is more 
>> forgiving, and because most authors use it already.
> 
> Wider support in UAs is a valid argument. But I don't understand why 
> more forgiving syntax is an advantage.

Because the fail-on-error behavior of XML is user-hostile: it requires clients
to fail gracelessly, leaving the end user -- who is in no position to fix the
problem -- with an unintelligible error message (e.g. the YSoD in Firefox) and
potentially, since the site is inaccessible, no way to report the problem [1].
In addition, the vast majority of CMSs in use today are not designed to ensure
that the content they send over the wire is well-formed XML in all
circumstances, so it is exceptionally hard to ensure that users never
experience a YSoD. Indeed, in my experience almost all the sites I visit that
serve XML have been caught out at one time or another.
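
To make the difference concrete, here is a minimal sketch in Python using only
the standard library. The names xml_accepts, html_text and TextCollector are
made up for illustration, and html.parser is only a lenient tag-soup parser
standing in for a real HTML5 parser; it does not implement the HTML5 parsing
algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An unclosed <br> is routine in text/html but is a fatal error in XML.
MARKUP = "<p>Hello<br>world</p>"


def xml_accepts(text):
    """Return True if an XML parser accepts the text, False if it gives up."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text content of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(text):
    """Parse leniently and return whatever text content could be recovered."""
    collector = TextCollector()
    collector.feed(text)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(xml_accepts(MARKUP))   # the XML parser rejects the whole document
    print(html_text(MARKUP))     # the HTML parser still recovers the text
```

The XML parser has nothing to hand back once it hits the error, which is what
the end user sees as a YSoD; the lenient parser still yields the content.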

>>> Why not recommend the XML syntax instead?
>>
>> Why should it be recommended instead?
> 
> Because it is an advantage to be able to process HTML documents with XML 
> tools. And it's easier to parse.

This is not strictly true. Since HTML5 specifies parsing behavior for text/html,
several interoperable, highly robust libraries have been developed for parsing
HTML. So, when you need to parse something, you simply choose an XML library for
XML content or an HTML library for HTML content. Trying to do anything else
(e.g. using regular expressions) is a mistake that will lead to problems. Once
you have the content in a tree-like structure, it is generally possible to
serialize it as either HTML or XML, as you prefer. So it's entirely possible to
have a pipeline that looks like:

                    html parser            html serializer
text/html content ------------> XML tools --------------> text/html content

Planet Venus [2] does something like this.
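
A rough sketch of such a pipeline in Python, using only the standard library,
might look like the following. This is not how Planet Venus is implemented.
ToyTreeBuilder and html_pipeline are hypothetical names, and the toy builder
does none of the HTML5 error recovery (implied end tags and so on) that a real
HTML5 parsing library provides; it also assumes the input has a single
top-level element. The point is only the shape: parse text/html into a tree,
work on the tree with XML tooling (ElementTree here), and serialize back out
as HTML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never take an end tag (a subset; enough for this sketch).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class ToyTreeBuilder(HTMLParser):
    """Hypothetical stand-in for an HTML5 parser: builds an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")   # wrapper that holds the parsed content
        self.stack = [self.root]         # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                  # text after a child goes in its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_pipeline(markup):
    """text/html in -> tree -> XML tooling -> text/html out."""
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    doc = builder.root[0]                # assumes one top-level element

    # The "XML tools" step: any tree-based processing works here.
    for link in doc.iter("a"):
        link.set("rel", "nofollow")

    # The HTML serializer writes void elements such as <br> without end tags.
    return ET.tostring(doc, encoding="unicode", method="html")


if __name__ == "__main__":
    print(html_pipeline("<div><p>Hello<br>world <a href=page.html>link</a></p></div>"))
```

Swapping the last step for method="xml" gives the XML serialization of the same
tree, which is the sense in which the choice of output syntax is independent of
the input syntax.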

[1] Arguably the XML spec does leave scope for fixing up the problem at the 
application layer, but then the benefits of XML are lost.

[2] http://www.intertwingly.net/code/venus/

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead
Received on Monday, 28 January 2008 11:26:08 GMT
