- From: James Graham <jg307@cam.ac.uk>
- Date: Mon, 28 Jan 2008 11:25:44 +0000
- To: temp17@staldal.nu
- CC: public-html-comments@w3.org
Note: this is my own opinion and I do not speak for the HTML-WG.
temp17@staldal.nu wrote:
>>> Why is this syntax [the traditional non-XML syntax] recommended?
>>
>> AIUI, because of wider support in UAs, because the syntax is more
>> forgiving, and because most authors use it already.
>
> Wider support in UAs is a valid argument. But I don't understand why
> more forgiving syntax is an advantage.
Because the fail-on-error behavior of XML is user-hostile: it requires clients
to fail gracelessly, leaving the end user (who is in no position to fix the
problem) with an unintelligible error message (e.g. the "Yellow Screen of
Death", or YSoD, in Firefox) and potentially, since the site is inaccessible, no
way to report the problem [1]. In addition, the vast majority of CMSs in use
today are not designed to ensure that the content they send over the wire is
well-formed XML in all circumstances, so it is exceptionally hard to ensure that
users never experience a YSoD. Indeed, in my experience almost every site I
visit that serves XML has been caught out at one time or another.
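To make the contrast concrete, here is a minimal sketch using only the Python
standard library. The fragment and the helper names (parse_as_xml,
TextCollector, parse_as_html) are hypothetical. Note that html.parser is a
lenient tokenizer, not an implementation of the HTML5 parsing algorithm. It
simply illustrates that an HTML-style parser keeps going where an XML parser
is required to stop at the first well-formedness error.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical CMS output with one authoring slip: <p> is never closed.
MARKUP = "<div><p>Hello, world</div>"


def parse_as_xml(markup):
    """Return the root element, or None if the XML parser rejects the input.

    An XML parser must stop at the first well-formedness error; a browser
    surfaces that failure to the reader as an error page."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        print("XML parser gave up:", err)
        return None


class TextCollector(HTMLParser):
    """Collects text content; html.parser tolerates the mismatched tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


xml_result = parse_as_xml(MARKUP)   # None: the document is rejected outright
html_text = parse_as_html(MARKUP)   # the text is still recovered
print("XML result:", xml_result)
print("HTML text:", html_text)
```

An HTML5-conformant parser goes further than this sketch: the specification
defines exactly how the unclosed element is recovered, so every conforming
implementation builds the same tree.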
>>> Why not recommend the XML syntax instead?
>>
>> Why should it be recommended instead?
>
> Because it is an advantage to be able to process HTML documents with XML
> tools. And it's easier to parse.
This is not strictly true. Since HTML5 specifies parsing behavior for text/html,
several interoperable, highly robust libraries for parsing HTML have been
developed. So when you need to parse something, you simply choose an XML
library for XML content or an HTML library for HTML content. Trying to do
anything else (e.g. using regular expressions) is a mistake that will lead to
problems. Once you have the content in a tree-like structure, it is generally
possible to serialize it as either HTML or XML, as you prefer. So it's entirely
possible to have a pipeline that looks like:
                   html parser            html serializer
text/html content ------------> XML tools --------------> text/html content
Planet Venus [2] does something like this.
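The pipeline can be sketched in Python as follows. This is a minimal,
self-contained illustration, not how Planet Venus works. Assumptions: the
stdlib html.parser stands in for a real HTML5 parser such as html5lib, so its
error recovery is crude (an end tag closes any elements still open inside it,
and stray end tags are ignored). TreeBuildingParser, html_to_tree and the
synthetic <body> wrapper are hypothetical names. The "XML tools" step is plain
ElementTree processing.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never take an end tag in text/html.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TreeBuildingParser(HTMLParser):
    """Hypothetical bridge from html.parser events to an ElementTree.

    Crude recovery only; this is NOT the HTML5 tree-construction algorithm."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.builder.start("body", {})  # synthetic wrapper: one root element
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with value None; XML needs a string.
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)  # void elements close immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        while True:  # close anything left open inside this element
            name = self.open_tags.pop()
            self.builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered input
        while self.open_tags:  # implicitly close whatever is still open
            self.builder.end(self.open_tags.pop())
        self.builder.end("body")
        return self.builder.close()


def html_to_tree(markup):
    parser = TreeBuildingParser()
    parser.feed(markup)
    return parser.finish()


# Step 1: text/html in (note the unclosed <p> and the bare <br>).
source = '<div><p>Read <a href="http://example.com/">this</a><br>now</div>'
root = html_to_tree(source)

# Step 2: "XML tools": any ElementTree-based processing works on the tree.
for link in root.iter("a"):
    link.set("rel", "nofollow")

# Step 3: serialize to whichever syntax the consumer needs.
xml_out = ET.tostring(root, encoding="unicode")                  # well-formed XML
html_out = ET.tostring(root, encoding="unicode", method="html")  # text/html
print(xml_out)
print(html_out)
```

The point of the design is that the tree in the middle is syntax-neutral: the
same processing step feeds both serializers, and only the parser at the front
has to match the format of the incoming content.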
[1] Arguably the XML spec does leave scope for fixing up the problem at the
application layer, but then the benefits of XML are lost.
[2] http://www.intertwingly.net/code/venus/
--
"Eternity's a terrible thought. I mean, where's it all going to end?"
-- Tom Stoppard, Rosencrantz and Guildenstern are Dead
Received on Monday, 28 January 2008 11:26:08 UTC