Re: Prevalence of ill-formed XHTML from Philip Taylor on 2007-08-31 (public-html@w3.org from August 2007)

From: Philip Taylor <philip@zaynar.demon.co.uk>
Date: Sat, 01 Sep 2007 00:46:12 +0100
To: Robert Burns <rob@robburns.com>
CC: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <46D8A844.80108@zaynar.demon.co.uk>

Robert Burns wrote:
> On Aug 31, 2007, at 3:06 PM, Philip Taylor wrote:
>> Robert Burns wrote:
>>> I'm sure we can find countless 
>>> sites that serve valid XHTML files as text/html. This discussion 
>>> keeps popping up, but so far no one has been able to articulate what 
>>> the dangers are in doing so.
>>
>> There's a bigger countless number that serve invalid XHTML files as 
>> text/html, and they are often invalid (in part) directly because of 
>> confusion between XML syntax and HTML syntax.
> 
> There may be many pages like this, I don't know. However, I don't think 
> your data bears that out. I also don't think the confusion is due to the 
> differences between XML and HTML but rather the laxity of HTML parsers 
> that authors use to test their pages.

I think the missing slashes in <img src="..."> indicate some level of 
XML/HTML confusion. If the only factor were the laxity of HTML parsers, 
people would be just as likely to write <img src="..."\> or 
<img src=... /> or <image src="..."/>, since those are similarly wrong 
and are handled as the author expects by HTML parsers. But those are 
very rare, whereas the preferred HTML syntax (<img src="...">) is quite 
common. That suggests that people are erroneously using HTML syntax in 
particular, rather than any other syntax that happens to work in their 
browser.
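
To make the comparison concrete, here is a rough sketch (not the script 
used for the survey) that feeds each of the variants above to a strict 
XML parser. It uses Python's standard xml.etree.ElementTree; the sample 
file name "a.png", the wrapping <p> element and the helper name 
is_well_formed are made up for illustration. It only shows the XML side 
of the argument and says nothing about how any particular HTML parser 
recovers.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only; "a.png" is a placeholder file name.
SAMPLES = {
    "xml_style": r'<img src="a.png"/>',      # correct XML empty-element syntax
    "html_style": r'<img src="a.png">',      # HTML syntax: no closing slash
    "backslash": r'<img src="a.png"\>',      # wrong kind of slash
    "unquoted": r'<img src=a.png />',        # attribute value not quoted
    "wrong_name": r'<image src="a.png"/>',   # well-formed, but not an XHTML element
}


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a wrapper element."""
    try:
        ET.fromstring("<p>" + fragment + "</p>")
        return True
    except ET.ParseError:
        return False


for name, fragment in SAMPLES.items():
    print(name, is_well_formed(fragment))
```

Only the xml_style and wrong_name fragments get past the XML parser, and 
the latter is well-formed without being valid XHTML. The html_style 
fragment fails in the same way as the other mistakes, which is why its 
relative frequency in real pages is the interesting part.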

>> <...>
>>
>> I looked in more detail at the first half,
> 
> I wasn't sure what half you were referring to here. I assume you mean 
> you looked at the 51 that were ill-formed XML. Is that right?

Sorry, that was far too vague - I meant I looked at the XHTML pages from 
the first half of the list of 200 pages. (Those 100 pages included 32 
XHTML pages, of which 23 had parse errors.)

> This to me is about 
> whether it is possible or troublesome to send an (appendix C style) XML 
> authored document as text/html.

Appendix C is a bit troublesome to follow since it misses lots of cases 
where conforming XHTML breaks in normal HTML UAs. It's clearly possible 
to send some XHTML documents to HTML UAs and have them function 
properly, and it's possible for some to be conforming HTML5 too. But I 
don't know that it's feasible to define exactly which documents are safe 
and to cover all the relevant cases, except by saying "HTML5 documents 
sent as text/html must be conforming HTML5 [i.e. the HTML serialisation] 
[regardless of whether they're conforming XHTML5 too]" (in which case it 
doesn't matter whether they're conceptually HTML5 or XHTML5 documents).

-- 
Philip Taylor
philip@zaynar.demon.co.uk

Received on Friday, 31 August 2007 23:46:20 UTC