- From: Philip Taylor <philip@zaynar.demon.co.uk>
- Date: Fri, 31 Aug 2007 21:06:15 +0100
- To: Robert Burns <rob@robburns.com>
- CC: "public-html@w3.org WG" <public-html@w3.org>
Robert Burns wrote:
> On Aug 31, 2007, at 7:50 AM, Dean Edridge wrote:
>> How much longer do we need to go on pretending that XHTML can be sent
>> as text/html, Dan? This is ridiculous. Hasn't the W3C learnt its
>> lesson with XHTML's failure over the last 8 years?
>>
>> Exactly who benefits from the myth of XHTML being able to be sent as
>> text/html? Not you, me, the W3C or anyone, and certainly not XHTML
>> itself.
>
> I'm not sure why you call it a myth. I'm sure we can find countless
> sites that serve valid XHTML files as text/html. This discussion keeps
> popping up, but so far no one has been able to articulate what the
> dangers are in doing so.
There's an even larger number that serve invalid XHTML files as
text/html, and they are often invalid (in part) directly because of
confusion between XML syntax and HTML syntax. I believe that confusion
is a real danger: XHTML-as-text/html has a harder syntax than HTML,
since you have to understand XML as well as
HTML-as-misunderstood-by-browsers, so people get it wrong more often;
and encouraging people to use the harder syntax will result in more
errors and less happiness among those who follow the advice. (I think
this encouragement issue is independent of DanC's proposal to permit the
(discouraged) practice, though.)
Of the top 200 sites in Alexa's list (which I assume is strongly
biased towards professionally designed sites), I looked at the front
page of each (which I assume is more likely to have been checked with a
validator than less prominent pages): 67 looked like they were using
XHTML (they contained "DTD XHTML 1." somewhere), and 51 of those were
ill-formed XML (or, specifically, caused parse errors in libxml).
I looked in more detail at the first half, grouping by the first
reported error - see below for the list. Unencoded ampersands in
non-<script> contexts are errors in HTML too, but I think most of the
other issues are fine in HTML, and it looks like many are caused by
attempting to use HTML syntax in the XHTML document.
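For anyone wanting to repeat this kind of check, here is a rough sketch of the idea in Python. It is not the script used for the numbers above: it swaps in the standard library's expat parser for libxml (so the error messages will differ), and the function names and the sample pages are made up for illustration. It applies the same "DTD XHTML 1." test, parses each matching page as XML, and groups the pages by the first error the parser reports.

```python
from collections import defaultdict
import xml.parsers.expat


def looks_like_xhtml(source):
    """Same heuristic as the survey: the doctype marker appears somewhere."""
    return "DTD XHTML 1." in source


def first_xml_error(source):
    """Return the first well-formedness error message, or None if there is none."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(source, True)
    except xml.parsers.expat.ExpatError as err:
        # expat stops at the first error, so this is the "first reported error".
        return xml.parsers.expat.ErrorString(err.code)
    return None


def group_by_first_error(pages):
    """Map each first-error message to the names of the pages that hit it."""
    groups = defaultdict(list)
    for name, source in pages.items():
        if not looks_like_xhtml(source):
            continue  # not claiming to be XHTML, so not part of the survey
        error = first_xml_error(source)
        if error is not None:
            groups[error].append(name)
    return dict(groups)


# Hypothetical sample pages (not fetched from any real site).
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
HTML_OPEN = '<html xmlns="http://www.w3.org/1999/xhtml">'

pages = {
    "ampersand": DOCTYPE + HTML_OPEN +
        '<body><a href="go.php?homepage=new&resolution=800by600">x</a></body></html>',
    "unclosed": DOCTYPE + HTML_OPEN +
        '<head><meta name="stencil" content="PGLS000022"></head><body></body></html>',
    "fine": DOCTYPE + HTML_OPEN +
        '<head><title>t</title></head><body><p>ok</p></body></html>',
    "html4": '<html><body><img src="a.gif"></body></html>',
}

groups = group_by_first_error(pages)
for message, names in groups.items():
    print(message, names)
```

Grouping by the parser's own message is only a first pass; the categories in the list below were assigned by looking at what actually caused each error.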
Of the pages without parse errors, most were valid - the only exceptions
were http://www.msn.com (duplicate ID value) and sort of
http://www.bbc.co.uk (sends invalid HTML4 to the validator, but sends
valid XHTML1 to me - is that location-based?). (I have no idea how many
would actually work as application/xhtml+xml, given the differences in
the DOM and document.write and everything else.)
The lack of pages which are well-formed but invalid may suggest that few
people are actually interested in well-formedness - some are just
interested in validity, and fix their well-formedness errors as an
incidental detail. Those people would get the same benefit from using
HTML and an HTML validator instead.
Unencoded ampersands:
* http://www.uol.com.br: if(op!=0) parent.top.location =
'http://click.uol.com.br/?rf=hu-drop'+form_name_seed+'&u=http://'+op;
* http://cn.yahoo.com/: <a
href="http://cn.yahoo.com/go.php?homepage=new&resolution=800by600">...</a>
* http://www.163.com: <a
href="http://shanda.allyes.com/main/adfclick?db=shanda&bid=535,10258,10960&cid=0,0,0&sid=31107&advid=312&camid=575&show=ignore&url=http://163.cmfu.com/">...</a>
* http://www.seznam.cz: <embed
src="http://1.im.cz/ad/1/43043/475459/598404_0.swf?clickthru1=http%3A//ad.seznam.cz/clickthru%3FspotId%3D682213%26destination%3Dhttp%253A//sz.aukro.cz/17570_cyklistika.html%253Forder%253Dbd%2526view%253Dgtext&pad&clickthru2=http%3A//ad.seznam.cz/clickthru%3FspotId%3D682213%26seqNo%3D1%26destination%3Dhttp%253A//sz.aukro.cz/special_listing.php%253Ftype%253Dfrom1&pad&clickthru3=http%3A//ad.seznam.cz/clickthru%3FspotId%3D682213%26seqNo%3D2%26destination%3Dhttp%253A//sz.aukro.cz/&pad&clickthru4=http%3A//ad.seznam.cz/clickthru%3FspotId%3D682213%26seqNo%3D3%26destination%3Dhttp%253A//sz.aukro.cz/new_user.php&pad&clickthru5=http%3A//ad.seznam.cz/clickthru%3FspotId%3D682213%26seqNo%3D4%26destination%3Dhttp%253A//sz.aukro.cz/8495_mobilni_telefony.html&pad&clickTarget=_top"
quality="high" width="400" height="100" allowScriptAccess="always"
type="application/x-shockwave-flash"></embed>
* http://www.digg.com: ...{"name":"World &
Business","ctitle":"world_business",...
* http://www.globo.com: if (array != null && array.length > 0)
* http://www.livedoor.com: <script language="javascript"
type="text/javascript"
src="http://click.adv.livedoor.com/A-affiliate2/distribute?keyword=hs9000731&isJS=true&encoding=utf-8"></script>
* http://www.facebook.com/common/browser.php:
Env={method:"GET",dev:0,start:(new Date(
)).getTime(),cache:(((typeof(cc)!="undefined")&&cc.hit)||0),ps_limit:5,ps_ratio:4,pkgv:0};
* http://www.adobe.com: <a
href="/shockwave/download/download.cgi?P1_Prod_Version=ShockwaveFlash&promoid=BIOW"
class="noHover">
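To illustrate the fix for the attribute cases above, here is a small sketch, assuming the markup is generated from Python. It uses the standard library's xml.sax.saxutils.quoteattr on the cn.yahoo.com URL from the list; the variable names are only for illustration.

```python
from xml.sax.saxutils import quoteattr

# URL taken from the cn.yahoo.com example above.
url = "http://cn.yahoo.com/go.php?homepage=new&resolution=800by600"

# quoteattr escapes "&", "<" and ">" and wraps the value in quotes,
# so the result is safe to place directly after href=.
link = "<a href=%s>...</a>" % quoteattr(url)
print(link)
# <a href="http://cn.yahoo.com/go.php?homepage=new&amp;resolution=800by600">...</a>
```

The escaped form is correct in both HTML and XML. Ampersands inside inline scripts, as in the uol.com.br and globo.com examples, are a different case: to parse as XML they would need a CDATA section or an external script file.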
Other unencoded characters:
* http://www.livejournal.com: for (var i = 0; i < site_k.length; i++) {
* http://www.xunlei.com: for(var i = 0; i < productName.length && random
> temp; i++) {
* http://www.xanga.com: document.write('<scr' + 'ipt src="' + adserver +
allAdTags + ad1 + ad2 + ad3 + '?" type="text/javascript">');
document.write('</scr' + 'ipt>');
* http://www.wretch.cc: document.write("<img style=\"display:none;\"
width=1 height=1
src=http://bcw1.mining.vip.tp2.yahoo.com/b?s=2022137079&make=yahoo&type=wretch&t="+random_num+">");
Unclosed tags:
* http://www.rapidshare.com: <img
src="http://images.rapidshare.com/img/rslogo.jpg">
* http://www.sina.com.cn: <meta name="stencil" content="PGLS000022">
* http://www.dailymotion.com/gb: <img src="/images/creative_user_logo.gif">
* http://www.aol.com: <link rel="alternate" type="application/rss+xml"
title="AOL Top Stories"
href="http://xml.web.aol.com/aolportal/dynamiclead.xml">
* http://www.hi5.com: <link
href="http://images.hi5.com/images/favicon.ico" type=image/x-icon
rel="shortcut icon">
* http://www.taobao.com: <input name="f" value="D9_5_1" type="hidden">
* http://www.tom.com: <meta http-equiv="Content-Type"
content="text/html; charset=gb2312">
Other errors:
* http://www.deviantart.com: <![if ! lt IE 5.5]>
* http://www.live.com: <html xmlns:web
xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"
class="liveApp la_en lo_gb">
* http://www.eastmoney.com: <meta name=keywords content="..." />
--
Philip Taylor
philip@zaynar.demon.co.uk
Received on Friday, 31 August 2007 20:06:27 UTC