Re: Prevalence of ill-formed XHTML from Kornel Lesinski on 2007-09-02 (public-html@w3.org from September 2007)

From: Kornel Lesinski <kornel@geekhood.net>
Date: Sun, 02 Sep 2007 15:37:53 +0100
To: "Robert Burns" <rob@robburns.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <op.tx06xfi4ptj49s@aimac.local>

On Sat, 01 Sep 2007 21:24:31 +0100, Robert Burns <rob@robburns.com> wrote:

> I'm not sure what you're saying here. If you change your XSLT to a  
> different output mode won't it output a pure HTML serialization (with no  
> xml-isms)?

It won't output XHTML as HTML. It's completely counter-intuitive, but  
that's what the spec requires:
"The html output method should not output an element differently from the  
xml output method unless the expanded-name of the element has a null  
namespace URI;"
http://www.w3.org/TR/xslt#section-HTML-Output-Method
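
As a rough illustration of that rule, here is a minimal Python sketch. It is not an XSLT processor: the names HTML_VOID, split_name and serialize_empty are hypothetical, the empty-element list is the HTML 4.0 list given in the same section of the spec, and attributes and namespace declarations are deliberately left out of the output.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Empty elements that the html output method writes without an end tag.
HTML_VOID = {"area", "base", "basefont", "br", "col", "frame", "hr",
             "img", "input", "isindex", "link", "meta", "param"}


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def serialize_empty(elem):
    """Serialize a childless element the way the quoted rule implies.

    Simplification: attributes and xmlns declarations are not written.
    """
    uri, local = split_name(elem.tag)
    if uri is not None:
        # Non-null namespace URI: must be output exactly as XML would.
        return "<%s/>" % local
    if local.lower() in HTML_VOID:
        # Null namespace and an HTML empty element: no end tag.
        return "<%s>" % local
    # Null namespace, ordinary element: explicit start and end tag.
    return "<%s></%s>" % (local, local)


plain = ET.fromstring("<br/>")
xhtml = ET.fromstring('<br xmlns="%s"/>' % XHTML_NS)
print(serialize_empty(plain))  # <br>
print(serialize_empty(xhtml))  # <br/>
```

Only the element with no namespace gets the HTML treatment; the XHTML-namespaced one keeps its XML form even under the html output method.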

>> I find it troublesome. The fundamental problem is that you have to  
>> observe all restrictions of XML, but you can't use XML tools anymore,  
>> because they don't care about additional limitations imposed by HTML.
>
> I think observing the XML restrictions is a good thing.
> I also think the treatment of void elements explicitly with something  
> like <br/> makes it easier for authors to understand what they're doing  
> (which is the only additional restriction for HTML I can think of).

The same syntax can also be a source of confusion in the case of  
<script src=""/>, which an HTML parser treats as an unclosed start tag.

> Many of those problems relate to the immaturity of XML / XHTML  
> implementations and not anything about the DOM APIs themselves.

I disagree. If one does intend to parse a document as XML, sniffing will  
always be required when text/html is used. Incompatibilities between the  
HTML and XML DOMs are part of the spec: case sensitivity vs. case folding,  
the forbidden document.write, and the implied <tbody> won't change as  
implementations mature.
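
The case-folding difference can be seen with two parsers from Python's standard library, used here only as stand-ins for an HTML and an XML parser. Neither is a browser DOM, and neither demonstrates document.write or the implied <tbody>. The helper names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record start-tag names as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tag_names(markup):
    """Tag names as seen by an HTML parser (names are case-folded)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


def xml_tag_names(markup):
    """Tag names as seen by an XML parser (names are case-sensitive)."""
    return [el.tag for el in ET.fromstring(markup).iter()]


def xml_is_well_formed(markup):
    """True if the XML parser accepts the markup at all."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


markup = "<TABLE><TR><TD>x</TD></TR></TABLE>"
print(html_tag_names(markup))          # ['table', 'tr', 'td']
print(xml_tag_names(markup))           # ['TABLE', 'TR', 'TD']
print(xml_is_well_formed("<P>x</p>"))  # False
```

The same bytes yield different element names depending on the parser, and markup whose start and end tags differ only in case is not even well-formed as XML.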

> The CSS issues are minor to non-existent for anyone following appendix C.

Indeed, it's just yet another thing authors have to be aware of, and it  
fails silently if they aren't.

>> I think that if a document will not work properly as XHTML, and was  
>> never intended to do, it shouldn't be called XHTML.
>
> I'm not clear what you're saying here. Any document that is valid and  
> well-formed XHTML 1 and also adheres to the XHTML 1.0 appendix C  
> guidelines will work properly as XHTML.

Yes, if such a document adheres to appendix C (and possibly a few other  
things), it would. The problem is that appendix C is not normative; it  
doesn't formalize any new language. XHTML, whether it's compatible or not,  
is allowed to be sent as text/html.

This leads to a ridiculous situation where you can have valid, well-formed,  
100% spec-compliant XHTML that's not compatible with XML mode. And this is  
common on the web today (unless authors fall short of creating valid  
and/or well-formed XHTML in the first place, of course :)

Therefore my suggestion is not to allow XHTML to be sent as text/html.  
The migration path should come from the HTML5 side, which allows appendix  
C-compatible syntax now. "HTML with slashes" better describes what those  
XHTML-wannabe documents are, and there would be no confusion about which  
media type applies to which language.

-- 
regards, Kornel Lesinski

Received on Sunday, 2 September 2007 14:38:21 UTC