Re: HTML or XHTML - why do you use it? from Ian Hickson on 2003-01-07 (www-html@w3.org from January 2003)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 7 Jan 2003 02:33:20 +0000 (GMT)
To: Tantek Çelik <tantek@cs.stanford.edu>
Cc: "Peter Foti (PeterF)" <PeterF@systolicnetworks.com>, "'Nick Boalch'" <nick@fof.durge.org>, "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <Pine.LNX.4.21.0301070107240.4082-100000@dhalsim.dreamhost.com>
On Mon, 6 Jan 2003, Tantek Çelik wrote:
> On 1/6/03 2:48 PM, "Ian Hickson" <ian@hixie.ch> wrote:
>>
>> my argument is that the XHTML specification was wrong to allow
>> [XHTML sent as text/html].
> 
> It might be good to send that feedback to the proper feedback email address
> noted in the specification so that the working group can address it as a
> potential errata item or change for the next version etc.

Ok, will do.


>> XHTML documents (or rather, Appendix C compliant XHTML 1.0
>> documents) are intended to operate in HTML Tag Soup parsers.
>> Strictly speaking, a compliant implementation of HTML 4.01 would be
>> well within its rights to totally reject an XHTML document, since
>> XHTML documents are not valid HTML 4.01.
> 
> Ian, I have heard this assertion before, and while I would lean towards
> believing you (since I presume you would make a thorough analysis before
> making such a claim), it would help significantly if you could provide
> references to ALL (that you know of at least) of the precise HTML 4.01 UA
> compliance requirements which would require a compliant HTML4.01 UA to
> reject a valid XHTML 1.0 document that uses the Appendix C guidelines.

None.

The HTML 4.01 spec says absolutely nothing about what to do with
invalid documents. A UA would be compliant to the HTML 4 spec whatever
it did.

So anything that makes an XHTML document invalid in HTML would be an
example, including:

   The DOCTYPE.
   The xmlns attribute.
   The xml:lang attribute.
   The /> syntax for empty tags.
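
A minimal sketch of how the four items above could be spotted in a document's source. The helper name `xhtml_markers`, the regular expressions, and both sample documents are assumptions made for illustration; the patterns are rough heuristics, not a validator:

```python
import re

# Hypothetical patterns, one per item in the list above.
CHECKS = {
    "XHTML DOCTYPE": re.compile(
        r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML', re.IGNORECASE),
    "xmlns attribute": re.compile(r'<[A-Za-z][^<>]*\sxmlns\s*='),
    "xml:lang attribute": re.compile(r'<[A-Za-z][^<>]*\sxml:lang\s*='),
    "/> empty-tag syntax": re.compile(r'<[A-Za-z][^<>]*/>'),
}


def xhtml_markers(source):
    """Return the names of the XHTML-only constructs found in `source`."""
    return [name for name, pattern in CHECKS.items() if pattern.search(source)]


# An Appendix C style XHTML 1.0 document (illustrative).
XHTML_SAMPLE = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Example</title></head>
<body><p>Line one<br />line two</p></body>
</html>'''

# A comparable HTML 4.01 document (illustrative).
HTML4_SAMPLE = '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head><title>Example</title></head>
<body><p>Line one<br>line two</p></body>
</html>'''

if __name__ == "__main__":
    print(xhtml_markers(XHTML_SAMPLE))
    print(xhtml_markers(HTML4_SAMPLE))
```

Each construct reported for the XHTML sample is something an HTML 4.01 validator would flag, which is the sense in which the document is "invalid in HTML".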


> IMHO the HTML WG should look at errata'ing any such HTML 4.01 UA
> compliance requirements in order that a compliant HTML 4.01 UA can
> accept valid XHTML 1.0 documents authored with the Appendix C
> guidelines.

That's certainly an interesting idea. I can't think of any other
things off hand, assuming Appendix-C compliance.

I'll try to compile a list of the changes that would be required.


>>    UAs. Since most authors only check their documents using one or
>>    two UAs, rather than using a validator, this means that authors
>>    are not checking for validity, and thus most XHTML documents on
>>    the web now are invalid. Therefore the main advantage of using
>>    XHTML, that it has to be valid, is lost if the document is then
>>    sent as text/html.
>> 
>> I am presuming that _most_ authors will fail to do so. Given the
>> state of the Web, I feel this assumption is justified.
> 
> I don't doubt your assumption, just your conclusion. The advantage
> of being able to more strictly validate a document is still there.

We've always been able to validate HTML. The key is getting UAs to
_require_ that the documents be valid (actually, well formed, which is
what matters the most). There isn't any chance that HTML UAs will
_ever_ require that of text/html content.
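
The difference between a parser that requires well-formedness and one that does not can be sketched with Python's standard library, assuming `xml.etree.ElementTree` stands in for an XML UA and `html.parser` for a tag soup UA. The helper names and the sample markup are hypothetical:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with unclosed elements: not well formed as XML.
ILL_FORMED = "<p>first<br>second<p>third"
# The same content written so that it is well formed.
WELL_FORMED = "<div><p>first<br/>second</p><p>third</p></div>"


def xml_accepts(source):
    """True if a strict XML parser accepts `source`, False if it rejects it."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def soup_tags(source):
    """Feed `source` to the lenient parser and return the start tags seen."""
    parser = TagCollector()
    parser.feed(source)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(xml_accepts(ILL_FORMED))   # strict parser refuses the document
    print(xml_accepts(WELL_FORMED))  # strict parser accepts it
    print(soup_tags(ILL_FORMED))     # lenient parser carries on regardless
```

The lenient parser never signals an error, so an author testing only in such a UA gets no feedback that the markup is broken.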


> I think the key is, that there is a desire to let HTML UAs that
> don't support XHTML treat the markup as HTML. That is different than
> asking for all UAs to treat the markup as HTML.

Sending markup as text/html is a signal to all UAs that the markup
should be handled as tag soup (officially known as "HTML"), for the
reasons given in the section labelled as "Why UAs can't handle XHTML
sent as text/html as XML" in:

   http://www.hixie.ch/advocacy/xhtml


>> Note that it doesn't matter how soon you intend to move to an XML
>> MIME type; if you ever intend to, you'll hit the problems.
> 
> True enough. However, I believe the author can just use lower case
> element/attribute names (even in the HTML documents and related
> scripts), and have it just work.

To make sure XHTML works as both MIME types you have to ensure you do
everything in appendix C, plus:

   never use <!-- --> in <script> or <style>
   never use namespaces
   never use PIs
   use lowercase CSS selectors
   explicitly include <tbody> elements
   style the html element instead of the body element
   compare tagnames by lowercasing them first
   create elements in lowercase

There are probably many more things that have to be ensured. I know
I've forgotten some of CSS's caveats.
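
Two of the items in the list above can be sketched in Python, again assuming `html.parser` as a stand-in for a tag soup UA and `xml.etree.ElementTree` for an XML UA. The first part shows why `<!-- -->` inside `<style>` is risky: the HTML parser hands the comment markers through as style text, while the XML parser treats them as a real comment and drops the rule. The second part is a hypothetical `same_tag` helper for case-insensitive tag name comparison, since an HTML DOM reports `tagName` in uppercase while an XML DOM preserves the authored case:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

STYLE = "<style><!-- p { color: red } --></style>"


class StyleText(HTMLParser):
    """Collects the character data found inside <style> elements."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.chunks.append(data)


def style_text_as_html(source):
    """Style sheet text as seen by the lenient HTML parser."""
    parser = StyleText()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


def style_text_as_xml(source):
    """Style sheet text as seen by the XML parser (comments are discarded)."""
    return ET.fromstring(source).text or ""


def same_tag(a, b):
    """Compare two tag names by lowercasing them first."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(repr(style_text_as_html(STYLE)))  # the rule survives
    print(repr(style_text_as_xml(STYLE)))   # the rule is gone
    print(same_tag("DIV", "div"))
```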

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 6 January 2003 21:33:23 UTC