Re: Is this legal XHTML 1.1? from Christian Wolfgang Hujer on 2002-12-14 (www-html@w3.org from December 2002)

From: Christian Wolfgang Hujer <Christian.Hujer@itcqis.com>
Date: Sat, 14 Dec 2002 19:01:40 +0100
To: Ian Hickson <ian@hixie.ch>
Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, "www-html@w3.org" <www-html@w3.org>
Message-Id: <200212141901.40291.Christian.Hujer@itcqis.com>
Hi Ian, dear list members,


On Friday, 13 December 2002 08:30, Ian Hickson wrote:
> > Yes, but *that* group usually does not even know of either the W3C or XHTML.
>
> They may not know of either, but they are using XHTML because everyone
> else is using XHTML now. Authors write by copying and pasting.
>
> Just take a look at the insanely high percentage of new pages being
> written that have XHTML DOCTYPEs.
What about including a comment like "use a validator to ensure your page is 
correct: http://validator.w3.org/" right after the DOCTYPE? ;-)

> >>    Ergo these authors are not checking for validity.
> >
> > Yes. They should be taught validity first, then XHTML.
>
> Good luck with that.
Oh, you mean I should do that? No, I meant someone should do that. Not me ;-)
Joking aside, yes, I already do that sometimes.

But I've just found an example of how right you are about the typical 
author. I took a look at the source code of http://www.microsoft.com/ and 
was totally shocked! Well, of course, that corporation has never been a good 
example of how to implement standards and norms correctly...

Then I took a look at:
http://www.microsoft.com/ Grande catastrophe!
http://www.netscape.com/ Not much better.
http://www.opera.com/ Validates as XHTML 1.0
http://www.mozilla.org/ Validates as HTML 4.01 (after setting the character 
encoding to US-ASCII, which is okay.)


> >>    Most authors (including me) occasionally include at least one error
> >> in their documents, making them ill-formed.
> >
> > Well, the only "error" I occasionally include is namespace declarations
> > that should not be there because some XSLT processors have some really
> > annoying (but still absolutely correct) behaviour about namespace
> > declarations.
> You never write documents that don't validate?
I do, of course. Typos happen, but invalid documents won't even get as far 
as transformation, let alone upload. Each validation error is instantly 
reported by Ant / Xerces.

I wrote non-validating documents some time ago.
For instance, my homepage, which was last modified in spring 2002, uses tr 
height="1%", which does not exist in XHTML 1.0 Strict. I will not change it 
because the complete site will be thrown away in the next few days anyway.

> > But usually all my XHTML documents are, and all my HTML documents were
> > valid. I check each document for validity *before* upload, even before
> > locally viewing them in the browsers.
> And you never find them invalid when you are writing them?
Nearly never.
It's like with a compiled language. Once you're used to it, the probability 
of making errors decreases rapidly.
And I check the validity of all documents using Ant and Xerces or Crimson.

> >>    Ergo the authors that are a cross-section of both groups, and use
> >>    XHTML, are placing invalid XHTML documents on the web.
> > That's true.
> And that is the problem!

But the solution can't be to not use XHTML.
"Many, many people use XHTML in the wrong way, so no one must use XHTML [even 
those using it correctly]" - I don't agree with that.
What about "People use HTML in the wrong way, so no one must use HTML"...

> >> Since the two groups are huge proportions of the Web authoring
> >> community, as a quick perusal of XHTML sites will show, most XHTML
> >> documents on the web now are invalid.
> >>
> >> Where is the error?
> >
> > But that is not a reason not to use XHTML
>
> I'm not saying "don't use XHTML". I'm saying "don't send XHTML as
> text/html", for the good of the Web and for your future sanity.
Okay.

But I will keep sending XHTML as application/xhtml+xml or text/html, 
depending on the browser, for a while.

> >> This document isn't some sort of theoretical exercise. It is listing
> >> practical reasons why using text/html for XHTML is bad.
> >
> > It's bad in the cases described.
>
> Thank you.
You're welcome ;-)
I think a little flame war is quite good for strengthening each other's 
arguments. It's not personal.
And yes, that's probably just a poor excuse from an arrogant guy who wrote 
first and thought afterwards about his use of the word "bad". I'm _sorry_ for 
being so rude.

> > But what about a valid XHTML 1.1 document that displays fine even in
> > Netscape 4?
> Since no valid XHTML 1.1 document could ever be sent as text/html, that
> will never happen.
Oh, why can't a valid XHTML 1.1 document be sent as text/html?
It's still a valid XHTML 1.1 document then; it's just not real text/html ;-)

> > Why shouldn't such a document be served as .xhtml with
> > application/xhtml+xml to Mozilla, Opera and all other browsers that
> > send an Accept header which contains application/xhtml+xml and .html
> > with text/html to the user agents that don't say they know
> > application/xhtml+xml like Internet Explorer? Those are tag soup
> > anyway, they "don't know the difference".
>
> That would be great, if people did it.
>
> They don't. (With the exception of maybe 3 or 4 sites.)
Well, I do.

And all the sites I'm responsible for will migrate to that behaviour before 
Christmas. That will already be more than 3 or 4 (but still less than 
0.0000001% of the Web, probably).
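
To make the Accept-based switching concrete, here is a minimal sketch in 
Python. It is only an illustration, not my actual server setup: the function 
name choose_content_type is made up, and the rule it applies (serve 
application/xhtml+xml only when the Accept header lists it with a non-zero q 
value, otherwise fall back to text/html) is an assumption about what a 
reasonable policy looks like.

```python
XHTML_TYPE = "application/xhtml+xml"
FALLBACK_TYPE = "text/html"


def choose_content_type(accept_header):
    """Pick the MIME type to serve for a given HTTP Accept header value.

    Hypothetical helper: returns application/xhtml+xml only if the user
    agent lists that type explicitly and does not give it q=0.
    """
    for item in (accept_header or "").split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if quality > 0:
            return XHTML_TYPE
    # User agents that never mention the XHTML type get the tag soup variant.
    return FALLBACK_TYPE
```

A server-side script would call this with the request's Accept header and 
use the result both for the Content-Type header and for picking the .xhtml 
or .html variant. Wildcards such as */* are deliberately not treated as 
support for application/xhtml+xml.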

I even use real link elements, like <link rel="Alternate" rev="Alternate" 
href="index.de" hreflang="de" type="application/xhtml+xml" /> and <link 
title="Startseite" href="/de/haupt_home" hreflang="de" 
type="application/xhtml+xml" rel="Start" />

> >> One of those reasons is that if you ever switch your XHTML documents
> >> from text/html to text/xml, then you will in all likelihood end up with
> >> a considerable number of XML errors, meaning your content won't be
> >> readable by users. (Most XHTML documents do not validate.)
> >
> > See above. When I speak of XHTML I speak of validating XHTML.
>
> When I speak of XHTML I speak of anything with an XHTML DOCTYPE, valid or
> not. Shutting your eyes to the reality of the mess being created here does
> no-one any good.
Invalidity is sometimes not really a big problem. Add 
xmlns="http://www.w3.org/1999/xhtml" as an attribute to an element whose name 
is not html, and the document is invalid but causes no problems (XML 
Schema will, I think, avoid that kind of problem). Add topmargin="2" to the 
body element and the document is **** (insert your favorite rude four-letter 
word ;-). I'm much more concerned about well-formedness and logical 
correctness in that case. And a document that is not well-formed can never be 
an XHTML document, because it must be an XML document in the first place.
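
The difference between "not well-formed" and "merely invalid" can be shown 
with a small sketch in Python. It is not the Ant / Xerces setup mentioned 
earlier, and the function name is_well_formed is made up for this example. 
It only checks well-formedness with the standard library XML parser and does 
no DTD validation at all, so an invalid but well-formed document passes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document_text):
    """Return True if the text parses as XML, False otherwise.

    Hypothetical helper: this checks well-formedness only. It does not
    validate against the XHTML DTD, so invalid attributes go unnoticed.
    """
    try:
        ET.fromstring(document_text)
    except ET.ParseError:
        return False
    return True


# Invalid XHTML (topmargin is not an XHTML attribute), but well-formed XML.
invalid_but_well_formed = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body topmargin="2"><p>text</p></body></html>'
)

# Tag soup: the br element is never closed, so this is not XML at all.
not_well_formed = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>text<br></p></body></html>'
)
```

Here is_well_formed(invalid_but_well_formed) is True and 
is_well_formed(not_well_formed) is False. Catching the topmargin case needs 
a validating parser.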

> > Of course, many problems will arise if they have not been considered
> > in the first place. HTML DOM seems not to apply to current
> > implementations of application/xhtml+xml user agents, so XML DOM must
> > be used. Some differences are between HTML CSS and XHTML CSS in
> > current User Agents.
> >
> > What if all this has been considered?
>
> Then you are one in about 5,000,000 people. (I think that number is
> actually accurate. By it I mean that you are one of about a hundred people
> world-wide who actually understand the problem and know how to avoid its
> pitfalls.)
Oh, thanks.

> If you are one of those few, very few, people, then sure, go ahead, use
> XHTML and send it as XML to some UAs and XHTML to others. But otherwise,
> you _will_ run into those many problems I listed, and you shouldn't be
> using XHTML as text/html.
>
> My document is aimed at the general authoring public, not the extremely
> rare uber-geeks of the Web world.
Okay. I should have considered that before; I was too narrow-minded. But on 
the other hand, I didn't want to shut my eyes to your document and say "that 
doesn't concern *me*, yeah, because I'm cool". I wanted to discuss it 
instead.

> >> I don't know of _anyone_ who has switched or could switch from text/html
> >> to an XML MIME type without a single problem.
> >
> > I also had my problems, but I detected them *before* I published the
> > documents on the web.
>
> But that's not important. The point is you will hit problems, wherever you
> hit them, if you try to change to a correct MIME type. If you have
> 1,000,000 documents, as some big sites do, then that will cost real cash.
No.
That's just a slight change in .htaccess, build.xml and transform.xslt :-)

> You are not a typical Web author.
Now that is a real compliment :-)
Well, okay, and I admit I use Vim (Vi IMproved) as my text editor on a Linux 
system configured to use UTF-8.

[HTTP, WWW, HTML, RFC vs. TR]
I'm convinced now.

> > I still can't see why I should not serve them as application/xhtml+xml to
> > application/xhtml+xml accepting ua's and text/html to the tag soup
> > browsers.
>
> _You_, specifically, are such an extreme exception that the document
> doesn't even attempt to apply to you. For the few people who understand
> enough of the issues to know to serve XHTML to only some UAs, I wish all
> the best of luck.
>
> But for the real world, the issues I raised are very real issues that
> should be valid reasons for not using XHTML with text/html.
>
> I'll update my document to make this clearer.
Okay.
And you really got me thinking about using two transformations, one which 
creates XHTML 1.1 / application/xhtml+xml for Mozilla and Opera, and one 
which creates HTML 4.01 Strict / text/html for the tag soup chaos.

I thought it would take about 7 minutes per site to change this.

But then I just found a bug in Xalan. It won't transform to HTML (<xsl:output 
method="html"/>) when there's a namespace declaration in the source document 
(<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">). But in XHTML 
Basic, which I use as input, there is automatically a namespace declaration 
on the html element.
And switching between Xalan, Saxon, etc. all the time isn't fun, since they 
handle namespace declarations very differently.


Bye
-- 
ITCQIS GmbH
Christian Wolfgang Hujer
Geschäftsführender Gesellschafter
Telefon: +49  (0)89  27 37 04 37
Telefax: +49  (0)89  27 37 04 39
E-Mail: Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/
Received on Saturday, 14 December 2002 13:01:08 UTC