Re: XHTML Family Documents and Media Types from Mark Birbeck on 2008-03-17 (www-validator@w3.org from March 2008)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 17 Mar 2008 21:49:36 +0000
To: "David Dorward" <david@dorward.me.uk>
Cc: "XHTML WG" <public-xhtml2@w3.org>, www-validator@w3.org
Message-ID: <a707f8300803171449u677ade8ao3bb6857f8f1e9b3@mail.gmail.com>
Hi David,

>  Sorry, I should have CCed the XHTML WG list when I sent this
>  originally. Please CC www-validator on responses, thanks.

Thanks.

>  >> XHTML 1.0 was incredibly important for HTML because it told people
>  >> how
>  >> to generate XML documents that could be rendered in web browsers. In
>  >> other words, the world of XML tools could be used to generate
>  >> documents, and yet those documents could still be read in standard
>  >> browsers.
>  >
>  > Transforming XHTML to HTML with XSLT is trivial.

If you have an XSLT processor. :)


>  > If the document
>  > isn't being processed as XML, then what's the benefit of using
>  > XHTML over HTML?

With respect, you've just repeated the usual confusion. In this
scenario, the document _was_ processed as XML, in order to create it.
Contrary to your assertion at the end of your post, an enormous number
of documents are created as XML, using server-side frameworks.

The final step in the whole process is the rendering, and that does
not need to be restrictive. If I get a crackly line on the phone,
should the entire message be blocked out by the phone company? An HTML
renderer could make a pretty good stab at rendering a badly formed
XHTML document, such that it would be useful to people.

In short, the document _is_ processed as XML, and that is why XHTML is
useful, but it could be *rendered* based on non-XML rules.

I'm not encouraging tag-soup, by the way. I'm merely saying that it is
not the job of the browser to enforce validity--we have other tools
for that (see further discussion on this, below).


>  > It just means that HTML parsers are effectively
>  > forbidden from implementing <foo /> according to HTML 4.01.

I don't follow that.


>  >> But why insist that browsers must interpret those documents as XML?
>  >> For example, sending a non-well formed XHTML document to Firefox
>  >> means
>  >> you get a blank page with lots of hyphens and a caret...what use is
>  >> that to anyone? And worse, it goes against the whole history of HTML,
>  >> where attempts are made to render documents that have errors in.
>  >
>  > Good reasons to either
>  >
>  > (a) Have good tools that won't let you produce pages with such
>  > errors in them

That's right, we certainly can use more of them.

But you have only shifted the question by one; now that I am using
lots more tools to validate my XHTML documents, what benefit is there
to showing a blank page with lots of hyphens in, if the browser
receives a faulty page?

In Sidewinder, an open source browser that is being worked on by
myself and a couple of colleagues, we run the validation step in
parallel with the rendering step, and show the results in a separate
window. It means that you can still see and use the web page (in so
far as the rendering engine can), but it also means that you can see
any errors and fix them.

That seems to me an eminently more useful and practical approach.
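
As a rough illustration of the idea (this is not Sidewinder's actual code, and the names TextCollector, check_well_formed and render_and_report are made up for the example), the sketch below feeds the same markup to a lenient HTML parser for "rendering" and to a strict XML parser for error reporting. It assumes Python's standard library, runs the two steps one after the other rather than in parallel, and only checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Lenient pass: gather visible text and tolerate markup errors."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-blank text runs.
        if data.strip():
            self.chunks.append(data.strip())


def check_well_formed(markup):
    """Strict pass: return a list of XML parse errors (empty if none)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return [str(err)]
    return []


def render_and_report(markup):
    """Return (best-effort text, list of well-formedness errors)."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(collector.chunks), check_well_formed(markup)


# Unquoted attribute value and an unclosed <b>: not well-formed XML.
broken = "<html><body><p class=intro>Hello <b>world</p></body></html>"
text, errors = render_and_report(broken)
print(text)    # the content is still usable
print(errors)  # and the problem is still reported
```

The point of keeping the two passes separate is that the error report never stops the reader from seeing the content.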


>  > or
>  >
>  > (b) Use HTML

See above--lots of people like using XML tools, since they are very
powerful and widespread. And as you say, it's trivial (if you have an
XSLT processor) to create HTML from XML.
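
For what it's worth, you don't even need XSLT for the simplest case. The sketch below is not an XSLT transform; it swaps in the HTML serializer from Python's standard xml.etree.ElementTree module, and the tiny document in it is made up for illustration. The same tree, built with an XML tool, is written out once under XML rules and once under HTML rules.

```python
import xml.etree.ElementTree as ET

# Build the document with an XML tool (content is illustrative only).
root = ET.Element("html")
body = ET.SubElement(root, "body")
para = ET.SubElement(body, "p")
para.text = "line one"
br = ET.SubElement(para, "br")
br.tail = "line two"

# Same tree, two serializations.
as_xml = ET.tostring(root, encoding="unicode", method="xml")
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)   # empty element written as <br />
print(as_html)  # empty element written as <br>
```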


>  >> Of course, the HTML 5 route, of trying to recover from every error is
>  >> in my view, just as bizarre,
>  >
>  > If throwing an error message when the document is not well formed
>  > is of no use to anyone, and recovering from every well formedness
>  > error is bizarre, what does that leave?

The bizarreness is in trying to define the recovery from every error.
The point is that all we need to say is that if the document is not
completely valid XHTML then it is in error, and show the errors in a
separate window. But the renderer can still give it its best shot at
rendering. This takes advantage of widely available validation
technologies (the canonical form of the document is the valid XML
version) but the renderer can still do some work if you miss the quotes
around an attribute value.


>  >> but that doesn't really matter. The key
>  >> point is that there should be nothing wrong with creating a document
>  >> using XML tools, and then delivering that document to an HTML
>  >> rendering engine and seeing something useful.
>  >
>  > Works for me. I process various bits of data, some in a database,
>  > some in static files, some from URIs on the web, process them with
>  > XML tools and output various things - mostly HTML 4.01 Strict and
>  > ATOM+XHTML.

:)


>  >> That's what most of the world is doing, after all.
>  >
>  > Most of the world is throwing tag soup about with graphical HTML
>  > generators and string substitution. XML tools aren't involved all
>  > that often.

I don't think so...not any more. Even blogging software is checking
your posts for well-formedness nowadays. Add to that the fact that most
pages are generated using server-side tools, and you'll find that most
web pages would be acceptable XHTML if it weren't for this MIME type
issue.

Regards,

Mark

-- 
  Mark Birbeck

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.x-port.net | http://internet-apps.blogspot.com

  x-port.net Ltd. is registered in England and Wales, number 03730711
  The registered office is at:

    2nd Floor
    Titchfield House
    69-85 Tabernacle Street
    London
    EC2A 4RR
Received on Monday, 17 March 2008 21:50:17 UTC