Re: Serving XHTML as XML

On 23 Feb, Jesper Tverskov wrote:

> Like so many others I'm now serving my XHTML webpages as XML using
> mime-type application/xhtml+xml and content negotiation based on the
> http accept header.

  I wonder whether more than a handful do this. Most, it seems, send
  their XHTML as text/html and don't really care beyond that.



> "XHTML, http accept-header and mime-type application/xhtml+xml"
> www.smackthemouse.com/xhtmlxml

> Nice if someone could report some problems caused by "XHTML as XML" on
> or off list, so I can start experimenting with ways to solve them.
> Comments and suggestions are also welcome.

  There are a number of issues with your article. It might be
  instructive to raise them here, so that we can lay a few myths to
  rest.

  First, with very few exceptions, XHTML does not provide any
  advantages over HTML. You comment, based on WCAG 1.0 checkpoint 11.1:


> Since it has been possible for several years to serve XHTML as XML to
> browsers understanding it, I would say that one can't claim
> Conformance Level "Double-A" if one is just using HTML.

  Unless the author has specific XML needs - such as mixing namespaces,
  typically with MathML and the like - HTML is appropriate for the task
  of publishing information on the WWW. It is available today (it is
  supported by the vast majority of UAs), and it does the job.

  XHTML, on the other hand, is supported by only a very few browsers;
  among those in the wild I can think only of Opera and those which are
  Gecko- and KHTML-based.


  Now for the article. The first thing that I notice is your
  statement that XHTML has replaced HTML. To my knowledge, this is not
  correct. The word "replace" is not used at all in the XHTML 1.0
  specification; and it is clear that HTML and XHTML will exist side by
  side for different purposes.

  The second detail is the algorithm for determining which type of
  content to send. As far as I can see, it ignores the q parameter
  entirely. What was your rationale for designing the algorithm this
  way?
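
  To illustrate why this matters, consider a hypothetical UA sending,
  for example:

    Accept: text/html, application/xhtml+xml;q=0.2

  Per RFC 2616 this UA does accept application/xhtml+xml, but it states
  that it prefers text/html (which carries the default q of 1), and a q
  of 0 would mean the type is not acceptable at all. A plain substring
  test treats all of these cases alike.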

  You then go on to state that all browsers "worth mentioning" support
  XHTML today. Frankly, I find that an exaggeration, if not outright
  offensive. You are judging the worth of a UA on whether or not it
  supports one specific markup language - there is nothing at all here
  about backwards compatibility.

  Following that is the comment: 

   "Just one  violation  of  the  markup  rules  of well-formedness and
    the browsers will only show an error message.  That  is the recipe of
    quality web pages based on modules of xml applications"

  To begin with, this is slightly incorrect - the XML specification
  does allow a processor to make the unprocessed data available
  alongside the error report, though that data must not be processed
  further. This, in my view, is a grave mistake in the design of XML:
  unprocessed data (the Opera approach) or a bare error message (the
  Gecko approach) is inherently inaccessible.

  It is the job of the browser to *try*. Keep in mind what the word
  "agent" in "user agent" signifies, and the old Internet maxim that
  one should be liberal in what one accepts.

  Error-correction-by-stopping might be a good idea in a spacecraft,
  but not on the web. Imagine, for a moment, a DVD player which - when
  encountering a scratch on a disc - simply stops instead of trying to
  correct the error to the best of its ability. While a slightly flawed
  analogy, it is food for thought.
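
  To make the point concrete, here is a minimal sketch - assuming a
  browser environment with DOMParser; the markup and the check are mine
  and purely illustrative - of what a single well-formedness error
  leaves the reader with:

    // Minimal sketch, assuming a browser environment with DOMParser.
    // The one unclosed <p> below is a fatal error: the XML parser must
    // not hand the content to the application in the normal way.
    const markup: string = `<html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Example</title></head>
      <body><p>One unclosed paragraph</body>
    </html>`;

    const doc = new DOMParser().parseFromString(markup, "application/xhtml+xml");

    // Gecko-based browsers report the failure as a parsererror document
    // rather than throwing; this check is the usual idiom for spotting it.
    if (doc.getElementsByTagName("parsererror").length > 0) {
      console.log("The reader gets an error message, not the content.");
    }

  Whether the UA then shows a bare error, or the error plus the
  unprocessed data, is about all the leeway the specification gives it.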


  Returning for a moment to your algorithm. You say:

    'All we need to do is to test if the accept-header sent by the
     browser, etc., contains the string "application/xhtml+xml". If it
     does, we send it file A, XHTML 1.1 and mime-type
     application/xhtml+xml, if it doesn't we send it file B, XHTML 1.0
     Strict and mime-type text/html.'

  Leaving the q parameter aside for a moment, this sounds wrong to me.
  Are you saying that:

   IF the accept header indicates that application/xhtml+xml can be parsed THEN
    send XHTML 1.1
    send application/xhtml+xml

   IF the accept header indicates that application/xhtml+xml cannot be parsed THEN
    send XHTML 1.0 Strict
    tell the UA it's *actually* HTML - even if it isn't.


  If so, may I ask what advantage you believe is gained by sending XHTML under the
  guise of HTML? If no advantage is gained, why do you not transform the XHTML to
  properly structured HTML?
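
  For contrast, a minimal sketch of what q-aware negotiation might look
  like - the function, its name, and the handling of wildcards (they
  are simply ignored here) are mine, not taken from your article:

    // Minimal sketch of q-aware negotiation; wildcard media ranges such
    // as */* are deliberately ignored to keep the example short.
    function prefersXhtml(acceptHeader: string): boolean {
      // Parse "type;q=0.5" entries into { type, q } pairs; q defaults to 1.
      const entries = acceptHeader.split(",").map((part) => {
        const [type, ...params] = part.trim().split(";");
        const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
        return { type: type.trim(), q: qParam ? parseFloat(qParam.slice(2)) : 1.0 };
      });

      const qFor = (wanted: string): number =>
        entries.filter((e) => e.type === wanted)
               .reduce((best, e) => Math.max(best, e.q), 0);

      const xhtmlQ = qFor("application/xhtml+xml");
      // Send application/xhtml+xml only when the UA lists it explicitly,
      // with a non-zero q, and does not rate text/html above it.
      return xhtmlQ > 0 && xhtmlQ >= qFor("text/html");
    }

  With something along those lines, a UA sending "application/xhtml+xml;q=0",
  or one preferring text/html, would at least receive the variant it
  asked for.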

  Next we find

    "Styling  the html element is so contrary to common sense that browsers like
     Opera, Safari and Amaya don't  support  it  yet even though they support
     XHTML as XML."

  This is confusing. As far as I know Opera has styled the html element
  in HTML for a long time, and that is easily proven by experimentation.
  The same goes for Konqueror. I have not tested Amaya. There is no
  inherent limitation on styling the html element - the CSS
  specification even does so in its sample style sheet for HTML 4. This
  is nothing new with XHTML.
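
  (If anyone wants to repeat the experiment quickly, a throwaway sketch
  along these lines - assuming a browser with a script console; the
  rule itself is ordinary CSS - makes the effect visible at once:)

    // Inject a stylesheet that styles the root html element; the red
    // border and tinted canvas should appear in any UA that supports
    // styling html - which, as noted above, is nothing exotic.
    const styleEl = document.createElement("style");
    styleEl.textContent = "html { background: #ffe; border: 1em solid red; }";
    document.getElementsByTagName("head")[0].appendChild(styleEl);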

  Could you expand a little on why document.write() - which is an awfully bad
  habit by the way - "can not be used" with XHTML?





> Using well-formed markup, it is much easier for browsers and assistive
> technology to reuse content and to provide better features, including
> accessibility features, for end users.

  A well-formed document is easier to parse - yes. Whether that will
  help UAs is another matter, since they are already required to handle
  badly formed markup.

  What will help a UA, and assistive technologies in particular, is
  markup which structures content using elements with predefined and
  well-known semantics. That, of course, works nicely all the way up to
  XHTML 1.1, if not beyond.




> If markup is not well-formed you need a lot of testing, a lot of
> string manipulation and unreliable Regular Expressions that easily
> breaks.

  Unlikely, but even so: unless a browser manufacturer wants to take
  the suicidal way out and create a UA which handles nothing but
  well-formed code ... well.

  Besides, not handling badly formed code is a usability and
  accessibility nightmare. Whatever is thrown at a browser, it must do
  something to present content. The method used by, for instance,
  Firefox when encountering badly formed code is highly unsatisfactory.
  Opera does better, but I find the principle poorly thought out.

  Remember the words of Jon Postel in RFC 791:

   "In general, an implementation must be conservative in its sending
    behavior, and liberal in its receiving behavior."

  I suggest a rewrite of your article.

-- 
 -    Tina Holmboe                    Greytower Technologies
   tina@greytower.net                http://www.greytower.net/
   [+46] 0708 557 905

Received on Wednesday, 23 February 2005 12:32:56 UTC