RE: HTML or XHTML - why do you use it? from Peter Foti (PeterF) on 2003-01-06 (www-html@w3.org from January 2003)

From: Peter Foti (PeterF) <PeterF@SystolicNetworks.com>
Date: Mon, 6 Jan 2003 16:57:21 -0500
To: "'Ian Hickson'" <ian@hixie.ch>, "'Nick Boalch'" <nick@fof.durge.org>
Cc: "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <A10A983C9DFBD4119F0300104B2EA6B725FF30@ZIPPY>
Ian,

I followed that link and read your document.  Don't take this as a personal
attack, but some of the points that you make are not quite accurate in my
view.  Your argument does not seem to take into consideration the case where
an XHTML document is meant to be treated as HTML.  From the XHTML 1.0
recommendation:

<snip>

It is intended to be used as a language for content that is both
XML-conforming and, if some simple guidelines are followed, operates in HTML
4 conforming user agents. Developers who migrate their content to XHTML 1.0
will realize the following benefits:

XHTML documents are XML conforming. As such, they are readily viewed,
edited, and validated with standard XML tools. 
XHTML documents can be written to operate as well or better than they did
before in existing HTML 4-conforming user agents as well as in new, XHTML
1.0 conforming user agents. 
XHTML documents can utilize applications (e.g. scripts and applets) that
rely upon either the HTML Document Object Model or the XML Document Object
Model [DOM]. 
As the XHTML family evolves, documents conforming to XHTML 1.0 will be more
likely to interoperate within and among various XHTML environments. 

</snip>


Having said that, I will now dive into your arguments:


<Ian>
 * Current UAs are HTML user agents (at best) and certainly not XHTML
   user agents (certainly not when sent as text/html), so if you send
   them XHTML you are sending them content in a language which is not
   native to them, and relying on their error handling.
</Ian>


As the XHTML recommendation stated, XHTML documents are intended to operate
in HTML 4 conforming agents.  Sending an XHTML document as text/html seems
perfectly fine (when the document in question is meant to be viewed as HTML
by HTML 4 conforming agents).


<Ian>
 * <script> and <style> elements in XHTML may not have their contents
   commented out, a trick frequently used in HTML documents to hide
   the contents of such elements from legacy UAs. [1]
[1] Because in XHTML, <script> and <style> elements are #PCDATA
blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
comments tags, and are not ignored by the HTML parser.
</Ian>


This is interesting, and it leads me to wonder if this is a typo in the
recommendation.  The HTML recommendation states that a script element
contains %Script data, which is defined as CDATA.  The XHTML recommendation
also defines %Script as CDATA, but the script element contains (#PCDATA)
instead.  I don't know if this is a mistake in the recommendation or not.

However, PCDATA can contain CDATA sections.  And again, since the document is meant
to be viewed as HTML by HTML 4 conforming agents, comments will be treated
as such when the document is served as text/html.
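
To make the difference concrete, here is a minimal Python sketch.  It
assumes Python's html.parser is a fair stand-in for a forgiving HTML user
agent and xml.etree.ElementTree for an XML one; the sample markup and the
helper names are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample markup: a "hidden" script and a CDATA-wrapped script.
COMMENTED_SCRIPT = "<script><!-- alert(1) --></script>"
CDATA_SCRIPT = "<script><![CDATA[ if (a < b) alert(1); ]]></script>"


class DataCollector(HTMLParser):
    """Collects the character data reported by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def script_text_as_html(markup):
    # An HTML parser treats <script> content as raw text, so the
    # "<!--" and "-->" markers are delivered as part of the script.
    parser = DataCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_text_as_xml(markup):
    # An XML parser treats "<!-- ... -->" as a real comment.  ElementTree's
    # default parser discards comments, so no script text is left.
    return ET.fromstring(markup).text or ""


print(repr(script_text_as_html(COMMENTED_SCRIPT)))
print(repr(script_text_as_xml(COMMENTED_SCRIPT)))
print(repr(script_text_as_xml(CDATA_SCRIPT)))
```

Parsed as HTML, the commented-out script is still visible to the script
engine.  Parsed as XML, it disappears, while the CDATA section survives
with its "<" intact.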


<Ian>
 * XHTML documents that use the "/>" notation, as in "<link />", are
   not valid HTML documents. 
</Ian>


I don't really have a good argument for this case, other than that HTML
agents are generally very forgiving of invalid documents.  As stated in the
HTML 4 documentation at:
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.1

If a user agent encounters an attribute it does not recognize, it should
ignore the entire attribute specification (i.e., the attribute and its
value). 

I admit, I don't have a really strong argument on this one, but I do
recognize that most (all?) HTML 4 agents will not have any problems with
this notation.
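
As a rough illustration of that forgiveness, the sketch below feeds both
spellings of a link element to Python's html.parser.  That parser is only
an assumed stand-in for a lenient HTML agent, not any particular browser,
and the class name, helper name, and file name are hypothetical.

```python
from html.parser import HTMLParser


class TagLog(HTMLParser):
    """Records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<link ... />", because the default
        # handle_startendtag() forwards to handle_starttag().
        self.seen.append((tag, dict(attrs)))


def start_tags(markup):
    parser = TagLog()
    parser.feed(markup)
    parser.close()
    return parser.seen


html_form = '<link rel="stylesheet" href="a.css">'
xhtml_form = '<link rel="stylesheet" href="a.css" />'

print(start_tags(html_form))
print(start_tags(xhtml_form))
```

Both calls report the same element with the same two attributes, which is
the behaviour I would expect from a forgiving agent.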


<Ian>
 * Documents sent as text/html are handled as tag soup [2] by most UAs.
   Since most authors only check their documents using one or two UAs,
   rather than using a validator, this means that authors are not
   checking for validity, and thus most XHTML documents on the web now
   are invalid. Therefore the main advantage of using XHTML, that it
   has to be valid, is lost if the document is then sent as text/html.
</Ian>


You are presuming that all authors will fail to validate their XHTML
document.  This is an authoring issue and you can't use this as a reason why
using text/html for XHTML is bad.  Authors will have to catch up someday and
start writing valid documents (if they want to find work).  Better to get
them on track now than to keep waiting for some overnight miracle.  :)
Seriously, though, if *I* am willing to check the validity of my documents,
and I want to send them as text/html, then I will be able to take advantage
of using XHTML.
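
A first check along these lines can be sketched in a few lines of Python.
Note the assumption: this only tests XML well-formedness with
xml.etree.ElementTree, which is weaker than validating against the XHTML
DTD, so a real validator is still needed.  The function name and sample
markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Every element is closed, so this parses as XML.
print(is_well_formed("<p>one<br/>two</p>"))

# HTML 4 style: <br> is never closed, so an XML parser rejects it.
print(is_well_formed("<p>one<br>two</p>"))
```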


<Ian>
 * If you ever switch your XHTML documents from text/html to text/xml,
   then you will in all likelihood end up with a considerable number
   of XML errors, meaning your content won't be readable by users.
   (Most XHTML documents do not validate.)
</Ian>


This is the same argument as the previous, just in different clothing.  I
*do* write valid XHTML documents, and since I am writing them to act as
HTML, I *don't* want to switch them from text/html to text/xml.


<Ian>
 * A CSS stylesheet written for an HTML document has subtly different
   semantics in an XHTML context (e.g. the <body> element is not
   magical in XHTML).
</Ian> 


I agree... and that's why I want to serve those documents as text/html
instead of text/xml.  As I just wrote, I don't want to switch those
documents from text/html to text/xml.  


<Ian>
 * A script written for an HTML document has subtly different
   semantics in an XHTML context (e.g. element names are uppercase in
   HTML, lowercase in XHTML).
</Ian>


I assume you are referring to the DOM for each of these?  Again, this is not
that big of an issue, especially since I have no intention of an HTML to XML
conversion anytime soon.
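
For scripts that have to work in both contexts, comparing element names
without regard to case sidesteps the difference.  A minimal sketch,
assuming xml.dom.minidom as the XML-side DOM (Python has no HTML DOM to
show the uppercase side, so that half is simulated with a plain string)
and a hypothetical helper name:

```python
from xml.dom import minidom


def is_element(node, name):
    """Case-insensitive element name test, usable for 'BODY' and 'body'."""
    return (node.nodeType == node.ELEMENT_NODE
            and node.tagName.lower() == name.lower())


doc = minidom.parseString("<html><body><p>hi</p></body></html>")
body = doc.documentElement.firstChild

# The XML DOM keeps the case used in the source document.
print(body.tagName)

# The same test works whichever spelling the script was written with.
print(is_element(body, "BODY"), is_element(body, "body"))
```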


<Ian>
 * If a user saves an XHTML-as-text/html document to disk and later
   reopens it locally, triggering the content type sniffing code since
   filesystems typically do not include file type information, the
   document could be reopened as XML, potentially resulting in
   validation errors, parsing differences, or styling differences.
</Ian>


It depends on what application the user has associated with the file
extension, does it not?  If the user saves the file with a .htm extension,
then his/her HTML User Agent will most likely be the one to open the file.  


<Ian>
 * The only real advantage to using XHTML rather than HTML is that it
   is then possible to use XML tools with it. However, if tools are
   being used, then the same tools might as well produce HTML for you.
   Alternatively, the tools could take SGML as input instead of XML.
</Ian>


No, they should not produce HTML (I presume you mean HTML 4 with missing end
tags, etc.).  If they did, then the XML tool would have to guess where
elements ended if they re-opened the generated HTML file.  Much better to
produce XHTML documents that can be viewed as HTML 4.  Also, not sure what
tools you use, but the ones I work with don't take SGML.  SGML is too
loose... the point is that they can validate as XML.  Also, this is not the
only real advantage.  


<Ian>
 * HTML 4.01 contains everything that XHTML contains, so there is
   little reason to use XHTML in the real world. It appears the main
   reason is simply "jumping on the bandwagon" of using the latest and
   (perceived) greatest thing.
</Ian>


True.  However, documents that conform to XHTML may perform better than a
document that conforms only to HTML 4 because all of the closing tags are
defined.  The browser doesn't have to do any guess work to try to figure out
where they go.  And you'll probably say that HTML documents can be written
with all of their closing tags as well, but the documents will validate
without them, making it more likely that the developer could miss some and
not realize it.  And you'll probably say that validators can be configured
to require all closing tags, but why go through that trouble when you could
just write the document as XHTML?  You'll be more likely to write cleaner
code, you won't have to configure a validator to your own special needs, and
you will probably have a better understanding of both XML and HTML instead
of just HTML.

Much of your argument seems to revolve around converting HTML documents to
XML documents, which is a lot of work (mostly presentational).  But my
arguement is that I want to display XML compatible documents as HTML
*BECAUSE* it is so much less work (and because there are so many HTML agents
vs. XML agents).  At the same time, I get the benefits of using XML tools if
I want... I could convert my XHTML document to some other document using
XSLT if I wanted... can't do that with HTML.  I guess I just don't see why
anyone would NOT want to write their documents as XHTML.
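
As a small example of what I mean by using XML tools, here is a sketch
that pulls the links out of an XHTML page with a stock XML parser.  It is
not XSLT (Python's standard library does not include an XSLT processor);
it uses xml.etree.ElementTree instead.  The page, the URLs, and the
function name are made up, and the page is assumed to use no named
entities such as &nbsp;, which this parser would reject without the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML page, well-formed and in the XHTML namespace.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Links</title></head>
  <body>
    <p><a href="http://example.org/one">One</a> and
       <a href="http://example.org/two">Two</a></p>
  </body>
</html>"""


def extract_links(markup):
    """Return (href, link text) pairs for every XHTML <a> element."""
    root = ET.fromstring(markup)
    return [(a.get("href"), "".join(a.itertext()))
            for a in root.iter("{%s}a" % XHTML_NS)]


for href, text in extract_links(PAGE):
    print(href, text)
```

The same page can still be served as text/html to HTML agents; the XML
tooling is simply available on the authoring side.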


Phew... I haven't even gotten to read the rest of that document ("Why UAs
can't handle XHTML sent as text/html as XML" and on...).  Though I will
stress that since I don't want the UA to handle XHTML sent as text/html
as XML, I probably don't need to read that section. :)

Anyway, I'm outta time for today.

Regards,
Peter Foti



> -----Original Message-----
> From: www-html-request@w3.org [mailto:www-html-request@w3.org] On Behalf Of Ian Hickson
> Sent: Monday, January 06, 2003 1:41 PM
> To: Nick Boalch
> Cc: www-html@w3.org
> Subject: Re: HTML or XHTML - why do you use it?
> 
> 
> 
> On Mon, 6 Jan 2003, Nick Boalch wrote:
> >>>
> >>> [1] <URL: http://www.hixie.ch/advocacy/xhtml>, for example.
> >> 
> >> Think I've read this before. It only talks about why one shouldn't send
> >> XHTML as text/html, right?
> >
> > More or less. Its conclusion is that XHTML delivered as text/html is
> > broken and XHTML delivered as text/xml is risky, so authors intending
> > their work for public consumption should stick to HTML 4.01.
>
> Wow, I've never seen someone summarise that document so succinctly.
> 
> Do you mind if I use that as the abstract?
> 
> -- 
> Ian Hickson                                      )\._.,--....,'``.    fL
> "meow"                                          /,   _.. \   _\  ;`._ ,.
> http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
>
Received on Monday, 6 January 2003 16:47:28 UTC