RE: HTML or XHTML - why do you use it? from Peter Foti (PeterF) on 2003-01-07 (www-html@w3.org from January 2003)

From: Peter Foti (PeterF) <PeterF@SystolicNetworks.com>
Date: Tue, 7 Jan 2003 14:14:05 -0500
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <A10A983C9DFBD4119F0300104B2EA6B725FF34@ZIPPY>
> On Mon, 6 Jan 2003, Peter Foti (PeterF) wrote:
> >
> > Your argument does not seem to take into consideration the case
> > where an XHTML document is meant to be treated as HTML.
> 
> Well, more specifically, my argument is that the XHTML specification
> was wrong to allow that.
> 
> 
> > <Ian>
> >  * Current UAs are HTML user agents (at best) and certainly not
> >    XHTML user agents (certainly not when sent as text/html), so if
> >    you send them XHTML you are sending them content in a language
> >    which is not native to them, and relying on their error handling.
> > </Ian>
> > 
> > As the XHTML recommendation stated, XHTML documents are intended to
> > operate in HTML 4 conforming agents.
> 
> This isn't quite accurate -- XHTML documents (or rather, Appendix C
> compliant XHTML 1.0 documents) are intended to operate in HTML Tag
> Soup parsers. Strictly speaking, a compliant implementation of HTML
> 4.01 would be well within its rights to totally reject an XHTML
> document, since XHTML documents are not valid HTML 4.01.


Perhaps.  But considering the leniency that all current agents seem to
offer, this is somewhat of a non-issue, is it not?  Are there any browsers
today that would reject non-valid HTML 4.01?  I don't believe there are.


> > <Ian>
> >  * <script> and <style> elements in XHTML may not have their
> >    contents commented out, a trick frequently used in HTML documents
> >    to hide the contents of such elements from legacy UAs. [1]
> >
> > [1] Because in XHTML, <script> and <style> elements are #PCDATA
> > blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
> > comments tags, and are not ignored by the HTML parser.
> > </Ian>
> > 
> > This is interesting, and it leads me to wonder if this is a typo in
> > the recommendation.
> 
> It's not -- XML doesn't have any content model which allows comment-
> like markup to be ignored. Don't forget in XML parsers should get the
> same result whether or not they parse the DTD (with a few exceptions
> related to attributes and entities).


I see.  However, since we are talking about parsing XHTML as HTML, I don't
think this matters because the agent will still treat it as an HTML comment.
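
A minimal sketch of the difference being discussed, assuming Python's
standard-library parsers as stand-ins for an HTML tag soup agent and an XML
agent (they are not browsers, and the helper names are made up for this
illustration).  The HTML-style parser hands the commented-out script body
back as text, while the XML parser sees a real comment and leaves the element
with no text at all:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: a script body "hidden" with the classic comment trick.
SNIPPET = "<script><!-- alert(1) --></script>"


class ScriptText(HTMLParser):
    """Collects the character data and comments an HTML-style parser reports."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.comments = []

    def handle_data(self, data):
        self.text.append(data)

    def handle_comment(self, data):
        self.comments.append(data)


def html_view(markup):
    """Return (text, comments) as seen by the HTML-style parser."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.comments


def xml_view(markup):
    """Return the root element's text as seen by an XML parser.

    ElementTree's default parser discards comments, so a script whose whole
    body is commented out ends up with no text (None).
    """
    return ET.fromstring(markup).text


if __name__ == "__main__":
    print(html_view(SNIPPET))
    print(xml_view(SNIPPET))
```

This only matters once the same bytes are handed to an XML parser; as long
as the document is served and parsed as HTML, the first behaviour applies.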


> > As stated in the HTML 4 documentation at:
> >
> >    http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.1
> > 
> > If a user agent encounters an attribute it does not recognize, it
> > should ignore the entire attribute specification (i.e., the
> > attribute and its value).
> 
> The slash in the form
> 
>    <foo/>
> 
> ...is not an unrecognised attribute, it is the end of the start tag,
> and the ">" is character data. This is known as the Null End Tag (NET)
> SHORTTAG feature. See, e.g.:
> 
>    http://www.nyct.net/~aray/sgml/short/shorttag.html#NET
> 


Oh, I see.  But do any agents support the SHORTTAG feature?  The spec states
that documents using SHORTTAGs are unlikely to work with many existing HTML
tools:
http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.7

I grant you, I don't have a strong argument here.  I don't enjoy fighting
for something that is *technically* wrong (even if it is widely accepted).
:)  But then again, I've never seen anyone use the SHORTTAG feature, so I'd
rather see it removed from the HTML recommendation than to go along with it.
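
For reference, a toy sketch of the NET reading Ian describes, assuming we
only care about the elements the HTML 4.01 DTD declares EMPTY.  The function
name is made up, and this is a regex illustration rather than an SGML parser:
it rewrites "<br/>" into what a strict SHORTTAG-aware parser would see, namely
the start tag followed by a literal ">" as character data.

```python
import re

# Elements declared EMPTY in the HTML 4.01 DTDs.
EMPTY_ELEMENTS = (
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
)

# Matches an XHTML-style self-closing tag for one of the EMPTY elements.
_SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(EMPTY_ELEMENTS),
    re.IGNORECASE,
)


def strict_sgml_reading(markup):
    """Show how "<foo/>" reads under NET: the "/" ends the start tag and
    the ">" that follows is plain character data (written here as &gt;)."""
    return _SELF_CLOSING.sub(
        lambda m: "<%s%s>&gt;" % (m.group(1), m.group(2)), markup
    )


if __name__ == "__main__":
    print(strict_sgml_reading("<p>line one<br/>line two</p>"))
```

Tag soup parsers do not behave this way, which is why the stray ">" never
shows up in practice.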
  

> > <Ian>
> >  * If you ever switch your XHTML documents from text/html to
> >    text/xml, then you will in all likelihood end up with a
> >    considerable number of XML errors, meaning your content won't be
> >    readable by users. (Most XHTML documents do not validate.)
> > </Ian>
> > 
> > This is the same argument as the previous, just in different
> > clothing. I *do* write valid XHTML documents, and since I am writing
> > them to act as HTML, I *don't* want to switch them from text/html to
> > text/xml.
> 
> Then this document is not for you.


I'll take that as a compliment then. :)  But don't you think the focus
should be on improving the quality of the existing developers rather than
saying "Existing developers are too stupid to use XHTML, so they shouldn't"?


> > <Ian>
> >  * A CSS stylesheet written for an HTML document has subtly
> >    different semantics in an XHTML context (e.g. the <body> element
> >    is not magical in XHTML).
> > </Ian> 
> > 
> > I agree... and that's why I want to serve those documents as
> > text/html instead of text/xml. As I just wrote, I don't want to
> > switch those documents from text/html to text/xml.
> 
> So you want HTML syntax and processing rules, and you want UAs to
> treat the markup as HTML.
> 
> Why not just use HTML?


Because I want the benefits of using XML tools and validators.  Not to
mention the experience of writing valid XML.
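
As a rough illustration of what "XML tools" buys you, here is a sketch that
pulls every link out of an XHTML page with a stock XML parser.  The sample
page and the function name are invented for the example, and it assumes the
page is well-formed and uses no named entities such as &nbsp; (which would
require the DTD to be available):

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical well-formed XHTML page used only for this example.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p><a href="/a.html">A</a> <a href="/b.html">B</a></p>
    <p><img src="x.png" alt="x" /></p>
  </body>
</html>"""


def list_links(xhtml_source):
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xhtml_source)
    # Elements live in the XHTML namespace, so the tag is namespace-qualified.
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]


if __name__ == "__main__":
    print(list_links(PAGE))
```

No HTML-specific parser is involved; any generic XML toolchain can do the
same, which is the benefit I have in mind.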


> > <Ian>
> >  * A script written for an HTML document has subtly different
> >    semantics in an XHTML context (e.g. element names are uppercase in
> >    HTML, lowercase in XHTML).
> > </Ian>
> > 
> > I assume you are referring to the DOM for each of these? Again, this
> > is not that big of an issue, especially since I have no intention of
> > an HTML to XML conversion anytime soon.
> 
> Yes, I was referring to the DOM.
> 
> Note that it doesn't matter how soon you intend to move to an XML MIME
> type; if you ever intend to, you'll hit the problems.


Ok, I'll admit you are right here.  Eventually, if one intends to move from
serving HTML documents to XML documents, this problem will arise.
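
A small sketch of the XML half of that problem, assuming Python's
xml.dom.minidom as a stand-in for a browser's XML DOM (the sample document
and helper name are made up).  Element names are case-sensitive here, so a
script that looks elements up by the uppercase names an HTML DOM reports
finds nothing:

```python
from xml.dom import minidom

# Hypothetical minimal XHTML document.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hi</p></body></html>"
)


def count_by_name(source, name):
    """Count elements with exactly this tag name in an XML DOM."""
    doc = minidom.parseString(source)
    return len(doc.getElementsByTagName(name))


if __name__ == "__main__":
    print(count_by_name(DOC, "body"))  # lowercase, as written in the source
    print(count_by_name(DOC, "BODY"))  # uppercase, as an HTML DOM would report it
```

Comparing names case-insensitively is one way to write a script that
survives both, but existing scripts rarely do that.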


> > <Ian>
> >  * If a user saves an XHTML-as-text/html document to disk and later
> >    reopens it locally, triggering the content type sniffing code
> >    since filesystems typically do not include file type information,
> >    the document could be reopened as XML, potentially resulting in
> >    validation errors, parsing differences, or styling differences.
> > </Ian>
> > 
> > It depends on what application the user has associated with the file
> > extension, does it not? If the user saves the file with a .htm
> > extension, then his/her HTML User Agent will most likely be the one
> > to open the file.
> 
> Yes, it depends on many things; on some platforms, it depends on the
> extension. That's why I said "could".
> 
> It has happened to me several times, on both Windows and Unix.
> 
> 
> > <Ian>
> >  * The only real advantage to using XHTML rather than HTML is that
> >    it is then possible to use XML tools with it. However, if tools
> >    are being used, then the same tools might as well produce HTML
> >    for you. Alternatively, the tools could take SGML as input
> >    instead of XML.
> > </Ian>
> > 
> > No, they should not produce HTML (I presume you mean HTML 4 with
> > missing end tags, etc.).
> 
> Yes, I mean HTML 4.01.
> 
> 
> > If they did, then the XML tool would have to guess where elements
> > ended if they re-opened the generated HTML file.
> 
> So why not use the SGML tools that have existed since before XML was
> even an inkling in anyone's eye?


Because they are not as strict as XML tools and can produce sloppy code?
Also, the usability of newer XML tools is probably better than older SGML
tools (just an opinion).


> > SGML is too loose...
> 
> Note that XML is a simplified version of SGML. What is too loose about
> it? I grant you it is more complicated than XML, but you need only use
> an already existing SGML tool.


XML is a more *strict* version of SGML.  SGML is loose in that it allows
things like missing end tags and such.  Blech!  :)
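
To make the strictness point concrete, a minimal sketch assuming Python's
ElementTree as the XML tool (the helper name is made up).  The list with
omitted end tags is perfectly valid HTML 4.01, but an XML parser refuses it
outright instead of inferring where the elements end:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Every end tag present: accepted.
    print(is_well_formed("<ul><li>one</li><li>two</li></ul>"))
    # </li> omitted, as HTML 4.01 allows: rejected by the XML parser.
    print(is_well_formed("<ul><li>one<li>two</ul>"))
```

This checks well-formedness only, not validity against the XHTML DTD, but it
is exactly the kind of mistake I want the tools to catch for me.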


> > Also, this is not the only real advantage.
> 
> What other advantages are there?


Besides being able to use XML tools, it also gives authors experience
writing *better* documents that are more structured.  Debugging time is
reduced because you become accustomed to closing all tags.  Combine this
with the validator, and you will soon be creating documents that are
cross-browser compatible...  I guess my argument is that developers should
be trained to use XHTML *correctly*, and your argument seems to be that not
enough people use XHTML correctly so therefore those people should not use
it at all.

 
> > <Ian>
> >  * HTML 4.01 contains everything that XHTML contains, so there is
> >    little reason to use XHTML in the real world. It appears the main
> >    reason is simply "jumping on the bandwagon" of using the latest
> >    and (perceived) greatest thing.
> > </Ian>
> > 
> > True. However, documents that conform to XHTML may perform better
> > than a document that conforms only to HTML 4 because all of the
> > closing tags are defined.
> 
> This isn't strictly true. HTML is fully defined and not ambiguous.
> Even with omitted end tags, there is no ambiguity about where tags
> end and tags start, because of the strict parsing rules. In any case,
> you are using the same parser, and, as you point out below, you don't
> have to omit the tags in HTML anyway.
> 
> 
> > The browser doesn't have to do any guess work to try to figure out
> > where they go.
> 
> Right... instead it has to guess what to do with these unexpected "/"
> characters, these "xmlns" and xml:lang attributes, etc.


Actually, it doesn't have to guess what to do with the attributes...  it
just ignores the ones it doesn't know.  And as for "/", this seems to just
be ignored as well.  I'd much prefer it *guessing* to ignore the unknown
attributes than to have it guess wrong as to the intended location of a
missing closing tag.  :)


> > And you'll probably say that HTML documents can be written with all
> > of their closing tags as well, but the documents will validate
> > without them, making it more likely that the developer could miss
> > some and not realize it.
> 
> If the document validates, there is no ambiguity about where the
> elements end. It is fully defined.
> 
> For example:
> 
>    <p>Test<ol><li></ol>
> 
> ...is _exactly_ equivalent to:
> 
>    <p>Test</p><ol><li></li></ol>
> 
> ...and all UAs support this correctly as far as my testing has shown.


That would be nice... but Netscape 4 has proven you wrong. :)  Of course,
the example you list here is very simple, and probably will display
correctly in Netscape 4.  However, once you get into nested tables, it's a
whole different story.  Documents that are valid will NOT always display
correctly in Netscape 4 when they are missing closing tags.


> Basically, my argument is that if you know what you're doing, then
> sure, go ahead, but that most people don't, and that for them it would
> be a lot easier if they used HTML 4.01 now and thus were never tempted
> to convert these documents to an XML MIME type.


You don't think it would be better for those people to simply learn XHTML?
Do you really think it is "a lot" easier to use HTML than XHTML?


> Incidentally, do you have a URI to your XHTML pages? I would be
> interested in seeing whether there were any obvious mistakes I could
> point out to demonstrate my point.


Most of the stuff I work on is internal to my company.  I did do a lot of
the work on the following site:
http://www.missnh.com

which is XHTML 1.0 Transitional.  However, I am no longer the developer for
that site, so I can't say that all of the code there came from me (though I
suspect most of it probably did).  The person who took over for me is *not*
into validation, as I can see at least 1 page that he updated and made
invalid (the local reps page).  I'm sure you will find something that would
not work correctly when converted to pure XML.  :)

Regards,
Peter
Received on Tuesday, 7 January 2003 14:04:07 UTC