Re: Why HTML should be taught as HTML without pretending it is XML from Jon Barnett on 2007-07-23 (public-html@w3.org from July 2007)

From: Jon Barnett <jonbarnett@gmail.com>
Date: Mon, 23 Jul 2007 11:19:31 -0500
To: "Robert Burns" <rob@robburns.com>
Cc: public-html <public-html@w3.org>
Message-ID: <bde87dd20707230919x7f287793s3b51d92aaad34da7@mail.gmail.com>
On 7/23/07, Robert Burns <rob@robburns.com> wrote:
>
> Well, I'm not sure we can conclude from one anecdote what everyone
> else thinks about their HTML.

A measurable study would count the pages that have XHTML DOCTYPEs, are
served as text/html, and contain markup that would have unintended
consequences if served as XML.  I'm afraid I can't offer anything
other than anecdotes (experience on lots of forums, personal
conversations, and the fact that my college professor was teaching a
number of other students exactly what I learned), but the fact that
this page exists says something:

http://www.hixie.ch/advocacy/xhtml
>
> Telling authors they've somehow made a mistake because they're beating
> down a cowpath that, for some strange reason, you think is misguided,
> does not make it any less of a cowpath.

It's how you interpret the cowpath.  I interpret it to mean that
authors misunderstand how XHTML actually works.  I think that teaching
HTML as having XHTML-like syntax would lead to shock when the author
first tries to write <p><ol></ol></p>.

> No one has ever, as far as I
> am aware, explained in a logical way what could possibly be
> wrong with authoring content that adheres to XHTML Appendix C. It has
> simply become a mantra amidst a certain web development clique.
> ...
> Those are very minor differences that would only be gotchas for those
> ignoring Appendix C. Often authors are told to go with external
> stylesheets and external scripts (so that takes care of CDATA
> sections). Do that; don't count on implicit elements; use Unicode
> characters instead of named character entities and stick with DOM1
> through DOM3 and you'll be fine (oh and don't count on IE consuming
> your content). There's no need to raise the Homeland Security alert
> level over XHTML. It's just a few things to understand about it
> before vending as XML. However, all that has nothing to do with the
> other reason for following an appendix C syntax: for its consistency
> and readability.

All of those things you just mentioned are caveats when serving XHTML
as text/html, and none of them are mentioned in XHTML 1.0
Appendix C.
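One of those caveats, the inline-script case that external scripts avoid, can be shown with a minimal sketch. It assumes Python's standard library as stand-ins for the two parsers (neither is a browser): `html.parser` treats `<script>` content as raw text, while an XML parser (`xml.etree.ElementTree`) rejects the bare `<` and `&`. The script text and the commented-out CDATA wrapper are hypothetical samples of a common workaround, not something taken from Appendix C.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical inline script using characters that are special in XML.
SCRIPT = "if (a < b && c) { go(); }"
PLAIN = f"<div><script>{SCRIPT}</script></div>"
# Common workaround: a CDATA section hidden from HTML behind JS comments.
WRAPPED = f"<div><script>//<![CDATA[\n{SCRIPT}\n//]]></script></div>"


class ScriptText(HTMLParser):
    """Collects the text content of <script> elements as html.parser sees it."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.in_script = tag == "script"

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup: str) -> str:
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_well_formed(markup: str) -> bool:
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(repr(html_script_text(PLAIN)))  # the script text, untouched
print(xml_well_formed(PLAIN))         # False: bare < and & are XML errors
print(xml_well_formed(WRAPPED))       # True
```

The HTML parser hands back the `//<![CDATA[` markers as ordinary script text, which is why the workaround has to hide them behind JavaScript comments.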

To that, I'll add that document.createElement(), one of the most basic
DOM methods, creates an element without a namespace.  If this
quasi-XHTML is eventually served as XHTML, even
document.createElement would have unintended consequences.
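A minimal sketch of that namespace problem follows. It assumes Python's `xml.dom.minidom` as a stand-in for an XML DOM (it follows DOM Core rather than any particular browser, and the document is a hypothetical sample). An element created with `createElement` carries no namespace, so a namespace-aware lookup for XHTML elements misses it, whereas `createElementNS` places it in the XHTML namespace.

```python
from xml.dom.minidom import parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document, parsed as namespace-aware XML.
doc = parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>parsed</p></body></html>'
)
body = doc.getElementsByTagName("body")[0]

# DOM Level 1 style: createElement takes no namespace argument.
plain = body.appendChild(doc.createElement("p"))
# Namespace-aware alternative: put the element in the XHTML namespace.
namespaced = body.appendChild(doc.createElementNS(XHTML_NS, "p"))

print(plain.namespaceURI)       # None
print(namespaced.namespaceURI)  # http://www.w3.org/1999/xhtml
# A lookup by tag name finds all three <p> elements...
print(len(doc.getElementsByTagName("p")))              # 3
# ...but a namespace-aware lookup for XHTML <p> skips the createElement one.
print(len(doc.getElementsByTagNameNS(XHTML_NS, "p")))  # 2
```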

> And it's not just a pedagogical issue. XML actually separates two
> things that cannot be clearly separated in HTML: well-formedness and
> validity. Take Henri's favorite example from HTML5: <p><ol></ol></p>.
> In HTML5, this is perfectly valid and well-formed (presuming it's
> properly placed in a larger document). It's a part of a valid DOM
> tree state. It's a valid XML serialization. However, it's not
> possible to express this in HTML4 with MIME type text/html (I was
> under the impression that it would be valid in HTML5, but Henri
> suggests otherwise).

It's mentioned here:
http://www.whatwg.org/specs/web-apps/current-work/#element-restrictions

> Is it invalid in that the author
> put an ordered list in a paragraph where it didn't belong? Or is it
> ill-formed in that the author included a closing </p> tag where it
> didn't belong?

The latter.  It's invalid (or malformed) because there's a closing
</p> tag where it didn't belong.  The <p> element was implicitly
closed when the parser reached the opening <ol> tag.
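The implied end tag can be modelled with a deliberately tiny tree builder. This is a hypothetical sketch, not a real HTML parser: it only understands bare tags, uses an illustrative subset of the start tags that close a `<p>`, and checks the whole stack of open elements where the spec checks "button scope". Under the HTML5 parsing rules as eventually specified, a stray `</p>` with no open p element also produces an empty p element, so the result ends with an extra `<p></p>`.

```python
import re

# Hypothetical subset of start tags that imply </p> under HTML5 parsing.
CLOSES_P = {"p", "ol", "ul", "div", "pre", "blockquote", "table"}


def parse(markup: str) -> str:
    """Toy tree builder for bare tags; returns the re-serialized tree."""
    root = ("#root", [])
    stack = [root]  # stack of open elements, each (name, children)

    def has_open(name: str) -> bool:
        # Simplification: the spec checks "in button scope", not the whole stack.
        return any(open_name == name for open_name, _ in stack)

    def pop_until(name: str) -> None:
        # Pop open elements up to and including the nearest `name`.
        while stack[-1][0] != name:
            stack.pop()
        stack.pop()

    for slash, name in re.findall(r"<(/?)([a-z0-9]+)>", markup):
        if not slash:
            if name in CLOSES_P and has_open("p"):
                pop_until("p")  # implied </p>
            element = (name, [])
            stack[-1][1].append(element)
            stack.append(element)
        elif name == "p" and not has_open("p"):
            # Stray </p>: the spec inserts an empty <p> and closes it at once.
            stack[-1][1].append(("p", []))
        elif has_open(name):
            pop_until(name)
        # Any other unmatched end tag is ignored in this toy model.

    def serialize(node) -> str:
        name, children = node
        inner = "".join(serialize(child) for child in children)
        return inner if name == "#root" else f"<{name}>{inner}</{name}>"

    return serialize(root)


print(parse("<p><ol></ol></p>"))  # <p></p><ol></ol><p></p>
```

The list never ends up inside the paragraph: the `<ol>` start tag closes the `<p>` first, and the closing `</p>` then has nothing to close.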

> Anyway, this is getting off topic. The main thing is that there are
> many reasons to go with an XML-like syntax, even for text/html.

There is at least one good reason not to.  Authors who are taught a
"stricter" HTML syntax expect the parser to follow that stricter
syntax (e.g. expecting <p><ol></ol></p> to work).  This leads to
unintended consequences (e.g. the <p> is closed before the <ol>, so
the markup is parsed as <p></p><ol></ol> followed by a stray </p>).
These unintended consequences are analogous to the unintended
consequences of serving XHTML as text/html.

One can encourage authors to use good, consistent coding practices:
quoted attributes, etc.  But teaching XML-like HTML as The Way to
write HTML would lead to those unintended consequences.

-- 
Jon Barnett
Received on Monday, 23 July 2007 16:19:37 UTC