Re: Why HTML should be taught as HTML without pretending it is XML from Ben Boyle on 2007-07-17 (public-html@w3.org from July 2007)

From: Ben Boyle <benjamins.boyle@gmail.com>
Date: Tue, 17 Jul 2007 22:27:28 +1000
To: "Henri Sivonen" <hsivonen@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <5f37426b0707170527h8369fa7g3cf46f8256ad5cd6@mail.gmail.com>

I'm more interested in being able to teach people a particular syntax
style of HTML like:
<ul>
  <li>item here</li>
  <li>item here</li>
</ul>

As opposed to:
<UL>
  <LI>item here
  <LI>item here
</UL>

I've nothing against the latter save it's a bit of grief to unlearn
the habit if (if!!) you do need to use XHTML, or even other XML
languages like XSLT and Atom (usage of which has been growing where
I've been working). And at times we do author in XHTML... one site I
worked on recently keeps most content in Atom <atom:content
type="xhtml"> for a feed, but this is also transformed to HTML for
publishing as web pages. In both the Atom documents and the XSLT
(which uses HTML output), XML syntax is what you need. We can't have
people doing this:
<xsl:template match="item">
  <LI><xsl:value-of select="."/>
</xsl:template>

Yet the bulk of the work remains HTML. Still helps to use the same
syntax where we can.

I hope these make sense as real-world scenarios where authoring to XML
well-formedness principles is useful and convenient. I haven't found
it harmful at all. It's not about teaching XML, it's just about a few
simple rules that apply commonly to all markup we are using (that's
"we" where I work, not "we" everyone everywhere in the world). It's
not about pretending "HTML is like XML" either. It's about saying
"write your tags in lowercase, and close your list items and table
cells" which is perfectly fine in HTML (and always has been). This is
still teaching "HTML as HTML" as you suggested, but leaning towards
consistent syntax with XHTML where possible. I mean, what benefit is
there in teaching "<LI>item here" over "<li>item here</li>" ... both
are valid HTML. One is valid XHTML, one isn't.
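
A minimal sketch of the difference, assuming Python and its standard
xml.etree.ElementTree parser (neither is part of the original message).
It checks XML well-formedness only, which is narrower than XHTML
validity: uppercase element names are well-formed XML, but XHTML
element names are lowercase. The helper name is_well_formed_xml is
made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as a well-formed XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The two list styles from the message, written on one line each.
closed_lowercase = "<ul><li>item here</li><li>item here</li></ul>"
omitted_end_tags = "<UL><LI>item here<LI>item here</UL>"

print(is_well_formed_xml(closed_lowercase))  # True
print(is_well_formed_xml(omitted_end_tags))  # False: </UL> arrives while <LI> is still open
```

Both strings are acceptable to an HTML parser; only the first survives
an XML parser, which is the whole reason for preferring it when the
same markup may end up inside Atom or XSLT.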

I tried explaining this on the wiki … http://esw.w3.org/topic/HTML/AuthorSyntax

It's pretty common these days, at least where I've been working, for
people to routinely close tags like </li> and </td>. Yes people can
make mistakes with <div/> and <script/> but then people can also make
mistakes with character encoding and nesting links and in building
inaccessible websites. Things we have all had to learn (and continue
to learn every day).


As this is largely possible with HTML4, and apparently improved in
HTML5, I'm very happy.
It works for me. I don't expect it to work for everyone, but we have a
choice and that's good.
(And a little bit of caution about mistaking HTML for XML never goes astray ;)

thanks
Ben



On 7/17/07, Henri Sivonen <hsivonen@iki.fi> wrote:
>
> First, why does HTML 5 allow certain XMLisms? The reason is to ease
> the migration from text/html bogo-XHTML to HTML5. *Not* the other way
> round.
>
> As a side effect of easing that migration, it is theoretically
> possible to write documents that mean the same thing both as
> text/html and as application/xhtml+xml. However, almost anyone who tries
> to do his/her own stunts on this point is likely to shoot him/herself
> in the foot. Doing this is such a bag of gotchas that it definitely
> should not be recommended for novices. It shouldn't even be
> recommended to experts.
>
> Sure, it would be nice to teach novices that HTML is like XML if it
> were true. But it isn't. If someone is taught the convenient untruth
> that you can write HTML just like XML, it is likely that (s)he will
> write stuff such as:
>
> <div/>
>
> <script src='...'/>
>
> <p>foo <ol><li>bar</li></ol> baz</p> (permitted in XHTML5, btw, as
> per the current draft)
>
> or perhaps even
>
> <![CDATA[foo]]>
>
> None of these mean in text/html what they would mean in XML. To write
> proper text/html, a person needs to know which elements don't have an
> end tag and which elements can go inside a p element. This should be
> taught to novices.
>
> text/html just is not XML for historical reasons. It is inconvenient,
> but hiding the truth from the student is doing a disservice to him/
> her, because the differences are so easy to hit accidentally if you
> don't know what they are and are thinking in terms of XML. Teaching
> that HTML is different from XML properly gives the student notice
> that there indeed are differences.
>
> (However, a person who is learning to *write* text/html doesn't need
> to be taught up front which tags can be omitted, since it is quite ok
> not to omit tags.)
>
> The ease of migration to application/xhtml+xml is a flawed argument,
> because there are little things you can use very easily that
> immediately break the source-level migration path. The use of an
> entity that isn't predefined in XML or the use of document.write()
> are examples of such things. Experience with the infamous Appendix C
> suggests that people *will* use these features when they don't see
> immediate catastrophic failure.
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
Received on Tuesday, 17 July 2007 12:27:54 UTC