
Re: Why HTML should be taught as HTML without pretending it is XML

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 17 Jul 2007 17:41:04 +0300
Message-Id: <50937C8A-A06F-443B-B0FF-92DC2C25EA64@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>
To: Ben Boyle <benjamins.boyle@gmail.com>

On Jul 17, 2007, at 15:27, Ben Boyle wrote:

> I'm more interested in being able to teach people a particular syntax
> style of HTML like:
> <ul>
>  <li>item here</li>
>  <li>item here</li>
> </ul>
> As opposed to:
> <UL>
>  <LI>item here
>  <LI>item here
> </UL>

Teaching the former is fine.

However, since the latter has been valid in HTML since the dawn of
time, I think it should stay conforming, so there'd be no point in
teaching that it is wrong. Otherwise, we make the tiny part of the
Web that is conforming even smaller.
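A minimal sketch of the difference, using only Python's standard library (my choice of tooling, nothing the spec mandates): `html.parser` tokenizes both snippets into the same lowercase start tags, while `xml.etree.ElementTree` rejects the one with omitted end tags. Note that `html.parser` is a tokenizer, not an HTML5 tree builder or a conformance checker, so this only shows that tag case and end-tag omission don't stop HTML tokenization; it does not prove validity by itself.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The two list snippets from the quoted message.
CLOSED = """<ul>
 <li>item here</li>
 <li>item here</li>
</ul>"""

OMITTED = """<UL>
 <LI>item here
 <LI>item here
</UL>"""


class TagCollector(HTMLParser):
    """Records start-tag names; HTMLParser lowercases them itself."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(source):
    """Tokenize source as HTML and return the start tags seen."""
    parser = TagCollector()
    parser.feed(source)
    parser.close()
    return parser.start_tags


def is_well_formed_xml(source):
    """True if an XML parser accepts source, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


for name, src in [("closed", CLOSED), ("omitted", OMITTED)]:
    print(name, html_start_tags(src), is_well_formed_xml(src))
```

Both snippets yield `['ul', 'li', 'li']` from the HTML tokenizer; only the first one survives the XML parser.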

> I've nothing against the latter save it's a bit of grief to unlearn
> the habit if (if!!) you do need to use XHTML, or even other XML
> languages like XSLT and Atom (usage of which has been growing where
> I've been working). And at times we do author in XHTML... one site I
> worked on recently keeps most content in Atom <atom:content
> type="xhtml"> for a feed, but this is also transformed to HTML for
> publishing as web pages. In both the Atom documents and the XSLT
> (which use HTML output), xml syntax is what you need. Can't have
> people doing this:
> <xsl:template match="item">
>  <LI><xsl:value-of select="."/>
> </xsl:template>

Sure, but in those cases, it is XHTML5 snippets that are being used  
inside a larger XML document. That's what XHTML5 is for.
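To illustrate why the XSLT case needs XML syntax: the stylesheet is itself an XML document, so an unclosed `<LI>` makes it ill-formed before any transformation runs. The sketch assumes Python's `xml.etree.ElementTree` as the XML parser and a hypothetical `stylesheet()` helper that wraps the quoted template in a minimal XSLT 1.0 stylesheet. It only checks well-formedness; it does not run an XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def stylesheet(template_body):
    """Hypothetical helper: wrap a template body in a minimal XSLT 1.0 stylesheet."""
    return (
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">'
        '<xsl:output method="html"/>'
        '<xsl:template match="item">'
        f"{template_body}"
        "</xsl:template>"
        "</xsl:stylesheet>"
    )


# The quoted template: the <LI> is never closed.
BROKEN = '<LI><xsl:value-of select="."/>'
# The same template written as an XHTML5 fragment.
FIXED = '<li><xsl:value-of select="."/></li>'


def parses_as_xml(source):
    """True if an XML parser accepts source, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(stylesheet(BROKEN)))  # False
print(parses_as_xml(stylesheet(FIXED)))   # True
```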

> Yet the bulk of the work remains HTML. Still helps to use the same
> syntax where we can.
> I hope these make sense as real world scenarios where authoring to xml
> well-formedness principles is useful and convenient. I haven't found
> it harmful at all. It's not about teaching XML, it's just about a few
> simple rules that apply commonly to all markup we are using (that's
> "we" where I work, not "we" everyone everywhere in the world). It's
> not about pretending "HTML is like XML" either. It's about saying
> "write your tags in lowercase, and close your list items and table
> cells" which is perfectly fine in HTML (and always has been). This is
> still teaching "HTML as HTML" as you suggested, but leaning towards
> consistent syntax with XHTML where possible. I mean, what benefit is
> there in teaching "<LI>item here" over "<li>item here</li>" ... both
> are valid HTML. One is valid XHTML, one isn't.

I think it is fine to teach habits that also apply to XHTML as long
as the students understand they are still writing HTML. However, it
seems to me that insisting that people learning HTML or "Pro Authors"
write in the intersection of HTML5 and XHTML5 is not productive
when what they are doing doesn't absolutely require it.
Teaching the intersection *properly* is much harder than teaching
HTML and XML separately.

There are situations where one is authoring text/html. There are
situations where one is authoring application/xhtml+xml. There are
situations where one is using a subset of XHTML5 that, after XSLT,
Atom or other XML processing, can be losslessly serialized and served
as text/html, but the original source text isn't served as such...

> I tried explaining this on the wiki http://esw.w3.org/topic/HTML/AuthorSyntax

...However, the situation where one would have to author, *by hand*,
source text that doesn't go through an XML pipeline before serving,
and where that exact hand-authored source text needs to work as both
HTML5 and XHTML5, is so rare that only those who really have a good
reason to do it should be bothered with the associated spec
lawyering. There's no point in teaching it or, as the wiki implies,
suggesting that "pro" authors should pull it off even when they don't
*need* to.
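As a rough sketch of the third situation above (XHTML5 that goes through an XML pipeline and is only then serialized as text/html), Python's `xml.etree.ElementTree` can parse a well-formed fragment and re-serialize it with `method="html"`. The fragment is hypothetical and un-namespaced for brevity, and ElementTree's HTML output mode is a convenience serializer rather than an HTML5 serializer, so treat this as an illustration of the idea, not a production pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML5 fragment, e.g. the payload of an Atom
# <content type="xhtml"> element, kept un-namespaced for brevity.
SOURCE = "<div><p>First line<br/>second line</p><ul><li>item here</li></ul></div>"

# XML parse: the hand-authored source must be well-formed XML.
tree = ET.fromstring(SOURCE)

# XML serialization: empty elements are written as <br />.
xml_out = ET.tostring(tree, encoding="unicode")
# HTML-style serialization: void elements like br get no end tag or slash.
html_out = ET.tostring(tree, encoding="unicode", method="html")

print(xml_out)   # <div><p>First line<br />second line</p><ul><li>item here</li></ul></div>
print(html_out)  # <div><p>First line<br>second line</p><ul><li>item here</li></ul></div>
```

Only the void `br` element is written differently here; the hand-authored source has to be well-formed XML, but it never has to be polyglot.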

> It's pretty common these days, at least where I've been working, for
> people to routinely close tags like </li> and </td>.

That's fine.

> Yes people can
> make mistakes with <div/> and <script/> but then people can also make
> mistakes with character encoding and nesting links and in building
> inaccessible websites. Things we have all had to learn (and continue
> to learn every day).

That's why I think people who are learning HTML are better served by
being told the truth about the ugliness of HTML instead of being given
a more convenient story that it is like XML.

> It works for me. I don't expect it to work for everyone, but we have a
> choice and that's good.

Indeed. Part of the point I'm trying to make is that it is OK for
people never to omit tags or quotes around attribute values, but it
would be nice if, in discussions about conformance, people tolerated
the spec keeping these things conforming for other people who have a
different sense of source aesthetics. (I don't mean you. I mean
opinions expressed here and in blogs earlier, where people seemed to
be shocked at HTML 5 allowing certain things in text/html that HTML
4.01 allowed, too, but that XML doesn't allow.)

Henri Sivonen
Received on Tuesday, 17 July 2007 14:41:32 UTC
