Re: [W3C docs] We should teach by example. from Smylers on 2007-07-06 (public-html@w3.org from July 2007)

From: Smylers <Smylers@stripey.com>
Date: Fri, 6 Jul 2007 18:16:28 +0100
To: HTML WG <public-html@w3.org>
Message-ID: <20070706171628.GB23278@stripey.com>
Robert Burns writes:

> I can understand why someone might find xml-like syntax more human
> readable.

Agreed.

> I imagine that with practice, one gets more capable of coming up with
> the end-of-an-element in that manner. However, for novices, or those
> with a lot of experience reading xml-like HTML, or even just those who
> have trouble thinking like an SGML parser, I think leaving out closing
> tags is a human readability issue.

I agree about those with lots of XML experience.

I disagree that it's an issue for somebody just because she is a novice
or has trouble thinking like an SGML parser.  In fact, quite the
opposite: I suggest that for some novices this is going to be much more
natural.  Obviously the people in this group are not the same as the
people who prefer the XML syntax.

For somebody new to HTML it makes sense to have to use, say, </em> to
mark where the emphasis should stop, because otherwise the browser can't
know.  It does not necessarily make sense to have to include </body> and
</html>: why should the browser need to be told that it's reached the
end, when it can see that perfectly well for itself for the simple
reason that there's no more content?

Note that if a beginner wants to put those closing tags on, that's no
problem; she can do so.  The point of the HTML syntax is that it's more
lax, more forgiving of unimportant differences.

And putting quote marks round attributes is just one more thing for a
beginner to have to grasp, and remember.  It slightly raises the barrier
to entry, unnecessarily so.  (And again, it doesn't matter if a learner
does do it.)

Some users, because of their background or the way their minds work, are
going to prefer the XML syntax; some are going to prefer the HTML
syntax.  Having both available makes learning HTML easy for both groups.

And I'd be surprised if _anybody_ naturally thinks like an SGML parser.

> The fascination some get from the idea that certain end tags can be
> left out, to me seems a bit reminiscent of the fascination some
> pioneering programmer once got when he said "eureka, I can express
> every year throughout eternity with just two digits,... or at least
> the important ones."

That's thinking about it backwards, from an expert insider's point of
view: we (people on this list) know HTML well enough to realize it works
like that.

A beginner doesn't (necessarily) think "there's a </body> here but I'm
allowed to omit it"; he simply doesn't even get as far as thinking that
the browser needs to be told where <body> ends.  Or even starts, since
<body> is optional too.  Which is great, cos it means a beginner doesn't
even need to be told about the concept of <body>; they can just write
HTML content and have it do the right thing.

Note the evolution of HTML: it wasn't that we started with an XHTML
syntax and then somebody realized that because some tags could be
unambiguously omitted, an 'advanced' feature was added to cope without
them; instead it was that those developing HTML early on saw no reason
to include the unnecessary tags.  Presumably, if it had been easier for
humans to have those optional tags there, they would have been
included.

> This later led to some problems.

Those (2-digit date) problems were because storing 1969 as 69 suffers a
loss of information; optional tags in HTML have no such problem, because
it's unambiguous what the assumed content is.

> I think now we're seeing similar problems with the optional omission
> of close tags: not the least of which we're finding our HTML
> serialization cannot be as expressive as our xml serialization. As
> examples, the discussion over trying to improve the <img> syntax

Even if HTML documents going forwards are always written with <img/>
closed, there's still the backwards compatibility problem because of all
the <img> tags out there.  They could either be opening an <img> element
(one that contains alternative content) which will be closed by a later
</img> tag, or they could mean <img/> but be written without the slash.

In the HTML syntax there are:

* elements such as <em> which must be closed explicitly; it's
  unambiguous that what follows is content of that element until </em>
  is reached

* elements such as <td> which always have content, but optional closing
  tags; it's unambiguous that what follows is content of that element
  until there is a tag which marks the end of the cell (either
  explicitly, or by starting another cell)
  
* elements such as <hr> which can never have content; whether written as
  <hr> or <hr/> it's always unambiguous that the element has ended, and
  what follows is always a sibling, never child content

Allowing <img> to take optional content while also maintaining backwards
compatibility with standalone 'unclosed' <img> elements introduces an
ambiguity in "<img>text".
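To make the three categories above concrete, here is a toy tree builder
in Python.  It is a minimal sketch under loudly simplified assumptions:
the element sets VOID and AUTO_CLOSE, the class TinyTreeBuilder and the
function parse are hypothetical names made up for illustration, and real
browsers follow far more detailed tree-construction rules.  What it
shows is that each category parses unambiguously on its own, and that
the only thing deciding how "<img>text" parses is whether img is treated
as void.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified element categories for illustration only;
# the real HTML5 tree-construction rules are far more detailed.
VOID = {"hr", "br", "img"}          # elements that can never have content
AUTO_CLOSE = {"td": {"td", "tr"}}   # an open <td> is implicitly ended by a new <td> or <tr>


class TinyTreeBuilder(HTMLParser):
    """Builds (tag, children) tuples; text nodes are plain strings."""

    def __init__(self, void=VOID):
        super().__init__()
        self.void = set(void)
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Optional end tags: close any open element this start tag implicitly ends.
        while tag in AUTO_CLOSE.get(self.stack[-1][0], ()):
            self.stack.pop()
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in self.void:
            # Non-void elements stay open, so what follows becomes their content.
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element; ignore end tags with no match
        # (including the implied end of "<hr/>", since void elements never open).
        open_tags = [name for name, _ in self.stack[1:]]
        if tag in open_tags:
            while self.stack[-1][0] != tag:
                self.stack.pop()
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def parse(markup, void=VOID):
    builder = TinyTreeBuilder(void)
    builder.feed(markup)
    builder.close()
    return builder.root[1]


print(parse("<em>a</em>b"))                        # [('em', ['a']), 'b']
print(parse("<td>a<td>b"))                         # [('td', ['a']), ('td', ['b'])]
print(parse("<hr>text"))                           # [('hr', []), 'text']
print(parse("<img>text"))                          # [('img', []), 'text']
print(parse("<img>text", void=VOID - {"img"}))     # [('img', ['text'])]
print(parse("<img/>text", void=VOID - {"img"}))    # [('img', []), 'text']
```

Under these assumptions the first three calls correspond to the three
categories listed above.  The last three show that a slash-closed
"<img/>" is unambiguous either way, but an existing unclosed "<img>"
changes meaning as soon as img is allowed to have content, however new
documents are written.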

Unfortunately that ambiguity exists whether or not HTML5 insists on
XML-style closed tags for all new content, so your proposal does not
help improve the expressiveness of HTML syntax versus XHTML syntax.

> Also, as Henri just raised, the desire to include foreign namespaces
> in the HTML serialization is complicated by the lack of closing tags.

In the message I think you're referring to Henri said:

  For reasons of backwards compatibility, we have only one namespace
  we can use and this section correctly designates exactly that
  namespace.

  ... As for foreign namespaces in the text/html serialization, I think
  the matter of serializing MathML and SVG in text/html has not been
  pursued far enough yet and is still worth pursuing by this WG.

It isn't immediately obvious to me how a lack of closing tags in newly
written HTML content is complicating things; please can you elucidate?

> I think simply the presence of XML and XHTML has led to greater
> awareness among authors of ill-formedness issues and invalidity. It's
> difficult to communicate proper nesting to authors while
> simultaneously trying to communicate the benefits of certain tags
> being omitted.

We do not need to communicate benefits in omitting certain tags!
Allowing people who prefer to omit them to do so is sufficient reason
for permitting it; there's no need to try to persuade those who prefer
to include them into _not_ doing so!

Neither syntax has to be superior to the other.

Smylers
Received on Friday, 6 July 2007 17:16:49 UTC