Re: self-closing tags in html5

William F Hammond wrote:

> In the spec at 8.1.2.1 (6) (for the text/html serialization):

You seem to refer to a clause in the W3C draft,
http://dev.w3.org/html5/spec/Overview.html#start-tags
and not in the WHATWG draft http://whatwg.org/html5 (which has different 
numbering). It would be nice to know in advance which draft is referred to, 
especially since both of them fairly often freeze my browsers.

>   Then, if the element is one of the void elements, or if the
>   element is a foreign element, then there may be a single U+002F
>   SOLIDUS character (/).  This character has no effect on void
>   elements, but on foreign elements it marks the start tag as
>   self-closing.

That may look unnecessarily complex, but there's a point to the 
complexity. For "void elements" (elements with EMPTY declared content in the 
SGML world), syntax like <br /> or even <br/> has become common when people 
have tried to be modern and use XHTML, even when their documents mostly use 
old HTML syntax. For "foreign elements", i.e. for XML fragments from outside 
the HTML space, we must of course play by XML rules. For other elements, 
it's best to assume that the "/" got there by accident, and ignore it, as 
browsers currently do for HTML documents.
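
The three cases can be sketched in a few lines of Python. This is only an 
illustration of the rule as quoted above, not a parser. The function name 
slash_effect, the foreign flag, and the abbreviated list of void elements 
are assumptions made for the example.

```python
# Illustrative subset of the void elements; the drafts define the full list.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "wbr"}


def slash_effect(name, foreign=False):
    """Describe what a trailing "/" in a start tag does (hypothetical helper)."""
    if foreign:
        # XML rules apply: the tag acts as both start tag and end tag.
        return "self-closing"
    if name.lower() in VOID_ELEMENTS:
        # The element has no content and no end tag anyway.
        return "no effect"
    # Any other HTML element: the slash is dropped and the element stays open.
    return "ignored"


print(slash_effect("br"))                    # a void element
print(slash_effect("p"))                     # an ordinary HTML element
print(slash_effect("circle", foreign=True))  # e.g. an SVG element
```

The point of the sketch is that only the foreign-element branch changes the 
structure of the document; in the other two branches the slash is inert.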

> It would be better to allow self-closing tags on all de facto empty
> elements, foreign or not and defined-empty or not.

I don't quite understand the phrase "de facto empty elements". If you treat 
"/" as making a tag "self-closing" (i.e. as closing the _element_ by acting 
as both start and end tag), then you of course make the element's content 
empty. So what's the point of the words "de facto"?

> This is better because (1) authors are given more choice and (2) DOM
> building is simplified.

Item (1) is actually a counter-argument, because authors don't need any more 
choices in an already confusing situation. Considering that HTML 5 will be 
an incomplete draft with only partial implementations for many, many years, 
there will be misunderstandings and hearsay-based authoring, so that people 
use different syntaxes without knowing what they are doing. Browsers do not 
actually treat <p /> as <p></p>, so why would you give authors the 
impression that they do?

Item (2) isn't relevant because DOM building would not be simplified in any 
essential way, and because any simplification there would be at most a minor 
convenience to people who write browsers. And this isn't really about DOM 
building but about parsing.

> For example, while it is true that major browsers seem to treat "<p/>"
> as an open tag, the relevant question for backward comptatibility is
> whether anyone has been relying on the idea that "<p/>" can be used to
> begin a non-empty paragraph.

It would be odd to intentionally rely on that, but if a document 
accidentally contains, say, <html /> at the start, should the page really be 
displayed as empty?

But there's a stronger reason too, related to the fact that people fairly 
often write something like
<a href=/foobar.html>
relying on corrective processing of the attribute value implying quotes,
<a href="/foobar.html">
rather than any particular treatment of the slash. Such sloppy attribute 
syntax usually causes no problems, unless the author starts validating the 
page, which leaves him wildly confused (see the Saga of the slashed 
validators, http://www.cs.tut.fi/~jkorpela/qattr.html ). It's common, and the 
HTML 5 draft appears to "legalize" it.

And you can link to the root of a server using <a href=/>. This may be bad 
style, but it hasn't really harmed anyone. Making that tag "self-closing", 
i.e. equivalent to <a href=></a>, would not be nice.
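
To make the difference concrete, here is a rough Python sketch of how a 
tolerant tokenizer can read one start tag so that a slash inside an unquoted 
attribute value stays part of the value. It is a simplification written for 
this message, not the algorithm of any draft or browser; the names ATTR and 
parse_start_tag are made up, and it assumes that an unquoted value runs up 
to the next whitespace or ">".

```python
import re

# An attribute name, optionally followed by "=" and a double-quoted,
# single-quoted, or unquoted value (assumed to end at whitespace or ">").
ATTR = re.compile(r"""([^\s=/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]*)))?""")


def parse_start_tag(tag):
    """Return (name, attributes, has_trailing_slash) for one start tag."""
    m = re.match(r"<([A-Za-z][^\s/>]*)", tag)
    if m is None:
        raise ValueError("not a start tag: %r" % tag)
    name, pos = m.group(1).lower(), m.end()
    attrs, trailing_slash = {}, False
    while True:
        # Skip whitespace between attributes.
        while pos < len(tag) and tag[pos].isspace():
            pos += 1
        if pos >= len(tag):
            raise ValueError("unterminated tag: %r" % tag)
        if tag[pos] == ">":
            return name, attrs, trailing_slash
        if tag[pos] == "/":
            # A slash seen here is outside any attribute value.
            trailing_slash = tag.startswith("/>", pos)
            pos += 1
            continue
        m = ATTR.match(tag, pos)
        if m is None:
            raise ValueError("cannot parse attribute at offset %d" % pos)
        # Pick whichever of the three value forms matched, else empty.
        value = next((g for g in m.groups()[1:] if g is not None), "")
        attrs[m.group(1).lower()] = value
        pos = m.end()


print(parse_start_tag("<a href=/>"))
print(parse_start_tag("<a href=/foobar.html>"))
print(parse_start_tag('<a href="/" />'))
```

In the first two calls the slash ends up inside the href value and the 
trailing-slash flag stays false; only in the third, where the value is 
quoted, is the slash seen as a separate character before ">".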

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 

Received on Sunday, 26 September 2010 05:54:32 UTC