- From: Smylers <Smylers@stripey.com>
- Date: Fri, 6 Jul 2007 18:16:28 +0100
- To: HTML WG <public-html@w3.org>
Robert Burns writes:

> I can understand why someone might find xml-like syntax more human
> readable.

Agreed.

> I imagine that with practice, one gets more capable of coming up with
> the end-of-an-element in that manner. However, for novices, or those
> with a lot of experience reading xml-like HTM, or even just those who
> have trouble thinking like an SGML parser, I think leaving out closing
> tags is a human readability issue.

I agree about those with lots of XML experience. I disagree that it's
an issue for somebody just because she is a novice or has trouble
thinking like an SGML parser. In fact, quite the opposite: I suggest
that for some novices this is going to be much more natural. Obviously
the people in this group are not the same as the people who prefer the
XML syntax.

For somebody new to HTML it makes sense to have to use, say, </em> to
mark where the emphasis should stop, because otherwise the browser
can't know. It does not necessarily make sense to have to include
</body> and </html>: why should the browser need to be told that it's
reached the end, when it can see that perfectly well for itself, for
the simple reason that there's no more content?

Note that if a beginner wants to put those closing tags on, that's no
problem; she can do so. The point of the HTML syntax is that it's more
lax, more forgiving of unimportant differences.

And putting quote marks round attributes is just one more thing for a
beginner to have to grasp, and remember. It slightly raises the barrier
to entry, unnecessarily so. (And again, it doesn't matter if a learner
does do it.)

Some users, because of their background or the way their minds work,
are going to prefer the XML syntax; some are going to prefer the HTML
syntax. Having both available makes learning HTML easy for both groups.

And I'd be surprised if _anybody_ naturally thinks like an SGML parser.
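
As a rough illustration of the point about optional closing tags and
unquoted attributes, here is a minimal sketch using Python's standard
html.parser module. The names TagCollector and tokenize, and the sample
markup, are made up for this example. Note also that html.parser is
only a tokenizer: it does not build a tree or insert implied tags the
way a browser does, so this shows only that the lax form carries the
same tokens, not how a browser recovers the structure.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the tags and text the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Ignore whitespace-only runs so the comparison stays readable.
        if data.strip():
            self.events.append(("text", data.strip()))


def tokenize(markup):
    """Return the list of events html.parser reports for the markup."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Lax form: unquoted attribute value, no closing </p>.
lax = "<p class=intro>Hello <em>world</em>"
# XML-style form: quoted attribute value, explicit </p>.
strict = '<p class="intro">Hello <em>world</em></p>'

lax_events = tokenize(lax)
strict_events = tokenize(strict)

# The only difference is the explicit end tag on the last line.
print(lax_events == strict_events[:-1])
print(strict_events[-1])
```

Under these assumptions the two forms differ only by the final </p>
event; the unquoted attribute value is reported exactly like the quoted
one.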

> The fascination some get from the idea that certain end tags can be
> left out, to me seems a bit reminiscent of the fascination some
> pioneering programmer once got when he said "eureka, I can express
> every year throughout eternity with just two digits,... or at least
> the important ones."

That's thinking about it backwards; it's thinking about it from an
expert insider's point of view; we (people on this list) know HTML well
enough to realize it works like that.

A beginner doesn't (necessarily) think "there's a </body> here but I'm
allowed to omit it"; he simply doesn't even get as far as thinking that
the browser needs to be told where <body> ends. Or even starts, since
<body> is optional too. Which is great, cos it means a beginner doesn't
even need to be told about the concept of <body>; they can just write
HTML content and have it do the right thing.

Note the evolution of HTML: it wasn't that we started with an XHTML
syntax and then somebody realized that because some tags could be
unambiguously omitted, an 'advanced' feature was added to cope without
them; instead it was that those developing HTML early on saw no reason
to include the unnecessary tags. Presumably if it'd been easier for
humans had those optional tags been there, they would have been
included.

> This later led to some problems.

Those (2-digit date) problems were because storing 1969 as 69 suffers a
loss of information; optional tags in HTML have no such problem,
because it's unambiguous what the assumed content is.

> I think now we're seeing similar problems with the optional omission
> of close tags: not the least of which we're finding our HTML
> serialization cannot be as expressive as our xml serialization. As
> examples, the discussion over tying to improve the <img> syntax

Even if HTML documents going forwards are always written with <img/>
closed, there's still the backwards compatibility problem because of
all the <img> tags out there.
They could be either opening an <img> element (that contains
alternative content) which will be closed by a later </img> tag; or
they could mean <img/> but be written without the slash.

In the HTML syntax there are:

* elements such as <em> which must be closed explicitly; it's
  unambiguous that what follows is content of that element until </em>
  is reached

* elements such as <td> which always have content, but optional
  closing tags; it's unambiguous that what follows is content of that
  element until there is a tag which marks the end of the cell (either
  explicitly, or by starting another cell)

* elements such as <hr> which can never have content; whether written
  as <hr> or <hr/> it's always unambiguous that the element has ended,
  and what follows is always a sibling, never child content

Allowing <img> to take optional content while also maintaining
backwards compatibility with standalone 'unclosed' <img> elements
introduces an ambiguity in "<img>text".

Unfortunately that ambiguity exists whether or not HTML5 insists on
XML-style closed tags for all new content, so your proposal does not
help improve the expressiveness of HTML syntax versus XHTML syntax.

> Also, as Henri just raised, the desire to include foreign namespaces
> in the HTML serialization is complicated by the lack of closing tags.

In the message I think you're referring to Henri said:

  For reasons of backwards compatibility, the we have only one
  namespace we can use and this section correctly designates exactly
  that namespace. ... As for foreign namespaces in the text/html
  serialization, I think the matter of serializing MathML and SVG in
  text/html has not been pursued far enough yet and is still worth
  pursuing by this WG.

It isn't immediately obvious to me how a lack of closing tags in newly
written HTML content is complicating things; please can you elucidate?

> I think simply the presence of XML and XHTML has led to greater
> awareness among authors of ill-formedness issues and invalidity.
> Its difficult to communicate proper nesting to authors while
> simultaneously trying to communicate the benefits of certain tags
> being omitted.

We do not need to communicate benefits in omitting certain tags!
Allowing people who prefer to omit them to do so is sufficient reason
for doing so; there's no need to try to persuade those who prefer to
include them into _not_ doing so!

Neither syntax has to be superior to the other.

Smylers
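
The three groups of element described in the message, and the
"<img>text" ambiguity, can be sketched roughly as follows. This is only
an illustration under stated assumptions: the function name
following_text_is and the flag img_may_have_content are hypothetical,
and the sets list only the elements named in the message, not the full
sets an HTML specification would define.

```python
# Element groups as described in the message (illustrative, not exhaustive).
EXPLICIT_CLOSE = {"em"}    # must be closed with an explicit end tag
OPTIONAL_CLOSE = {"td"}    # always have content; the end tag may be omitted
NEVER_CONTENT = {"hr"}     # can never have content


def following_text_is(tag, img_may_have_content=False):
    """Say how text directly after <tag> relates to that element.

    img_may_have_content is a hypothetical switch modelling the proposal
    to let <img> take optional fallback content while legacy standalone
    <img> tags stay valid.
    """
    if tag == "img":
        # Legacy <img> never has content. Once content is also allowed,
        # "<img>text" can be read either way.
        return "ambiguous" if img_may_have_content else "sibling"
    if tag in NEVER_CONTENT:
        return "sibling"
    if tag in EXPLICIT_CLOSE or tag in OPTIONAL_CLOSE:
        return "child"
    raise ValueError(f"element not covered by this sketch: {tag}")


for name in ("em", "td", "hr"):
    print(name, following_text_is(name))
print("img (today)", following_text_is("img"))
print("img (with optional content)", following_text_is("img", True))
```

The ambiguity arises from legacy content alone, so the result for
"img" with the flag set does not depend on how new documents are
written.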
Received on Friday, 6 July 2007 17:16:49 UTC