- From: Robert Burns <rob@robburns.com>
- Date: Fri, 6 Jul 2007 18:53:13 -0500
- To: Smylers <Smylers@stripey.com>
- Cc: HTML WG <public-html@w3.org>
On Jul 6, 2007, at 12:16 PM, Smylers wrote:

> Robert Burns writes:
>
>> I can understand why someone might find xml-like syntax more human
>> readable.
>
> Agreed.
>
>> I imagine that with practice, one gets more capable of coming up with
>> the end-of-an-element in that manner. However, for novices, or those
>> with a lot of experience reading xml-like HTM, or even just those who
>> have trouble thinking like an SGML parser, I think leaving out
>> closing tags is a human readability issue.
>
> I agree about those with lots of XML experience.
>
> I disagree that it's an issue for somebody just because she is a
> novice or has trouble thinking like an SGML parser. In fact, quite
> the opposite: I suggest that for some novices this is going to be
> much more natural. Obviously the people in this group are not the
> same as the people who prefer the XML syntax.
>
> For somebody new to HTML it makes sense to have to use, say, </em> to
> mark where the emphasis should stop, because otherwise the browser
> can't know. It does not necessarily make sense to have to include
> </body> and </html>: why should the browser need to be told that it's
> reached the end, when it can see that perfectly well for itself for
> the simple reason that there's no more content.

You're now talking from the perspective of a browser (a machine
processor) to justify why someone new to HTML might not need to see
where an element ends. It's a lot to throw at someone that elements
are bounded by start and end tags, and then quickly add that some tags
may be missing. It's much easier to simply include all the tags (not
for a machine, but for a person).

> Note that if a beginner wants to put those closing tags on, that's no
> problem; she can do so. The point of the HTML syntax is that it's
> more lax, more foregiving of unimportant differences.

Yes, we all understand that here. We're not talking about machines,
though; we're talking about a person, and a novice to the language.
> And putting quote marks round attributes is just one more thing for a
> beginner to have to grasp, and remember. It slightly raises the
> barrier to entry, unnecessarily so. (And again, it doesn't matter if
> a learner does do it.)

I personally don't find HTML with optional quote marks at all
difficult to read. However, it's not a good idea to immediately burden
a novice with all the places where they might be able to leave out the
quotation marks. It's much easier to just tell them to always include
them.

> Some users, because of their background or the way their minds work,
> are going to prefer the XML syntax; some are going to prefer the HTML
> syntax. Having both available makes learning HTML easy for both
> groups.

I've tried to explain how it can be difficult to read complex code
with end tags omitted. Would you care to explain how including
optional end tags, when they aren't necessary for machine parsing,
makes code difficult for you to read?

> And I'd be surprised if _anybody_ naturally thinks like an SGML
> parser.

I would say that someone who prefers to read source with optional tags
omitted is thinking a bit more like an SGML parser than I am able to.

>> The fascination some get from the idea that certain end tags can be
>> left out, to me seems a bit reminiscent of the fascination some
>> pioneering programmer once got when he said "eureka, I can express
>> every year throughout eternity with just two digits,... or at least
>> the important ones."
>
> That's thinking about it backwards; it's thinking about it from an
> expert insider's point of view; we (people on this list) know HTML
> well enough to realize it works like that.
>
> A beginner doesn't (necessarily) think "there's a </body> here but I'm
> allowed to omit it"; he simply doesn't even get as far as thinking
> that the browser needs to be told where <body> ends. Or even starts,
> since <body> is optional too.
> Which is great, cos it means a beginner doesn't even need to be told
> about the concept of <body>; they can just write HTML content and
> have it do the right thing.

I don't see how that is great. It leaves authors using constructs they
don't understand. It hides something mildly complicated from them in a
paternalistic way that will only lead to more confusion elsewhere: in
particular, confusion over ill-formedness, improper nesting and the
like.

> Note the evolution of HTML: it wasn't that we started with an XHTML
> syntax and then somebody realized that because some tags could be
> unambiguously omitted, and 'advanced' feature was added to cope
> without them; instead it was that those developing HTML early on saw
> no reason to include the unnecessary tags. Presumably if it'd been
> easier for humans had those optional tags been there, they would have
> been included.

My understanding is that this all predates HTML and comes from SGML.
There the need to conserve bits was much stronger than it is today, so
they took steps to economize. This is where the two-digit year comes
in: it was necessary to economize there too, and adding two more
digits was a big deal. Today, however, reserving more data width for
dates is ubiquitous. Similarly, economizing by omitting optional tags
is not as important anymore. A language that doesn't economize on bits
is more accessible, in that many more people can understand it.

>> This later led to some problems.
>
> Those (2-digit date) problems were because storing 1969 as 69 suffers
> a loss of information; optional tags in HTML has no such problem,
> because it's unambiguous what the assumed content is.

Omitting tags also suffers a loss of information. The structure of a
document has to be encoded into any processing UA. With explicit tags,
a UA does not have to know up front where <anelement> ends: the
element hasn't ended until the UA sees the close tag </anelement>.
It can ignore what it doesn't understand and simply process what it
does understand. The entire prospect of adding arbitrary namespaces
relies on including explicit opening and closing tags.

>> I think now we're seeing similar problems with the optional omission
>> of close tags: not the least of which we're finding our HTML
>> serialization cannot be as expressive as our xml serialization. As
>> examples, the discussion over tying to improve the <img> syntax
>
> Allowing <img> to take optional content while also maintaining
> backwards compatibility with standalone 'unclosed' <img> elements
> introduces an ambiguity in "<img>text".

Yes, because of a historical need to economize on tag usage, just as
the historical need to economize on year digits led to a need for
processors to be hard-wired with explicit century processing.

> Unfortunately that ambiguity exists whether or not HTML5 insists on
> XML-style closed tags for all new content, so your proposal does not
> help improve the expressiveness of HTML syntax versus XHTML syntax.

Yes, for legacy reasons. However, we should be thinking about ways to
break from those legacy constraints in backwards-compatible ways.

>> Also, as Henri just raised, the desire to include foreign namespaces
>> in the HTML serialization is complicated by the lack of closing tags.
>
> In the message I think you're referring to Henri said:
>
>   For reasons of backwards compatibility, the we have only one
>   namespace we can use and this section correctly designates exactly
>   that namespace.
>
>   ... As for foreign namespaces in the text/html serialization, I
>   think the matter of serializing MathML and SVG in text/html has not
>   been pursued far enough yet and is still worth pursuing by this WG.
>
> It isn't immediately obvious to me how a lack of closing tags in newly
> written HTML content is complicating things; please can you elucidate?

Explicit close tags allow UAs to process content from unfamiliar
namespaces.
The UA requires no prior knowledge of the namespace to, at the very
least, ignore the content. This is a common mechanism for
extensibility all over computing.

>> I think simply the presence of XML and XHTML has led to greater
>> awareness among authors of ill-formedness issues and invalidity. Its
>> difficult to communicate proper nesting to authors while
>> simultaneously trying to communicate the benefits of certain tags
>> being omitted.
>
> We do not need to communicate benefits in omitting certain tags!
> Allowing people who prefer to omit them to do so is sufficient reason
> for for doing so; there's no need to try to persuade those who prefer
> to include them into _not_ doing so!
>
> Neither syntax has to be superior to the other.

No, they both have their advantages and disadvantages. The advantages
of the economizing syntax have been dwarfed by the advances in
computing power over the last two decades. There may still be places
where economizing is a good idea for optimization purposes. We could
even add a more compact binary serialization to optimize further.
However, in the context of an official document of this WG, I see no
reason we should employ such optimizations at the expense of confusing
authors who might actually turn to our work as their first example to
understand HTML.

Take care,
Rob
Received on Friday, 6 July 2007 23:53:43 UTC