- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Mon, 30 Apr 2007 23:02:44 +1000
- To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
- CC: www-html@w3.org
Jukka K. Korpela wrote:
> On Sun, 29 Apr 2007, Lachlan Hunt wrote:
>> Considering that none of the major browsers support those, there
>> probably isn't a significant [amount] of content in existence that
>> relies [on] them.
>
> Since we have a notation like &emdash; which has no defined meaning by
> existing HTML specs and has been used and described in the past as
> denoting the em dash, and since this notation may spontaneously arise
> from a typo or misremembering the name, is there really any reason
> a) not to display it as if &mdash; had been encountered
> b) to declare behavior a) as incorrect?

If that error handling were to be defined and implemented, it would require significant research to determine the cost and benefit of making the change. The fact that none of the 4 major browsers implement it that way strongly suggests that they do not consider it important for compatibility with content on the web. Consistent error handling among browsers is also more important than logical error handling.

>>> Yet HTML 4 is the closest that we have to a useful "standard" on
>>> HTML. Or would you rather use ISO HTML? :-)
>>
>> A spec's official status shouldn't be given much weight in light of
>> evidence that shows the spec is irrelevant in the real world.
>
> So what _do_ you use as the current HTML specification? Do you construct
> it from the observed behavior of web browsers?

That is effectively the process that was used to define the parsing requirements for HTML5.

> And what do you do when browsers disagree? The market leader takes all?
> Your favorite browser takes all?

Research is done on a case by case basis to determine the most sensible approach to take. Many factors are taken into consideration, such as the impact that a change would have on existing content, the complexity and the ability of browser vendors to implement it.
See, for instance, some of Hixie's articles discussing the way browsers handle various markup errors, which show just some of the extensive research that has been done to develop the spec over the years.

Tag Soup: How UAs handle <x> <y> </x> </y>
http://ln.hixie.ch/?start=1037910467&count=1

Tag Soup: Crazy parsing adventures
http://ln.hixie.ch/?start=1137740632&count=1

Tag Soup: Blocks-in-inlines
http://ln.hixie.ch/?start=1138169545&count=1

Tag Soup: appendChild() of a script that calls document.write()
http://ln.hixie.ch/?start=1155195074&count=1

Tag Soup: innerHTML interoperability (or lack thereof)
http://ln.hixie.ch/?start=1158969783&count=1

>> The sensible definition for correct behaviour, is the behaviour that
>> is required to be compatible with the web and interoperable with other
>> browsers.
>
> This seems to be a contrived way of saying that in your opinion, correct
> behavior is what most browsers do. Finally a way to make most browsers
> comply! :-) Or maybe not, since they differ too much.

In theory, it would be really nice if browsers could implement the existing specs exactly as they are written. However, in reality, that simply is not possible and there's no point trying to be idealistic in the face of reality.

> "Compatible with the web" seems to be your pet phrase. It sounds
> impressive, but what does it mean?

It means being able to parse and render the existing content on the web in a way that meets the expectations of both users and authors, which are based upon their experiences with legacy browsers.

> "Interoperability with other browsers" is not a reasonable criterion.
> First, browsers don't really interoperate.

That is a serious problem that HTML5 is trying to address.

> Second, what you really mean is working the same way as other browsers.

Yes, that's what interoperability means.

> What's the point of having different browsers if they are all required
> to work the same way [...]?

To improve competition in the browser market, which in turn promotes choice and innovation.

Think about this from a user's perspective. You should be able to choose what browser you wish to use, for whatever reason you like. It could be easy to use, have useful extensions, good looking themes, or whatever. Just take a look at how many Gecko based browsers there are, all of which offer different features. The rendering engine isn't the most important feature in the eyes of typical users. Users just expect their browser to display pages well and they shouldn't have to use different browsers for accessing different sites.

The reality of the situation is that a significant portion of the web is built using extremely broken HTML, yet most users simply do not know or care about that. Browsers have a responsibility to handle errors in pages because it's what users implicitly expect. When different browsers handle errors differently, it increases the chances of pages working in some browsers, while being incredibly broken in others. That limits, or even prevents, the user's ability to choose the browser they want to use.

> This will get horrendously complex. Presumably you don't think that
> Firefox and Opera and IE 7 are grossly incorrect since they differ in
> _many_ ways from IE 6, which is still the most common browser.

All browsers have bugs, but we need to determine the most reasonable behaviour based on the way existing browsers work.

>> As I see it, there are 3 approaches to error handling that the spec
>> could take:
>>
>> 1. Leave error handling undefined, like HTML4 and XHTML2.
>>
>> That is clearly unacceptable, because it just leads to the situation
>> we are in now, where browsers have spent years reverse engineering
>> each other.
>
> No, the fact that browsers have imitated each other (most importantly,
> other browsers have imitated IE) is _not_ a result of undefined error
> handling.

Yes, it is a direct result of undefined error handling.
Authors build pages that rely on bugs in whatever browser they use, so in the eyes of the author, the buggy browser behaviour is correct. That has been happening since the early days of the web. When a user tries to view such pages in a browser that doesn't have the same bugs that are relied upon in the author's browser, the page breaks. That is exactly why browser vendors have been reverse engineering each other.

> I can't see your logic here.

If you think I'm wrong, then please explain why you think browsers have reverse engineered each other.

>> 3. Graceful error handling, where exact processing is defined in a way
>> that is compatible with the web and all UAs can implement it
>> interoperably.
>
> If you define exact processing and make it mandatory, how will the
> situation differ from defining the errors as language features? Except
> in wording, that is.

In some cases, we did exactly that. For example, using the XML empty element syntax like <br/> was initially considered an error in HTML5 and had completely different processing in HTML4. But the practice is widespread, many authoring tools output that syntax and could be costly to fix and upgrade, its use is harmless in reality, and some authors actually like using it, so it was decided to make the syntax conforming. So authors can either use <br> or <br/> (and similarly for other empty elements).

Another example is that characters like '/' can now be used in unquoted attribute values, e.g. <a href=http://example.com/>...</a>. In HTML4, that was an error, but no browser handled it according to SGML rules. There is no benefit in disallowing such a widely supported and used syntax, so it too was made conforming.

Yet there are things that are clearly errors. For example, omitting the end tag of an element that doesn't allow its omission. For inline elements, compatibility restrictions require that unclosed elements get reopened after their parent closes.
Although such errors occur very often, the required error handling isn't particularly sensible and it's often not what the author intended (though, sometimes it is, as in the case of <b><i>foo</b></i>).

>>>> In this case, however, the reality is that major browsers output
>>>> unknown entity references literally, without trying to expand them.
>>>> So &emdash; is treated equivalent to &amp;emdash;. That is also how
>>>> HTML5 defines error handling for it.
>>>
>>> Is that useful?
>>
>> It doesn't matter if it's the most theoretically useful output,
>
> "Useful" is a practical concept.

Usually, yes. But the error handling you're advocating for &emdash; is probably not practical in the real world (though, as I said above, it would require research to know for sure).

>> it's what browsers do now,
>
> Mostly, yes.
>
>> and changing such behaviour could potentially result in billions of
>> pages breaking.
>
> No, there aren't billions of pages with &emdash; in them. Those that
> have it _mean_ the em dash, so most browsers display the page as broken,
> i.e. as contrary to what the author surely meant. Some browsers do
> otherwise, and HTML5 wants to prohibit that.

At least by defining the correct behaviour, the error handling will be consistent in all browsers, even if the result isn't exactly what the author intended. If it's not, the author should fix it.

>> If such behaviour were to be implemented, the precise algorithm would
>> need to be specced.
>
> No, definitely not. Error handling is an area where different strategies
> can be applied.

Wrong! Error handling is one of the most important things to do interoperably. Consider CSS, which does define precisely how to handle errors gracefully. That is one case where the spec got it right and many browsers handle syntax errors interoperably (even though some browsers haven't quite got there yet).
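
To make the unknown entity handling discussed above concrete, here is a minimal sketch in Python. It is only an illustration, not any browser's implementation: it assumes the standard library's html.unescape(), which is documented as following the HTML 5 rules for valid and invalid character references, and the render_text() helper name is made up for this example.

```python
import html


def render_text(markup: str) -> str:
    """Expand character references the way html.unescape() does.

    Hypothetical helper for illustration only: known references are
    expanded, unknown ones are left in the output as literal text.
    """
    return html.unescape(markup)


# A known reference is expanded to the em dash character (U+2014).
print(render_text("a &mdash; b"))

# An unknown reference is passed through literally, unexpanded.
print(render_text("a &emdash; b"))

# Which is the same text an author gets by escaping the ampersand.
print(render_text("a &amp;emdash; b"))
```

The last two calls produce the same text, which is the sense in which &emdash; is treated as equivalent to &amp;emdash;.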

>> Every single HTML spec in existence from HTML 2.0 to HTML 4.01 and
>> XHTML 1.0, 1.1 and 2.0, regardless of their official status, either
>> is, or is very close to being, irrelevant in the real world.
>
> That's nonsense and you know it. Just because they contain unimplemented
> features doesn't make them irrelevant.

Those specs contain unimplemented features because they cannot be implemented. Some things are left undefined, others are defined in ways that aren't compatible with the content on the web. As far as implementers are concerned, they cannot implement any one of those specs exactly as written and expect to be usable in the real world.

>> Regardless of what you may think, and regardless of its official
>> status, HTML5 is the only really relevant HTML spec in existence for
>> implementers these days.
>
> Despite not existing? It's not even close to a draft specification, just
> a discussion document.

It is a draft specification, despite not being published on w3.org yet. Browsers are much closer to implementing it, than they are to implementing HTML4 properly.

>>> There will be little interest in it by most authors, if the dominant
>>> browser will not conform to it or make any serious attempt at
>>> conformance. It might start the next round of browser wars, though.
>>
>> The development of the HTML5 spec has the support of at least 4 major
>> browser vendors (IE, Mozilla, Opera and Safari). None of them are
>> interested in another round of browser wars.
>
> Really? Where can I read Microsoft's commitment to HTML5?

Microsoft have several representatives in the HTMLWG, including Chris Wilson who is a co-chair, so they are clearly in support of the development of HTML. Chris indicated that he has no problem with using the WHATWG's work as the basis for the HTMLWG. Even though he doesn't agree with everything in the spec as is, he said it would be a disservice to not make use of it.
http://lists.w3.org/Archives/Public/public-html/2007Apr/1240.html

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Monday, 30 April 2007 13:02:57 UTC