Re: Semicolon after entities

On Sun, 29 Apr 2007, Lachlan Hunt wrote:

> Considering that none of the major browsers support those, there probably 
> isn't a significant [amount] of content in existence that relies [on] them.

Since we have a notation like &emdash; which has no defined meaning by 
existing HTML specs and has been used and described in the past as 
denoting the em dash, and since this notation may spontaneously arise from 
a typo or misremembering the name, is there really any reason
a) not to display it as if &mdash; had been encountered
b) to declare behavior a) as incorrect?

>> Yet HTML 4 is the closest that we have to a useful "standard" on HTML. Or 
>> would you rather use ISO HTML? :-)
>
> A spec's official status shouldn't be given much weight in light of evidence 
> that shows the spec is irrelevant in the real world.

So what _do_ you use as the current HTML specification? Do you construct 
it from the observed behavior of web browsers? And what do you do when 
browsers disagree? The market leader takes all? Your favorite browser 
takes all?

> The sensible definition for correct behaviour is the behaviour that is 
> required to be compatible with the web and interoperable with other browsers.

This seems to be a contrived way of saying that in your opinion, correct 
behavior is what most browsers do. Finally a way to make most browsers 
comply! :-) Or maybe not, since they differ too much.

"Compatible with the web" seems to be your pet phrase. It sounds 
impressive, but what does it mean?

"Interoperability with other browsers" is not a reasonable criterion. 
First, browsers don't really interoperate. Second, what you really mean is 
working the same way as other browsers. What's the point of having 
different browsers if they are all required to work the same way and this 
is even used as the _definition_ of correctness?

This will get horrendously complex. Presumably you don't think that 
Firefox and Opera and IE 7 are grossly incorrect since they differ in 
_many_ ways from IE 6, which is still the most common browser. It seems 
that you are not consistent with the idea of defining correctness in terms 
of what browsers now actually do. And that's natural, since otherwise any 
real progress would be prohibited.

> As I see it, there are 3 approaches to error handling that the spec could 
> take:
>
> 1. Leave error handling undefined, like HTML4 and XHTML2.
>
> That is clearly unacceptable, because it just leads to the situation we are 
> in now, where browsers have spent years reverse engineering each other.

No, the fact that browsers have imitated each other (most importantly, 
other browsers have imitated IE) is _not_ a result of undefined error 
handling. I can't see your logic here.

> 2. Draconian error handling, or at least handling that inflicts mysterious 
> error messages upon unsuspecting users.

Nobody wants that. Or those few who want that will use browsers or add-ons 
or tools that do error checking and reporting.

> 3. Graceful error handling, where exact processing is defined in a way that 
> is compatible with the web and all UAs can implement it interoperably.

If you define exact processing and make it mandatory, how will the 
situation differ from defining the errors as language features? Except in 
wording, that is.

>>> In this case, however, the reality is that major browsers output 
>>> unknown entity references literally, without trying to expand them. 
>>> So &emdash; is treated equivalent to &amp;emdash;. That is also how 
>>> HTML5 defines error handling for it.
>> 
>> Is that useful?
>
> It doesn't matter if it's the most theoretically useful output,

"Useful" is a practical concept.

> it's what browsers do now,

Mostly, yes.

> and changing such behaviour could potentially 
> result in billions of pages breaking.

No, there aren't billions of pages with &emdash; in them. Those that have 
it _mean_ the em dash, so most browsers display the page as broken, i.e. 
as contrary to what the author surely meant. Some browsers do otherwise,
and HTML5 wants to prohibit that.
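
To make the two behaviors concrete, here is a minimal sketch in Python. It is an illustration only, not any browser's actual algorithm. It assumes the HTML 4 entity set from the standard library's html.entities module, and the ALIASES table and the function name expand_entities are hypothetical names invented for this example.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# Matches a semicolon-terminated named reference such as &mdash;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Hypothetical alias table: undefined names mapped to defined ones.
ALIASES = {"emdash": "mdash", "endash": "ndash"}


def expand_entities(text, lenient=False):
    """Expand named references; unknown names are left as literal text
    unless lenient=True and the name has an entry in ALIASES."""

    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        if lenient and name in ALIASES:
            return chr(name2codepoint[ALIASES[name]])
        return match.group(0)  # literal output, as most browsers do

    return ENTITY_RE.sub(replace, text)
```

With lenient=False the string "a&emdash;b" comes back unchanged, which corresponds to the literal output described above. With lenient=True it becomes "a", U+2014, "b", while a name with no alias, such as &foo;, is still left alone.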

>> The odds that the author wanted such a display are very small.
>
> Perhaps in this one case, you could make that argument, but in the general 
> case of &foo;, it's impossible to know what the author actually meant.

We need not treat all entity references as equal.

> How would it even be possible to implement in the general case? Sure, UAs 
> could easily hard code "emdash" and possibly a few other cases, but there are 
> hundreds of entity references and even more ways of slightly mistyping them.

Indeed. And since the documents are non-conforming in an essential way, it 
is quite acceptable that different browsers treat them differently, 
applying different error processing methods and guessing differently.
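
As one example of such guessing, a browser could look for the closest defined name instead of hard-coding individual typos. The following sketch assumes Python's difflib for the similarity measure and the HTML 4 names in html.entities. The function name guess_entity and the 0.8 cutoff are arbitrary choices for illustration, and another implementation could reasonably choose differently.

```python
import difflib
from html.entities import name2codepoint  # HTML 4 entity names -> code points


def guess_entity(name, cutoff=0.8):
    """Return the closest defined entity name, or None if nothing is close.

    The cutoff is an assumed threshold; it is not taken from any spec.
    """
    matches = difflib.get_close_matches(
        name, list(name2codepoint), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With these settings, guess_entity("emdash") yields "mdash", and a name that resembles nothing in the table yields None, in which case the reference can simply be shown literally.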

> If such behaviour were to be implemented, the precise algorithm would need to 
> be specced.

No, definitely not. Error handling is an area where different strategies 
can be applied.

> I called it a bug based on the reality of the 
> situation, which is that Lynx's behaviour in this case is incompatible with 
> that of every other browser I tested.

Then you use the word "bug" to denote error processing that deviates from 
the error processing in browsers known to you. This isn't a useful 
definition for "bug", and it conflicts with the general idea that a bug is 
a failure to work as _required_ by a specification.

> Every single HTML spec in existence from HTML 2.0 to HTML 4.01 and XHTML 1.0, 
> 1.1 and 2.0, regardless of their official status, either is, or is very close 
> to being, irrelevant in the real world.

That's nonsense and you know it. Just because they contain unimplemented 
features doesn't make them irrelevant.

> Regardless of what you may think, and regardless of its official status, 
> HTML5 is the only really relevant HTML spec in existence for implementers 
> these days.

Despite not existing? It's not even close to a draft specification, just a 
discussion document.

>> There will be little interest in it by most authors, if the dominant 
>> browser will not conform to it or make any serious attempt at conformance. 
>> It might start the next round of browser wars, though.
>
> The development of the HTML5 spec has the support of at least 4 major browser 
> vendors (IE, Mozilla, Opera and Safari).  None of them are interested in 
> another round of browser wars.

Really? Where can I read Microsoft's commitment to HTML5?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Sunday, 29 April 2007 14:31:33 UTC