Re: Semicolon after entities

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Sat, 28 Apr 2007 20:41:31 +1000
Message-ID: <463324DB.80308@lachy.id.au>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
CC: www-html@w3.org

Jukka K. Korpela wrote:
> On Sat, 28 Apr 2007, Lachlan Hunt wrote:
>> if Lynx supports entity references that are not defined in HTML4, and 
>> not supported by any other browser either, that's a bug in Lynx.
> 
> For some odd reason, Lynx displays "&par;" as "PP". It's not the only 
> browser that recognizes references for entities not defined in HTML 4.

Which UAs and which entity references?

> It's not a bug, because there is no mandatory error processing.

Lack of defined error handling is one of the most serious issues with 
HTML4, and in reality, at least as far as interoperability is concerned, 
HTML4 is irrelevant.

HTML5 is defining error handling for entity references, which is based 
upon the error handling used by the major browsers.  It would be 
sensible for Lynx to implement HTML parsing more interoperably with 
other UAs, and the best chance they have of doing that is following HTML5.

> When a browser sees, say, &emdash; or &MDASH;, it may - as far as HTML 4 
> specifications are concerned - apply any error handling it likes,
> including implicit fix to &mdash;.

What?  Are you saying that &emdash; and &MDASH; should be silently 
treated the same as &mdash;, or am I misunderstanding you?

In this case, however, the reality is that major browsers output unknown 
entity references literally, without trying to expand them.  So &emdash; 
is treated as equivalent to &amp;emdash;.  That is also how HTML5 defines 
error handling for it.
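
To make that behaviour concrete, here is a rough sketch in Python. It is 
only an illustration, not the HTML5 algorithm: the entity table is a tiny 
hypothetical subset, the function name is made up, and it only looks at 
references that end in a semicolon.

```python
import re

# Hypothetical, deliberately tiny entity table; a real UA uses the full
# list of names defined for HTML.
KNOWN_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "mdash": "\u2014"}

# Assumption: only semicolon-terminated references are considered here.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand_entities(text):
    """Expand known entity references; leave unknown ones untouched."""
    def replace(match):
        name = match.group(1)
        # Lookup is case-sensitive, so "MDASH" does not match "mdash".
        # Unknown names fall back to the original matched text.
        return KNOWN_ENTITIES.get(name, match.group(0))
    return _ENTITY_RE.sub(replace, text)
```

With this sketch, expand_entities("&emdash;") and 
expand_entities("&amp;emdash;") both give the text "&emdash;", while 
"&mdash;" becomes U+2014.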

> We might even argue that this is the _best_ error processing strategy 
> in practice, since that's probably what the author meant, and if it
> isn't, we have little odds of achieving anything better using some
> other approach.

Actually, in practice, when an author uses an undefined entity 
reference, it's usually because they forgot to encode & as &amp; and 
expect UAs to output it literally, exactly the way most browsers do. 
Applying some magic to work out that, for example, &emdash; means 
&mdash; is unworkable and not backwards compatible.

> Besides, &par; or &emdash; isn't really an error by SGML rules, which is 
> what HTML 4 is nominally based on. They are just undefined. :-)

SGML rules for HTML are irrelevant these days.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Saturday, 28 April 2007 10:41:48 GMT
