
Re: Semicolon after entities

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Wed, 25 Apr 2007 12:37:45 +1000
Message-ID: <462EBEF9.3020402@lachy.id.au>
To: Mike S <mmiikkee13@gmail.com>
CC: www-html@w3.org

Mike S wrote:
> (I hope this is the right place to send suggestions for the next HTML
> spec... it seems to be, judging from some of the other messages
> here.)

The preferred mailing list is the whatwg mailing list or the new 
public-html mailing list, though feedback sent here will be taken into 
account anyway.

> The W3C validator (using HTML 4.01 Transitional) says that a & in a 
> URL should be encoded as &amp;. I don't think that this should be 
> required.  For one thing, I like to keep my code neat with as few
> entities as possible, and having to encode &'s all the time doesn't
> really help that.

Why do you consider the use of entity references to be messy?

> Another (more important) reason is that an entity is not recognized
> as an entity unless it starts with &, and ends with a semicolon. A
> URL such as the one in <a href="somepage.php?foo=1&copy=2"> has the
> string '&copy' in it, but it has no trailing semicolon and therefore
> should not be recognized as an entity in a browser.

Actually, according to the SGML rules for HTML4, the semi-colon is
optional in that case.  This is, in fact, one of the cases where browsers
behave correctly and expand the reference to a copyright symbol.

HTML5 has simplified the document conformance rules to require a 
semi-colon in all cases.
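
As a rough illustration, the effect can be reproduced with Python's
standard html module.  Note the assumptions: html.unescape follows the
HTML5 rules for text rather than the SGML rules of HTML4, and it knows
nothing about attribute context, so it only approximates what a given
browser does with an href value.  The URL is the one from the example
quoted above.

```python
import html

url = "somepage.php?foo=1&copy=2"

# "copy" is one of the legacy names accepted without a trailing ";",
# so the reference is expanded even though no semicolon follows it.
decoded = html.unescape(url)
print(decoded)

# Encoding the ampersand as "&amp;" removes the ambiguity...
escaped = html.escape(url, quote=True)
print(escaped)

# ...and decoding the escaped form gives back the intended URL.
assert html.unescape(escaped) == url
```

The escaped form is what belongs in the href attribute; the browser
decodes it back to a plain "&" before requesting the page.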

> (I just tested this in Firefox, and it does indeed convert &copy to a 
> copyright symbol, but I see this as incorrect behavior as the HTML 
> spec itself states that "In SGML, it is possible to eliminate the final 
> ";" after a character reference in some cases (e.g., at a line break or
> immediately before a tag).", and inside an attribute value is not a
> line break or before a tag.)

Those are just example situations where the semi-colon may be omitted,
not an exhaustive list.  A reference followed by the '=' character is
another case where it can be omitted per SGML rules, along with several
others.

That behaviour needs to be retained for backwards compatibility so that 
sites using entity refs without semi-colons won't break.
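
Authors who want to find such references in their own markup could use
a check along these lines.  This is only a sketch: find_unterminated is
a hypothetical helper, the regular expression is a simplification and
not a real HTML tokenizer, and "expands" is judged by Python's
html.unescape rather than by any particular browser.

```python
import html
import re

# "&" + name that is not followed by ";" or by further name characters.
_UNTERMINATED = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])")


def find_unterminated(text):
    """Return (name, expands) for each named reference lacking a ';'.

    expands is True when html.unescape would still replace the
    reference, i.e. the name is, or begins with, one of the legacy
    names accepted without a semicolon.
    """
    results = []
    for match in _UNTERMINATED.finditer(text):
        ref = "&" + match.group(1)
        results.append((match.group(1), html.unescape(ref) != ref))
    return results


print(find_unterminated("page.php?a=1&copy=2&amp;b=3&foo=4"))
```

In this input "&copy" is reported as a reference that would still be
expanded, "&foo" as one that would be left alone, and the properly
terminated "&amp;" is not reported at all.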

Lachlan Hunt
Received on Wednesday, 25 April 2007 02:38:06 UTC
