Semicolon after entities from Mike S on 2007-04-24 (www-html@w3.org from April 2007)

From: Mike S <mmiikkee13@gmail.com>
Date: Tue, 24 Apr 2007 16:20:01 -0400
To: www-html@w3.org
Message-ID: <cf9be6530704241320u7be52ae4o35c8ccd7fefc6c79@mail.gmail.com>

(I hope this is the right place to send suggestions for the next HTML
spec... it seems to be, judging from some of the other messages here.)

The W3C validator (using HTML 4.01 Transitional) says that a & in a URL
should be encoded as &amp;. I don't think that this should be required. For
one thing, I like to keep my code neat with as few entities as possible, and
having to encode &'s all the time doesn't really help that. (Maybe I'm just
obsessive about clean code or something :-)
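
For illustration, here is a minimal sketch of what the validator is asking
for, using Python's standard html.escape. The helper name build_href and the
sample URL are only assumptions made up for this example; nothing here is
specific to the validator itself.

```python
from html import escape


def build_href(url: str) -> str:
    """Return an <a> start tag whose href value has & written as &amp;.

    build_href is a hypothetical helper for this example only.
    """
    # quote=True also escapes " and ', so the value stays safe inside quotes.
    return '<a href="' + escape(url, quote=True) + '">'


tag = build_href("somepage.php?foo=1&copy=2")
print(tag)  # <a href="somepage.php?foo=1&amp;copy=2">
```

Generating the markup this way at least keeps the &amp; out of the code I
write by hand, though it still ends up in the HTML source.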

Another (more important) reason is that an entity is not recognized as an
entity unless it starts with &, and ends with a semicolon. A URL such as the
one in <a href="somepage.php?foo=1&copy=2"> has the string '&copy' in it,
but it has no trailing semicolon and therefore should not be recognized as an
entity in a browser. (I just tested this in Firefox, and it does indeed
convert &copy to a copyright symbol, but I see this as incorrect behavior as
the HTML spec itself states that "In SGML, it is possible to eliminate the
final ";" after a character reference in some cases (e.g., at a line break
or immediately before a tag).", and the middle of an attribute value is
neither a line break nor immediately before a tag.)
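
To make the ambiguity concrete, here is a small sketch using Python's
html.unescape. That function follows the HTML 5 rules for character
references, not HTML 4.01 or any particular browser, and it has no idea
whether the text came from an attribute value, so treat it only as an
approximation of the behavior described above. The variable names are made
up for the example.

```python
from html import unescape

# The URL exactly as written in the href, with a bare & ...
raw = "somepage.php?foo=1&copy=2"
# ... and the same URL with the & encoded the way the validator wants.
encoded = "somepage.php?foo=1&amp;copy=2"

# '&copy' is treated as a character reference even without a trailing ';'.
decoded_raw = unescape(raw)
# Only '&amp;' is decoded here; the 'copy=2' that follows is left alone.
decoded_encoded = unescape(encoded)

print(decoded_raw)
print(decoded_encoded)
```

With the bare &, the copy parameter is replaced by a copyright sign; with
&amp;, the original URL comes back unchanged.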

(If I'm wrong, and it is actually legal to omit the ; after an entity, then
perhaps the semicolon should be made mandatory, to stop confusion like this?)

Received on Tuesday, 24 April 2007 22:08:47 UTC