
Re: Semicolon after entities

From: David Dorward <david@dorward.me.uk>
Date: Tue, 24 Apr 2007 23:15:21 +0100
Message-ID: <462E8179.50000@dorward.me.uk>
To: www-html@w3.org

Mike S wrote:
> The W3C validator (using HTML 4.01 Transitional) says that a & in a URL
> should be encoded as &amp;. I don't think that this should be required.

& should be encoded as &amp; except in attribute values which represent
URLs?

Please, no! Simplicity is a virtue, and exceptions are the enemy of
simplicity.
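Below is a minimal Python sketch of that single rule applied mechanically, the way an authoring tool might. `href_attribute` is a hypothetical helper made up for this example. `html.escape` is from the standard library and turns every & into &amp;, whether or not the value is a URL.

```python
from html import escape


def href_attribute(url: str) -> str:
    """Return an href attribute whose value is escaped for HTML.

    html.escape(..., quote=True) turns & into &amp; (and <, >, ", '
    into references as well), so URLs get exactly the same treatment
    as any other attribute value. No special cases.
    """
    return f'href="{escape(url, quote=True)}"'


print(href_attribute("somepage.php?foo=1&copy=2"))
# href="somepage.php?foo=1&amp;copy=2"
```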

> For one thing, I like to keep my code neat with as few entities as
> possible, and having to encode &'s all the time doesn't really help
> that.

Your options include using an authoring tool that does it for you, or
using semi-colons instead of ampersands as separators (most form data
parsing libraries I've encountered follow the advice in HTML 4.01 B.2.2):
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2
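B.2.2 recommends that servers and CGI implementations accept ";" in place of "&". Here is a rough Python sketch of a form data parser that accepts either separator. `parse_query` is a hypothetical function, not the behaviour of any particular library.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query: str) -> dict[str, list[str]]:
    """Parse form data, accepting either '&' or ';' between name=value pairs."""
    fields: dict[str, list[str]] = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing separator
        name, _, value = pair.partition("=")
        # A name may repeat (e.g. multiple checkboxes), so collect values in a list.
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields


print(parse_query("foo=1;copy=2"))  # {'foo': ['1'], 'copy': ['2']}
```

With ";" as the separator there is no "&" in the href to escape. The trade-off is that a literal ";" inside a value must then be percent-encoded as %3B.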

> Another (more important) reason is that an entity is not recognized as
> an entity unless it starts with &, and ends with a semicolon.

If I remember correctly, that is not true. The semi-colon is optional
when the reference is followed by a non-name character. So
?foo=bar&amp=12 is an HTML representation of ?foo=bar&=12.
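To make the rule concrete, here is a simplified Python sketch of it. A named reference ends either at ";" (which is consumed) or at the first character that cannot appear in a name (which is left alone). Several things are assumptions: the name characters (letters, digits and ".-_:", which I believe is what the HTML 4.01 SGML declaration allows), the tiny entity table, and the handling of unknown names, which are left as literal text. Numeric references and the line-break terminator are not covered.

```python
import re

# A small subset of the HTML 4.01 entity set, for illustration only.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

# '&', a name (letter first, then name characters), then an optional ';'.
# The greedy name stops at the first non-name character, such as '='.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._:-]*);?")


def expand_entity_refs(text: str) -> str:
    """Expand named entity references, treating the trailing ';' as optional."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in ENTITIES:
            return ENTITIES[name]
        return match.group(0)  # unknown name: leave the text untouched

    return ENTITY_REF.sub(replace, text)


print(expand_entity_refs("?foo=bar&amp=12"))  # ?foo=bar&=12
```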

I'm not a big fan of this and would rather the semi-colon were required
(as it is in XML-based languages), for the reason mentioned above
(simplicity).

> A URL such
> as the one in <a href="somepage.php?foo=1&copy=2"> has the string
> '&copy' in it, but it has no trailing semicolon and therefore should not be
> recognized as an entity in a browser. (I just tested this in Firefox,
> and it does indeed convert &copy to a copyright symbol, but I see this
> as incorrect behavior as the HTML spec itself states that "In SGML, it
> is possible to eliminate the final ";" after a character reference in
> some cases (e.g., at a line break or immediately before a tag).", and
> inside an attribute value is not a line break or before a tag.)

Those "some cases" include, I believe, "if the next character is a
non-name character such as an equals sign". The example was just that,
not a complete list of circumstances.

> (If I'm wrong, and it is actually legal to omit the ; after an entity,
> then perhaps it should be required to stop confusion like this?)

It is in XHTML.
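You can see the difference with any XML parser. In this rough sketch, Python's standard-library ElementTree stands in for an XHTML user agent, which is an assumption (browsers differ in detail). `parse_xhtml_fragment` is a hypothetical helper. The escaped href parses and comes back as the intended URL. The unescaped one is rejected outright, because in XML "&copy" must be followed by ";".

```python
import xml.etree.ElementTree as ET


def parse_xhtml_fragment(markup: str):
    """Parse markup with an XML parser; return the element, or None if not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


escaped = parse_xhtml_fragment('<a href="somepage.php?foo=1&amp;copy=2">link</a>')
print(escaped.get("href"))  # somepage.php?foo=1&copy=2

unescaped = parse_xhtml_fragment('<a href="somepage.php?foo=1&copy=2">link</a>')
print(unescaped)  # None
```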

-- 
David Dorward                               <http://dorward.me.uk/>
Received on Tuesday, 24 April 2007 22:15:35 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 27 March 2012 18:16:09 GMT