- From: Christopher R. Maden <crism@exemplary.net>
- Date: Wed, 20 Oct 1999 00:46:33 -0700
- To: w3c-wai-ig@w3.org
[Gregory J. Rosmaita]
>as Bruce pointed out, the way to avoid this error message is to substitute
>"&amp;" for all of the ampersands in the linkpopularity links and any other
>link that employs ampersands in the URL encoding...
>
>my thanks to gerald oskoboiny (the maintainer of the W3C HTML validation
>service [3], and the creator of it's predecessor of honored memory, the
>Kinder Gentler Validator) for clarifying this for me some time ago when i
>emailed him with a similar question... i only wish i remember his
>excellent explanation as to why the use of unescaped/un-charsetted URL
>ampersands is a bad idea and a bad authoring practice...

It's very simple (modulo the pedant's note below). An ampersand starts
an entity reference. '&copy;' is a reference to the entity "copy";
'&foo;' is a reference to the entity "foo". In HTML, the former is
defined as character 169, while the latter is undefined and thus an
error.

If you have an ampersand in an attribute value, including in an href
attribute, it signals an entity reference. Some browsers ignore
references to undefined entities (e.g., "&foo;" is left alone), but they
all (since Netscape 1.1 or so) resolve references to known entities
(like "&amp;"). The semicolon at the end of the entity name is
optional; any non-name-character (i.e., anything other than letters,
numbers, hyphen, or period) ends the name.

So an attribute value of "http://ieee.org/ohm.cgi?volt=3&amp=5" will be
interpreted as having a reference to the "amp" entity, and will be
resolved as "http://ieee.org/ohm.cgi?volt=3&=5", which isn't right.
You'd need to enter this as "http://ieee.org/ohm.cgi?volt=3&amp;amp=5".

This conflict between HTML entities and CGI delimiters has long been
noted; every definition of HTML since RFC 1866 has strongly recommended
that semicolons be used instead of ampersands as CGI argument
separators. When creating a CGI URL as an attribute value, check if
semicolons are accepted by the script. If they're not, yell at the
author (or fix your code); if they are, use them. (Perl's CGI module
accepts either ampersands or semicolons.)

[Pedant's note: Ampersands are recognized as what SGML calls a
"delimiter in context"; an ampersand must be followed by a name start
character (a letter) to be recognized as an entity reference. &9xxx,
&.25sdf, and &=132kj are not recognized. If the ampersand is followed
by a letter, then everything from the ampersand to (a) a semicolon
(inclusive), (b) the end of the line (inclusive), or (c) a non-name
character (something other than a letter, number, hyphen, or period)
(exclusive) is recognized as part of the entity reference. Many
browsers, however, take as much as they recognize; e.g., &ltri; will be
taken as a less-than sign followed by "ri;" rather than a reference to
the possibly unknown entity "ltri". This is incorrect behavior.]

-Chris
--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111
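
A rough sketch of the volt/amp example from the message, in Python, for
anyone who wants to try it. This is an illustration only, not part of
the original post. It assumes Python's standard html and urllib.parse
modules; the variable names are made up for the example. Note that
html.unescape follows HTML5 text rules, which is close to, but not
identical to, the SGML behavior described in the message.

```python
import html
from urllib.parse import urlencode

# Hypothetical CGI parameters, taken from the example URL in the message.
params = {"volt": 3, "amp": 5}

# Query string with the usual ampersand separator.
raw_url = "http://ieee.org/ohm.cgi?" + urlencode(params)

# What an entity-resolving parser does to the unescaped value:
# "&amp" is read as a reference to the "amp" entity.
misread_url = html.unescape(raw_url)

# Escaping the ampersand before writing the attribute value avoids this.
href_value = html.escape(raw_url, quote=True)

# Resolving entities in the escaped value gives back the intended URL.
round_trip = html.unescape(href_value)

# With a semicolon separator there is nothing to escape at all
# (this only helps if the receiving script accepts semicolons).
semi_url = "http://ieee.org/ohm.cgi?" + ";".join(
    f"{key}={value}" for key, value in params.items()
)
semi_href = html.escape(semi_url, quote=True)

print(raw_url)
print(misread_url)
print(href_value)
print(semi_href)
```

The escaped form is what belongs inside href="..."; the semicolon form
needs no escaping, which is the reason for preferring it where the
script allows.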
Received on Wednesday, 20 October 1999 03:48:01 UTC