
Re: Web Magazine featuring Accessibility issues

From: Christopher R. Maden <crism@exemplary.net>
Date: Wed, 20 Oct 1999 00:46:33 -0700
Message-Id: <v01530501b43320feea0c@[209.157.134.12]>
To: w3c-wai-ig@w3.org
[Gregory J. Rosmaita]
>as Bruce pointed out, the way to avoid this error message is to substitute
>"&amp;" for all of the ampersands in the linkpopularity links and any other
>link that employs ampersands in the URL encoding...
>
>my thanks to gerald oskoboiny (the maintainer of the W3C HTML validation
>service [3], and the creator of its predecessor of honored memory, the
>Kinder Gentler Validator) for clarifying this for me some time ago when i
>emailed him with a similar question...  i only wish i remembered his
>excellent explanation as to why the use of unescaped/un-charsetted URL
>ampersands is a bad idea and a bad authoring practice...

It's very simple (modulo the pedant's note below).  An ampersand starts an
entity reference.  '&copy;' is a reference to the entity "copy"; '&foo;' is
a reference to the entity "foo".  In HTML, the former is defined as
character 169 (the copyright sign), while the latter is undefined and thus an error.  If you
have an ampersand in an attribute value, including in an href attribute, it
signals an entity reference.  Some browsers ignore references to undefined
entities (e.g., "&foo;" is left alone), but they all (since Netscape 1.1 or
so) resolve references to known entities (like "&amp;").  The semicolon at
the end of the entity name is optional; any non-name character (i.e.,
anything other than letters, numbers, hyphen, or period) ends the name.  So
an attribute value of "http://ieee.org/ohm.cgi?volt=3&amp=5" will be
interpreted as having a reference to the "amp" entity, and will be resolved
as "http://ieee.org/ohm.cgi?volt=3&=5", which isn't right.  You'd need to
enter this as "http://ieee.org/ohm.cgi?volt=3&amp;amp=5".
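The resolution described above can be checked with Python's standard `html` module, in a short sketch. `html.unescape` uses the HTML5 named-reference table (which still accepts "&amp" without a semicolon) and HTML5's text-content rules, so it reproduces the behavior described here; the HTML5 parser in current browsers has an extra rule for attribute values that this sketch does not model. `html.escape` performs the "&amp;" substitution.

```python
import html

raw_url = "http://ieee.org/ohm.cgi?volt=3&amp=5"

# What a parser sees if the raw URL is pasted into an href unescaped:
# "&amp" is taken as a reference to the "amp" entity even without ";".
print(html.unescape(raw_url))   # http://ieee.org/ohm.cgi?volt=3&=5

# Escaping every "&" as "&amp;" before writing the attribute fixes it.
attr_value = html.escape(raw_url)
print(attr_value)               # http://ieee.org/ohm.cgi?volt=3&amp;amp=5

# Resolving the escaped value gives back the original URL.
assert html.unescape(attr_value) == raw_url

# A known entity is resolved; an unknown one is left alone.
print(ord(html.unescape("&copy;")))   # 169
print(html.unescape("&foo;"))         # &foo;
```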

This conflict between HTML entities and CGI delimiters has long been noted;
every definition of HTML since RFC 1866 has strongly recommended that
semicolons be used instead of ampersands as CGI argument separators.  When
creating a CGI URL as an attribute value, check if semicolons are accepted
by the script.  If they're not, yell at the author (or fix your code); if
they are, use them.  (Perl's CGI module accepts either ampersands or
semicolons.)
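A minimal sketch of the semicolon approach, in Python. `build_query` and `parse_query` are hypothetical helpers written for this illustration, not part of any CGI library; `parse_query` splits on either character, in the spirit of what Perl's CGI module is said to do above. (Recent versions of Python's own `urllib.parse.parse_qs` split on "&" only unless passed `separator=";"`.)

```python
import html
import re
from urllib.parse import quote_plus, unquote_plus


def build_query(params):
    """Join name=value pairs with ';' instead of '&' (assumes the script accepts ';')."""
    return ";".join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in params)


def parse_query(qs):
    """Split on either '&' or ';' and decode each name=value pair."""
    result = {}
    for pair in re.split(r"[&;]", qs):
        if not pair:
            continue  # tolerate empty segments such as "a=1;;b=2"
        name, _, value = pair.partition("=")
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


url = "http://ieee.org/ohm.cgi?" + build_query([("volt", "3"), ("amp", "5")])
print(url)                          # http://ieee.org/ohm.cgi?volt=3;amp=5

# With no '&' in the URL, it can go into an href attribute unchanged.
assert html.escape(url) == url

# Both separators yield the same parameters.
print(parse_query("volt=3;amp=5"))  # {'volt': ['3'], 'amp': ['5']}
print(parse_query("volt=3&amp=5"))  # {'volt': ['3'], 'amp': ['5']}
```

The server side stays lenient about which separator arrives, while the pages it generates use the one that needs no escaping.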

[Pedant's note: Ampersands are recognized as what SGML calls a "delimiter
in context"; an ampersand must be followed by a name start character (a
letter) to be recognized as an entity reference.  &9xxx, &.25sdf, and
&=132kj are not recognized.  If the ampersand is followed by a letter, then
everything from the ampersand to (a) a semicolon (inclusive), (b) the end
of the line (inclusive), or (c) a non-name character, i.e., anything other
than a letter, number, hyphen, or period (exclusive), is recognized as part of
the entity reference.  Many browsers, however, take the longest entity name
they recognize; e.g., &ltri; will be taken as a less-than sign followed by "ri;"
rather than a reference to the possibly unknown entity "ltri".  This is
incorrect behavior.]
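The recognition rule in the pedant's note can be sketched as a small scanner. This is a hypothetical illustration, assuming SGML's reference concrete syntax (ASCII letters as name start characters; letters, digits, hyphen, and period as name characters) and ignoring other SGML details such as name-length limits. `resolve_longest_known_prefix` mimics the browser habit the note calls incorrect, using a small hypothetical subset of HTML 4 entities.

```python
import string

# Assumed reference concrete syntax: name start = ASCII letter,
# name characters = ASCII letters, digits, hyphen, period.
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | {"-", "."}

# Hypothetical subset of HTML 4 entities, for illustration only.
KNOWN = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}


def find_entity_refs(text):
    """Return (name, matched_text) for each entity reference, per the SGML rule."""
    refs = []
    i = 0
    while True:
        i = text.find("&", i)
        if i == -1:
            return refs
        j = i + 1
        # '&' is a delimiter only in context: it must be followed by a letter.
        if j >= len(text) or text[j] not in NAME_START:
            i = j
            continue
        while j < len(text) and text[j] in NAME_CHARS:
            j += 1
        name = text[i + 1:j]
        # A semicolon or end of line ends the reference and belongs to it;
        # any other non-name character ends the name but is not consumed.
        if j < len(text) and text[j] in ";\n":
            j += 1
        refs.append((name, text[i:j]))
        i = j


def resolve_longest_known_prefix(name):
    """Mimic the incorrect browser behavior: use the longest known prefix of name.

    Only the name is resolved; any terminator is left to the caller.
    Unknown names are returned as written, with their ampersand.
    """
    for k in range(len(name), 0, -1):
        if name[:k] in KNOWN:
            return KNOWN[name[:k]] + name[k:]
    return "&" + name


for name, raw in find_entity_refs("a &ltri; b &9x c &amp=5"):
    print(repr(raw), "->", resolve_longest_known_prefix(name))
```

A conforming parser would report "ltri" as a single, possibly undefined, entity; the prefix-matching function instead turns it into "<ri", which is the misreading described in the note.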

-Chris

--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111
Received on Wednesday, 20 October 1999 03:48:01 GMT
