
Re: validator problem: bogus "entity" in URL

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 8 May 2004 20:35:04 +0000 (UTC)
To: David Dorward <david@dorward.me.uk>
Cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.58.0405082027190.9965@dhalsim.dreamhost.com>

On Sat, 8 May 2004, David Dorward wrote:
>
> The validator even tells you exactly how to fix the problem:
>   If you wish to cause an "&" to appear within text _or_a_URL_, escape
> it by using "&amp;".
> ... and it does this directly under the error message.

I just tried this by validating this page:

   http://junkyard.damowmow.com/133

...as seen here:

   http://validator.w3.org/check?uri=http%3A%2F%2Fjunkyard.damowmow.com%2F133&charset=%28detect+automatically%29&doctype=%28detect+automatically%29

This page contains only one error, namely an unescaped ampersand:

     <p><a href="?test&test">Test</a></p>
                      ^^^^^
...but what I see when I validate the page is:

   1.  Line 9, column 20: cannot generate system identifier for general
       entity "test"
       <p><a href="?test&test">Test</a></p>
                         ^
   2.  Line 9, column 20: general entity "test" not defined and no default
       entity
       <p><a href="?test&test">Test</a></p>
                         ^
   3.  Line 9, column 24: reference to entity "test" for which no system
       identifier could be generated
       <p><a href="?test&test">Test</a></p>
                             ^
   4.  Line 9, column 19: entity was defined here
       <p><a href="?test&test">Test</a></p>
                        ^

This is quite appalling and unhelpful: four messages for one mistake, and
none of them names the unescaped ampersand. As an author I would find the
following single error significantly more helpful:

   1.  Line 9, column 19: unescaped ampersand. "&" characters must be
       written as "&amp;" (even in URIs).
       <p><a href="?test&test">Test</a></p>
                        ^
...except when the ampersand is followed by some alphanumeric characters
and then a semicolon (with no other punctuation or whitespace between the
ampersand and the semicolon), in which case I would prefer:

   1.  Line 9, column 19: unrecognised entity. See _HTML4_section_24_ for
       a full list of recognised entities.
       <p><a href="?test&test;test">Test</a></p>
                        ^^^^^^
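
To make the rule above concrete, here is a minimal sketch in Python. It is
not how the validator works internally; the names check_ampersands and
KNOWN_ENTITIES are made up for illustration, the entity set is a tiny
stand-in for the full HTML 4 list, and numeric references such as "&#38;"
are accepted without being checked. Columns are counted from 1 within the
string passed in, so they need not match the validator's numbers.

```python
import re

# Hypothetical stand-in for the full list of HTML 4 entities.
KNOWN_ENTITIES = {"amp", "lt", "gt", "quot", "nbsp"}

# An ampersand, optionally followed by alphanumerics (or a numeric
# reference starting with "#"), optionally closed by a semicolon.
AMP = re.compile(r"&(?:(#?[A-Za-z0-9]+)(;)?)?")

def check_ampersands(line, line_no):
    """Return (line, column, message) tuples, one per bad ampersand."""
    errors = []
    for m in AMP.finditer(line):
        name, semi = m.group(1), m.group(2)
        column = m.start() + 1  # 1-based column of the "&" itself
        if semi:
            # "&name;" form: fine if recognised, otherwise one clear error.
            if name.startswith("#") or name in KNOWN_ENTITIES:
                continue
            errors.append((line_no, column, "unrecognised entity"))
        else:
            # No terminating semicolon: treat as a raw, unescaped "&".
            errors.append((line_no, column, "unescaped ampersand"))
    return errors

if __name__ == "__main__":
    for sample in ('<p><a href="?test&test">Test</a></p>',
                   '<p><a href="?test&test;test">Test</a></p>',
                   '<p><a href="?test&amp;test">Test</a></p>'):
        print(check_ampersands(sample, 9))
```

Each bad ampersand yields exactly one message, which is the point of the
proposal. Note that this sketch also flags "&amp" written without the
semicolon as an unescaped ampersand, which follows the rule as stated
above rather than the letter of SGML.
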
HTH,
-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Saturday, 8 May 2004 16:35:09 GMT
