Entity reference names

Pottering about with the validator today, up cropped a wee 
inconsistency. Perhaps unimportant in the grand scheme of things, and 
label me a nitpicker if you like, but I believe it's worthy of a 
mention nonetheless.

To set the scene: if I have an entity reference with the name 
"foo:bar" ("&foo:bar;" for instance), the validator will correctly 
spot and report this as an error, since that entity isn't defined in 
HTML 4.01. That's fine and dandy; everything's running marvelously.

My somewhat pedantic dispute is with the error message displayed. 
Although it doesn't belie the reason for the error, it seems that 
there is a glitch. For an entity reference with the name "foo:bar", 
the error message is:

[...] cannot generate system identifier for general entity "foo

Notice that the entity name is cut short: the colon and the name 
characters following it are sliced off. The validator seems to be 
choking on the colon, which is a valid name character, as we can see 
by looking at the SGML declaration of HTML 4.01:

NAMING   LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR ".-_:"
         UCNMCHAR ".-_:"

The other additions to the set of possible name characters in HTML 
(".", "-" and "_") don't have the same symptom. Surely the message 
should indicate the entity name as "foo:bar"? Or are there some 
mystifying workings going on that have passed me by? Neither the 
WDG's validator nor Nick Kew's Page Valet truncates the name this way.
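
To make the point about name characters concrete, here's a rough 
sketch in Python. It is only an illustration: I have no idea how the 
validator actually scans names internally, and the patterns and the 
entity_name() helper are made up for this example. It reads an entity 
name once with the full HTML 4.01 name character set, and once with 
the colon left out, which reproduces the truncated "foo":

```python
import re

# Name start characters: letters only (LCNMSTRT/UCNMSTRT add nothing).
# Name characters: letters, digits and the ".-_:" listed under
# LCNMCHAR/UCNMCHAR in the HTML 4.01 SGML declaration.
NAME_START = "A-Za-z"
NAME_CHARS = "A-Za-z0-9._:-"

# Hypothetical patterns, for illustration only.
FULL_NAME = re.compile(rf"&([{NAME_START}][{NAME_CHARS}]*)")
NO_COLON = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*)")


def entity_name(text, pattern=FULL_NAME):
    """Return the name of the first entity reference in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(entity_name("&foo:bar;"))            # foo:bar
print(entity_name("&foo:bar;", NO_COLON))  # foo
```

The second call stops at the colon, which matches the name shown in 
the error message; the first gives the name I'd expect to see.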

Feed this to the validator (I'll leave it up for some time):
  http://www.cis.strath.ac.uk/~jdunlop/entity-name.html

I'll subscribe to the list for a while too.

Cheers,

-- 
Jock

Received on Saturday, 17 May 2003 13:00:58 UTC