Re: Fragment identifiers and 'ID' attribute values: case matters

Bjoern Hoehrmann wrote:

> No, please see http://www.w3.org/TR/html4/struct/links.html#h-12.2.1
> 
> [...]
>   * String matching: Comparisons between fragment identifiers and
>     anchor names must be done by exact (case-sensitive) match.
> [...]

I read that section of the specification and could find nothing that legitimately contradicts 
my point.  Perhaps the intent of the authors was otherwise, but taking the prose at face 
value and respecting the normative status of the reference to ISO 8879 (the SGML 
specification), I stand by my case.

A possibly confusing but crucial point in this discussion is the relationship between 
attribute values and their attribute value specifications.  An attribute value specification, 
which is what appears in a start tag, may require some processing to derive the attribute 
value, which exists only in memory in the application.  This is roughly comparable to the 
way a CSS processor derives computed property values from specified property values.  
For example, character references and entity references in an attribute value 
specification need replacement.  For attributes with a declared value of 'ID', an SGML 
parser must also apply the character substitution specified in the SGML declaration.  
HTML4's SGML declaration enables this substitution and replaces lower-case English 
letters with their upper-case equivalents.  Thus, HTML4 'ID' attribute value 
specifications may be in any case or case combination, while the resulting values are 
always entirely in upper case.  I quote from HTML4 section 12.2.1:

    An anchor name is the value of either the name or id
    attribute when used in the context of anchors.

The anchor name is not the attribute value specification, but the attribute value.  It is 
against this string that a fragment identifier must match.
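
To make the distinction concrete, here is a minimal Python sketch of the derivation as I 
read it.  The function name attribute_value is hypothetical, and html.unescape is only a 
convenient stand-in for the reference replacement an SGML parser would perform; neither 
comes from HTML4 or ISO 8879.

```python
import html
import string

# Assumption: the substitution folds only the 26 lower-case English letters,
# so an ASCII-only table is used rather than str.upper().
_FOLD = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def attribute_value(specification: str, declared_value: str) -> str:
    """Derive an attribute value from its attribute value specification."""
    # Step 1: replace character and entity references.
    # html.unescape approximates this step; it is not an SGML parser.
    value = html.unescape(specification)
    # Step 2: for a declared value of ID, apply the upper-case substitution.
    if declared_value == "ID":
        value = value.translate(_FOLD)
    # CDATA values (such as NAME) keep their case.
    return value


print(attribute_value("AnElement", "ID"))      # ANELEMENT
print(attribute_value("AnElement", "CDATA"))   # AnElement
print(attribute_value("&#65;nElement", "ID"))  # ANELEMENT
```

Under this reading, the string a fragment identifier must match is the function's result, 
not its first argument.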

The situation is further confused by the 'NAME' attribute.  Because 'NAME' has a declared 
value of 'CDATA', its attribute value specification undergoes no character substitution, 
and its value may contain lower-case letters.  If I have an HTML4 element with the start 
tag <A ID="AnElement">, the correct fragment identifier is "#ANELEMENT".  If I have an 
HTML4 element with the start tag <A NAME="AnElement">, the correct fragment identifier 
is "#AnElement".

One consequence is that the start tag
<A ID="AnElement" NAME="AnElement"> is illegal in HTML4, as it produces two anchor 
names, "ANELEMENT" and "AnElement", that differ only in case.
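
The following Python sketch illustrates that consequence under the interpretation argued 
here.  The helper names (anchor_names, fragment_matches, differ_only_in_case) are 
hypothetical, and reference replacement is omitted for brevity.

```python
import string

# Assumption: ID values are folded to upper case, ASCII letters only.
_FOLD = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def anchor_names(id_spec=None, name_spec=None):
    """Return the anchor names a single start tag would produce."""
    names = []
    if id_spec is not None:
        names.append(id_spec.translate(_FOLD))  # declared value ID: folded
    if name_spec is not None:
        names.append(name_spec)                 # declared value CDATA: as written
    return names


def fragment_matches(fragment: str, anchor_name: str) -> bool:
    """Exact, case-sensitive comparison against the anchor name."""
    if fragment.startswith("#"):
        fragment = fragment[1:]
    return fragment == anchor_name


def differ_only_in_case(a: str, b: str) -> bool:
    """True when two distinct names become equal after case folding."""
    return a != b and a.translate(_FOLD) == b.translate(_FOLD)


names = anchor_names(id_spec="AnElement", name_spec="AnElement")
print(names)                                        # ['ANELEMENT', 'AnElement']
print(differ_only_in_case(*names))                  # True
print(fragment_matches("#ANELEMENT", names[0]))     # True
print(fragment_matches("#AnElement", names[0]))     # False
```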

> Maybe you like to refer to
> 
>   http://www.w3.org/2002/02/mid/20010706174702X.mimasa@w3.mag.keio.ac.jp
> 
> for additional information on this contradiction in the specification.

So far as I can tell, there is no contradiction.  Rather, there are some tricky and implicit 
requirements and results that elude not only beginners but also most of us immersed in 
HTML.  This regrettable state of affairs is probably inevitable when one tries to reconcile 
a lengthy and rigorous ISO standard with the loose practices of tag soup and the 
software which eats it.

In any case, I reiterate my suggestion that the Working Group either switch to XHTML or 
give 'ID' attribute value specifications in upper case.  Taking either path will eliminate 
the ambiguity.

-- 
Etan Wexler <mailto:ewexler@stickdog.com>

Received on Sunday, 26 May 2002 04:40:34 UTC