Re: [VE][141] Validator throws "not-unique" error on Case Sensitive IDs

On Fri, 26 Jan 2007, John Lascurettes wrote:

> This paragraph has attribute id="Unique"
>
> This paragraph has attribute id="uNIQUE"
>
> The Validator throws this error:
> ID "UNIQUE" already defined.

The validator is correct. The HTML specification is misleading.

> Shouldn't they be considered not the same ID since the ID attribute is
> defined as case sensitive

It's labeled "CS" (for Case Sensitive), but that's really just an 
annotation. Formally, the HTML 4.01 specification defines the language as 
an SGML application and declares the id attribute as being of type ID. By 
SGML rules, ID values are internally converted to upper case. (You need to 
read the SGML standard carefully to find this, but there's a hint: it's 
mentioned in the tutorial annex in the "SGML Handbook" on p. 52.)

This explains the spelling "UNIQUE" in the error message, too.
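
As a rough illustration of that behaviour (a sketch only, not the 
validator's actual code), the uniqueness check can be modelled in Python. 
The function name find_duplicate_ids is made up for this example, and 
str.upper() is assumed to be a good enough stand-in for SGML's upper-case 
conversion:

```python
def find_duplicate_ids(ids):
    """Return error messages for ID values that collide after folding.

    Assumption: str.upper() approximates the SGML upper-case conversion
    of ID values; the message format mimics the one quoted above.
    """
    seen = set()
    errors = []
    for value in ids:
        folded = value.upper()  # "Unique" and "uNIQUE" both become "UNIQUE"
        if folded in seen:
            errors.append(f'ID "{folded}" already defined.')
        else:
            seen.add(folded)
    return errors


print(find_duplicate_ids(["Unique", "uNIQUE"]))
```

The second value folds to the same string as the first, which is why the 
message shows the upper-case spelling and not either of the originals.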

The prose in the HTML 4.01 specification gives the wrong impression. It's 
also more or less self-contradictory, since at
http://www.w3.org/TR/html4/struct/links.html#h-12.2.1
it says:

"An anchor name is the value of either the name or id attribute when used 
in the context of anchors. Anchor names must observe the following rules:
- Uniqueness: Anchor names must be unique within a document. Anchor names 
that differ only in case may not appear in the same document.
- String matching: Comparisons between fragment identifiers and anchor 
names must be done by exact (case-sensitive) match."

Since the names are converted to upper case by SGML rules, the case issue 
cannot arise in the first place. The second item presumably means that a 
reference like href="#foo" should refer to an anchor with the literal 
spelling "foo" (and not "FOO", which is its _meaning_).
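
Read that way, the "String matching" rule amounts to a plain string 
comparison with no folding at all. A minimal sketch, where the name 
resolve_fragment is hypothetical and the anchor names are assumed to have 
been collected already:

```python
def resolve_fragment(fragment, anchor_names):
    """Return the anchor name that matches the fragment exactly, or None.

    Assumption: the comparison is a plain case-sensitive string match,
    as the quoted "String matching" rule suggests.
    """
    for name in anchor_names:
        if name == fragment:  # exact match, no case folding
            return name
    return None


anchors = ["foo", "Bar"]
print(resolve_fragment("foo", anchors))  # matches the literal spelling
print(resolve_fragment("FOO", anchors))  # no match under exact comparison
```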

This is of little practical value, since browsers never really implemented 
HTML as an SGML application in matters like this. They use matching 
routines cooked up by someone who didn't bother reading the SGML standard 
either. The bottom line is that browsers could go either way, so just 
don't use id values or other anchor names that differ only in case.
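
If you want to check a page for such names yourself, something along these 
lines might do. It is only a sketch using Python's standard html.parser 
module. The names AnchorNameCollector and case_only_clashes are invented 
for this example, and it assumes that id attributes on any element and 
name attributes on a elements are the anchor names of interest:

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorNameCollector(HTMLParser):
    """Collect id values and <a name="..."> values in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        for attr, value in attrs:
            if value and (attr == "id" or (attr == "name" and tag == "a")):
                self.names.append(value)


def case_only_clashes(html_text):
    """Return sorted groups of anchor names that differ only in case."""
    collector = AnchorNameCollector()
    collector.feed(html_text)
    groups = defaultdict(set)
    for name in collector.names:
        groups[name.upper()].add(name)  # assumption: upper() models the folding
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


sample = '<p id="Unique">one</p><p id="uNIQUE">two</p><a name="top">x</a>'
print(case_only_clashes(sample))
```

Exact duplicates (the same spelling used twice) are a separate error and 
are deliberately not reported by this sketch.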

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Friday, 26 January 2007 22:56:37 UTC