
validator bug: id values compared case-insensitively

From: Michael[tm] Smith <mike@w3.org>
Date: Sat, 23 Apr 2011 18:38:42 +0900
To: public-qa-dev@w3.org
Message-ID: <20110423093841.GC33722@sideshowbarker>
In normal HTML mode (non-XML, non-HTML5), the markup validator is checking
HTML id values case-insensitively. As far as I can see according to the HTML4
spec, it shouldn't be -- it should instead check them case-sensitively.

But before raising a validator bug for it, I wanted to double-check here to
make sure I'm not misunderstanding something about HTML4, and also to find
out whether this is a known issue that's already been reported (if so, I
couldn't find anything for it in W3C bugzilla).

Anyway, here are the details -

The HTML4 spec has "id = name [CS]", where the [CS] annotation means "case
sensitive":

  http://www.w3.org/TR/html4/struct/global.html#adef-id

The following is a minimal document which can be used to test
case-insensitive id-value checking -

  http://people.w3.org/mike/bugs/id-check.html

That document has just two paragraphs:

  <p id="foobar">
  <p id="fooBar">

If you run it through the W3C validator, you'll get this error message:

  Line 5, Column 8: ID "FOOBAR" already defined

So from the "FOOBAR" in that error message, I assume the validator must be
upper-casing id values before it compares them. Which it should not be.
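
To make the difference concrete, here is a minimal Python sketch of the two
comparison modes. It is not the validator's actual code: the function name
duplicate_ids and the case_sensitive flag are hypothetical, and the
upper-casing step is only my assumption based on the "FOOBAR" in the error
message above.

```python
def duplicate_ids(ids, case_sensitive=True):
    """Return the id values that collide with an earlier id in the list."""
    seen = set()
    duplicates = []
    for value in ids:
        # Assumed buggy behavior: fold to upper case before comparing.
        key = value if case_sensitive else value.upper()
        if key in seen:
            duplicates.append(value)
        else:
            seen.add(key)
    return duplicates


ids = ["foobar", "fooBar"]  # the two ids from the test document

# Case-sensitive comparison, as [CS] calls for: no duplicates.
print(duplicate_ids(ids, case_sensitive=True))

# Case-folded comparison: "fooBar" collides with "foobar".
print(duplicate_ids(ids, case_sensitive=False))
```

With case-sensitive comparison the two ids are distinct; folding case first
is what would make the second one look like a redefinition.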

For a real-world document[1] that illustrates the same problem, see:

  http://dev.w3.org/html5/spec-author-view/named-character-references.html

For that document, the validator reports the "ID X already defined" error
403 times...

  --Mike

[1] FYI for anybody who's curious, that named-character-references.html
document is a list of all the named character references supported in
HTML5; it's essentially the same set defined in the "XML Entity Definitions
for Characters" rec http://www.w3.org/TR/xml-entity-names/ -- and uses ids
that match the names. So because there is, e.g., both an entity named
"Downarrow" (U+21D3) and an entity named "DownArrow" (U+2193), those are
what end up in the id values in the doc.

-- 
Michael[tm] Smith
http://people.w3.org/mike
Received on Saturday, 23 April 2011 09:38:46 GMT
