Re: Odd case sensitive rules regarding <A name="abc"> from Martin J. Duerst on 1998-02-18 (www-html@w3.org from February 1998)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 18 Feb 1998 22:11:28 +0900
To: Ian Hickson <exxieh@bath.ac.uk>, www-html@w3.org
Message-Id: <199802182151.GAA26669@sh.w3.mag.keio.ac.jp>

At 16:50 98/02/17 -0500, Ian Hickson wrote:
> Can someone explain the logic in this?

Yes. The NAME attribute started out in HTML to target
a specific place in a page. The original implementations
did not fold case (i.e. were case-sensitive, because that
was easier to implement) and this has been continued.

The ID attribute is used in SGML (where actually any
attribute can be of type ID if declared correctly).
For HTML, it became of interest when it was realised
that elements other than A could usefully be targeted,
e.g. from stylesheets. The ID attribute in SGML is case-insensitive.

Both attributes have a very similar function, and so
it makes sense to have them use one single namespace.

The rules were developed to cover all these cases.

> From the HTML4 spec: [1]
> 
> > Section 12.2.1  Syntax of anchor names
> > An anchor name is the value of either the name or id attribute
> > when used in the context of anchors. Anchor names must observe
> > the following rules:
> > * Uniqueness: Anchor names must be unique within a document.
> > Anchor names that differ only in case may not appear in the
> > same document
> > * String matching: Comparisons between fragment identifiers
> > and anchor names must be done by exact (case-sensitive) match.
> 
> So the following code is illegal:
> 
> ----------------
> <P ID=ONE>...
> <P ID=one>...
> <P> <A HREF="ONE">Link to first paragraph</A>
> <P> <A HREF="one">Link to second paragraph</A>
> ----------------

Yes, because the two IDs have, in SGML, the same value, and
this is not allowed.

> Yet the following code should do nothing:
> ----------------
> <P ID=one>...
> <P> <A HREF="ONE">Link to the paragraph</A>
> ----------------

I guess it has to be <A HREF="#ONE">, but otherwise you are
right.
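
As a rough illustration (a sketch only, nothing here is taken from the
spec itself), the two rules can be written as two separate checks. The
function names are made up for this example, and the case folding
assumes ASCII-only anchor names:

```python
def find_case_collisions(anchor_names):
    """Return pairs of anchor names that are identical or differ only in case.

    Models the uniqueness rule: such pairs may not share a document.
    Assumption: names are ASCII, so str.lower() is an adequate fold.
    """
    seen = {}
    collisions = []
    for name in anchor_names:
        key = name.lower()
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions


def resolve_fragment(fragment, anchor_names):
    """Return the anchor that matches the fragment exactly, or None.

    Models the string-matching rule: the comparison is case-sensitive.
    """
    return fragment if fragment in anchor_names else None
```

With this, a document containing both ONE and one fails the uniqueness
check, while a link to #ONE in a document that only defines one simply
finds no target.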

The whole thing looks illogical if you think only about machines:
If a machine can distinguish between "one" and "ONE", why shouldn't
both be allowed? If you also think about humans, and about the many
languages there are on the planet, it can make more sense: For some
people, distinguishing upper case and lower case is not so easy.
On the other hand, languages differ in how they associate
upper-case and lower-case letters (e.g. in Turkish, an "i" is not
the lower case of an "I"). So it makes more sense to have some
"safe area" between what matches and what doesn't.

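To make the Turkish case concrete, here is a small sketch using
Python's built-in, locale-independent case mappings. The helper name
is hypothetical; the point is only that two distinct lower-case
letters end up with the same upper-case form:

```python
DOTLESS_SMALL_I = "\u0131"   # Turkish lower-case i without a dot
DOTTED_CAPITAL_I = "\u0130"  # Turkish upper-case I with a dot


def differ_only_in_case(a, b):
    """True if a and b are different strings with the same default upper-case form."""
    return a != b and a.upper() == b.upper()


# The default mapping treats "i" and dotless "i" as the same letter in
# upper case, although a Turkish reader sees two different letters.
ambiguous = differ_only_in_case("i", DOTLESS_SMALL_I)

# In Turkish, "i" upper-cases to the dotted capital; the default
# mapping does not do that.
default_matches_turkish = "i".upper() == DOTTED_CAPITAL_I
```

Here ambiguous is true and default_matches_turkish is false, which is
the kind of disagreement the "safe area" is meant to keep out of
anchor names.
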
Regards,   Martin.

Received on Wednesday, 18 February 1998 17:09:47 UTC