
Re: ID Characters (was: Re: 3.4. Global attributes)

From: Jim Jewett <jimjjewett@gmail.com>
Date: Wed, 1 Aug 2007 10:01:08 -0400
Message-ID: <fb6fbf560708010701j3e6bc5e2v7e69333a010b52f9@mail.gmail.com>
To: "Robert Burns" <rob@robburns.com>
Cc: public-html@w3.org

On 7/31/07, Robert Burns <rob@robburns.com> wrote:
> On Jul 31, 2007, at 5:00 PM, Jim Jewett wrote:

> > Authors wishing to write robust applications are advised to use a more
> > restricted set of IDs.  While "1" and "$^&" are technically valid
> > identifiers, they will trigger bugs in some tools.  Therefore, authors
> > SHOULD stick to ID characters from the ASCII digits [0-9] and one case
> > of ASCII letters (either [a-z] or [A-Z]), and SHOULD ensure that the
> > first character of each ID is a letter rather than a digit.

> > This probably applies to the name attribute as well.

I should probably have said "documents", or at least "authoring tools",
rather than "applications".
I agree that user agents should support at least the full allowed
range, whatever that turns out to be.

> I am a bit concerned about XML compatibility.

Agreed, and I would have no objection to just saying "The rules for ID
characters are the same as in XML."

> However, I don't think we should be using only ASCII there either
> (perhaps you meant Unicode letters and digits, etc).

Absolutely not.  That would be worse than the current ad-hoc rule.

I *would* understand delegating to the Unicode Consortium and UAX #31,
Identifier and Pattern Syntax.  http://unicode.org/reports/

Basically, it says to use characters with the XID_Start property to
start an ID, and characters with the XID_Continue property for the
rest of the ID.  (I don't see any reason to use the older ID_Start and
ID_Continue, unless we're trying to maintain compatibility with
another standard -- in which case, we should just cite that standard.)
This is similar to letters, then letters + numbers, but it excludes some
(not all) of the characters that really ought to be excluded.
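
As a rough illustration of the XID_Start / XID_Continue rule, here is a
minimal Python sketch.  It assumes Python's built-in str.isidentifier()
is an acceptable stand-in for the UAX #31 default identifier check; that
method follows XID_Start and XID_Continue but also accepts "_" as a first
character, so the sketch rejects a leading underscore.  The function name
is hypothetical, and this is an approximation, not spec text.

```python
def is_uax31_like_id(candidate: str) -> bool:
    """Approximate check: XID_Start first, then XID_Continue characters.

    Assumption: str.isidentifier() stands in for the UAX #31 default
    identifier syntax. Python additionally allows "_" as a start
    character, so a leading underscore is rejected here.
    """
    if candidate.startswith("_"):
        return False
    # isidentifier() returns False for the empty string, for a leading
    # digit, and for punctuation such as "$" or "^".
    return candidate.isidentifier()


if __name__ == "__main__":
    for sample in ["section1", "élan", "1", "$^&", "_x"]:
        print(repr(sample), is_uax31_like_id(sample))
```

Non-ASCII letters such as those in "élan" pass this check, while "1" and
"$^&" do not.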

But my recommendation (to document authors) is still to stick with
ASCII letters of a single case plus (non-initial) digits.  This is
because there will always be tools that are buggy outside of that
range, or with case-folding -- even if the original author doesn't
happen to be using one of them at the time.
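
For what it's worth, the conservative recommendation is easy to check
mechanically.  The following is only a sketch of how an authoring tool
might lint IDs against it; the function and pattern names are made up for
illustration, and "single case" is interpreted as "all lower-case or all
upper-case within one ID".

```python
import re

# First character is an ASCII letter; the rest are ASCII letters of the
# same case or ASCII digits.
LOWER_ID = re.compile(r"[a-z][a-z0-9]*")
UPPER_ID = re.compile(r"[A-Z][A-Z0-9]*")


def is_conservative_id(candidate: str) -> bool:
    """Return True if the ID stays inside the conservative ASCII subset."""
    return bool(LOWER_ID.fullmatch(candidate) or UPPER_ID.fullmatch(candidate))


if __name__ == "__main__":
    for sample in ["intro2", "INTRO2", "Intro2", "1", "$^&"]:
        print(repr(sample), is_conservative_id(sample))
```

An ID like "Intro2" fails only because it mixes cases, which is exactly
the sort of thing that trips tools that case-fold.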

By analogy, you don't *need* a seatbelt unless you crash, and you
don't plan to crash.  You SHOULD wear the seatbelt anyhow.  And you
SHOULD be conservative in the ID characters you assume will be handled
correctly by someone else's tools.



Received on Wednesday, 1 August 2007 14:01:19 UTC
