Re: ID Characters (was: Re: 3.4. Global attributes) from Robert Burns on 2007-08-01 (public-html@w3.org from August 2007)

From: Robert Burns <rob@robburns.com>
Date: Wed, 1 Aug 2007 10:31:57 -0500
To: Jim Jewett <jimjjewett@gmail.com>
Cc: public-html@w3.org
Message-Id: <783A36DC-74D8-4DD6-A046-464B43959CA1@robburns.com>
On Aug 1, 2007, at 9:01 AM, Jim Jewett wrote:

>
> On 7/31/07, Robert Burns <rob@robburns.com> wrote:
>> On Jul 31, 2007, at 5:00 PM, Jim Jewett wrote:
>
>>> Authors wishing to write robust applications are advised to use a  
>>> more
>>> restricted set of IDs.  While "1" and "$^&" are technically valid
>>> identifiers, they will trigger bugs in some tools.  Therefore,  
>>> authors
>>> SHOULD stick to ID characters from the ASCII digits [0-9] and one  
>>> case
>>> of ASCII letters (either [a-z] or [A-Z]), and SHOULD ensure that the
>>> first character of each ID is a letter rather than a digit.
>
>>> This probably applies to the name attribute as well.
>
> I should probably have said "document", or at least "authoring tool".
> I agree that user agents should support at least the full allowed
> range, whatever that turns out to be.
>
>> I am a bit concerned about XML compatibility.
>
> Agreed, and I would have no objection to just saying "The rules for ID
> characters are the same as in XML."
>
>
>> However, I don't think we should be using only ASCII there either
>> (perhaps you meant Unicode letters and digits, etc).
>
> Absolutely not.  That would be worse than the current ad-hoc rule.
>
> I *would* understand delegating to the unicode consortium, and UAX31,
> Identifier and Pattern Syntax.  http://unicode.org/reports/
>
> Basically, it says to use characters with the XID_Start property to
> start an ID, and characters with the XID_Continue property for the
> rest of the ID.  (I don't see any reason to use the older ID_Start and
> ID_Continue, unless we're trying to maintain compatibility with
> another standard -- in which case, we should just cite that standard.)
>  This is similar to letters then letters + numbers, but excludes some
> (not all) that really ought to be excluded.
>
> But my recommendation (to document authors) is still to stick with
> ASCII letters of a single case plus (non-initial) digits.  This is
> because there will always be tools that are buggy outside of that
> range, or with case-folding -- even if the original author doesn't
> happen to be using one of them at the time.
>
> By analogy, you don't *need* a seatbelt unless you crash, and you
> don't plan to crash.  You SHOULD wear the seatbelt anyhow.  And you
> SHOULD be conservative in the IDchars you assume will be handled
> correctly by someone else's tools.
>

I see what you're saying here now. I think I would rather see  
something like this addressed through one of the "green" notes.  
Something like:

"Note: Authors should be aware that some legacy tools may not handle  
Unicode characters outside the ASCII range properly when processing  
IDs. For maximum compatibility, authors should stick with ASCII-only  
characters when producing a value for the @id attribute."
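
To make the difference concrete, here is a minimal sketch of how an
authoring tool might check an ID against the two levels of caution
discussed above. It assumes Python, and both function names are
hypothetical; nothing here comes from the draft itself. The first check
follows Jim's stricter advice (an initial ASCII letter, then ASCII
letters of one case or digits), the second only the ASCII-only wording
of the note, with whitespace also rejected as an extra assumption.

```python
import re

# Jim's conservative recommendation: an initial ASCII letter followed by
# ASCII letters or digits, with all letters in a single case.
_LOWER = re.compile(r"[a-z][a-z0-9]*")
_UPPER = re.compile(r"[A-Z][A-Z0-9]*")


def is_conservative_id(value: str) -> bool:
    """True if value is letter-initial, ASCII alphanumeric, single-case."""
    return bool(_LOWER.fullmatch(value) or _UPPER.fullmatch(value))


def is_ascii_only_id(value: str) -> bool:
    """True if value is non-empty ASCII with no whitespace (looser check)."""
    if not value or not value.isascii():
        return False
    return not any(ch.isspace() for ch in value)


if __name__ == "__main__":
    for candidate in ["section2", "SECTION2", "Section2", "1", "$^&", "café"]:
        print(candidate, is_conservative_id(candidate), is_ascii_only_id(candidate))
```

An ID such as "$^&" passes the looser check but not the stricter one,
which is exactly the gap between the note and the SHOULD-level wording.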

In this way we let authors know about wearing seatbelts. However, we  
don't make it seem like we endorse the continued poor state of the  
tools. To me this is similar to the XML/XHTML warning discussed in  
another thread[1]. There too I think we should only make mention (if  
at all) that XML processing of HTML has maturity problems in legacy  
applications. After all, many of those maturity issues may be fixed  
even before we go to CR status.

Take care,
Rob

[1]: <http://lists.w3.org/Archives/Public/public-html/2007Aug/0037.html>
Received on Wednesday, 1 August 2007 15:32:10 UTC