Re: ID Characters (was: Re: 3.4. Global attributes) from Karl Dubost on 2007-08-01 (public-html@w3.org from August 2007)

From: Karl Dubost <karl@w3.org>
Date: Wed, 1 Aug 2007 09:36:37 +0900
To: Robert Burns <rob@robburns.com>
Cc: Jim Jewett <jimjjewett@gmail.com>, public-html@w3.org
Message-Id: <D485D244-17D0-41A3-9A4E-03D0F2ADDC9A@w3.org>

About
http://dev.w3.org/html5/spec/Overview.html#id

On 1 Aug 2007, at 07:47, Robert Burns wrote:
> On Jul 31, 2007, at 5:00 PM, Jim Jewett wrote:
>> Authors wishing to write robust applications are advised to use a
>> more restricted set of IDs.  While "1" and "$^&" are technically valid
>> identifiers, they will trigger bugs in some tools.  Therefore, authors
>> SHOULD stick to ID characters from the ASCII digits [0-9] and one case
>> of ASCII letters (either [a-z] or [A-Z]), and SHOULD ensure that the
>> first character of each ID is a letter rather than a digit.
>>
>> This probably applies to the name attribute as well.
>
> I am a bit concerned about XML compatibility. Allowing IDs more  
> permissive than XML makes conversion to XML (or manipulation or  
> embedding within XML) more difficult. I don't see how we gain that  
> much by permitting authors to use these extra start characters.

Indeed.

> However, I don't think we should be using only ASCII there either  
> (perhaps you meant Unicode letters and digits, etc). Following the  
> same rules as XML on name production would make a lot of sense here 
> [1][2].
> [1]: <http://www.w3.org/TR/xml/#sec-common-syn>
> [2]: <http://www.w3.org/TR/xml/#sec-entexpand>

Agreed.
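
To make the XML comparison concrete, here is a rough sketch of what
"following the XML name production" could look like as a check. It is
only an illustration: the function name is_xml_name is hypothetical,
and the character ranges are my assumption that the Name production of
XML 1.0 (Fifth Edition) is the one meant; earlier editions define names
differently, and xml:id additionally excludes the colon.

```python
import re

# Assumed: NameStartChar ranges from XML 1.0 (Fifth Edition).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, and a few combining/extender ranges.
_NAME_CHAR = _NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"
_XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(value):
    """Return True if value matches the assumed XML Name production."""
    return _XML_NAME.fullmatch(value) is not None
```

Under this sketch, "foo" and "été" pass, while "1", "$^&" and
"foo bar" are rejected, which is the difference Robert is pointing at.
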
The section on ids could be written in a more elegant way if the
classes of products were identified. Let's see:

3.4.1. The id attribute

The id attribute annotates an element with a unique identifier.

Author:

* The value is unique within the HTML document.
   WRONG
    <h1 id="foo">…</h1>
    <p id="foo">…</p>
   GOOD
    <h1 id="bar">…</h1>
    <p id="foo">…</p>
* An HTML element has at most one id attribute (zero or one).
   WRONG
    <p id="foo" id="bar">…</p>
   GOOD
    <p id="foo">…</p>
    <p>…</p>
* An empty value is not allowed (e.g. id="" is wrong).
   WRONG
    <p id="">…</p>
   GOOD
    <p id="foo">…</p>
    <p>…</p>
* Values containing space characters are not allowed.
   WRONG
    <p id="foo bar">…</p>
   GOOD
    <p id="foobar">…</p>
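
The four author rules above could be checked mechanically. The
following is a minimal sketch, not a conformance checker: the names
IdChecker, check_ids and SPACE_CHARS are hypothetical, the set of
"space characters" (space, tab, LF, FF, CR) is an assumption, and
Python's html.parser is only a convenient stand-in for a real HTML
parser.

```python
from html.parser import HTMLParser

# Assumed definition of "space characters": space, tab, LF, FF, CR.
SPACE_CHARS = " \t\n\f\r"


class IdChecker(HTMLParser):
    """Collects violations of the author rules for the id attribute."""

    def __init__(self):
        super().__init__()
        self.seen = set()    # id values already used in the document
        self.errors = []     # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        # attrs keeps repeated attributes, so duplicates are visible here.
        ids = [value for name, value in attrs if name == "id"]
        if len(ids) > 1:
            self.errors.append(f"<{tag}>: more than one id attribute")
        for value in ids:
            if not value:
                self.errors.append(f"<{tag}>: empty id")
            elif any(ch in SPACE_CHARS for ch in value):
                self.errors.append(
                    f"<{tag}>: id {value!r} contains a space character")
            elif value in self.seen:
                self.errors.append(f"<{tag}>: duplicate id {value!r}")
            else:
                self.seen.add(value)


def check_ids(markup):
    """Return a list of author-rule violations found in markup."""
    checker = IdChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

Feeding it each WRONG example should yield at least one error, and
each GOOD example an empty list.
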

User Agent:

The value must be unique in the subtree within which the element  
finds itself and must contain at least one character. The value must  
not contain any space characters.

(here I would add a reference to
http://www.w3.org/TR/xml-id/#id-avn )
If the value is not the empty string, user agents must associate the
element with the given value (exactly, including any space
characters) for the purposes of ID matching within the subtree in
which the element finds itself (e.g. for selectors in CSS or for the
getElementById() method in the DOM).
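
As an illustration of the user agent side, ID matching as described
above could be modelled roughly like this. It is a sketch under
assumptions: build_id_index is a hypothetical name, elements are
represented as (element, id value) pairs in tree order, and letting
the first element win when an ID is duplicated is my assumption, not
something the text above states.

```python
def build_id_index(elements):
    """Map each non-empty id value to an element, matching exactly.

    elements: iterable of (element, id_value) pairs in tree order,
    where id_value is None when the element has no id attribute.
    """
    index = {}
    for element, id_value in elements:
        # Missing or empty ids are never associated with an element.
        if id_value is None or id_value == "":
            continue
        # The value is used as-is, including any space characters.
        # Assumption: the first element in tree order wins on duplicates.
        index.setdefault(id_value, element)
    return index
```

Note that a value such as "foo bar" is non-conforming for authors but
is still indexed verbatim here, since the processing rule asks for an
exact match.
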

Identifiers are opaque strings. Particular meanings should not be  
derived from the value of the id attribute.

This specification doesn't preclude an element having multiple IDs  
for user agents, if other mechanisms (e.g. DOM Core methods) can set  
an element's ID in a way that doesn't conflict with the id attribute.

The id DOM attribute must reflect the id content attribute.

-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***

Received on Wednesday, 1 August 2007 00:36:44 UTC