[Bug 12839] New: @id: Define how Unicode normalization affects the 'unique identifier' status

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12839

           Summary: @id: Define how Unicode normalization affects the
                    'unique identifier' status
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/spec/elements#concept-id
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


PROPOSAL: 

  * DEFINE 'unique identifier'. 
  * SUGGESTED DEFINITION: State that W3C normalization [*] must be performed
before it can be established whether the @id is valid, that is, whether it
constitutes a 'unique identifier'.

[*] http://unicode.org/faq/normalization.html#7

This means (and this should perhaps be emphasized) that if two id attribute
values differ only in their normalization form, they violate the "unique
identifier" requirement.
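
The proposed check could be sketched as follows. This is only an
illustration: it uses plain NFC as a stand-in for W3C normalization (which
adds further constraints), and the function name find_id_collisions is
hypothetical, not taken from any spec or validator.

```python
import unicodedata
from collections import defaultdict


def find_id_collisions(ids):
    """Group id values that become identical after NFC normalization.

    Assumption: NFC approximates the W3C normalization the proposal refers to.
    """
    groups = defaultdict(list)
    for value in ids:
        groups[unicodedata.normalize("NFC", value)].append(value)
    # Any group with more than one member would violate uniqueness
    # under the proposed definition.
    return [values for values in groups.values() if len(values) > 1]


# Decomposed "a" + U+030A, precomposed U+00E5, and an unrelated id.
ids = ["a\u030a", "\u00e5", "intro"]
collisions = find_id_collisions(ids)
```

Under this sketch, the decomposed and composed forms end up in the same
group, so a conformance checker following the proposal would report them as
duplicates.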

NOTE:

Private tests of today's user agents (IE8, Firefox4, Opera11, Safari, Chrome)
show that

<a href="#a&#x30a;">link</a> targets <p id="a&#x30a;">
whereas <a href="#&#xe5;">link</a> targets <p id="&#xe5;">

Thus, today's user agents do actually treat them as unique identifiers, even
though both represent the same "å" (&#xe5;).
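
The observed behaviour is consistent with matching fragment identifiers
against id values code point by code point, without normalization. The
following is a minimal model of that behaviour, not any browser's actual
implementation; resolve_fragment is a hypothetical helper.

```python
import html


def resolve_fragment(fragment, ids):
    """Return the first id equal to the fragment, compared without normalization."""
    for value in ids:
        if value == fragment:
            return value
    return None


# Decode the character references used in the test markup.
ids = [html.unescape("a&#x30a;"), html.unescape("&#xe5;")]

decomposed_target = resolve_fragment(html.unescape("a&#x30a;"), ids)
composed_target = resolve_fragment(html.unescape("&#xe5;"), ids)
```

In this model each link resolves to a different element, which matches the
test results reported above.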

However, in order to avoid author confusion as well as user confusion, this
should not be considered valid. Treating them as unique probably also
conflicts with a number of other specs, including Unicode.

CURRENT STATUS: Spec says:

   ]] The id attribute specifies its element's unique identifier (ID). The
value must be unique amongst all the IDs in the element's home subtree and must
contain at least one character.[[

PROBLEM: There is no definition of "unique". Specifically, the spec does not
state whether two @id attributes that differ only in their normalization form
are to be considered unique or not.

EXAMPLE: In this example document, the letter 'å' (&#xe5;) is first represented
in decomposed form, and thereafter in composed form:

<!DOCTYPE html><title></title><p id="a&#x30a;"><p id="&#xe5;">
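
To make the difference between the two id values visible, the sketch below
lists their code points and shows that NFC and NFD convert one form into the
other. The helper code_points is hypothetical and only for illustration.

```python
import unicodedata


def code_points(text):
    """Format each character of text as a U+XXXX code point label."""
    return [f"U+{ord(ch):04X}" for ch in text]


decomposed = "a\u030a"  # value of the first id: "a" + COMBINING RING ABOVE
composed = "\u00e5"     # value of the second id: precomposed letter

decomposed_points = code_points(decomposed)
composed_points = code_points(composed)

# The raw strings differ, but normalization maps one form onto the other.
same_raw = decomposed == composed
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed
```

Here same_raw is False while same_nfc and same_nfd are True, which is the
situation the example document is meant to exercise.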


Received on Wednesday, 1 June 2011 01:18:22 UTC