[Bug 12100] UAs do not actually convert DOMStrings to sequences of Unicode characters. Test case: data:text/html,<!doctype html><script>document.documentElement.title = "\ud800"; alert(document.documentElement.title.charCodeAt(0));</script> Expected 65533, got 5529

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12100

Cameron McCormack <cam@mcc.id.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cam@mcc.id.au

--- Comment #4 from Cameron McCormack <cam@mcc.id.au> 2011-05-04 23:19:55 UTC ---
At the top of the description of the SVG interface whose methods index into
strings (for rendered text length calculations, etc.), we have this text:

  For the methods on this interface that refer to an index to a character
  or number of characters, these references are to be interpreted as
  an index to a UTF-16 code unit or a number of UTF-16 code units,
  respectively. This is for consistency with DOM Level 2 Core, where
  methods on the CharacterData interface use UTF-16 code units as indexes
  and counts within the character data. Thus for example, if the text
  content of a ‘text’ element is a single non-BMP character, such as
  U+10000, then invoking getNumberOfChars on that element will return
  2 since there are two UTF-16 code units (the surrogate pair) used to
  represent that one character.

Something like that might be OK in the HTML spec too, although with the methods
spread more widely throughout that spec, it might be less obvious.
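
As a rough illustration (not part of either spec's text), the code-unit
counting described in the quoted passage can be seen with plain JavaScript
strings. The variable names are made up for this sketch, and it only exercises
string behaviour, not getNumberOfChars or any DOM attribute:

```javascript
// A single non-BMP character, U+10000, built from its code point.
const nonBmp = String.fromCodePoint(0x10000);

// length and charCodeAt work in UTF-16 code units, not characters.
const codeUnitCount = nonBmp.length;         // two code units
const highSurrogate = nonBmp.charCodeAt(0);  // 0xD800
const lowSurrogate = nonBmp.charCodeAt(1);   // 0xDC00

// Iterating a string walks code points, so it sees one character.
const codePointCount = Array.from(nonBmp).length;

// A lone surrogate, as in this bug's test case, is still one code unit
// when read straight from a JavaScript string.
const loneUnit = "\ud800".charCodeAt(0);     // 0xD800

console.log(codeUnitCount, codePointCount, loneUnit.toString(16));
```

An index or count defined in code units, as in the SVG text, corresponds to
the first three values; a definition in terms of Unicode characters would
correspond to the code point count instead.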


Received on Wednesday, 4 May 2011 23:19:57 UTC