[Bug 12100] UAs do not actually convert DOMStrings to sequences of Unicode characters. Test case: data:text/html,<!doctype html><script>document.documentElement.title = "\ud800"; alert(document.documentElement.title.charCodeAt(0));</script> Expected 65533, got 5529

From: <bugzilla@jessica.w3.org>
Date: Wed, 04 May 2011 23:19:56 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1QHlMK-0007ct-7A@jessica.w3.org>

Cameron McCormack <cam@mcc.id.au> changed:

           What    |Removed                     |Added
                 CC|                            |cam@mcc.id.au

--- Comment #4 from Cameron McCormack <cam@mcc.id.au> 2011-05-04 23:19:55 UTC ---
At the top of the description of the SVG interface whose methods index into
strings (for rendered text length calculations, etc.), we have this text:

  For the methods on this interface that refer to an index to a character
  or number of characters, these references are to be interpreted as
  an index to a UTF-16 code unit or a number of UTF-16 code units,
  respectively. This is for consistency with DOM Level 2 Core, where
  methods on the CharacterData interface use UTF-16 code units as indexes
  and counts within the character data. Thus for example, if the text
  content of a ‘text’ element is a single non-BMP character, such as
  U+10000, then invoking getNumberOfChars on that element will return
  2 since there are two UTF-16 code units (the surrogate pair) used to
  represent that one character.

A note like that might be OK in the HTML spec too, although since the relevant
methods are spread more widely throughout that spec, it might be less obvious.
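
The code unit counting described in the quoted SVG text can be seen with plain
JavaScript strings, which are also indexed by UTF-16 code unit. The following
is only a minimal sketch: the variable names are made up for illustration, and
it uses string operations alone rather than getNumberOfChars or any DOM
attribute, so it says nothing about how a particular UA handles the test case
in the bug summary.

```javascript
// U+10000 is a non-BMP character, so it needs a surrogate pair in UTF-16.
const nonBmp = "\u{10000}";

// String length and charCodeAt both work in UTF-16 code units.
const codeUnitCount = nonBmp.length;      // number of code units
const highUnit = nonBmp.charCodeAt(0);    // first half of the surrogate pair
const lowUnit = nonBmp.charCodeAt(1);     // second half of the surrogate pair

// Iterating a string (here via Array.from) walks code points instead.
const codePointCount = Array.from(nonBmp).length;

// A lone surrogate held in an ordinary string is stored unchanged;
// no replacement with U+FFFD happens at this level.
const lone = "\ud800";
const loneUnit = lone.charCodeAt(0);

console.log(codeUnitCount, codePointCount);
console.log(highUnit.toString(16), lowUnit.toString(16), loneUnit.toString(16));
```

Whether a value such as the lone surrogate survives a round trip through a DOM
attribute like title is exactly what this bug is about, and the sketch does not
exercise that path.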

Received on Wednesday, 4 May 2011 23:19:57 UTC