[Bug 13676] Clarify what the code-point length of a string with isolated surrogate is.

From: <bugzilla@jessica.w3.org>
Date: Fri, 05 Aug 2011 02:09:02 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Qp9qQ-0002Np-58@jessica.w3.org>

KangHao Lu <kennyluck@w3.org> changed:

           What    |Removed                     |Added
                 CC|                            |kennyluck@w3.org

--- Comment #1 from KangHao Lu <kennyluck@w3.org> 2011-08-05 02:09:01 UTC ---
Assuming the intention of the current text is to count

"\ud840\udc87" // as code-point length = 1
"\ud840+\udc87" // as code-point length = 3

(which isn't very clear as far as I can tell), I would suggest that the spec
include a sentence like "Unpaired surrogates count as one code point each."
(wording from [1])
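
As a rough sketch of that counting rule (Python, operating on a list of
16-bit code units; the function name code_point_length is my own and does not
come from the spec or from [1]):

```python
def code_point_length(units):
    """Count code points in a sequence of 16-bit code units.

    A high surrogate immediately followed by a low surrogate counts as
    one code point; any unpaired surrogate counts as one code point too.
    """
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # Skip two units for a well-formed pair, one unit otherwise.
        i += 2 if is_high and has_low else 1
        count += 1
    return count


paired = [0xD840, 0xDC87]            # "\ud840\udc87"
split = [0xD840, ord("+"), 0xDC87]   # "\ud840+\udc87"
print(code_point_length(paired), code_point_length(split))  # prints: 1 3
```

Under this rule the two unpaired surrogates in the second string each count
once, which gives the length of 3 assumed above.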

Alternatively, it might be clearer to replace the sentence

# The code-point length of a string is the number of Unicode code points in
that string.

with

| The code-point length of a string is the number of Unicode characters after
the string is converted to a sequence of Unicode characters[2].

This would then work both for a string of Unicode characters (theory) and for
a DOMString (reality), until the internal representation of the value of an
input element[3] is made clear.
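
A minimal sketch of the conversion I have in mind, based on my reading of [2]:
well-formed surrogate pairs are combined into one character, and each unpaired
surrogate is replaced by U+FFFD. The function name to_unicode_characters is
hypothetical, and the input is assumed to be a list of 16-bit code units.

```python
def to_unicode_characters(units):
    """Convert 16-bit code units to a list of Unicode character values.

    Assumption (my reading of the WebIDL conversion): an unpaired
    surrogate becomes U+FFFD, so it still contributes one character.
    """
    chars = []
    i = 0
    while i < len(units):
        c = units[i]
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= c <= 0xDBFF and next_is_low:
            # Well-formed pair: combine into one supplementary character.
            d = units[i + 1]
            chars.append(0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00))
            i += 2
        elif 0xD800 <= c <= 0xDFFF:
            # Unpaired surrogate: substitute the replacement character.
            chars.append(0xFFFD)
            i += 1
        else:
            chars.append(c)
            i += 1
    return chars


# "\ud840\udc87" converts to the single character U+20087.
print([hex(ch) for ch in to_unicode_characters([0xD840, 0xDC87])])
# "\ud840+\udc87" converts to three characters: U+FFFD, "+", U+FFFD.
print(len(to_unicode_characters([0xD840, ord("+"), 0xDC87])))
```

The code-point length would then simply be the length of the returned list,
which gives 1 and 3 for the two example strings.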

Having said that, I am not convinced that defining @maxlength this way is the
best option. I tried to analyze other possibilities[4] but wasn't confident
enough to file a bug (my preference is to count 16-bit code units).

[1] http://download.oracle.com/javase/1,5.0/docs/api/java/lang/String.html
(definition of String.codePointCount)
[2] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
[4] http://lists.w3.org/Archives/Public/www-international/2011AprJun/0105

Received on Friday, 5 August 2011 02:09:03 UTC
