[Bug 13676] Clarify what the code-point length of a string with isolated surrogate is.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13676

KangHao Lu <kennyluck@w3.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kennyluck@w3.org

--- Comment #1 from KangHao Lu <kennyluck@w3.org> 2011-08-05 02:09:01 UTC ---
Assuming the intention of the current text is to count

"\ud840\udc87" // as code-point length = 1
"\ud840+\udc87" // as code-point length = 3

(which isn't very clear as far as I can tell), I would suggest that the spec
include a sentence like "Unpaired surrogates count as one code point each."
(wording from [1]).
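
To make the suggested rule concrete, here is a minimal sketch of the counting
it implies. It is not taken from the spec; the function name
code_point_length and the choice to model a string as a list of 16-bit code
units are assumptions made for illustration only.

```python
def code_point_length(units):
    """Count code points in a sequence of 16-bit code units.

    A high surrogate immediately followed by a low surrogate counts as one
    code point. Every other unit, including an unpaired surrogate, counts
    as one code point on its own.
    """
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # Skip two units for a well-formed pair, one unit otherwise.
        i += 2 if is_high and has_low else 1
        count += 1
    return count


paired = [0xD840, 0xDC87]            # "\ud840\udc87"
split = [0xD840, ord("+"), 0xDC87]   # "\ud840+\udc87"
print(code_point_length(paired))  # 1
print(code_point_length(split))   # 3
```

Under this reading the two examples above come out as 1 and 3 respectively.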

Alternatively, it might be clearer to replace the sentence

# The code-point length of a string is the number of Unicode code points in
that string.

by

| The code-point length of a string is the number of Unicode characters after
the string is converted to a sequence of Unicode characters[2].

This would then work both for a string of Unicode characters (theory) and for
a DOMString (reality), at least until the internal representation of the value
of an input element[3] is made clear.
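
A rough sketch of what the alternative wording would compute, assuming the
conversion in [2] replaces each unpaired surrogate with U+FFFD and combines
each well-formed pair into one character. The helper name
to_unicode_characters and the list-of-code-units input are hypothetical and
only stand in for a DOMString here.

```python
def to_unicode_characters(units):
    """Convert 16-bit code units to a list of Unicode code points.

    Assumption: unpaired surrogates become U+FFFD, as I read [2].
    """
    replacement = 0xFFFD
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= u <= 0xDBFF and next_is_low:
            # Well-formed surrogate pair: combine into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: one replacement character.
            out.append(replacement)
            i += 1
        else:
            out.append(u)
            i += 1
    return out


print(len(to_unicode_characters([0xD840, 0xDC87])))            # 1
print(len(to_unicode_characters([0xD840, ord("+"), 0xDC87])))  # 3
```

The resulting lengths match the "one code point each" rule, since every
unpaired surrogate still contributes exactly one character after conversion.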

Having said that, I am not convinced that defining @maxlength this way is
best. I tried to analyze other possibilities[4] but wasn't confident enough to
file a bug (my preference is to count 16-bit code units).
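
For comparison, a small sketch of how the two ways of counting differ, reading
"16 bits" as UTF-16 code units (that reading, and the function names
utf16_length and code_point_count, are assumptions for illustration).

```python
def utf16_length(text):
    """Number of 16-bit code units needed to represent text in UTF-16."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    return len(text.encode("utf-16-le", "surrogatepass")) // 2


def code_point_count(text):
    """Number of code points; a Python str is indexed by code point."""
    return len(text)


ideograph = "\U00020087"  # the character encoded as "\ud840\udc87" in UTF-16
print(utf16_length(ideograph))      # 2
print(code_point_count(ideograph))  # 1
```

So a limit of 1 would accept this character when counting code points but
reject it when counting 16-bit units.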

[1] http://download.oracle.com/javase/1,5.0/docs/api/java/lang/String.html
(definition of String.codePointCount)
[2] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
[3]
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#concept-fe-value
[4] http://lists.w3.org/Archives/Public/www-international/2011AprJun/0105

Received on Friday, 5 August 2011 02:09:03 UTC