- From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
- Date: Mon, 16 Jan 2012 19:36:22 +0800
- To: WWW Style <www-style@w3.org>, WWW International <www-international@w3.org>
Conceptually, UAX#29, on which the definition of a grapheme cluster in CSS3 Text relies, operates on a string of Unicode code points, while DOM strings are in reality sequences of UTF-16 code units. Although it is quite obvious what conversion should happen, it might be nice to say a little bit about this.

A normative result of this clarification would be to require UAs to render a single emphasis dot instead of two in the following case

  <span style="text-emphasis: dots">(U+D840, U+DC87)</span>

(a random ideograph outside the BMP; the surrogate pair encodes U+20087).

The HTML spec defines the term "Unicode code point"[1] as follows:

[[
The term Unicode code point means a Unicode scalar value where possible, and an isolated surrogate code point when not. When a conformance requirement is defined in terms of characters or Unicode code points, a pair of code units consisting of a high surrogate followed by a low surrogate must be treated as the single code point represented by the surrogate pair, but isolated surrogates must each be treated as the single code point with the value of the surrogate.
]]

I think CSS3 Text can adopt this prose somewhere in the spec, perhaps near the definition of a grapheme cluster, and leave undefined what should happen if isolated surrogates are encountered. See [2] for such an example.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#unicode-code-point
[2] http://lists.w3.org/Archives/Public/www-style/2012Jan/0556

Cheers,
Kenny
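For illustration only, the conversion described in the quoted HTML prose could be sketched roughly as below. This is a minimal sketch, not spec text: the function name `code_points` is hypothetical, and it assumes the input is a list of integer UTF-16 code units such as a DOM string would hold. A high surrogate followed by a low surrogate becomes one code point; any isolated surrogate is passed through with its own value.

```python
def code_points(units):
    """Convert UTF-16 code units to code points (hypothetical helper).

    Follows the quoted HTML definition: a high surrogate followed by a
    low surrogate is one code point; an isolated surrogate is kept as a
    code point with the surrogate's own value.
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the pair into a single supplementary code point.
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit or isolated surrogate: keep its value as is.
            result.append(unit)
            i += 1
    return result


# The pair from the example above yields a single code point, U+20087.
print([hex(cp) for cp in code_points([0xD840, 0xDC87])])
```

With the input counted this way, the example span contains one code point between the parentheses, which is why a single emphasis dot would be expected there.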
Received on Monday, 16 January 2012 11:37:37 UTC