Re: [css3-text] tweak the definition of a grapheme cluster a bit for UTF-16

On Tue, 17 Jan 2012, "Martin J. Dürst" wrote:
> >
> > As the HTML spec defines the term "Unicode code point"[1]
> > 
> > [[
> > The term Unicode code point means a Unicode scalar value where possible,
> > and an isolated surrogate code point when not. When a conformance
> > requirement is defined in terms of characters or Unicode code points, a
> > pair of code units consisting of a high surrogate followed by a low
> > surrogate must be treated as the single code point represented by the
> > surrogate pair, but isolated surrogates must each be treated as the
> > single code point with the value of the surrogate.
> > ]]
> 
> My guess is that the HTML spec came up with a special term (it should be 
> "UTF-16 code unit", rather than "Unicode code point", but that's a 
> separate issue) because in many cases, they define their algorithms and 
> procedures in a very low-level fashion.

Yes. We didn't put "UTF-16" in the name, though, because the term doesn't
really have anything to do with UTF-16 (other than that UTF-16 is the
reason the surrogates exist), so having UTF-16 in the name would be quite
confusing.
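
For illustration only, here is a minimal Python sketch of the rule in the
quoted definition: a high surrogate followed by a low surrogate is read as
one code point, and any isolated surrogate is read as a code point with the
surrogate's own value. The function name `code_points` and the choice of a
list of integers as input are assumptions made for this example; they do
not come from the HTML spec.

```python
def code_points(units):
    """Group 16-bit code units into code points per the quoted definition.

    `units` is a sequence of integers in the range 0x0000..0xFFFF.
    This helper is hypothetical, written only to illustrate the rule.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # High surrogate followed by low surrogate: one code point.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # Ordinary code unit, or an isolated surrogate kept as-is.
            out.append(u)
            i += 1
    return out


if __name__ == "__main__":
    # A well-formed pair, then a low surrogate with no preceding high one.
    print([hex(cp) for cp in code_points([0xD801, 0xDC7E, 0xDC7E])])
```

Note that the second 0xDC7E in the usage line is isolated, so it comes back
unchanged as its own code point rather than being rejected or replaced.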

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 17 January 2012 21:47:32 UTC