Re: [css3-text] tweak the definition of a grapheme cluster a bit for UTF-16 from Ian Hickson on 2012-01-17 (www-style@w3.org from January 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 17 Jan 2012 21:46:53 +0000 (UTC)
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
cc: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>, WWW International <www-international@w3.org>
Message-ID: <Pine.LNX.4.64.1201172145450.14845@ps20323.dreamhostps.com>

On Tue, 17 Jan 2012, "Martin J. Dürst" wrote:
> >
> > As the HTML spec defines the term "Unicode code point"[1]
> > 
> > [[
> > The term Unicode code point means a Unicode scalar value where possible,
> > and an isolated surrogate code point when not. When a conformance
> > requirement is defined in terms of characters or Unicode code points, a
> > pair of code units consisting of a high surrogate followed by a low
> > surrogate must be treated as the single code point represented by the
> > surrogate pair, but isolated surrogates must each be treated as the
> > single code point with the value of the surrogate.
> > ]]
> 
> My guess is that the HTML spec came up with a special term (it should be 
> "UTF-16 code unit", rather than "Unicode code point", but that's a 
> separate issue) because in many cases, they define their algorithms and 
> procedures in a very low-level fashion.

Yes. We didn't call it "UTF-16" anything, though, because it doesn't really
have anything to do with UTF-16 (other than UTF-16 being the reason the
surrogates exist), so having UTF-16 in the name would be quite confusing.
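Here is a minimal Python sketch of the conversion that the quoted definition describes. It assumes the input is a list of 16-bit integers, like the code units of a DOM string. The function name `code_points` is hypothetical and does not come from the HTML spec. A high surrogate that is immediately followed by a low surrogate is combined into one scalar value. Any other surrogate, meaning an isolated one, is passed through with its own value.

```python
def code_points(units):
    """Map a list of 16-bit code units to "Unicode code points" as in the
    quoted definition (hypothetical helper, not a spec-defined API)."""
    result = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate immediately followed by a low surrogate is a pair.
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # Non-surrogate unit, or an isolated surrogate: keep the value unchanged.
            result.append(u)
            i += 1
    return result


# The pair D801 DC7E encodes U+1047E; a lone D801 stays a lone surrogate.
print([hex(cp) for cp in code_points([0xD801, 0xDC7E])])        # ['0x1047e']
print([hex(cp) for cp in code_points([0x41, 0xD801, 0x42])])    # ['0x41', '0xd801', '0x42']
```

The logic depends only on the ranges of 16-bit values. Nothing in it refers to UTF-16 bytes or byte order.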

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 17 January 2012 21:47:25 UTC