On Tue, 17 Jan 2012, "Martin J. Dürst" wrote:
> >
> > As the HTML spec defines the term "Unicode code point"[1]
> >
> > [[
> > The term Unicode code point means a Unicode scalar value where possible,
> > and an isolated surrogate code point when not. When a conformance
> > requirement is defined in terms of characters or Unicode code points, a
> > pair of code units consisting of a high surrogate followed by a low
> > surrogate must be treated as the single code point represented by the
> > surrogate pair, but isolated surrogates must each be treated as the
> > single code point with the value of the surrogate.
> > ]]
>
> My guess is that the HTML spec came up with a special term (it should be
> "UTF-16 code unit", rather than "Unicode code point", but that's a
> separate issue) because in many cases, they define their algorithms and
> procedures in a very low-level fashion.

Yes. We didn't call it "UTF-16" anything, though, because it doesn't really
have anything to do with UTF-16 (other than that UTF-16 is the reason
surrogates exist), and having UTF-16 in the name would therefore be quite
confusing.
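
To make the quoted definition concrete, here is a minimal sketch of the
grouping rule it describes. It is an illustration only, not text from the
HTML spec: the function name code_units_to_code_points and the choice to
represent the input as a list of 16-bit integers are assumptions made for
the example.

```python
from typing import Iterable, List

HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (lead) surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trail) surrogate range


def code_units_to_code_points(units: Iterable[int]) -> List[int]:
    """Group 16-bit code units into code points (hypothetical helper).

    A high surrogate followed by a low surrogate becomes one scalar value;
    any other unit, including an isolated surrogate, is kept as the code
    point with that unit's own value.
    """
    units = list(units)
    points: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if (HIGH_START <= unit <= HIGH_END
                and nxt is not None
                and LOW_START <= nxt <= LOW_END):
            # Well-formed pair: combine into a single supplementary code point.
            points.append(0x10000
                          + ((unit - HIGH_START) << 10)
                          + (nxt - LOW_START))
            i += 2
        else:
            # BMP code unit or isolated surrogate: passed through unchanged.
            points.append(unit)
            i += 1
    return points
```

For example, the pair 0xD801 0xDC7E comes out as the single code point
U+1047E, while 0xD801 followed by 0x0041 comes out as two code points,
the isolated surrogate 0xD801 and then U+0041.
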
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'