
Re: On citing Unicode

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 15 Mar 2006 14:30:12 +0900
Message-Id: <6.0.0.20.2.20060315133704.08b55960@localhost>
To: "Eric Prud'hommeaux" <eric@w3.org>, Richard Ishida <ishida@w3.org>
Cc: "'Felix Sasaki'" <fsasaki@w3.org>, public-i18n-core@w3.org

At 22:17 06/03/14, Eric Prud'hommeaux wrote:
 >On Tue, Mar 14, 2006 at 12:03:05PM -0000, Richard Ishida wrote:
 >> The Unicode Standard defines 'code point' as "Any value in the Unicode code
 >> space" (p. 64), i.e., you can have unassigned code points.
 >
 >Excellent! It would be nice if that info were on the Web.  Is that
 >excerpted somewhere with a convenient anchor near it? If not, I
 >suppose I need to include it in SPARQL grammar definition.

It's on the Web, but not in HTML: it's a heavy PDF, almost as heavy as
the real thing. The pointer you want is to section 3.4,
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G2212,
Definitions D4b and D4a.
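
As a minimal sketch, assuming Python 3 (whose chr() accepts exactly the
integers 0 through 0x10FFFF), the definition Richard cites amounts to a
plain range check that says nothing about whether a character is
assigned. The helper name is_code_point is purely illustrative, and
U+0378 serves as an example of an unassigned code point:

```python
import unicodedata

# Bounds of the Unicode code space, U+0000..U+10FFFF inclusive.
CODE_SPACE_MIN = 0x0000
CODE_SPACE_MAX = 0x10FFFF


def is_code_point(value: int) -> bool:
    """Return True if value lies anywhere in the Unicode code space.

    Assignment status does not matter: unassigned values still count.
    """
    return CODE_SPACE_MIN <= value <= CODE_SPACE_MAX


if __name__ == "__main__":
    # U+0378 is unassigned (general category Cn) yet still a code point.
    print(is_code_point(0x0378), unicodedata.category(chr(0x0378)))
    # One past the top of the code space is not a code point.
    print(is_code_point(0x110000))
```

Note that the check deliberately does not consult any character
database; that is exactly what makes unassigned code points valid.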


 >> Note that CharMod refers to the full range of Unicode code points as "from
 >> U+0000 to U+10FFFF inclusive." http://www.w3.org/TR/charmod/#C070
 >
 >That almost gives me what I need, except that _Character_string_ is
 >not defined in terms of a clearly stable character set:
 >[[
 >Character string: A string viewed as a sequence of characters, each
 >represented by a code point in Unicode [Unicode].
 >]]
 >C070 and C077 say that specs should use U+0000-U+10FFFF, but CharMod
 >doesn't define a character string in terms of that range.

Yes, because while C070 warns about arbitrary exclusions, it does
allow well-motivated exclusions (such as C0 controls in XML 1.0,
#x0 in XML 1.1, and so on).
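
To show what such well-motivated exclusions look like when written out,
here is a sketch that transcribes the Char productions of XML 1.0 and
XML 1.1 into range checks. The ranges come from the two XML
Recommendations; the function and constant names are hypothetical:

```python
# XML 1.0 Char: C0 controls are excluded except TAB, LF and CR.
XML10_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

# XML 1.1 Char: C0 controls are allowed, but #x0 is still excluded.
XML11_CHAR_RANGES = [
    (0x1, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def in_ranges(cp: int, ranges) -> bool:
    """Return True if cp falls inside any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml10_char(cp: int) -> bool:
    return in_ranges(cp, XML10_CHAR_RANGES)


def is_xml11_char(cp: int) -> bool:
    return in_ranges(cp, XML11_CHAR_RANGES)


if __name__ == "__main__":
    for cp in (0x0, 0x1, 0x9, 0x41):
        print(f"U+{cp:04X}", is_xml10_char(cp), is_xml11_char(cp))
```

Both productions also leave out the surrogate block and U+FFFE/U+FFFF,
which are further exclusions with a clear rationale rather than
arbitrary ones.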

Charmod explains how you work with wood to build furniture,
but it doesn't provide ready-made parts you can just plug
together. This was done by design.

 >It does not attribute the definition of the range
 >#x00-#x10FFFF to either CharMod, as I don't see where CharMod actually
 >defines _Character_string_ as being that range, or to Unicode, as I
 >haven't read it enough to know where it states the contract to use the
 >range U+0000-U+10FFFF for a very long time.

This is not explicitly stated, at least not at, e.g.,
http://www.unicode.org/standard/stability_policy.html.
In terms of contract, it's good to put the actual numbers into
your spec, to save developers the work of looking them up.
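
When the numbers are written out in the EBNF notation used by XML
(#xN and [#xN-#xN]), developers can transcribe them mechanically. A
minimal sketch, assuming only that simple subset of the notation;
parse_char_ranges and matches are hypothetical helpers, not part of
any spec:

```python
import re

# Matches either a bracketed range "[#xLO-#xHI]" or a single "#xN".
_TOKEN = re.compile(
    r"\[#x([0-9A-Fa-f]+)-#x([0-9A-Fa-f]+)\]|#x([0-9A-Fa-f]+)"
)


def parse_char_ranges(production: str) -> list[tuple[int, int]]:
    """Turn '#x9 | [#x20-#xD7FF]'-style text into inclusive (lo, hi) tuples."""
    ranges = []
    for lo, hi, single in _TOKEN.findall(production):
        if single:
            value = int(single, 16)
            ranges.append((value, value))
        else:
            ranges.append((int(lo, 16), int(hi, 16)))
    return ranges


def matches(text: str, ranges: list[tuple[int, int]]) -> bool:
    """Return True if every character's code point lies inside the ranges."""
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in text)


if __name__ == "__main__":
    full_range = parse_char_ranges("[#x00-#x10FFFF]")
    print(full_range)
    print(matches("caf\u00e9 \U0001F600", full_range))
```

The point is not the parser itself but that explicit numbers in the
spec leave nothing for the implementer to look up.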

Regards,   Martin. 
Received on Wednesday, 15 March 2006 09:07:51 GMT