- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Tue, 14 Mar 2006 08:17:13 -0500
- To: Richard Ishida <ishida@w3.org>
- Cc: 'Felix Sasaki' <fsasaki@w3.org>, public-i18n-core@w3.org
- Message-ID: <20060314131713.GF20832@w3.org>
On Tue, Mar 14, 2006 at 12:03:05PM -0000, Richard Ishida wrote:
> > *This point about *assigned* code points is the crux of my argument.
> > If it is wrong, and "code point" includes unassigned code
> > points, we don't need this extra text in SPARQL.
>
> The Unicode Standard defines 'code point' as "Any value in the Unicode code
> space" (p.64). ie. you can have unassigned code points.

Excellent! It would be nice if that info were on the web. Is that excerpted
somewhere with a convenient anchor near it? If not, I suppose I need to
include it in the SPARQL grammar definition.

> Note that CharMod refers to the full range of Unicode code points as "from
> U+0000 to U+10FFFF inclusive." http://www.w3.org/TR/charmod/#C070

That almost gives me what I need, except that _Character_string_ is not
defined in terms of a clearly stable character set:

[[
Character string: A string viewed as a sequence of characters, each
represented by a code point in Unicode [Unicode].
]]

C070 and C077 say that specs should use U+0000-U+10FFFF, but charmod
doesn't define a character string in terms of that range except by
suggesting that you cite an evolving document. We know, by social
context, that the Unicode Consortium will only fill in code points in
that range for the foreseeable future, so _Character_string_ is good for
the same time. Not all readers of the spec share that social context.
I'm looking for the specific words to add to give them that.

Do you think that the text
[[
A SPARQL query string is a Unicode character string (c.f. section 6.1
String concepts of [CHARMOD]) in the language defined by the following
grammar, starting with the Query production. For compatibility with
future versions of Unicode, the characters in this string may include
unassigned Unicode codepoints (see Identifier and Pattern Syntax [UNIID]
section 4 Pattern Syntax). For productions with excluded character
classes (for example "[^<>'{}|^`]"), the characters are excluded from the
range #x00 - #x10FFFF.
]]
is sufficient? It does not attribute the definition of the range
#x00-#x10FFFF to either CharMod, as I don't see where CharMod actually
defines _Character_string_ as being that range, or to Unicode, as I
haven't read it enough to know where it states the contract to use the
range U+0000-U+10FFFF for a very long time.

So, advice on wording is actively solicited. The WG may be approving
this text in 62 minutes.

> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>
>

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Tuesday, 14 March 2006 13:17:22 UTC
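
To illustrate the last sentence of the proposed text, here is a minimal Python sketch of how an excluded character class could be read against the fixed range #x00 - #x10FFFF. It is not taken from the SPARQL grammar or any implementation: the names MAX_CODE_POINT, EXCLUDED and in_class are hypothetical, and U+0378 is only assumed to be unassigned in the Unicode versions current at the time of editing.

```python
import re

# Upper bound of the Unicode code space, per the "U+0000 to U+10FFFF
# inclusive" wording quoted from CharMod C070.
MAX_CODE_POINT = 0x10FFFF

# The excluded character class used as the example in the proposed text.
# Inside the brackets, the second '^' and the '|' are literal characters.
EXCLUDED = re.compile(r"[^<>'{}|^`]")


def in_class(code_point: int) -> bool:
    """Return True if code_point belongs to the excluded character class.

    Membership is decided against the whole range 0x0 to 0x10FFFF, whether
    or not the code point is currently assigned to a character.
    """
    if not 0 <= code_point <= MAX_CODE_POINT:
        return False  # outside the Unicode code space entirely
    return EXCLUDED.fullmatch(chr(code_point)) is not None


if __name__ == "__main__":
    # 0x0378 is assumed here to be an unassigned code point.
    for cp in (ord("a"), ord("<"), 0x0378, MAX_CODE_POINT, 0x110000):
        print(f"U+{cp:04X}: {in_class(cp)}")
```

Under this reading, an unassigned code point is accepted simply because it lies inside the range and is not one of the listed characters, which is the forward-compatibility behaviour the proposed wording is after.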