- From: Mark Davis <mark.davis@icu-project.org>
- Date: Tue, 14 Mar 2006 19:12:13 -0800
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Richard Ishida <ishida@w3.org>, 'Felix Sasaki' <fsasaki@w3.org>, public-i18n-core@w3.org
To be absolutely correct, you would write:

[[
Unicode String: A sequence of Unicode code points [Unicode]. Also known
as a 'character string': note however that a Unicode String may contain
code points that are reserved (that is, not assigned to characters), to
allow for compatibility with future versions of Unicode that may assign
them.
]]

[[
A SPARQL query string is a Unicode string (c.f. section 6.1 String
concepts of [CHARMOD]) in the language defined by the following
grammar, starting with the Query production. For productions with
excluded character classes (for example "[^<>'{}|^`]"), the characters
are excluded from the range #x0000 - #x10FFFF.
]]

Eric Prud'hommeaux wrote:
> On Tue, Mar 14, 2006 at 12:03:05PM -0000, Richard Ishida wrote:
>
>>> This point about *assigned* code points is the crux of my argument.
>>> If it is wrong, and "code point" includes unassigned code
>>> points, we don't need this extra text in SPARQL.
>>>
>> The Unicode Standard defines 'code point' as "Any value in the Unicode code
>> space" (p.64). i.e. you can have unassigned code points.
>>
>
> Excellent! It would be nice if that info were on the web. Is that
> excerpted somewhere with a convenient anchor near it? If not, I
> suppose I need to include it in the SPARQL grammar definition.
>
>
>> Note that CharMod refers to the full range of Unicode code points as "from
>> U+0000 to U+10FFFF inclusive." http://www.w3.org/TR/charmod/#C070
>>
>
> That almost gives me what I need, except that _Character_string_ is
> not defined in terms of a clearly stable character set:
> [[
> Character string: A string viewed as a sequence of characters, each
> represented by a code point in Unicode [Unicode].
> ]]
> C070 and C077 say that specs should use U+0000-U+10FFFF but charmod
> doesn't define a character string in terms of that range except by
> suggesting that you cite an evolving document. We know, by social
> context, that the Unicode consortium will only fill in code points in
> that range for the foreseeable future, so _Character_string_ is good
> for the same time. Not all readers of the spec share that social
> context. I'm looking for the specific words to add to give them that.
>
> Do you think that the text
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. For compatibility with
> future versions of Unicode, the characters in this string may include
> unassigned Unicode codepoints (see Identifier and Pattern Syntax
> [UNIID] section 4 Pattern Syntax). For productions with excluded
> character classes (for example "[^<>'{}|^`]"), the characters are
> excluded from the range #x00 - #x10FFFF.
> ]]
> is sufficient? It does not attribute the definition of the range
> #x00-#x10FFFF to either CharMod, as I don't see where CharMod actually
> defines _Character_string_ as being that range, or to Unicode, as I
> haven't read it enough to know where it states the contract to use the
> range U+0000-U+10FFFF for a very long time.
>
> So, advice on wording is actively solicited. The WG may be
> approving this text in 62 minutes.
>
>
>> ============
>> Richard Ishida
>> Internationalization Lead
>> W3C (World Wide Web Consortium)
>>
>> http://www.w3.org/People/Ishida/
>> http://www.w3.org/International/
>> http://people.w3.org/rishida/blog/
>> http://www.flickr.com/photos/ishida/
>>
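To make the two points above concrete (a string may carry unassigned code points, and an excluded character class is taken relative to the whole range U+0000 to U+10FFFF), here is a minimal Python sketch. The helper names `is_assigned` and `in_excluded_class` are hypothetical, introduced only for illustration, and the "assigned" check depends on whichever Unicode version the running Python ships with; this is not a SPARQL implementation.

```python
import re
import unicodedata

# Upper bound of the Unicode code space, as cited from CharMod C070.
MAX_CODE_POINT = 0x10FFFF

# The example excluded character class quoted in the proposed grammar text.
# The first '^' negates the class; the second '^' is a literal character.
EXCLUDED_CLASS = re.compile(r"[^<>'{}|^`]")


def is_assigned(cp: int) -> bool:
    """Return True if this Python's Unicode database assigns the code point.

    General category 'Cn' means "not assigned" (reserved or noncharacter).
    The answer can change between Unicode versions, which is the concern
    raised in the thread.
    """
    return unicodedata.category(chr(cp)) != "Cn"


def in_excluded_class(cp: int) -> bool:
    """Return True if the code point matches the negated class.

    The class is interpreted over the whole code space 0..MAX_CODE_POINT,
    regardless of whether the code point is currently assigned.
    """
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return EXCLUDED_CLASS.fullmatch(chr(cp)) is not None


if __name__ == "__main__":
    for cp in (ord("A"), ord("<"), 0x0378, MAX_CODE_POINT):
        print(f"U+{cp:04X} assigned={is_assigned(cp)} "
              f"in_class={in_excluded_class(cp)}")
```

Under this reading, a reserved code point such as U+0378 is accepted by the class today, and remains accepted if a later Unicode version assigns it, so the grammar does not need to change when Unicode does.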
Received on Wednesday, 15 March 2006 03:12:34 UTC