- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 26 Jan 2006 10:10:56 -0500
- To: Dave Beckett <dave@dajobe.org>
- Cc: Dan Connolly <connolly@w3.org>, public-rdf-dawg-comments@w3.org
- Message-ID: <20060126151056.GB17752@w3.org>
replying with a [CLOSED] subject to indicate comment resolution.

On Wed, Jan 25, 2006 at 07:29:50PM -0800, Dave Beckett wrote:
>
> Eric Prud'hommeaux wrote:
> > On Sun, Jan 08, 2006 at 08:54:15AM -0600, Dan Connolly wrote:
> >
> >>On Sat, 2006-01-07 at 20:01 -0800, Dave Beckett wrote:
> >>
> >>>Dan Connolly wrote:
> >>>
> >>>>On Sat, 2006-01-07 at 12:38 -0800, Dave Beckett wrote:
> >>>>
> >>>>
> >>>>>SPARQL refers to:
> >>>>>
> >>>>>[[
> >>>>>  [UNICODE]
> >>>>>  The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from
> >>>>>  time to time by the publication of new versions. The latest version of
> >>>>>  Unicode and additional information on versions of the standard and of
> >>>>>  the Unicode Character Database is available at
> >>>>>  http://www.unicode.org/unicode/standard/versions/.
> >>>>>
> >>>>>]]
> >>>>>
> >>>>>which cites a moving target. Please define SPARQL in terms of a
> >>>>>particular version of Unicode only, and no other. Otherwise if or when
> >>>>>this Unicode consortium makes some incompatible changes, all existing
> >>>>>implementations become invalid.
> >>>>
> >>>>
> >>>>How so? How is conformance to SPARQL sensitive to changes in Unicode?
> >>>
> >>>The SPARQL query syntax is defined on Unicode characters:
> >>>
> >>>[[
> >>>A. SPARQL Grammar
> >>>
> >>>A SPARQL query string is a Unicode character string (c.f. section 6.1
> >>>String concepts of [CHARMOD])
> >>>...
> >>>]]
> >>>
> >>>although the grammar defines precise ranges of codepoints for particular
> >>>things such as names of variables (based on XML 1.1 I think).
> >>>
> >>>If the definition of a Unicode character string changes in some future
> >>>Unicode revision, such as for example by allowing additional codepoints,
> >>>then there will be additional codepoints allowed in a SPARQL query
> >>>string, following the sentence above.
> >>
> >>I believe that's by design, following...
> >>
> >>"C063 [S] A generic reference to the Unicode Standard MUST be made if
> >>it is desired that characters allocated after a specification is
> >>published are usable with that specification".
> >> http://www.w3.org/TR/2005/REC-charmod-20050215/#C063
> >>
> >>I suppose I should check with the WG.
> >>
> >>
> >>>Any part of the grammar that uses a negated range such as with '[^...]'
> >>>will allow such codepoints. Examples include:
> >>>  http://www.w3.org/TR/rdf-sparql-query/#rQ_IRI_REF
> >>>and all string literals.
> >>>
> >>>These codepoints may be refused by something implementing Unicode 4.0
> >>>and no more.
> >>
> >>I suppose we need a test case that uses a codepoint that isn't currently
> >>allocated in Unicode 4.0.
> >>
> >>I still can't think of any reason why changes in Unicode specs would
> >>make any difference to SPARQL producers/consumers. It's not like
> >>they need to reference the Unicode tables to check the grammar or
> >>anything.
> >
> >
> > Due to lineage and good intentions, the SPARQL grammar mirrors the
> > XML1.1 spec. For instance, our name chars
> >   http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR1p
> > are slight liberalizations of XML name chars
> >   http://www.w3.org/TR/xml11/#NT-NameStartChar
> > Strings
> >   http://www.w3.org/2001/sw/DataAccess/rq23/#rSTRING_LITERAL1
> > are analogous to CharData
> >   http://www.w3.org/TR/xml11/#NT-CharData
> >
> > Basically, our grammar follows XML's lead and maps out the use of Unicode
> > chars from #x00 to #xEFFFF . All Unicode chars are in this range, but
> > there are lots of holes (currently undefined chars). My reading of the
> > XML spec is that the grammar is fixed as Unicode grows and fills these
> > holes. However, if Unicode extends beyond #xEFFFF, XML1.1 apps will
> > not handle these new chars.
> > To clarify this, and to address
> > Björn's comments, I will propose the following text at the top of the
> > grammar definition:
> >
> > [[
> > A SPARQL query string is a Unicode character string (c.f. section 6.1
> > String concepts of [CHARMOD]) in the language defined by the following
> > grammar, starting with the Query production. The EBNF format is the
> > same as that used in the XML 1.1 specification[XML11]. Please see the
> > "Notation" section of that specification for specific information
> > about the notation.
> >
> > [ Informative: this specification maps out the usage of Unicode
> >   characters between #x00 and #xEFFFF. Excluded character sets,
> >   for example "[^<>'{}|^`]", indicate the range of [#x00-#xEFFFF] minus
> >   the listed characters. This specification does not include any
> >   future Unicode characters outside of the range [#x00-#xEFFFF]. ]
> >
> > The following sections list all additional constraints on a valid
> > SPARQL query:
> > ...
> > A.5 Escape sequences in strings
> >
> > Escaped characters in strings (STRING_LITERAL1, STRING_LITERAL2,
> > STRING_LITERAL_LONG1, STRING_LITERAL_LONG2) must be in the character
> > ranges defined by those rules.
> > ]]
> >
> > Dave, Björn, what do you think?
>
> That change is OK with me. I guess having found out from your
> description more about how future Unicode changes will occur, I would be
> quite happy with no change to the text if that suits you. The
> informative addition helps this understanding.

My preference is to have that text in the spec so that other folks
will gain the same understanding. I assume from your response that
that is also acceptable. The XML1.1 spec does not explain this and I
think it leaves people wondering.

> [ Your message didn't seem to be addressed to Björn ]

Björn sees all.

Actually, in response to that point, I mailed Björn directly with a
link to this thread. tx.
> Thanks
>
> Dave

--
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 26 January 2006 15:11:05 UTC
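
The informative note proposed in the message above says that an excluded
character set such as "[^<>'{}|^`]" denotes the range [#x00-#xEFFFF] minus
the listed characters, regardless of which codepoints Unicode has assigned
so far. A minimal Python sketch of that reading follows. The helper names
(in_excluded_set_range, is_assigned, first_unassigned) are hypothetical and
are not taken from the SPARQL or XML specifications; the upper bound is the
one stated in the message.

```python
import unicodedata

# Upper bound stated in the proposed informative note (taken from the
# message, not from the Unicode Standard itself).
MAX_CODEPOINT = 0xEFFFF

# The characters listed in the note's example set "[^<>'{}|^`]".
EXCLUDED = set("<>'{}|^`")


def in_excluded_set_range(ch: str) -> bool:
    """True if ch lies in [#x00-#xEFFFF] minus the listed characters.

    The check looks only at the codepoint value, so it never consults
    the Unicode character tables.
    """
    return ord(ch) <= MAX_CODEPOINT and ch not in EXCLUDED


def is_assigned(ch: str) -> bool:
    """True if the local Unicode database has a category other than Cn."""
    return unicodedata.category(ch) != "Cn"


def first_unassigned() -> str:
    """Return the lowest codepoint in range that is a current 'hole'."""
    return next(
        chr(cp) for cp in range(MAX_CODEPOINT + 1) if not is_assigned(chr(cp))
    )
```

Under this reading, first_unassigned() returns a character that
in_excluded_set_range accepts even though is_assigned reports it as a hole,
while chr(0x10FFFF) is rejected because it lies above the stated bound.
Which codepoint counts as the first hole depends on the Unicode version
bundled with the Python interpreter.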