- From: Dave Beckett <dave@dajobe.org>
- Date: Wed, 25 Jan 2006 19:29:50 -0800
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Dan Connolly <connolly@w3.org>, public-rdf-dawg-comments@w3.org
Eric Prud'hommeaux wrote:
> On Sun, Jan 08, 2006 at 08:54:15AM -0600, Dan Connolly wrote:
>
>>On Sat, 2006-01-07 at 20:01 -0800, Dave Beckett wrote:
>>
>>>Dan Connolly wrote:
>>>
>>>>On Sat, 2006-01-07 at 12:38 -0800, Dave Beckett wrote:
>>>>
>>>>
>>>>>SPARQL refers to:
>>>>>
>>>>>[[
>>>>> [UNICODE]
>>>>> The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from
>>>>> time to time by the publication of new versions. The latest version of
>>>>> Unicode and additional information on versions of the standard and of
>>>>> the Unicode Character Database is available at
>>>>> http://www.unicode.org/unicode/standard/versions/.
>>>>>
>>>>>]]
>>>>>
>>>>>which cites a moving target. Please define SPARQL in terms of a
>>>>>particular version of Unicode only, and no other. Otherwise if or when
>>>>>this Unicode consortium makes some incompatible changes, all existing
>>>>>implementations become invalid.
>>>>
>>>>
>>>>How so? How is conformance to SPARQL sensitive to changes in Unicode?
>>>
>>>The SPARQL query syntax is defined on Unicode characters:
>>>
>>>[[
>>>A. SPARQL Grammar
>>>
>>>A SPARQL query string is a Unicode character string (c.f. section 6.1
>>>String concepts of [CHARMOD])
>>>...
>>>]]
>>>
>>>although the grammar defines precise ranges of codepoints for particular
>>>things such as names of variables (based on XML 1.1 I think).
>>>
>>>If the definition of a Unicode character string changes in some future
>>>Unicode revision, such as for example by allowing additional codepoints,
>>>then there will be additional codepoints allowed in a SPARQL query
>>>string, following the sentence above.
>>
>>I believe that's by design, following...
>>
>>"C063 [S] A generic reference to the Unicode Standard MUST be made if
>>it is desired that characters allocated after a specification is
>>published are usable with that specification".
>> http://www.w3.org/TR/2005/REC-charmod-20050215/#C063
>>
>>I suppose I should check with the WG.
>>
>>
>>>Any part of the grammar that uses a negated range such as with '[^...]'
>>>will allow such codepoints. Examples include:
>>> http://www.w3.org/TR/rdf-sparql-query/#rQ_IRI_REF
>>>and all string literals.
>>>
>>>These codepoints may be refused by something implementing Unicode 4.0
>>>and no more.
>>
>>I suppose we need a test case that uses a codepoint that isn't currently
>>allocated in Unicode 4.0.
>>
>>I still can't think of any reason why changes in Unicode specs would
>>make any difference to SPARQL producers/consumers. It's not like
>>they need to reference the Unicode tables to check the grammar or
>>anything.
>
>
> Due to lineage and good intentions, the SPARQL grammar mirrors the
> XML1.1 spec. For instance, our name chars
>   http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR1p
> are slight liberalizations of XML name chars
>   http://www.w3.org/TR/xml11/#NT-NameStartChar
> Strings
>   http://www.w3.org/2001/sw/DataAccess/rq23/#rSTRING_LITERAL1
> are analogous to CharData
>   http://www.w3.org/TR/xml11/#NT-CharData
>
> Basically, our grammar follows XML's lead and maps out the use of Unicode
> chars from #x00 to #xEFFFF. All Unicode chars are in this range, but
> there are lots of holes (currently undefined chars). My reading of the
> XML spec is that the grammar is fixed as Unicode grows and fills these
> holes. However, if Unicode extends beyond #xEFFFF, XML1.1 apps will
> not handle these new chars. To clarify this, and to address
> Björn's comments, I will propose the following text at the top of the
> grammar definition:
>
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. The EBNF format is the
> same as that used in the XML 1.1 specification[XML11]. Please see the
> "Notation" section of that specification for specific information
> about the notation.
>
> [ Informative: this specification maps out the usage of Unicode
> characters between #x00 and #xEFFFF. Excluded character sets,
> for example "[^<>'{}|^`]", indicate the range of [#x00-#xEFFFF] minus
> the listed characters. This specification does not include any
> future Unicode characters outside of the range [#x00-#xEFFFF]. ]
>
> The following sections list all additional constraints on a valid
> SPARQL query:
> ...
> A.5 Escape sequences in strings
>
> Escaped characters in strings (STRING_LITERAL1, STRING_LITERAL2,
> STRING_LITERAL_LONG1, STRING_LITERAL_LONG2) must be in the character
> ranges defined by those rules.
> ]]
>
> Dave, Björn, what do you think?

That change is OK with me.

Having found out more from your description about how future Unicode
changes will occur, I would also be quite happy with no change to the
text if that suits you. The informative addition helps this understanding.

[ Your message didn't seem to be addressed to Björn ]

Thanks

Dave
Received on Thursday, 26 January 2006 03:29:57 UTC
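
The point about negated ranges and "holes" can be illustrated with a
small sketch. It is not taken from the SPARQL specification or from any
implementation: the names MAX_CODEPOINT, EXCLUDED, allowed and
first_unassigned are hypothetical, the excluded set is only the literal
example quoted in the proposed informative note, and which codepoints
count as unassigned depends on the Unicode version bundled with the
Python interpreter running it.

```python
import re
import unicodedata

# Upper bound taken from the proposed informative note (assumption: the
# grammar is read as covering [#x00-#xEFFFF] and nothing beyond it).
MAX_CODEPOINT = 0xEFFFF

# The excluded set quoted in the note, "[^<>'{}|^`]". This is only the
# example from the thread, not a full SPARQL production.
EXCLUDED = "<>'{}|^`"
NEGATED_CLASS = re.compile("[^" + re.escape(EXCLUDED) + "]")


def allowed(ch: str) -> bool:
    """True if ch lies in [#x00-#xEFFFF] and is not in the excluded set."""
    return ord(ch) <= MAX_CODEPOINT and NEGATED_CLASS.fullmatch(ch) is not None


def first_unassigned() -> str:
    """Return the first codepoint the local Unicode database marks as Cn.

    Cn is the general category for unassigned codepoints, i.e. a "hole"
    in the sense used in the thread. The result varies with the Unicode
    version shipped with the interpreter.
    """
    for cp in range(MAX_CODEPOINT + 1):
        if unicodedata.category(chr(cp)) == "Cn":
            return chr(cp)
    raise LookupError("no unassigned codepoint found in range")


if __name__ == "__main__":
    hole = first_unassigned()
    print(f"U+{ord(hole):04X} unassigned, allowed: {allowed(hole)}")
    print(f"'<' allowed: {allowed('<')}")
    print(f"U+F0000 allowed: {allowed(chr(0xF0000))}")
```

Under these assumptions the check never consults the Unicode tables: an
unassigned codepoint inside the range passes simply because it is not in
the excluded set, while anything above #xEFFFF is rejected by the bound
alone. The call to unicodedata is only there to find a "hole" to test with.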