Re: [OK?] Re: SPARQL and Unicode versions

From: Dave Beckett <dave@dajobe.org>
Date: Wed, 25 Jan 2006 19:29:50 -0800
Message-ID: <43D8422E.1010009@dajobe.org>
To: Eric Prud'hommeaux <eric@w3.org>
CC: Dan Connolly <connolly@w3.org>, public-rdf-dawg-comments@w3.org

Eric Prud'hommeaux wrote:
> On Sun, Jan 08, 2006 at 08:54:15AM -0600, Dan Connolly wrote:
>>On Sat, 2006-01-07 at 20:01 -0800, Dave Beckett wrote:
>>>Dan Connolly wrote:
>>>>On Sat, 2006-01-07 at 12:38 -0800, Dave Beckett wrote:
>>>>>SPARQL refers to:
>>>>> [UNICODE]
>>>>>   The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from
>>>>> time to time by the publication of new versions. The latest version of
>>>>> Unicode and additional information on versions of the standard and of
>>>>> the Unicode Character Database is available at
>>>>> http://www.unicode.org/unicode/standard/versions/.
>>>>>which cites a moving target.  Please define SPARQL in terms of a
>>>>>particular version of Unicode only, and no other.  Otherwise if or when
>>>>>this Unicode consortium makes some incompatible changes, all existing
>>>>>implementations become invalid.
>>>>How so? How is conformance to SPARQL sensitive to changes in Unicode?
>>>The SPARQL query syntax is defined on Unicode characters:
>>>A. SPARQL Grammar
>>>A SPARQL query string is a Unicode character string (c.f. section 6.1
>>>String concepts of [CHARMOD])
>>>although the grammar defines precise ranges of codepoints for particular
>>>things such as names of variables (based on XML 1.1 I think).
>>>If the definition of a Unicode character string changes in some future
>>>Unicode revision, such as for example by allowing additional codepoints,
>>>then there will be additional codepoints allowed in a SPARQL query
>>>string, following the sentence above.
>>I believe that's by design, following...
>>"C063  [S]  A generic reference to the Unicode Standard MUST be made if
>>it is desired that characters allocated after a specification is
>>published are usable with that specification".
>>  http://www.w3.org/TR/2005/REC-charmod-20050215/#C063
>>I suppose I should check with the WG.
>>>Any part of the grammar that uses a negated range such as with '[^...]'
>>>will allow such codepoints.  Examples include:
>>>  http://www.w3.org/TR/rdf-sparql-query/#rQ_IRI_REF
>>>and all string literals.
>>>These codepoints may be refused by something implementing Unicode 4.0
>>>and no more.
>>I suppose we need a test case that uses a codepoint that isn't currently
>>allocated in Unicode 4.0.
>>I still can't think of any reason why changes in Unicode specs would
>>make any difference to SPARQL producers/consumers. It's not like
>>they need to reference the Unicode tables to check the grammar or
> Due to lineage and good intentions, the SPARQL grammar mirrors the
> XML1.1 spec. For instance, our name chars
>   http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR1p
> are slight liberalizations of XML name chars
>   http://www.w3.org/TR/xml11/#NT-NameStartChar
> Strings
>   http://www.w3.org/2001/sw/DataAccess/rq23/#rSTRING_LITERAL1
> are analogous to CharData
>   http://www.w3.org/TR/xml11/#NT-CharData
> Basically, our grammar follows XML's lead and maps out the use of Unicode
> chars from #x00 to #xEFFFF . All Unicode chars are in this range, but
> there are lots of holes (currently undefined chars). My reading of the
> XML spec is that the grammar is fixed as Unicode grows and fills these
> holes. However, if Unicode extends beyond #xEFFFF, XML1.1 apps will
> not handle these new chars. To clarify this, and to address
> Björn's comments, I will propose the following text at the top of the
> grammar definition:
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production.  The EBNF format is the
> same as that used in the XML 1.1 specification[XML11]. Please see the
> "Notation" section of that specification for specific information
> about the notation.
> [ Informative: this specification maps out the usage of Unicode
> characters between #x00 and #xEFFFF. Excluded character sets,
> for example "[^<>'{}|^`]", indicate the range of [#x00-#xEFFFF] minus
> the listed characters. This specification does not include any
> future Unicode characters outside of the range [#x00-#xEFFFF]. ]
> The following sections list all additional constraints on a valid
> SPARQL query:
> ...
> A.5 Escape sequences in strings
> Escaped characters in strings (STRING_LITERAL1, STRING_LITERAL2,
> STRING_LITERAL_LONG1, STRING_LITERAL_LONG2) must be in the character
> ranges defined by those rules.
> ]]
> Dave, Björn, what do you think?

That change is OK with me.  Having learned from your description how
future Unicode changes will be handled, I would also be quite happy with
no change to the text, if that suits you.  The informative addition
helps this understanding.

[ Your message didn't seem to be addressed to Björn ]
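
As an illustration of the reading discussed above, here is a minimal
sketch (not taken from the SPARQL specification or any implementation) of
how an excluded character set can be checked against a fixed codepoint
range rather than against the Unicode character tables.  The names
MAX_CODEPOINT, EXCLUDED and allowed are hypothetical; the upper bound
#xEFFFF and the example set "[^<>'{}|^`]" are taken from the proposed
informative text.

```python
import re

# Upper bound of the range as stated in the proposed informative text.
MAX_CODEPOINT = 0xEFFFF

# The example excluded character set from the proposed text: [^<>'{}|^`]
# The leading ^ negates the class; the later ^ is a literal character.
EXCLUDED = re.compile(r"[^<>'{}|^`]")


def allowed(ch: str) -> bool:
    """Return True if ch lies in [#x00-#xEFFFF] minus the listed characters.

    The check looks only at the codepoint value.  It never consults the
    Unicode character database, so a codepoint that is unassigned today
    is accepted in the same way as an assigned one.
    """
    if len(ch) != 1:
        return False
    if ord(ch) > MAX_CODEPOINT:
        return False
    return EXCLUDED.fullmatch(ch) is not None


if __name__ == "__main__":
    for sample in ["a", "<", "^", "\u0378", chr(0xEFFFF), chr(0xF0000)]:
        print(f"U+{ord(sample):04X}", allowed(sample))
```

Under this sketch, a codepoint that a later Unicode version assigns inside
the range needs no change to the check, while anything above #xEFFFF is
rejected regardless of Unicode version.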


Received on Thursday, 26 January 2006 03:29:57 UTC
