Re: SPARQL and Unicode versions from Dave Beckett on 2006-01-08 (public-rdf-dawg-comments@w3.org from January 2006)

From: Dave Beckett <dave@dajobe.org>
Date: Sat, 07 Jan 2006 20:01:40 -0800
To: Dan Connolly <connolly@w3.org>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <43C08EA4.3060406@dajobe.org>

Dan Connolly wrote:
> On Sat, 2006-01-07 at 12:38 -0800, Dave Beckett wrote:
> 
>>SPARQL refers to:
>>
>>[[
>>  [UNICODE]
>>    The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from
>>  time to time by the publication of new versions. The latest version of
>>  Unicode and additional information on versions of the standard and of
>>  the Unicode Character Database is available at
>>  http://www.unicode.org/unicode/standard/versions/.
>>
>>]]
>>
>>which cites a moving target.  Please define SPARQL in terms of a
>>particular version of Unicode only, and no other.  Otherwise if or when
>>this Unicode consortium makes some incompatible changes, all existing
>>implementations become invalid.
> 
> 
> How so? How is conformance to SPARQL sensitive to changes in Unicode?

The SPARQL query syntax is defined on Unicode characters:

[[
A. SPARQL Grammar

A SPARQL query string is a Unicode character string (c.f. section 6.1
String concepts of [CHARMOD])
...
]]

although the grammar defines precise ranges of codepoints for particular
things such as names of variables (based on XML 1.1 I think).

If the definition of a Unicode character string changes in some future
Unicode revision, for example by allowing additional codepoints, then
those additional codepoints will also be allowed in a SPARQL query
string, following the sentence quoted above.

Any part of the grammar that uses a negated range such as '[^...]'
will allow such codepoints.  Examples include:
  http://www.w3.org/TR/rdf-sparql-query/#rQ_IRI_REF
and all string literals.

An implementation that supports Unicode 4.0 and nothing later may
refuse these codepoints.
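
To make the negated-range point concrete, here is a rough sketch in
Python. The pattern is only an approximation of an IRI reference
production of the '<' [^...]* '>' shape, not the exact text of the
SPARQL draft, and the names IRI_REF and accepts are made up for the
illustration. It shows that a negated class lets through any codepoint
it does not explicitly exclude, including one (U+1F600) that was
assigned well after Unicode 4.0.

```python
import re

# Assumed approximation of a negated-range production:
# '<', then any characters except a small excluded set and
# U+0000 to U+0020, then '>'. Not the literal SPARQL grammar rule.
IRI_REF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def accepts(codepoint: int) -> bool:
    """Return True if the codepoint is allowed inside the sketch IRI_REF."""
    candidate = "<http://example.org/" + chr(codepoint) + ">"
    return IRI_REF.fullmatch(candidate) is not None


if __name__ == "__main__":
    # U+0041 is an ordinary letter, U+003C and U+0020 are excluded,
    # U+1F600 was not assigned in Unicode 4.0.
    for cp in (0x41, 0x3C, 0x20, 0x1F600):
        print(f"U+{cp:04X} accepted: {accepts(cp)}")
```

The pattern never mentions U+1F600, yet it is accepted simply because
it is not in the excluded set. A processor that validates its input
against the Unicode 4.0 repertoire first could reject the same query
string before the grammar is ever consulted.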

Dave

Received on Sunday, 8 January 2006 04:02:02 UTC