Re: Small items for SPARQL from Seaborne, Andy on 2004-12-17 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 17 Dec 2004 21:21:56 +0000
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Dan Connolly <connolly@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <41C34DF4.3000903@hp.com>
Seaborne, Andy wrote:
> 
> 
> 
> Dan Connolly wrote:
> 
>> On Fri, 2004-12-17 at 17:56 +0000, Seaborne, Andy wrote:
>>
>>> 1/ Character sets
>>>
>>> I propose SPARQL queries use UTF-8
>>
>>
>>
>> SPARQL queries are sequences of characters; how they're
>> encoded is a protocol issue, right?
> 
> 
> Yes, noting that Content-Type does not apply to the request URI.  And 
> RFC 2396 is a bit vague on the matter as to the charset of what is being 
> encoded.
> 
> Literals can contain any Unicode character and there are no 
> distinguishing markers.  I think this means we have to choose one encoding.
> 
> As currently stated, the SPARQL query language syntax uses XML 1.1 
> qnames, which include a wide range of Unicode characters.
> 
> What does IRI say?  Any suggestions from that direction?

More mundanely, queries might be written into files on disk (like in the 
test suite!).  A single convention would be less confusing.

Nearby:

N3 is defined for files over UTF-8:
"N3 files are encoded in UTF-8"
http://www.w3.org/DesignIssues/Notation3.html

N-triples uses US-ASCII

The Turtle grammar is over UNICODE.  It doesn't specify an encoding for a 
file explicitly but does say:
"""
the content encoding of Turtle content is always UTF-8.
"""

I will follow this trend and put in rq23/ that the grammar is over the 
UNICODE character set, and say that the files are encoded in UTF-8.
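
As a rough illustration of the file convention, here is a minimal Python
sketch that always writes and reads query files as UTF-8 rather than
relying on the platform default charset.  The function names, the file
name and the query text are made up for the example; none of them come
from rq23.

```python
# Hypothetical query text; "\u00eb" is a non-ASCII character in a literal.
QUERY_TEXT = 'SELECT ?x WHERE { ?x <http://example.org/name> "Zo\u00eb" }'


def write_query_file(path, text):
    """Write a query to disk, always encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_query_file(path):
    """Read a query from disk, always decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Naming the encoding explicitly on both sides avoids the case where a
file written on one platform is reopened with a different default.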

	Andy

> 
>>
>> i.e. under the "chair expects editor to respond to each
>> proposal to change that editor's spec; others in the
>> WG are welcome to advise; chair steps in if consensus
>> does not emerge" sort of game, I'm watching for Kendall's response.
>>
>>
>>> This allows multi (natural) language queries.
>>>
>>> HTTP GET will have to be encoded as usual - we do need to decide the
>>> string being encoded.
>>>
>>> In HTTP POST, Content-Type applies to the entity body.
>>> A request sent by HTTP POST may use Content-Type to change the charset.
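
To make the GET point concrete, a small Python sketch, assuming the
string being escaped is the UTF-8 encoding of the query.  The function
names are invented for the example; `quote` and `unquote` are the
standard `urllib.parse` functions.

```python
from urllib.parse import quote, unquote


def encode_query_for_get(query, charset="utf-8"):
    """Percent-encode the bytes of the query in the given charset.

    safe="" means reserved characters such as "/" are escaped as well.
    """
    return quote(query, safe="", encoding=charset)


def decode_query_from_get(value, charset="utf-8"):
    """Reverse the escaping; the receiver must assume the same charset."""
    return unquote(value, encoding=charset)
```

The same character escapes differently depending on the charset chosen
(for example, U+00E9 becomes %C3%A9 under UTF-8 but %E9 under
ISO-8859-1), which is why the choice has to be fixed somewhere.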
>>>
>>> Experiences with declaring the charset in the content show
>>> this to be very error prone:
>>>
>>>       a/ it may disagree with the HTTP header
>>>
>>>       b/ once opened in one fashion, say the default platform charset,
>>>          it can be hard to reopen in another fashion: the underlying
>>>          stream may be buffered.
>>>
>>> Aside: as the syntax currently stands (a keyword must be first), it is
>>> possible to snoop and tell the difference between UTF-8 and UTF-16.
>>>
>>>
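
A sketch of the snooping mentioned in the aside, in Python.  It assumes
the query starts with at least two ASCII characters (a keyword, or
whitespace before it), so in UTF-16 one byte of the first pair is zero.
The function name is invented, and checking for a byte order mark first
is my addition.

```python
def sniff_query_encoding(data):
    """Guess the encoding of raw query bytes from the first few bytes.

    Returns a Python codec name suitable for bytes.decode().
    """
    if len(data) < 2:
        raise ValueError("need at least two bytes to guess the encoding")
    # A byte order mark, if present, settles the question directly.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # Otherwise rely on the first character being ASCII.
    first, second = data[0], data[1]
    if first == 0 and second != 0:
        return "utf-16-be"
    if first != 0 and second == 0:
        return "utf-16-le"
    return "utf-8"
```

This only separates UTF-8 from UTF-16; it cannot tell UTF-8 apart from
other ASCII-compatible charsets.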
>>> 2/ We will need a URI for SPARQL
>>
>>
>>
>> I'm not so sure.
>>
>> My implementation experience suggests we choose
>> a URI for the relationship between
>> a KB and a SPARQL query for that KB.
> 
> 
> In Joseki, there is a URI for the language and this is associated with a 
> KB/service by a property.  This fits with the "query-lang=" parameter 
> but if you wish to define that as the relationship between SPARQL and 
> any KB then fine.
> 
> Whichever, "SPARQL" is a concept so we should give it a URI so people 
> can reference it anyway.
> 
> 
>>
>>
>>> Suggestions:
>>> http://www.w3.org/2001/sw/DataAccess/SPARQL
>>>
>>> (We might want to allow for future revisions but I assume a new WG 
>>> would have a new URI itself so versioning isn't needed here).
>>>
>>>
>>> 3/ Relative URIs
>>>
>>> Queries would need a base URI to resolve any relative URIs.
>>
>>
>>
>> would... subjunctive...
>> is this an issue in the current draft?
>>
>> I can't tell from the grammar...
>>   http://www.w3.org/2001/sw/DataAccess/rq23/#term-sparql-URI
>>   $Revision: 1.160 $ of $Date: 2004/12/17 18:16:17 $
>>
>> I suggest uriRef as the terminal name, if relative URI references
>> are, by intent, allowed.
>>
>> Hmm... we don't currently specify how the syntactic productions
>> relate to the formal definitions, do we?
>>
>>
>>
>>> We can either say "no relative URIs" (that might make the tests 
>>> harder if we follow the style of the manifests in using relative URIs).
>>>
>>> For the protocol, "query-uri=" is a natural default base but there 
>>> isn't a natural one in all situations like local queries from a 
>>> program or one sent as plain "query="
>>>
>>> I suggest a BASE clause in the QL that must be before PREFIXes.  It 
>>> takes a single, <> quoted URI. It is not required in every query.
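
To illustrate what a base URI would buy us, a minimal Python sketch of
resolving URI references against an optional base.  The function name
and the example.org URIs are hypothetical; `urljoin` and `urlsplit` are
the standard `urllib.parse` functions, not anything defined by the
draft.

```python
from urllib.parse import urljoin, urlsplit


def resolve_uri(ref, base=None):
    """Resolve a URI reference from a query against an optional base URI.

    Absolute URIs are returned unchanged.  A relative reference with no
    base available is an error, which is the situation described above
    for local queries or ones sent as plain "query=".
    """
    if urlsplit(ref).scheme:
        return ref  # already absolute
    if base is None:
        raise ValueError("relative URI %r but no base URI is known" % ref)
    return urljoin(base, ref)
```

For example, with a base of http://example.org/tests/manifest.n3 the
relative reference data-01.n3 resolves to
http://example.org/tests/data-01.n3.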
>>
>>
>>
>> Seems reasonable.
>>
>>
>>>     Andy
>>
>>
>>
>
Received on Friday, 17 December 2004 21:22:25 UTC