Re: SPARQL Protocol for RDF

On Jun 1, 2005, at 22:18, ext Kendall Clark wrote:

>
> Working Draft: SPARQL Protocol for RDF
>
> The RDF Data Access Working Group has released a second Working
> Draft of the SPARQL Protocol for RDF. The draft describes a
> protocol for conveying RDF queries from clients to query
> services. The protocol is compatible with the SPARQL query
> language (pronounced "sparkle") and may be used to convey
> queries from other RDF query languages as well.
>
>    http://www.w3.org/TR/2005/WD-rdf-sparql-protocol-20050527/
>
> The RDF Data Access Working Group is seeking feedback and
> comments on the SPARQL Protocol for RDF from stakeholders,
> interested parties, and potential implementors.
>
> Please direct all feedback to the DAWG comments mailing list:
>
>    public-rdf-dawg-comments@w3.org
>
> Thanks,
> Kendall Clark
>
>


Great work!

A few questions/comments:

1. All examples explicitly specify the background graph. While that's
of course not incorrect, it would perhaps be good if the first few
basic examples omitted the background graph, to reflect what will most
likely be the most common use case: a given SPARQL portal receiving
queries without any specification of the background graph, and
defaulting to the portal's own default background graph.

2. The parameter for specifying the background graph should follow the
terminology used in the SPARQL spec and thus should be named
'background-graph-uri' and not 'default-graph-uri' as the term
"default graph" has no definition in SPARQL.

(The above comments of course presume that the parameter for specifying
a background graph will remain, which, per my next comment, may not be
the case)

3. Since the SPARQL language itself provides facilities for
explicitly specifying which graph a given query, or components
of a query, should be evaluated against, and the query itself can
(and if needed, must) include FROM/FROM NAMED qualifications
naming the relevant graphs, why is it necessary to redundantly
specify any graph URIs in the parameters?

If a specific graph needs to be specified, then it seems to me
that it would be better, as in more economical and elegant, to
use the existing machinery in the SPARQL query language itself
to identify those graphs rather than introducing alternative,
and potentially redundant parameters for doing so. This also
eliminates any chance of conflict between the query and the
parameters, such as their disagreeing about which graph is the
background graph.

It seems to me that the only parameter needed is 'query',
and everything else can then be specified as required in the
actual SPARQL query provided to the SPARQL service.

If there is sufficient justification for adding these potentially
redundant parameters, then it should be discussed in adequate
detail in the specification (otherwise, they should be removed).
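As an illustration of the kind of conflict meant here, the following
rough Python sketch compares the graphs named in a query's plain FROM
clauses with any graph URIs passed as 'default-graph-uri' parameters.
The regular expression is a deliberately naive stand-in for a real
SPARQL parser (it ignores comments, string literals and prefixed
names), and the function names are hypothetical.

```python
import re
from typing import List, Set

# Naive pattern: FROM followed by a full IRI in angle brackets,
# excluding FROM NAMED clauses.
FROM_RE = re.compile(r"\bFROM\s+(?!NAMED\b)<([^>]*)>", re.IGNORECASE)

def from_graphs(query: str) -> Set[str]:
    """Return the IRIs named in plain FROM clauses of the query."""
    return set(FROM_RE.findall(query))

def graph_params_conflict(query: str, default_graph_uris: List[str]) -> bool:
    """True if both the query and the protocol parameters name a
    background graph but disagree about which one it is."""
    in_query = from_graphs(query)
    in_params = set(default_graph_uris)
    return bool(in_query) and bool(in_params) and in_query != in_params

q = "SELECT ?s FROM <http://example.org/bob> WHERE { ?s ?p ?o }"
print(graph_params_conflict(q, ["http://example.org/alice"]))  # True
print(graph_params_conflict(q, []))                            # False
```

With only the 'query' parameter, the second case is the only one that
can occur, which is the point of the comment above.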

4. Related to the above, but actually a comment regarding the
SPARQL spec itself, it seems there is a conflict between
the FROM construct and the definition of a dataset, since, if
the background graph is "unnamed", then how could one
refer to it with a FROM construct? I think the problem here
is simply with language, not an inherent flaw in SPARQL.

It is my understanding that, while not mandatory, the URIs
specified using the FROM and FROM NAMED constructs are
often expected/hoped to be resolvable at run time to a graph,
by dereferencing such URIs, and that many SPARQL processors
when encountering unknown graph names will attempt to retrieve
those graphs via their URIs. That's fine, and demonstrates
how well the OFWeb and SemWeb can be integrated on the basis
of a shared set of URIs (let's just hope that everyone agrees
that graphs are information resources ;-) but the bottom line
is that a named graph is a named graph is a named graph, so
if one can use FROM to specify the background graph of a dataset,
then the background graph of a dataset can be a named graph (even
if it need not be named for all queries/applications).

I think that the definition of a dataset should not state that
the background graph is necessarily unnamed, but rather that it is
simply the background graph, such that any queries evaluated against
that dataset, which do not specify any graph, are evaluated
against that background graph. Now, how a given SPARQL processor
knows which graph is the background graph for a given query is
of course relevant, but I don't see that any major changes are
needed to SPARQL to identify the background graph.

Namely, if no FROM clause is provided, then it is left up to the
SPARQL processor to decide which is the background graph for a
given query. If there is a FROM clause provided, then the graph
thus specified is the background graph for the query. Thus, it
is not essential to stipulate whether the background graph be
either unnamed or named insofar as the definition of a dataset
is concerned, only that it is clear to the processor which
graph is the background graph of a dataset when evaluating a
given query.

This can be fixed easily enough, I think, by changing the single word
'does' to 'need' in section 7 of the SPARQL spec.

I.e. change

[
    There is one graph, the background graph, which does not
    have a name, and zero or more named graphs, identified by
    URI reference.
]

to

[
    There is one graph, the background graph, which need not
    have a name, and zero or more named graphs, identified by
    URI reference.
]

and then later, add some statement such as

[
    If a given query does not specify the background graph by
    name, using the FROM operator, then the SPARQL processor
    must decide which background graph is most appropriate
    for evaluating the query. The SPARQL processor should
    be consistent in the default background graph
    used for all queries not specifying a background graph
    explicitly.
]
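The proposed rule can be sketched as a small Python function. This is
only an illustration of the proposal above, not anything in the SPARQL
drafts; the default graph URI and the function name are hypothetical,
and the sketch assumes at most one FROM clause per query.

```python
from typing import Optional

# Hypothetical URI under which a processor keeps its default background
# graph; the proposal only asks that it be used consistently.
PROCESSOR_DEFAULT_BG = "http://example.com/portal/default-background-graph"

def background_graph_for(from_uri: Optional[str],
                         processor_default: str = PROCESSOR_DEFAULT_BG) -> str:
    """Pick the background graph for one query under the proposed rule.

    If the query names a graph with FROM, that graph is the background
    graph. Otherwise the processor decides, here by always falling back
    to the same default graph.
    """
    if from_uri is not None:
        return from_uri
    return processor_default

print(background_graph_for("http://example.org/bob"))  # http://example.org/bob
print(background_graph_for(None))
```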

Of course, serialization of a dataset introduces some additional
issues, as to how to identify the background graph. My recommendation
would be to use any generic RDF serialization which supports named
graphs, and define a vocabulary to describe a dataset, which specifies
the background and/or named graphs belonging to that particular dataset.

E.g. using TriG, the dataset from Example 1 in section 7.1 of
the SPARQL spec could be unambiguously serialized as:

@prefix sparql: <http://www.w3.org/TR/rdf-sparql-query/> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix :       <http://example.com/myDatasetSerialization/> .

:ds a sparql:Dataset ;
     sparql:BackgroundGraph :bg ;
     sparql:NamedGraph      <http://example.org/bob> ;
     sparql:NamedGraph      <http://example.org/alice>.

:bg
{
    <http://example.org/bob>    dc:publisher  "Bob" .
    <http://example.org/alice>  dc:publisher  "Alice" .
}

<http://example.org/bob>
{
    _:a foaf:name "Bob" .
    _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
}

<http://example.org/alice>
{
    _:a foaf:name "Alice" .
    _:a foaf:mbox <mailto:alice@work.example.org> .
}

It's important to note that, when serializing a dataset whose
background graph is unnamed, the background graph must be given a
name. Doing so, however, also means that serialization formats such
as TriG and TriX can be used to serialize multiple datasets in a
single TriG or TriX instance (if that is ever useful or necessary),
in addition to unambiguously serializing a single dataset.
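To illustrate that last point, here is a minimal in-memory Python
sketch (not a TriG or TriX parser) of one serialization instance
holding several named graphs plus two dataset descriptions that reuse
them by name. The second dataset ":ds2", the abbreviated graph
contents, and the mapping of sparql:BackgroundGraph/sparql:NamedGraph
onto fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

@dataclass
class Dataset:
    """Mirrors a (hypothetical) sparql:Dataset description."""
    background_graph: str                                # sparql:BackgroundGraph
    named_graphs: Set[str] = field(default_factory=set)  # sparql:NamedGraph

# One instance: every graph is named, including the background graph.
graphs: Dict[str, FrozenSet[Triple]] = {
    ":bg": frozenset({
        ("<http://example.org/bob>", "dc:publisher", '"Bob"'),
        ("<http://example.org/alice>", "dc:publisher", '"Alice"'),
    }),
    "<http://example.org/bob>": frozenset({("_:a", "foaf:name", '"Bob"')}),
    "<http://example.org/alice>": frozenset({("_:a", "foaf:name", '"Alice"')}),
}

# Two datasets described in the same instance, sharing graphs by name.
datasets: Dict[str, Dataset] = {
    ":ds": Dataset(":bg", {"<http://example.org/bob>",
                           "<http://example.org/alice>"}),
    # Here a named graph doubles as the background graph of another dataset.
    ":ds2": Dataset("<http://example.org/bob>", {"<http://example.org/alice>"}),
}

def background_triples(ds_name: str) -> FrozenSet[Triple]:
    """Resolve a dataset's background graph by its name."""
    return graphs[datasets[ds_name].background_graph]

def unresolved_names() -> Set[str]:
    """Graph names referenced by some dataset but absent from the instance."""
    referenced = set()
    for ds in datasets.values():
        referenced.add(ds.background_graph)
        referenced |= ds.named_graphs
    return referenced - set(graphs)
```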

(I've been generally uncomfortable with processors naming unnamed graphs,
for the sake of round-trip integrity and consistency, but I've come to
see this approach as the least expensive and disruptive to existing
tools and processes, and one which maximally exploits the RDF machinery.
Earlier comments regarding serialization were also based on the
understanding that background graphs must be unnamed, hence introducing
a problem when directly parsing/syndicating a serialization where the
background graph has been named. But as this is actually not the case,
and such a conflict would not arise, I feel much more comfortable with
this approach.)

Regards,

Patrick

--

Patrick Stickler
Senior Architect
Forum Nokia
Hatanpäänkatu 1 A
33900 Tampere Finland

phone:  +358 40 801 9690
fax:    +358 7180 75700
email:  patrick.stickler@nokia.com


Received on Thursday, 2 June 2005 07:12:41 UTC