Re: Status update on SPARQL Language document

here are my comments on draft 1.104 of the SPARQL Query Lang.
document. so far, i've only reviewed sections 1-5. i'll send
comments, if any, on the remaining sections later this week.

in general, great job by eric and andy. many of my comments
are word-smithing. ignore or incorporate as you see fit.

i used the following notation in my comments.
   '-' delimits things to remove.
   '+' delimits things to add.
   *NOTE: blah, blah, blah * delimits my comments.
   undelimited text is used to provide context.

kevin
 
Comments on SPARQL draft 1.104 (2004/10/08) - Kevin Wilkinson

1 Introduction

An RDF graph is +encoded as+ a set of triples,
-each consisting of -
+each comprising+ a subject, an object-,- and a
property relationship between them [12]. 
+A triple is also referred to as a statement. The
RDF terms in a triple are URIs, blank nodes
(bNodes), plain literals or typed literals (defined in RDF
Concepts and Abstract Syntax).+
... it may be a graph that is partly calculated on 
demand +(e.g., by giving the inference closure)+,
or it may be an RDF representation of a legacy database.

SPARQL is a query language for accessing such RDF graphs.
It provides facilities to:
   * -select- +extract+ information +, i.e.  extract
     subjects, properties and/or objects, from queried graphs+
   * extract RDF subgraphs +of queried graphs+
   * construct new RDF graphs based on information -from
     the target of the query- +in the queried graphs+.

As a data access language, it is suitable for -both local
and remote use- +querying graphs that are either local to
or remote from the client (host machine).+


2 Making Simple Queries

-Queries match graph patterns against the target graph of
the query.  Patterns are like graphs but may named variables
in place of some of the nodes or predicates; the simplest graph
patterns are single triple patterns. The RDF terms are URIs,
blank nodes (bNodes), plain literals and typed literals (defined
in RDF Concepts and Abstract syntax). Graph patterns can be
combined using various operators into more complicated graph
patterns.-
+Queries match graph patterns against the target graph(s) of
the query.  The simplest graph pattern is a single triple
pattern. This is a triple comprising RDF terms or named
variables, and it matches all triples in a graph whose
subject, object and property are equal to the corresponding
RDF terms in the pattern. The named variables in
the pattern, if any, are then bound to their corresponding
subject, object or property in the matched triples. More
complicated graph patterns can be constructed from single
triple patterns and various operators.+ 

A binding is a mapping from the variables in a query to terms.
A result mapping is a binding which, when applied to the
variables in the query,
-produces a subgraph of the target graph-
+produces a set of terms from the queried graph+; a result
is a set of result mappings. If there are no result mappings,
the result set is empty.

Pictorially, suppose we have a graph with two triples and
+apply+ the given triple pattern:

-with- +we get the+ result:

*NOTE: I suggest using graph0 and query0 rather than
triple1-2 and triplePattern1. Multiple triples form a graph
and a triple pattern IS a query applied to a graph. So, the
picture is a bit confusing.*

-RDF graphs are constructed from one or more triples, ex.  graph1.-
+A more complicated query may combine bindings from multiple
triple patterns. Consider query1 applied to graph1.+

*NOTE: the figure for query1 has a typo: change ?addrm to ?addr.*

2.1 Writing a Simple Query

+SPARQL uses an SQL-like syntax for expressing queries.+
The example below ... and WHERE clause -gives- +contains just+
one triple pattern.

The terms -quoted- +delimited+ by "<>" are URI References.

-Variables are indicated by '?'; the '?' does not form part
of the variables' name.-
+Variable names are prefixed by '?'; the '?' is not part
of the variable's name.+

Because URIRefs can -become- +be+ long,

Prefixes are syntactic: the +prefix+ name -chosen- does not
-effect- +affect+ the query,
-nor does it have to be the same as the data-
+nor do prefix names in queries need to be the same as the
prefixes used for data+.

*NOTE: just wondering if, in the context here of typed literals,
the document should mention that plain literals will match typed
literals with the type xsd:string. Also, would a plain literal
match a literal with a lang tag? Or would an int-typed literal 
match a float? etc. At some point, the doc should point out some
of the nuances with typed and lang-tag literal matching.*

2.2 Triple Patterns

The building blocks of queries are triple patterns. Syntactically,
a SPARQL triple pattern is a subject, predicate and object
-enclosed in '()'s- +delimited by parentheses+.
The previous example +query+ shows a triple pattern with a 
-variable subject (book), a predicate of dcore:title and a
variable object (title).-
+predicate of dcore:title and variables for subject and object.+

-A triple pattern is matched against the graph by finding values
for values for variables so that the triple pattern, with values
substituted for variables, is a triple in the graph being queried.-
+A triple pattern applied to a graph matches all triples with
identical RDF terms for the corresponding subject, predicate
and object. The variables in the triple pattern, if any, are then
bound to the corresponding RDF terms in the matching triples.+
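
The matching rule suggested above can be sketched in Python. This is only
an illustration of the wording, not the draft's definition: the helper
names (is_var, match_triple, match_pattern) and the sample data are made
up for this sketch, and terms are compared as plain strings.

```python
def is_var(term):
    # A variable is written with a leading '?'.
    return term.startswith("?")

def match_triple(pattern, triple, binding=None):
    """Return a binding under which `pattern` equals `triple`, else None."""
    binding = dict(binding or {})
    for p, t in zip(pattern, triple):
        if is_var(p):
            name = p[1:]  # the '?' is not part of the variable's name
            if binding.setdefault(name, t) != t:
                return None  # a repeated variable must keep one value
        elif p != t:
            return None  # constant RDF terms must be identical
    return binding

def match_pattern(pattern, graph):
    """Collect one binding per triple in `graph` that matches `pattern`."""
    results = []
    for triple in graph:
        b = match_triple(pattern, triple)
        if b is not None:
            results.append(b)
    return results

# Illustrative data only (subject, predicate, object).
graph = [
    ("<http://example.org/book/book1>", "dcore:title", '"SPARQL Tutorial"'),
    ("<http://example.org/book/book1>", "dcore:creator", '"A. N. Author"'),
]
bindings = match_pattern(("?book", "dcore:title", "?title"), graph)
```

Here bindings holds a single mapping, with book and title bound to the
subject and object of the first triple.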

*NOTE: "RDF URI Reference" is frequently used. Why not just say
URI? Is an RDF URI somehow different from a URI? Is a URI Reference
different from a URI?*

A query variable is a name -, used to define queries as graph patterns-.
*NOTE: I have no idea what that last phrase means. Delete or
rephrase it.*

*NOTE: this section introduces the term "query variable". Is this
different from "variable"? I think not. So, why not just stick with
"variable"? Another inconsistency in this section is that "Triple
Pattern" is capitalized whereas previously it was lower-case.
It's unclear why. Is it a mistake?*

-We show- +In this document, we illustrate+ bindings in results
in tabular form -, for example:- +with one header row containing
all variable names and a value row for each mapping of the
result variables. For example:+

+Note that literal values are quoted, except for integers. URIs
are delimited by angle brackets, except that QNames are
occasionally used.+
*NOTE: I added the above because the examples are NOT consistent
with respect to formatting of the result bindings. You may want
to change the examples to be consistent (e.g., all literals are
quoted, all URIs delimited). If not, you should definitely 
add the above sentence.*

-Not every binding needs to exist in every row of the table.-
*NOTE: I am not sure what is meant by the above. Please rephrase
it. Do you mean that, due to optionals, some variables will
not be bound in a result row?*

*NOTE: in the Definition of Triple Pattern Matching, I'm having
trouble making the leap from B, a binding of one variable, to SB,
a set of bindings for multiple variables. I'm really confused how
the individual bindings, B, are combined, e.g.  cross-product,
concatenated, what? I know it's neither but that's how I read it.*

If the same variable name is used more than once in a pattern
then, within each *solution* to the query, the variable has the
same value.
*NOTE: "solution" is undefined in the above sentence.
Did you mean to say "substitution"? If not, you need to define
"solution".*

2.3 Graph Patterns

The keyword WHERE is followed by a Graph Pattern which is 
-made of one or more Triple Patterns. These Triple Patterns are
"and"ed together.  More formally, the Graph Pattern is the conjunction
of the Triple Patterns.-
+a Triple Pattern or a conjunction of Triple Patterns.+
In each query *solution*, each triple pattern must be satisfied
with the same binding of variables to values.
*NOTE: again, "solution" is undefined. I'm not sure I understand
the above sentence.*
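
One possible reading of "satisfied with the same binding" is a
nested-loop join, sketched below in Python. The function names (extend,
match_all) and the sample data are hypothetical, and this is an
assumption about the intended meaning rather than the draft's definition.

```python
def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p[1:], t) != t:
                return None  # conflicts with a value bound earlier
        elif p != t:
            return None  # constant terms must be identical
    return new

def match_all(patterns, graph):
    """Conjunction: every pattern must match under one shared binding."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for b in solutions:
            for triple in graph:
                b2 = extend(b, pattern, triple)
                if b2 is not None:
                    next_solutions.append(b2)
        solutions = next_solutions
    return solutions

# Illustrative data only: _:b has a name but no mailbox.
graph = [
    ("_:a", "foaf:name", '"Alice"'),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
    ("_:b", "foaf:name", '"Bob"'),
]
patterns = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]
solutions = match_all(patterns, graph)
```

Under this reading only _:a yields a solution, because ?x must take the
same value in both triple patterns.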

There is a bNode [12] in this dataset. Just within the file, for
encoding purposes, the bNode is identified by _:a but the
information about the bNode label is not in the RDF graph. No query
will be able to identify that bNode by name.
*NOTE: I'm not sure I understand the last sentence. It implies that
a bNode CANNOT be a value in a triple pattern, since that would be
identifying the bNode by name. I don't think that is the intention
but that is how it reads to me.*

*NOTE: in the Definition of Graph Pattern (Partial Definition), it
states that a set of triple patterns is a graph pattern. However,
the sentence above this definition states that a graph pattern is
*two* or more triple patterns. This is not consistent. A set can
have one member.*

2.4 Multiple Matches

The results of a query are all the ways a query can match the graph
being queried.
*NOTE: I'm confused here. Does 'result' refer only
to the result variables or to the complete set of bindings for the
graph pattern? If the former, then, since the result variable list
may not include ALL variables in the query, it seems like it could
exclude some ways in which the query matches the graph (especially
if duplicates are eliminated). So, please be specific about whether
you're referring to the result variables or all variables.*

*NOTE: aha, here's the definition I was looking for. Unfortunately,
I don't understand it. But, I'll keep trying.  One thing I'm concerned
about is what happens if the query variables are not "connected"?
Does the definition still make sense? For example, consider the query
"Select ?name, ?mbox Where (?x  foaf:name ?name) (?y  foaf:box ?mbox)".
There are no linking variables in this query. We need to ensure 
that these queries are well-defined.*
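
To make the concern concrete, here is a small Python sketch of what an
unconnected query would return if conjunction is read as a nested-loop
join. That reading is an assumption, not something the draft states. The
helper names and the data are made up, and foaf:mbox is assumed as the
mailbox property.

```python
def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p[1:], t) != t:
                return None
        elif p != t:
            return None
    return new

def solve(patterns, graph):
    """Join the patterns one at a time over all triples in `graph`."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            b2
            for b in solutions
            for triple in graph
            for b2 in [extend(b, pattern, triple)]
            if b2 is not None
        ]
    return solutions

def project(solutions, names):
    """Keep only the result variables, one tuple per solution."""
    return [tuple(b[n] for n in names) for b in solutions]

# Illustrative data only: two people, each with a name and a mailbox.
graph = [
    ("_:a", "foaf:name", '"Alice"'),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
    ("_:b", "foaf:mbox", "<mailto:bob@example.org>"),
]
# No variable links the two patterns.
patterns = [("?x", "foaf:name", "?name"), ("?y", "foaf:mbox", "?mbox")]
rows = project(solve(patterns, graph), ["name", "mbox"])
```

With no linking variable, rows pairs every name with every mailbox (a
cross-product of four rows), so the definition should say whether that
is the intended result.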


4 Including Optional Values

*NOTE: rename section to simply "Optional Values".*

For every solution of the query, every variable has
-an RDF Term- +a value+.
-But RDF data is semi-structured data;-
Sometimes useful, additional information about some item of interest
in the graph can be found but, for another item, the information is
not present.
-The application writer would such additional information but does
not want the query to not match just because the some information
is missing.-
+If the application writer wants that additional information, the
query should not fail just because some information is missing.+

In the example, only a single triple +pattern+ is given in the
optional match part of the query but in general it is a graph
pattern.

Optional blocks can also be nested +as described in Section .xxx+.

*NOTE: I assume that the optional block need NOT be connected to the
rest of the query. For example,
"Select ?name, ?time Where (?x foaf:name ?name)
[ ex:timezone/#ECT ex:datetime ?time ] "  *

-If a variable was introduced in one optional block and mentioned
in another, it would be used to constrain the second. Reversing the
order of the optional blocks would reverse the blocks in which the
variable was was introduced and was used to constraint.-
+If a variable was bound in one optional block and referenced in
another, it would constrain the second. Reversing the order of
the optional blocks would reverse the blocks in which the
variable was bound and constrained.+
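
The order dependence described above can be sketched in Python, assuming
an optional block behaves like a left join: a solution is extended when
the block matches and kept unchanged when it does not. That behaviour is
an assumption for illustration, not the draft's formal definition. The
helper names and the ex: properties are hypothetical.

```python
def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p[1:], t) != t:
                return None
        elif p != t:
            return None
    return new

def solve(patterns, graph, seeds=None):
    """Join `patterns` over `graph`, starting from the given bindings."""
    solutions = [{}] if seeds is None else list(seeds)
    for pattern in patterns:
        next_solutions = []
        for b in solutions:
            for triple in graph:
                b2 = extend(b, pattern, triple)
                if b2 is not None:
                    next_solutions.append(b2)
        solutions = next_solutions
    return solutions

def apply_optional(solutions, block, graph):
    """Extend each solution with `block` if possible, else keep it as is."""
    out = []
    for b in solutions:
        extended = solve(block, graph, [b])
        out.extend(extended if extended else [b])
    return out

# Illustrative data only.
graph = [
    ("ex:a", "ex:name", '"Alice"'),
    ("ex:a", "ex:nick", '"Al"'),
    ("ex:a", "ex:alias", '"Ally"'),
]
base = solve([("?x", "ex:name", "?n")], graph)
opt_nick = [("?x", "ex:nick", "?v")]
opt_alias = [("?x", "ex:alias", "?v")]

nick_first = apply_optional(apply_optional(base, opt_nick, graph), opt_alias, graph)
alias_first = apply_optional(apply_optional(base, opt_alias, graph), opt_nick, graph)
```

Whichever block runs first binds ?v and the other block is then
constrained by it, so nick_first and alias_first end up with different
values for v.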


4.3 Optional Matching - Formal Definition

*NOTE: in the definition, GP = (GP1 union GP2). Shouldn't this be
(GP1 and GP2)?  Or, maybe I'm not understanding this correctly.*

The outer optional block must match for a nested one-s- to apply.
That is, the outer graph pattern pattern-s- is fixed for the
purposes of any nested optional block.


5.0 Nested Patterns

*NOTE: the second nesting example might be more interesting
if the nested clause had optional data, e.g., some might have
optional middle names. For example:

SELECT ?foafName ?mbox ?fname ?gname
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX vcard:   <http://www.w3.org/2001/vcard-rdf/3.0#>
WHERE  ( ?x foaf:name ?foafName )
  [ (?x foaf:mbox ?mbox) ]
  [ (?x  vcard:N  ?vc) (?vc vcard:Family ?fname) (?vc vcard:Given ?gname)
    [ (?vc vcard:Given  ?gname) ]
  ]
*

Received on Sunday, 10 October 2004 22:09:55 UTC