Re: Status update on SPARQL Language document from Seaborne, Andy on 2004-10-11 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 11 Oct 2004 12:05:24 +0100
To: Kevin Wilkinson <wilkinson@hpl.hp.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <416A68F4.5050004@hp.com>
Kevin,

Thank you for the comments. The audit trail is below and the changes, where 
made, go into v1.108, v1.109, v1.110

Kevin Wilkinson wrote:
> here are my comments on draft 1.104 of the SPARQL Query Lang.
> document. so far, i've only reviewed sections 1-5. i'll send
> comments, if any, on the remaining sections later this week.
> 
> in general, great job by eric and andy. many of my comments
> are word-smithing. ignore or incorporate as you see fit.
> 
> i used the following notation in my comments.
>    '-' delimits things to remove.
>    '+' delimits things to add.
>    *NOTE: blah, blah, blah * delimits my comments.
>    undelimited text is used to provide context.
> 
> kevin
> 
> 
> ------------------------------------------------------------------------
> 
>  
> Comments on SPARQL draft 1.104 (2004/10/08) - Kevin Wilkinson
> 
> 1 Introduction
> 
> An RDF graph is +encoded as+ a set of triples,
> -each consisting of -
> +each comprising+ a subject, an object-,- and a
> property relationship between them [12]. 

Nack.  The RDF graph *is* a set of triples.

> +A triple is also referred to as a statement.

I have been avoiding "statement" and using "triple" throughout.  Query works on 
the set of triples; a statement is the logical concept represented by a triple.

Statement is the term used by the first RDF spec; the recent WG prefers triple 
in this context, I believe.

> The
> RDF terms in a triple are either URIs, blank nodes
> (bNodes), plain literals and typed literals (defined in RDF 
> Concepts and Abstract syntax).+

I have moved the link to "RDF Concepts" to this point.

> ... it may be a graph that is partly calculated on 
> demand +(e.g., by giving the inference closure)+,
> or it may be an RDF representation of a legacy database.
> 
> SPARQL is a query language for accessing such RDF graphs.
> It provides facilities to:
>    * -select- +extract+ information +, i.e.  extract
>      subjects, properties and/or objects, from queried graphs+

"extract is better : now says

"""
* extract information in the form of URIs, bNodes, plain and typed literals.
"""

>    * extract RDF subgraphs +of queried graphs+

I think this is implied by "sub"

>    * construct new RDF graphs based on information -from
>      the target of the query- +in the queried graphs+.

Done.

> 
> As a data access language, it is suitable for -both local
> and remote use- +querying graphs that are either local to
> or remote from the client (host machine).+

Hmm - need also to consider the case of same machine, different process.  I'll 
leave it as the vaguer "local and remote".

> 
> 
> 2 Making Simple Queries
> 
> -Queries match graph patterns against the target graph of
> the query.  Patterns are like graphs but may named variables
> in place of some of the nodes or predicates; the simplest graph
> patterns are single triple patterns. The RDF terms are URIs,
> blank nodes (bNodes), plain literals and typed literals (defined
> in RDF Concepts and Abstract syntax). Graph patterns can be
> combined using various operators into more complicated graph
> patterns.-
> +Queries match graph patterns against the target graph(s) of
> the query.  The simplest graph pattern is a single triple
> pattern. This is a triple comprising RDF terms or named
> variables and it matches all triples in a graph whose
> corresponding subject, object or property are equal to the
> corresponding RDF term in the pattern. The named variables in
> the pattern, if any, are then bound to their corresponding 
> subject, object or property in the matched triples. More
> complicated graph patterns can be constructed from single
> triple patterns and various operators.+ 

I leave this to Eric but I don't think we have to make the change in order to 
publish.  This part needs to be reworked based on the rest of section 2 and 
especially 2.1.  As such, one set of approximate words will do.

> 
> A binding is a mapping from the variables in a query to terms.
> A result mapping is a binding which, when applied to the
> variables in the query,
> -produces a subgraph of the target graph-
> +produces a set of terms from the queried graph+; a result
> is a set of result mappings. If there are no result mappings,
> the result set is empty.

Some rewording.  Needs to be brought into line with the rest of the document.

> 
> Pictorially, suppose we have a graph with two triples and
> +apply+ the given triple pattern:
> 
> -with- +we get the+ result:

Will leave pending revision of this section.

> 
> *NOTE: I suggest using the graph0 and query0 rather than
> triple1-2 and triplePattern1. Multiple triples form a graph
> and a triple pattern IS a query applied to a graph. So, the
> picture is a bit confusing.*
> 
> -RDF graphs are constructed from one or more triples, ex.  graph1.-
> +A more complicated query may combine bindings from multiple
> triple patterns. Consider query1 applied to graph1.+
> 
> *NOTE: the figure for query1 has a typo: change ?addrm to ?addr.*

Leave for Eric - I can't edit the pictures.

> 
> 2.1 Writing a Simple Query
> 
> +SPARQL uses an SQL-like syntax for expressing queries.+

I was hoping not to have to justify a claim of "SQL-like" as it means different 
things to different people.

> The example below ... and WHERE clause -gives- +contains just+
> one triple pattern.

Done by s/gives/has/

> 
> The terms -quoted- +delimited+ by "<>" are URI References.

Done

> 
> -Variables are indicated by '??'; the '?' does not form part
> of the variables' name.-
> +Variable names are prefixed by '??';  the '?' is not part
> of the variable?'s name.+

There has been some processing error here - entities keep getting corrupted (any 
idea why, Eric? tidy?).

> 
> Because URIRefs can -become- +be+ long,

Done

> 
> Prefixes are syntactic: the +prefix+ name -chosen- does not
> -effect- +affect+ the query,

Done

> -nor does it have to be the same as the data-
> +nor do prefix names in queries need to be the same prefixes
> used for data+.

Done

> 
> *NOTE: just wondering if, in the context here of typed literals,
> the document should mention that plain literals will match typed
> literals with the type xsd:string. Also, would a plain literal
> match a literal with a lang tag? Or would an int-typed literal
> match a float? etc. At some point, the doc should point out some
> of the nuances with typed and lang-tag literal matching.*

Good points, but can we delay this until section 12?  I don't want to get the 
reader sidetracked by plain literals matching xsd:strings or issues about 
xsd:integer comparing to xsd:float/xsd:double just at this point.

Delayed until after publication.

> 
> 2.2 Triple Patterns
> 
> The building blocks of queries are triple patterns. Syntactically,
> a SPARQL triple pattern is a subject, predicate and object
> -enclosed in '()'s- +delimited by parentheses+.

Done

> The previous example +query+ shows a triple pattern with a 
> -variable subject (book), a predicate of dcore:title and a
> variable object (title).-
> +a predicate of dcore:title and variables for subject and object.+

Have attempted rewording here.

> 
> -A triple pattern is matched against the graph by finding values
> for values for variables so that the triple pattern, with values
> substituted for variables, is a triple in the graph being queried.-
> +A triple pattern applied to a graph matches all triples with
> identical RDF terms for the corresponding subject, predicate
> and object. The variables in the triple pattern, if any, are then
> bound to the corresponding RDF terms in the matching triples.+

Done.

> 
> *NOTE: "RDF URI Reference" is frequently used. Why not just say
> URI? Is an RDF URI somehow different from a URI? Is a URI Reference
> different from a URI?*

URI (currently - RFC 2396) does not include the #frag part.  URIRef includes the 
#frag part.  I understand that this is to change and "URI" will cover URIRefs as 
well in a revised 2396.


> 
> A query variable is a name -, used to define queries as graph patterns-.
> *NOTE: I have no idea what that last phrase means. Delete or
> rephrase it.*

It's trying to informally scope the variables.

> 
> *NOTE: this section introduces the term "query variable". Is this
> different from "variable"? I think not. So, why not just stick with
> "variable"? Another inconsistency in this section is that "Triple
> Pattern" is capitalized whereas previously it was lower-case.
> It's unclear why. Is it a mistake?*

Yes - a mistake.
Tried to fix up.

> 
> -We show- +In this document, we illustrate+ bindings in results
> in tabular form -, for example:- +with one header row containing
> all variable names and a value row for each mapping of the
> result variables. For example:+

Done.

> 
> +Note that literal values are quoted, except for integers. URIs
> are delimited by angle brackets except occasionally QNames will
> be used.+
> *NOTE: I added the above because the examples are NOT consistent
> with respect to formatting of the result bindings. You may want
> to change the examples to be consistent (e.g., all literals are
> quoted, all URIs delimited). If not, you should definitely 
> add the above sentence.*
> 
> -Not every binding needs to exist in every row of the table.-
> *NOTE: I am not sure what is meant by the above. Please rephrase
> it. Do you mean that, due to optionals, that some variables will
> not be bound in a result row?*

They should be consistent.  If you find any that aren't, please let me know.

> 
> *NOTE: in the Definition of Triple Pattern Matching, I'm having
> trouble making the leap from B, a binding of one variable, to SB,
> a set of bindings for multiple variables. I'm really confused how
> the individual bindings, B, are combined, e.g.  cross-product,
> concatenated, what? I know it's neither but that's how I read it.* 

A binding is a single pair (var, RDF Term)
A set of bindings is a set of pairs.
{ (var1, term1) , (var2, term2) , ... }
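
To make that concrete, here is a rough Python sketch (illustrative only, not 
taken from the document; the helper names and the data are made up).  It treats 
any string starting with "?" as a variable and returns one set of (var, term) 
pairs per matching triple:

```python
# Illustrative sketch: strings starting with "?" are variables,
# every other string stands for an RDF term.

def is_var(x):
    return x.startswith("?")

def match_triple(pattern, triple):
    """Return a frozenset of (var, term) pairs if pattern matches triple,
    otherwise None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable used twice in one pattern must take the same term.
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return frozenset(bindings.items())

# Hypothetical two-triple graph and a single triple pattern.
graph = {
    ("ex:book1", "dc:title", "SPARQL Tutorial"),
    ("ex:book2", "dc:title", "RDF Primer"),
}
pattern = ("?book", "dc:title", "?title")

# One set of bindings per triple that the pattern matches.
results = {b for b in (match_triple(pattern, t) for t in graph) if b is not None}
```

Each element of results is one set of pairs; nothing is cross-multiplied or 
concatenated within a single triple pattern.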

> 
> If the same variable name is used more than once in a pattern
> then, within each *solution* to the query, the variable has the
> same value.
> *NOTE: "solution" is undefined in the above sentence.
> Did you mean to say "substitution"? If not, you need to define
> "solution".*

It's a forward reference.  Will leave for now.  I'd like to use the right 
terminology if it reads OK.

> 
> 2.3 Graph Patterns
> 
> The keyword WHERE is followed by a Graph Pattern which is 
> -made of one or more Triple Patterns. These Triple Patterns are
> "and"ed together.  More formally, the Graph Pattern is the conjunction
> of the Triple Patterns.-
> +a Triple Pattern or a conjunction of Triple Patterns.+
> In each query *solution*, each triple pattern must be satisfied
> with the same binding of variables to values.
> *NOTE: again, "solution" is undefined. I'm not sure I understand
> the above sentence.*

Minor wording change s/each triple pattern/all the triple patterns/

> 
> There is a bNode [12] in this dataset. Just within the file, for
> encoding purposes, the bNode is identified by _:a but the
> information about the bNode label is not in the RDF graph. No query
> will be able to identify that bNode by name.
> *NOTE: I'm not sure I understand the last sentence. It implies that
> a bNode CANNOT be a value in a triple pattern, since that would be
> identifying the bNode by name. I don't think that is the intention
> but that is how it reads to me.*

It can be a value - it can't be written in a query.

Is
"""
No query will be able to identify that bNode by the label used in the serialization.
"""

> 
> *NOTE: in the Definition of Graph Pattern (Partial Definition), it
> states that a set of triple patterns is a graph pattern. However,
> the sentence above this definition states that a graph pattern is
> *two* or more triple patterns. This is not consistent. A set can
> have one member.*

Fixed.

"""
A conjunctive graph pattern is a set of triple patterns
"""

> 
> 2.4 Multiple Matches
> 
> The results of a query are all the ways a query can match the graph
> being queried.
> *NOTE: I'm confused here. Does 'result' refer only
> to the result variables or to the complete set of bindings for the
> graph pattern? If the former, then, since the result variable list
> may not include ALL variables in the query, it seems like it could 
> exclude some ways in which the query matches the graph (especially
> if duplicates are eliminated). So, please be specific if you're
> referring to the result variables or all variables.*

We haven't defined SELECT variables yet - it's the complete set of bindings.  It's 
all variables.

> 
> *NOTE: aha, here's the definition I was looking for. Unfortunately,
> I don't understand it. But, I'll keep trying.  One thing I'm concerned
> about is what happens if the query variables are not "connected"?

The only meaning I understand for "connected" is in graph terms.  Graph patterns 
do not define connected pattern graphs.

> Does the definition still make sense? For example, consider the query
> "Select ?name, ?mbox Where (?x  foaf:name ?name) (?y  foaf:box ?mbox)".
> There are no linking variables in this query. We need to ensure 
> that these queries are well-defined.*

A legal query.  It may not be intended :-)
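
A rough Python sketch of what such a query gives (illustrative only; the helper 
names and data are invented, and I have used foaf:mbox as the property).  With 
no shared variable, every name is paired with every mbox:

```python
# Illustrative sketch: strings starting with "?" are variables.

def is_var(x):
    return x.startswith("?")

def extend(binding, pattern, triple):
    """Extend a dict of bindings so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def solve(patterns, graph):
    """All bindings that satisfy every triple pattern (a conjunction)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := extend(b, pattern, t)) is not None]
    return solutions

# Hypothetical data: two people, each with a name and a mailbox.
graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
    ("_:b", "foaf:mbox", "mailto:bob@example.org"),
]

# No linking variable: ?x and ?y are independent.
unconnected = solve([("?x", "foaf:name", "?name"),
                     ("?y", "foaf:mbox", "?mbox")], graph)

# Linked through ?x: each name is paired only with that person's mailbox.
connected = solve([("?x", "foaf:name", "?name"),
                   ("?x", "foaf:mbox", "?mbox")], graph)
```

The unconnected form is well-defined; it just returns the cross product of the 
two patterns' matches, which is rarely what the writer wanted.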

> 
> 
> 4 Including Optional Values
> 
> *NOTE: rename section to simply "Optional Values".*

Nack - out of style.

> 
> For every solution of the query, every variable has
> -an RDF Term- +a value+.

Pat recommends avoiding "value" due to confusion with the value space of typed 
literals.

> -But RDF data is semi-structured data;-
> Sometimes useful, additional information about some item of interest
> in the graph can be found but, for another item, the information is
> not present.

Done

> -The application writer would such additional information but does
> not want the query to not match just because the some information
> is missing.-
> +If the application writer wants that additional information, the
> query should not fail just because some information is missing.+

Done

> 
> In the example, only a single triple +pattern+ is given in the

Done

> optional match part of the query but in general it is a graph
> pattern.
> 
> Optional blocks can also be nested +as described in Section .xxx+.

I will add pointers around the document when section structure is stable.

> 
> *NOTE: I assume that the optional block need NOT be connected to the
> rest of the query. For example,
> "Select ?name, ?time Where (?x foaf:name ?name)
> [ ex:timezone/#ECT ex:datetime ?time ] "  *
> 
> -If a variable was introduced in one optional block and mentioned
> in another, it would be used to constrain the second. Reversing the
> order of the optional blocks would reverse the blocks in which the
> variable was was introduced and was used to constraint.-
> +If a variable was bound in one optional block and referenced in
> another, it would constrain the second. Reversing the order of
> the optional blocks would reverse the blocks in which the
> variable was bound and constrained.+

Nack - but needs rethinking anyway as it is too procedural.

> 
> 
> 4.3 Optional Matching - Formal Definition
> 
> *NOTE: in the definition, GP = (GP1 union GP2). Shouldn't this be
> (GP1 and GP2)?  Or, maybe I'm not understanding this correctly.*

It is union - the query formed by GP1 merged with GP2.  "And" would be: satisfy 
GP1, satisfy GP2 independently.  We want a big pattern; see if it works and, if it 
does, use it, else use the small one, GP1.
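
As a rough Python sketch of that reading (illustrative only, not the formal 
definition; the helper names and data are invented, and it applies the 
"big pattern, else small pattern" test once per solution of GP1):

```python
# Illustrative sketch: strings starting with "?" are variables.

def is_var(x):
    return x.startswith("?")

def extend(binding, pattern, triple):
    """Extend a dict of bindings so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def solve(patterns, graph, start=None):
    """All bindings satisfying every pattern, starting from an existing binding."""
    solutions = [dict(start or {})]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := extend(b, pattern, t)) is not None]
    return solutions

def optional(gp1, gp2, graph):
    """Use GP1 merged with GP2 where it matches, else fall back to GP1 alone."""
    results = []
    for base in solve(gp1, graph):
        bigger = solve(gp2, graph, start=base)   # try the big pattern
        results.extend(bigger if bigger else [base])
    return results

# Hypothetical data: Alice has a mailbox, Bob does not.
graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
]

rows = optional([("?x", "foaf:name", "?name")],
                [("?x", "foaf:mbox", "?mbox")], graph)
```

Here Bob still appears in rows, just without a binding for ?mbox; a plain 
conjunction of the two patterns would have dropped him.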

> 
> The outer optional block must match for a nested one-s- to apply.
> That is, the outer graph pattern pattern-s- is fixed for the
> purposes of any nested optional block.

Done.

> 
> 
> 5.0 Nested Patterns
> 
> *NOTE: the second nesting example might be more interesting
> if the nested clause had optional data, e.g., some might have
> optional middle names. For example:
> 
> SELECT ?foafName ?mbox ?fname ?gname
> PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> PREFIX vcard:   <http://www.w3.org/2001/vcard-rdf/3.0#>
> WHERE  ( ?x foaf:name ?foafname )
>   [ (?x foaf:mbox ?mbox) ]
>   [ (?x  vcard:N  ?vc) (?vc vcard:Family ?fname) (?vc vcard:Given ?gname)
>     [ (?vc vcard:Given  ?gname) ]
>   ]

Left as is.

	Andy
Received on Monday, 11 October 2004 11:06:12 UTC