- From: Kevin Wilkinson <wilkinson@hpl.hp.com>
- Date: Sun, 10 Oct 2004 15:12:58 -0700
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Message-ID: <4169B3EA.896B44EF@hpl.hp.com>
here are my comments on draft 1.104 of the SPARQL Query Language document. so far, i've only reviewed sections 1-5. i'll send comments, if any, on the remaining sections later this week. in general, great job by eric and andy. many of my comments are word-smithing. ignore or incorporate as you see fit.

i used the following notation in my comments.
'-' delimits things to remove.
'+' delimits things to add.
*NOTE: blah, blah, blah* delimits my comments.
undelimited text is used to provide context.

kevin
Comments on SPARQL draft 1.104 (2004/10/08) - Kevin Wilkinson

1 Introduction

An RDF graph is +encoded as+ a set of triples, -each consisting of- +each comprising+ a subject, an object-,- and a property relationship between them [12]. +A triple is also referred to as a statement. The RDF terms in a triple are URIs, blank nodes (bNodes), plain literals or typed literals (defined in RDF Concepts and Abstract Syntax).+

... it may be a graph that is partly calculated on demand +(e.g., by giving the inference closure)+, or it may be an RDF representation of a legacy database.

SPARQL is a query language for accessing such RDF graphs. It provides facilities to:

* -select- +extract+ information +, i.e. extract subjects, properties and/or objects, from queried graphs+
* extract RDF subgraphs +of queried graphs+
* construct new RDF graphs based on information -from the target of the query- +in the queried graphs+.

As a data access language, it is suitable for -both local and remote use- +querying graphs that are either local to or remote from the client (host machine).+

2 Making Simple Queries

-Queries match graph patterns against the target graph of the query. Patterns are like graphs but may named variables in place of some of the nodes or predicates; the simplest graph patterns are single triple patterns. The RDF terms are URIs, blank nodes (bNodes), plain literals and typed literals (defined in RDF Concepts and Abstract syntax). Graph patterns can be combined using various operators into more complicated graph patterns.-

+Queries match graph patterns against the target graph(s) of the query. The simplest graph pattern is a single triple pattern. This is a triple comprising RDF terms or named variables, and it matches all triples in a graph whose corresponding subject, object or property is equal to the corresponding RDF term in the pattern.
The named variables in the pattern, if any, are then bound to their corresponding subject, object or property in the matched triples. More complicated graph patterns can be constructed from single triple patterns and various operators.+

A binding is a mapping from the variables in a query to terms. A result mapping is a binding which, when applied to the variables in the query, -produces a subgraph of the target graph- +produces a set of terms from the queried graph+; a result is a set of result mappings. If there are no result mappings, the result set is empty.

Pictorially, suppose we have a graph with two triples and +apply+ the given triple pattern: -with- +we get the+ result:

*NOTE: I suggest using graph0 and query0 rather than triple1-2 and triplePattern1. Multiple triples form a graph and a triple pattern IS a query applied to a graph. So, the picture is a bit confusing.*

-RDF graphs are constructed from one or more triples, ex. graph1.- +A more complicated query may combine bindings from multiple triple patterns. Consider query1 applied to graph1.+

*NOTE: the figure for query1 has a typo: change ?addrm to ?addr.*

2.1 Writing a Simple Query

+SPARQL uses an SQL-like syntax for expressing queries.+ The example below ... and WHERE clause -gives- +contains just+ one triple pattern. The terms -quoted- +delimited+ by "<>" are URI References. -Variables are indicated by '?'; the '?' does not form part of the variables' name.- +Variable names are prefixed by '?'; the '?' is not part of the variable's name.+

Because URIRefs can -become- +be+ long, Prefixes are syntactic: the +prefix+ name -chosen- does not -effect- +affect+ the query, -nor does it have to be the same as the data- +nor do prefix names in queries need to be the same prefixes used for data+.

*NOTE: just wondering if, in the context here of typed literals, the document should mention that plain literals will match typed literals with the type xsd:string.
Also, would a plain literal match a literal with a lang tag? Or would an int-typed literal match a float? etc. At some point, the doc should point out some of the nuances with typed and lang-tag literal matching.*

2.2 Triple Patterns

The building blocks of queries are triple patterns. Syntactically, a SPARQL triple pattern is a subject, predicate and object -enclosed in '()'s- +delimited by parentheses+. The previous example +query+ shows a triple pattern with -a variable subject (book), a predicate of dcore:title and a variable object (title).- +a predicate of dcore:title and variables for subject and object.+

-A triple pattern is matched against the graph by finding values for values for variables so that the triple pattern, with values substituted for variables, is a triple in the graph being queried.- +A triple pattern applied to a graph matches all triples with identical RDF terms for the corresponding subject, predicate and object. The variables in the triple pattern, if any, are then bound to the corresponding RDF terms in the matching triples.+

*NOTE: "RDF URI Reference" is frequently used. Why not just say URI? Is an RDF URI somehow different from a URI? Is a URI Reference different from a URI?*

A query variable is a name -, used to define queries as graph patterns-.

*NOTE: I have no idea what that last phrase means. Delete or rephrase it.*

*NOTE: this section introduces the term "query variable". Is this different from "variable"? I think not. So, why not just stick with "variable"? Another inconsistency in this section is that "Triple Pattern" is capitalized whereas previously it was lower-case. It's unclear why. Is it a mistake?*

-We show- +In this document, we illustrate+ bindings in results in tabular form -, for example:- +with one header row containing all variable names and a value row for each mapping of the result variables. For example:+

+Note that literal values are quoted, except for integers.
URIs are delimited by angle brackets, except that QNames are occasionally used.+

*NOTE: I added the above because the examples are NOT consistent with respect to formatting of the result bindings. You may want to change the examples to be consistent (e.g., all literals are quoted, all URIs delimited). If not, you should definitely add the above sentence.*

-Not every binding needs to exist in every row of the table.-

*NOTE: I am not sure what is meant by the above. Please rephrase it. Do you mean that, due to optionals, some variables will not be bound in a result row?*

*NOTE: in the Definition of Triple Pattern Matching, I'm having trouble making the leap from B, a binding of one variable, to SB, a set of bindings for multiple variables. I'm really confused about how the individual bindings, B, are combined, e.g. cross-product, concatenated, what? I know it's neither but that's how I read it.*

If the same variable name is used more than once in a pattern then, within each *solution* to the query, the variable has the same value.

*NOTE: "solution" is undefined in the above sentence. Did you mean to say "substitution"? If not, you need to define "solution".*

2.3 Graph Patterns

The keyword WHERE is followed by a Graph Pattern which is -made of one or more Triple Patterns. These Triple Patterns are "and"ed together. More formally, the Graph Pattern is the conjunction of the Triple Patterns.- +a Triple Pattern or a conjunction of Triple Patterns.+

In each query *solution*, each triple pattern must be satisfied with the same binding of variables to values.

*NOTE: again, "solution" is undefined. I'm not sure I understand the above sentence.*

There is a bNode [12] in this dataset. Just within the file, for encoding purposes, the bNode is identified by _:a but the information about the bNode label is not in the RDF graph. No query will be able to identify that bNode by name.

*NOTE: I'm not sure I understand the last sentence.
It implies that a bNode CANNOT be a value in a triple pattern, since that would be identifying the bNode by name. I don't think that is the intention but that is how it reads to me.*

*NOTE: in the Definition of Graph Pattern (Partial Definition), it states that a set of triple patterns is a graph pattern. However, the sentence above this definition states that a graph pattern is *two* or more triple patterns. This is not consistent. A set can have one member.*

2.4 Multiple Matches

The results of a query are all the ways a query can match the graph being queried.

*NOTE: I'm confused here. Does "result" refer only to the result variables or to the complete set of bindings for the graph pattern? If the former, then, since the result variable list may not include ALL variables in the query, it seems like it could exclude some ways in which the query matches the graph (especially if duplicates are eliminated). So, please be specific about whether you're referring to the result variables or all variables.*

*NOTE: aha, here's the definition I was looking for. Unfortunately, I don't understand it. But, I'll keep trying. One thing I'm concerned about is what happens if the query variables are not "connected". Does the definition still make sense? For example, consider the query "Select ?name, ?mbox Where (?x foaf:name ?name) (?y foaf:mbox ?mbox)". There are no linking variables in this query. We need to ensure that these queries are well-defined.*

4 Including Optional Values

*NOTE: rename section to simply "Optional Values".*

For every solution of the query, every variable has -an RDF Term- +a value+. -But RDF data is semi-structured data;- Sometimes useful, additional information about some item of interest in the graph can be found but, for another item, the information is not present.
-The application writer would such additional information but does not want the query to not match just because the some information is missing.- +If the application writer wants that additional information, the query should not fail just because some information is missing.+

In the example, only a single triple +pattern+ is given in the optional match part of the query but in general it is a graph pattern. Optional blocks can also be nested +as described in Section .xxx+.

*NOTE: I assume that the optional block need NOT be connected to the rest of the query. For example, "Select ?name, ?time Where (?x foaf:name ?name) [ ex:timezone/#ECT ex:datetime ?time ]"*

-If a variable was introduced in one optional block and mentioned in another, it would be used to constrain the second. Reversing the order of the optional blocks would reverse the blocks in which the variable was was introduced and was used to constraint.- +If a variable was bound in one optional block and referenced in another, it would constrain the second. Reversing the order of the optional blocks would reverse the blocks in which the variable was bound and constrained.+

4.3 Optional Matching - Formal Definition

*NOTE: in the definition, GP = (GP1 union GP2). Shouldn't this be (GP1 and GP2)? Or, maybe I'm not understanding this correctly.*

The outer optional block must match for a nested one-s- to apply. That is, the outer graph pattern pattern-s- is fixed for the purposes of any nested optional block.

5.0 Nested Patterns

*NOTE: the second nesting example might be more interesting if the nested clause had optional data, e.g., some might have optional middle names. For example:

SELECT ?foafName ?mbox ?fname ?gname
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
WHERE
  ( ?x foaf:name ?foafName )
  [ (?x foaf:mbox ?mbox) ]
  [ (?x vcard:N ?vc)
    (?vc vcard:Family ?fname)
    (?vc vcard:Given ?gname)
    [ (?vc vcard:Given ?gname) ]
  ]
*
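The question raised under 2.4 about queries whose variables are not "connected" can be made concrete with a small sketch. The Python below is only an illustration of one possible reading of conjunctive triple pattern matching, not the draft's definition: the helper names (is_var, match_triple, solve), the tuple encoding of triples and the sample data are all assumptions made for this example. Under this reading, a variable used twice must take the same value within a solution, and two patterns that share no variable yield every combination of their individual matches.

```python
# Illustrative sketch only: the names and the sample data are hypothetical
# and are not taken from the SPARQL draft.

def is_var(term):
    """Treat any string starting with '?' as a variable (an assumption)."""
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, binding):
    """Return `binding` extended so that `pattern` equals `triple`, else None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound must keep the same value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended

def solve(patterns, graph):
    """Conjunction of triple patterns: every pattern must match under one binding."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical data: two people, each with a name and a mailbox.
graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
    ("_:b", "foaf:name", "Bob"),
    ("_:b", "foaf:mbox", "<mailto:bob@example.org>"),
]

# Patterns linked through ?x versus patterns with no shared variable.
connected_query = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]
disconnected_query = [("?x", "foaf:name", "?name"), ("?y", "foaf:mbox", "?mbox")]

for solution in solve(connected_query, graph):
    print(solution["?name"], solution["?mbox"])
print(len(solve(disconnected_query, graph)), "solutions for the disconnected query")
```

With this data the connected query pairs each name with its own mailbox, while the disconnected query pairs every name with every mailbox. Whether the draft intends that cross-product behaviour is exactly what the note asks the editors to pin down.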
Received on Sunday, 10 October 2004 22:09:55 UTC