Re: Inline data from Eric Prud'hommeaux on 2012-04-30 (public-rdf-dawg@w3.org from April to June 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 30 Apr 2012 08:12:24 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <20120430121223.GE7808@w3.org>

* Andy Seaborne <andy.seaborne@epimorphics.com> [2012-04-28 11:46+0100]
> This is the "BINDINGS anywhere in a graph pattern" feature.
> 
> I called it "inline data" to step back from the syntax details.  The
> word "BINDINGS" gets a bit mixed up with "BIND" when they can both
> be in group graph patterns.
> 
> 	Andy
> 
> == Summary
> 
> SELECT *
> {
>     DATA ?x { :x1 :x2 }
>     ?x rdfs:label ?label .
> }
> 
> 
> SELECT *
> {
>     DATA (?x ?mylabel) {
>         (:x1 "X1")
>         (:x2 "X2")
>     }
>     OPTIONAL { ?x rdfs:label ?label }
> }
> 
> "DATA" being a word that isn't BINDINGS.
> 
> == Syntax
> 
> The comments and modification of use cases prompted me to check the
> syntax. BINDINGS with one variable is a bit ugly as each term needs
> a () round it.  This really isn't necessary.  I'm guessing that
> one-variable inline data will be a common use case if we think of
> it as inline data.  The comments suggest this.  (It comes up in my
> work on the linked data API where one query gets some candidates and
> a second query gets more information on each candidate.)
> 
> A possibility is:
> 
> # Short form - one variable, no () at all.
> DATA ?var { <iri1> <iri2> 3 4 }
> 
> # Full form.  Consistently (...) around a row header or row data.
> DATA (?var1 ?var2) {
>   (<iri1> "a")
>   (<iri2> "b")
> }

This seems like a reasonable balance between consistency and
convenience. The two forms are distinguishable in LL(1) and LALR(1)
with this patch:

-[29]  DataBlock  ::=      Var*     '{' ( '(' DataBlockValue* ')' | NIL )* '}'
+[29]  DataBlock  ::=  '(' Var* ')' '{' ( '(' DataBlockValue* ')' | NIL )* '}'
+                   |      Var      '{'       DataBlockValue*              '}'

IMO, it actually adds some consistency by sticking parens around the
var list.
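
To illustrate why one token of lookahead is enough, here is a minimal
recursive-descent sketch in Python of the patched DataBlock rule. It is
only an illustration under simplifying assumptions: the tokenizer
accepts a small subset of RDF terms (IRIs in angle brackets, prefixed
names, plain double-quoted strings, integers), the NIL alternative and
the leading keyword are left out, and the names tokenize and
parse_data_block are hypothetical, not taken from any implementation.

```python
import re

# Simplified token set (an assumption, not the SPARQL lexer).
TOKEN = re.compile(r'''
    \s*(?:
        (?P<punct>[(){}])
      | (?P<var>\?[A-Za-z_][A-Za-z0-9_]*)
      | (?P<term><[^<>\s]*> | "[^"]*" | [A-Za-z_]*:[A-Za-z0-9_]+ | [0-9]+)
    )''', re.VERBOSE)


def tokenize(text):
    """Split the input into (kind, value) pairs."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_data_block(text):
    """Parse either form of DataBlock; return (variables, rows)."""
    toks = tokenize(text)
    i = 0

    def peek():
        return toks[i] if i < len(toks) else ("eof", "")

    def expect(kind, value=None):
        nonlocal i
        tok = peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        i += 1
        return tok[1]

    def terms_until(closer):
        out = []
        while peek() != ("punct", closer):
            out.append(expect("term"))
        return out

    # The first token alone selects the alternative: '(' or a variable.
    if peek() == ("punct", "("):
        # Full form: '(' Var* ')' '{' ( '(' DataBlockValue* ')' )* '}'
        expect("punct", "(")
        variables = []
        while peek()[0] == "var":
            variables.append(expect("var"))
        expect("punct", ")")
        expect("punct", "{")
        rows = []
        while peek() == ("punct", "("):
            expect("punct", "(")
            rows.append(tuple(terms_until(")")))
            expect("punct", ")")
        expect("punct", "}")
    else:
        # Short form: Var '{' DataBlockValue* '}'
        variables = [expect("var")]
        expect("punct", "{")
        rows = [(t,) for t in terms_until("}")]
        expect("punct", "}")

    if i != len(toks):
        raise SyntaxError("trailing input after DataBlock")
    return variables, rows


print(parse_data_block('?var { <iri1> <iri2> 3 4 }'))
print(parse_data_block('(?var1 ?var2) { (<iri1> "a") (<iri2> "b") }'))
```

In this sketch the short form yields one-element rows, so both forms
produce the same table shape. The old BINDINGS-style input
"?var { (<iri1>) }" is rejected, because a bare variable commits the
parser to the short form.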

> Insert your favorite keyword choice here:
> 
> TABLE
>    it's a bit concrete and tables don't get a mention anywhere else
> 
> DATA
>    my pref, even though it's used in INSERT DATA in SPARQL Update
> 
> BINDINGS
>    OK but confusion with BIND? Unnecessarily long?

One problem with "DATA" is that SPARQL's data is RDF triples, not
variable bindings. Practically, we may some day want to add premises
like:

  DATA { :Fido a :Dog }
  SELECT ?mammal { ?mammal a :Mammal }

I don't have a strong opinion yet; I want to reflect a bit after
gathering ideas.


> The choice of delimiters is fairly free - the only requirement is
> an explicit end for variables (the "{"), end of data rows (the "}").
> Having row grouping is very useful for the multi-variable case.
> 
> c.f.1.
> 
> BINDINGS has an un-delimited list of variables always and the data
> rows must have (...)
> 
> BINDINGS ?var { (<iri1>) (<iri2>) (3) (4) }
> 
> c.f.2.
> 
> FILTER ( ?x IN (<iri1>, <iri2>, 3, 4) )
> 
> 
> ==== Spec changes
> 
> == 10.2 BINDINGS
> 
> Rework description and examples.
> 
> == Grammar
> 
> Grammar: add to list of units in a GroupGraphPattern
> 
> []  GraphPatternNotTriples
>    ::=  GroupOrUnionGraphPattern | OptionalGraphPattern |
>         MinusGraphPattern | GraphGraphPattern | ServiceGraphPattern |
>         Filter | Bind | InlineDataClause

SWObjects already has this. I find it quite handy for debugging and
test cases.


> == Algebra
> 
> I suggested earlier that it should float to the end of the group,
> just before the FILTERs but that does not work out.  It needs to be
> joined into the group in the location it occurs in (just like a
> subquery).  It is like BIND in that it ends the BGP.
> 
> Worked example below.
> 
> No new operators - everything is there already.
> 
> 18.2.4.3 BINDINGS
> Move the text from here which turns the BINDINGS syntax into a table
> to just before the pattern translation step (18.2.2.6)
> 
> 18.2.2.6 Translate Graph Patterns
> 
> The algebra transformation step does not have to be changed at all
> because it falls under the catch all
> 
>    If E is any other form
>         Let A := Translate(E)
>         G := Join(G, A)
>         End
> 
> == Evaluation
> 
> No change.  We were already turning BINDINGS into join(..., data
> table) and this is just the same.  BINDINGS didn't have anything
> special by the time evaluation is defined.
> 
> == What to do with BINDINGS?
> 
> We can leave BINDINGS as it is ("legacy"), rename it to be the same
> as the inline data (if name changes) or remove it.
> 
> BINDINGS happens after Grouping/Aggregates, HAVING and before select
> expressions.  It seems to me to be unlikely to see it used with
> group/aggregate - if removed, you'd need a subquery for the group,
> then a join with inline data.  That is, this more specialized case
> needs more syntax.
> 
> Caveat the different syntax from DATA.
> 
> My preference is to bite the bullet now and remove BINDINGS.  There
> may be complaints, and they are right to complain as we have done 2
> LC's, but if we are making changes, I think doing it properly for
> the long term is better.
> 
> I am also happy for it to be left as-is as legacy.
> 
> ==== Worked example
> 
> ---- Data
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix : <http://example/> .
> 
> :x1 rdfs:label "foo" .
> ## No :x2
> :x3 rdfs:label "foo" .
> ---- Data
> 
> ---- Query 1
> # Intuitively, start with some possibilities,
> # and add rdfs:labels if available.
> PREFIX : <http://example/>
> PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
> 
> SELECT *
> {
>     DATA ?x { :x1 :x2 }
>     OPTIONAL { ?x rdfs:label ?label }
> }
> 
> ---- Query 1
> ====>
> ---------------
> | x   | label |
> ===============
> | :x1 | "foo" |
> | :x2 |       |
> ---------------
> 
> Join data table with the empty BGP (this is a no-op removed by
> "simplification").
> 
> Do an optional (leftjoin) on
>    (?x=:x1 ?x=:x2)
>    leftjoin ((?x=:x1 ?label="foo"), (?x=:x3 ?label="foo"))
> 
> so :x2 gets no ?label but is in the answers
> 
> ---- Query 2
> # Intuitively, do some process, restrict output
> # by joining with some fixed data.
> 
> PREFIX : <http://example/>
> PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
> 
> SELECT *
> {
>     OPTIONAL { ?x rdfs:label ?label }
>     DATA ?x { :x1 :x2 }
> }
> ---- Query 2
> 
> ====>
> ---------------
> | x   | label |
> ===============
> | :x1 | "foo" |
> ---------------
> 
> The OPTIONAL finds :x1 and :x3; the join does not have :x3 in it so
> only :x1 is in the results.
> 
> join(?x=:x1 ?x=:x2) with ((?x=:x1 ?label="foo"), (?x=:x3 ?label="foo"))
> 
> not leftjoin.
> 
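
The two worked examples above can be reproduced with a small Python
sketch of join and leftjoin over solution mappings. This is a
simplified model, not the spec's definitions: mappings are plain dicts,
the leftjoin filter expression is taken to be always true, and the
result of matching "?x rdfs:label ?label" against the sample data is
written out by hand. The function names are illustrative only.

```python
def compatible(a, b):
    """Two mappings are compatible if they agree on shared variables."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left, right):
    """Merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left, right):
    """Like join, but keep a left mapping that has no compatible partner."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])
    return out


# Inline data: DATA ?x { :x1 :x2 }
data = [{"x": ":x1"}, {"x": ":x2"}]

# Hand-written matches of { ?x rdfs:label ?label } on the sample data.
labels = [{"x": ":x1", "label": "foo"}, {"x": ":x3", "label": "foo"}]

# The empty group pattern: one mapping with no bindings.
unit = [{}]

# Query 1: inline data first, then OPTIONAL.
query1 = left_join(join(unit, data), labels)

# Query 2: OPTIONAL first, then inline data joined in place.
query2 = join(left_join(unit, labels), data)

print(query1)
print(query2)
```

In this model Query 1 keeps :x2 without a label, while Query 2 returns
only :x1, matching the two result tables above.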

-- 
-ericP
Received on Monday, 30 April 2012 12:12:55 UTC