- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Mon, 21 Aug 2006 17:24:34 -0700
- To: andy.seaborne@hp.com
- CC: public-rdf-dawg@w3.org
Continuing my reply to Andy's message below:

Seaborne, Andy wrote:

>
> Fred Zemke wrote:
>
>> For your consideration,
>> the attached paper presents a formal semantics for SPARQL
>> that is based on mapping rather than entailment.
>> Fred
>
>
> I was a bit confused by the 'bindable' terminology - am I right in
> reading you your document as saying that blank nodes, in queries, are
> not be bound to a graph term, as their existential.
>
> Test case:
>
> Data:
>
> :x :p 1 .
> :y :p 1 .
>
> Query:
> SELECT ?x { [] :p ?x }
>
> has one solution ?x/1 doesn't it (and implementation terms it is much
> like the same with blank node treated as regular variables then a
> partial term-distinct being applied over the BGP match ; it's
> implementing this partial-distinct that worries me if we care about a
> moderately direct 1-1 mapping to some form of SQL).

In this example, let G be a table of three columns (subj, verb and obj)
whose rows are the triples of the default graph. A query to find all
mappings of the one variable ?x and the one blank node identifier _:a
satisfying the sample query is

   SELECT G.subj AS "_:a", G.obj AS "?x"
   FROM G
   WHERE G.verb = ':p'

(actually the translation of the prefix ':' should be substituted
before submitting this query). Then to factor out the bindings of
blank node identifiers, use an outer query block with DISTINCT and a
list of all variables (in this example, ?x):

   SELECT DISTINCT SQ."?x"
   FROM (SELECT G.subj AS "_:a", G.obj AS "?x"
         FROM G
         WHERE G.verb = ':p') AS SQ

So it looks straightforward to get SQL to do the partial distinct.
If you can write an SQL query to perform the raw SPARQL, including the
bindings of blank node identifiers, then you can nest it in a subquery
and use an outer query to factor out the bindings of blank node
identifiers.
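
As a quick check of the nesting described above, here is a minimal
sketch that runs both queries against an in-memory SQLite database.
The choice of SQLite and Python, the column types, and the variable
names are assumptions made for illustration only; the table G and the
two SQL queries are taken from the text.

```python
import sqlite3

# Hypothetical stand-in for the default graph: one row per triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# Raw match: one row per mapping of the blank node _:a and the variable ?x.
RAW = """
    SELECT G.subj AS "_:a", G.obj AS "?x"
    FROM G
    WHERE G.verb = ':p'
"""
raw_rows = conn.execute(RAW).fetchall()

# Outer block: list only the variables and apply DISTINCT, which
# factors out the bindings of the blank node identifier.
FACTORED = f'SELECT DISTINCT SQ."?x" FROM ({RAW}) AS SQ'
factored_rows = conn.execute(FACTORED).fetchall()

print(len(raw_rows), "raw rows;", len(factored_rows), "after factoring")
```

With the two sample triples, the raw query yields two rows (one per
subject) and the outer DISTINCT collapses them to the single solution
?x/1.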
>
> What do you expect from this example (I know we are not that we are
> covering aggregation but I'm interested in your view):
>
> Data: as above
>
> Query:
> SELECT sum(?x) { [] :p ?x }
>
> Is it 1 or 2?

If we take SQL as a guide, then I expect the sum to aggregate the rows
that are returned by the WHERE clause. With the semantics proposed in
my latest paper, the WHERE clause returns a single row, ?x -> 1, so
the sum would be 1. On the other hand, if the user rewrites the query
as

   SELECT sum(?x) WHERE { ?w :p ?x }

then my paper would say the WHERE clause returns two rows, both of
which have ?x -> 1, and so sum(?x) is 2.

>
> This example shows me that treating blank nodes as pure existentials,
> when the entailment regime can provide bindings, will given
> counter-intuitive results, especially where the graph pattern though
> of as a "graph with holes" (making blank nodes as in unlabelled nodes
> of the RDF syntax) quite naturally bindings.

I think I agree with you that there is a concern that users will find
it non-intuitive to factor out the bindings of blank node identifiers
to give them an existential semantic. It is more straightforward to
have a mapping semantic that does not do the factoring. In that case
the blank node identifiers function just like variables.

If we want the ability to reduce the solution sequence by an
existential factoring, I would prefer a syntax that did not rely on
the difference between _:a and ?x to express it. Instead, I personally
would prefer to see explicit existential quantifiers added to the
language, so that one could write, for example

   SELECT ?x WHERE { (exists ?y) { ?y :p ?x } }

>
> On the spanning blank nodes,
> we could remove the problem by requiring labels do not span BGPs.
> Then blank node labels can't appear in two different BGPs.

True but ugly from a usability standpoint.
I believe that users want to build their queries incrementally, and a
natural starting point is to create a query with no OPTIONAL or UNION,
get that working, and then start editing it. As soon as you add an
OPTIONAL or UNION, you need curly braces and new BGPs, and then the
user will be forced to do a lot of collateral editing of his blank
node identifiers, which will create a lot of user frustration. I think
that if SPARQL is specified so that it is unfriendly to incremental
development, this will be a major obstacle to adoption. I think that
if we have blank node identifiers at all, then we have to make their
scope be as large as possible, preferably the whole query, though it
might be tolerable to have some limitation regarding GRAPH graph
patterns.

> Alternatively, no blank nodes with labels at all and have "??x" - if
> we want existential variables, maybe an explicit syntax for them such
> as ??x (double question mark) might be better.

I personally think the best syntactic option for existentials would be
to follow first order predicate calculus and have an explicit
quantifier syntax with explicit scope, as illustrated above.

Fred

>
> It's only the use of [] from N3/Turtle that complicates this. It's
> the syntax issue about [] that introduced them - it would be a shame
> to loose that convenience form when it has a reasonable meaning for
> the problem space covered by the charter. An extreme approach would
> be to make [] translate to named variables, just the system gets to
> choose the names so they don't clash with any named variables and
> don't appear in "SELECT *"
>
> Andy
>
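
Returning to the sum(?x) question earlier in this message, the two
readings can be sketched the same way in SQL. This is only an
illustration under assumptions: SQLite via Python, SUM standing in for
a SPARQL aggregate that the language does not define, and made-up
variable names. The blank-node form aggregates over the factored
(DISTINCT) rows, while the ?w form aggregates over the raw rows.

```python
import sqlite3

# Hypothetical stand-in for the default graph: one row per triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# SELECT sum(?x) { [] :p ?x }
# The blank node binding is factored out with DISTINCT before summing.
sum_blank_node = conn.execute("""
    SELECT SUM(SQ."?x")
    FROM (SELECT DISTINCT RQ."?x" AS "?x"
          FROM (SELECT G.subj AS "_:a", G.obj AS "?x"
                FROM G
                WHERE G.verb = ':p') AS RQ) AS SQ
""").fetchone()[0]

# SELECT sum(?x) WHERE { ?w :p ?x }
# ?w is an ordinary variable, so every raw row takes part in the sum.
sum_variable = conn.execute("""
    SELECT SUM(SQ."?x")
    FROM (SELECT G.subj AS "?w", G.obj AS "?x"
          FROM G
          WHERE G.verb = ':p') AS SQ
""").fetchone()[0]

print("blank node form:", sum_blank_node)
print("variable form:", sum_variable)
```

On the sample data the first sum is 1 and the second is 2, matching
the answers given above for the two forms of the query.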
Received on Tuesday, 22 August 2006 00:26:10 UTC