Re: ACTION: counting test case [Was: Re: Agenda request: characterize the diffs between subgraph-matching and E-entailment] from Seaborne, Andy on 2006-11-28 (public-rdf-dawg@w3.org from October to December 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 28 Nov 2006 13:49:46 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: Fred Zemke <fred.zemke@oracle.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <456C3E7A.1020609@hp.com>
Pat Hayes wrote:
>> Eric Prud'hommeaux wrote:
>>
>> <heavy edits>
>>
>>>>> Some test cases to characterize the behavior of the language
>>>>> apparently not captured in the current semantics:
>>>>>
>>>>>  bnode-type-var [CNT]: can we count duplicate results?

Open.

>>>>>
>>>>>  bNode-constraint [BCN]: are bNode labels allowed in FILTERs?

No (they are not syntactically valid at the moment, and allowing them might
lead to trouble with optimizers moving filters around, unless CNT, in which
case bnodes behave like any other variable).

Note also that bNode labels in FILTERs fail the substitution intuition:

FILTER(_:a < 3)

and

?x = _:a
FILTER(?x < 3)

are rather different.

>>>>>
>>>>>  bNode-join [BJN]: do bNode labels bridge basic graph patterns?
>>>>>     
>>>>>
>> At this point I cannot decipher who wrote the preceding sentence, but it
>> is an issue that I have raised.  I believe that people will naturally
>> write simple queries and then edit them into more complex ones.  A
>> particularly natural evolution will be to test a join first, and then
>> break it up with an OPTIONAL.  This may cause a bnode token to appear in
>> both operands of the OPTIONAL.  Currently it seems that the scope of a
>> bnode token is a basic graph pattern, so introducing the OPTIONAL will
>> break the join.  This will seem counterintuitive to users.  They can of
>> course be educated to always change bnode tokens to variables before
>> introducing OPTIONAL, but it will frequently trip users up, and may be an
>> ongoing complaint.
> 
> You might be right, but (1) this is all somewhat 
> hypothetical, as we really don't yet know what 
> users will in fact do, and (2) it's more a matter 
> of initial expectations rather than an on-going 
> problem, since users will in fact get used to the 
> rules when they use them. On the other hand...

I have never seen this occur in support questions on jena-dev, and there is a
steady stream of questions coming in these days.  This is because (my educated
guess here, based on what I've seen) the main use of bNodes in queries is with
[], not with _:a labels.  Where people might use _:a-style labels, they use
named variables instead, as they have to name things anyway.

Use of [] can produce some concise query expressions but they also have the 
characteristic that it is unnatural to later split them up.
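
For reference, the effect Fred describes can be reproduced with a toy
evaluator.  This is only a sketch under simplifying assumptions, not how any
real engine works: terms are plain strings, bnode labels are treated as
variables that are dropped at the end of each BGP (the BGP-scoping rule under
discussion), and the names match_bgp, compatible and left_join are made up
for the illustration.

```python
def is_var(term):
    # Both ?variables and _:labels bind during matching inside one BGP.
    return term.startswith("?") or term.startswith("_:")

def match_bgp(bgp, data):
    """Match a list of triple patterns against a list of triples."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for sol in solutions:
            for triple in data:
                cand = dict(sol)
                ok = True
                for pt, dt in zip(pattern, triple):
                    if is_var(pt):
                        if cand.setdefault(pt, dt) != dt:
                            ok = False
                            break
                    elif pt != dt:
                        ok = False
                        break
                if ok:
                    extended.append(cand)
        solutions = extended
    # Assumption: bnode labels are scoped to the BGP, so drop their bindings.
    return [{k: v for k, v in s.items() if k.startswith("?")} for s in solutions]

def compatible(a, b):
    return all(b[k] == v for k, v in a.items() if k in b)

def left_join(left, right):
    """OPTIONAL without filters: keep each left row, extended where possible."""
    out = []
    for l in left:
        ext = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(ext or [l])
    return out

data = [("a", "p", "b1"), ("b1", "q", "c"), ("b2", "q", "d")]

# One BGP: _:a joins the two triple patterns.
joined = match_bgp([("?x", "p", "_:a"), ("_:a", "q", "?y")], data)

# Split with OPTIONAL: each _:a is now local to its own BGP.
split = left_join(match_bgp([("?x", "p", "_:a")], data),
                  match_bgp([("_:a", "q", "?y")], data))

# Same split, but with a named variable instead of the bnode label.
split_var = left_join(match_bgp([("?x", "p", "?a")], data),
                      match_bgp([("?a", "q", "?y")], data))
```

In this toy data the single BGP and the ?a version both pair ?x with one ?y,
while the split _:a version also picks up the unrelated b2 triple, because
nothing links the two occurrences of the label.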

>> Therefore I hope it is possible to make the scope of a bnode token as
>> large as possible.  My thinking is that it would not make sense to try
>> to join on a bnode token across different graphs.  Therefore every
>> GRAPH pattern must introduce a new lexical scope, similar to the way
>> block structured languages operate.  bnode tokens are local to the
>> nearest containing GRAPH pattern, or the outermost pattern if none,
>> whereas variables are global to the whole query.

In checking the algebra work this week, I produced a quad-based query 
compiler, showing that GRAPH is just a way of specifying the fourth slot in a 
quad, with the query starting off with the 4th slot being the default graph. 
This matches the expectations of multi-graph stores so it was good to check 
that it worked out nicely (care with custom functions in FILTERs needed).
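
A minimal sketch of the quad idea, for illustration only and not the compiler
mentioned above: patterns are encoded as nested tuples (an assumption made
here), DEFAULT_GRAPH is a hypothetical marker, and only BGP, group and GRAPH
are handled.  OPTIONAL, UNION and FILTER are left out.

```python
# Hypothetical marker for the default graph; not a SPARQL or Jena constant.
DEFAULT_GRAPH = "<default>"

def to_quads(pattern, graph=DEFAULT_GRAPH):
    """Flatten a pattern tree to (s, p, o, g) quads.

    The fourth slot starts as the default graph and is replaced by the
    graph term of the nearest enclosing GRAPH pattern.
    """
    kind = pattern[0]
    if kind == "bgp":
        return [(s, p, o, graph) for (s, p, o) in pattern[1]]
    if kind == "group":
        quads = []
        for child in pattern[1]:
            quads.extend(to_quads(child, graph))
        return quads
    if kind == "graph":
        _, graph_term, child = pattern
        return to_quads(child, graph_term)
    raise ValueError("unsupported pattern kind: %r" % (kind,))

# { ?s :name ?n . GRAPH ?g { ?s :mbox ?m } }
query = ("group", [
    ("bgp", [("?s", ":name", "?n")]),
    ("graph", "?g", ("bgp", [("?s", ":mbox", "?m")])),
])
quads = to_quads(query)
```

Here the first pattern keeps the default graph in its fourth slot and the
second gets ?g, which is the shape a multi-graph store would expect.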

>> I have worked on formulating this precisely, but it looks very difficult
>> and my work is not complete. I originally thought that we could analyze
>> query trees into 'paths' (subtrees on which a join is formed); however,
>> this technique foundered on the case of an OPTIONAL with a UNION
>> in its second operand.  I believe it is possible but it has eluded me so far.
>> My vision is that every pattern P implies a predicate Pred(P) on mappings,
>> such that the result of a query on pattern P is {mapping S | Pred(P)(S) }
>> where the bnode tokens have been pulled to the front of the Pred(P)
>> as existentially quantified variables.  Pred would be defined recursively,
>> but the case of UNION inside the second operand of OPTIONAL has
>> eluded me.
> 
> 
> ... this all strongly suggests to me that we 
> should not try to be this clever at this stage. 
> The chances of our being able to get something 
> this complicated exactly right are low, and if 
> the result has to be robust enough to survive 
> more general entailment schemes then they are 
> even lower. I suggest that we strive to keep 
> things as simple as we possibly can.
> 
> Pat

We have bnodes in BGPs as an extension point, for example being able to 
dispatch a BGP to a DL-reasoner.  The systems that provide RDF access to 
existing SQL data also use this point so it seems to hit some kind of sweet spot.

Extending the scope of bnode labels across OPTIONAL/UNION has not come up as an
application need.

Fred - Your case of writing queries with bnodes in the _:a form does not
match my experience.  Use of _:a, as opposed to [], by application writers
appears quite rare in practice.

As BGPs have proven a natural extension point, it suggests to me that we have
it right in scoping bNodes to BGPs and not handling them in the algebra.

	Andy

> 
> 
> 
>> Fred
> 
>
Received on Tuesday, 28 November 2006 13:50:10 UTC