Re: ACTION: counting test case [Was: Re: Agenda request: characterize the diffs between subgraph-matching and E-entailment] from Fred Zemke on 2006-11-28 (public-rdf-dawg@w3.org from October to December 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Tue, 28 Nov 2006 09:49:29 -0800
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <456C76A9.6020708@oracle.com>
Since no one else seems concerned about query evolution involving
_:a bnodes, and people have presented user feedback from testbeds
indicating that it is not an issue, I withdraw my suggestion that the
scope of a bnode should be larger than a basic graph pattern.

Fred

Seaborne, Andy wrote:

>
>
> Pat Hayes wrote:
>
>>> Eric Prud'hommeaux wrote:
>>>
>>> <heavy edits>
>>>
>>>>>> Some test cases to characterize the behavior of the language
>>>>>> apparently not captured in the current semantics:
>>>>>>
>>>>>>  bnode-type-var [CNT]: can we count duplicate results?
>>>>>
>
> Open.
>
>>>>>>
>>>>>>  bNode-constraint [BCN]: are bNode labels allowed in FILTERs?
>>>>>
>
> No (they are not syntactically valid at the moment, and it might lead
> to trouble with optimizers moving filters around, unless CNT, in which
> case bnodes behave like any other variable).
>
> Note also that bNode labels in FILTERs fail the substitution intuition:
>
> FILTER(_:a < 3)
>
> and
>
> ?x = _:a
> FILTER(?x < 3)
>
> are rather different.
>
>>>>>>
>>>>>>  bNode-join [BJN]: do bNode labels bridge basic graph patterns?
>>>>>>    
>>>>>
>>> At this point I cannot decipher who wrote the preceding sentence, 
>>> but it
>>> is an issue that I have raised.  I believe that people will 
>>> naturally write simple
>>> queries and then edit them into more complex ones.  A particularly 
>>> natural
>>> evolution will be to test a join first, and then break it up with an 
>>> OPTIONAL.
>>> This may cause a bnode token to appear in both operands of the 
>>> OPTIONAL.
>>> Currently it seems that the scope of a bnode token is a basic graph 
>>> pattern,
>>> so that means that introducing the OPTIONAL will break the join.  
>>> This will
>>> seem counterintuitive to users.  They can of course be educated to 
>>> always change
>>> bnode tokens to variables before introducing OPTIONAL, but it will 
>>> frequently
>>> trip users up, and may be an ongoing complaint.
>>
>>
>> You might be right, but (1) this is all somewhat hypothetical, as we 
>> really don't yet know what users will in fact do, and (2) it's more a 
>> matter of initial expectations rather than an on-going problem, since 
>> users will in fact get used to the rules when they use them. On the 
>> other hand...
>
>
> I have never seen this occur with support questions on jena-dev and 
> there is a steady stream of questions coming these days.  This is 
> because (my educated guess here, based on what I've seen) the main use 
> of bNodes in queries is with [], not with _:a labels.  If people are 
> going to use _:style labels, they use named variables as they have to 
> name things anyway.
>
> Use of [] can produce some concise query expressions but they also 
> have the characteristic that it is unnatural to later split them up.
>
>>> Therefore I hope it is possible to make the scope of a bnode token as
>>> large as possible.  My thinking is that it would not make sense to try
>>> to join on a bnode token across different graphs.  Therefore every
>>> GRAPH pattern must introduce a new lexical scope, similar to the way
>>> block structured languages operate.  bnode tokens are local to the
>>> nearest containing GRAPH pattern, or the outermost pattern if none,
>>> whereas variables are global to the whole query.
>>
>
> In checking the algebra work this week, I produced a quad-based query 
> compiler, showing that GRAPH is just a way of specifying the fourth 
> slot in a quad, with the query starting off with the 4th slot being 
> the default graph. This matches the expectations of multi-graph stores 
> so it was good to check that it worked out nicely (care with custom 
> functions in FILTERs needed).
>
>>> I have worked on formulating this precisely, but it looks very 
>>> difficult
>>> and my work is not complete. I originally thought that we could analyze
>>> query trees into 'paths' (subtrees on which a join is formed); however,
>>> this technique foundered on the case of an OPTIONAL with a UNION
>>> in its second operand.  I believe it is possible but it has eluded 
>>> me so far.
>>> My vision is that every pattern P implies a predicate Pred(P) on 
>>> mappings,
>>> such that the result of a query on pattern P is {mapping S | 
>>> Pred(P)(S) }
>>> where the bnode tokens have been pulled to the front of the Pred(P)
>>> as existentially quantified variables.  Pred would be defined 
>>> recursively,
>>> but the case of UNION inside the second operand of OPTIONAL has
>>> eluded me.
>>
>>
>>
>> ... this all strongly suggests to me that we should not try to be 
>> this clever at this stage. The chances of our being able to get 
>> something this complicated exactly right are low, and if the result 
>> has to be robust enough to survive more general entailment schemes 
>> then they are even lower. I suggest that we strive to keep things as 
>> simple as we possibly can.
>>
>> Pat
>
>
> We have bnodes in BGPs as an extension point, for example being able 
> to dispatch a BGP to a DL-reasoner.  The systems that provide RDF 
> access to existing SQL data also use this point so it seems to hit 
> some kind of sweet spot.
>
> Extending the scope of bnode labels across OPTIONAL/UNION has not come 
> up as an application need.
>
> Fred - Your case of writing queries, with bnodes in the _:a form, does 
> not meet my experience.  Use of _:a, as opposed to [], by application 
> writers appears quite rare in practice.
>
> As BGPs have proven a natural extension point, it suggests to me that 
> we have it right to scope bNodes to BGPs and not handle them in the 
> algebra.
>
>     Andy
>
>>
>>
>>
>>> Fred
>>
>>
>>
>
Received on Tuesday, 28 November 2006 17:50:22 UTC