Re: [TED] SPARQL, data sources and blackboxes [was (Re: [UCR] ISSUE-12 and ACTION6198)]

Christian de Sainte Marie wrote:

> DaveR wrote:
>>
>> For example, one version might look something like:
>>
>>   sparql(srcURL, query, ?X, ... ?Z)
>>
>> where the ?X..?Z variables would be bound to the corresponding query 
>> variables, e.g.
>>
>>   sparql("http://jena.hpl.hp.com/somedata",
>>          "PREFIX vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>
>>           SELECT ?y WHERE { ?y vcard:Family 'Smith' }",
>>          ?y) 
> 
> 
> Wouldn't that typically be a blackbox, requiring the application that 
> executes the containing rule to send the query to the srcURL, or, 
> rather, to a query processor linked to the srcURL?

Just to the srcURL. I don't think a separate query processor makes sense 
here.

> The RIF version could thus look like:
> 
>   sparqlblackbox(processorURL, ?i1, ..., ?in, srcURL, query, ?X, ... ?Z)
> 
> where the ?ix would be required input data and ?X...?Z the query 
> variables. 

For the specific case of accessing data from a SPARQL endpoint (as 
opposed to generic blackboxes) there is no need for input data or a 
separate processor URL. If you want a specific dataset within the 
endpoint then you would use FROM/FROM NAMED within the SPARQL query.
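
As a rough illustration of why the srcURL and the query string are enough, 
here is a minimal Python sketch that builds the HTTP GET request a rule 
engine might send to the endpoint. It assumes the endpoint accepts the 
query in a `query` parameter, as in the SPARQL Protocol. The function name 
`build_sparql_request` and the graph URI `http://example.org/graphs/staff` 
are made up for the example; nothing is actually sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sparql_request(src_url, query):
    """Return the GET URL that carries `query` to the endpoint at `src_url`.

    Hypothetical helper: only the endpoint URL and the query text are needed.
    """
    # Append with '&' if the endpoint URL already has a query string.
    separator = "&" if urlsplit(src_url).query else "?"
    return src_url + separator + urlencode({"query": query})


# The dataset is selected inside the query via FROM, not by an extra argument.
# The graph URI below is an assumption made for illustration only.
QUERY = """PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?y
FROM <http://example.org/graphs/staff>
WHERE { ?y vcard:Family 'Smith' }"""

request_url = build_sparql_request("http://jena.hpl.hp.com/somedata", QUERY)
```

The bindings for ?y would then be read back from the endpoint's result set; 
that part is omitted here.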

> QUESTION: What would be a use case where the query would have to be 
> processed locally (and thus, could not be handled as a blackbox call to 
> be processed somewhere else); and where it would need be processed as a 
> SPARQL query, of course?

This would come back to convenience and performance rather than 
necessity. If you are working with an RDF datamodel then SPARQL has a 
convenient syntax and semantics for navigating the semi-structured data 
and is already implemented by processors which exploit the capabilities 
of the underlying datastore. So you might want to use it rather than 
translate the query into the equivalent raw predicates. Not necessary, 
any more than the slotted syntax we've spent so much time on is 
necessary, but convenient.

> Because the example query above could as well be translated for JRules 
> into something like (which is also typically what would go into a 
> constraint, if everything that is an initialExpression for a 
> PRR::RuleVariable in OMG PRR is to be interpreted as a constraint):
> 
>   when { ?y: Person( ?y.vcardFamily() = 'Smith') }
> 
> And, for RIF, in a data source neutral predicate:
> 
>   VcardFamily(?y, 'Smith')

Sure, bNodes aside, RDF data is just a bunch of binary property instances, 
so you can certainly translate simple queries into raw RIF.
However, SPARQL does have a number of useful notions like OPTIONAL, 
UNION and a good set of functions & operators which make it convenient. 
I've lost track of whether disjunction is in the core condition language 
or not, and I've no idea what our core set of functions and operators 
looks like. Nor have I thought through exactly how you would emulate the 
semantics of SPARQL OPTIONAL, but in principle I'd agree that translation 
is possible.
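
To make the OPTIONAL point concrete, here is a small Python sketch of one 
way the emulation could go: treat each property as a binary predicate and 
treat OPTIONAL as a left outer join, leaving the variable unbound (None) 
when nothing matches. The data, the property names and the function names 
are invented for the example, and it ignores the finer points of the 
SPARQL semantics (filters inside OPTIONAL, nesting, bNodes).

```python
# RDF-style data as (subject, property, value) triples; all names are made up.
TRIPLES = [
    ("p1", "Family", "Smith"),
    ("p1", "Email", "p1@example.org"),
    ("p2", "Family", "Smith"),
    ("p3", "Family", "Jones"),
]


def holds(prop, triples):
    """Return all (subject, value) pairs for one binary predicate."""
    return [(s, v) for (s, p, v) in triples if p == prop]


def smiths_with_optional_email(triples):
    """Emulate: SELECT ?y ?e WHERE { ?y Family 'Smith' OPTIONAL { ?y Email ?e } }"""
    results = []
    for subject, family in holds("Family", triples):
        if family != "Smith":
            continue
        emails = [v for (s, v) in holds("Email", triples) if s == subject]
        if emails:
            # The optional part matched: one solution per matching value.
            results.extend((subject, email) for email in emails)
        else:
            # The optional part failed: keep the solution, leave ?e unbound.
            results.append((subject, None))
    return results
```

The required pattern alone is just Family(?y, 'Smith'); it is the "keep 
the row even if the optional part fails" step that needs something beyond 
a plain conjunction of predicates.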

However, performance-wise it would be better to retain the SPARQL query 
structure so you can delegate that to an external SPARQL "solver".

I guess there is an analogy with SQL here. I'm sure many business rule 
languages allow you to embed SQL queries. You could have a simpler 
tuple-access primitive and do the joins in the rule engine instead of 
the database, but that would often be horribly inefficient, so I bet 
people put effort into partitioning the processing nicely between the 
database and the rule engine. Practical interchange of rules via RIF 
would probably require preserving the SQL/rule partitioning. 
SPARQL seems analogous.
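
A toy Python/sqlite3 sketch of the two partitionings, purely for 
illustration: the schema, data and function names are invented. The first 
function hands the whole join to the database; the second uses only 
tuple-access queries and does the join itself, as a rule engine would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 40.0), (12, 2, 900.0);
""")


def big_orders_in_database(conn, threshold):
    """Delegate the join and the filter to the database."""
    rows = conn.execute(
        "SELECT c.name, o.total FROM customer c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE o.total > ? ORDER BY o.id",
        (threshold,),
    )
    return rows.fetchall()


def big_orders_in_engine(conn, threshold):
    """Fetch raw tuples and join them outside the database (nested loop)."""
    customers = conn.execute("SELECT id, name FROM customer").fetchall()
    orders = conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id"
    ).fetchall()
    return [
        (name, total)
        for (_order_id, customer_id, total) in orders
        for (cust_id, name) in customers
        if cust_id == customer_id and total > threshold
    ]
```

Both give the same answers, but the second pulls every row out of the 
database and cannot use its indexes or query planner for the join, which 
is the inefficiency in question.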

> QUESTION: I am not sure why a slotted form (of the relational variety) 
> would be needed, here (except for typing?)...

You've lost me there. I didn't have any slotted syntax. I don't 
particularly want any slotted syntax.

> QUESTION: once it is in the latter (predicate) form, is there still 
> something that makes the condition (that ?y's Vcard name is Smith) a 
> constraint?

You can certainly regard it as a constraint. On its own it is not an 
interesting one. However, as noted above, you could have more complex 
queries. In the general case such processing is at least as bad as 
subgraph isomorphism and so can certainly benefit from delegating to 
a tailored solver.

Dave

Received on Tuesday, 9 January 2007 22:40:52 UTC