Re: Prototype SPARQL engine based on Datalog and relation of SPARQL to the Rules Layer

On Dec 7, 2006, at 5:07 PM, Bob MacGregor wrote:

> I realize that the UNSAID issue is closed.  However, it shouldn't  
> have been.

I agree with this, but not for the reasons given below. As I pointed  
out on behalf of Axel,

> Here's what Pat Hayes said about it:
>
>
>   If SPARQL contains UNSAID then it will be inconsistent with any
>   account of meaning which is based on the RDF/RDFS/OWL normative
>   semantics. This will not render SPARQL unusable, but it will place it
>   outside the 'semantic web layer cake' and probably lead to the
>   eventual construction of a different, and rival, query language for
>   use by Web reasoners.

Let me just interject that this argument does nothing for me. I  
totally fail to see why having *convenience syntax* for expressivity  
*already* in the language (via unbound) magically breaks the  
architecture. (Oh, and I'm unconvinceable on this point, Pat, so
please don't bother :)) Generally, it's much easier and safer to  
shove funky expressivity into query languages than it is to do so in  
the representation formalism itself.
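
To make the "convenience syntax" point concrete, here is a minimal
sketch over a toy in-memory triple set. It is an illustration only:
the helper names (match, optional_not_bound, unsaid) and the data are
hypothetical, not part of SPARQL or of any engine, and it assumes the
tested variable occurs only in the optional pattern. Under that
assumption, "OPTIONAL plus a filter on unbound" and a direct
"UNSAID"-style check select the same rows.

```python
# Toy illustration of negation as failure expressed two ways.
# All helper names and data below are hypothetical, for illustration only.

TRIPLES = {
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("alice", "email", "alice@example.org"),
}


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern, binding, triples):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if is_var(p):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            yield new


def optional_not_bound(required, optional, var, triples):
    """Left join on `optional`, then keep only rows where `var` stayed unbound."""
    results = []
    for b in match(required, {}, triples):
        extended = list(match(optional, b, triples)) or [b]
        results.extend(e for e in extended if var not in e)
    return results


def unsaid(required, negated, triples):
    """Hypothetical UNSAID: keep rows for which `negated` has no match at all."""
    return [
        b for b in match(required, {}, triples)
        if not any(match(negated, b, triples))
    ]


people = ("?x", "type", "Person")
has_email = ("?x", "email", "?e")

via_optional = optional_not_bound(people, has_email, "?e", TRIPLES)
via_unsaid = unsaid(people, has_email, TRIPLES)
```

Both calls select the one person with no email triple, which is the
sense in which an explicit construct would add convenience rather
than expressivity.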

And frankly, if I were going to bolt and try to set up a SPARQL  
rival, "UNSAID" wouldn't weigh in *at all*. Turtle syntax is way more
likely to send *me* running :)

> Unfortunately, Pat has things exactly backwards.  The omission of UNSAID
> INCREASES the odds that a rival query language will be constructed
> for use by Web reasoners, for the simple reason that UNSAID is useful
> for a great many things.

Expressively useful, yes.

>   Almost all LARGE SCALE reasoning assumes both
> unique name assumption and closed world assumption.

Even with the qualifier "Almost" I don't believe this is true, and even
if it happened to be true, it's not the case that UNA and CWA
necessarily help reasoning scale.

RDF and RDFS have neither, and Steve Harris has reported a store
holding over 1.5 billion triples and responding to various classes of
SPARQL queries in real time sufficient for users.

Pellet has a UNA mode which can actually slow down reasoning :) (It's
funny: sometimes adding negation, e.g., disjoints, can radically
speed up reasoning (hey! the clash is obvious!) or it can slow things
down (damn, I have to try a bunch of node merges that fail).)

> The numbers simply
> make it impractical to assert owl:differentFrom or owl:maxCardinality
> over and over.

This is a different issue. If you want UNA and CWA it is certainly  
pragmatically a non-answer to say, "Hey, you can add it yourself!",  
at least, IMHO.

> The "semantic web layer cake" that Pat refers to is
> currently designed so that it only applies to relatively small knowledge
> sets (say, below 1 million triples).

I have no idea what the grounds for this are, but it certainly seems  
to me to be false on many points.

I believe Pellet, Racer, FaCT++ and KAON2 can scale, given reasonable  
server hardware, to more than a "million triples" (but this is  
meaningless anyway; much depends on the *nature* of the axioms  
encoded by those triples). If they cannot, it's *really* hard to see  
how adding UNA or some sort of *non-mon* (non-mon makes things  
harder!!!) would magically help. Even restricting to a CWA, it's hard  
to see how it would help. (And I'd need to hear in considerably more
detail exactly what sort of CWA you wish to add to SHOIN to make it  
"scale better".)

Finally, current SPARQL has LogSpace data complexity, which indicates
that SPARQL query answering is in relational land. There are a number
of fragments of OWL that have either LogSpace data complexity (e.g.,
DL Lite, RDFS(DL)) or PTime data complexity (e.g., EL++, DLP, and
hornSHIQ), at least for various key inference services. (E.g., EL++ is
PTime-complete in the size of the data for consistency, subsumption,
concept satisfiability, and instance checking (not surprisingly,
given their interreducibility), but for conjunctive query, it's only
known that it's PTime-hard.)

None of these impose UNA or CWA. Of course, some of them are rather  
weak on equality, which helps.

This is, of course, ignoring new research. To pick just one newer
result, we see from IBM:
	<http://iswc2006.semanticweb.org/items/Kershenbaum2006qo.pdf>

I direct your attention to table 2. A 6 million role assertion  
ontology (with many type assertions as well) can be checked for  
consistency in 485 seconds. While they list the expressivity as SHIN,  
I think it's probably hornSHIQ, so if you were tuned for that, you
might do much better. I'll also note that the times, as you scale up
the LUBM in that chart, grow roughly linearly.

This doesn't mean that you can do useful arbitrary conjunctive query  
on SHIN or hornSHIQ kbs, but it's highly suggestive. No UNA or CWA.  
I'll also point out that this is preliminary work, and using a  
version of Pellet that I *know* is not nearly pushing the boundaries  
of what moderately good engineering, much less novel techniques, can do.

This is a complex topic with a lot of parameters left underspecified  
at the moment. But I believe the considerations I've raised are
sufficient to block your reasoning. I'd be very interested in a  
survey of large scale, in your sense, reasoning that goes beyond  
database expressivity (and beyond datalog expressivity) and was shown  
to be hopeless without CWA and UNA and when these were added in, they  
became all peanuts and good cheer. I would be interested in just one  
example, actually.

> That means that rival standards that
> CAN handle larger datasets are certain to emerge.

I fail to see how lack of an explicit construct for negation as  
failure will change the scalability of SPARQL one whit. I do think  
that the lack is unfortunate for users who could put it to good use,  
as I've argued before. I dislike having such a feature accessible in  
a contorted way. But eh. Let the vendors support an alternative.  
There are not so many that we couldn't get agreement, especially with a
simple rewriter adding compatibility. If we had a nice XML syntax,  
this would be an easy XSLT.

My recommendation to the DAWG is that if you are not moved by the  
user considerations to reopen the issue (which is reasonable), then  
there is no "scalability" argument of any sort that counts as new  
information. And frankly, if I were you all, I'd dismiss scare tactic  
arguments *either way* out of hand. Though, aherm, perhaps with a bit  
of that thing they call "tact". :)

Cheers,
Bijan.

Received on Tuesday, 12 December 2006 16:32:40 UTC