Re: comparing to OWL and SPIN

On 21 Jul 2014, at 20:16, Sandro Hawke <sandro@w3.org> wrote:

> On 07/21/2014 01:54 PM, Kendall Clark wrote:
>> n Mon, Jul 21, 2014 at 1:49 PM, Sandro Hawke <sandro@w3.org> wrote:
>> On 07/21/2014 08:09 AM, Peter F. Patel-Schneider wrote:
>> I could be that the Regular Expression derivatives algorithm, although much less expressive then OWL, is outperforming the OWL reasoners.  Only some research and testing will give an useful answer, but certainly something nice to consider and test.
>> 
>> Yes, this could be tested.  I expect that StarDog ICV will perform very well, as it works by translation into SPARQL queries. 
>> 
>> It looks to me like ShEx could validate a graph serialization in linear time (with the size of the serialization), with no need for storing the graph.  That's appealing to me when we're talking
>> about validating messages that are being sent between systems.
>>  
>> No need to store the graph unless its size exceed available memory, right? That does happen from time to time.
>> 
> 
> When I said "store", I meant in RAM.  :)   I was thinking it would be nice to have validation as part of a streaming serializer and streaming parser.  It's nice to have those things not buffer the whole input/output before moving it on.
You can only do that if you know the order of triples you are going to get i.e. bounded messages. And in any case you will have to validate on a sliding window of a number of triples, this is no different between ShEx or SPARQL. So you need an in memory buffer, on which you can execute SPARQL. At this size you most likely don’t need indexes because you can build your binding sets on the fly.
> 
>> SPARQL based solutions require storing and searching the graph, which is exponential (and likely slow unless properly indexed), but that's probably fine if you're just validating data that you need to keep in a SPARQL system anyway.
>> 
>> Actually Stardog ICV does both; either transactionally for data under storage or in-memory for message passing and middleware contexts.
>> 
>> Also, the complexity of SPARQL query answering is well understood and it's not EXP.
>> 
> 
> Interesting, this is what I get for stretching myself too thin across too many technologies.   I would have thought executing a query with a graph pattern like { <s> <p1> ?v1.  ?v1 <p2> ?v2. ...    ?v(n-1) <pn> ?vn } would take time proportional to k^n.   With sufficient indexing, k might be very close to 1, but without indexing, I'd think k would be the mean cardinality of p1...pn.   And of course indexing takes time.
In a sliding window you can build any triple pattern as it comes in. e.g. you only need to materialise the joins. Lets work that out with an example.
Assume messages of the type

example:person1 a :Intelligence ;
                a :Human .

ASK
{
	?wrong a :Intelligence .
        MINUS {?wrong a :Human } # Assume a world where AI does not exist and is not allowed.
}

If the query returns true the validation failed.

As the data comes in you can direct the BGP patterns into different FIFO queues (e.g. what is normally filled from disk in a method like getStatements in sesame [1])
and the execution is merely a straight filter between two FIFO queues.

This of course means you need to know the queries in advance, but that is the same for ShEx. 

Regards,
Jerven 


[1]
http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/query/algebra/evaluation/TripleSource.html#getStatements(org.openrdf.model.Resource, org.openrdf.model.URI, org.openrdf.model.Value, org.openrdf.model.Resource...)





> 
>       -- Sandro
> 
>> Cheers,
>> Kendall
> 

-------------------------------------------------------------------
Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------

Received on Monday, 21 July 2014 18:51:04 UTC