Re: RDF Scoping Mechanism from Pat Hayes on 2010-07-03 (semantic-web@w3.org from July 2010)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 3 Jul 2010 11:08:25 -0500
To: Tim Berners-Lee <timbl@w3.org>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <E7FFE600-624B-4B23-A23E-CC0A4B585DF3@ihmc.us>
On Jul 3, 2010, at 7:08 AM, Tim Berners-Lee wrote:

>
>
> Pat Hayes wrote:
>> It wouldn't take very much to make into full first-order logic: all  
>> it needs is a scoping mechanism (think graph literals in N3 or  
>> named graphs, or my 'surfaces' idea from the Blogic talk) and  
>> negation. Mind you, that scoping mechanism would drive a truck  
>> through triple-store-based implementations, I suspect.
>
> Well, given that every triple store I know has a fourth component,   
> Often used to store the document it came from,  that can just allow  
> that to alternatively be the id of a subgraph.  You just have to  
> allow a subgraph id to fill an S or O slot and viola!

Sure, you can encode it this way, but its not obvious that this will  
work effectively during inference. And it doesn't keep track of  
variable scopes for you: you need Skolem functions as well (or  
something equivalent.) This last point is what I was thinking about.

> Triple store object-oriented systems typically have a "store" or  
> "graph" object already. You extend the statement object you can make  
> a graph be a valid holder of a S or O position.
>
> Your parser recursively calls the document parser you already has  
> for the stuff between the  { }. Same with the serializer. Code you  
> already have. A lot of the code is there.

For parsing, yes. But how long does matching take when you have to  
recurse through a subgraph-id lookup at every syntactic level? At the  
very least, you would need to have a structure that supports linear- 
time unification by giving you fail-back pointers, rather than using  
recursion to back out of match failures.

> The slightly more complicated bit is when you realize that ?x  
> variables in SPARQL and _:y nodes in RDF are special cases of  
> something more general, and when you have nested graphs you have to  
> be able to track within which scope each is quantified. So each  
> graph also gets a list of the variables quantified in each way.  It  
> is a very logical extension of what we have with RDF which led to N3.

It was a logical extension way back before RDF was even thought of.  
But still, it requires some more thought. There are decades of  
experience writing code to handle recursive syntactic structures with  
bound variables in them, and AFAIK, very little of it has been based  
on the triple-store machinery. The single most important operation in  
a logical reasoner is matching, AKA unification, and searching  
efficiently for structures that match a given structure, when most of  
them don't. It is absolutely not clear that storing these structures  
in a quad store is the best way to achieve efficiency here. The key  
idea in effective systems seems to be having a really clever hash- 
coding scheme to get you 'near' the root of a matching structure if  
there is one to be found, and having a fail-safe way of finding out  
when there isn't, as quickly as possible. But all this requires that  
these structures are kept in something with a lot more structure to  
it, to facilitate all this hopping around that needs to be done.

BTW, the really tricky issue is when you have a variable of one kind  
bound inside the scope of a variable of the other kind. Then you  
really must have some mechanism like Skolem functions to keep track of  
the scope relationships. These in turn get nested in new ways during  
unification. All this stuff was kind of standard when I was in grad  
school, so there is a world of implementation experience (and backup  
theory) to draw on. Prolog engines are probably the single richest  
lode, though there are some FOL engines which are pretty impressive,  
such as RACER or the CYCL inference engine. Term rewriting systems  
generally face the same implementation issues.

>>> Back to tree structures and Sexpressions, no doubt :-)
>>
>
> (Actually I think we need to make sure that lists are first class  
> objects in the implementations, so that ordered collections can be  
> handled efficiently. )

Agreed, though this makes everything a degree more complicated. Maybe  
we can make do with tuples, as in JSON? Most of the uses of lists in  
OWL/RDF syntax are really fixed length tuples, in fact. The length is  
arbitrary, but it doesn't change dynamically once created, so S- 
expressions as used in the RDF collection vocabulary are overkill.  
Tuples have the terrific advantage that they can be searched  
iteratively rather than recursively.

>> Obvious question, regardless of implementations, is there any  
>> chance of getting that scoping mechanism in to RDF through W3C to  
>> rec?
>>
>> Any rough ideas how long that process may take? (I'm assuming the  
>> RDF Semantics are bug-less and this would just be an addition).
>>
>> My logic here is that if other serializations or even something N3- 
>> like were to go through standardization, then work would probably  
>> have to start on getting said scoping mech in to RDF sooner rather  
>> than later.
>
> Well, from the standards track point of view, one could add things  
> incrementally to RDF 2.0, 2.1, etc or one could just standardize N3  
> as it is, within minimum changes, focussing on code which has been  
> working for many years.  That is generally an very accepted way to  
> make a standard.    Get n3 1.0 nailed as a standard. Demonstrate  
> that it can be considered a superset of RDF. Demonstrate its use for  
> carrying RIF.  Standardize some built-in function ontologies. Set up  
> an agenda for any later developments to be done after  basic N3.

How many implementations of N3 are there? How many N3 reasoners have  
been built, and how do they compare in performance to, say, commercial  
Prolog engines or high-end FOL reasoners? Obviously N3 is an important  
data point, but I think we should cast our net wider.

Pat

>
> Tim
>
>>
>> Best,
>>
>> Nathan
>>
>
>
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 3 July 2010 16:09:29 UTC