Re: how we refer to both g-boxes and g-snaps (ISSUE-32) from Sandro Hawke on 2012-05-30 (public-rdf-wg@w3.org from May 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 30 May 2012 11:49:45 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Richard Cyganiak <richard@cyganiak.de>, Yves Raimond <Yves.Raimond@bbc.co.uk>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <1338392985.2332.327.camel@waldron>
On Wed, 2012-05-30 at 08:10 -0500, Pat Hayes wrote:
> On May 29, 2012, at 6:27 PM, Sandro Hawke wrote:
> 
> > On Wed, 2012-05-23 at 13:41 -0500, Pat Hayes wrote:
> >> Richard, I am confused. 
> >> 
> >> Sometimes I get the sense that you want the graph names to refer not to graphs as such, but rather to 'stateful resources' (or whatever) which have a robust identity and emit graphs when poked, a REST-inspired kind of a thing. (Cf. your responses on other threads.) At other times, however (as here) you seem to want the graph names to refer to an actual set of triples, a true Platonic RDF graph.
> >> 
> >> It really does matter which we choose, and I don't see how we can choose both (or not without a lot of new machinery to make the distinction, that we have not even discussed yet) and I don't think it is viable to just be muddled or ambiguous about it, as that is the muddle we are in already and are trying to get straight. 
> >> 
> >> For example, if the graph names refer to stateful resources, then there are two rather different ways to identify a subgraph or a larger graph. One is to speak of a subset (defined somehow) of the graph that is the current state of the stateful resource, the other is to have a relation between two resources such that one returns a subset of what the other returns, at any time. These behave differently and would need to be implemented differently.
> >> 
> >> I have no axe to grind here. I would be quite happy if we were to declare that graph names in datasets always refer to stateful resources. I would also be happy if we decide they always refer to graphs. But I am not happy about it being ambiguous or undecided. I do feel that it is very important that we choose one story and stick to it. Which one do you want to pitch for?
> > 
> > I think Richard replied to this well, but since you haven't replied to
> > that (and shown you understand
> 
> I didn't understand Richard's reply, which is partly why I haven't replied to him yet. (The other reason being, I am re-wiring a kitchen.)
> 
> > ), let me answer in my own way.
> 
> Thanks.
> 
> >  I believe
> > I'm agreeing with Richard on the substance, but perhaps thinking about
> > it quite differently.
> > 
> > The answer is: we're being a bit tricky, so that we can have our graphs
> > and eat them, too, so to speak.
> > 
> > We'd like to be able to refer to g-snaps AND we'd also like to be able
> > to refer to g-boxes.
> 
> I'm OK with that, but I think we need to have two ways to refer, in this case. 

I believe we can do it with one direct way and one indirect way.

Imagine this situation:

        Every Citizen has one or more citizenNumbers.

        No Citizens share a citizenNumber.

        Every Citizen has at most one cityOfResidence.

        For every City, there exists at least one Citizen for whom
        this is the cityOfResidence.

Now, I can refer to each Citizen directly using their citizenNumber:

        The (thing which has citizenNumber 5) was born in 1934.

and I can also refer to each City indirectly:

        The (thing which is the cityOfResidence of (the thing which has
        the citizenNumber 5)) has population 2500.

Do you agree this works, formally?
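
One way to check that the indirect reference is well defined is to model
the constraints directly.  What follows is only a minimal sketch in
Python, assuming a toy in-memory registry; the names Registry,
add_citizen, citizen and city_of are made up for illustration and are
not part of any proposal.

```python
class Registry:
    """Toy model: citizenNumber -> Citizen, Citizen -> at most one City."""

    def __init__(self):
        self._by_number = {}   # citizenNumber -> citizen name
        self._residence = {}   # citizen name -> city name (at most one)

    def add_citizen(self, name, numbers, city=None):
        # Every Citizen has one or more citizenNumbers.
        if not numbers:
            raise ValueError("every Citizen needs at least one citizenNumber")
        # No Citizens share a citizenNumber.
        for n in numbers:
            if n in self._by_number and self._by_number[n] != name:
                raise ValueError(f"citizenNumber {n} already in use")
        for n in numbers:
            self._by_number[n] = name
        # Every Citizen has at most one cityOfResidence (possibly none).
        if city is not None:
            self._residence[name] = city

    def citizen(self, number):
        """Direct reference: the thing which has this citizenNumber."""
        return self._by_number[number]

    def city_of(self, number):
        """Indirect reference: the cityOfResidence of that Citizen, or None."""
        return self._residence.get(self.citizen(number))


reg = Registry()
reg.add_citizen("alice", [5, 17], city="Springfield")
reg.add_citizen("bob", [8])

assert reg.citizen(5) == "alice"
assert reg.city_of(17) == "Springfield"   # same Citizen, other number
assert reg.city_of(8) is None             # no cityOfResidence recorded
```

Because a citizenNumber picks out exactly one Citizen and a Citizen has
at most one cityOfResidence, city_of never has to choose between
candidates; it can only come back empty.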

So, the idea is to do this for g-snaps and g-boxes.   g-boxes are like
the Citizens here, and g-snaps are like cities.   

        Every g-box has one or more IRIs.
        
        No g-boxes share an IRI.
        
        Every g-box has at most one containedGraph.
        
        For every InterestingRDFGraph, there exists at least one g-box
        for which this is the containedGraph.
        
Now, I can refer to each g-box directly using its IRI:

        <http://www.w3.org/People/Sandro/data> eg:MaintainedBy :sandro

And I can logically refer to the g-snap found there, as we did with
cities:

        The graph which is contained in <http://www.w3.org/People/Sandro/data>
        contains 954 triples.
        
I'm not sure how to formalize that in RDF.  Perhaps in FOL we could say:

    Forall ?g 
       If <http://www.w3.org/People/Sandro/data> containsGraph ?g
       Then ?g eg:numberOfTriples 954

and I think we can do that in OWL, too, if containsGraph is an RDF
predicate.
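
Written out more formally, the FOL statement might look like the
following, where d abbreviates <http://www.w3.org/People/Sandro/data>
(the abbreviation and the function-style notation are assumptions made
here for readability):

```latex
\forall g \,\bigl( \mathit{containsGraph}(d, g) \rightarrow \mathit{numberOfTriples}(g, 954) \bigr)
```

Note that this holds vacuously when d contains no graph at all, which
fits with a g-box having at most one containedGraph rather than exactly
one.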

But for RDF?  I think the vocabulary developer who wants
eg:numberOfTriples has to instead define eg:numberOfTriplesHeld, which
could be used like this:

        <http://www.w3.org/People/Sandro/data> eg:numberOfTriplesHeld
        954.
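
To make the difference between the two predicates concrete, here is a
rough sketch in Python.  It assumes a g-box can be modelled as a mutable
container and a g-snap as a frozenset of triples; GBox, snapshot,
number_of_triples and number_of_triples_held are hypothetical names
chosen for this illustration only.

```python
class GBox:
    """A mutable container, named by an IRI, whose contents can change."""

    def __init__(self, iri, triples=()):
        self.iri = iri
        self._triples = set(triples)

    def add(self, triple):
        self._triples.add(triple)

    def snapshot(self):
        """Return the current contents as an immutable g-snap."""
        return frozenset(self._triples)


def number_of_triples(g_snap):
    """Property of a g-snap: fixed, because the g-snap never changes."""
    return len(g_snap)


def number_of_triples_held(g_box):
    """Property of a g-box: the size of whatever it holds right now."""
    return number_of_triples(g_box.snapshot())


box = GBox("http://www.w3.org/People/Sandro/data",
           [("eg:a", "eg:p", "eg:b")])
before = box.snapshot()
box.add(("eg:a", "eg:p", "eg:c"))

assert number_of_triples(before) == 1        # the old g-snap is unchanged
assert number_of_triples_held(box) == 2      # the g-box now holds more
```

The IRI keeps naming the same box throughout; only the answer to "what
does it hold right now" changes.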

The rest of this email is about how someone might specify that
eg:numberOfTriplesHeld predicate.   Before continuing, let's make sure
you're comfortable with everything I've said so far.

       -- Sandro


> >  (I'm staying out of the source/resource/space/etc
> > discussion for now.  I think I can live with any of the names that have
> > been proposed.)   We do this by defining the semantics of datasets such
> > that the graph names refer to g-boxes, and let the way they are used
> > indicate whether/how the associated g-snaps are to actually be used.
> 
> Um. Im tempted to ask, why be so clever? But OK, Im still reading. 
> 
> > 
> > For example, in my implementation of use case 2 (simple web provenance),
> > the aggregated phone book looks like this:
> > 
> >  :corp :hasDivision :div1, :div2, ...
> >  :div1 :hasFeed <div1url>.
> >  :div2 :hasFeed <div2url>.
> >  ...
> >  <div1url> { ... triples fetched  ... }
> >  <div2url> { ... triples fetched  ... }
> > 
> > Here, "div1url" is the working HTTP URL which HQ uses periodically to
> > get an updated copy of the Division 1's employee directory, building
> > this pseudo-trig file.
> > 
> > The definition of :hasFeed is where things all come together.  I use it
> > with a meaning like this:
> > 
> >                ?subj :hasFeed ?obj 
> > 
> >        means
> > 
> >                ?subj is a social entity, such as a person or department
> > 
> >                *if* a successful dereference of any IRI which
> >                denotes ?obj returns an RDF Graph serialization, then
> >                the serialized graph is considered by ?subj to be valid
> >                data.  
> > 
> >                *if* any IRI which denotes ?obj occurs as the name in a
> >                (name, graph) pair in a valid dataset, then the graph
> >                part of that pair is considered by ?subj to be valid
> >                data.
> 
> OK, here is where I crash and burn. I am not sure what any of this means. 
> 1. What category is "social entity" supposed to be? I cannot imagine what would include both people and departments (departments??) In any case, why mention this? If ?subj were something entirely different, such as a galaxy, what would break?
> 2. What does "considered to be valid data" mean? If I consider a graph G to be valid data, does that mean I hold G to be true? (Does it mean more than this?)
> 3. What makes a dataset "valid"? Do you mean, any asserted dataset? 
> 4. You have this property being an object, but its meaning does not refer to the object but rather to "any IRI which denotes" the object. This is just wrong. The object of these triples is apparently not the object itself, but the IRI, which is in effect quoted. So this IRI does not mean the same thing when it occurs as the object of :hasFeed as it means everywhere else.
> 
> Also, even if I imagine some story which makes a kind of sense here, Im puzzled about your second paragraph. If I am ?subj and I dereference that IRI and get a graph G1, and then later I do the same thing and get a different graph G2, what happens to the semantic story? Do I now accept both G1 and G2? Or now accept G2 and repudiate G1? Or do I throw a hissyfit error because the graph has 'changed' and graphs can't  change? Or, put another way, am I supposed to think that I am dereferencing a graph, or looking into a graph container? 
> 
> Pat
> 
> > 
> > I think this covers both the current sometimes-odd SPARQL deployments
> > and the linked data/web-centric deployments.  It's the kind of
> > minimally-restrictive solution that I think Richard has been arguing
> > for.  It needs *very* little in the semantics of datasets.
> > 
> > In a sense it doesn't need anything, but I'd like to factor out 90% of
> > that definition of :hasFeed, since every predicate that I use with
> > dataset graph names has that same text.
> > 
> > I've worked my way through the use cases I put in rdf-spaces like this,
> > and it seems to be working fine.    I have come up with a few more use
> > cases in doing so, though.   And I definitely want some syntactic sugar
> > that trig doesn't offer.    (Should we still call it trig if we get rid
> > of the braces around the default graph to make it an extension of
> > Turtle, or should we give it another name?)
> > 
> >   -- Sandro
> > 
> > 
> > 
> > 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
> 
> 
> 
> 
>
Received on Wednesday, 30 May 2012 15:50:00 UTC