Re: Semantics for stateful resources from Antoine Zimmermann on 2012-05-24 (public-rdf-wg@w3.org from May 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 24 May 2012 18:02:33 +0200
To: Richard Cyganiak <richard@cyganiak.de>
CC: Pat Hayes <phayes@ihmc.us>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4FBE5B99.7000806@emse.fr>
On 24/05/2012 02:45, Richard Cyganiak wrote:
> Hi Antoine,
>
> On 24 May 2012, at 00:36, Antoine Zimmermann wrote:
>> Quick reply:
>>
>> if I have the following TriG file:
>>
>>
>> { #default graph
>>   :Employee  rdfs:subClassOf  :Person .
>>   :Unemployed  rdfs:subClassOf  :Person . }
>> :year2008 {
>>   :Joe  :worksFor  :AcmeCorp .
>>   :worksFor  rdfs:domain  :Employee . }
>> :year2009 {
>>   :Joe  a  :Unemployed .
>>   :Unemployed  owl:disjointWith  :Employee . }
>>
>>
>> 1. Would this be entailed:
>>
>> :year2008 { :Joe  a  :Employee }
>>
>> yes/no?
>
> No.
>
> But under the proposal, the graph names denote “stateful resources”,
> which have “state extensions”, which are sets of interpretations.
>
> And (assuming RDFS-entailment) all the interpretations in :year2008
> satisfy the triple { :Joe a :Employee }. This isn't quite the same as
> entailing the state pair you ask, but close.

Yes, but this does not enable my computer to reach the conclusion
that { :Joe a :Employee } holds in the state labelled :year2008.
Those interpretations only exist in the abstract world of
DS-interpretations. Entailment provides inferences that can be
stored in a computer, whereas the things that are in interpretations are
purely abstract and cannot be reached by the machine. IR, the set of
resources in an interpretation, is never "known" by the machine, and
neither is the denotation of URIs. Even when a system is doing things
*with respect to* the semantics, all it can do is manipulate the syntax.

What is missing in your proposal, in order to address the use case, is a
mechanism that allows the system to exploit the interpretations in the
"state extensions". If such a mechanism can be defined conveniently, then
I may be OK with the proposal, although I find it bizarre and
unconventional (yet I'm OK with unconventional semantics; see, e.g., my
own proposal).


>
>> 2. and this:
>>
>> :year2008 { :Joe  a  :Person }
>
> No.

And I'm glad it does not.

> Some, but not all, interpretations in :year2008 satisfy the
> triple { :Joe a :Person }. So we cannot rule it out, but can't
> confirm it either.
>
> The presence of additional triples in the default graph, like the
> subclass triples here, doesn't affect the state extension of a
> stateful resource.
>
>> 3. Would it be inconsistent?
>>
>> yes/no?
>
> No.

Again, I'm happy with that.

> But there's (assuming OWL-entailment) no interpretation in
> :year2008 that satisfies the state of :year2009, and vice versa.
>
>> Considering that "graph changes over time" is a *PRIORITY A* use
>> case, and this in fact applies to all sorts of dimensions of
>> context (including provenance, also in the high priorities---where
>> are inferences coming from?), if these inferences are
>> non-entailments, there will be extremely important use cases not
>> addressed by the design.
>
> Why?
>
> Case 1) has nothing to do with change over time. It's about whether
> we want to record what someone *said*, or what we assume they *meant*
> under our entailment regime. And I'd argue that keeping track of
> provenance requires that we know *exactly* what someone said, and not
> what we inferred from what they said.

Let us consider the use case "graphs change over time".
We have now formally resolved the abstract syntax for working with
multiple graphs: it is the dataset. So I'm trying to show how the
graph-changing-over-time case is addressed in this setting.
Syntactically, it seems reasonable to mint a new "graph IRI" for each
time frame in which the statements of interest hold. I recall that /you/
even proposed doing that earlier in this WG. Of course, from a
formal semantics point of view, there is nothing temporal about these
IRIs, but the application can have its own IRI scheme such that it keeps
track of which graph IRI corresponds to which time frame.

To understand what is true in what time frame, the application has to
make inferences on the content of the state-graph pairs. And the result
of the inference must be attached somehow to the "graph IRI" that
labels the original RDF graph.
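
To make this concrete, here is a minimal sketch in Python of what such per-graph inference could look like. It is only an illustration under stated assumptions: the dataset is a plain dict from graph IRI to a set of triples, only the RDFS domain rule (rdfs2) is implemented, and the names `dataset`, `apply_domain_rule` and `inferred` are made up for this example; none of this is part of either proposal.

```python
# Hypothetical in-memory dataset: graph IRI -> set of (subject, predicate, object).
# The default graph is deliberately left out: it does not affect the named graphs.
dataset = {
    ":year2008": {
        (":Joe", ":worksFor", ":AcmeCorp"),
        (":worksFor", "rdfs:domain", ":Employee"),
    },
    ":year2009": {
        (":Joe", "a", ":Unemployed"),
        (":Unemployed", "owl:disjointWith", ":Employee"),
    },
}


def apply_domain_rule(triples):
    """Close a set of triples under the RDFS domain rule (rdfs2) only."""
    closed = set(triples)
    # Pairs (property, class) taken from the rdfs:domain triples of THIS graph.
    domains = {(s, o) for (s, p, o) in closed if p == "rdfs:domain"}
    changed = True
    while changed:
        changed = False
        for prop, cls in domains:
            for s, p, o in list(closed):
                if p == prop and (s, "a", cls) not in closed:
                    closed.add((s, "a", cls))
                    changed = True
    return closed


# Infer inside each named graph separately and keep the result
# attached to the same graph IRI.
inferred = {name: apply_domain_rule(graph) for name, graph in dataset.items()}
```

With this, `inferred[":year2008"]` contains `(":Joe", "a", ":Employee")`, while the graph labelled `:year2009` is left unchanged.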

Of course, all this could be done with a custom piece of code. But
considering the importance of the use case (and of other use cases along
similar lines), it would be a pity if it were not implemented in a
standard way in conformant APIs.

If you can show me how the conclusion that Joe is an Employee in 2008 
can be deduced automatically, using your proposed semantics, or any 
other semantics, in a standard way, I'd be all in favour of the choice.


> Case 2) has nothing to do with change over time either. It's about
> whether every named graph entails the default graph, if I understand
> you correctly. And I think it shouldn't, because adding a graph as a
> named graph to a dataset shouldn't change the meaning of that graph.

Indeed, nothing related to time; I was just checking what the
assumptions behind your design were. In my proposal, the default graph
does not affect the "named graphs" either.

>
> For case 3), are you saying that it *should* be inconsistent? I think
> that dealing well with changes over time requires that it is *not*
> inconsistent.

It shouldn't indeed, I'm glad we agree on this.
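
As a rough illustration of why keeping the graphs separate matters here, the following Python sketch checks a single kind of inconsistency (an individual typed with two classes declared owl:disjointWith) first per graph and then on the merge. The function name and the data layout are assumptions made for this example, and `year2008` is taken to already contain the triple inferred in question 1.

```python
def has_disjointness_clash(triples):
    """True if some individual is typed with two classes that the same
    set of triples declares disjoint. No other OWL reasoning is done."""
    types = {}
    for s, p, o in triples:
        if p == "a":
            types.setdefault(s, set()).add(o)
    for c1, p, c2 in triples:
        if p == "owl:disjointWith":
            for classes in types.values():
                if c1 in classes and c2 in classes:
                    return True
    return False


# Hypothetical contents of the two named graphs.
year2008 = {(":Joe", "a", ":Employee")}
year2009 = {
    (":Joe", "a", ":Unemployed"),
    (":Unemployed", "owl:disjointWith", ":Employee"),
}

# Each graph checked on its own: no clash in either.
per_graph = [has_disjointness_clash(g) for g in (year2008, year2009)]

# The merge of both graphs: Joe is both Employee and Unemployed.
merged = has_disjointness_clash(year2008 | year2009)
```

Here `per_graph` is `[False, False]` while `merged` is `True`, which is the behaviour we both seem to want for graphs that describe different time frames.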


> Is this the latest version of your proposal?
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>
> It seems to me that one inconsistent graph in this proposal makes
> the entire dataset inconsistent. Is this correct?

This is correct. It's a drawback, I admit. But there are possible 
workarounds.


> Wouldn't this go
> against the use case of web crawling, where we have to assume the
> presence of broken data?

As currently stated, certainly.


> If I wanted a semantic extension that enforces web semantics (that
> is, the IRI-graph pair <i,G> causes an inconsistency if dereferencing
> i doesn't yield G), how would I express this extension in your
> semantics?

This is fairly problematic to put in any formal semantics, as you need
to introduce a notion (dereferencing) which is normally alien to a
knowledge representation logic. I would prefer to leave
dereferenceability out of the semantics. Alternatively, we could provide
a looser version of dereferencing, based on an application-specific
mapping GM from URIs to RDF graphs (a graph map, analogous to a datatype
map), and let each application decide whether it is HTTP dereferencing,
local file retrieval, in-memory lookup, or extraction of a named graph
from a datastore...
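
A possible shape for such a graph map, sketched in Python under assumptions of my own: GM is modelled as any function from an IRI string to a graph (or to None when the IRI is unmapped), and an IRI-graph pair is taken to respect GM only when GM returns exactly that graph. The names `GraphMap`, `in_memory_graph_map` and `pair_respects_map`, and the decision to treat unmapped IRIs as violations, are illustrative choices, not something defined in either proposal.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# A graph map GM: an application-specific function from IRIs to RDF graphs.
# It returns None when the IRI is not mapped to any graph.
GraphMap = Callable[[str], Optional[Graph]]


def in_memory_graph_map(store: Dict[str, Graph]) -> GraphMap:
    """One possible GM: look the IRI up in an in-memory table.
    An HTTP-dereferencing or file-based GM would have the same type."""
    return lambda iri: store.get(iri)


def pair_respects_map(gm: GraphMap, iri: str, graph: Graph) -> bool:
    """True if GM maps the IRI to exactly this graph.
    Assumption: an unmapped IRI counts as a violation."""
    return gm(iri) == graph


# Hypothetical usage with a made-up IRI and triple.
g = frozenset({("ex:s", "ex:p", "ex:o")})
gm = in_memory_graph_map({"http://example.org/g": g})
ok = pair_respects_map(gm, "http://example.org/g", g)
broken = pair_respects_map(gm, "http://example.org/g", frozenset())
```

A semantic extension in the style you describe would then declare a dataset inconsistent whenever `pair_respects_map` is false for one of its pairs, with the choice of GM left to the application.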


>
>> Therefore, either there must be a semantics where you can make
>> inferences inside a "named" graph and put back the inferred
>> statement inside the graph (for instance, the semantics I proposed,
>> but it's not the only way) or we'd better not normalise any
>> semantics as pfps suggests to do. It is also possible to have 2
>> semantics, it's been done in OWL.
>
> What I tried to address in the semantics is Pat's request that we
> must be explicit about what the IRIs in the IRI-graph-pairs denote.
> That's all the proposal below answers, really: they denote
> (indirectly, via an extension) the set of all interpretations of the
> graph.

In your proposal, I don't see any significant constraint on what the
graph IRIs denote. They can be any resource, provided that the resource
is of type StatefulResource. Nothing in your proposal prevents a
foaf:Person from being a StatefulResource, for instance.
So it does not seem to define what the graph IRIs denote any more
precisely than my proposal does.



AZ.

> Best, Richard
>
>
>
>
>>
>>
>> Best, AZ
>>
>> On 24/05/2012 01:09, Richard Cyganiak wrote:
>>> Pat,
>>>
>>> On 23 May 2012, at 19:41, Pat Hayes wrote:
>>>> Richard, I am confused.
>>>
>>> No, in this case, I am. I used the phrase “identifying
>>> subgraphs” sloppily in my mail to Yves because I echoed the
>>> wording of the original issue. I should have said “managing
>>> subgraphs” or something like that.
>>>
>>> But let's talk, this is interesting.
>>>
>>>> Sometimes I get the sense that you want the graph names to
>>>> refer not to graphs as such, but rather to 'stateful resources'
>>>> (or whatever) which have a robust identity and emit graphs when
>>>> poked, a REST-inspired kind of a thing. (Cf. your responses on
>>>> other threads.)
>>>
>>> Yes, this. Well, I'd weaken that a bit: I'd like graph names to
>>> denote things with robust identity that somehow have a graph
>>> associated. “Emit graph when poked” is one particular and
>>> probably the most useful kind of association, but I'd like to be
>>> less specific about the nature of the association. The less
>>> specific we are, the closer we remain to SPARQL semantics, and
>>> the less we get in the way of current practice. Tighter
>>> definitions can be done as semantic extensions where required.
>>>
>>>> At other times, however (as here) you seem to want the graph
>>>> names to refer to an actual set of triples, a true Platonic RDF
>>>> graph.
>>>
>>> No, not in general.
>>>
>>> Although “is same as” is just another kind of association, and
>>> that can be sometimes useful too. If it works as a tortured edge
>>> case, then that's a plus.
>>>
>>>> It really does matter which we choose, and I don't see how we
>>>> can choose both (or not without a lot of new machinery to make
>>>> the distinction, that we have not even discussed yet) and I
>>>> don't think it is viable to just be muddled or ambiguous about
>>>> it, as that is the muddle we are in already and are trying to
>>>> get straight.
>>>
>>> I believe we can do both. I'll try to show how. I'm an amateur
>>> at this stuff, so forgive me if it's a horrible mess, but it
>>> might be enough to give you an idea where I'm trying to go with
>>> this “stateful resource” and “state relationship” business.
>>>
>>> A DS-interpretation is a simple interpretation plus a “state
>>> relationship”, let's call it ISREL, that contains pairs of
>>> resources and graphs.
>>>
>>> We could say that if <x,G> is in ISREL, then x is an
>>> rdfs:StatefulResource.
>>>
>>> And a state pair <i,G> is true iff <I(i),G> is in ISREL.
>>>
>>> Borrowing an idea from Antoine's proposed semantics, I think
>>> that every rdfs:StatefulResource should have an associated set
>>> of interpretations, let's call that the “state extension” of the
>>> stateful resource, that contains exactly the interpretations
>>> that satisfy the graph associated by the state relationship.
>>>
>>> Something along these lines would probably be the maximum amount
>>> of normative semantics I'd go along with for datasets.
>>>
>>> But if this works as I hope, then this would give us a base from
>>> which one could go further. If we want to rigidly denote graphs,
>>> for example, then we can define a semantic extension that imposes
>>> an additional semantic condition: For every <x,G> in ISREL, x =
>>> G. Done!
>>>
>>> Or, if we want to capture “web semantics”, so that a state pair
>>> is true iff prodding x yields the graph G, then that's a
>>> different semantic extension with a different additional
>>> condition: <I(i),G> is in ISREL if and only if dereferencing i
>>> and parsing as RDF yields graph G.
>>>
>>> This keeps the semantics of different graphs in the dataset
>>> entirely separate. As you know, I think this is a feature.
>>> However, I suppose that again, additional semantic conditions
>>> could change this. I can definitely see how it could be useful in
>>> the case of “web semantics” to require that the names of stateful
>>> resources actually denote the resource in all the interpretations
>>> in the state extension. I suppose this could be imposed by
>>> requiring that all these interpretations in the state extensions
>>> share at least the state relationship with the “main”
>>> interpretation.
>>>
>>>> For example, if the graph names refer to stateful resources,
>>>> then there are two rather different ways to identify a subgraph
>>>> or a larger graph. One is to speak of a subset (defined
>>>> somehow) of the graph that is the current state of the stateful
>>>> resource, the other is to have a relation between two resources
>>>> such that one returns a subset of what the other returns, at
>>>> any time. These behave differently and would need to be
>>>> implemented differently.
>>>
>>> The second approach sounds better to me because the relationship
>>> between names and graphs is the same for both the larger graph
>>> and the subgraph. As I said, I think that being noncommittal
>>> about the actual nature of the relationship between resource and
>>> graph (is it identity, dereference, or something else?) is a
>>> feature.
>>>
>>>> I have no axe to grind here. I would be quite happy if we were
>>>> to declare that graph names in datasets always refer to
>>>> stateful resources.
>>>
>>> Then let's go with that.
>>>
>>>> I would also be happy if we decide they always refer to
>>>> graphs.
>>>
>>> I think *always* doing that is unacceptable. That's because of
>>> the case where I want to fetch RDF from the web and stick it into
>>> a dataset using the source URL as a graph name. The source URL
>>> denotes something out there on the web (an RDF document
>>> probably); it certainly doesn't denote a graph. So I'm
>>> contradicting the web.
>>>
>>>> But I am not happy about it being ambiguous or undecided. I do
>>>> feel that it is very important that we choose one story and
>>>> stick to it. Which one do you want to pitch for?
>>>
>>> I feel that the semantic model *needs* an indirection between
>>> the denoted resource and the graph.
>>>
>>> (What we call the class of denoted resources, and what we call
>>> the relationship to the graph, then becomes a somewhat secondary
>>> question. I'm currently trying to see whether “stateful resource”
>>> and “state” will stick, but that's not actually so terribly
>>> important.)
>>>
>>> Best, Richard
>>>
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On May 23, 2012, at 1:12 PM, Richard Cyganiak wrote:
>>>>
>>>>> Hi Yves,
>>>>>
>>>>> I took an action to propose some informative wording
>>>>> regarding the possibility of identifying subgraphs of a
>>>>> larger graph. See below for a first attempt. I suppose this
>>>>> would go somewhere near the definition of “RDF dataset” or
>>>>> whatever we end up calling these things. The terminology
>>>>> (named graphs etc.) still may have to change of course. Is
>>>>> this wording ok for you?
>>>>>
>>>>> Best, Richard
>>>>>
>>>>>
>>>>> [[ Note: Graphs in an RDF dataset may overlap. The same
>>>>> underlying set of triples may be divided up into named
>>>>> graphs along multiple dimensions (such as data owner or
>>>>> subject area) by repeating each triple in multiple graphs.
>>>>> Whether such a setup would be realized by storing each triple
>>>>> multiple times, or through views of some sort, is up to the
>>>>> implementation. ]]
>>>>>
>>>>
>>>> ------------------------------------------------------------
>>>> IHMC                    (850)434 8903 or (650)494 3973
>>>> 40 South Alcaniz St.    (850)202 4416   office
>>>> Pensacola               (850)202 4440   fax
>>>> FL 32502                (850)291 0667   mobile
>>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 83 36
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
>>
>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 24 May 2012 16:03:04 UTC