Re: rdfs:Graph ? comment on http://www.w3.org/TR/rdf11-concepts/#section-dataset and issue 35 from Sandro Hawke on 2013-07-16 (www-archive@w3.org from July 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 15 Jul 2013 20:39:06 -0400
To: Jeremy J Carroll <jjc@syapse.com>
CC: www-archive@w3.org
Message-ID: <51E4962A.9070707@w3.org>
off-list but public response, so we can continue informally for a bit.  
(this is what I should have done the first time, given the current 
rather strict rules about public-rdf-comments, because it had gotten out 
of control.)

On 07/15/2013 11:14 AM, Jeremy J Carroll wrote:
> Hi Sandro
>
> to reply, in turn informally
>
> I think you have largely understood my position, although - without introducing time, I have difficulty seeing much difference between your two concepts: rdf:WebviewDataset   and  rdf:DirectDataset.

Equality is another perfectly good way to distinguish them (we don't 
need time).

It follows from this dataset:

  <> a rdf:DirectDataset.
  GRAPH _:a { <s> <p> <o> }
  GRAPH _:b { <s> <p> <o> }

that _:a = _:b.   Each blank node denotes the same g-snap, therefore 
they must be equal.

In contrast, this is totally different:

   <> a rdf:WebviewDataset.
   GRAPH <a> { <s> <p> <o> }
   GRAPH <b> { <s> <p> <o> }

Here we've said that <a> and <b> are each Web Resources (g-boxes) that 
happen to have the same triples.  There's no reason to think they are equal.
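
One rough way to picture the distinction, as a Python sketch and not anything taken from the specs: treat a g-snap as an immutable set of triples (a value) and a g-box as a mutable container with its own identity. The class name GBox and all variable names here are made up for illustration.

```python
# Illustrative model only: GBox and these variable names are assumptions,
# not part of any RDF specification.

# A g-snap is a pure mathematical set of triples, so it behaves like a value.
snap_a = frozenset({("s", "p", "o")})
snap_b = frozenset({("s", "p", "o")})


class GBox:
    """A mutable Web Resource whose current state is a set of triples."""

    def __init__(self, iri, triples):
        self.iri = iri
        self.triples = set(triples)


box_a = GBox("a", {("s", "p", "o")})
box_b = GBox("b", {("s", "p", "o")})

# DirectDataset reading: _:a and _:b denote g-snaps, and equal sets are equal.
direct_equal = snap_a == snap_b

# WebviewDataset reading: <a> and <b> are distinct resources that merely
# hold the same triples at the moment.
webview_same_contents = box_a.triples == box_b.triples
webview_equal = box_a == box_b  # no __eq__ defined, so this compares identity

print(direct_equal, webview_same_contents, webview_equal)
```

The two frozensets compare equal because they are the same set; the two GBox objects do not, even though their current contents match.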


> Since RDF is known not to do time very well, it doesn't surprise me that there may be difficulties in thinking about changes in the graph content in a dataset. I am really not expecting RDF semantics to address that.
> I am expecting RDF to allow me to describe resources - specifically resources introduced by RDF; I am not expecting RDF to provide me the ability to make paradoxical statements about resources (which an excessive version of the rdf:DirectDataset view risks, see:
> http://lists.w3.org/Archives/Public/www-rdf-logic/2004Apr/0029
> )
> I want to say simple stuff like who wrote a graph named in the dataset. The easiest way to do this is to attach the metadata to the name.

So, given this dataset:

    <> a rdf:DirectDataset.
    <> a rdf:WebviewDataset.
    GRAPH _:a { <s> <p> <o> }
    GRAPH _:b { <s> <p> <o> }
    GRAPH <a> { <s> <p> <o> }
    GRAPH <b> { <s> <p> <o> }
    _:a dc:creator "Alice".
    <a> dc:creator "Bob".

it follows that

    _:b dc:creator "Alice".

it does NOT follow that:

    <b> dc:creator "Bob".

Kind of important difference, right?   And you can see why people would 
want both kinds of semantics.

(and they tell me they also want other things, like entailments within 
the named graphs, although I'm not convinced that's worth it.)
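
To make the inference concrete, here is a small Python sketch of the combined reading described above. It is only an illustration under my own assumptions: the function name entailed_metadata, the string encoding of names, and the dict/set layout are all invented for this example.

```python
# Illustrative sketch: the function name and data layout are assumptions,
# not an implementation of any specified dataset semantics.

def entailed_metadata(named_graphs, metadata):
    """Copy metadata between blank-node names that denote the same g-snap.

    named_graphs: dict mapping graph name -> frozenset of triples
    metadata: set of (name, property, value) triples about graph names
    """
    entailed = set(metadata)
    for (name, prop, value) in metadata:
        # IRI names denote g-boxes here, so nothing carries over from them.
        if not name.startswith("_:") or name not in named_graphs:
            continue
        for other, triples in named_graphs.items():
            if other.startswith("_:") and triples == named_graphs[name]:
                entailed.add((other, prop, value))
    return entailed


g = frozenset({("<s>", "<p>", "<o>")})
named_graphs = {"_:a": g, "_:b": g, "<a>": g, "<b>": g}
metadata = {("_:a", "dc:creator", "Alice"), ("<a>", "dc:creator", "Bob")}

result = entailed_metadata(named_graphs, metadata)
print(("_:b", "dc:creator", "Alice") in result)  # True
print(("<b>", "dc:creator", "Bob") in result)    # False
```

The creator of _:a carries over to _:b because both names denote the same set of triples; nothing carries over from <a> to <b>.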

>   This currently is not supported by RDF and I would like to have a clear technical explanation as to why, rather than a political rationale (which is of course totally understandable) "we didn't pick a semantics for datasets, because there are so many different ones out there already, so nothing we could pick wouldn't cause someone problems."

Is the above convincing?     I agree there should be some kind of 
rationale in the documents, if possible.

      -- Sandro
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
>
>
> On Jul 11, 2013, at 5:59 PM, Sandro Hawke <sandro@w3.org> wrote:
>
>> On 07/11/2013 03:06 PM, Jeremy J Carroll wrote:
>>> Hello
>>>
>>> This is a formal comment on http://www.w3.org/TR/rdf11-concepts/#section-dataset, and it appears a comment on
>>> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-schema/index.html
>>> and quite possibly on the RDF Semantics ….
>> This is a brief, informal reply to both the message I'm replying to [1] and your following message [2].
>>
>> The short answer is: we didn't pick a semantics for datasets, because there are so many different ones out there already, so nothing we could pick wouldn't cause someone problems.   So we say that datasets, on their own, have a minimal semantics plus application-specific semantic extensions.   If you want interoperability between applications, you need to indicate your semantic extensions.  You can do that out-of-band (in some way you figure out) or in band, by putting some metadata in the dataset saying which semantic extensions you're using.
>>
>> We are hoping to produce a NOTE which provides some options, so people don't have to start from scratch with these indicators.   We don't think the subject is mature enough yet to put designs in a Recommendation, though.
>>
>> My current thinking, which the group hasn't really talked about, is:
>>
>>    <> a rdf:WebviewDataset   (Or ResourceStateDataset or GraphStoreSnapshot)
>>
>> would provide the semantics I think you want, where a URL graph name is associated with the graph you'd get if you dereferenced that URL.   You might think of the URLs as denoting the Web Resource whose state is represented by the associated graph.   My sense from your examples is that's how you're thinking about datasets.
>>
>>   <> a rdf:DirectDataset
>>
>> would provide the semantics some other folks want, where the graph names actually denote the associated graphs (the pure mathematical set of triples, not a thing which can change over time).    This is what people are used to from N3 and (I think) from most provenance work.
>>
>> I'm inclined to say DirectDataset only constrains name/graph pairs where the graph names are blank nodes and WebviewDataset only constrains name/graph pairs where the graph names are http(s) IRIs.   This would allow these two semantic extensions to be used together.   If you said:
>>
>>   <> a rdf:WebviewDataset, rdf:DirectDataset.
>>   GRAPH _:a { <s> <p> 1 }
>>   GRAPH <b> { <s> <p> 2}
>>   _:a eg:endorsedBy eg:sandro.
>>   <b> eg:endorsedBy eg:sandro.
>>
>> Then you'd be saying I endorsed the statement {<s> <p> 1 } and I endorsed the (mutable) Web Resource <b>, whose contents happen to be { <s> <p> 2 }.     (On that latter bit, hopefully there will be some other metadata to help clarify *when* those are the contents of <b>, but we haven't figured out yet how to do that.)
>>
>> Does that make any sense?   Does this change your comments?     I have to apologize for not having the NOTE drafted yet, and thus adding to the confusion.
>>
>>       -- Sandro
>>
>>
>> [1] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0021.html
>> [2] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0022.html
>>> It seems to be a suggestion to reopen issue 35
>>> http://www.w3.org/2011/rdf-wg/track/issues/35
>>> which points to
>>> http://www.w3.org/TR/sparql11-service-description/
>>> hence I am CC-ing dawg.
>>> The last part of this message discusses problems in using service description to meet my use case: to me, this is not a comment on DAWG's work, but a comment that RDF Core cannot use DAWG's work of more limited scope to duck the issue.
>>>
>>>
>>> Summary: I would like to use rdf to describe graphs in a dataset, e.g. to say who the author was.
>>>
>>> as a simple example
>>>
>>> my:graph {
>>>     my:graph dc:creator "Jeremy J. Carroll" .
>>> }
>>>
>>> I cannot see how to do this with the current drafts, editors drafts, etc.
>>>
>>> A possible approach would be to reopen issue 35  and have a class rdfs:Graph, s.t. for a <URI> used as the name of a graph in a dataset the triple
>>>     <URI> rdf:type rdfs:Graph
>>> holds.
>>> More weakly, I would be satisfied with such a concept being added to the RDF vocabulary, without the implication above holding, but a suggested usage pattern.
>>>
>>> Also, I basically finished this message before finding issue 35 and its superficially reasonable resolution that sd:Graph may meet my needs. This suggests that some documentation link from either RDF Concepts or RDF Schema or RDF Semantics to SPARQL Service Description would be helpful ….
>>> However, the Service Description doc
>>> http://www.w3.org/TR/sparql11-service-description/
>>> ducks on the issue of whether the name denotes the graph, and so does not give me a clear place to put such metadata.
>>> I think if the RDF WG tried writing such documentation, they would discover that the resolution of issue 35 would unravel - the trick is to allow such unravelling without having too much of the named graphs work unravel.
>>>
>>> ----
>>>
>>>
>>> Here is my actual use case …..
>>>
>>>
>>>
>>>
>>>
>>> I first give my motivation, then I give my weak suggestion.
>>>
>>> Motivation:
>>> =========
>>>
>>> I referred to RDF Concepts 1.1 today because I am constructing an RDF dataset and wished to add metadata concerning the named graphs.
>>> I am trying to articulate a multi tenant architecture over a SPARQL end point, in which each user is assigned to a specific organization, and then depending on this organization, they have access to different named graphs.
>>>
>>> I wish to refer to the named graphs using the URI names I have assigned to them, and I wish to create my own property to add this metadata
>>>
>>>
>>> Concretely, my property might be
>>>         syapse:owningOrganization
>>>
>>> and the quads I was thinking of producing include
>>>
>>> GRAPH <https://test.syapse.com/graph/syapse> {
>>>      <https://test.syapse.com/graph/syapse> syapse:owningOrganization syapse: .
>>>       syapse:owningOrganization rdf:type owl:FunctionalProperty .
>>>       syapse:owningOrganization rdfs:range syapse:Organization .
>>>       syapse:   rdf:type syapse:Organization .
>>>       syapse:Organization rdf:type rdfs:Class .
>>>      …
>>>      …
>>> }
>>>
>>> GRAPH <https://test.syapse.com/graph/ontology/base> {
>>>      <https://test.syapse.com/graph/ontology/base> syapse:owningOrganization syapse: .
>>>      …
>>>      …
>>> }
>>>
>>> GRAPH <https://test.syapse.com/graph/ontology/sys> {
>>>      <https://test.syapse.com/graph/ontology/sys> syapse:owningOrganization syapse: .
>>>      …
>>>      …
>>> }
>>>
>>> GRAPH <https://test.syapse.com/graph/ontology/c2> {
>>>      <https://test.syapse.com/graph/ontology/c2> syapse:owningOrganization <https://test.syapse.com/graph/southParkUniversity> .
>>>      …
>>>      …
>>> }
>>>
>>> GRAPH <https://test.syapse.com/graph/southParkUniversity/abox> {
>>>      <https://test.syapse.com/graph/southParkUniversity/abox> syapse:owningOrganization <https://test.syapse.com/graph/southParkUniversity> .
>>>      <https://test.syapse.com/graph/southParkUniversity> rdf:type syapse:Organization .
>>>      …
>>>      …
>>> }
>>>
>>>
>>> This allows me to run a privileged SPARQL query across the whole dataset to find out which graphs are assigned to which organization, and then knowing which organization a user is in, I can have application logic to determine which named graphs they can access, and restrict their queries to those named graphs.
>>>
>>>
>>> Weak suggestion
>>> ==============
>>>
>>> I read the very limited text in the dataset section, and the note as reflecting a victory for those who do not want the implication that the name of the graph is a graph to hold.
>>> As a long standing advocate of the other position in which, of course, it denotes … I am somewhat disappointed.
>>>
>>> However, adding such a vocab item can allow the users to decide on a case-by-case basis whether such denotation is intended or not.
>>>
>>> e.g.
>>>
>>>     rdfs:Graph
>>>       rdfs:Graph is the class of RDF Graphs as defined by RDF Concepts.
>>>       
>>>    Semantics:
>>>
>>>     <g> { …. }
>>>
>>>     does not imply
>>>           g rdf:type rdfs:Graph
>>>
>>>
>>> but
>>>
>>>      <g> { …. } .
>>>      <g>  rdf:type rdfs:Graph
>>>
>>> does imply that the interpretation of <g> is given by the graph.
>>>
>>>
>>> Problems with the Service Description approach
>>> =====================================
>>>
>>> Reading
>>> http://www.w3.org/TR/sparql11-service-description/
>>> my understanding is that the intent is for the endpoint to provide (closed) metadata about the dataset, which does not enable further comment even from someone with update privileges on the dataset.
>>>
>>> e.g. in
>>>
>>>
>>>
>>> @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
>>> @prefix ent: <http://www.w3.org/ns/entailment/> .
>>> @prefix prof: <http://www.w3.org/ns/owl-profile/> .
>>> @prefix void: <http://rdfs.org/ns/void#> .
>>>
>>> [] a sd:Service ;
>>>      sd:endpoint <http://www.example/sparql/> ;
>>>      sd:supportedLanguage sd:SPARQL11Query ;
>>>      sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML>, <http://www.w3.org/ns/formats/Turtle> ;
>>>      sd:extensionFunction <http://example.org/Distance> ;
>>>      sd:feature sd:DereferencesURIs ;
>>>      sd:defaultEntailmentRegime ent:RDFS ;
>>>      sd:defaultDataset [
>>>          a sd:Dataset ;
>>>          sd:defaultGraph [
>>>              a sd:Graph ;
>>>              void:triples 100
>>>          ] ;
>>>          sd:namedGraph [
>>>              a sd:NamedGraph ;
>>>              sd:name <http://www.example/named-graph> ;
>>>              sd:entailmentRegime ent:OWL-RDF-Based ;
>>>              sd:supportedEntailmentProfile prof:RL ;
>>>              sd:graph [
>>>                  a sd:Graph ;
>>>                  void:triples 2000
>>>              ]
>>>          ]
>>>      ] .
>>>
>>> <http://example.org/Distance> a sd:Function .
>>>
>>>
>>> The description of the named graph is attached to an explicitly blank node, which I then cannot comment on further in my own graph or indeed inside the graph named <http://www.example/named-graph> itself.
>>> Thus I cannot add a dc:creator (or syapse:owningOrganization) triple inside this service description (because SPARQL 1.1 does not give me, nor does it intend to give me, write access to the service description), even if I have write access to <http://www.example/named-graph>
>>>
>>> These issues perhaps could be addressed by making sd:graph and sd:name  both 1-1 properties …. but I imagine there may be some reluctance ….
>>>
>>> NB - this last comment is not a formal comment on the Service Description Spec, which seems fit-for-purpose; it is a comment on the current resolution of Issue-35, which neglects that the purpose of SPARQL Service Description is less than is needed to address the issue
>>>
>>>
>>>
>>>
>>>
>>>
>>> Jeremy J Carroll
>>> Principal Architect
>>> Syapse, Inc.
>>>
>>>
>>>
>>>
>>>
>
Received on Tuesday, 16 July 2013 00:39:13 UTC