Re: Comments on RDF Spaces document from Sandro Hawke on 2012-05-30 (public-rdf-wg@w3.org from May 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 29 May 2012 20:25:47 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <1338337547.2332.163.camel@waldron>

On Fri, 2012-05-25 at 09:42 +0100, Richard Cyganiak wrote:
> Hi Sandro,
> 
> Below some comments on your RDF Spaces and Datasets draft:
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html
> 
> Summary: I have some rather fundamental issues with the document. I think it contains *lots* of things that I see no motivation for and that I think we shouldn't define. I think that something much shorter and simpler is perfectly sufficient. As a consequence, I also think that it's not necessary to pull these bits into a single document; they're better kept in the respective individual documents (Semantics, Concepts, Schema, Primer, and the various syntax documents). These fundamental complaints also make it a bit hard for me to comment on the details, so my comments below will consist mostly of complaints that we don't need to standardize XYZ.

Summary: okay with me, but I'm somewhat concerned your approach won't
work with the rest of the group and/or external reviewers.  On the
other hand, my approach isn't working with them either, so... yeah.

My main concern with doing it like you suggest is that I think having
more text is likely to help folks understand what we're talking about.

I'm thinking the use cases and worked example could turn into a WG Note
or some sort of Datasets Primer, and that would probably serve.

> Brief section-by-section comments follow.
> 
> > 1 Introduction
> 
> Ok nice.
> 
> > 2 Use Cases
> 
> Ok nice. I wouldn't know what to do with the use cases if the document were to be broken up. Any ideas?
> 
> > 3 Concepts
> >   3.1 Space
> 
> I strongly disagree with defining “space” based on the nature or characteristics of the thing identified. Strongly disagree with the whole “container” metaphor. The definitions mean that if I produce some triples by running some NLP on web pages, I'm not allowed to stick that into a SPARQL store using the web page URL as graph name. This is not acceptable to me.
> 
> I've written this up in more detail here, including a different proposal:
> http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0481.html

I don't really agree, but I'm getting pretty comfortable with your
minimalist approach here.  That is, I think it's okay to avoid
characterizing what sort of thing the graph names denote.  I'm not sure
how that will play with everyone else, though.

> >   3.2 Quad and Quadset
> 
> Why is this needed? Propose remove.
> 
> >   3.3 Dataset
> 
> Ok. Not sure about the last two paragraphs — they should be informative notes at best I think.
> 
> >   3.4 Named Graph
> 
> I don't think we are writing a folk dictionary of the Semantic Web. We define terms for use in other specifications and that's it. No need to explain where the term originated or that it's sometimes used in this way and sometimes in another way. Either we define the term, then it should go into the Dataset section. Or we don't, then it should go away.
> 
> >   3.5 Quadset/Dataset Relationship
> 
> I don't think we need quadsets. Propose remove.
> 
> >   3.6 Graph Store
> 
> Not entirely convinced that we need both the “snapshot” and “mutable” versions of the abstract syntax. Why do you think we do?
> 
> >   3.7 Merge and Union
> 
> Why do we need this? My working assumption is that these are SPARQL-specific things that SPARQL should define.
> 
> >   3.8 Untrusting Merge
> 
> Well, it's good that you have worked out a way of doing this, but it seems like application stuff to me. We don't need to define this. Remove, or Primer.
> 
> > 4 Semantics
> 
> I'm generally ok with assigning truth values to the named graphs based on whether the thing identified by the graph name actually “contains” the triples.
> 
> I'd prefer an expression of the semantics that assigns truth values to IRI-graph-pairs, rather than quads. I think that a definition in terms of a “state relationship” or “state function” works better than the space-contains-triple relationship used here.
> 
> If this is true:
> 
>   :a { :b :c :d. :e :f :g }
> 
> then in your semantics it follows that this is true:
> 
>   :a { :b :c :d. }
> 
> But it doesn't follow that this is true:
> 
>   :a { [] :c :d. }
> 
> I don't really understand how it makes sense to allow the one entailment but not the other. Either the semantics are pure quoting (then the first entailment above shouldn't hold), or they are entailment-based (then the second entailment should hold). This “subset semantics” feels wrong to me.

I have some trouble with the subset semantics, too, but several people
in the WG were very strongly in favor of them.   They wanted it to be
the case that:

   :a { :b :c 1 }
and
   :a { :b :c 2 }
entails
   :a { :b :c 1,2 }

With complete-graph semantics those two datasets contradict each other.  I
think the logic is clear, but that's not the answer people wanted to
hear.
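
To make the difference concrete, here is a minimal sketch, not anything
from the draft.  It assumes a dataset is just a map from graph name to a
set of triples, and that an interpretation assigns each name the set of
triples it really "contains".  The function names and the tiny
two-triple universe are made up for illustration.

```python
from itertools import combinations

# Two triples, written as plain tuples (an assumption of this sketch).
T1 = (":b", ":c", 1)
T2 = (":b", ":c", 2)


def holds_subset(dataset, space):
    """Subset semantics: each named graph must be contained in the space."""
    return all(triples <= space.get(name, set())
               for name, triples in dataset.items())


def holds_complete(dataset, space):
    """Complete-graph semantics: each named graph must equal the space."""
    return all(triples == space.get(name, set())
               for name, triples in dataset.items())


d1 = {":a": {T1}}
d2 = {":a": {T2}}
merged = {":a": {T1, T2}}

# Every possible content of :a within this two-triple universe.
candidates = [{":a": set(c)}
              for r in range(3)
              for c in combinations([T1, T2], r)]

# Interpretations that make both datasets true at once.
subset_models = [s for s in candidates
                 if holds_subset(d1, s) and holds_subset(d2, s)]
complete_models = [s for s in candidates
                   if holds_complete(d1, s) and holds_complete(d2, s)]

# Under subset semantics every joint model also satisfies the merged dataset.
merged_follows = all(holds_subset(merged, s) for s in subset_models)
```

In this toy universe `merged_follows` is true and `complete_models` is
empty: subset semantics gives the entailment people asked for, while
complete-graph semantics leaves no interpretation in which both datasets
hold.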

This is the place I'm feeling most stuck right now.  Today, I came up
with a (federated phonebook) use case that reveals this: people want to
use the phone book to find out whether someone is a member of the staff
or not.  They would like to make the assumption that if someone is not
listed, then they are not a staff member.   And the feeds from some
divisions are complete, so this can be done for the staff of that
division.  But other divisions have only partial data.  How can HQ
convey in the dataset given to phonebook display software which graphs
are complete in their listing of employees of that division?
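
As a rough illustration of what the display software would need (not a
proposed answer), suppose HQ could pass along a set of graph names that
are known to be complete.  Everything below is hypothetical: the
`complete_graphs` set, the `:memberOf` property, and the division names
are invented for the sketch, and how such a marker would actually be
expressed in a dataset is exactly the open question.

```python
def is_staff(person, division, dataset, complete_graphs):
    """Return True, False, or None (unknown) for membership in a division.

    `dataset` maps a division's graph name to a set of triples;
    `complete_graphs` is a hypothetical set of graph names whose
    listings are declared complete.
    """
    triples = dataset.get(division, set())
    if (person, ":memberOf", division) in triples:
        return True
    # Absence only means "not staff" when the feed is declared complete.
    if division in complete_graphs:
        return False
    return None


dataset = {
    ":sales": {(":alice", ":memberOf", ":sales")},
    ":research": {(":bob", ":memberOf", ":research")},
}
complete_graphs = {":sales"}  # only the sales feed is complete

carol_in_sales = is_staff(":carol", ":sales", dataset, complete_graphs)
carol_in_research = is_staff(":carol", ":research", dataset, complete_graphs)
bob_in_research = is_staff(":bob", ":research", dataset, complete_graphs)
```

Here an unlisted person is reported as not on the sales staff, but only
as "unknown" for research, which is the distinction the subset semantics
alone cannot express.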

> > 5 Dataset Languages
> 
> Will leave this to the syntax folks.
> 
> > 6 Conformance
> 
> This section is not necessary as this document should be broken up and distributed over the various relevant specs, which (hopefully) have their own conformance clauses.
> 
> > A Detailed Example
> 
> Ok, looking forward to the rest.
> 
> > B Folding
> 
> If you want to convey a dataset, then why not use a dataset syntax? What is the use of turning a perfectly fine dataset into a stinking triple tarpit? Why clutter the RDF namespace with a Reification 2.0 Vocabulary?
> 
> Here's a not entirely serious proposal for a better folding method: Serialize it as N-Quads, bzip2 it, base64 encode it, and stuff it into a data URI:
> 
>   <data:application/n-quads;base64,SSBjYW4gaGF6IEpTT04/>.
> 
> I *do* believe that this will objectively better meet whatever use cases you have in mind for the folding.

I keep thinking of the folks at the workshop who were adamant that
RDF/XML is still crucial to their vast projects and business plans.

I'm fine with moving this to the Note, or whatever, until/unless someone
complains.
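
For reference, the tongue-in-cheek folding method quoted above is easy
to sketch with the Python standard library.  This only illustrates the
steps Richard lists (N-Quads, bzip2, base64, data URI); the `fold` and
`unfold` names and the sample quad are made up, and the media type is
simply copied from his example.

```python
import base64
import bz2

# Prefix taken from the example in the quoted message.
PREFIX = "data:application/n-quads;base64,"


def fold(nquads: str) -> str:
    """Compress an N-Quads string with bzip2 and wrap it in a data URI."""
    compressed = bz2.compress(nquads.encode("utf-8"))
    return PREFIX + base64.b64encode(compressed).decode("ascii")


def unfold(uri: str) -> str:
    """Reverse fold(): strip the prefix, base64-decode, and decompress."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a folded N-Quads data URI")
    payload = base64.b64decode(uri[len(PREFIX):])
    return bz2.decompress(payload).decode("utf-8")


sample = (
    "<http://example.org/b> <http://example.org/c> "
    "<http://example.org/d> <http://example.org/a> .\n"
)
folded = fold(sample)
restored = unfold(folded)
```

The round trip gives back the original N-Quads text, but the result is
opaque to any RDF/XML-only toolchain, which is the concern raised above.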
 
> I propose to remove this section.
> 
> Best,
> Richard

Wishing he had a good closing term like "cheers" or "best", 
   -- Sandro
Received on Wednesday, 30 May 2012 00:25:52 UTC