Re: Comments on RDF Spaces document

Richard,

I am trying to understand where exactly the disagreements on the content are... To make it clear, my opinion is not Sandro's, and he may not agree with what I say...:-)

On May 25, 2012, at 10:42 , Richard Cyganiak wrote:

> Hi Sandro,
> 
> Below some comments on your RDF Spaces and Datasets draft:
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html
> 
[snip]


> Summary: I have some rather fundamental issues with the document. I think it contains *lots* of things that I see no motivation for and that I think we shouldn't define. I think that something much shorter and simpler is perfectly sufficient. As a consequence, I also think that it's not necessary to pull these bits into a single document; they're better kept in the respective individual documents (Semantics, Concepts, Schema, Primer, and the various syntax documents).

I actually think that having all the issues in one document _temporarily_ is helpful. That does not mean that this is the final format of the document; actually I would agree that, eventually, this document may be better carved up into individual pieces, ie, they could be spread around concept, primer, syntax, etc. But, for the time being, the alternative is that we have a (huge) set of disparate texts spread over emails in the archives, and it becomes increasingly complicated to follow them and to see the big picture.

But, I believe, this is really a detail for now.

[snip]
> 
>> 3 Concepts
>>  3.1 Space
> 
> I strongly disagree with defining “space” based on the nature or characteristics of the thing identified. Strongly disagree with the whole “container” metaphor. The definitions mean that if I produce some triples by running some NLP on web pages, I'm not allowed to stick that into a SPARQL store using the web page URL as graph name. This is not acceptable to me.
> 

I am not sure that your last objection is actually fair. The text disallows NLP, but only because extraction with NLP is too vague (the results would depend on the NLP engine). I believe, taking a different example, that the current "space" notion would allow an HTML5+microdata page to be a space (ie, its URI being the identifier of the space), because there is a well defined algorithm that produces RDF triples. To come back to your text, if the URI of the text made it clear that the NLP extraction is based on, say, Zemanta's tagging engine then, for me, that would be a perfectly o.k. name for a space.

In other words: apart from a terminological mismatch, I do not think that the differences are so utterly huge as you seem to say.

> I've written this up in more detail here, including a different proposal:
> http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0481.html
> 
>>  3.2 Quad and Quadset
> 
> Why is this needed? Propose remove.

I am lukewarm about this, I must say. On the one hand, indeed, we could have named graphs (or whatever we call them) defined without explicit quads. On the other hand: shouldn't we, somewhere in our documents (remember that I look at this document as a 'gathering place'), define quads? After all, they *are* widely used, and some sort of relationship to named graphs should be defined somewhere.

So I am not sure myself... But I am not as clear-cut as you are.
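
For illustration only, the relationship between quads and named graphs that is at stake here can be sketched as follows. This is a minimal sketch under stated assumptions, not the definition used in the draft: terms are plain strings, a quad is a (subject, predicate, object, graph name) tuple, and `DEFAULT_GRAPH`, `quads_to_dataset` and `dataset_to_quads` are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Hypothetical marker for the default graph; not taken from the draft.
DEFAULT_GRAPH = None


def quads_to_dataset(quads):
    """Group (s, p, o, g) quads into a mapping: graph name -> set of triples."""
    dataset = defaultdict(set)
    for s, p, o, g in quads:
        dataset[g].add((s, p, o))
    return dict(dataset)


def dataset_to_quads(dataset):
    """Flatten a mapping graph name -> set of triples back into a set of quads."""
    return {
        (s, p, o, g)
        for g, triples in dataset.items()
        for (s, p, o) in triples
    }


quads = {
    (":b", ":c", ":d", ":a"),
    (":e", ":f", ":g", ":a"),
    (":s", ":p", ":o", DEFAULT_GRAPH),
}
dataset = quads_to_dataset(quads)

# A named graph with no triples has no quads, so it is lost on a round trip.
with_empty = dict(dataset)
with_empty[":empty"] = set()
round_tripped = quads_to_dataset(dataset_to_quads(with_empty))
```

In this sketch a set of quads and a dataset carry the same information except for empty named graphs, which is one reason the relationship may be worth writing down somewhere even if quads themselves are not given a prominent definition.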

> 
>>  3.3 Dataset
> 
> Ok. Not sure about the last two paragraphs — they should be informative notes at best I think.

*if* we keep them they *are* informative, that is for sure...

> 
>>  3.4 Named Graph
> 
> I don't think we are writing a folk dictionary of the Semantic Web. We define terms for use in other specifications and that's it. No need to explain where the term originated or that it's sometimes used in this way and sometimes in another way. Either we define the term, then it should go into the Dataset section. Or we don't, then it should go away.

Remember what I said: in my view, some parts of this document may end up in a primer, where this may be appropriate.

(I am not sure what you meant by 'folk dictionary', to be honest...)

> 
>>  3.5 Quadset/Dataset Relationship
> 
> I don't think we need quadsets. Propose remove.

I guess that is related to 3.2 above... the comments from there apply.

> 
>>  3.6 Graph Store
> 
> Not entirely convinced that we need both the “snapshot” and “mutable” versions of the abstract syntax. Why do you think we do?

Hm. My reading is that this section is about how the rest relates to what SPARQL 1.1 defines (not being 100% up-to-date with that part of SPARQL 1.1, I cannot judge whether what is in this section is correct or not).


> 
>>  3.7 Merge and Union
> 
> Why do we need this? My working assumption is that these are SPARQL-specific things that SPARQL should define.

SPARQL or not SPARQL, I am not convinced about the necessity of this section either, ie, I would be fine removing it.

> 
>>  3.8 Untrusting Merge
> 
> Well, it's good that you have worked out a way of doing this, but it seems like application stuff to me. We don't need to define this. Remove, or Primer.

My understanding was that this described a means to handle nested graphs which, at our last meeting, we declared out of scope for the time being. I'd agree that this section does not have a role for standards, ie, it should be removed.

> 
>> 4 Semantics
> 
> I'm generally ok with assigning truth values to the named graphs based on whether the thing identified by the graph name actually “contains” the triples.
> 
> I'd prefer an expression of the semantics that assigns truth values to IRI-graph-pairs, rather than quads. I think that a definition in terms of a “state relationship” or “state function” works better than the space-contains-triple relationship used here.
> 
> If this is true:
> 
>  :a { :b :c :d. :e :f :g }
> 
> then in your semantics it follows that this is true:
> 
>  :a { :b :c :d. }
> 
> But it doesn't follow that this is true:
> 
>  :a { [] :c :d. }
> 
> I don't really understand how it makes sense to allow the one entailment but not the other. Either the semantics are pure quoting (then the first entailment above shouldn't hold), or they are entailment-based (then the second entailment should hold). This “subset semantics” feels wrong to me.

Yes, I see the issue. A long long time ago (time measured in the named graph discussions:-) I made an attempt to formalize an earlier snapshot of the discussion:

http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1/Sem

which was also very minimalist. It was, in your terminology, a 'quoting' semantics, ie, for a dataset, nothing is said of the individual graphs in terms of entailment. (Of course, stronger entailment regimes could be built on top, but I am not sure we should do that.)

But I did not spend time on how that old design would work for the datasets, I must admit.
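
To make the three readings in Richard's example concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not the semantics of the draft or of the wiki page: graphs are sets of string triples, blank nodes are strings starting with "_:" (so Richard's `[]` becomes `_:x`), and the function names are hypothetical. The third check is a brute-force version of simple entailment: it looks for a mapping of the candidate's blank nodes to terms of the asserted graph that turns the candidate into a subgraph.

```python
from itertools import product


def is_bnode(term):
    """Blank nodes are written as strings starting with '_:' in this sketch."""
    return term.startswith("_:")


def quoting_holds(asserted, candidate):
    """Pure quoting: only the exact same set of triples is accepted."""
    return set(asserted) == set(candidate)


def subset_holds(asserted, candidate):
    """Subset reading: any subset of the asserted triples is accepted."""
    return set(candidate) <= set(asserted)


def simple_entailment_holds(asserted, candidate):
    """Entailment reading: some mapping of the candidate's blank nodes to
    terms of the asserted graph makes the candidate a subgraph of it."""
    asserted = set(asserted)
    bnodes = sorted({t for triple in candidate for t in triple if is_bnode(t)})
    terms = sorted({t for triple in asserted for t in triple})
    # Try every assignment of asserted terms to the candidate's blank nodes.
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        mapped = {
            tuple(mapping.get(t, t) for t in triple) for triple in candidate
        }
        if mapped <= asserted:
            return True
    return False


asserted = [(":b", ":c", ":d"), (":e", ":f", ":g")]
fewer_triples = [(":b", ":c", ":d")]
with_bnode = [("_:x", ":c", ":d")]

for name, check in [
    ("quoting", quoting_holds),
    ("subset", subset_holds),
    ("entailment", simple_entailment_holds),
]:
    print(name, check(asserted, fewer_triples), check(asserted, with_bnode))
```

Under these assumptions the quoting check rejects both of Richard's candidate graphs, the subset check accepts only the first, and the entailment check accepts both, which is the asymmetry he objects to.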

[snip]
> 
>> 6 Conformance
> 
> This section is not necessary as this document should be broken up and distributed over the various relevant specs, which (hopefully) have their own conformance clauses.
> 

But it contributes to those...


>> A Detailed Example
> 
> Ok, looking forward to rest
> 
>> B Folding
> 
> If you want to convey a dataset, then why not use a dataset syntax? What is the use of turning a perfectly fine dataset into a stinking triple tarpit? Why clutter the RDF namespace with a Reification 2.0 Vocabulary?
> 
> Here's a not entirely serious proposal for a better folding method: Serialize it as N-Quads, bzip2 it, base64 encode it, and stuff it into a data URI:
> 
>  <data:application/n-quads;base64,SSBjYW4gaGF6IEpTT04/>.
> 
> I *do* believe that this will objectively better meet whatever use cases you have in mind for the folding.
> 
> I propose to remove this section.

I agree.

Cheers

Ivan

> 
> Best,
> Richard


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Saturday, 26 May 2012 15:15:43 UTC