Comments on RDF Spaces document from Richard Cyganiak on 2012-05-25 (public-rdf-wg@w3.org from May 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 25 May 2012 09:42:05 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <00AEDA34-4575-4700-97F6-E85304C94C3A@cyganiak.de>

Hi Sandro,

Below are some comments on your RDF Spaces and Datasets draft:
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html

Summary: I have some rather fundamental issues with the document. I think it contains *lots* of things that I see no motivation for and that I think we shouldn't define. I think that something much shorter and simpler is perfectly sufficient. As a consequence, I also think that it's not necessary to pull these bits into a single document; they're better kept in the respective individual documents (Semantics, Concepts, Schema, Primer, and the various syntax documents). These fundamental complaints also make it a bit hard for me to comment on the details, so my comments below will consist mostly of complaints that we don't need to standardize XYZ.

Brief section-by-section comments follow.

> 1 Introduction

Ok nice.

> 2 Use Cases

Ok nice. I wouldn't know what to do with the use cases if the document were to be broken up. Any ideas?

> 3 Concepts
>   3.1 Space

I strongly disagree with defining “space” based on the nature or characteristics of the thing identified. I also strongly disagree with the whole “container” metaphor. The definitions mean that if I produce some triples by running some NLP on web pages, I'm not allowed to stick them into a SPARQL store using the web page URL as graph name. This is not acceptable to me.
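To illustrate the practice in question, here is a minimal Python sketch. It is an illustration only: the dict-based store, the function name `add_extracted`, and the example.org IRIs are all assumptions, not anything from the draft or from a real SPARQL store API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a SPARQL store:
# graph name (an IRI string) -> set of (subject, predicate, object) triples.
dataset = defaultdict(set)

def add_extracted(store, page_url, triples):
    """File triples derived from a web page under that page's URL as graph name."""
    store[page_url].update(triples)
    return store

# Triples produced by (imaginary) NLP over the page. They are *about* or
# *derived from* the page; the page itself does not "contain" them.
page = "http://example.org/page.html"
extracted = {
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
}
add_extracted(dataset, page, extracted)
```

The graph name here only says where the triples came from, which is exactly the usage a “container” reading of the graph name would rule out.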

I've written this up in more detail here, including a different proposal:
http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0481.html

>   3.2 Quad and Quadset

Why is this needed? I propose to remove it.

>   3.3 Dataset

Ok. Not sure about the last two paragraphs; I think they should be informative notes at best.

>   3.4 Named Graph

I don't think we are writing a folk dictionary of the Semantic Web. We define terms for use in other specifications and that's it. No need to explain where the term originated or that it's sometimes used in this way and sometimes in another way. Either we define the term, in which case it should go into the Dataset section, or we don't, in which case it should go away.

>   3.5 Quadset/Dataset Relationship

I don't think we need quadsets. I propose to remove this section.

>   3.6 Graph Store

Not entirely convinced that we need both the “snapshot” and “mutable” versions of the abstract syntax. Why do you think we do?

>   3.7 Merge and Union

Why do we need this? My working assumption is that these are SPARQL-specific things that SPARQL should define.

>   3.8 Untrusting Merge

Well, it's good that you have worked out a way of doing this, but it seems like application stuff to me. We don't need to define this. Remove it, or move it to the Primer.

> 4 Semantics

I'm generally ok with assigning truth values to the named graphs based on whether the thing identified by the graph name actually “contains” the triples.

I'd prefer an expression of the semantics that assigns truth values to IRI-graph-pairs, rather than quads. I think that a definition in terms of a “state relationship” or “state function” works better than the space-contains-triple relationship used here.

If this is true:

  :a { :b :c :d. :e :f :g }

then in your semantics it follows that this is true:

  :a { :b :c :d. }

But it doesn't follow that this is true:

  :a { [] :c :d. }

I don't really understand how it makes sense to allow the one entailment but not the other. Either the semantics are pure quoting (then the first entailment above shouldn't hold), or they are entailment-based (then the second entailment should hold). This “subset semantics” feels wrong to me.
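To make the contrast concrete, here is a minimal Python sketch. It is not taken from the draft: the function names, the encoding of terms as strings, and the convention that terms starting with `_:` are blank nodes are all assumptions for illustration. It compares a literal subset check with a brute-force check for simple entailment (some instance of the conclusion is a subgraph of the premise).

```python
from itertools import product

def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def subset_entails(g, h):
    """'Subset semantics': h follows only if every triple of h is literally in g."""
    return h <= g

def simple_entails(g, h):
    """Brute-force simple entailment: some mapping of h's blank nodes
    to terms occurring in g turns h into a subset of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        mapped = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if mapped <= g:
            return True
    return False

# Contents of named graph :a from the example above.
g = {(":b", ":c", ":d"), (":e", ":f", ":g")}
h_subset = {(":b", ":c", ":d")}     # :a { :b :c :d. }
h_bnode = {("_:x", ":c", ":d")}     # :a { [] :c :d. }

print(subset_entails(g, h_subset))  # True
print(subset_entails(g, h_bnode))   # False
print(simple_entails(g, h_bnode))   # True
```

Under the subset check the blank-node variant is rejected even though it says strictly less than the first conclusion; under simple entailment both hold. Pure quoting would correspond to requiring `h == g`, which rejects both.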

> 5 Dataset Languages

Will leave this to the syntax folks.

> 6 Conformance

This section is not necessary as this document should be broken up and distributed over the various relevant specs, which (hopefully) have their own conformance clauses.

> A Detailed Example

Ok, looking forward to the rest.

> B Folding

If you want to convey a dataset, then why not use a dataset syntax? What is the use of turning a perfectly fine dataset into a stinking triple tarpit? Why clutter the RDF namespace with a Reification 2.0 Vocabulary?

Here's a not entirely serious proposal for a better folding method: Serialize it as N-Quads, bzip2 it, base64 encode it, and stuff it into a data URI:

  <data:application/n-quads;base64,SSBjYW4gaGF6IEpTT04/>.

I *do* believe that this will objectively better meet whatever use cases you have in mind for the folding.
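For what it's worth, the pipeline is only a few lines. A minimal Python sketch, using only the standard library; the function names and the example.org quad are assumptions for illustration, and nothing in the resulting data URI signals that the payload is bzip2-compressed, so the receiver has to know that out of band.

```python
import base64
import bz2

PREFIX = "data:application/n-quads;base64,"

def fold_to_data_uri(nquads: str) -> str:
    """Serialize-compress-encode: N-Quads text -> bzip2 -> base64 -> data URI."""
    compressed = bz2.compress(nquads.encode("utf-8"))
    return PREFIX + base64.b64encode(compressed).decode("ascii")

def unfold_from_data_uri(uri: str) -> str:
    """Inverse of fold_to_data_uri; assumes the payload was bzip2-compressed."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a base64 N-Quads data URI")
    payload = uri[len(PREFIX):]
    return bz2.decompress(base64.b64decode(payload)).decode("utf-8")

# One quad: triple :b :c :d in graph :a, with hypothetical example.org IRIs.
nquads = (
    "<http://example.org/b> <http://example.org/c> "
    "<http://example.org/d> <http://example.org/a> .\n"
)
uri = fold_to_data_uri(nquads)
assert unfold_from_data_uri(uri) == nquads
```

The result is a single opaque IRI that can be dropped into any triple, with no extra vocabulary needed.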

I propose to remove this section.

Best,
Richard
Received on Friday, 25 May 2012 08:42:45 UTC