Re: Comment on the Dataset proposal (syntax) from Antoine Zimmermann on 2012-04-26 (public-rdf-wg@w3.org from April 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 26 Apr 2012 19:04:49 +0200
To: Sandro Hawke <sandro@w3.org>
CC: Richard Cyganiak <richard@cyganiak.de>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <4F998031.8050000@emse.fr>
On 26/04/2012 18:13, Sandro Hawke wrote:
> On Thu, 2012-04-26 at 17:30 +0200, Antoine Zimmermann wrote:
>> Hi,
>>
>>
>> (This email is mostly for Richard's attention)
>>
>> Putting aside the discussion on dataset semantics, I have a few comments
>> on the way the dataset proposal is described in terms of syntax:
>>
>>
>> "The RDF data model expresses information as graphs consisting of
>> triples with subject, predicate and object."
>>
>> The word "graph", in the RDF specifications, should never appear alone
>> like this. It is well known that a graph is a pair (V,E) where V is a
>> set of vertices and E is a set of edges. This is not what RDF Graphs
>> are. RDF Graphs are not graphs, in any of the accepted mathematical
>> definitions of the term.
>
> Aren't RDF Graphs a kind of graph?   The restrictions, I think, are that
> there are no unconnected vertices, the edges are directed and labeled
> with an IRI, and the nodes may be labeled with an IRI or a datatype
> expression.   If this is true, that every RDF Graph is a graph, then I
> think linguistically it's okay to sometimes use the term "graph" if it
> makes the text read better and doesn't introduce too much ambiguity.


Right, an RDF graph could be defined as a directed labelled multigraph,
but this would make the structure more complicated. You would need a set
of vertices, a set of arcs, a function that associates each arc with the
pair of nodes it connects, a labelling function for arcs that only
assigns URIs, and a labelling function for vertices that can assign a
bnode, URI or literal label, except that a vertex appearing as the
source of an arc cannot be given a literal label. This is *very*
different from a set of triples.
We only allow ourselves to say "graphs" because there is an isomorphism
between the two definitions, and mostly because it is very convenient to
draw graphs on paper or on a blackboard.
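
To make the contrast concrete, here is a rough Python sketch of the two
structures and the mapping between them. It is only an illustration under
simplifying assumptions: every name in it (Triple, to_multigraph,
to_triples, the "arcN" identifiers, the ex:/foaf: example terms) is made
up for this sketch, each vertex is identified with its own label so the
vertex-labelling function is the identity, and blank nodes are left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_multigraph(triples):
    """Turn a set of triples into (vertices, arcs, endpoints, arc_label)."""
    # Simplification: each vertex is identified with its own label,
    # so the vertex-labelling function is the identity.
    vertices = {t.subject for t in triples} | {t.obj for t in triples}
    arcs, endpoints, arc_label = set(), {}, {}
    for i, t in enumerate(sorted(triples)):
        arc = f"arc{i}"  # hypothetical arc identifier
        arcs.add(arc)
        endpoints[arc] = (t.subject, t.obj)  # the pair of nodes it connects
        arc_label[arc] = t.predicate         # arcs are labelled by predicates
    return vertices, arcs, endpoints, arc_label


def to_triples(vertices, arcs, endpoints, arc_label):
    """Recover the set of triples from the multigraph structure."""
    return {Triple(endpoints[a][0], arc_label[a], endpoints[a][1]) for a in arcs}


# An RDF graph as a plain set of triples (example terms are made up).
rdf_graph = {
    Triple("ex:alice", "foaf:knows", "ex:bob"),
    Triple("ex:alice", "foaf:name", '"Alice"'),
}

multigraph = to_multigraph(rdf_graph)
assert to_triples(*multigraph) == rdf_graph  # the round trip loses nothing
```

The set of triples is one object; the multigraph needs four, plus the
constraint on literal labels, which this sketch does not even check.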

Anyway, I'm happy with Richard's answer; it addresses my concerns.



AZ


>
>> We already agreed that the word "graph" alone is
>> ambiguous and we resolved to use the phrase "RDF Graph" whenever we talk
>> about sets of triples.
>>
>> SUGGESTION:
>> "The RDF data model expresses information as RDF Graphs consisting of a
>> set of triples with subject, predicate and object."
>>
>> -----
>>
>> "Often, one wants to hold multiple RDF graphs and record information
>> about each graph, allowing an application to work with datasets that
>> involve information from more than one graph."
>>
>> SUGGESTION:
>> "... each RDF Graph, ... than one RDF Graph."
>>
>> To sound less redundant, "hold multiple RDF graphs and record
>> information about each one, ..."
>>
>> -----
>>
>> "An RDF Dataset represents a collection of graphs. An RDF Dataset
>> comprises one graph, the default graph, which does not have a name, and
>> zero or more named graphs, where each named graph is identified by an IRI."
>>
>> Maybe say "distinguished RDF Graph":
>>
>> SUGGESTION:
>> "An RDF Dataset comprises one distinguished RDF Graph, the /default
>> graph/, which does not have a name, ..."
>>
>> Moreover, the word "identified" may be misinterpreted.
>>
>> SUGGESTION:
>> "..., where each named graph associates an IRI with an RDF Graph."
>>
>> -----
>>
>> "An RDF Dataset may contain zero named graphs; an RDF Dataset always
>> contains one default graph."
>>
>> SUGGESTION:
>> add "The default graph MAY be empty."
>>
>> -----
>>
>> Maybe a definition for "named graph" could be given before the formal
>> definition of RDF Dataset:
>>
>> SUGGESTION:
>> "A /named graph/ is a pair (n,g) where n is an IRI called the /graph
>> name/ and g is an RDF Graph."
>>
>> -----
>>
>> "Formally, an RDF dataset is a set:
>>
>> { G, (<u1>, G1), (<u2>, G2), . . . (<un>, Gn) }
>>
>> where G and each Gi are graphs, and each <ui> is an IRI. Each <ui> is
>> distinct."
>>
>> "... are RDF Graphs, ..."
>>
>> ----
>>
>> "G is called the default graph. The pairs (<ui>, Gi) are called named
>> graphs."
>>
>> If "named graph" is defined before, it could look like this:
>>
>> SUGGESTION:
>> "G is called the default graph. The pairs (<ui>, Gi) are named graphs."
>
> I have to say (again) that I'm not okay with calling something a "named
> graph", especially formally, when it isn't named and isn't a graph (or
> RDF Graph).   If we have to use the terms "name" and "graph", then the
> pair (ui, Gi) is a name-graph pair, and Gi is the named graph.
>
> I don't think wordsmithing this section will be productive until/unless we
> have a shared understanding of what we actually want to say, though.
>
>      -- Sandro
>
>
>
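
For reference, the formal definition quoted above (a default graph G plus
pairs (<ui>, Gi) with distinct IRIs) could be modelled roughly as follows.
This is a sketch only: the Dataset class, its field names and the
add_named_graph method are hypothetical, the terminology simply follows
the quoted proposal, and using a dict keyed by IRI is just one way to get
the "each <ui> is distinct" condition.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # The default graph G: a set of (subject, predicate, object) triples.
    default_graph: frozenset = frozenset()
    # The pairs (<ui>, Gi), keyed by IRI so that each <ui> is distinct.
    named_graphs: dict = field(default_factory=dict)

    def add_named_graph(self, iri, graph):
        """Add the pair (iri, graph); reject an IRI that is already used."""
        if iri in self.named_graphs:
            raise ValueError(f"IRI already used in this dataset: {iri}")
        self.named_graphs[iri] = frozenset(graph)


ds = Dataset()  # empty default graph, zero named graphs
ds.add_named_graph("http://example.org/g1",
                   {("ex:alice", "foaf:knows", "ex:bob")})
```

Note that the default graph is always present, even when it is empty,
while the collection of named graphs may itself be empty.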


-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 26 April 2012 17:03:47 UTC