Re: Comment on the Dataset proposal (syntax) from Pat Hayes on 2012-04-29 (public-rdf-wg@w3.org from April 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 28 Apr 2012 22:11:20 -0500
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: Sandro Hawke <sandro@w3.org>, Richard Cyganiak <richard@cyganiak.de>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <F02FDA33-4882-4B72-868A-723EA72BB223@ihmc.us>
On Apr 26, 2012, at 12:04 PM, Antoine Zimmermann wrote:

> Le 26/04/2012 18:13, Sandro Hawke a écrit :
>> On Thu, 2012-04-26 at 17:30 +0200, Antoine Zimmermann wrote:
>>> Hi,
>>> 
>>> 
>>> (This email is mostly for Richard's attention)
>>> 
>>> Putting aside the discussion on dataset semantics, I have a few comments
>>> on the way the dataset proposal is described in terms of syntax:
>>> 
>>> 
>>> "The RDF data model expresses information as graphs consisting of
>>> triples with subject, predicate and object."
>>> 
>>> The word "graph", in the RDF specifications, should never appear alone
>>> like this. It is well known that a graph is a pair (V,E) where V is a
>>> set of vertices and E is a set of edges. This is not what RDF Graphs
>>> are. RDF Graphs are not graphs, in any of the accepted mathematical
>>> definitions of the term.
>> 
>> Aren't RDF Graphs a kind of graph?   The restrictions, I think, are that
>> there are no unconnected vertices, the edges are directed and labeled
>> with an IRI, and the nodes may be labeled with an IRI or a datatype
>> expression.   If this is true, that every RDF Graph is a graph, then I
>> think linguistically it's okay to sometimes use the term "graph" if it
>> makes the text read better and doesn't introduce too much ambiguity.
> 
> 
> Right, an RDF graph could be defined as a directed labelled multigraph, but this would make the structure more complicated. You would need a set of vertices, a set of arcs, a function that associates each arc with the pair of nodes it connects, a labelling function for arcs that assigns only URIs, and a labelling function for vertices such that a vertex appearing as the source of an arc cannot be given a literal label, while any other vertex can be labelled as a bnode, URI or literal. This is *very* different from a set of triples.
> We only allow ourselves to say "graphs" because there is an isomorphism between the two definitions, and mostly because it is very convenient to draw graphs on paper or on a blackboard.

FWIW, the first RDF WG went through this exercise and came to the same conclusion. A set of triples is so much simpler a concept than a mathematical graph; and we can always understand that the "graph" in "RDF graph" is short for "graphical".
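
To make the contrast concrete, here is a minimal Python sketch of the two views. It is only an illustration under stated assumptions: the names Triple, to_multigraph and to_triples are hypothetical and come from no RDF library, terms are plain strings, and each vertex is identified with its label, which collapses the separate vertex-labelling function Antoine describes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical triple type; terms are plain strings for illustration."""
    subject: str
    predicate: str
    object: str


# View 1: an RDF graph is simply a set of triples.
rdf_graph = frozenset({
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:alice", "ex:name", '"Alice"'),
})


def to_multigraph(triples):
    """View 2: build the pieces of a directed labelled multigraph.

    Returns (vertices, endpoints, arc_label), where arcs are integer ids,
    endpoints maps an arc to its (source, target) pair, and arc_label maps
    an arc to its predicate. Assumption: a vertex is identified with its
    label, so no separate vertex-labelling function is kept.
    """
    vertices = set()
    endpoints = {}
    arc_label = {}
    for arc_id, t in enumerate(sorted(triples)):
        vertices.update((t.subject, t.object))
        endpoints[arc_id] = (t.subject, t.object)
        arc_label[arc_id] = t.predicate
    return vertices, endpoints, arc_label


def to_triples(endpoints, arc_label):
    """Recover the set of triples from the multigraph pieces."""
    return frozenset(
        Triple(source, arc_label[arc_id], target)
        for arc_id, (source, target) in endpoints.items()
    )


vertices, endpoints, arc_label = to_multigraph(rdf_graph)
round_trip = to_triples(endpoints, arc_label)
```

The round trip returns the original set of triples, which is the correspondence between the two definitions mentioned above; the multigraph view simply needs three structures where the triple view needs one.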

Pat


> 
> Anyways, I'm happy with Richard's answer, it addresses my concerns.
> 
> 
> 
> AZ
> 
> 
>> 
>>> We already agreed that the word "graph" alone is
>>> ambiguous and we resolved to use the phrase "RDF Graph" whenever we talk
>>> about sets of triples.
>>> 
>>> SUGGESTION:
>>> "The RDF data model expresses information as RDF Graphs consisting of a
>>> set of triples with subject, predicate and object."
>>> 
>>> -----
>>> 
>>> "Often, one wants to hold multiple RDF graphs and record information
>>> about each graph, allowing an application to work with datasets that
>>> involve information from more than one graph."
>>> 
>>> SUGGESTION:
>>> "... each RDF Graph, ... than one RDF Graph."
>>> 
>>> To sound less redundant, "hold multiple RDF graphs and record
>>> information about each one, ..."
>>> 
>>> -----
>>> 
>>> "An RDF Dataset represents a collection of graphs. An RDF Dataset
>>> comprises one graph, the default graph, which does not have a name, and
>>> zero or more named graphs, where each named graph is identified by an IRI."
>>> 
>>> Maybe say "distinguished RDF Graph":
>>> 
>>> SUGGESTION:
>>> "An RDF Dataset comprises one distinguished RDF Graph, the /default
>>> graph/, which does not have a name, ..."
>>> 
>>> Moreover, the word "identified" may be misinterpreted.
>>> 
>>> SUGGESTION:
>>> "..., where each named graph associates an IRI with an RDF Graph."
>>> 
>>> -----
>>> 
>>> "An RDF Dataset may contain zero named graphs; an RDF Dataset always
>>> contains one default graph."
>>> 
>>> SUGGESTION:
>>> add "The default graph MAY be empty."
>>> 
>>> -----
>>> 
>>> Maybe a definition for "named graph" could be given before the formal
>>> definition of RDF Dataset:
>>> 
>>> SUGGESTION:
>>> "A /named graph/ is a pair (n,g) where n is an IRI called the /graph
>>> name/ and g is an RDF Graph."
>>> 
>>> -----
>>> 
>>> "Formally, an RDF dataset is a set:
>>> 
>>> { G, (<u1>, G1), (<u2>, G2), . . . (<un>, Gn) }
>>> 
>>> where G and each Gi are graphs, and each <ui> is an IRI. Each <ui> is
>>> distinct."
>>> 
>>> "... are RDF Graphs, ..."
>>> 
>>> ----
>>> 
>>> "G is called the default graph. The pairs (<ui>, Gi) are called named
>>> graphs."
>>> 
>>> If "named graph" is defined before, it could look like this:
>>> 
>>> SUGGESTION:
>>> "G is called the default graph. The pairs (<ui>, Gi) are named graphs."
>> 
>> I have to say (again) that I'm not okay with calling something a "named
>> graph", especially formally, when it isn't named and isn't a graph (or
>> RDF Graph).   If we have to use the terms "name" and "graph", then the
>> pair (ui, Gi) is a name-graph pair, and Gi is the named graph.
>> 
>> I don't think wordsmithing this section will be productive until/unless we
>> have a shared understanding of what we actually want to say, though.
>> 
>>     -- Sandro
>> 
>> 
>> 
> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 83 36
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 29 April 2012 03:12:03 UTC