Re: rdfms-graph: Food for thought from Stephen Petschulat/CanWest/IBM on 2001-07-18 (w3c-rdfcore-wg@w3.org from July 2001)

From: Stephen Petschulat/CanWest/IBM <spetschu@ca.ibm.com>
Date: Wed, 18 Jul 2001 07:53:36 -0700
To: Frank Manola <fmanola@mitre.org>
Cc: Graham Klyne <Graham.Klyne@baltimore.com>, w3c-rdfcore-wg@w3.org
Message-ID: <OF575E8111.42646F96-ON88256A8C.00739EB8@mkm.can.ibm.com>
> I think it's important not to let the tail wag the dog too much here.
> My understanding is that the key question involved in "rdfms-graph" is
> one of scope: we talk about "a graph" or "a model" without being able
> to describe very easily what that consists of, or what kind of thing it
> is (the "model" term is particularly significant when we aren't
> specifically talking about one of the graph-like pictures in the M&S,
> but rather about a collection of triples or some XML serialization).

Agreed. The term 'model' is overused. I don't see the issue #rdfms-graph
driving the RDF Formal Model, but rather the Formal Model (driven primarily
by logic) should be mappable to a graph theoretical representation once it
has been nailed down. The nice thing with mathematics is that if you get it
right in one branch, it will often work out well in other branches.

http://www.w3.org/2000/03/rdf-tracking/#rdfms-graph

...notes that "The term 'model' is often used as a synonym for an RDF
graph." Lots of room for confusion here... which is why this issue was
raised.

I believe most of your discussion below isn't about clarifying the M&S
formal use of mathematical graphs as a model (which is what I was driving
at), but rather raises the issue of how the Formal Model should define some
special notion of a collection of RDF statements. M&S currently uses the
term RDF graph interchangeably with RDF model when the underlying concept
it is getting at is an RDF dataset (collection of statements) or something
similar. A graph is just one of many useful views on this dataset and
probably not the most fundamental one if n-triples are the basis for the
abstract syntax.

To clarify our discussion on this issue, I would propose there are actually
two separate parts to it:

1) What kind of data structure should the RDF Formal Model use to define an
RDF dataset/model/collection? Once we decide this, we should clarify the
spec so it says RDF graph when it is talking about a graph and RDF <pick a
name> when it is talking about this data structure.
2) We still need to properly define what the mapping is between this RDF
dataset and a mathematical graph.
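
For part 1), the practical difference between the candidate structures shows
up as soon as a statement is repeated. A minimal sketch in Python, assuming
triples are plain (s, p, o) tuples of URI strings; the example.org URIs and
the variable names are made up for illustration and come from no spec:

```python
from collections import Counter

# Hypothetical (s, p, o) triples, written as plain tuples of URI strings.
DOC = "http://example.org/doc"
CREATOR = "http://purl.org/dc/elements/1.1/creator"
ALICE = "http://example.org/people/alice"

statements = [
    (DOC, CREATOR, ALICE),
    (DOC, CREATOR, ALICE),  # the same statement asserted a second time
]

as_set = set(statements)      # a set keeps one copy of each statement
as_bag = Counter(statements)  # a bag (multiset) remembers the repeat

print(len(as_set))           # distinct statements
print(sum(as_bag.values()))  # statements counted with repeats
```

Whichever structure is picked, it is the thing that needs its own name in
the spec, separate from "RDF graph".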

1) and 2) are largely independent, i.e. you can map a bag, a set, etc. into a
graph without too much difficulty. 1) is critical so we can avoid using
misleading terms and 2) is important since it will provide users who want
to treat a bunch of RDF statements as a graph the rules necessary to create
this graph in a consistent manner. We would then be able to talk about an
"RDF Graph" and properly understand what everyone means by that.

Obviously, if for no other reason than to avoid changing much of RDF M&S, we
may want to decide that the RDF Formal Model use mathematical graphs as the
formalism to represent collections of RDF statements. In this case, I think
the tail may need to do some wagging or else we'll end up with a spec
'clarification' that continues to cause confusion. Another option may be to
say that the Formal Model defines no such collection of statements and
arrows and circle diagrams are simply used for visualization.

> From an abstract point of view, it's obviously some kind of collection
> of RDF statements, but then come the questions, like:
> a.  what *kind* of collection is it?
> b.  is it a resource (and do we need to explicitly specify how it gets a
> URI?)
> c.  what kinds of things go in it?
> d.  what is the purpose of defining such collections (e.g., do you scope
> it for the purpose of attributing all of its contents in a common way)?

These questions seem to be aimed at answering the first part of this issue.
(The graph theoretical representation of a collection of RDF statements
would formally be two collections, one for vertices and one for edges,
a.k.a. nodes & arcs: two sets, or a set & a bag, depending on the
distinctness requirement for edges. Obviously, putting a bunch of statements
in a set or a bag doesn't give you a graph; that is the dataset you want to
map to the graph.)

My initial stab at the above four would be:

a. A bag.
b. Yes. You can put a bunch of statements in an XML doc and put it on the
web and give it a URI. No, the user gets to decide how they want to
identify it. If it is stored in a database the URI might be
jdbc:db:my_rdf_data_source. If it is in a document it might be
http://mystuff.com/assertions.rdf.
c. (s, p, o) triples.
d. Good question. The best answer I can come up with is that I don't think
we can avoid it since many applications will want to take an RDF document,
a collection of RDF documents, an ontology, etc. and call that their domain
of discourse. They may want to refer to a collection of statements or
assert who created this ontology (not necessarily who created the
serialization, but who actually created or owns the dataset). They may want
to merge two separate RDF statement collections or find the intersection of
statements in them (which brings up issues of equivalence). Plus RDF M&S
constantly talks about groups of RDF statements and graphically represents
them.
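
As a rough illustration of the merge and intersection cases in d., here is a
sketch in Python that treats each collection as a bag of (s, p, o) tuples.
The function names and example.org URIs are hypothetical, and "equivalence"
is assumed to be plain tuple equality, which is exactly the part that would
need a real definition:

```python
from collections import Counter

def merge(a: Counter, b: Counter) -> Counter:
    """Bag sum: every occurrence from both collections is kept."""
    return a + b

def common(a: Counter, b: Counter) -> Counter:
    """Bag intersection: each triple is kept min(count in a, count in b) times."""
    return a & b

# Hypothetical triples.
t1 = ("http://example.org/doc", "http://example.org/creator", "http://example.org/alice")
t2 = ("http://example.org/doc", "http://example.org/title", "Some title")

first = Counter([t1, t1, t2])   # t1 asserted twice
second = Counter([t1])

merged = merge(first, second)
shared = common(first, second)
print(merged[t1], merged[t2])   # occurrences of each triple after the merge
print(shared[t1], shared[t2])   # occurrences shared by both collections
```

With sets instead of bags the same two operations would be union and
intersection, and the repeat counts would disappear.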

If the above four were defined, this bag of triples can then be mapped to a
graph theoretical representation & the spec cleaned up so that it uses a
new term when it is referring to the concept of a dataset or collection of
statements. The problem with the term "model" is that it is easily confused
with the concept of the RDF (meta)Model. The only route out of this is to
take the MOF/XMI route and start talking about meta-models and
meta-meta-models. IMHO this just makes things worse :-).

FWIW, SiRPAC accepts repeated statements and currently draws two arcs. So
the following:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://www.w3.org/RDF/Implementations/SiRPAC/">
    <dc:creator rdf:resource="http://www.w3.org/People/Janne/"/>
  </rdf:Description>
  <rdf:Description about="http://www.w3.org/RDF/Implementations/SiRPAC/">
    <dc:creator rdf:resource="http://www.w3.org/People/Janne/"/>
  </rdf:Description>
</rdf:RDF>

renders two identical triples:

<http://www.w3.org/RDF/Implementations/SiRPAC/> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Janne/> .
<http://www.w3.org/RDF/Implementations/SiRPAC/> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Janne/> .

which map to the following graph G(V,E):

Vertices
V={http://www.w3.org/RDF/Implementations/SiRPAC/,
  http://www.w3.org/People/Janne/,
  http://purl.org/dc/elements/1.1/creator }

Edges
E=((http://www.w3.org/RDF/Implementations/SiRPAC/,
  http://www.w3.org/People/Janne/,
  http://purl.org/dc/elements/1.1/creator),
  (http://www.w3.org/RDF/Implementations/SiRPAC/,
  http://www.w3.org/People/Janne/,
  http://purl.org/dc/elements/1.1/creator))

Where the elements in the bag of edges E of the labelled digraph G(V,E)
have the form (source vertex label, sink vertex label, edge label). Of
course, we don't need to define a graph syntax in M&S in order to clarify.
It should be unambiguous to simply state what the mapping of (s,p,o)
triples is to vertices, edges, and labels and whether the edge collection
is distinct or not.
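
The mapping just described can be sketched in a few lines of Python. This is
only an illustration under assumptions: the function name and both flags are
hypothetical, edges use the (source, sink, label) order given above, and the
property URI is added to V by default because the listing above does so.

```python
from collections import Counter

def triples_to_graph(triples, distinct_edges=False, predicates_as_vertices=True):
    """Map (s, p, o) triples to a labelled digraph G(V, E).

    V is a set of labels; E is a bag of (source, sink, label) entries.
    """
    vertices = set()
    edges = Counter()
    for s, p, o in triples:
        vertices.update((s, o))
        if predicates_as_vertices:
            vertices.add(p)
        edges[(s, o, p)] += 1
    if distinct_edges:
        # Collapse repeated edges so that E behaves like a set.
        edges = Counter(edges.keys())
    return vertices, edges

SIRPAC = "http://www.w3.org/RDF/Implementations/SiRPAC/"
CREATOR = "http://purl.org/dc/elements/1.1/creator"
JANNE = "http://www.w3.org/People/Janne/"

repeated = [(SIRPAC, CREATOR, JANNE), (SIRPAC, CREATOR, JANNE)]

V, E = triples_to_graph(repeated)
print(len(V), E[(SIRPAC, JANNE, CREATOR)])

V2, E2 = triples_to_graph(repeated, distinct_edges=True)
print(E2[(SIRPAC, JANNE, CREATOR)])
```

The two flags are the two decisions the spec would have to state: whether
properties count as vertices, and whether repeated statements give repeated
arcs.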

> The issue of whether an isolated subject ought to be permitted in the
> contents of such a collection needs to come out of the answers to these
> (and possibly other) questions (like, what do we intend the meaning of
> such a thing to be?), rather than being simply decided on the basis that
> isolated nodes are legal graphs in graph theory

True, the semantics w.r.t. the RDF Model must drive this decision, not "cuz
we can". However, see my previous point about choosing a graph as the
fundamental "statement collection" formalism. Note also that isolated nodes
aren't simply legal in graphs, they are a fundamental part of the
definition of a graph.
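
To make that concrete: in G(V,E) the vertex set is given independently of
the edges, so a vertex with no edges needs nothing special. A small Python
sketch, where the class, its method, and the example.org labels are
hypothetical and only edge endpoints are treated as vertices:

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """G(V, E): V is a set of vertex labels, E a list of (source, sink, label)."""
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def isolated(self) -> set:
        """Vertices that are not an endpoint of any edge."""
        endpoints = {v for source, sink, _label in self.edges for v in (source, sink)}
        return self.vertices - endpoints

# Hypothetical labels.
A = "http://example.org/a"
B = "http://example.org/b"
C = "http://example.org/c"
P = "http://example.org/p"

g = Graph(vertices={A, B, C}, edges=[(A, B, P)])
print(g.isolated())  # C is in V but touches no edge
```

A bare collection of triples has no place to record C, which is why the
choice of "statement collection" formalism matters here.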

> (we may only be talking about a certain subset of the graphs definable in
> graph theory--we could still call them "graphs").

If you are going to use formal logic terminology, then you had better nail
down your formal model. Likewise if you are going to use terms like
'directed graph', 'arc', 'node', etc. then be precise or drop the
terminology & simply say that "a group of RDF statements can be graphically
shown using circles and arrows." Otherwise we'll end up with a URI looking
like this:

http://www.w3.org/RDF-2.0/rdf-issue-tracking/#rdfms-graph

:-)

- steve

Stephen Petschulat



                                                                                                                        
Frank Manola <fmanola@mitre.org>
Sent by: w3c-rdfcore-wg-request@w3.org
17/07/2001 12:43 PM
Please respond to Frank Manola

To:      Graham Klyne <Graham.Klyne@baltimore.com>
cc:      Stephen Petschulat/CanWest/IBM@IBMCA, w3c-rdfcore-wg@w3.org
Subject: Re: rdfms-graph: Food for thought
                                                                                                                        
                                                                                                                        



I think it's important not to let the tail wag the dog too much here.
My understanding is that the key question involved in "rdfms-graph" is
one of scope:  we talk about "a graph" or "a model" without being able
to describe very easily what that consists of, or what kind of thing it
is (the "model" term is particularly significant when we aren't
specifically talking about one of the graph-like pictures in the M&S,
but rather about a collection of triples or some XML serialization).
From an abstract point of view, it's obviously some kind of collection
of RDF statements, but then come the questions, like:

a.  what *kind* of collection is it?
b.  is it a resource (and do we need to explicitly specify how it gets a
URI?)
c.  what kinds of things go in it?
d.  what is the purpose of defining such collections (e.g., do you scope
it for the purpose of attributing all of its contents in a common way)?

The issue of whether an isolated subject ought to be permitted in the
contents of such a collection needs to come out of the answers to these
(and possibly other) questions (like, what do we intend the meaning of
such a thing to be?), rather than being simply decided on the basis that
isolated nodes are legal graphs in graph theory (we may only be talking
about a certain subset of the graphs definable in graph theory--we could
still call them "graphs").  Note that even if we don't wind up dealing
with disconnected nodes (like subjects), we will still wind up dealing
with "models" or "graphs" that contain disconnected subgraphs in any
sensible interpretation of "model" or "graph".  For example, many
collections of RDF statements will consist of disconnected subgraphs,
each subgroup consisting of the descriptions pertaining to a separate
subject (Web resource).  (I'm assuming here that you can separately
scope a collection of RDF statements, even when objects in those
statements are sometimes URIs of resources (including literals)
"located" elsewhere.  If such references mean that those referred-to
resources are also in the graph, then I don't see how we can talk about
more than one RDF model at all, particularly if literals wind up having
URIs).

--Frank

Graham Klyne wrote:

> Steve,
>
> I think I broadly agree with what you say.  My term "awkward" isn't
> meant to imply problematic, or even difficult.  My purpose of engagement
> here is based on:
> (a) my perception that representing isolated nodes adds some complexity
> (though maybe not very much), and
> (b) questioning whether there is any real purpose in adding this small
> extra complexity to RDF.
>
> That said, Aaron's proposal to represent isolated nodes as ( <foo>
> rdf:type rdfs:Resource ) overcomes those objections (but introduces
> another because it would make the RDF core dependent on a schema
> definition, viz rdfs:Resource).
>
> You also say:
>
>> "An RDF Subject that does not have any associated Properties corresponds
>> to a disconnected node in a graph. The value of the about/ID attribute of
>> this element is the label of the disconnected node."
>
>
> With which I'd pick a nit:
>
> My take on the current M&S is that the concept of "an RDF Subject" is
> meaningful only in the context of a property -- a "Subject" doesn't
> exist in isolation.  A resource can be any or all of Subject, Object or
> Property depending on how it is used.
>
> (This isn't affected by your rewording in a different message.)
>
> #g
> --
>
> At 08:44 AM 7/17/01 -0700, Stephen Petschulat/CanWest/IBM wrote:
>
>> I don't really see this as being about the abstract syntax as much as the
>> graph theoretical model. Right now RDF pays lip service to being a
>> "graph", but doesn't formalize this in the model. If we do intend to lay
>> down a graph theoretical foundation for RDF then this issue is
>> fundamental. Graph theory makes use of disconnected nodes in graphs (i.e.
>> a graph is defined such that it may contain disconnected nodes) so it
>> would seem we should either explicitly define what it means or have a
>> good reason to disallow it (and possibly lose out on the body of graph
>> theory that requires a graph be able to have edgeless/arcless nodes). As
>> far as being awkward to define, I don't think this is the case for the
>> graph theoretical model, although I don't know how the logic people would
>> deal with it. The definition can be as simple as:
>>
>> "An RDF Subject that does not have any associated Properties corresponds
>> to a disconnected node in a graph. The value of the about/ID attribute of
>> this element is the label of the disconnected node."


--
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
Received on Wednesday, 18 July 2001 10:59:15 UTC