- From: Stephen Petschulat/CanWest/IBM <spetschu@ca.ibm.com>
- Date: Wed, 18 Jul 2001 07:53:36 -0700
- To: Frank Manola <fmanola@mitre.org>
- Cc: Graham Klyne <Graham.Klyne@baltimore.com>, w3c-rdfcore-wg@w3.org
> I think it's important not to let the tail wag the dog too much here.
> My understanding is that the key question involved in "rdfms-graph" is
> one of scope: we talk about "a graph" or "a model" without being able
> to describe very easily what that consists of, or what kind of thing it
> is (the "model" term is particularly significant when we aren't
> specifically talking about one of the graph-like pictures in the M&S,
> but rather about a collection of triples or some XML serialization).

Agreed. The term 'model' is overused. I don't see the issue #rdfms-graph driving the RDF Formal Model; rather, the Formal Model (driven primarily by logic) should be mappable to a graph theoretical representation once it has been nailed down. The nice thing with mathematics is that if you get it right in one branch, it will often work out well in other branches.

http://www.w3.org/2000/03/rdf-tracking/#rdfms-graph

...notes that "The term 'model' is often used as a synonym for an RDF graph." Lots of room for confusion here... which is why this issue was raised.

I believe most of your discussion below isn't about clarifying the M&S formal use of mathematical graphs as a model (which is what I was driving at), but rather raises the issue of how the Formal Model should define some special notion of a collection of RDF statements. M&S currently uses the term RDF graph interchangeably with RDF model when the underlying concept it is getting at is an RDF dataset (collection of statements) or something similar. A graph is just one of many useful views on this dataset, and probably not the most fundamental one if n-triples are the basis for the abstract syntax.

To clarify our discussion on this issue, I would propose there are actually two separate parts to it:

1) What kind of data structure should the RDF Formal Model use to define an RDF dataset/model/collection?
Once we decide this, we should clarify the spec so it says RDF graph when it is talking about a graph and RDF <pick a name> when it is talking about this data structure.

2) We still need to properly define what the mapping is between this RDF dataset and a mathematical graph.

1) and 2) are largely independent, i.e. you can map a bag, a set, etc. into a graph without too much difficulty. 1) is critical so we can avoid using misleading terms, and 2) is important since it will give users who want to treat a bunch of RDF statements as a graph the rules necessary to create this graph in a consistent manner. We would then be able to talk about an "RDF Graph" and properly understand what everyone means by that.

Obviously, for no other reason than to avoid changing much of RDF M&S, we may want to decide that the RDF Formal Model uses mathematical graphs as the formalism to represent collections of RDF statements. In this case, I think the tail may need to do some wagging or else we'll end up with a spec 'clarification' that continues to cause confusion. Another option may be to say that the Formal Model defines no such collection of statements, and that arrow-and-circle diagrams are simply used for visualization.

> From an abstract point of view, it's obviously some kind of collection
> of RDF statements, but then come the questions, like:
> a. what *kind* of collection is it?
> b. is it a resource (and do we need to explicitly specify how it gets a
> URI?)
> c. what kinds of things go in it?
> d. what is the purpose of defining such collections (e.g., do you scope
> it for the purpose of attributing all of its contents in a common way)?

These questions seem to be aimed at answering the first part of this issue (the graph theoretical representation of a collection of RDF statements would formally be two sets or a set & a bag depending on the distinctness requirement for edges... one for vertices and one for edges a.k.a. nodes & arcs.
Obviously, putting a bunch of statements in a set or a bag doesn't give you a graph; that is the dataset you want to map to the graph).

My initial stab at the above four would be:

a. A bag.

b. Yes, it is a resource: you can put a bunch of statements in an XML doc, put it on the web, and give it a URI. No, we need not specify how it gets a URI: the user gets to decide how they want to identify it. If it is stored in a database the URI might be jdbc:db:my_rdf_data_source. If it is in a document it might be http://mystuff.com/assertions.rdf.

c. (s, p, o) triples.

d. Good question. The best answer I can come up with is that I don't think we can avoid it, since many applications will want to take an RDF document, a collection of RDF documents, an ontology, etc. and call that their domain of discourse. They may want to refer to a collection of statements or assert who created this ontology (not necessarily who created the serialization, but who actually created or owns the dataset). They may want to merge two separate RDF statement collections or find the intersection of statements in them (which brings up issues of equivalence). Plus RDF M&S constantly talks about groups of RDF statements and graphically represents them.

If the above four were defined, this bag of triples could then be mapped to a graph theoretical representation & the spec cleaned up so that it uses a new term when it is referring to the concept of a dataset or collection of statements.

The problem with the term "model" is that it is easily confused with the concept of the RDF (meta)Model. The only route out of this is to take the MOF/XMI route and start talking about meta-models and meta-meta-models. IMHO this just makes things worse :-).

FWIW, SiRPAC accepts repeated statements and currently draws two arcs.
So the following:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://www.w3.org/RDF/Implementations/SiRPAC/">
    <dc:creator rdf:resource="http://www.w3.org/People/Janne/"/>
  </rdf:Description>
  <rdf:Description about="http://www.w3.org/RDF/Implementations/SiRPAC/">
    <dc:creator rdf:resource="http://www.w3.org/People/Janne/"/>
  </rdf:Description>
</rdf:RDF>

renders two identical triples:

<http://www.w3.org/RDF/Implementations/SiRPAC/> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Janne/> .
<http://www.w3.org/RDF/Implementations/SiRPAC/> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Janne/> .

which map to the following graph G(V,E):

Vertices V={http://www.w3.org/RDF/Implementations/SiRPAC/,
            http://www.w3.org/People/Janne/,
            http://purl.org/dc/elements/1.1/creator}

Edges E=((http://www.w3.org/RDF/Implementations/SiRPAC/,
          http://www.w3.org/People/Janne/,
          http://purl.org/dc/elements/1.1/creator),
         (http://www.w3.org/RDF/Implementations/SiRPAC/,
          http://www.w3.org/People/Janne/,
          http://purl.org/dc/elements/1.1/creator))

where the elements in the bag of edges E of the labelled digraph G(V,E) have the form (source vertex label, sink vertex label, edge label).

Of course, we don't need to define a graph syntax in M&S in order to clarify. It should be unambiguous to simply state what the mapping of (s,p,o) triples is to vertices, edges, and labels, and whether the edge collection is distinct or not.

> The issue of whether an isolated subject ought to be permitted in the
> contents of such a collection needs to come out of the answers to these
> (and possibly other) questions (like, what do we intend the meaning of
> such a thing to be?), rather than being simply decided on the basis that
> isolated nodes are legal graphs in graph theory

True, the semantics w.r.t. the RDF Model must drive this decision, not "cuz we can".
However, see my previous point about choosing a graph as the fundamental "statement collection" formalism. Note also that isolated nodes aren't simply legal in graphs, they are a fundamental part of the definition of a graph.

> (we may only be talking about a certain subset of the graphs definable in
> graph theory--we could still call them "graphs").

If you are going to use formal logic terminology, then you had better nail down your formal model. Likewise if you are going to use terms like 'directed graph', 'arc', 'node', etc. then be precise or drop the terminology & simply say that "a group of RDF statements can be graphically shown using circles and arrows." Otherwise we'll end up with a URI looking like this:

http://www.w3.org/RDF-2.0/rdf-issue-tracking/#rdfms-graph

:-)

- steve

Stephen Petschulat


Frank Manola <fmanola@mitre.org>
Sent by: w3c-rdfcore-wg-request@w3.org
17/07/2001 12:43 PM
Please respond to Frank Manola

To: Graham Klyne <Graham.Klyne@baltimore.com>
cc: Stephen Petschulat/CanWest/IBM@IBMCA, w3c-rdfcore-wg@w3.org
Subject: Re: rdfms-graph: Food for thought


I think it's important not to let the tail wag the dog too much here. My understanding is that the key question involved in "rdfms-graph" is one of scope: we talk about "a graph" or "a model" without being able to describe very easily what that consists of, or what kind of thing it is (the "model" term is particularly significant when we aren't specifically talking about one of the graph-like pictures in the M&S, but rather about a collection of triples or some XML serialization).

From an abstract point of view, it's obviously some kind of collection of RDF statements, but then come the questions, like:

a. what *kind* of collection is it?
b. is it a resource (and do we need to explicitly specify how it gets a URI?)
c. what kinds of things go in it?
d. what is the purpose of defining such collections (e.g., do you scope it for the purpose of attributing all of its contents in a common way)?

The issue of whether an isolated subject ought to be permitted in the contents of such a collection needs to come out of the answers to these (and possibly other) questions (like, what do we intend the meaning of such a thing to be?), rather than being simply decided on the basis that isolated nodes are legal graphs in graph theory (we may only be talking about a certain subset of the graphs definable in graph theory--we could still call them "graphs").

Note that even if we don't wind up dealing with disconnected nodes (like subjects), we will still wind up dealing with "models" or "graphs" that contain disconnected subgraphs in any sensible interpretation of "model" or "graph". For example, many collections of RDF statements will consist of disconnected subgraphs, each subgroup consisting of the descriptions pertaining to a separate subject (Web resource). (I'm assuming here that you can separately scope a collection of RDF statements, even when objects in those statements are sometimes URIs of resources (including literals) "located" elsewhere. If such references mean that those referred-to resources are also in the graph, then I don't see how we can talk about more than one RDF model at all, particularly if literals wind up having URIs).

--Frank

Graham Klyne wrote:

> Steve,
>
> I think I broadly agree with what you say. My term "awkward" isn't
> meant to imply problematic, or even difficult. My purpose of engagement
> here is based on:
> (a) my perception that representing isolated nodes adds some complexity
> (though maybe not very much), and
> (b) questioning whether there is any real purpose in adding this small
> extra complexity to RDF.
>
> That said, Aaron's proposal to represent isolated nodes as ( <foo>
> rdf:type rdfs:Resource ) overcomes those objections (but introduces
> another because it would make the RDF core dependent on a schema
> definition, viz rdfs:Resource).
>
> You also say:
>
>> "An RDF Subject that does not have any associated Properties
>> corresponds to a disconnected node in a graph. The value of the
>> about/ID attribute of this element is the label of the disconnected
>> node."
>
> With which I'd pick a nit:
>
> My take on the current M&S is that the concept of "an RDF Subject" is
> meaningful only in the context of a property -- a "Subject" doesn't
> exist in isolation. A resource can be any or all of Subject, Object or
> Property depending on how it is used.
>
> (This isn't affected by your rewording in a different message.)
>
> #g
> --
>
> At 08:44 AM 7/17/01 -0700, Stephen Petschulat/CanWest/IBM wrote:
>
>> I don't really see this as being about the abstract syntax as much as the
>> graph theoretical model. Right now RDF pays lip service to being a
>> "graph", but doesn't formalize this in the model. If we do intend to lay
>> down a graph theoretical foundation for RDF then this issue is
>> fundamental. Graph theory makes use of disconnected nodes in graphs
>> (i.e. a graph is defined such that it may contain disconnected nodes) so
>> it would seem we should either explicitly define what it means or have a
>> good reason to disallow it (and possibly lose out on the body of graph
>> theory that requires a graph be able to have edgeless/arcless nodes). As
>> far as being awkward to define, I don't think this is the case for the
>> graph theoretical model, although I don't know how the logic people
>> would deal with it. The definition can be as simple as:
>>
>> "An RDF Subject that does not have any associated Properties
>> corresponds to a disconnected node in a graph. The value of the
>> about/ID attribute of this element is the label of the disconnected
>> node."

--
Frank Manola
The MITRE Corporation
202 Burlington Road, MS A345
Bedford, MA 01730-1420
mailto:fmanola@mitre.org
voice: 781-271-8147 FAX: 781-271-875
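
For readers following the thread, the mapping Steve describes, from a bag of (s, p, o) triples to a labelled digraph G(V,E) whose edges have the form (source vertex label, sink vertex label, edge label), can be sketched roughly as below. This is only an illustration under stated assumptions, not anything the working group specified: the function name triples_to_graph and the distinct_edges flag are hypothetical, and the property URI is placed in V only because the message's own example lists it there.

```python
from collections import Counter

# URIs taken from the SiRPAC example in the message.
SIRPAC = "http://www.w3.org/RDF/Implementations/SiRPAC/"
CREATOR = "http://purl.org/dc/elements/1.1/creator"
JANNE = "http://www.w3.org/People/Janne/"


def triples_to_graph(triples, distinct_edges=False):
    """Map a bag of (s, p, o) triples to (V, E).

    V is a set of vertex labels. E is a bag (Counter) of edges of the
    form (source label, sink label, edge label). With
    distinct_edges=True (a hypothetical switch), repeated statements
    collapse to one edge, so E behaves like a set.
    """
    vertices = set()
    edges = Counter()
    for s, p, o in triples:
        # Assumption: follow the message's example, which also lists
        # the property URI among the vertices.
        vertices.update((s, o, p))
        edge = (s, o, p)
        if distinct_edges:
            edges[edge] = 1
        else:
            edges[edge] += 1
    return vertices, edges


# The two identical triples produced from the RDF/XML example.
triples = [
    (SIRPAC, CREATOR, JANNE),
    (SIRPAC, CREATOR, JANNE),
]

V, E = triples_to_graph(triples)
for edge, count in E.items():
    print(count, edge)
```

With the bag interpretation the repeated statement is kept as two arcs, which matches the SiRPAC behaviour reported in the message; passing distinct_edges=True gives the set interpretation instead.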
Received on Wednesday, 18 July 2001 10:59:15 UTC