- From: Frank Manola <fmanola@acm.org>
- Date: Sat, 5 Jan 2008 15:28:20 -0500
- To: Garret Wilson <garret@globalmentor.com>
- Cc: SWIG <semantic-web@w3.org>
I don't think it's as straightforward as you think to take Date's comments about one notation for relations and apply them to another (RDF). See below.

On Jan 5, 2008, at 1:15 PM, Garret Wilson wrote:

> Frank Manola wrote:
>> You're right that I'm thinking of only one relation, essentially
>> the "triple" relation.
>
> ...
>
>> As this text notes, an equivalent way to think about it is that
>> each RDF predicate is a distinct binary relation (in the
>> relational model) having two columns, one containing the subject
>> URI (the key column), and the other containing the RDF object
>> value (literal or URI). This is essentially the normalized version
>> I described in my earlier message.
>
> OK, I understand that I can create a single relation representing
> triples and everything fits within the relational model. This one
> possible technical implementation of an RDF graph I understand. But
> are the semantics of the RDF graph being represented here? Is the
> single relation describing resources, or describing triples?
>
> Let me put it another way. Let's say I have the RDF description of
> #mybook that we've been discussing:
>
> <rdf:Description rdf:about="#mybook">
>   <dc:subject>semantic web</dc:subject>
>   <dc:subject>airplanes</dc:subject>
> </rdf:Description>
>
> How would we explain the semantics of this RDF graph? Something
> like, "There exists a book, identified by the URI <#mybook>, and
> that book has a subject 'semantic web' and a subject 'airplanes'."

Yes, that's right.

> What are the equivalent relational semantics? Let me turn to Date,
> _Introduction to Database Systems, Eighth Edition_:
>
>     A "given fact" in turn corresponds to what logicians call a true
>     proposition; for example, the statement "Supplier S1 is located
>     in London" might be such a true proposition. ... It follows that
>     a database is really a collection of true propositions. ...
>
>     To be specific, in the relational model:
>
>     1. Data is represented by means of rows in tables, and such rows
>        can be directly interpreted as true propositions. For example,
>        the row for BIN# 72 in Fig. 1.1 can be interpreted in an
>        obvious way as the following true proposition:
>
>        Bin number 72 contains two bottles of 1999 Rafanelli Zinfandel
>        which will be ready to drink in 2007 (page 15)
>
> And again in Date, _Database in Depth_, speaking of
> "relvars" (relational variables):
>
>     Like all relvars, that relvar is supposed to represent some
>     portion of the real world. In fact, I can be more precise: the
>     heading of that relvar represents a certain predicate, meaning
>     it's a kind of generic statement about some portion of the real
>     world .... (page 72)
>
>     Again, let P be the relvar predicate or intension for relvar R,
>     and let the value of R at some given time be relation r. Then
>     r—or the body of r, to be more precise—constitutes the extension
>     of P at that time. (page 74)
>
> So although I can put all my triples in a single relation, the
> relational semantics (according to Date) of the relation is not
> representing the same semantics as my RDF graph represents. The
> semantics of a single per-graph relation seems to have semantics
> describing the reification of a particular graph.

I understand where you're getting this from, but I don't think it's right. When Date talks about the semantics of a tuple in a relation, he necessarily has to combine both the data in the row *and* the information in the header (the name of the relation and the names and domains of the columns) to get the complete semantics (he goes into this more in Chapter 6, but you mention it above in citing page 72).

For example, imagine trying to write down (to send to someone else) the first tuple in Fig. 1.1. In conventional n-ary logic notation you'd write something like:

    Cellar( 2, Chardonnay, Buena Vista, 2001, 1, 2003 ).
This particular notation omits the names of the columns though (which is where you "attach" the information that 2 actually represents a bin number, for example). So how do you provide that information? In a conventional relational model, it's understood that you have available the predicate (header) as well as the tuple containing the values, in interpreting it. But if you had to invent a notation for sending the complete information to someone else (as you do in RDF), your notation would have to somehow include the header information as well as the values.

You might encode the header as separate pieces of information, as in

    Relation ( Cellar )
    Column ( Cellar, 1, bin# )
    Column ( Cellar, 2, wine )
    Column ( Cellar, 3, producer )
    ....

along with the tuple, or you might encode the header information in the tuple, as in:

    Cellar ( bin#, 2, wine, Chardonnay, producer, Buena Vista, year, 2001, bottles, 1, ready, 2003 ).

Now, while the representation is different, I would claim that the semantics (the intended meaning) of these are the same. You would simply have to have agreement with those you're communicating with as to how to interpret one of these representations as the more abstract relational interpretation that Date is talking about.

RDF defines triples as a notation for encoding tuple information that's sort of a mix of the above two techniques. That is, the column name (of the corresponding binary relation) is explicitly included in the triple, and other information is included in other triples (e.g., the fact that bin# is an attribute of a cellar can be represented using RDFS vocabulary). The conventional relational interpretation (actually, logical interpretation) is as defined in RDF semantics, i.e., as binary predicates, related to the triples by the rule I cited in my earlier message.
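To make the "mix of the two techniques" concrete, here is a minimal Python sketch (an illustration only, not anything defined by RDF or by Date). It takes the Cellar header and the first tuple, carries the column name along with each value as a triple, and then regroups the triples into one two-column relation per predicate. The function names and the `cellar/bin` identifier scheme are assumptions invented for this example.

```python
from collections import defaultdict

# Header for the Cellar relation and its first tuple, as in the example above.
HEADER = ("bin#", "wine", "producer", "year", "bottles", "ready")
ROW = (2, "Chardonnay", "Buena Vista", 2001, 1, 2003)


def tuple_to_triples(header, row, key="bin#", prefix="cellar/bin"):
    """Encode one n-ary tuple as (subject, predicate, object) triples.

    The key column supplies the subject; every other column name becomes
    a predicate, so the header information travels with each value.
    The subject naming scheme (prefix + key value) is hypothetical.
    """
    named = dict(zip(header, row))
    subject = f"{prefix}{named[key]}"
    return [(subject, col, val) for col, val in named.items() if col != key]


def triples_to_binary_relations(triples):
    """Group triples into one two-column (subject, object) relation per predicate."""
    relations = defaultdict(set)
    for subject, predicate, obj in triples:
        relations[predicate].add((subject, obj))
    return dict(relations)


cellar_triples = tuple_to_triples(HEADER, ROW)
cellar_relations = triples_to_binary_relations(cellar_triples)

# The #mybook description from earlier in the thread: a property with two
# values simply yields two rows in the dc:subject relation.
book_triples = [
    ("#mybook", "dc:subject", "semantic web"),
    ("#mybook", "dc:subject", "airplanes"),
]
book_relations = triples_to_binary_relations(book_triples)
```

Under these assumptions, `cellar_relations["wine"]` holds the single row pairing the bin with "Chardonnay", while `book_relations["dc:subject"]` holds two rows for the same subject. Either way, each row of each binary relation reads as one proposition about the resource, not about a statement.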
> To put this in terms of the first Date quote cited above, the
> equivalent single-relation representation of "Bin number 72
> contains two bottles of 1999 Rafanelli Zinfandel which will be
> ready to drink in 2007" would mean (see figure 1.1 on page 5),
> "statement #1 has a subject of 'bin number 72', a predicate of
> 'wine' and a value of 'Zinfandel'; statement #2 has a subject of
> 'bin number 72', a predicate of 'producer', and a value of
> 'Rafanelli'; statement #3 has a subject of 'bin number 72', a
> predicate of 'year', and a value of '1999'", etc.
>
> So in summary, the semantics of a single-relation-per-RDF graph
> does not seem to have the same semantics of an RDF graph; instead,
> it has the semantics of the reification of the RDF graph, although
> you can add other semantics on top of that (i.e. an external
> understanding of what "triples" mean) to get back to the same
> semantics of the RDF graph.
>
> In summary: if I take the semantics of an RDF graph (e.g. "Book
> <#mybook> has a subject of 'semantic web' and a subject of
> 'airplanes') and I try to map that to the equivalent relational
> semantics as explained by C.J. Date, I run into trouble if
> multivalued properties are allowed. This is what made me
> uncomfortable; does it make anyone else uncomfortable?
>
> (And to summarize your response, the solution is to first reify the
> RDF graph and then map *those* semantics to the equivalent
> relational semantics.)
>
> Whew---did I get all that correct?

I don't think so! As I said earlier, I think the "equivalent relational semantics" involve an interpretation of some explicit notation for relational tables similar to that involved in interpreting the RDF triples. (In addition, there are some tricky cases in relational database design that Hugh Darwen has explored that involve effectively including column names in tuples just as RDF triples do.)

--Frank

>
> Garret
>
Received on Saturday, 5 January 2008 20:28:33 UTC