comment: Defining N-ary Relations on the Semantic Web from Frank Manola on 2004-08-06 (public-swbp-wg@w3.org from August 2004)

From: Frank Manola <fmanola@acm.org>
Date: Fri, 06 Aug 2004 12:16:14 -0400
To: public-swbp-wg@w3.org
Message-ID: <4113AECE.4010600@acm.org>

A few comments on W3C Working Draft "Defining N-ary Relations on the 
Semantic Web:  Use with Individuals", 21 July 2004

Overall, this is useful material, and an important topic, since it comes 
up all the time.  It can also be tricky to describe (as I know by 
personal experience!).

1.  Under the "Representation Pattern" heading, the text between the 
first two figures, if interpreted strictly, appears to only cover the 
first two use cases.  Perhaps it could read something like:  "We would 
like to have another individual or simple value C (and possibly 
additional individuals or values in the case of Use Case 3) to be part 
of this relation:"?

2.  Just below the figures:  "A common solution to representing n-ary 
relations such as these is to create an individual which stands for an 
instance of the relation and relates the things that are involved in 
that instance of the relation."  And just below that: "In the first 
case...one of the individuals in the relation (say, A) is distinguished 
from others in that it is the *originator* of the relation."

Here the text introduces new terminology "instance of the relation" and 
"originator" in places where there already is terminology to cover these 
concepts ("object", "owner of the relation", and "relationship" are also 
used in various places later in the text).  An instance of a relation in 
RDF is a "statement" ("tuple" could be used too, as per relational 
database terminology).  The "originator" of such a statement or tuple is 
the "subject" (note also that if, as the text says, you're choosing an 
individual, then even if you retained this "originator" terminology it 
wouldn't be the "originator of the relation", but rather the "originator 
of an *instance* of the relation").

Part of this terminology problem is due to the way the OWL specs 
sometimes refer to "statements" (in the RDF-ish sense) and their various 
components, and sometimes use other terminology in referring to these 
binary relations.  For example, Section 3.2.2 of the OWL Reference, in 
describing owl:equivalentClass, uses RDF-ish terminology in the text: 
"NOTE: OWL DL does not put any constraints on the types of class 
descriptions that can be used as subject and object of an 
owl:equivalentClass statement. In OWL Lite the subject must be a class 
name and the object must be either a class name or a property 
restriction."  On the other hand, Section 4 of the OWL Reference uses 
alternative terminology (including "tuple") in:  "NOTE: In this section 
we use the term "property extension" in a similar fashion to "class 
extension". The property extension is the set of instances that is 
associated with the property. Instances of properties are not single 
elements, but subject-object pairs of property statements. In relational 
database terms, property instances would be called "tuples" of a binary 
relation (the property)."

Whatever terminology is decided on, it might be a good idea to introduce 
definitions for it right away (say, in describing the use cases), rather 
than referring generally to "binary relations" at this point, and be 
very consistent in using that terminology.

3.  Just below, in the initial description of pattern 1: "...here, the 
instance of the relation itself is a property of A, with the value that 
is a complex object in itself, relating several values and individuals." 
  This is a bit confusing (particularly with reference to the second of 
the two diagrams above it).  For one thing, it's not clear how an 
instance of a relation (i.e., a statement) can be a property (property 
value, perhaps).  For another, in the normal binary relation, the 
"instance of the relation" is considered to include the originator (A in 
this case).  But the new individuals being created (in pattern 1) 
*don't* include the originator.  Having tried several alternative 
descriptions of this sort myself, I appreciate how hard it is to come up 
with concise descriptions of these patterns here.  I suspect it may be 
better to simply jump directly to describing these patterns using 
examples, as the material under the "Pattern 1" and "Pattern 2" headings 
does, rather than trying for these abstract summaries.

4.  The first paragraph under the "Pattern 1" heading introduces another 
new term "relation object" which should be introduced more explicitly, 
assuming it's needed (NB:  this is not the same as "an object of a 
relation", a phrase also used in the same paragraph).

Also, under the "Pattern 1" and "Pattern 2" headings, introducing 
concepts like "Diagnosis_Relation_1" and "Temperature_Relation_1" may 
help emphasize that these in some sense represent instances of 
relations, but I think that there should be some text pointing out how 
often real-life use cases already have corresponding concepts.

The "relation object" idea might be better introduced by something like: 
  It is often possible to think of the relation among multiple facts as 
a separate object.  Then the multiple facts can be represented as 
describing that object.  This happens so often in real life that there 
are often separate concepts (and names) for these separate objects. 
Thus, it is possible to talk about a "diagnosis" (instead of 
"diagnosis-relation").  This diagnosis can have various properties that 
describe it (the value, probability, who made it, when, etc.). 
Similarly, Steve may have a "temperature_reading".
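
As a rough illustration of the "relation object" idea, the facts can be
written as plain (subject, predicate, object) tuples in Python.  This is
only a sketch: the property names (has_diagnosis, diagnosis_value,
diagnosis_probability, made_by) and all values are assumptions chosen
for illustration, not vocabulary taken from the Working Draft.

```python
# Hypothetical names throughout; each triple is a plain
# (subject, predicate, object) tuple.
triples = {
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("diagnosis_1", "rdf:type", "Diagnosis"),
    ("diagnosis_1", "diagnosis_value", "some_disease"),
    ("diagnosis_1", "diagnosis_probability", "high"),
    ("diagnosis_1", "made_by", "some_doctor"),
}


def describe(individual, triples):
    """Collect the properties of one individual into a dict."""
    return {p: o for s, p, o in triples if s == individual}


# The diagnosis is an individual in its own right, described by
# several binary properties.
diagnosis = describe("diagnosis_1", triples)
print(diagnosis["diagnosis_probability"])
```

Each fact about the diagnosis stays an ordinary binary statement; the
n-ary relation is recovered by collecting the statements that share the
diagnosis as subject.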

There are a couple of points to be made here:

a.  The need to introduce new individuals to (in some sense) represent 
instances of relationships shouldn't be thought of as just a peculiar 
artifact of RDF, OWL, or the Semantic Web in general (although it may 
appear in a particularly acute form here).  Instead, people do this all 
the time in relational database systems, even though these systems 
directly support n-ary relations, and in recording information of all 
kinds in even less-structured forms (reports, documents, and so on).

b.  People do this so frequently that real-world domain conceptual 
models frequently include such concepts, e.g., "purchases", "temperature 
readings", "diagnostic reports", "weather reports", etc., even when 
there is no idea of RDF or OWL (or the Semantic Web in general) anywhere 
in the vicinity.  These concepts are used because an instance (e.g., a 
diagnosis) frequently has numerous attributes of its own, like when it 
was made, who made it, etc.  People defining N-ary relations on the 
Semantic Web should keep an eye out for such naturally occurring 
concepts, and try to use them.  (However, there's not *always* a natural 
concept to represent a relation as an individual, so sometimes you have 
to make them up!)
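
Point (a) can be sketched with an in-memory SQLite table, assuming a
hypothetical "diagnosis" table whose column names and values are
invented for illustration.  Even though the relational model supports
n-ary relations directly, the row still gets its own identifier and its
own attributes, much like the new individual in the RDF patterns.

```python
import sqlite3

# Hypothetical schema: the diagnosis is a first-class row with its own
# key, not merely a link between a patient and a value.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE diagnosis (
        id          TEXT PRIMARY KEY,
        patient     TEXT NOT NULL,
        value       TEXT NOT NULL,
        probability TEXT,
        made_by     TEXT,
        made_on     TEXT
    )
    """
)
conn.execute(
    "INSERT INTO diagnosis VALUES (?, ?, ?, ?, ?, ?)",
    ("diagnosis_1", "Christine", "some_disease", "high",
     "some_doctor", "2004-08-06"),
)

# Look the diagnosis up by its own identifier.
row = conn.execute(
    "SELECT patient, probability FROM diagnosis WHERE id = ?",
    ("diagnosis_1",),
).fetchone()
print(row)
conn.close()
```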

5.  It ought to be noted somewhere (it may be there and I've overlooked 
it) that you can always reverse the "original" relation and turn pattern 
1 into pattern 2.  E.g., you can reverse "Christine has_diagnosis 
diagnosis_1" to form "diagnosis_1 about_patient Christine".  This is 
related to the bullet about inverse relations under the "Considerations" 
heading, but makes a slightly different point.
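
A small sketch of this reversal, again using plain tuples for
statements.  The helper name "invert" and the value "some_disease" are
hypothetical; "has_diagnosis" and "about_patient" are the property names
from the example above.

```python
def invert(triples, predicate, inverse_predicate):
    """Rewrite every (s, predicate, o) as (o, inverse_predicate, s).

    Triples that use other predicates are returned unchanged.
    """
    return {
        (o, inverse_predicate, s) if p == predicate else (s, p, o)
        for s, p, o in triples
    }


# Pattern 1: the patient is the subject of the linking statement.
pattern_1 = {
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("diagnosis_1", "diagnosis_value", "some_disease"),
}

# Pattern 2: every statement now has the new individual as subject.
pattern_2 = invert(pattern_1, "has_diagnosis", "about_patient")
print(sorted(pattern_2))
```

After the reversal no participant is distinguished: all statements hang
off diagnosis_1, which is the shape described under "Pattern 2".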

6.  Some people are naturally going to think of using RDF reification in 
these situations and, rather than avoiding the subject, the text should 
explicitly point this out, and then go on to say why this is a bad idea. 
  The primary reason it's a bad idea is that explicitly using the 
reification vocabulary involves talking about RDF (or OWL) statements 
(e.g., individuals are introduced having rdf:type rdf:Statement) and, as 
the examples illustrate, more natural concepts from the actual problem 
domain can generally be used instead.  E.g., instead of defining 
individuals that are statements, define individuals that are 
"diagnoses", "temperature readings", "purchases", etc.  This can be 
looked on as a kind of "reification", but it shouldn't be confused with 
the RDF concept (and its vocabulary).
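
The contrast can be sketched as follows, with statements written as
plain tuples.  The rdf:Statement, rdf:subject, rdf:predicate and
rdf:object terms are the RDF reification vocabulary; every other name
(stmt_1, diagnosis_1, has_diagnosis, diagnosis_value, probability,
some_disease) is an assumption made for illustration.

```python
# Using the RDF reification vocabulary: the new individual is typed as
# a statement, so the model ends up talking about RDF itself.
reified = {
    ("stmt_1", "rdf:type", "rdf:Statement"),
    ("stmt_1", "rdf:subject", "Christine"),
    ("stmt_1", "rdf:predicate", "has_diagnosis"),
    ("stmt_1", "rdf:object", "some_disease"),
    ("stmt_1", "probability", "high"),
}

# Using a domain concept instead: the new individual is a diagnosis.
domain = {
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("diagnosis_1", "rdf:type", "Diagnosis"),
    ("diagnosis_1", "diagnosis_value", "some_disease"),
    ("diagnosis_1", "probability", "high"),
}


def types_of(individual, triples):
    """Return the set of rdf:type values asserted for an individual."""
    return {o for s, p, o in triples if s == individual and p == "rdf:type"}


print(types_of("stmt_1", reified))
print(types_of("diagnosis_1", domain))
```

Both versions attach a probability to a new individual, but only the
second one describes something from the problem domain.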

7. As this is part of a best practices activity, it seems to me that a 
Note of this kind should explicitly point people to the relational 
database design literature for examples and ideas (at least to a 
standard textbook, such as Date's "An Introduction to Database 
Systems").  On a more theoretical level, all the work on functional 
dependencies and various "normal forms" is relevant to the sorts of 
design practices being discussed here.  If a reference to the database 
literature isn't considered relevant enough to this specific WD, it 
certainly should be to any larger-scale document in which these contents 
might be collected.  After all, the concepts being considered here apply 
to more things than just those that people have classically considered 
"ontologies".

--Frank
Received on Friday, 6 August 2004 12:14:11 UTC