- From: Frank Manola <fmanola@acm.org>
- Date: Fri, 06 Aug 2004 12:16:14 -0400
- To: public-swbp-wg@w3.org
A few comments on W3C Working Draft "Defining N-ary Relations on the
Semantic Web: Use with Individuals", 21 July 2004
Overall, this is useful material, and an important topic, since it comes
up all the time. It can also be tricky to describe (as I know from
personal experience!).
1. Under the "Representation Pattern" heading, the text between the
first two figures, if interpreted strictly, appears to only cover the
first two use cases. Perhaps it could read something like: "We would
like to have another individual or simple value C (and possibly
additional individuals or values in the case of Use Case 3) to be part
of this relation:"?
2. Just below the figures: "A common solution to representing n-ary
relations such as these is to create an individual which stands for an
instance of the relation and relates the things that are involved in
that instance of the relation." And just below that: "In the first
case...one of the individuals in the relation (say, A) is distinguished
from others in that it is the *originator* of the relation."
Here the text introduces new terminology "instance of the relation" and
"originator" in places where there already is terminology to cover these
concepts ("object", "owner of the relation", and "relationship" are also
used in various places later in the text). An instance of a relation in
RDF is a "statement" ("tuple" could be used too, as per relational
database terminology). The "originator" of such a statement or tuple is
the "subject" (note also that if, as the text says, you're choosing an
individual, then even if you retained this "originator" terminology it
wouldn't be the "originator of the relation", but rather the "originator
of an *instance* of the relation").
Part of this terminology problem is due to the way the OWL specs
sometimes refer to "statements" (in the RDF-ish sense) and their various
components, and sometimes use other terminology in referring to these
binary relations. For example, Section 3.2.2 of the OWL Reference, in
describing owl:equivalentClass, uses RDF-ish terminology in the text:
"NOTE: OWL DL does not put any constraints on the types of class
descriptions that can be used as subject and object of an
owl:equivalentClass statement. In OWL Lite the subject must be a class
name and the object must be either a class name or a property
restriction." On the other hand, Section 4 of the OWL Reference uses
alternative terminology (including "tuple") in: "NOTE: In this section
we use the term "property extension" in a similar fashion to "class
extension". The property extension is the set of instances that is
associated with the property. Instances of properties are not single
elements, but subject-object pairs of property statements. In relational
database terms, property instances would be called "tuples" of a binary
relation (the property)."
Whatever terminology is decided on, it might be a good idea to introduce
definitions for it right away (say, in describing the use cases), rather
than referring generally to "binary relations" at this point, and be
very consistent in using that terminology.
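To make the "statement"/"tuple" terminology concrete, here is a minimal sketch, assuming statements are modelled as plain Python (subject, property, object) tuples. The individuals other than Christine and diagnosis_1, the property names other than has_diagnosis, and the helper function are hypothetical and not taken from the WD or the OWL specs. It shows a property extension as the set of subject-object pairs of that property's statements:

```python
# Hypothetical binary statements as (subject, property, object) tuples.
statements = {
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("Steve", "has_temperature", "temperature_reading_1"),
    ("diagnosis_1", "diagnosis_value", "Condition_X"),
}


def property_extension(statements, prop):
    """Return the set of (subject, object) pairs for one property."""
    return {(s, o) for (s, p, o) in statements if p == prop}


# Each element is one "instance of the relation": a tuple whose first
# component is the subject (what the WD calls the "originator").
extension = property_extension(statements, "has_diagnosis")
```

In this reading, the "originator" is simply the first component of each pair, which is why "subject" seems the more natural term.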
3. Just below, in the initial description of pattern 1: "...here, the
instance of the relation itself is a property of A, with the value that
is a complex object in itself, relating several values and individuals."
This is a bit confusing (particularly with reference to the second of
the two diagrams above it). For one thing, it's not clear how an
instance of a relation (i.e., a statement) can be a property (property
value, perhaps). For another, in the normal binary relation, the
"instance of the relation" is considered to include the originator (A in
this case). But the new individuals being created (in pattern 1)
*don't* include the originator. Having tried several alternative
descriptions of this sort myself, I appreciate how hard it is to come up
with concise descriptions of these patterns here. I suspect it may be
better to simply jump directly to describing these patterns using
examples, as the material under the "Pattern 1" and "Pattern 2" headings
does, rather than trying for these abstract summaries.
4. The first paragraph under the "Pattern 1" heading introduces another
new term "relation object" which should be introduced more explicitly,
assuming it's needed (NB: this is not the same as "an object of a
relation", a phrase also used in the same paragraph).
Also, under the "Pattern 1" and "Pattern 2" headings, introducing
concepts like "Diagnosis_Relation_1" and "Temperature_Relation_1" may
help emphasize that these in some sense represent instances of
relations, but I think that there should be some text pointing out how
often real-life use cases have corresponding concepts already.
The "relation object" idea might be better introduced by something like:
    It is often possible to think of the relation among multiple facts as
    a separate object. Then the multiple facts can be represented as
    describing that object. This happens so often in real life that there
    are often separate concepts (and names) for these separate objects.
    Thus, it is possible to talk about a "diagnosis" (instead of
    "diagnosis-relation"). This diagnosis can have various properties that
    describe it (the value, probability, who made it, when, etc.).
    Similarly, Steve may have a "temperature_reading".
There are a couple of points to be made here:
a. The need to introduce new individuals to (in some sense) represent
instances of relationships shouldn't be thought of as just a peculiar
artifact of RDF, OWL, or the Semantic Web in general (although it may
appear in a particularly acute form here). Instead, people do this all
the time in relational database systems, even though these systems
directly support n-ary relations, and in recording information of all
kinds in even less-structured forms (reports, documents, and so on).
b. People do this so frequently that real-world domain conceptual
models frequently include such concepts, e.g., "purchases", "temperature
readings", "diagnostic reports", "weather reports", etc., even when
there is no idea of RDF or OWL (or the Semantic Web in general) anywhere
in the vicinity. These concepts are used because an instance (e.g., a
diagnosis) frequently has numerous attributes of its own, like when it
was made, who made it, etc. People defining N-ary relations on the
Semantic Web should keep an eye out for such naturally occurring
concepts, and try to use them. (However, there's not *always* a natural
concept to represent a relation as an individual, so sometimes you have
to make them up!)
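The idea in point 4 can be sketched roughly as follows, again assuming plain Python tuples stand in for binary statements. The function name, the property names other than has_diagnosis, and all the values are hypothetical. One n-ary fact about Christine is decomposed into binary statements that describe a separate "diagnosis" individual:

```python
def nary_to_binary(individual, owner, link, attributes):
    """Decompose one n-ary tuple into binary statements.

    `individual` is the new individual standing for the relation
    instance, `owner` is the subject it hangs off, `link` is the
    property connecting them, and `attributes` holds the remaining
    components of the n-ary tuple.
    """
    triples = [(owner, link, individual)]
    triples += [(individual, prop, value) for prop, value in attributes.items()]
    return triples


# Hypothetical attribute names and values, for illustration only.
triples = nary_to_binary(
    "diagnosis_1",
    "Christine",
    "has_diagnosis",
    {
        "diagnosis_value": "Condition_X",
        "diagnosis_probability": "HIGH",
        "made_by": "Dr_Example",
    },
)
```

The same decomposition is what a database designer does when giving a "diagnoses" table its own key, which is the point of (a) above.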
5. It ought to be noted somewhere (it may be there and I've overlooked
it) that you can always reverse the "original" relation and turn pattern
1 into pattern 2. E.g., you can reverse "Christine has_diagnosis
diagnosis_1" to form "diagnosis_1 about_patient Christine". This is
related to the bullet about inverse relations under the "Considerations"
heading, but makes a slightly different point.
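As a small sketch of this reversal, assuming statements are plain Python tuples and that the inverse property name is looked up in a hand-written table (the table and function are hypothetical helpers; only the two statements come from the example above):

```python
# Hypothetical table mapping a property to the name of its inverse.
INVERSES = {"has_diagnosis": "about_patient"}


def invert(triple, inverses=INVERSES):
    """Swap subject and object and replace the property by its inverse."""
    subject, prop, obj = triple
    return (obj, inverses[prop], subject)


pattern_1 = ("Christine", "has_diagnosis", "diagnosis_1")
pattern_2 = invert(pattern_1)
# pattern_2 is ("diagnosis_1", "about_patient", "Christine")
```

After the reversal, every statement has diagnosis_1 as its subject, which is the shape pattern 2 describes.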
6. Some people are naturally going to think of using RDF reification in
these situations and, rather than avoiding the subject, the text should
explicitly point this out, and then go on to say why this is a bad idea.
The primary reason it's a bad idea is that explicitly using the
reification vocabulary involves talking about RDF (or OWL) statements
(e.g., individuals are introduced having rdf:type rdf:Statement) and, as
the examples illustrate, more natural concepts from the actual problem
domain can generally be used instead. E.g., instead of defining
individuals that are statements, define individuals that are
"diagnoses", "temperature readings", "purchases", etc. This can be
looked on as a kind of "reification", but it shouldn't be confused with
the RDF concept (and its vocabulary).
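The contrast can be sketched as follows, assuming plain Python tuples for statements. The rdf:Statement, rdf:subject, rdf:predicate, and rdf:object names are the RDF reification vocabulary; every other name and value is hypothetical:

```python
# RDF reification: the new individual is typed as a statement, so the
# description is about RDF machinery rather than the problem domain.
reified = [
    ("stmt_1", "rdf:type", "rdf:Statement"),
    ("stmt_1", "rdf:subject", "Christine"),
    ("stmt_1", "rdf:predicate", "has_diagnosis"),
    ("stmt_1", "rdf:object", "Condition_X"),
    ("stmt_1", "diagnosis_probability", "HIGH"),
]

# Domain modelling: the new individual is a diagnosis, a concept that
# already exists in the problem domain.
domain = [
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("diagnosis_1", "rdf:type", "Diagnosis"),
    ("diagnosis_1", "diagnosis_value", "Condition_X"),
    ("diagnosis_1", "diagnosis_probability", "HIGH"),
]


def types_of(individual, triples):
    """Return the set of rdf:type values asserted for an individual."""
    return {o for (s, p, o) in triples if s == individual and p == "rdf:type"}
```

Both encodings carry the same facts, but only the second gives the new individual a type drawn from the problem domain.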
7. As this is part of a best practices activity, it seems to me that a
Note of this kind should explicitly point people to the relational
database design literature for examples and ideas (at least to a
standard textbook, such as Date's "An Introduction to Database
Systems"). On a more theoretical level, all the work on functional
dependencies and various "normal forms" is relevant to the sorts of
design practices being discussed here. If a reference to the database
literature isn't considered relevant enough to this specific WD, it
certainly should be relevant to any larger-scale document in which these contents
might be collected. After all, the concepts being considered here apply
to more things than just those that people have classically considered
"ontologies".
--Frank
Received on Friday, 6 August 2004 12:14:11 UTC