Re: [OEP] The n-ary relations draft is ready for outside review from Guus Schreiber on 2005-08-08 (public-swbp-wg@w3.org from August 2005)

From: Guus Schreiber <schreiber@cs.vu.nl>
Date: Mon, 08 Aug 2005 13:39:30 +0200
To: Natasha Noy <noy@SMI.Stanford.EDU>
CC: "Ralph R. Swick" <swick@w3.org>, swbp <public-swbp-wg@w3.org>
Message-ID: <42F74472.7010104@cs.vu.nl>
Natasha, Alan,

Here is my review. Sorry for the delay. The review is a bit biased by my
use of this note in an ontology-engineering course, which focused mainly
on real-world modeling issues (and not on RDF/OWL details).

Guus

PS. My spelling checker wanted me to replace "reification" with 
"deification" :-).


Defining N-ary Relations on the Semantic Web
Editor's Draft 20 June 2005
http://smi-web.stanford.edu/people/noy/nAryRelations/n-aryRelations-2nd-WD.html

[[
   Issue 1: If property instances can link only two individuals, how do
   we deal with cases where we need to describe the instances of
   relations, such as its certainty, strength, etc?

   Issue 2: If instances of properties can link only two individuals,
   how do we represent relations among more than two individuals?
   ("n-ary relations")

   Issue 3: If properties can link only two individuals, how do we
   represent relations in which one of the participants is an ordered
   list of individuals rather than a single individual?
]]

One could say that issue 1 is not really an n-ary relation problem but
the "how to make statements about statements" problem, i.e. an
alternative to RDF reification. I propose to make this explicit in
the text and to move this issue to second place.

Vocabulary (issues 1 and 2): some readers might not immediately grasp
"property instances". I suggest either adding "cf. tuples" in parentheses
or dropping "instances" (as is done in the description of issue 3).

[[
   Use case examples
]]

Again, example 3 is the prototypical n-ary relation, so maybe it
should be the first example. The point is that for people from
relational databases the first two examples are not "real" n-ary
relations: e.g. in example 1 the probability value is functionally
dependent on the person and the disease. In example 3 there is no such
dependency (the primary key is the combination of all three
arguments). So, reification would work with examples 1 and 2, but not
with example 3 (because the instances are not unique).
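To make the relational-database reading concrete, here is a minimal Python sketch (not from the draft) that models each relation as a table keyed on a tuple of participants. Christine and BreastTumor come from example 1; the purchase names ("John", "BookA", "SellerX", "SellerY") are hypothetical stand-ins. The only point is that a binary pair is a valid key for example 1 but not for example 3.

```python
# Sketch of the relational reading of examples 1 and 3 (illustrative only).
# Each relation is a "table": a dict keyed on a tuple of participants.

# Example 1: the probability is functionally dependent on (person, disease),
# so that pair is a valid key: one key, one probability.
diagnosis: dict[tuple[str, str], str] = {}
diagnosis[("Christine", "BreastTumor")] = "high"

# Example 3 keyed on a binary pair (hypothetical names): a second purchase
# of the same book by the same buyer silently overwrites the first one.
purchase_by_pair: dict[tuple[str, str], str] = {}
purchase_by_pair[("John", "BookA")] = "SellerX"
purchase_by_pair[("John", "BookA")] = "SellerY"

# Keying on the combination of all three arguments keeps both purchases.
purchases: set[tuple[str, str, str]] = {
    ("John", "BookA", "SellerX"),
    ("John", "BookA", "SellerY"),
}
```

The last set shows the combination of all arguments acting as the key, which is the situation in which annotating a single binary statement no longer suffices.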

[[
   4. United Airlines flight 3177 visits the following airports: LAX,
   DFW, and JFK. There is a relation between the individual flight and
   the three cities that it visits, LAX, DFW, JFK. Note that the order
   of the airports is important and indicates the order in which the
   flight visits these airports.
]]

UML users may not recognize this as an n-ary relation. UML has the
notion of "ordered" associations, which would handle this
situation. It is in fact a binary relation where one of the arguments
is not a simple individual but an ordered list of individuals. I
suggest adding a UML note.

Reflecting on this, we might just want to say:
- issue 2 / example 3 describe the "real" n-ary relation issue
- issues 1 and 3 / examples 1, 2 and 4 describe related but different
problems that can be modeled using the same patterns.
But maybe I'm making it too complicated now.

[[
Sec. Representation patterns

   ... Examples 1, 2, and 3 above correspond to this pattern. For instance,
   in the example 1 the instance of a new class Diagnosis_Relation
   would represent the fact that Christine has been diagnosed with a
   breast tumor with high probability.
]]

"Correspond to" is too strong. I suggest rephrasing it as: "Examples 1, 2,
and 3 above can be modeled with this pattern."

This may also be a good place to point out that examples 1 and 2
could alternatively have been represented with RDF reification.
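For comparison, a rough sketch of RDF reification applied to example 1, with triples written as Python tuples rather than with an RDF library. rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard RDF reification vocabulary; the ":"-prefixed names (including :diag1, :hasProbability and :HIGH) are hypothetical.

```python
# Sketch: RDF reification of example 1, with triples as plain tuples.
# rdf:* terms are the standard reification vocabulary; ":" names are
# hypothetical placeholders.
Triple = tuple[str, str, str]


def reify(stmt_id: str, s: str, p: str, o: str) -> list[Triple]:
    """Return the four reification triples that describe (s, p, o)."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]


# The original binary statement, asserted directly.
graph: list[Triple] = [(":Christine", ":hasDiagnosis", ":BreastTumor")]
# A resource describing that statement...
graph += reify(":diag1", ":Christine", ":hasDiagnosis", ":BreastTumor")
# ...which can now carry the extra attribute.
graph.append((":diag1", ":hasProbability", ":HIGH"))
```

Note that the reification triples only describe the statement; whether the original triple is also asserted is a separate modeling choice (here it is).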

I suggest using example 3 here, also because a name such
as "Purchase" would come less out of the blue than
"Diagnosis_Relation".

I suggest including a UML note indicating that pattern 1 is
close to what is called an "association class" in UML.
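A minimal sketch of pattern 1 applied to example 3, again with triples as Python tuples: each purchase becomes an individual of a class :Purchase, playing roughly the role of an instance of a UML association class. The property names :hasBuyer, :hasObject and :hasSeller and all individuals are hypothetical, not taken from the draft.

```python
# Sketch of pattern 1 for example 3; triples are plain Python tuples.
# :Purchase, :hasBuyer, :hasObject, :hasSeller and the individuals are
# hypothetical names used only for illustration.
Triple = tuple[str, str, str]


def purchase(pid: str, buyer: str, item: str, seller: str) -> set[Triple]:
    """Create one Purchase individual linked to all its participants."""
    return {
        (pid, "rdf:type", ":Purchase"),
        (pid, ":hasBuyer", buyer),
        (pid, ":hasObject", item),
        (pid, ":hasSeller", seller),
    }


def purchases_of(graph: set[Triple], buyer: str) -> list[str]:
    """Return the Purchase individuals whose buyer is `buyer`, sorted."""
    return sorted(s for (s, p, o) in graph if p == ":hasBuyer" and o == buyer)


# Two purchases with identical participants stay separate individuals.
graph = purchase(":p1", ":John", ":BookA", ":SellerX") | purchase(
    ":p2", ":John", ":BookA", ":SellerX"
)
```

Because each purchase is its own individual, two purchases with exactly the same participants remain distinguishable.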

[[
   Pattern 1
]]

In line with the previous comments, I suggest changing the order of
the use cases: the current use case 3 should be the first one.

[[
   Use Case 1: additional attributes describing a relation
]]

I've tried to explain the modeling solution in my
ontology-engineering class and observed the following:

- it requires "breast tumor" to be treated as an instance, where it
will usually be a class (one could see it as a use case for the
"classes as values" note).

   I suggest considering an instance of BreastTumor as the
   value. This also has the advantages described in the value-partition
   note (it is easy to state later that MyBreastTumor is an instance
   of a subclass of "BreastTumor").

- there are two other solutions which are worth discussing as
alternatives:

   1. Person -> hasDiagnosis -> Disease -> hasProbability -> Number
   This works only if the value of hasDiagnosis is not "BreastTumor"
   itself but a unique instance of BreastTumor. By the way, I do not
   think this solution would work in practice, as a statement about a
   diagnosis with a certain probability is always time dependent (and
   time cannot easily be added here).

   2. Representing Diagnosis in a similar way as Purchase.
   My students found this solution easier to understand (for whatever
   it is worth). They found the juxtaposition of BreastTumor and
   Probability weird, as the second is clearly dependent on the
   first. The only real difference, of course, is the direction of the
   hasDiagnosis property.
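To compare the two alternatives side by side, here is a sketch in which both graphs answer the same question ("what is the probability of Christine's diagnosis?"). The names :ChristinesTumor, :diag1, :Diagnosis, :diagnosisOf, :hasDisease and :HIGH are hypothetical, and the sketch assumes each person has exactly one diagnosis.

```python
# Two alternatives for use case 1, triples as Python tuples.
# All ":"-prefixed names are hypothetical; each person is assumed to have
# exactly one diagnosis.
Triple = tuple[str, str, str]

# Alternative 1: the hasDiagnosis value is a unique instance of BreastTumor,
# and that instance carries the probability.
alt1: set[Triple] = {
    (":Christine", ":hasDiagnosis", ":ChristinesTumor"),
    (":ChristinesTumor", "rdf:type", ":BreastTumor"),
    (":ChristinesTumor", ":hasProbability", ":HIGH"),
}

# Alternative 2: a Diagnosis individual, analogous to Purchase, points to
# the person (note the reversed direction) and to the disease.
alt2: set[Triple] = {
    (":diag1", "rdf:type", ":Diagnosis"),
    (":diag1", ":diagnosisOf", ":Christine"),
    (":diag1", ":hasDisease", ":ChristinesTumor"),
    (":ChristinesTumor", "rdf:type", ":BreastTumor"),
    (":diag1", ":hasProbability", ":HIGH"),
}


def probability_alt1(g: set[Triple], person: str) -> str:
    """Follow person -> disease instance -> probability."""
    disease = next(o for (s, p, o) in g if s == person and p == ":hasDiagnosis")
    return next(o for (s, p, o) in g if s == disease and p == ":hasProbability")


def probability_alt2(g: set[Triple], person: str) -> str:
    """Find the Diagnosis pointing at the person, then read its probability."""
    diag = next(s for (s, p, o) in g if p == ":diagnosisOf" and o == person)
    return next(o for (s, p, o) in g if s == diag and p == ":hasProbability")
```

In the second alternative, a time stamp could be attached to :diag1 like any other attribute, which would address the time-dependence concern about the first one.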

[[
   Use Case 2: different aspects of the same relation
]]

This use case is a better example than use case 1 of how the
pattern can be used to avoid RDF reification.

A drastic solution could be to drop use case 1 altogether and keep this
one in. Adding time information to this example would make it more
realistic.

"TemperatureObservation" would be a good name for this relation. I think
this use case is close to the Observation pattern in Fowler's book on
Analysis Patterns (I tried to verify this, but I cannot find my
copy of the book).
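A sketch of what a TemperatureObservation with time information might look like, written as a Python dataclass. The field names, the qualitative values and the patient identifier are assumptions for illustration; this is only loosely in the spirit of Fowler's Observation pattern, not a rendering of it.

```python
# TemperatureObservation sketch with time added (field names hypothetical).
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TemperatureObservation:
    """One instance of the relation: the value, its trend, and when observed."""

    patient: str
    value: str  # qualitative value, e.g. "high"
    trend: str  # e.g. "rising" or "falling"
    observed_at: datetime


def latest(
    observations: list[TemperatureObservation], patient: str
) -> TemperatureObservation:
    """Return the most recent observation for `patient`."""
    return max(
        (o for o in observations if o.patient == patient),
        key=lambda o: o.observed_at,
    )


observations = [
    TemperatureObservation("patient1", "normal", "rising", datetime(2005, 8, 1, 8, 0)),
    TemperatureObservation("patient1", "high", "falling", datetime(2005, 8, 1, 20, 0)),
]
```

With time as an explicit aspect, several observations of the same patient can coexist, and "the current temperature" becomes a query rather than a single stored fact.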

[[
   Use Case 3: N-ary relation with no distinguished participant
]]

I think it is worthwhile to point out that in use case 3 the
domain actually provides a natural name for the relation as a whole,
namely "Purchase". There are many of these nouns that represent static
aspects of an activity and thus are candidates for this pattern:
"transaction", "enrollment", "subscription". This makes it different
from use cases 1 and 2 (but see also my remarks there).

[[
   Pattern 2: Using lists for arguments in a relation
]]

Alternatives that avoid the use of RDF lists would be worth
mentioning:

1. A Flight is linked to a number of FlightPorts. FlightPort is a
class whose instances represent the relation between an airport and its
sequence number in the Flight. I find this rather ugly, but it is in a
sense close to the way use case 1 is represented.

2. A Flight is linked to a number of FlightMovement instances. Each
FlightMovement represents a relation between a departure ("from") and an
arrival ("to") airport. This would probably be my preferred solution.
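A small sketch contrasting the two list-free alternatives for the flight example, using the airports from the quoted use case (LAX, DFW, JFK). The class and field names are hypothetical; the movement-based reconstruction assumes the movements form a single chain without repeated airports.

```python
# Two list-free encodings of flight 3177's route (names hypothetical).
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightPort:
    """Alternative 1: an airport together with its sequence number."""

    airport: str
    sequence: int


@dataclass(frozen=True)
class FlightMovement:
    """Alternative 2: one leg of the flight, from one airport to the next."""

    origin: str
    destination: str


def route_from_ports(ports: set[FlightPort]) -> list[str]:
    """Order airports by their explicit sequence numbers."""
    return [p.airport for p in sorted(ports, key=lambda p: p.sequence)]


def route_from_movements(movements: set[FlightMovement]) -> list[str]:
    """Rebuild the order by chaining legs; assumes one simple, non-empty chain."""
    nxt = {m.origin: m.destination for m in movements}
    # The first airport is an origin that is never a destination.
    start = (set(nxt) - set(nxt.values())).pop()
    route = [start]
    while route[-1] in nxt:
        route.append(nxt[route[-1]])
    return route


# Sets are unordered, so the order must come from the data itself.
ports = {FlightPort("JFK", 3), FlightPort("LAX", 1), FlightPort("DFW", 2)}
movements = {FlightMovement("DFW", "JFK"), FlightMovement("LAX", "DFW")}
```

Both recover LAX, DFW, JFK; in the FlightMovement version the order is implied by the from/to links rather than stored as explicit sequence numbers.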

-- 
Free University Amsterdam, Computer Science
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
Tel: +31 20 598 7739/7718; E-mail: schreiber@cs.vu.nl
Home page: http://www.cs.vu.nl/~guus/
Received on Monday, 8 August 2005 11:39:40 UTC