Re: [OEP] The n-ary relations draft is ready for outside review from Natasha Noy on 2005-08-11 (public-swbp-wg@w3.org from August 2005)

From: Natasha Noy <noy@SMI.Stanford.EDU>
Date: Thu, 11 Aug 2005 14:34:59 -0700
To: Guus Schreiber <schreiber@cs.vu.nl>
Cc: "Ralph R. Swick" <swick@w3.org>, swbp <public-swbp-wg@w3.org>
Message-Id: <EDE0B6C5-E630-4F10-A0A6-D8762A9F9B67@SMI.Stanford.EDU>
Guus,

Thank you very much for your comments. Some replies/discussion below.  
I think some points require a bit more discussion (there may be some
misunderstanding).

> [[
>   Issue 1: If property instances can link only two individuals, how do
>   we deal with cases where we need to describe the instances of
>   relations, such as its certainty, strength, etc?
>
>   Issue 2: If instances of properties can link only two individuals,
>   how do we represent relations among more than two individuals?
>   ("n-ary relations")
>
>   Issue 3: If properties can link only two individuals, how do we
>   represent relations in which one of the participants is an ordered
>   list of individuals rather than a single individual?
> ]]
>
> One could say this is not really an n-ary relation problem, but the
> "how to make statements about statements" problem, i.e. an
> alternative for RDF reification. I propose to make this explicit in
> the text, and move the issue to be the second issue.

I think there are two issues here. One is what do people mean by the  
term "n-ary relations". Most of the times I've seen this come up,  
people were referring to either the first or the second of the issues  
above. In KR, I think we all agree that the term we think about is  
"reification", but that has been taken from us by RDF to mean  
something different. Second, on RDF reification: I am not sure if you  
are suggesting that the same issue is addressed by RDF reification. I  
don't believe it is. In RDF, reification is really statements about
statements, where it is not even implied that the latter statement
holds. Here, we are really putting additional information on the
relationship that does hold (at least we are trying to make that
assertion).

> Vocabulary (issue 1 & 2): some readers might not grasp "property
> instances" directly. Suggest to either add in parentheses "cf. tuples"
> or drop "instances" (as done in the description of issue 3).

I think we've put "property instances" to distinguish from properties  
and in the effort to have consistent terminology throughout the  
document. Issue 3 should also refer to "property instances". We can  
add "cf. tuples" to clarify the issue.

> [[
>   Use case examples
> ]]
>
> Again, example 3 is the prototypical n-ary relation, so maybe this
> should be the first example. The point is that for people from
> relational databases the first two examples are not "real" n-ary
> relations: e.g. in example 1 the probability value is functionally
> dependent on the person and the disease. In example 3 there is no such
> dependency (the primary key is the combination of all three
> arguments). So, reification would work with examples 1 and 2, but not
> with example 3 (because the instances are not unique).

Again, if you have a different term to refer to all of these (that  
doesn't use the word "reification") instead of "n-ary relation", we  
can try to change. What the note is really about are these types of  
relations, and there is probably not a single term that would be  
accepted in all communities. Again, if we could say "reified  
relations", we probably would.

I also like Mike's suggestion on how to summarize these three cases  
-- we'll weave that in.

>
> [[
>   4. United Airlines flight 3177 visits the following airports: LAX,
>   DFW, and JFK. There is a relation between the individual flight and
>   the three cities that it visits, LAX, DFW, JFK. Note that the order
>   of the airports is important and indicates the order in which the
>   flight visits these airports.
> ]]
>
> UML users may not recognize this as an n-ary relation. UML has the
> notion of "ordered" associations, which would handle this
> situation. It is in fact a binary relation where one of the arguments
> is not a simple individual but an ordered list of individuals. I
> suggest to add a UML note.

I am not sure how to phrase this. Could you perhaps provide some  
specific text that you would like to see there?

>
> Reflecting on this, we might just want to say:
> - issue 2 / example 3 describe the "real" n-ary relation issue
> - issue 1 and 3 / example 1+2 and 4 describe related but different
> problems that can be modeled using the same patterns.
> But maybe I'm making it too complicated now.

perhaps? (I really think these things would mean different things to  
different communities and we won't be able to satisfy them all; we  
just need to explain what *we* mean by these terms, and hopefully the  
examples do that. If not, we need to do a better job)

> [[
> Sec. Representation patterns
>
>   ... Examples 1, 2, and 3 above correspond to this pattern. For
>   instance, in the example 1 the instance of a new class
>   Diagnosis_Relation would represent the fact that Christine has been
>   diagnosed with a breast tumor with high probability.
> ]]
>
> "correspond to" is too strong. Suggest to rephrase as: "Examples 1, 2,
> and 3 above can be modeled with this pattern.".

sure

> Maybe it is a good place here to indicate that example 1 and 2
> could alternatively have been represented with RDF reification.

I don't think this is true though. If we represented this with RDF
reification, we wouldn't actually be stating that Steve has a high
temperature, for example; we would only be making a statement about
the statement about his temperature -- that's a crucial difference.
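
A minimal sketch of this difference, assuming triples are modeled as plain Python tuples. The identifiers "_:st1" and "_:obs1" and the property names has_temperature, temperature_value, and certainty are hypothetical and chosen only for illustration; TemperatureObservation borrows the name suggested later in this thread. In the reified graph no triple has Steve as its subject, whereas in the pattern the link from Steve is itself asserted.

```python
# Triples are plain (subject, predicate, object) tuples.
# All identifiers and property names below are hypothetical.

def reify(stmt_id, s, p, o):
    """Return the four triples that describe (s, p, o) without asserting it."""
    return {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    }

def links_from(graph, subject, predicate):
    """Return the objects directly linked to subject by predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# RDF reification: extra information hangs off a statement *about* a statement.
reified = reify("_:st1", "Steve", "has_temperature", "High")
reified.add(("_:st1", "certainty", "Probable"))

# Pattern from the note: an intermediate individual stands for the relation
# instance, and the link from Steve to it is asserted in the graph.
pattern = {
    ("Steve", "has_temperature", "_:obs1"),
    ("_:obs1", "rdf:type", "TemperatureObservation"),
    ("_:obs1", "temperature_value", "High"),
    ("_:obs1", "certainty", "Probable"),
}

# The reified graph contains no triple whose subject is Steve.
print(links_from(reified, "Steve", "has_temperature"))
# The pattern graph links Steve to the observation individual.
print(links_from(pattern, "Steve", "has_temperature"))
```

The sketch only compares which triples are present; it does not attempt to model RDF or OWL semantics.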

> I suggest to include example 3 here, also because a name such
> as "Purchase" would seem to come less out of the blue than
> "Diagnosis_Relation".

sure

> I suggest to include a UML note, indicating that pattern 1 is
> close to what is called an "association class" in UML.

Again, I would appreciate some specific text since whatever I say  
would end up being imprecise since I know very little about UML.

> [[
>   Pattern 1
> ]]
>
> In line with the previous comments, I suggest to change the order of
> the use cases. The current use case 3 should be the first one.
>
> [[
>   Use Case 1: additional attributes describing a relation
> ]]
>
> I've tried to explain the modeling solution in my
> "ontology-engineering" class and observed the following:
>
> - it requires "breast tumor" to be treated as an instance, where it
> will usually be a class (one could see it as a use case for the
> "classes as values" note).
>
>   I suggest to consider using an instance of BreastTumor as the
>   value. This also has the advantages described in the value-partition
>   note (easy to add later the statement that MyBreastTumor is an
>   instance of a subclass of "BreastTumor").

yes, this has come up before. I think your solution of having this as  
an instance would solve the problem

> - there are two other solutions which are worth discussing as
> alternatives:
>
>   1. Person -> hasDiagnosis -> Disease -> hasProbability -> Number
>   This would work if the instance of disease is not "BreastTumor" but
>   a unique instance of BreastTumor.  By the way, I do not think this
>   solution would work in practice, as a statement about a diagnosis
>   with a certain probability is always time dependent (which we cannot
>   easily add).

If we do use the instance as you suggested above, how is this  
solution different? You simply call Disease what the note calls  
"Diagnosis_Relation", no?
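
One possible reading of this question, sketched with plain Python tuples as triples: if the disease node is a unique instance, the alternative graph and a graph built around a Diagnosis_Relation node differ only in the names used. The individuals tumor_1 and diagnosis_1 and the value "0.9" are hypothetical, and the sketch deliberately leaves out any further properties the note attaches to the relation node.

```python
def rename(graph, mapping):
    """Rename terms in every triple; unmapped terms are kept unchanged."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in graph}

# Alternative 1 from the comment above, with a unique (hypothetical)
# instance of BreastTumor as the disease node.
alternative = {
    ("Christine", "hasDiagnosis", "tumor_1"),
    ("tumor_1", "rdf:type", "BreastTumor"),
    ("tumor_1", "hasProbability", "0.9"),
}

# The same shape with the intermediate node typed as Diagnosis_Relation.
pattern = {
    ("Christine", "hasDiagnosis", "diagnosis_1"),
    ("diagnosis_1", "rdf:type", "Diagnosis_Relation"),
    ("diagnosis_1", "hasProbability", "0.9"),
}

# Under this reading, renaming the node and its class makes the graphs equal.
mapping = {"tumor_1": "diagnosis_1", "BreastTumor": "Diagnosis_Relation"}
same_shape = rename(alternative, mapping) == pattern
print(same_shape)
```

This compares graph shape only; whether the intermediate node should be thought of as a disease or as a diagnosis remains the modeling question under discussion.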

>   2. Representing Diagnosis in a similar way as Purchase.
>   My students found this solution easier to understand (for whatever
>   it is worth). They found the juxtaposition of BreastTumor and
>   Probability weird, as the second is clearly dependent on the
>   first. The only real difference of course is the direction of the
>   hasDiagnosis property.

Why did they find it weird? That's exactly why we need reification  
here -- to link the two. I don't see what the problem is.

> [[
>   Use Case 2: different aspects of the same relation
> ]]
>
> This use case is a better example than use case 1 of how to use the
> pattern for avoiding the use of RDF reification.

I am not sure I understand (perhaps with the comments above on RDF  
reification, this is no longer valid?)

>
> A drastic solution could be to drop use case 1 altogether and keep
> this one in. Adding time information to this example would make it
> more realistic.

I really don't want to add time -- this would open such a huge can of  
worms. It would be very hard to explain to people that this has  
nothing to do with time actually. I would really like to stay out of it.

I think use case 1 does present a common way that people come across
this problem and the lack of n-ary relations in OWL. Thus, even though
it is logically equivalent to other cases, I would prefer to keep it in.

> "TemperatureObservation" would be a good name for this relation. I
> think this use case is close to the Observation pattern in Fowler's
> book on Analysis Patterns (I tried to verify this, but I cannot find
> my copy of the book).

yes, that's better

> [[
>   Use Case 3: N-ary relation with no distinguished participant
> ]]
>
> I think it is worthwhile to point out that in use case 3 the
> domain actually provides a natural name for the relation as a whole,
> namely "Purchase". There are many of these nouns that represent static
> aspects of an activity and thus are candidates for this pattern:
> "transaction", "enrollment", "subscription". This makes it different
> from use cases 1 and 2 (but see also my remarks there).

agreed

> [[
>   Pattern 2: Using lists for arguments in a relation
> ]]
>
> Alternatives which avoid the use of RDF lists would be worth
> mentioning:
>
> 1. A Flight is linked to a number of FlightPorts. Each FlightPort is a
> class, representing the relation between a port and its sequence
> number in the Flight. I find this rather ugly, but it is in a sense
> close to the way use case 1 is represented.
>
> 2. A Flight is linked to a number of FlightMovement instances. Each
> FlightMovement represents a relation between from/to
> airports. This would probably be my preferred solution.

OK, I'll try to put them in. Would it be OK simply to mention this, or
do you think it needs a fleshed-out example, with code, diagrams, etc.?
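
As a rough illustration of the two alternatives, here is a sketch in Python rather than RDF/OWL. FlightPort and FlightMovement follow the names in the comment above; the field names and both helper functions are hypothetical. Each representation is enough to recover the ordered route LAX, DFW, JFK from example 4 without an RDF list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlightPort:
    """Alternative 1: an airport paired with its sequence number in the flight."""
    airport: str
    sequence: int

@dataclass(frozen=True)
class FlightMovement:
    """Alternative 2: one leg of the flight, between two airports."""
    origin: str
    destination: str

def route_from_ports(ports):
    """Order the airports by their sequence numbers."""
    return [p.airport for p in sorted(ports, key=lambda p: p.sequence)]

def route_from_movements(movements):
    """Chain the legs together; assumes a simple path with no repeated airport."""
    next_stop = {m.origin: m.destination for m in movements}
    destinations = set(next_stop.values())
    starts = [a for a in next_stop if a not in destinations]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting airport")
    route = [starts[0]]
    while route[-1] in next_stop:
        following = next_stop[route[-1]]
        if following in route:
            raise ValueError("route revisits an airport")
        route.append(following)
    return route

# Both inputs are unordered sets; the order is carried by the data itself.
ports = {FlightPort("DFW", 2), FlightPort("LAX", 1), FlightPort("JFK", 3)}
movements = {FlightMovement("DFW", "JFK"), FlightMovement("LAX", "DFW")}

print(route_from_ports(ports))
print(route_from_movements(movements))
```

The second helper shows one cost of the FlightMovement form: the order has to be reconstructed by following the legs, and a flight that revisits an airport would need extra information to disambiguate.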

Thanks again for your comments!

Natasha
Received on Thursday, 11 August 2005 21:34:59 UTC