- From: Frank Manola <fmanola@acm.org>
- Date: Fri, 06 Aug 2004 12:16:14 -0400
- To: public-swbp-wg@w3.org
A few comments on W3C Working Draft "Defining N-ary Relations on the Semantic Web: Use with Individuals", 21 July 2004.

Overall, this is useful material, and an important topic, since it comes up all the time. It can also be tricky to describe (as I know from personal experience!).

1. Under the "Representation Pattern" heading, the text between the first two figures, if interpreted strictly, appears to cover only the first two use cases. Perhaps it could read something like: "We would like to have another individual or simple value C (and possibly additional individuals or values in the case of Use Case 3) to be part of this relation:"?

2. Just below the figures: "A common solution to representing n-ary relations such as these is to create an individual which stands for an instance of the relation and relates the things that are involved in that instance of the relation." And just below that: "In the first case...one of the individuals in the relation (say, A) is distinguished from others in that it is the *originator* of the relation."

Here the text introduces new terminology ("instance of the relation" and "originator") in places where there already is terminology to cover these concepts ("object", "owner of the relation", and "relationship" are also used in various places later in the text). An instance of a relation in RDF is a "statement" ("tuple" could be used too, as per relational database terminology). The "originator" of such a statement or tuple is the "subject". Note also that if, as the text says, you're choosing an individual, then even if you retained this "originator" terminology it wouldn't be the "originator of the relation", but rather the "originator of an *instance* of the relation".

Part of this terminology problem is due to the way the OWL specs sometimes refer to "statements" (in the RDF-ish sense) and their various components, and sometimes use other terminology in referring to these binary relations. For example, Section 3.2.2 of the OWL Reference, in describing owl:equivalentClass, uses RDF-ish terminology in the text: "NOTE: OWL DL does not put any constraints on the types of class descriptions that can be used as subject and object of an owl:equivalentClass statement. In OWL Lite the subject must be a class name and the object must be either a class name or a property restriction."

On the other hand, Section 4 of the OWL Reference uses alternative terminology (including "tuple") in: "NOTE: In this section we use the term "property extension" in a similar fashion to "class extension". The property extension is the set of instances that is associated with the property. Instances of properties are not single elements, but subject-object pairs of property statements. In relational database terms, property instances would be called "tuples" of a binary relation (the property)."

Whatever terminology is decided on, it might be a good idea to introduce definitions for it right away (say, in describing the use cases), rather than referring generally to "binary relations" at this point, and to be very consistent in using that terminology.

3. Just below, in the initial description of pattern 1: "...here, the instance of the relation itself is a property of A, with the value that is a complex object in itself, relating several values and individuals."

This is a bit confusing (particularly with reference to the second of the two diagrams above it). For one thing, it's not clear how an instance of a relation (i.e., a statement) can be a property (a property value, perhaps). For another, in the normal binary relation, the "instance of the relation" is considered to include the originator (A in this case). But the new individuals being created (in pattern 1) *don't* include the originator.

Having tried several alternative descriptions of this sort myself, I appreciate how hard it is to come up with concise descriptions of these patterns here. I suspect it may be better to simply jump directly to describing these patterns using examples, as the material under the "Pattern 1" and "Pattern 2" headings does, rather than trying for these abstract summaries.

4. The first paragraph under the "Pattern 1" heading introduces another new term, "relation object", which should be introduced more explicitly, assuming it's needed (NB: this is not the same as "an object of a relation", a phrase also used in the same paragraph).

Also, under the "Pattern 1" and "Pattern 2" headings, introducing concepts like "Diagnosis_Relation_1" and "Temperature_Relation_1" may help emphasize that these in some sense represent instances of relations, but I think that there should be some text pointing out how often real-life use cases already have corresponding concepts. The "relation object" idea might be better introduced by something like:

It is often possible to think of the relation among multiple facts as a separate object. Then the multiple facts can be represented as describing that object. This happens so often in real life that there are often separate concepts (and names) for these separate objects. Thus, it is possible to talk about a "diagnosis" (instead of "diagnosis-relation"). This diagnosis can have various properties that describe it (the value, probability, who made it, when, etc.). Similarly, Steve may have a "temperature_reading".

There are a couple of points to be made here:

a. The need to introduce new individuals to (in some sense) represent instances of relationships shouldn't be thought of as just a peculiar artifact of RDF, OWL, or the Semantic Web in general (although it may appear in a particularly acute form here). Instead, people do this all the time in relational database systems, even though these systems directly support n-ary relations, and in recording information of all kinds in even less-structured forms (reports, documents, and so on).

b. People do this so frequently that real-world domain conceptual models frequently include such concepts, e.g., "purchases", "temperature readings", "diagnostic reports", "weather reports", etc., even when there is no idea of RDF or OWL (or the Semantic Web in general) anywhere in the vicinity. These concepts are used because an instance (e.g., a diagnosis) frequently has numerous attributes of its own, like when it was made, who made it, etc. People defining N-ary relations on the Semantic Web should keep an eye out for such naturally occurring concepts, and try to use them. (However, there's not *always* a natural concept to represent a relation as an individual, so sometimes you have to make them up!)

5. It ought to be noted somewhere (it may be there and I've overlooked it) that you can always reverse the "original" relation and turn pattern 1 into pattern 2. E.g., you can reverse "Christine has_diagnosis diagnosis_1" to form "diagnosis_1 about_patient Christine". This is related to the bullet about inverse relations under the "Considerations" heading, but makes a slightly different point.

6. Some people are naturally going to think of using RDF reification in these situations and, rather than avoiding the subject, the text should explicitly point this out, and then go on to say why this is a bad idea. The primary reason it's a bad idea is that explicitly using the reification vocabulary involves talking about RDF (or OWL) statements (e.g., individuals are introduced having rdf:type rdf:Statement) and, as the examples illustrate, more natural concepts from the actual problem domain can generally be used instead. E.g., instead of defining individuals that are statements, define individuals that are "diagnoses", "temperature readings", "purchases", etc. This can be looked on as a kind of "reification", but it shouldn't be confused with the RDF concept (and its vocabulary).

7. As this is part of a best practices activity, it seems to me that a Note of this kind should explicitly point people to the relational database design literature for examples and ideas (at least to a standard textbook, such as Date's "An Introduction to Database Systems"). On a more theoretical level, all the work on functional dependencies and various "normal forms" is relevant to the sorts of design practices being discussed here. If a reference to the database literature isn't considered relevant enough to this specific WD, it certainly should be considered relevant to any larger-scale document in which these contents might be collected. After all, the concepts being considered here apply to more things than just those that people have classically considered "ontologies".

--Frank
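
As an illustration of points 5 and 6 above, here is a minimal Python sketch, not taken from the Working Draft. It models triples as plain (subject, predicate, object) tuples rather than using any RDF library. The names has_diagnosis, about_patient, diagnosis_1, and Christine come from point 5; everything else (the Diagnosis class, diagnosis_value, diagnosis_probability, stmt_1, the placeholder values, and the reverse_link helper) is a hypothetical name introduced only for this example. The rdf:Statement, rdf:subject, rdf:predicate, and rdf:object terms are the standard RDF reification vocabulary.

```python
# Illustrative only: triples are plain (subject, predicate, object) tuples.
# Names other than has_diagnosis, about_patient, diagnosis_1 and Christine
# are hypothetical and chosen just for this sketch.

# Pattern 1: Christine is the subject of the link to the new individual.
pattern1 = {
    ("Christine", "has_diagnosis", "diagnosis_1"),
    ("diagnosis_1", "rdf:type", "Diagnosis"),
    ("diagnosis_1", "diagnosis_value", "some_value"),
    ("diagnosis_1", "diagnosis_probability", "some_probability"),
}


def reverse_link(triples, predicate, inverse):
    """Return a new set in which every (s, predicate, o) becomes (o, inverse, s).

    All other triples are copied unchanged.
    """
    return {
        (o, inverse, s) if p == predicate else (s, p, o)
        for (s, p, o) in triples
    }


# Pattern 2: the same facts, but every triple now has the diagnosis
# individual as its subject (point 5).
pattern2 = reverse_link(pattern1, "has_diagnosis", "about_patient")

# For contrast (point 6): the RDF reification vocabulary introduces an
# individual whose type is rdf:Statement, so the data talks about an RDF
# statement rather than about a domain concept such as a diagnosis.
reified = {
    ("stmt_1", "rdf:type", "rdf:Statement"),
    ("stmt_1", "rdf:subject", "Christine"),
    ("stmt_1", "rdf:predicate", "has_diagnosis"),
    ("stmt_1", "rdf:object", "some_value"),
    ("stmt_1", "diagnosis_probability", "some_probability"),
}
```

In the first two sets the extra individual is typed with a domain class, whereas in the reified set it is typed as rdf:Statement, which is the distinction point 6 draws.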
Received on Friday, 6 August 2004 12:14:11 UTC