RE: subclasses (RDF vocabulary definitions) from Jon Hanna on 2002-11-22 (www-rdf-interest@w3.org from November 2002)

From: Jon Hanna <jon@spin.ie>
Date: Fri, 22 Nov 2002 12:14:58 -0000
To: <www-rdf-interest@w3.org>
Message-ID: <NDBBLCBLIMDOPKMOPHLHMEBIEKAA.jon@spin.ie>
> The English is
>    When RDFS says that man is a subclass of animal
>    what it means is that either (1) the set of all men
> is identical to the set of all animals or (2) the set
> of all men is a proper subset of the set of all animals

This is quite true. Indeed it is true that either (1) or (2) holds, so
<:_man> <rdfs:subClassOf> <:_animal> isn't wrong as such.

Adding other statements we could say "the set of all men is not identical to
the set of all animals" (hence implying (2)). Or "the set of all animals is
an rdfs:subClassOf the set of all men" (hence implying (1)). Or we can say
(1) directly or (2) directly.

We can also say A. "The set of all dogs is an rdfs:subClassOf the set of all
animals" and B. "The set of all dogs is disjoint with the set of all men"
(which hence implies (2): provided there is at least one dog, that dog is an
animal but not a man, so the set of all men cannot be identical to the set
of all animals, although we may or may not care about that at the time).
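
To make that last derivation concrete, here is a rough sketch using Python's
built-in sets. The individuals ("rex", "adam" and so on) are made up for
illustration, and finite sets are only a stand-in for RDFS classes.

```python
# Made-up individuals; finite sets standing in for class extensions.
animals = {"rex", "fido", "adam", "eve"}
dogs = {"rex", "fido"}
men = {"adam", "eve"}

statement_a = dogs <= animals        # A: dogs rdfs:subClassOf animals
statement_b = dogs.isdisjoint(men)   # B: dogs disjoint with men
men_subclass = men <= animals        # men rdfs:subClassOf animals
men_proper = men < animals           # possibility (2): proper subset

# With at least one dog, A + B + "men subClassOf animals" forces (2).
implies_proper = (not (statement_a and statement_b and men_subclass)
                  or men_proper)

# The derivation leans on dogs being non-empty: an empty class is a
# subset of, and disjoint from, everything, so it rules nothing out.
no_dogs = set()
everyone = {"adam", "eve"}
empty_case_is_proper = (no_dogs <= everyone
                        and no_dogs.isdisjoint(everyone)
                        and everyone < everyone)
```

In the first case men comes out as a proper subset of animals; in the
empty-dogs case A and B both still hold, yet nothing stops the two classes
from being identical.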

Hence the language is not lacking in its ability to express all of these
possibilities, but some possibilities may require more verbose collections
of statements than others.

While the interaction of these statements may seem pointlessly indirect,
there are advantages. The intention is that these statements may be used by
applications to do something useful. Such applications generally won't want
all of the possibilities above. An application that wishes to know "is
Richard H. McCullough an animal" can find that from knowing that "Richard H.
McCullough is a man" and man is an rdfs:subClassOf animal.
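
As a minimal sketch of that kind of application, assuming the statements
have already been loaded into two plain dictionaries (the names SUBCLASS_OF,
TYPE_OF, superclasses and is_a are mine, not part of any RDF toolkit):

```python
# Hypothetical in-memory statements; not a real RDF API.
SUBCLASS_OF = {"man": {"animal"}}             # man rdfs:subClassOf animal
TYPE_OF = {"Richard H. McCullough": {"man"}}  # asserted rdf:type


def superclasses(cls, subclass_of):
    """Return cls together with every class reachable via subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def is_a(individual, cls, type_of=TYPE_OF, subclass_of=SUBCLASS_OF):
    """True if cls can be inferred as a type of individual."""
    return any(cls in superclasses(asserted, subclass_of)
               for asserted in type_of.get(individual, ()))


answer = is_a("Richard H. McCullough", "animal")
```

Note that the lookup never needs to know whether man is a proper subset of
animal or identical to it; the one subClassOf statement is enough.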

In a distributed environment it can be useful to be able to transmit as
close to the minimum information needed for the task as possible (I don't
know if these issues have affected KR or not).

Even in a non-distributed environment it is generally easier for programs to
maintain the relationships between "identical sets", "subset" and "proper
subset" by breaking them down into more atomic concepts. That is, the
program can do one of the following:

1. Define "proper subset" as "is subset and is not identical".
2. Define "subset" as "is proper subset or is identical".
3. Define "identical" as "is subset and is not proper subset".
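
A small sketch of option 1 over finite Python collections, with a helper
that checks options 2 and 3 give the same answers. The function names are
illustrative only, and I have taken "identical" to mean "each is a subset of
the other", which is an assumption on my part.

```python
def is_subset(a, b):
    """Every member of a is also a member of b."""
    return all(x in b for x in a)


def is_identical(a, b):
    """Assumed definition: each is a subset of the other."""
    return is_subset(a, b) and is_subset(b, a)


def is_proper_subset(a, b):
    """Option 1: "is subset and is not identical"."""
    return is_subset(a, b) and not is_identical(a, b)


def options_agree(a, b):
    """Check that options 2 and 3 hold given the definitions above."""
    option2 = is_subset(a, b) == (is_proper_subset(a, b)
                                  or is_identical(a, b))
    option3 = is_identical(a, b) == (is_subset(a, b)
                                     and not is_proper_subset(a, b))
    return option2 and option3


men = {"adam", "eve"}
animals = {"adam", "eve", "rex"}
```

Here the comparison between the three concepts happens only inside
is_proper_subset; everything else is built from is_subset.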

This isn't the only way to do this, but it is a way that causes comparisons
between the three concepts to happen at a clearly defined point, and hence
is less error-prone than alternatives that would compare the concepts
directly at various different times.

Hence we need to decide which of these we use. Since 1, 2, and 3 above are
logically equivalent, we should do so on the basis of practical concerns.

Inference of a type from a known type is likely to be the primary practical
use of rdfs:subClassOf (for example, knowing that what one program calls
"the type of document I produce" can be used as what another program calls
"Any stream of bytes"). This inference can be made if the first class is a
subset of the second (proper or not). The programs involved won't care
whether it is a proper subset or not.

As such it makes sense to allow this inference of type to occur with the
briefest statement, with the question of whether one class is a proper
subset of the other taking more verbose statements to settle. Similarly,
knowing that two classes are identical is likely to be more commonly needed
information than knowing that one is a proper subset of the other. It makes
sense therefore to make "subset" easier to express than "proper subset".

In summary:

1. I believe that all the relevant possibilities can be expressed in the
language. Hence theorists should be happy.

2. The most useful statements to applications are the most concise. Hence
hackers should also be happy.

While a case for an xxx:properSubsetOf could be made for notational
convenience, it could be defined in OWL anyway.
Received on Friday, 22 November 2002 07:08:17 UTC