Re: FORTH COMMENTS ON RDF Schema: Last Call

>Hi,
>
>in reply to FORTH COMMENTS ON RDF Schema: Last Call and Pat Hayes' 
>comments on it.
>I want to concentrate my comments on Class Cycles. The corresponding 
>parts are also attached at the
>end of this mail.
>
>
>Class Cycles:
>For merging different RDF Schemas, I think we need to be able to
>declare classes to be equal. There are different ways to define classes to
>be equal. Either you allow cycles in class hierarchies or you define a
>special property. In OWL we can find the property owl:sameClassAs (which is
>a subproperty of rdfs:subClassOf) for denoting that two classes have the
>same meaning and the instances should belong to both classes.
>
>Additionally we can find the property owl:sameAs, which is not related to
>other properties.
>
>In all cases we can encounter the problem of misuse, especially when the
>class hierarchy is defined in multiple namespaces.
>
>When defining a cycle via rdfs:subClassOf this can result in the statement
>that many classes are equal, e.g.,
>
>             ns2:Class2  rdfs:subClassOf  ns1:Class1
>
>             ns3:Class3  rdfs:subClassOf  ns2:Class2
>
>             ns4:Class4  rdfs:subClassOf  ns3:Class3
>
>             ns4:Class5  rdfs:subClassOf  ns4:Class4
>
>             ns6:Class6  rdfs:subClassOf  ns4:Class5
>
>             ns1:Class1  rdfs:subClassOf  ns6:Class6
>
>This would mean that the last triple states that all classes (Class1 -
>Class6) are equal. Since owl:sameClassAs is a subproperty of
>rdfs:subClassOf, if we use owl:sameClassAs inside a hierarchy we could infer
>the cycle. Exchanging the last triple above with:
>
>             ns1:Class1  owl:sameClassAs  ns6:Class6
>
>the last triple could be inferred.
>
>If this was used by mistake the whole class hierarchy will lose its
>semantic meaning, since the classes are all equal. Therefore cycles should be
>handled with care.

Indeed they should. One might also reasonably say that if a very 
'long' such cycle is found then it would be a wise heuristic to 
report this to some competent authority - such as a human user - as a 
potential error, or at least flag it as an unusual condition. 
However, there are also examples where 'small' subclass cycles in 
fact have correct conclusions of identity and are genuinely useful. 
The RDF position on this is therefore that cycles are allowed, and 
that any checking or flagging of 'long' cycles should be done by code 
external to RDF, e.g. ontology editing and composing or testing tools.
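
As a rough illustration of the kind of external check meant here, the following Python sketch groups classes that lie on a common rdfs:subClassOf cycle and reports only the groups above a size limit. The function names and the limit of 3 are assumptions made for this example; they are not part of RDF, RDFS or any existing tool, and the reachability computation is deliberately naive rather than efficient.

```python
from collections import defaultdict


def subclass_cycles(pairs):
    """Group classes that lie on a common rdfs:subClassOf cycle.

    `pairs` is an iterable of (subclass, superclass) names. Returns a list
    of sets; each set holds two or more classes that are mutually
    reachable and therefore have the same extension.
    """
    graph = defaultdict(set)
    nodes = set()
    for sub, sup in pairs:
        graph[sub].add(sup)
        nodes.update((sub, sup))

    def reachable(start):
        # All classes reachable from `start` by one or more subClassOf steps.
        seen, stack = set(), [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {n: reachable(n) for n in nodes}
    groups, done = [], set()
    for n in sorted(nodes):
        if n in done:
            continue
        # Classes that n reaches and that reach n back share a cycle with n.
        group = {m for m in reach[n] if n in reach[m]}
        if len(group) > 1:
            groups.append(group)
        done |= group | {n}
    return groups


def flag_long_cycles(pairs, limit=3):
    """Return only the groups larger than `limit` (a hypothetical threshold)."""
    return [g for g in subclass_cycles(pairs) if len(g) > limit]


# The six-class cycle quoted earlier in this message.
example = [
    ("ns2:Class2", "ns1:Class1"),
    ("ns3:Class3", "ns2:Class2"),
    ("ns4:Class4", "ns3:Class3"),
    ("ns4:Class5", "ns4:Class4"),
    ("ns6:Class6", "ns4:Class5"),
    ("ns1:Class1", "ns6:Class6"),
]
for group in flag_long_cycles(example):
    print("unusually long subclass cycle:", sorted(group))
```

A two-class cycle passes this check silently, while the six-class example is reported; in both cases the cycle itself remains legal RDF, and the check only brings it to someone's attention.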

The matter was discussed during the design of DAML (on which OWL is 
based, and which does have an explicit identity property) and it was 
decided that allowing class cycles was in fact useful, while 
acknowledging the risks of the kind of error you describe. The 
DAML/OWL development effort therefore formally requested a change to 
RDF so as to allow such cycles, and this has been adopted. Since this 
was a deliberate decision taken, after extended discussion, at the 
request of another working group, we are unlikely to change it now 
unless some very pressing new observations emerge.
----

Further comments below. Note, all these are my personal comments, not 
a formal WG response.

>The question is can we figure out if it is a misuse and therefore an
>error or is it done on purpose?

There do not seem to be any simple ways to do that reliably, in all cases.

>The main aim for the property
>rdfs:subClassOf is to build up a hierarchy. This property is and will be
>used for it.

Yes, but one must always consider the case of RDF from different 
sources being combined, and the possible interactions which can arise 
then. I agree with your implicit point here that a cycle should not 
appear in a designed subclass hierarchy (except possibly right at the 
top); and in fact one can use cycle-detection, I am told, as a 
debugging tool in large hierarchies. But RDF must be useable in the 
wild, so to speak, where information from many different sources is 
being combined; and then we can have no guarantees that any merged 
graph is globally conformant to any such conditions.
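
To make the merging point concrete, here is a small Python sketch in which two sources are each acyclic on their own, yet their union contains a subclass cycle. The class names (a:Car, b:Automobile) and the helper has_subclass_cycle are invented for this illustration, and merging is modelled simply as set union of ground triples.

```python
def has_subclass_cycle(pairs):
    """Return True if the (subclass, superclass) pairs contain a cycle."""
    graph = {}
    for sub, sup in pairs:
        if sub != sup:  # ignore trivial reflexive statements
            graph.setdefault(sub, set()).add(sup)

    state = {}  # class -> "open" while on the current path, "done" afterwards

    def visit(node):
        if state.get(node) == "open":
            return True  # we came back to a class on the current path
        if state.get(node) == "done":
            return False
        state[node] = "open"
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        state[node] = "done"
        return found

    return any(visit(n) for n in list(graph))


# Two hypothetical sources, each a perfectly ordinary hierarchy fragment.
source_a = {("a:Car", "b:Automobile")}
source_b = {("b:Automobile", "a:Car")}

print(has_subclass_cycle(source_a))             # False
print(has_subclass_cycle(source_b))             # False
print(has_subclass_cycle(source_a | source_b))  # True
```

Neither source contains a cycle to report to its author; it only appears once the two are combined, and here the resulting conclusion (the two classes have the same extension) is plausibly the intended one.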

>In the beginning it was not allowed to build cycles. When using
>owl:sameClassAs explicitly, we know that the author of the RDF Schema had
>in mind to state that at least two classes should be equal. The probability
>that the given statements are correct is higher than when just using the
>rdfs:subClassOf property. This means when encountering a cycle in the class
>definitions (including inferred rdfs:subClassOf definitions), we need to
>find the owl:sameClassAs property at least once inside the cycle.

I am not sure what you mean by 'need'.  Speaking from the RDF design 
standpoint, the question we must ask is: what if an RDF engine 
*finds* a subclass cycle, particularly one which arises from 
combining information from two or more sources? What does it mean? I 
sincerely hope that such things are rare, myself, but we cannot 
simply deprecate them or require them to be flagged as errors, as 
they can arise in actual use and may not have any identifiable source 
to report the error condition to.

>It
>would therefore be worth introducing a similar property to the RDF Schema
>vocabulary.

In view of the possibility of using OWL terminology in an RDF graph, 
there seems to be little point in introducing an RDFS term which is 
identical in meaning to an OWL term.

>But note some drawbacks will remain:
>1. There is no mechanism to forbid such kind of inferences, and
>therefore to exclude invalid class equivalences, if we rely
>purely on automated RDF/S processing of numerous schemas and
>descriptions on the Web.

Quite.

>
>2. Inferring class equivalence using the above mechanism (using 
>cycles) does not scale
>for numerous and voluminous schemas since an RDF/S processor is
>forced to fetch and analyze all the involved class hierarchies
>each time a new subsumption relationship is added.
>
>3. Efficient access methods to class subsumption graphs with cycles
>can hardly be supported.
>
>Another way of protecting from misuse would be the introduction of a
>property like owl:sameAs, not related to rdfs:subClassOf. This way cycles in
>rdfs:subClassOf will not be inferred. The question is what happens if two
>classes inside one class hierarchy are related by rdfs:subClassOf and then
>used with owl:sameAs? In this case I think the cycle would make sense. But
>will this really happen? This could be restricted, e.g.,
>
>             ns2:Class2  rdfs:subClassOf  ns1:Class1
>
>             ns2:Class2  owl:sameAs  ns1:Class1
>
>would not be valid.

So subClassOf would be interpreted as *proper* subclass of? This 
alternative seems somewhat unworkable in practice, since it would 
require users to be extremely careful about the subClass/sameClass 
distinction, and would make it impossible to know that A was a 
subclass of B, and later discover that A was identical to B, an 
inference path that seems to have some utility.  And bear in mind 
that RDF has no way to express disjunction explicitly, and that 
giving it one would drastically change its computational properties.

>
>My personal suggestion: I would prefer to have a property like
>owl:sameClassAs that would be mandatory to exist inside valid
>rdfs:subClassOf cycles. But the noted drawbacks 1-3 will remain. I think it
>is worth thinking about and exploring further ways.

Since this issue has been explored by two working groups already, I 
do not think that further exploration is appropriate at this stage.

Pat

>
>Karsten
>_________________________________________________________________________________
>
>FORTH COMMENTS on Class Cycles and Pat Hayes' comments on it.
>
>>   Class Cycles
>>
>>   Finally, contrary to the new RDFS specification and the RDF Semantics,
>>   the RQL formal model forbids the existence of cycles in the
>>   subsumption hierarchies. Cycles are also allowed in DAML+OIL and
>>   OWL. According to these specifications if a resource is declared to be
>>   an instance of one of the classes in a subsumption cycle, then it will also
>>   be an instance of all the other classes of the cycle. In other words,
>>   all classes participating in the cycle will have the same
>>   extent.
>
>That is correct.
>
>>Thus, cycles in a subsumption hierarchy are mainly used to
>>   provide different names for the same (meta)class or property.
>
>I think this is slightly misleading. The central point is less that
>cycles are *used* to provide equivalent names - that would indeed be
>rather a silly way to use them deliberately - but rather more that
>when putting together pieces of information from divergent sources,
>one might *discover* that two different hierarchies have the
>conclusion, when put together, that some classes are in fact all the
>same class.
>
>>The
>>   rationale behind this modeling choice is the inference of class
>>   equivalence. However, the introduction of cycles may considerably
>>   affect the semantics of already created RDF/S schemas and resource
>>   descriptions, especially when the subclass declarations are provided
>>   in many, different namespaces.
>
>I am not sure what you mean. The point is not that cycles will be
>introduced, but that they will be discovered. Once discovered, one
>has two options: to consider this an error, or to consider it a
>discovery.
>
>>Once more, declaring versus inferring
>>   class equivalences is preferable for developers mastering the
>>   semantics of their applications.
>
>Is your point that it should be impossible to infer class equivalences?
>
>Thanks for your help in clarifying your points.
>
>Pat Hayes
>
>___________________________________
>Karsten Tolle


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam

Received on Friday, 21 February 2003 10:32:36 UTC