[OEP] Classes as Values - A detailed review from Uschold, Michael F on 2005-03-02 (public-swbp-wg@w3.org from March 2005)

From: Uschold, Michael F <michael.f.uschold@boeing.com>
Date: Wed, 2 Mar 2005 15:25:30 -0800
To: <public-swbp-wg@w3.org>, <noy@SMI.Stanford.EDU>
Message-ID: <823043AB1B52784D97754D186877B6CF0583C2FD@xch-nw-12.nw.nos.boeing.com>
GENERAL

This is an excellent note. Most of my comments are geared toward
fine-tuning things like formatting, grammar, terminology, and
word-smithing to improve clarity and readability. There are also some
suggestions on re-structuring certain things.

If you think this translates to a short review, you are in for a shock
:-) .

Same comment as to Alan Rector on the specified values note:
Bite the bullet and where it is reasonably clear, call the
considerations pros and cons.
Put the pros first and the cons last, and order the 5 approaches so that
[as much as possible] the cons for one approach motivate and naturally
lead to pros of the next one.

Printing anomaly:  when my copy got printed, the second figure (which
should be NUMBERED) was split across two pages. Other figures also got
split across pages. 

The font for the heading of the APPROACH sections is less prominent
than that for the next-level subheading. Boldface dominates, even though
the font is smaller. Ideally, all the notes would use the same
heading/subheading font conventions. Also, I prefer numbered headings:
they make it easier to reference portions of the document, especially
when there are no page numbers.

It would also help a lot to use numbered lists instead of just bullets.
It may not matter so much for end users, but it would make reviewing the
note easier [very minor point].

The considerations for each approach should be identified up front as
evaluation criteria, and then each considered for each approach. The
criteria are something like:
	*	compatibility: what languages is it compatible with
	*	understandability: improved by succinctness and
intuitiveness, diminished by messiness
	*	degree of semantic 'tidiness' or coherence, e.g. is it a
semantic hack or workaround, or is it 'clean'? [this might be a minefield
worth avoiding, focusing instead on the practical import of the
particular choice, aside from whether it might be widely viewed as a
hack.] Particular objective things one can say might be:
			o	semantic overloading (e.g. class whose
instances are lions, vs. whose instances are subjects)
			o	non-standard semantic interpretations
which are likely to cause confusion and [therefore] hinder
interoperability
	*	maintainability - goes down with workarounds and
semantic 'hacks'
	*	ability to support interoperability: goes down when
there are semantic hacks.
	*	ability to support a particular inference (whose value
must be made clear)
			o	e.g. ability to infer that any book
whose subject is 'Lion' is also a book whose subject is 'Animal'.  In
the case of MusicCDs one would wish to infer that a CD annotated by the
class HardRock would also be annotated (by inference) by the class Rock.

	*	Nature of the [formally represented] relationship between
the class hierarchy and the [parallel] subject hierarchy. Relate this to
the ability to perform one or more desired inferences.
	*	I'm struggling to generalize the 4th consideration under
approach 1. It is too focused on the particular example, referring to
"dc:subject". It needs to be augmented or replaced by something more
general, so the reader can apply the information in the note to a
different context. Oh wait, maybe this is it:
			o	ability to restrict the range of values
for properties whose values are naturally represented as classes (e.g.
dc:subject).  What would another property be, say in another example
that does not use subject?

There may be more, I did not do a thorough check. 

For each approach, the considerations and the summary should explicitly
address most/all of these requirements/criteria. I have given some
examples of how this would work for the summary section for each
approach.  I have attempted to make these summary sections as complete
as possible, but I will have missed some things.  

It would be good to have a table listing the approaches and the criteria
summarizing how each approach does according to the various criteria.

Going this route will entail re-structuring and re-writing much of the
text in the considerations sections.  Most or all of the points being
made will remain. I believe it will make the note significantly more
clear and useful to readers.

Some of these criteria may not be relevant to some of the approaches.
They might be explicitly listed as not mattering (agnosticity).

I'm surprised that in the initial discussions, you never mention Range
Constraints. After all, the values that a property can take are drawn
from the range of the property. An interesting and important fact is
that the range for a property that has classes as values is a meta-class
(a class whose instances are themselves classes).
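
As a rough illustration of that last point (a sketch only, not taken
from the note): in OWL Full the range constraint could be stated along
the following lines. The meta-class name :AnimalClass and the
example.org namespace are hypothetical, and putting a global range on
dc:subject itself is done here purely for brevity.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .

# Hypothetical meta-class: its instances are themselves classes.
:AnimalClass
      a       owl:Class ;
      rdfs:subClassOf owl:Class .

# Animal and Lion are classes and, at the same time, instances of the
# meta-class.
:Animal
      a       :AnimalClass .
:Lion
      a       :AnimalClass ;
      rdfs:subClassOf :Animal .

# The range of the property is the meta-class, so its values are classes.
dc:subject
      rdfs:range :AnimalClass .
```

Because classes are treated as individuals here, a fragment like this
falls outside OWL DL and OWL Lite.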

Before going into each approach, perhaps give a short overview
describing some of the common themes used in the approaches. Like
creating a hierarchy that is a parallel to the hierarchy of classes. It
is helpful for the reader to see the similarities and differences among
the approaches. It is also helpful for them to see a big picture, that
all the specific approaches fit into.
A good example of this kind of text is the introduction to approach 4:
"We can approximate the interpretation that we used in the previous
approaches by using ..."

An even better example would be to introduce approach 5 as follows:
"This approach is almost identical to approach 1. The only difference is
that the property is an annotationProperty, rather than an
objectProperty. This difference is very important because..."

Also, introduce each approach by saying what disadvantage(s) it is
specifically designed to overcome, or which criteria it is designed to
meet. E.g. approach 4 seems to be designed to support a particular
inference: "A DL classifier will be able to classify
LionsLifeInThePrideBook as an instance of the class BookAboutLions". You
could introduce approach 4 by saying something along those lines.

Also, it would be good for the figures to better reflect the
similarities and differences between the different approaches. For
example, approaches 1 and 5 are virtually identical. This should be
evident in the figures as follows: if you had two powerpoint slides and
showed one right after the other, the only thing the viewer should see
that is different is:
	1.	the addition of the annotationProperty class with links
from the dc:subject arrows
	2.	the arrow for dc:subject turned green
Everything else should be exactly aligned.


It would also be good to have a summary table with all the figures 'side
by side' for easy comparison. Make sure to put the ones that are most
similar to each other next to each other.


State clearly what the key factors are that distinguish the different
approaches, from a REPRESENTATIONAL view (i.e. what the user has to do
with OWL code).
	*	the heart of the matter is: what is the exact thing that
is the value of the dc:subject property and how is it related to the
corresponding class?
			o	1:  the actual class, e.g. Lion
			the relationship of this value to the class Lion
is identity (it IS the class)
			o	2:  an instance (called LionSubject) of
the class Lion, denoting the subject of Lions.
			The relationship of this value to the class
Lion is: rdf:type (i.e. it is an instance)
			o	3:  an instance (called LionSubject) of
the class Subject, denoting the subject of Lions.
			LionSubject is related to the class Lion via an
rdfs:seeAlso link.
			o	4: an [implicit] unidentified instance
of the class Lion.
			The relationship of this [nonexistent implicit]
value to the class Lion is rdf:type
			o	5: the actual class, e.g. Lion
			the relationship of this value to the class Lion
is identity (it IS the class)
			NB: this is identical to approach 1. The
difference is that the property is an annotation property.
	*	is there an explicit parallel hierarchy (akin to the
question of whether there is an explicit property that represents the
relationship between, say, the subject Lion and the subject AfricanLion)?
	*	is a literal class used as a value, or some substitute
for a class?
	*	others?
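
The five cases listed above can be put side by side in N3 roughly as
follows. This is a sketch under assumptions, not code from the note: the
example.org namespaces, the book names and the numeric suffixes on
LionSubject are hypothetical, and the fragments are alternatives that
are not meant to be loaded together.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix zoo:  <http://example.org/animals#> .   # hypothetical animal ontology
@prefix :     <http://example.org/books#> .     # hypothetical book data

# 1: the value is the class itself (OWL Full / RDF Schema).
:Book1  dc:subject zoo:Lion .

# 2: the value is an individual that is an instance of the class Lion.
:LionSubject2  a  zoo:Lion .
:Book2  dc:subject :LionSubject2 .

# 3: the value is an instance of Subject, linked to the class Lion
#    only through rdfs:seeAlso.
:LionSubject3  a  :Subject ;
      rdfs:seeAlso zoo:Lion .
:Book3  dc:subject :LionSubject3 .

# 4: no value is named at all; the book is typed by an anonymous
#    restriction class ("has some Lion as a subject").
:Book4  a  [ a       owl:Restriction ;
             owl:onProperty dc:subject ;
             owl:someValuesFrom zoo:Lion ] .

# 5: same triple as in 1, but dc:subject is declared an annotation property.
dc:subject  a  owl:AnnotationProperty .
:Book5  dc:subject zoo:Lion .
```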

This relates to but is different from a user functionality pros/cons
view which is the main emphasis of the note, now.


SPECIFIC section by section comments 

STATUS: 'compaible' is misspelled.

ABSTRACT: give a specific example so reader can immediately relate to
the issue. 'using classes as property values' is hard to relate to, it
is very abstract.

I suggest something like this to replace the first part of the abstract:


It is frequently convenient to use a class (such as 'Lions') as a value
for a property (e.g. topic_of_book) when building an ontology. While
this presents no difficulties in OWL-Full and RDF-Schema, most
properties cannot have classes as their values in OWL-Lite and OWL-DL.
This document ... [same as now]
---

Be clear about whether this note is going to give advice about using
OWL-Lite, or only OWL-DL.

USE CASE EXAMPLE: 

In: "is a particular species or class of animal (or animals)" it is
unnecessary to say "(or animals)".
---

capitalize 'African'.
---

stray quote in: "behavior of the" African lions"
---

Remove the word "However" in "However, we also want to retrieve"
---

Reword: "Furthermore, we have a class hierarchy representing different
animal species and would like to reuse it as these subjects." to be

"Furthermore, we wish to use as our subjects various species from an
existing class hierarchy of different animal species."
---

"We discuss a number of approaches for representing this pattern in OWL
DL and their implications."
What about OWL-Lite?
---

It is not immediately clear what is being said in the following
paragraph. I'm having trouble parsing the key sentence (with "own" in
it).
===
"One goal of the web publisher is to enable maximum reuse of published
information. It will be common on the Semantic Web to import and reuse
other published ontologies. In doing so, it is important for web
developers to preserve the original semantics of imported resources.
Therefore, an important consideration in choosing a representation
pattern in this case is the following: If the pattern requires a
different interpretation of classes to be used as values, does the
designer "own" the definitions of these classes (in this case, the
hierarchy of animals) to change them according to the new
interpretation? Are others already using this hierarchy of animals in
their applications and will this change affect those applications?"
===

I'm going to have a go rewording it, reflecting my best guess.

===
In the representation patterns that we present below, it is important to
be aware of what, if any, new interpretation of existing classes is
required for them to be used as values of properties. Ideally, any
reuse of existing ontologies on the Semantic Web (in our case, a
hierarchy of animal species) should preserve the original intended
semantics in the new context. If the semantics change, this could
adversely impact other applications already using the ontology, or make
interoperation with existing applications error-prone (since there will
be two different interpretations of the same ontology). More generally,
caution must be taken whenever non-standard interpretations of commonly
used terms or representational patterns are being adopted.
===

OTHER USE CASE SCENARIOS
This is very important, I suggest making this more prominent somehow, as
it is, it could easily be missed, especially since approach 2 refers to
a hierarchy of subjects anyway (see also comments under Approach 2
below).

The considerations are too focused on the particular example. The note
is supposed to be about classes as values in general, not focused on
subjects or lions. 

termss -> terms

NOTATIONS:
What is an 'annotation property'? You mention it as if the reader
already knows.
You don't say anywhere that the code examples will be in N3. Say so, and
point to a URL for an N3 tutorial.

APPROACH 1: 
As you introduce this approach, it seems to me it can't work. You
already said you can NOT use classes directly as values. So I'm
confused. I expect to be enlightened, but I would suggest that the
opening wording be changed to avoid this confusion.

Ah, I see, you said the note was about how to do classes as values in
OWL-DL, so I expected all examples to be in OWL-DL. It was quite a
surprise to learn you actually are using OWL-Full or RDF-Schema.
Mention that up front. OR: don't say in the abstract and introduction
that this note is about how to do this in OWL-DL, but rather say it is
about how to do it in all variants of OWL (because it is!)

It is slightly jarring that you name the class of all books about
animals as singular 'BookAboutAnimals' - although I see this is the
consistent naming convention. Might be worth mentioning this in a
footnote? 
[very minor point]

The definition is a good example of why it would be useful to have a
distinction between properSubclassOf and subclassOf. Then you don't need
the union clause. I'm not sure about my OWL. If there is such a
distinction, then this definition could be shortened by only using the
subclassOf which includes the class itself (Animals).

I'm a strong advocate of fairly literal, but also readable English
translations of all/most N3 expressions in these documents. For example,
the BookAboutAnimals definition would read:

The class BookAboutAnimals is an OWL class. It is a subclass of Book. It
is also a subclass of the class of all things whose subject property has
a value that is either an animal, or is a member of the class of all
things that are subclasses of Animal.

If anyone questions the need for this, let me just say that it took me
3-4 minutes to puzzle out this English text from the N3. As a way to
teach people about OWL, having English definitions of all code
alongside, is going to be very helpful indeed. It helps me a lot, and I
already know a lot about OWL (though I am not so great at reading any of
the raw syntaxes).
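
For reference, N3 along the following lines would correspond to the
English reading of BookAboutAnimals given above. It is a sketch
reconstructed from that English text, so the note's actual definition
may differ in detail; the example.org namespace is hypothetical. Using
rdfs:subClassOf inside a restriction places it in OWL Full.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .

:BookAboutAnimals
      a       owl:Class ;
      rdfs:subClassOf :Book ;
      rdfs:subClassOf
              [ a       owl:Class ;
                owl:unionOf (
                    # things whose subject is the class Animal itself
                    [ a       owl:Restriction ;
                      owl:onProperty dc:subject ;
                      owl:hasValue :Animal ]
                    # things whose subject is some subclass of Animal
                    [ a       owl:Restriction ;
                      owl:onProperty dc:subject ;
                      owl:someValuesFrom
                              [ a       owl:Restriction ;
                                owl:onProperty rdfs:subClassOf ;
                                owl:hasValue :Animal ] ]
                ) ] .
```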

The summary of approach 1 is a bit hard to read. I suggest the following
variant:

===
This approach will be suitable to the extent that the following applies
to you:
1. simplicity is important 
2. being in OWL DL or OWL-Lite is not important 
3. there is no requirement to limit the range of the dc:subject
<http://purl.org/dc/elements/1.1/subject>  values or you are amenable to
using classes as subjects to implement this restriction.
===

You don't say whether this is OWL-DL or OWL-Lite.

APPROACH 2:

Above you said that "This note should not be interpreted as a general
discussion of how to represent subject hierarchies or terminologies on
the semantic web." I strongly agree with this. However, this is
undermined by the name of this approach, which refers to a hierarchy of
subjects. 
QUESTION: is there any use case that has nothing to do with subjects? 
If so then you can perhaps name this approach using the phrase 'creating
a parallel hierarchy' and not mention subjects explicitly. For the
example, it is indeed creating a hierarchy of subjects, what do you do
in a more general case, e.g. music CDs.

On the other hand, you say that using classes as values arises in
general when you want to use a hierarchy of classes to ANNOTATE other
classes or individuals. This may be true. Can anyone think of any use of
classes as values that does NOT involve using the classes for
annotation?  If not, then maybe we are tied to subjects all the time
after all. For, one can frequently/usually/always interpret any
annotation as a topic or subject.

Maybe you could avoid 'subject' and say that this approach is about
creating a parallel hierarchy of annotation individuals?

Maybe the ONLY use of classes as values is when the classes are used as
annotations? If so, then perhaps this note is really about 'classes as
annotations', from a user perspective, and from an OWL perspective, it
is about classes as property values.

This goes to the heart of the question of under what circumstances one
might use classes as values. How much do we really know about this? It
can be difficult to abstract from examples to a general case.

The figure for this approach should actually show a 'parallel
hierarchy'. It does not. I'm not sure there IS a parallel hierarchy. If
so, what is the relation that defines the hierarchy? i.e. what is the
relationship between LionSubject and AnimalSubject? Is that relation
explicit?  The common sense meaning of the relationship is indeed, 'is a
kind of'. For, anything that is about Lions is also about Animals. BUT,
here the Subjects are not classes.

OOPS: I just realized that you never said there was a parallel
hierarchy. You said there was a parallel set. So maybe the above
comments do not apply?

Minor point: you say 
"The resulting ontology is compatible with RDF Schema and OWL Lite (and
hence OWL DL)" I had to think for a moment about the hence clause, it
seemed backwards at first, but then I saw it was correct. For this
audience, it might be better to keep it simple. It is also compatible
with OWL-Full, which you do not mention. I suggest rewording to: 

"The resulting ontology is compatible with RDF Schema and all variants
of OWL (Full, DL, and Lite)."

It is not germane to this discussion that:
IF it is compatible with OWL-Lite, 
THEN it is compatible with OWL-DL.

If you wish to give the reader this information, make it a separate
comment or footnote.  It would really belong in a discussion with the
definition of 'compatible' which is a blue term targeted for a glossary
definition.

Summary of approach 2. I suggest re-structuring this as follows:
This approach will be suitable to the extent that the following applies
to you:
	*	staying in OWL DL is important. 
	*	there is no need to infer that books about lions are
also about animals (perhaps reworded/augmented to be more general than
the lions example)
	*	you are not concerned about interoperability with others
who may be using the class hierarchy in a way that is aligned with the
original meaning, rather than the different interpretation of this
approach.
	*	you don't need to, or are prepared to meet the cost of
maintaining consistency between the set of classes representing subjects
and the set of corresponding individuals. 

APPROACH 3
Overall, this approach is very clearly described.

This approach assumes that there is a subject hierarchy. Is that true?
Is there a more general view that is not subject-specific? Can it be
generalized? Or do we bite the bullet and say this note IS about
representing subject hierarchies.

Remove last word in: "We can create a single class Subject and make all
the subjects to be individuals that are instances of this class Subject"

Might: "using individuals as surrogates for classes" be a better name
for this approach? The current one may be too specific?

I would leave the SKOS comment as an aside at the end of this approach,
it is a worthwhile point, but not germane to understanding the basic
idea, in fact it gets in the way.

This approach also entails creating a parallel hierarchy.  I think it
would be good to show the parallel hierarchy in the figure, in a way
that really looks like a parallel hierarchy (e.g. the layout should be
more or less identical, so it is immediately obvious). This will involve
a major re-structuring of the diagram.

Considerations: ditto  the parenthetical remark: (hence ...)

If the DL reasoner cannot infer that "a book that has LionSubject as the
value for dc:subject is also about Animals" then what is the point of
mentioning that "Most DL reasoners will be able to infer transitive
relations between subjects". Is this useful by itself?  If so, can it be
factored into the requirements/criteria for evaluating the different
approaches?
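
To make the question concrete, here is a minimal N3 sketch of what
approach 3 appears to involve. It assumes that parentSubject is declared
transitive; the example.org namespace and the individual names are
illustrative rather than taken from the note.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .

:Subject  a  owl:Class .

# Assumed here to be transitive, so a reasoner can chain parent links.
:parentSubject
      a       owl:ObjectProperty , owl:TransitiveProperty ;
      rdfs:domain :Subject ;
      rdfs:range  :Subject .

:AnimalSubject
      a       :Subject ;
      rdfs:seeAlso :Animal .
:LionSubject
      a       :Subject ;
      :parentSubject :AnimalSubject ;
      rdfs:seeAlso :Lion .
:AfricanLionSubject
      a       :Subject ;
      :parentSubject :LionSubject ;
      rdfs:seeAlso :AfricanLion .

:AfricanLionBook
      a       :Book ;
      dc:subject :AfricanLionSubject .

# Inferable via transitivity:
#   :AfricanLionSubject :parentSubject :AnimalSubject .
# Not inferable from the above alone:
#   :AfricanLionBook dc:subject :AnimalSubject .
```

The reasoner can walk up the subject hierarchy, but nothing in the
fragment connects dc:subject to parentSubject, so the book itself is not
inferred to be about animals.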

What is the import of this: "The resulting hierarchy of subjects is not
related to or dependent on the class hierarchy representing the same
topics (in this case, animals), except through an annotation property
rdfs:seeAlso."  Is it good? helpful? Why? Relate it to one of the
evaluation criteria.

What is the import of this consideration:
"This approach explicitly separates the subject terminology from the
corresponding ontology. Many consider this separation a good modeling
practice: the semantics of a subject Lion can be different from the
semantics of the class of lions. Having subjects in a separate
hierarchy, would allow us to define for example that the subject Africa
is a parent subject of the subject AfricanLion."
Relate to one or more evaluation criteria, does it relate to supporting
a desirable inference? does it impact on maintenance?

"correpsonding" and "separartion" misspelled.

The following two considerations seem similar, merge into one? If these
are all re-structured anyway, this may be moot.
	*	Some may consider a approach having two parallel
hierarchies of essentially the same data to be too complicated and
difficult to maintain for the simple task at hand 
	*	The separartion of the subject terminology from the
correpsonding ontology incurs a serious maintenance penalty: We need to
maintain a set of instances for all subjects in addition to the
hierarchy of subjects. In many applications, we may also need to ensure
that the two sets - classes representing subjects and corresponding
individuals and values for the parentSubject property - are consistent
with each other. However, developers can instrument tools that would
maintain this consistency automatically. 

Here is a suggested expansion and re-structuring of the summary (it was
too brief):

This approach will be suitable to the extent that the following applies
to you:
	*	staying within OWL DL is important. 
	*	there is a requirement to limit the range of the
dc:subject <http://purl.org/dc/elements/1.1/subject>  values in a
natural convenient way.
	*	there is a requirement to infer transitive relationships
between subjects.
	*	there is no need to infer that books about lions are
also about animals (perhaps reworded/augmented to be more general than
the lions example)
	*	there is no need to have a formal linkage between the
class hierarchy and the parallel subject hierarchy (perhaps
reworded/augmented to be more general than the lions/subjects example)
	*	there is a requirement to minimize semantic overloading
to support interoperability with persons using the class hierarchy with
the original intended interpretation.
	*	there is an adequate way to manage the penalty of having
two parallel "hierarchies." 

APPROACH 4
I found that this example was hard to grasp. Focusing on "unspecified
members of a class" seems very obscure, and must be missing the main
point, which is ??? - I'm not sure.  The main thing seems to be that the
actual value that the property dc:subject has is an [implicit]
unidentified instance of the class Lion and that the relationship of
this [nonexistent implicit] value to the class Lion is rdf:type.  This
is IMHO, rather obscure and many are likely to have little idea what you
are talking about. The main problem is that the instance DOES NOT EXIST,
so it needs to be explained differently. You might at the end mention
that this representation approach corresponds to there being an implicit
instance, but otherwise it is likely to be far too confusing.

Specifically, there is nothing corresponding to the following (from
approach 2)
:AfricanLionBook
      a       :BookAboutAnimals ;
      dc:subject :AfricanLionSubject .
 
If there was, it would be:
:AfricanLionBook
      a       :BookAboutAnimals ;
      dc:subject :UnidentifiedAfricanLion .

If you stuck with the current name for the approach, it would be better
to change it to be something like: 
"using IMPLICIT members of a class as values for the property" since
that is more accurate. But I suggest throwing it out entirely.

Here is a possible alternative way to describe this approach which
focuses on what the USER HAS TO DO: create [possibly] anonymous classes
such as BookAboutLions, and say that the books are instances of such
classes.

===
This approach is designed to make it easy to leverage a DL reasoner to
infer, for example, that a book whose subject is Lion also has subject
Animal. In this approach, we create a parallel hierarchy of types of
books consisting of classes such as: BookAboutAnimals, BookAboutLions,
BookAboutAfricanLions. We then say that various instances of Books are
explicit members of one or more of these subject classes.

The key to making this work is how we define the classes in the parallel
hierarchy. For example, we can define the class BookAboutLions as
follows:  [continue as per current draft].

...
...

[before the Alternatively clause, put this text in:]
By saying that LionsLifeInThePrideBook is an instance of BookAboutLions
we are saying that it is a member of a class, all of whose members have
as their subject at least one instance of the class Lion. [this text
might need fixing so it is strictly and literally true, I might have a
misreading of someValuesFrom, all the more reason that these examples
need English every time.]  In OWL, it is not necessary to create any
explicit instances of these classes.  In the figure, we list them as if
they were explicit, and use dotted lines to denote that they may not
actually exist.

Strictly speaking, this approach does not answer the question: what is
the exact thing that is the value of the dc:subject property, because
there are no explicit instances of the Lions. The best we can do is
answer the question this way: the exact thing that is the value of the
dc:subject property is an implicit, unidentified instance of the
corresponding class (e.g. Lion).

Put another way, we [next, I just pasted in the original opening text of
this approach] can approximate the interpretation that we used in the
previous approaches by using unspecified members of a class rather than
the class itself as property values. We define the class
BookAboutAnimals as a class of books where the subject is some animal.
Correspondingly, a BookAboutLions class will be a class of books where a
subject is some (unidentified) lion or lions.

Variant: Now put in the alternatively clause, which just creates an
instance with an anonymous class. This way avoids the need to create
an explicit parallel hierarchy.
===

This may need further fiddling, but I think it will be much easier to
understand this way.

Suggested re-doing of summary for approach 4:

This approach will be suitable to the extent that the following applies
to you:
	*	staying in OWL DL is important. 
	*	there is a need to infer that books about lions are also
about animals (perhaps reworded/augmented to be more general than the
lions example)
	*	you are not concerned about interoperability with others
who may be using a more standard interpretation of subject (e.g. about
lions, rather than about a particular unidentified lion).
	*	you are prepared to expend the effort in creating and
maintaining a [possibly implicit] parallel hierarchy of classes of the
sort: BookAboutClass.  

This approach is agnostic to the following issues:
	*	how to limit the range of the dc:subject
<http://purl.org/dc/elements/1.1/subject>  values 
	*	others?


There are really two variants here, perhaps make that more explicit,
figures for each? N3 and RDF/XML representations for each?

The figure does not have the class(es): BookAbout(African)Lions.
I think it should, to show the parallel hierarchy.

Also, if the class(es) BookAbout(African)Lions are both defined as
restrictions as per your example, then a DL classifier can infer that
BookAboutLions is a subclass of BookAboutAnimals [I think], which is why
it can also classify LionsLifeInThePrideBook as an instance of the class
BookAboutLions. You say this in a consideration, but not in the main
text. 
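
The inference described above can be sketched in N3 as follows. This is
an illustration under assumptions rather than the note's own code: the
two BookAbout classes are given necessary-and-sufficient definitions via
owl:equivalentClass, dc:subject is treated as an object property, and
the example.org namespace is hypothetical.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .

dc:subject  a  owl:ObjectProperty .

:Animal  a  owl:Class .
:Lion    a  owl:Class ;
      rdfs:subClassOf :Animal .

# Exactly the things that have some Animal as a subject.
:BookAboutAnimals
      a       owl:Class ;
      owl:equivalentClass
              [ a       owl:Restriction ;
                owl:onProperty dc:subject ;
                owl:someValuesFrom :Animal ] .

# Exactly the things that have some Lion as a subject.
:BookAboutLions
      a       owl:Class ;
      owl:equivalentClass
              [ a       owl:Restriction ;
                owl:onProperty dc:subject ;
                owl:someValuesFrom :Lion ] .

:LionsLifeInThePrideBook
      a       :BookAboutLions .

# Since every Lion is an Animal, a DL classifier can derive:
#   :BookAboutLions rdfs:subClassOf :BookAboutAnimals .
#   :LionsLifeInThePrideBook a :BookAboutAnimals .
```

If the restrictions were attached with rdfs:subClassOf only (necessary
conditions), the subclass relationship between the two BookAbout classes
would not follow.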



APPROACH 5
The relationship to approach one should be very explicit (see comment
above).

The figure should not split up the two snippets of N3.
Also, you introduced a new diagramming convention: an arc from an arrow
to a class. Align with the diagrammatic conventions Alan just created
for the specified values note.

Add the consideration: there are no non-standard semantic
interpretations of this approach. Or no semantic interpretations that
differ from the original intent of an existing ontology that is being
re-used. Well, maybe it is non-standard to view dc:subject as an
annotation property...

Suggested redoing of summary for approach 5:

This approach will be suitable to the extent that the following applies
to you:
	*	staying in OWL DL is important. 
	*	there is NO  need to infer that books about lions are
also about animals (perhaps reworded/augmented to be more general than
the lions example)
	*	you are not concerned about interoperability with other
systems/applications that may use dc:subject as an objectProperty that
is used for reasoning. 
	*	you can't afford to expend the effort in creating and
maintaining a [possibly implicit] parallel hierarchy of classes to
represent subjects.  

This approach is agnostic to the following issues:
	*	how to limit the range of the dc:subject
<http://purl.org/dc/elements/1.1/subject>  values 
	*	others?

There are no non-standard semantic interpretations used on the one hand,
but on the other hand, using dc:subject as an annotation property will
make it hard to interoperate with applications that use dc:subject as an
object property?

SUMMARY and CONCLUSIONS
Add a new section which has things like summary tables, all the figures
side by side, and punchy re-statements of the main reasons for choosing
which approach.  You might include here the bullet points from above
under the heading: "What are the key factors that distinguish the
different approaches, from a REPRESENTATIONAL view." This will help the
reader get all the approaches in their heads and how they inter-relate,
all at the same time.
Received on Wednesday, 2 March 2005 23:26:08 UTC