Re: [OEP] Classes as Values - A detailed review from Natasha Noy on 2005-03-03 (public-swbp-wg@w3.org from March 2005)

From: Natasha Noy <noy@smi.stanford.edu>
Date: Wed, 2 Mar 2005 23:45:15 -0800
To: "Uschold, Michael F" <michael.f.uschold@boeing.com>
Cc: <public-swbp-wg@w3.org>
Message-Id: <23ae5ba7aa6ee1a1388eb9f524fdad4c@smi.stanford.edu>
Mike,

Thank you so much for your excellent and insightful comments! I agree  
with most of them, as always.

> If you think this translates to a short review, you are in for a shock :-).

Indeed :) At 12 pages, it's longer than the document itself! Maybe it  
would have taken you less time to just rewrite the document :)  
I've posted a new draft at the location referred to as the Editor's  
draft on the OEP page [1].

I'll start with the one major suggestion that I have not addressed yet  
and on which I would like some feedback from the group. I'll reply to  
the rest of your comments (most of which I simply followed) after that.

> The considerations for each approach should be identified up front as
> evaluation criteria, and then each considered for each approach.

[excellent suggestions on how exactly to do this, snipped]

> It would be good to have a table listing the approaches and the criteria,
> summarizing how each approach does according to the various criteria.
>
> Going this route will entail re-structuring and re-writing much of the
> text in the considerations sections.  Most or all of the points being
> made will remain. I believe it will make the note significantly more
> clear and useful to readers.

I completely agree: it would certainly make for a much better  
document! Where were you 10 months ago? :) Seriously though, as you  
mention, following this suggestion would mean a complete rewrite of the  
prose in the document, with a real chance of new contentious points,  
another round of review, etc. If you are willing to take on that task,  
I'll be happy to help. However, I am not sure I feel motivated enough  
to do it myself, given that the document has been a Working Draft for  
more than 6 months now and has received few major comments.

I tried to address most of your other comments in the draft available  
now. My thoughts/questions on some of them below.


> Same comment as to Alan Rector on the specified values note:
> Bite the bullet and where it is reasonably clear, call the
> considerations pros and cons.
> Put the pros first and the cons last, and order the 5 approaches so that
> [as much as possible] the cons for one approach motivate and naturally
> lead to pros of the next one.

They are ordered this way, as much as possible, anyway. Do you have  
suggestions for a different ordering? I would prefer to leave them as  
“considerations”, since in many cases they are neither pros nor cons.  
The only ones I would classify as pros or cons are those that are  
already so obvious that the classification would add nothing.  
I would really like to remain non-judgmental (otherwise, this note will  
never be finished).

> Printing anomaly:  when my copy got printed, the second figure (which
> should be NUMBERED) was split across two pages. Other figures also got
> split across pages.

I have no idea how to fix this. Are there any HTML gurus who could tell  
me how to put images into an HTML document so that they don’t get split  
during printing?

I’ve added numbers to figures.

> The font for the headings of the APPROACH sections is less
> prominent than that for the next-level subheadings. Boldface dominates,
> even though the font is smaller.

I am not sure I understand what you are referring to. “Approaches” is an  
h2 heading; the subheadings are h3 or simply bold. Could you be more  
specific? Is this browser-specific?

> Ideally, all the notes would use the
> same heading/subheading font conventions.

Ideally, yes :) I basically just use h1/h2/h3, etc., and let the W3C  
stylesheet figure it out. This should be pretty standard, no?

> Also, I prefer numbered
> headings, easier to reference portions in the document, especially when
> there are no page numbers.

It’s a matter of taste, I guess: I don’t particularly like numbering  
sections that contain only 4 sentences, as most sections in this  
document do. Hence the subheadings. You can reference each of the  
approaches through its anchor, though, e.g.,
<url>#1 for approach 1. Is that sufficient?

> Also, it would be good for the figures to better reflect the
> similarities and differences between the different approaches. For
> example, approaches 1 and 5 are virtually identical. This should be
> evident in the figures as follows: if you had two PowerPoint slides and
> showed one right after the other, the only things the viewer should see
> as different are:
> 	1.	the addition of the annotationProperty class, with links from the dc:subject arrows
> 	2.	the arrow for dc:subject turned green
> Everything else should be exactly aligned.

I fixed this where I could. In some cases, trying to conform to this  
idea makes the diagrams either too strange or too cluttered. If you’d  
like to take a stab at this, please go ahead.

> It would also be good to have a summary table with all the figures 'side
> by side' for easy comparison. Make sure to put the ones that are most
> similar to each other next to each other.

I am not sure how this would look in a single table. The figures need  
to be pretty large for any text to be readable. Again, if you’d like to  
mock it up, please go ahead.

> SPECIFIC section by section comments
>
> ABSTRACT: give a specific example so the reader can immediately relate to
> the issue. 'Using classes as property values' is hard to relate to; it
> is very abstract.
>
> I suggest something like this to replace the first part of the abstract:

I've re-written the abstract (also to allay your other concern below).

> stray quote in: "behavior of the" African lions"

Doesn't look stray to me.

> It is not immediately clear what is being said in the following
> paragraph. I'm having trouble parsing the key sentence (with "own" in
> it).
> ===
> "One goal of the web publisher is to enable maximum reuse of published
> information. It will be common on the Semantic Web to import and reuse
> other published ontologies. In doing so, it is important for web
> developers to preserve the original semantics of imported resources.
> Therefore, an important consideration in choosing a representation
> pattern in this case is the following: If the pattern requires a
> different interpretation of classes to be used as values, does the
> designer "own" the definitions of these classes (in this case, the
> hierarchy of animals) to change them according to the new
> interpretation? Are others already using this hierarchy of animals in
> their applications and will this change affect those applications?"
> ===
>
> I'm going to have a go rewording it, reflecting my best guess.

Your rewording makes a slightly different point from the one I was  
trying to make. I’ve changed the paragraph (borrowing some sentences  
from your rewording); hopefully it’s better now.

> OTHER USE CASE SCENARIOS
> This is very important; I suggest making it more prominent somehow. As
> it is, it could easily be missed, especially since approach 2 refers to
> a hierarchy of subjects anyway (see also comments under Approach 2
> below).

Any suggestions on how to do this?

> It is slightly jarring that you name the class of all books about
> animals as singular 'BookAboutAnimals' - although I see this is the
> consistent naming convention. Might be worth mentioning this in a
> footnote?
> [very minor point]

I think this is rather common. I don’t think it’s worth mentioning.

> The definition is a good example of why it would be useful to have a
> distinction between properSubclassOf and subclassOf. Then you don't need
> the union clause. I'm not sure about my OWL. If there is such a
> distinction, then this definition could be shortened by only using the
> subclassOf which includes the class itself (Animals).

This separation doesn’t exist in OWL, so unfortunately we can’t  
shorten it.

> I'm a strong advocate of fairly literal, but also readable, English
> translations of all/most N3 expressions in these documents. For example,
> the BookAboutAnimals definition would read:
>
> The class BookAboutAnimals is an OWL class. It is a subclass of Book. It
> is also a subclass of the class of all things whose subject property has
> a value that is either an animal, or is a member of the class of all
> things that are subclasses of Animal.
>
> If anyone questions the need for this, let me just say that it took me
> 3-4 minutes to puzzle out this English text from the N3. As a way to
> teach people about OWL, having English definitions alongside all code
> is going to be very helpful indeed. It helps me a lot, and I
> already know a lot about OWL (though I am not so great at reading any of
> the raw syntaxes).

I agree on this particular example, and tried to add some text there.  
For the rest of the examples, I tried to give a “high-level”  
description of the OWL code, but not a literal one. I am a bit lazy,  
and I am not convinced that literal translations of all the OWL code  
are useful everywhere. Mike, do you want to take a stab at it if you  
feel this is necessary?
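For readers following along, the definition that the English reading above paraphrases might look roughly like this in N3. It is reconstructed from Mike's English, not copied from the draft, and the `:` namespace is a hypothetical placeholder. Because it uses a class as a value of dc:subject and places a restriction on rdfs:subClassOf itself, a definition of this shape is only valid in OWL Full.

```n3
@prefix :     <http://example.org/animals#> .   # hypothetical namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# BookAboutAnimals is an OWL class and a subclass of Book.
:BookAboutAnimals
    a owl:Class ;
    rdfs:subClassOf :Book ;
    # It is also a subclass of the union of two restrictions:
    rdfs:subClassOf [
        a owl:Class ;
        owl:unionOf (
            # (a) things that have the class Animal itself as a dc:subject value, or
            [ a owl:Restriction ;
              owl:onProperty dc:subject ;
              owl:hasValue :Animal ]
            # (b) things with some dc:subject value that is a subclass of Animal.
            [ a owl:Restriction ;
              owl:onProperty dc:subject ;
              owl:someValuesFrom [
                  a owl:Restriction ;
                  owl:onProperty rdfs:subClassOf ;
                  owl:hasValue :Animal ] ]
        )
    ] .
```

Branch (a) corresponds to the union clause discussed above.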

> You don't say whether this is OWL-DL or OWL-Lite.

What does "it" refer to here?

> Maybe you could avoid 'subject' and say that this approach is about
> creating a parallel hierarchy of annotation individuals?

I changed the title of the approach. I am not sure it’s better, though.

> Minor point: you say
> "The resulting ontology is compatible with RDF Schema and OWL Lite (and
> hence OWL DL)" I had to think for a moment about the hence clause, it
> seemed backwards at first, but then I saw it was correct. For this
> audience, it might be better to keep it simple. It is also compatible
> with OWL-Full, which you do not mention. I suggest rewording to:
>
> "The resulting ontology is compatible with RDF Schema and all variants
> of OWL (Full, DL, and Lite)."
>
> It is not germane to this discussion that:
> IF it is compatible with OWL-Lite,
> THEN it is compatible with OWL-DL.
>
> If you wish to give the reader this information, make it a separate
> comment or footnote.  It would really belong in a discussion with the
> definition of 'compatible' which is a blue term targeted for a glossary
> definition.

I am not sure I agree. The key point here is that, unlike the previous  
approach, it *is* compatible with OWL DL and OWL Lite.

> APPROACH 3
> Overall, this approach is very clearly described.
>
> This approach assumes that there is a subject hierarchy. Is that true?
> Is there a more general view that is not subject-specific? Can it be
> generalized? Or do we bite the bullet and say this note IS about
> representing subject hierarchies.

No, it is not about subjects; subjects are just used as an example.  
Consider genres for annotating CDs, diseases for annotating clinical  
guidelines, and the others under the “Other use cases” section.

> Remove last word in: "We can create a single class Subject and make all
> the subjects to be individuals that are instances of this class Subject"

Why?

> Might: "using individuals as surrogates for classes" be a better name
> for this approach? The current one may be too specific?

I don’t know. Is it?

> This approach also entails creating a parallel hierarchy. I think it
> would be good to show the parallel hierarchy in the figure, in a way
> that really looks like a parallel hierarchy (e.g. the layout should be
> more or less identical, so it is immediately obvious). This will involve
> a major re-structuring of the diagram.

This would be nice, but I honestly don’t see how I can do it in a  
single diagram and still get the main point across.

> If the DL reasoner cannot infer that "a book that has LionSubject as the
> value for dc:subject is also about Animals", then what is the point of
> mentioning that "Most DL reasoners will be able to infer transitive
> relations between subjects"? Is this useful by itself? If so, can it be
> factored into the requirements/criteria for evaluating the different
> approaches?

Yes, if/when the document is refactored.

> What is the import of this: "The resulting hierarchy of subjects is not
> related to or dependent on the class hierarchy representing the same
> topics (in this case, animals), except through an annotation property
> rdfs:seeAlso."  Is it good? helpful? Why? Relate it to one of the
> evaluation criteria.

It depends on the requirements for your application. As it stands, it’s  
just a fact.

> What is the import of this consideration:
> "This approach explicitly separates the subject terminology from the
> corresponding ontology. Many consider this separation a good modeling
> practice: the semantics of a subject Lion can be different from the
> semantics of the class of lions. Having subjects in a separate
> hierarchy, would allow us to define for example that the subject Africa
> is a parent subject of the subject AfricanLion."
> Relate to one or more evaluation criteria, does it relate to supporting
> a desirable inference? does it impact on maintenance?

Again, depends on your requirements and application.
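A rough N3 sketch of the subject-individual pattern discussed under Approach 3 may make the last few points concrete. The property name :parentSubject and the namespace are assumptions for illustration, not necessarily what the draft uses. Declaring the property transitive lets a DL reasoner infer that AfricanLionSubject has AnimalSubject as an ancestor, but as the sketch stands nothing lets it conclude that a book with dc:subject :LionSubject is about animals: the only link back to the class hierarchy is the rdfs:seeAlso annotation, and the subject hierarchy can have parents (such as Africa) that have no counterpart in the class hierarchy.

```n3
@prefix :     <http://example.org/animals#> .   # hypothetical namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# A single class whose instances are the subjects.
:Subject a owl:Class .

# Hypothetical transitive property that builds the subject hierarchy.
:parentSubject
    a owl:ObjectProperty , owl:TransitiveProperty ;
    rdfs:domain :Subject ;
    rdfs:range  :Subject .

dc:subject a owl:ObjectProperty .

:AnimalSubject a :Subject ;
    rdfs:seeAlso :Animal .            # annotation-only link to the class

:AfricaSubject a :Subject .

:LionSubject a :Subject ;
    :parentSubject :AnimalSubject ;
    rdfs:seeAlso :Lion .

# A subject may have parents outside the animal hierarchy,
# e.g. Africa as a parent subject of AfricanLion.
:AfricanLionSubject a :Subject ;
    :parentSubject :LionSubject , :AfricaSubject ;
    rdfs:seeAlso :AfricanLion .

:AfricanLionBook a :Book ;
    dc:subject :AfricanLionSubject .
```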


> APPROACH 4
> I found that this example was hard to grasp. Focusing on "unspecified
> members of a class" seems very obscure, and must be missing the main
> point, which is ??? - I'm not sure. The main thing seems to be that the
> actual value that the property dc:subject has is an [implicit]
> unidentified instance of the class Lion and that the relationship of
> this [nonexistent implicit] value to the class Lion is rdf:type. This
> is, IMHO, rather obscure and many are likely to have little idea what you
> are talking about. The main problem is that the instance DOES NOT EXIST,
> so it needs to be explained differently. You might at the end mention
> that this representation approach corresponds to there being an implicit
> instance, but otherwise it is likely to be far too confusing.

At first I was tempted to change the title of the approach to include  
“implicit”. But I am not sure “implicit” is the right word here.  
We don’t actually know whether the instance exists or not. All we are  
saying is that for a book about lions, at least one value of the  
subject property will be an instance of Lion. This is enough to  
classify the book, but we don’t actually say anything about whether  
this instance exists, is named, etc.

>
> Specifically, there is nothing corresponding to the following (from
> approach 2)
> :AfricanLionBook
>       a       :BookAboutAnimals ;
>       dc:subject :AfricanLionSubject .
>
> If there was, it would be:
> :AfricanLionBook
>       a       :BookAboutAnimals ;
>       dc:subject :UnidentifiedAfricanLion .

There won’t be: we describe AfricanLionBook as a book where at least  
one subject is an instance of Lion (regardless of what else we know  
about that instance).

> This approach is designed to make it easy to leverage a DL reasoner to
> infer, for example, that a book whose subject is Lion also has subject
> Animal. In this approach, we create a parallel hierarchy of types of
> books consisting of classes such as: BookAboutAnimals, BookAboutLions,
> BookAboutAfricanLions. We then say that various instances of Books are
> explicit members of one or more of these subject classes.

Actually, here we don’t need to create a full parallel hierarchy:  
technically, we create classes only for the subjects that actually  
have a book about them. Thus, if we have no books about African lions,  
we don’t need to create the class; we can always create it on demand.  
This is not quite true for the subjects themselves in the previous  
approaches.

> [before the Alternatively clause, put this text in:]
> By saying that LionsLifeInThePrideBook is an instance of BookAboutLions
> we are saying that it is a member of a class, all of whose members have
> as their subject at least one instance of the class Lion. [this text
> might need fixing so it is strictly and literally true, I might have a
> misreading of someValuesFrom, all the more reason that these examples
> need English every time.] In OWL, it is not necessary to create any
> explicit instances of these classes. In the figure, we list them as if
> they were explicit, and use dotted lines to denote that they may not
> actually exist.

Added, thanks! I haven’t followed the rest of your suggestion here,  
since I thought it just reiterated the point and might make it a bit  
more confusing.
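The someValuesFrom reading above can be sketched in N3 as follows. The class and individual names follow the discussion, while the exact axioms (in particular defining the BookAbout... classes as equivalent to an intersection) and the namespace are assumptions. Because Lion is a subclass of Animal, a DL reasoner can classify BookAboutLions under BookAboutAnimals and hence infer that LionsLifeInThePrideBook is a BookAboutAnimals, without any individual lion ever being named.

```n3
@prefix :     <http://example.org/animals#> .   # hypothetical namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

dc:subject a owl:ObjectProperty .

:Book   a owl:Class .
:Animal a owl:Class .
:Lion   a owl:Class ;
    rdfs:subClassOf :Animal .

# A book about animals: a Book with at least one dc:subject value
# that is an instance of Animal.
:BookAboutAnimals a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf ( :Book
                             [ a owl:Restriction ;
                               owl:onProperty dc:subject ;
                               owl:someValuesFrom :Animal ] )
    ] .

# Created only because we have a book about lions to classify.
:BookAboutLions a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf ( :Book
                             [ a owl:Restriction ;
                               owl:onProperty dc:subject ;
                               owl:someValuesFrom :Lion ] )
    ] .

# The lion that is the book's subject is never named.
:LionsLifeInThePrideBook a :BookAboutLions .

# Alternatively, skip the named class and use the restriction directly.
:LionsLifeInThePrideBook a :Book ,
    [ a owl:Restriction ;
      owl:onProperty dc:subject ;
      owl:someValuesFrom :Lion ] .
```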

> This approach is agnostic to the following issues:
> 	*	how to limit the range of the dc:subject
> <http://purl.org/dc/elements/1.1/subject>  values

Not really: it limits the range explicitly in each case.

> There are really two variants here, perhaps make that more explicit,
> figures for each? N3 and RDF/XML representations for each?
>
> The figure does not have the class(es): BookAbout(African)Lions.
> I think it should, to show the parallel hierarchy.

See above: you don’t necessarily have a parallel hierarchy.

> APPROACH 5
>
> Add the consideration: there are no non-standard semantic
> interpretations of this approach. Or no semantic interpretations that
> differ from the original intent of an existing ontology that is being
> re-used. Well, maybe it is non-standard to view dc:subject as an
> annotation property...

Indeed, that’s unclear. I am not sure we want to say that this is  
definitely standard.
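For comparison with Approach 1, here is a minimal N3 sketch of the annotation-property variant, using the same hypothetical namespace. The only change is declaring dc:subject as an owl:AnnotationProperty, which is what lets a class such as :Lion appear as its value while the ontology stays within OWL DL. The trade-off is that DL reasoners ignore annotation values, so nothing about the book's subject can be inferred from them.

```n3
@prefix :     <http://example.org/animals#> .   # hypothetical namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# The one addition compared with Approach 1.
dc:subject a owl:AnnotationProperty .

:Book   a owl:Class .
:Animal a owl:Class .
:Lion   a owl:Class ;
    rdfs:subClassOf :Animal .

# The class itself is the annotation value; reasoners do not use it,
# so this book is not classified as being about animals.
:LionsLifeInThePrideBook a :Book ;
    dc:subject :Lion .
```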

Again, thanks a million for reviewing it this carefully! I look forward  
to the discussion tomorrow. Too bad I won't be able to be there in person!

Natasha

[1] http://smi-web.stanford.edu/people/noy/ClassesAsValues/ClassesAsValues-2nd-WD.html
Received on Thursday, 3 March 2005 07:45:26 UTC