- From: Uschold, Michael F <michael.f.uschold@boeing.com>
- Date: Wed, 2 Mar 2005 15:25:30 -0800
- To: <public-swbp-wg@w3.org>, <noy@SMI.Stanford.EDU>
GENERAL

This is an excellent note. Most of my comments are geared at fine-tuning things like formatting, grammar, terminology, word-smithing etc. to improve clarity and readability. There are also some suggestions on re-structuring certain things. If you think this translates to a short review, you are in for a shock :-) .

Same comment as to Alan Rector on the specified values note: Bite the bullet and, where it is reasonably clear, call the considerations pros and cons. Put the pros first and the cons last, and order the 5 approaches so that [as much as possible] the cons for one approach motivate and naturally lead to the pros of the next one.

Printing anomaly: when my copy got printed, the second figure (which should be NUMBERED) was split across two pages. Other figures also got split across pages.

The font for the heading of the APPROACH sections is less prominent than that for the next-level subheading. Boldface dominates, even though the font is smaller. Ideally, all the notes would use the same heading/subheading font conventions. Also, I prefer numbered headings; they make it easier to reference portions of the document, especially when there are no page numbers. It would also help a lot to use numbered lists instead of just bullets. It may not matter so much for end users, but it would make reviewing the note easier [very minor point].

The considerations for each approach should be identified up front as evaluation criteria, and then each considered for each approach. The criteria are something like:

* compatibility: what languages is it compatible with
* understandability: improved by succinctness and intuitiveness, diminished by messiness
* degree of semantic 'tidiness' or coherence, e.g. is it a semantic hack or workaround, or is it 'clean' [this might be a mine field worth avoiding, focusing instead on the practical import of the particular choice, aside from whether it might be widely viewed as a hack].
  Particular objective things one can say might be:
  o semantic overloading (e.g. class whose instances are lions, vs. whose instances are subjects)
  o non-standard semantic interpretations which are likely to cause confusion and [therefore] hinder interoperability
* maintainability: goes down with workarounds and semantic 'hacks'
* ability to support interoperability: goes down when there are semantic hacks.
* ability to support a particular inference (whose value must be made clear)
  o e.g. ability to infer that any book whose subject is 'Lion' is also a book whose subject is 'Animal'. In the case of MusicCDs one would wish to infer that a CD annotated by the class HardRock would also be annotated (by inference) by the class Rock.
* nature of the [formally represented] relationship between the class hierarchy and the [parallel] subject hierarchy. Relate this to the ability to perform one or more desired inferences.
* I'm struggling to generalize the 4th consideration under approach 1. It is too focused on the particular example, referring to "dc:subject". It needs to be augmented or replaced by something more general, so the reader can apply the information in the note to a different context. Oh wait, maybe this is it:
  o ability to restrict the range of values for properties whose values are naturally represented as classes (e.g. dc:subject). What would another property be, say in another example that does not use subject?

There may be more; I did not do a thorough check.

For each approach, the considerations and the summary should explicitly address most/all of these requirements/criteria. I have given some examples of how this would work for the summary section for each approach. I have attempted to make these summary sections as complete as possible, but I will have missed some things.

It would be good to have a table listing the approaches and the criteria, summarizing how each approach does according to the various criteria.
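
To pin down the "ability to support a particular inference" criterion listed above, here is a rough sketch of my own (not text from the note) that states the desired conclusion as an N3 rule. The `:` namespace is a placeholder I made up, and the rule is only a way of writing down what we want to be able to conclude; the approaches in the note rely on an OWL reasoner, not on rules like this.

```n3
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/animals#> .   # placeholder namespace (assumption)

# Facts: a tiny class hierarchy and one book whose subject is a class.
:Lion rdfs:subClassOf :Animal .
:LionsLifeInThePrideBook dc:subject :Lion .

# The desired inference, written as a rule: a subject propagates
# up the class hierarchy.
{ ?book dc:subject ?c . ?c rdfs:subClassOf ?d . } => { ?book dc:subject ?d . } .

# What the criterion asks each approach to deliver (or explain why not):
#   :LionsLifeInThePrideBook dc:subject :Animal .
```

The same pattern would cover the MusicCD case, with HardRock and Rock in place of Lion and Animal.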
Going this route will entail re-structuring and re-writing much of the text in the considerations sections. Most or all of the points being made will remain. I believe it will make the note significantly clearer and more useful to readers.

Some of these criteria may not be relevant to some of the approaches. They might be explicitly listed as not mattering (i.e. the approach is agnostic to them).

I'm surprised that in the initial discussions, you never mention Range Constraints. After all, the values that a property may have are determined by the range of the property. An interesting and important fact is that the range of a property that has classes as values is a meta-class.

Before going into each approach, perhaps give a short overview describing some of the common themes used in the approaches, like creating a hierarchy that is parallel to the hierarchy of classes. It is helpful for the reader to see the similarities and differences among the approaches. It is also helpful for them to see a big picture that all the specific approaches fit into. A good example of this kind of text is the introduction to approach 4: "We can approximate the interpretation that we used in the previous approaches by using ..." An even better example would be to introduce approach 5 as follows: "This approach is almost identical to approach 1. The only difference is that the property is an annotationProperty, rather than an objectProperty. This difference is very important because..."

Also, introduce each approach by saying what disadvantage(s) it is specifically designed to overcome, or which criteria it is designed to meet. E.g. approach 4 seems to be designed to support a particular inference: "A DL classifier will be able to classify LionsLifeInThePrideBook as an instance of the class BookAboutLions". You could introduce approach 4 by saying something along those lines.

Also, it would be good for the figures to better reflect the similarities and differences between the different approaches.
For example, approaches 1 and 5 are virtually identical. This should be evident in the figures as follows: if you had two powerpoint slides and showed one right after the other, the only thing the viewer should see that is different is:

1. the addition of the annotationProperty class with links from the dc:subject arrows
2. the arrow for dc:subject turned green

Everything else should be exactly aligned. It would also be good to have a summary table with all the figures 'side by side' for easy comparison. Make sure to put the ones that are most similar to each other next to each other.

Clearly say what the key factors are that distinguish the different approaches, from a REPRESENTATIONAL view (i.e. what the user has to do with OWL code).

* the heart of the matter is: what is the exact thing that is the value of the dc:subject property, and how is it related to the corresponding class?
  o 1: the actual class, e.g. Lion. The relationship of this value to the class Lion is identity (it IS the class).
  o 2: an instance (called LionSubject) of the class Lion, denoting the subject of Lions. The relationship of this value to the class Lion is rdf:type (or instance).
  o 3: an instance (called LionSubject) of the class Subject, denoting the subject of Lions. LionSubject is related to the class Lion via an rdfs:seeAlso link.
  o 4: an [implicit] unidentified instance of the class Lion. The relationship of this [nonexistent implicit] value to the class Lion is rdf:type.
  o 5: the actual class, e.g. Lion. The relationship of this value to the class Lion is identity (it IS the class). NB: this is identical to approach 1. The difference is that the property is an annotation property.
* is there an explicit parallel hierarchy (akin to the question of whether there is an explicit property that represents the relationship between, say, the subject Lion and the subject AfricanLion)?
* is a literal class used as a value, or some substitute for a class?
* others?
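
To make the list above concrete, here is how I read the "value of dc:subject" in each approach, written as N3 fragments. This is my own sketch, not code from the note: the `:` namespace is a placeholder, the fragments are alternatives that are not meant to be loaded together, and the note's actual definitions may well differ in detail.

```n3
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .   # placeholder namespace (assumption)

# --- Approaches 1 and 5: the value IS the class Lion.
:Lion a owl:Class .
:LionsLifeInThePrideBook dc:subject :Lion .
# Approach 5 differs only in also declaring:
#   dc:subject a owl:AnnotationProperty .

# --- Approach 2: the value is an individual whose rdf:type is Lion.
:LionSubject a :Lion .
:LionsLifeInThePrideBook dc:subject :LionSubject .

# --- Approach 3: the value is an instance of Subject, linked to the
# --- class Lion only through rdfs:seeAlso.
:LionSubject a :Subject ;
    rdfs:seeAlso :Lion .
:LionsLifeInThePrideBook dc:subject :LionSubject .

# --- Approach 4: no dc:subject triple is written at all; the value is
# --- only implied by a someValuesFrom restriction on the book's class.
```

Seen this way, approaches 2 and 3 look the same at the point of use and differ only in what LionSubject is an instance of, which is the sort of thing a side-by-side table would bring out.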
This representational view relates to, but is different from, the user functionality pros/cons view, which is the main emphasis of the note now.

SPECIFIC section by section comments

STATUS:

'compaible' is misspelled.

ABSTRACT:

Give a specific example so the reader can immediately relate to the issue. 'using classes as property values' is hard to relate to; it is very abstract. I suggest something like this to replace the first part of the abstract:

It is frequently convenient to use a class (such as 'Lions') as a value for a property (e.g. topic_of_book) when building an ontology. While this presents no difficulties in OWL-Full and RDF-Schema, most properties cannot have classes as their values in OWL-Lite and OWL-DL. This document ... [same as now]
---
Be clear about whether this note is going to give advice about using OWL-Lite, or only OWL-DL.

USE CASE EXAMPLE:

In: "is a particular species or class of animal (or animals)" it is unnecessary to say "(or animals)".
---
capitalize 'African'.
---
stray quote in: "behavior of the" African lions"
---
Remove the word "However" in "However, we also want to retrieve"
---
Reword: "Furthermore, we have a class hierarchy representing different animal species and would like to reuse it as these subjects." to be "Furthermore, we wish to use as our subjects various species from an existing class hierarchy of different animal species."
---
"We discuss a number of approaches for representing this pattern in OWL DL and their implications." What about OWL-Lite?
---
It is not immediately clear what is being said in the following paragraph. I'm having trouble parsing the key sentence (with "own" in it).
===
"One goal of the web publisher is to enable maximum reuse of published information. It will be common on the Semantic Web to import and reuse other published ontologies. In doing so, it is important for web developers to preserve the original semantics of imported resources.
Therefore, an important consideration in choosing a representation pattern in this case is the following: If the pattern requires a different interpretation of classes to be used as values, does the designer "own" the definitions of these classes (in this case, the hierarchy of animals) to change them according to the new interpretation? Are others already using this hierarchy of animals in their applications and will this change affect those applications?"
===
I'm going to have a go at rewording it, reflecting my best guess.
===
In the representation patterns that we present below, it is important to be aware of what, if any, new interpretation of existing classes is required for them to be used as values of properties. Ideally, any reuse of existing ontologies on the Semantic Web (in our case, a hierarchy of animal species) should preserve the original intended semantics in the new context. If the semantics changes, this could adversely impact other applications already using the ontology - or make interoperation with existing applications error-prone (since there will be two different interpretations of the same ontology). More generally, caution must be taken whenever non-standard interpretations of commonly used terms or representational patterns are being adopted.
===

OTHER USE CASE SCENARIOS

This is very important. I suggest making this more prominent somehow; as it is, it could easily be missed, especially since approach 2 refers to a hierarchy of subjects anyway (see also comments under Approach 2 below). The considerations are too focused on the particular example. The note is supposed to be about classes as values in general, not focused on subjects or lions.

termss -> terms

NOTATIONS:

What is an 'annotation property'? You mention it as if the reader already knows.

You don't say anywhere that the code examples will be in N3. Say it, and point to a URL for an N3 tutorial.

APPROACH 1:

As you introduce this approach, it seems to me it can't work.
You already said you can NOT use classes directly as values. So I'm confused. I expect to be enlightened, but I would suggest that the opening wording be changed to avoid this confusion. Ah, I see, you said the note was about how to do classes as values in OWL-DL, so I expected all examples to be in OWL-DL. It was quite a surprise to learn you actually are using OWL-Full or RDF-Schema. Mention that up front. OR: don't say in the abstract and introduction that this note is about how to do this in OWL-DL, but rather say it is about how to do it in all variants of OWL (because it is!)

It is slightly jarring that you name the class of all books about animals with the singular 'BookAboutAnimals' - although I see this is the consistent naming convention. Might be worth mentioning this in a footnote? [very minor point]

The definition is a good example of why it would be useful to have a distinction between properSubclassOf and subclassOf. Then you don't need the union clause. I'm not sure about my OWL. If there is such a distinction, then this definition could be shortened by only using the subclassOf which includes the class itself (Animals).

I'm a strong advocate of fairly literal, but also readable, English translations of all/most N3 expressions in these documents. For example, the BookAboutAnimals definition would read:

The class BookAboutAnimals is an OWL class. It is a subclass of Book. It is also a subclass of the class of all things whose subject property has a value that is either an animal, or is a member of the class of all things that are subclasses of Animal.

If anyone questions the need for this, let me just say that it took me 3-4 minutes to puzzle out this English text from the N3. As a way to teach people about OWL, having English definitions of all code alongside is going to be very helpful indeed. It helps me a lot, and I already know a lot about OWL (though I am not so great at reading any of the raw syntaxes).

Summary of approach 1: a bit hard to read.
I suggest the following variant:
===
This approach will be suitable to the extent that the following applies to you:
1. simplicity is important
2. being in OWL DL or OWL-Lite is not important
3. there is no requirement to limit the range of the dc:subject <http://purl.org/dc/elements/1.1/subject> values or you are amenable to using classes as subjects to implement this restriction.
===
You don't say whether this is OWL-DL or OWL-Lite.

APPROACH 2:

Above you said that "This note should not be interpreted as a general discussion of how to represent subject hierarchies or terminologies on the semantic web." I strongly agree with this. However, this is undermined by the name of this approach, which refers to a hierarchy of subjects.

QUESTION: is there any use case that has nothing to do with subjects? If so, then you can perhaps name this approach using the phrase 'creating a parallel hierarchy' and not mention subjects explicitly. For the example, it is indeed creating a hierarchy of subjects; what do you do in a more general case, e.g. music CDs?

On the other hand, you say that using classes as values arises in general when you want to use a hierarchy of classes to ANNOTATE other classes or individuals. This may be true. Can anyone think of any use of classes as values that does NOT involve using the classes for annotation? If not, then maybe we are tied to subjects all the time after all. For, one can frequently/usually/always interpret any annotation as a topic or subject. Maybe you could avoid 'subject' and say that this approach is about creating a parallel hierarchy of annotation individuals?

Maybe the ONLY use of classes as values is when the classes are used as annotations? If so, then perhaps this note is really about 'classes as annotations' from a user perspective, and from an OWL perspective it is about classes as property values. This goes to the heart of the question of under what circumstances one might use classes as values.
How much do we really know about this? It can be difficult to abstract from examples to a general case.

The figure for this approach should actually show a 'parallel hierarchy'. It does not. I'm not sure there IS a parallel hierarchy. If so, what is the relation that defines the hierarchy? I.e. what is the relationship between LionSubject and AnimalSubject? Is that relation explicit? The common sense meaning of the relationship is indeed 'is a kind of'. For, anything that is about Lions is also about Animals. BUT, here the Subjects are not classes.

OOPS: I just realized that you never said there was a parallel hierarchy. You said there was a parallel set. So maybe the above comments do not apply?

Minor point: you say "The resulting ontology is compatible with RDF Schema and OWL Lite (and hence OWL DL)". I had to think for a moment about the hence clause; it seemed backwards at first, but then I saw it was correct. For this audience, it might be better to keep it simple. It is also compatible with OWL-Full, which you do not mention. I suggest rewording to: "The resulting ontology is compatible with RDF Schema and all variants of OWL (Full, DL, and Lite)." It is not germane to this discussion that: IF it is compatible with OWL-Lite, THEN it is compatible with OWL-DL. If you wish to give the reader this information, make it a separate comment or footnote. It would really belong in a discussion with the definition of 'compatible', which is a blue term targeted for a glossary definition.

Summary of approach 2. I suggest re-structuring this as follows:

This approach will be suitable to the extent that the following applies to you:
* staying in OWL DL is important.
* there is no need to infer that books about lions are also about animals (perhaps reworded/augmented to be more general than the lions example)
* you are not concerned about interoperability with others who may be using the class hierarchy in a way that is aligned with the original meaning, rather than the different interpretation of this approach.
* you don't need to, or are prepared to meet the cost of, maintaining consistency between the set of classes representing subjects and the set of corresponding individuals.

APPROACH 3

Overall, this approach is very clearly described.

This approach assumes that there is a subject hierarchy. Is that true? Is there a more general view that is not subject-specific? Can it be generalized? Or do we bite the bullet and say this note IS about representing subject hierarchies?

Remove last word in: "We can create a single class Subject and make all the subjects to be individuals that are instances of this class Subject"

Might "using individuals as surrogates for classes" be a better name for this approach? The current one may be too specific?

I would leave the SKOS comment as an aside at the end of this approach. It is a worthwhile point, but not germane to understanding the basic idea; in fact it gets in the way.

This approach also entails creating a parallel hierarchy. I think it would be good to show the parallel hierarchy in the figure, in a way that really looks like a parallel hierarchy (e.g. the layout should be more or less identical, so it is immediately obvious). This will involve a major re-structuring of the diagram.

Considerations:

Ditto the parenthetical remark: (hence ...)

If the DL reasoner cannot infer that "a book that has LionSubject as the value for dc:subject is also about Animals", then what is the point of mentioning that "Most DL reasoners will be able to infer transitive relations between subjects"? Is this useful by itself?
If so, can it be factored into the requirements/criteria for evaluating the different approaches?

What is the import of this: "The resulting hierarchy of subjects is not related to or dependent on the class hierarchy representing the same topics (in this case, animals), except through an annotation property rdfs:seeAlso." Is it good? Helpful? Why? Relate it to one of the evaluation criteria.

What is the import of this consideration: "This approach explicitly separates the subject terminology from the corresponding ontology. Many consider this separation a good modeling practice: the semantics of a subject Lion can be different from the semantics of the class of lions. Having subjects in a separate hierarchy, would allow us to define for example that the subject Africa is a parent subject of the subject AfricanLion." Relate it to one or more evaluation criteria: does it relate to supporting a desirable inference? Does it impact on maintenance?

"correpsonding" and "separartion" misspelled.

The following two considerations seem similar; merge into one? If these are all re-structured anyway, this may be moot.

* Some may consider a approach having two parallel hierarchies of essentially the same data to be too complicated and difficult to maintain for the simple task at hand
* The separartion of the subject terminology from the correpsonding ontology incurs a serious maintenance penalty: We need to maintain a set of instances for all subjects in addition to the hierarchy of subjects. In many applications, we may also need to ensure that the two sets - classes representing subjects and corresponding individuals and values for the parentSubject property - are consistent with each other. However, developers can instrument tools that would maintain this consistency automatically.

Here is a suggested expansion and re-structuring of the summary (it was too brief):

This approach will be suitable to the extent that the following applies to you:
* staying within OWL DL is important.
* there is a requirement to limit the range of the dc:subject <http://purl.org/dc/elements/1.1/subject> values in a natural convenient way.
* there is a requirement to infer transitive relationships between subjects.
* there is no need to infer that books about lions are also about animals (perhaps reworded/augmented to be more general than the lions example)
* there is no need to have a formal linkage between the class hierarchy and the parallel subject hierarchy (perhaps reworded/augmented to be more general than the lions/subjects example)
* there is a requirement to minimize semantic overloading to support interoperability with persons using the class hierarchy with the original intended interpretation.
* there is an adequate way to manage the penalty of having two parallel "hierarchies."

APPROACH 4

I found that this example was hard to grasp. Focusing on "unspecified members of a class" seems very obscure, and must be missing the main point, which is ??? - I'm not sure. The main thing seems to be that the actual value that the property dc:subject has is an [implicit] unidentified instance of the class Lion, and that the relationship of this [nonexistent implicit] value to the class Lion is rdf:type. This is, IMHO, rather obscure, and many are likely to have little idea what you are talking about. The main problem is that the instance DOES NOT EXIST, so it needs to be explained differently. You might at the end mention that this representation approach corresponds to there being an implicit instance, but otherwise it is likely to be far too confusing.

Specifically, there is nothing corresponding to the following (from approach 2):

    :AfricanLionBook a :BookAboutAnimals ;
        dc:subject :AfricanLionSubject .

If there were, it would be:

    :AfricanLionBook a :BookAboutAnimals ;
        dc:subject :UnidentifiedAfricanLion .
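
For contrast, here is roughly what I understand the user actually writes under approach 4. This is my own sketch for illustration, not the note's code: the `:` namespace is a placeholder, and the note may define the class with a necessary-and-sufficient definition rather than the plain subclass axiom I use here.

```n3
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/books#> .   # placeholder namespace (assumption)

dc:subject a owl:ObjectProperty .

# Every BookAboutLions is a Book that has at least one dc:subject
# value which is an instance of Lion.
:BookAboutLions a owl:Class ;
    rdfs:subClassOf :Book ;
    rdfs:subClassOf [ a owl:Restriction ;
                      owl:onProperty dc:subject ;
                      owl:someValuesFrom :Lion ] .

# The book is simply typed. No dc:subject triple and no Lion
# individual is ever written down; the restriction only says that
# some such value exists.
:LionsLifeInThePrideBook a :BookAboutLions .
```

Putting this next to the two snippets above makes the point: the thing the user writes is a class definition plus a type assertion, and the "unspecified member" never appears in the code.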
If you stuck with the current name for the approach, it would be better to change it to something like: "using IMPLICIT members of a class as values for the property", since that is more accurate. But I suggest throwing it out entirely.

Here is a possible alternative way to describe this approach, which focuses on what the USER HAS TO DO: create [possibly] anonymous classes such as BookAboutLions, and say that the books are instances of such classes.
===
This approach is designed to make it easy to leverage a DL reasoner to infer, for example, that a book whose subject is Lion also has subject Animal. In this approach, we create a parallel hierarchy of types of books consisting of classes such as: BookAboutAnimals, BookAboutLions, BookAboutAfricanLions. We then say that various instances of Books are explicit members of one or more of these subject classes. The key to making this work is how we define the classes in the parallel hierarchy. For example, we can define the class BookAboutLions as follows: [continue as per current draft].
...
...
[before the Alternatively clause, put this text in:]
By saying that LionsLifeInThePrideBook is an instance of BookAboutLions we are saying that it is a member of a class, all of whose members have as their subject at least one instance of the class Lion. [this text might need fixing so it is strictly and literally true; I might have a misreading of someValuesFrom, all the more reason that these examples need English every time.] In OWL, it is not necessary to create any explicit instances of these classes. In the figure, we list them as if they were explicit, and use dotted lines to denote that they may not actually exist.

Strictly speaking, this approach does not answer the question: what is the exact thing that is the value of the dc:subject property, because there are no explicit instances of the Lions.
The best we can do is answer the question this way: the exact thing that is the value of the dc:subject property is an implicit, unidentified instance of the corresponding class (e.g. Lion). Put another way, we [next, I just pasted in the original opening text of this approach] can approximate the interpretation that we used in the previous approaches by using unspecified members of a class rather than the class itself as property values. We define the class BookAboutAnimals as a class of books where the subject is some animal. Correspondingly, a BookAboutLions class will be a class of books where a subject is some (unidentified) lion or lions.

Variant: Now put in the alternatively clause, which just creates an instance with an anonymous class. This way avoids the need to create an explicit parallel hierarchy.
===
This may need further fiddling, but I think it will be much easier to understand this way.

Suggested re-doing of summary for approach 4:

This approach will be suitable to the extent that the following applies to you:
* staying in OWL DL is important.
* there is a need to infer that books about lions are also about animals (perhaps reworded/augmented to be more general than the lions example)
* you are not concerned about interoperability with others who may be using a more standard interpretation of subject (e.g. about lions, rather than about a particular unidentified lion).
* you are prepared to expend the effort in creating and maintaining a [possibly implicit] parallel hierarchy of classes of the sort: BookAboutClass.

This approach is agnostic to the following issues:
* how to limit the range of the dc:subject <http://purl.org/dc/elements/1.1/subject> values
* others?

There are really two variants here; perhaps make that more explicit. Figures for each? N3 and RDF/XML representations for each?

The figure does not have the class(es): BookAbout(African)Lions. I think it should, to show the parallel hierarchy.
Also, if the class(es) BookAbout(African)Lions are both defined as restrictions as per your example, then a DL classifier can infer that BookAboutLions is a subclass of BookAboutAnimals [I think], which is why it can also classify LionsLifeInThePrideBook as an instance of the class BookAboutLions. You say this in a consideration, but not in the main text.

APPROACH 5

The relationship to approach 1 should be very explicit (see comment above).

The figure should not split up the two snippets of N3. Also, you introduced a new diagramming convention: an arc from an arrow to a class. Align with the diagrammatic conventions Alan just created for the specified values note.

Add the consideration: there are no non-standard semantic interpretations in this approach. Or: no semantic interpretations that differ from the original intent of an existing ontology that is being re-used. Well, maybe it is non-standard to view dc:subject as an annotation property...

Suggested redoing of summary for approach 5:

This approach will be suitable to the extent that the following applies to you:
* staying in OWL DL is important.
* there is NO need to infer that books about lions are also about animals (perhaps reworded/augmented to be more general than the lions example)
* you are not concerned about interoperability with other systems/applications that may use dc:subject as an objectProperty that is used for reasoning.
* you can't afford to expend the effort in creating and maintaining a [possibly implicit] parallel hierarchy of classes to represent subjects.

This approach is agnostic to the following issues:
* how to limit the range of the dc:subject <http://purl.org/dc/elements/1.1/subject> values
* others?

There are no non-standard semantic interpretations used on the one hand, but on the other hand, using dc:subject as an annotation property will make it hard to interoperate with applications that use dc:subject as an object property?
SUMMARY and CONCLUSIONS

Add a new section which has things like summary tables, all the figures side by side, and punchy re-statements of the main reasons for choosing which approach. You might include here the bullet points from above under the heading: "What are the key factors that distinguish the different approaches, from a REPRESENTATIONAL view." This will help readers get all the approaches, and how they inter-relate, in their heads at the same time.
Received on Wednesday, 2 March 2005 23:26:08 UTC