TopQuadrant Response to the OWL2 Last Call of 02 December 2008

Summary

TopQuadrant believes that the current working drafts for the OWL2 specifications would, if advanced to Recommendation, be detrimental to our business and to our customers' use of the Semantic Web Recommendations. We ask the working group to consider whether this reflects something atypical about the market segment that we serve, or whether the scientific HCLS use cases motivating much of the design of OWL2 are themselves atypical. If the latter, we suggest either scaling back or rebranding the OWL2 Recommendation.

Introduction

TopQuadrant is a small company offering both products and services. All of our revenues depend on the successful deployment of Semantic Web technologies. We are profitable. To date that success has been possible through co-existence strategies that use both RDF and OWL together, including OWL Full where expressiveness and ease of understanding outweigh decidability.

We have a range of concerns about the OWL2 specifications. Recognizing the considerable technical achievements in their design, we have no major technical criticism. Thus, our concerns are, in the main, not suited to being Last Call comments on the Rec track documents, but are better expressed as comments concerning the New Features and Rationale document. We make one formal procedural comment against the last call, merely to connect these comments to the last call, and the W3C process.

The main thrust of our concerns is that we find the motivations for changes to the OWL Recommendation to be weak or non-existent, and to be limited in their scope to what we believe to be a narrow section of the Semantic Web marketplace.

Specifically, many of the new features are motivated by highly scientific applications within HCLS. Our customers, including our HCLS customers, are typically interested in applications such as data integration from multiple heterogeneous sources; for these the new features of OWL2 are not needed and represent a considerable cost and performance impact.

Thus, while those consortium members using ontologies to express precise assertions about domain theories may benefit from the new specifications, we do not believe we will; and we suspect that our lack of benefit will be the more common experience amongst consortium members and the wider public. If this is the case, the consortium would be ill-advised to advance them to Recommendation without significant change.

TopQuadrant is committed to the consensus process of the consortium, and will not press these concerns without support from other members. However, anecdotal evidence, e.g. OWL 2 Far and Schism in the Semantic Web community, indicates that others in the industry share our experience and opinion.

Last Call Comment

TopQuadrant's principal last call comment on all of the technical OWL2 specification documents is the following request:

Since most of our comments concern the overall design, rather than the technical details, we have made them against the New Features and Rationale document, which is not technically part of the last call. Hence, we ask the WG to formally address our comments on the New Features and Rationale working draft as part of the last call process for the technical specifications.

Comments on New Features and Rationale working draft

These are comments on: New Features and Rationale of 02 December 2008.

Main comment: OWL Too?

This single comment will be made as a separate e-mail, for ease of tracking.

The rationale document (and the design) has not taken into account the cost of new features, particularly to those who do not need them. These costs include: implementation costs, training costs, documentation costs, and simply the cost of ignoring something. (A feature needs to be understood before it can be ignored.) This is seen anecdotally in the ease with which people slip into thinking: "OWL Too Far", "OWL Too Full", "OWL Too Much".

An example use case illustrating such costs is as follows:

Izzie, Joseph, Kevin, Lucy and Makato are building a set of ontologies and semantically enhanced applications in the area of dietary planning. They are using a collection of tools, each of which supports some of the Semantic Web recommendations. They have some information sources that they have prepared in-house, and they are also integrating several Web-based information sources. Most of these (Diet.example.org, Energy.example.org, Food.example.org) are available in RDF; some of those (Diet.example.org, Energy.example.org) use OWL1 features and are available in RDF/XML. One (Comestibles.example.org) is available only in the Manchester OWL Syntax, and one (Beverages.example.org) is available only in application/owl+xml. Both of these last two use OWL2 features, but have some useful information in their OWL1 subset.
Izzie is the team lead and has a good understanding of the full range of the Semantic Web recommendations. Kevin is a modelling expert with a background in knowledge representation; Lucy understands inference well; Makato is a SPARQL wizard. Joseph is fresh out of school. (The economy has picked up, and Izzie was slightly disappointed with the limited choice of candidates for the Junior Ontologist Trainee role.)
Izzie makes the design decision that OWL2 features are not needed for this project, and that the advantage of using several OWL or RDF applications that do not support OWL2 features is the critical factor. She sends Joseph on a Semantic Web training course, while the rest of the team get started on the project.
Most of their tools cannot read the Manchester Syntax or OWL/XML, so they use a Jena extension to convert Beverages.example.org and Comestibles.example.org to RDF/XML; and they keep the copies locally.
They experience a variety of problems related to OWL2, including:

Various OWL2 constructs (from Beverages.example.org, Comestibles.example.org) are meaningless in OWL1, and the OWL1 implications are lost.

Joseph's training was 1-day RDF, 1-day OWL1, 1-day SPARQL, half-day RIF, half-day OWL2. He got somewhat confused! He sees the OWL2 constructs being used in some of the data sources, and when mapping some proprietary data into OWL he uses the same constructs. Despite the fact that he used the OWL2 constructs correctly, the overall system does not take his modelling into account, since several of the components are not OWL2 aware. Izzie has to take him to one side, and spell out the decision to use OWL1 only and what that means.

One of their tools does understand the Manchester Syntax and OWL/XML; it ends up going to the web and retrieving the latest version of Beverages.example.org and Comestibles.example.org. A couple of months into the project, these ontologies are updated on the web, and the local copies (in RDF/XML) are not. A few days later, unexpected system behaviour is observed, which is eventually tracked down to the version mismatch.

A week or two after this, a team meeting gets completely derailed when Kevin pipes up in advocacy of supporting various OWL2 features because he wants the extra expressive power; Lucy gets quite irate at having to explain, all over again, her estimates of the runtime cost in the inference engines for the additional expressive power, and her inability to meet her performance targets using the additional features; while Makato tries to drill down with Kevin on precisely what is wrong with the SPARQL queries that the team have been using to fill in gaps where the expressive power of OWL1 is less than is required for their application. After a fairly unproductive 90 minutes, Izzie calls time, and reminds the team of the earlier design decision to use OWL1. Joseph has a headache.

While these problems are largely to be expected in an ontology development project, and can be addressed by a range of techniques such as improved project management, cache control, version management, and aspirins, we believe that OWL2 introduces many additional places where such problems might arise, and we do not see adequate consideration of these costs in the design.

We believe that the rationale document shows a very significant dependency on a narrow section of the Health Care and Life Sciences applications. The needs expressed are not ones which we find are pressing for our HCLS customers.

We ask that many under-motivated new features should be dropped, including all unmotivated new features.

An alternative, possibly better approach to addressing this comment, might be to rebrand most, if not all, of the new features of OWL2, as "Web-SROIQ", and put them in a separate namespace, not branded as OWL, so that the (vast) majority of Semantic Web users for whom these features are neither useful nor helpful, but merely confusing, can rest more easily in ignoring them. Notice the choice of name for the rebranding does not include the string "OWL".

Other comments

Danger of bias: We believe that at all stages in the development of the OWL2 specification the interests of the DL community, such as tableau reasoner vendors and that part of the academic community concerned with tableau reasoner design, have been over-represented, and the interests of the wider Semantic Web community (particularly RDF users) have been under-represented. This is not intended as a procedural criticism. We trust that during periods of public review, and during the call for implementations, the WG will be mindful of the need to see active support from the wider Semantic Web community, and not be satisfied with passive acceptance.
RDF interoperability: Since almost all of TopQuadrant's business uses both RDF and OWL together, the implicit requirement in OWL 1.0 that OWL and RDF should work well together remains a critical requirement for OWL2. We do not see this listed as a requirement, and believe that several of the new features added are in practice in conflict with this requirement.
effective?: In the abstract we are highly unconvinced by the scoping to effective reasoning algorithms. To the expert reader this appears to be a highly technical sense of the word effective, which is likely to confuse the more general reader. Our customers are interested in software that returns results in reasonable time, typically within a few seconds, but maybe with some offline computation of a few hours. Effective algorithms can take an unbounded length of time, and ineffective algorithms can be quick in most cases of interest. Thus, the scoping to effective algorithms is simply a religious allegiance and makes no business sense.
OWLED: In the overview, a key part of the rationale is expressed as: as part of the OWLED Workshop Series. The OWLED process while open to all, unfortunately did not, in our opinion, achieve what we believe to be the organizers' goals of representing a broad spectrum of the OWL community. It was only the first two meetings (Nov 2005, Nov 2006) that played into the member submission, and had significant impact on the design. The attendance and presenters at these meetings seem to, from our perspective, under-represent the many OWL users, who use mainly RDF with a little bit of OWL etc. The ordering "real applications, user and tool developer experience" reflects a desire that new features are gated by business motivations. A frequent pattern with hi-tech innovation is that developer excitement at "oh wow! we can do this" (even if prompted by discussion with end users) can lead to inappropriate investments that do not make wider business sense. Our questioning of the representativeness of the use cases is intended to probe this point.
Manchester Syntax: In section 2 Features & Rationale, the Manchester Syntax is not mentioned or justified as a new feature. Since this introduces additional costs, and is apparently unmotivated, we suggest it is dropped.
OWL/XML: In section 2 Features & Rationale, the OWL/XML syntax is not mentioned or justified as a new feature. Since this introduces additional costs, and is apparently unmotivated, we suggest it is dropped.
Links to Wiki should be links to TRs: The syntax and semantic links for features discussed in section 2, should link to the TRs and not to the Wiki.
Syntax examples should include RDF: The lack of RDF triples in the syntax examples hindered our review effort.
DisjointUnion subPropertyOf UnionOf: The new features introduced as syntactic sugar introduce dangers of interoperability failure between OWL1 and OWL2. Some simple steps should be taken to reduce such risks, such as adding the assertion that owl:disjointUnionOf rdfs:subPropertyOf owl:unionOf.
DisjointUnion and DisjointClasses: Being syntactic sugar, these new primitives are strictly speaking unnecessary. Their form in RDF triples is very different from the equivalent disjointWith statements, and they are significantly harder to process for OWL implementations that work natively over RDF, rather than by first translating into OWL axioms. It seems unlikely that many RDF based OWL implementations (OWL Full implementations) will correctly implement these constructs. Hence these constructs are likely to lead to interoperability failure between OWL Full and OWL DL systems. The costs of such failure are much higher than the costs of requiring users who need such constructs to use the somewhat funky styles required by OWL1. These features should be dropped.
Negative*PropertyAssertion: These features are highly problematic for RDF interoperability. While, technically, from an OWL DL implementation perspective, these are merely macros for membership of complements of hasValue restrictions, the promotion of this from an esoteric construct to a first class axiom changes their implicit status. RDF systems are simply not geared up to support negative triples as well as positive ones. The negative assertion of the reified triple, while technically faultless, is a practical disaster in terms of setting user expectations that RDF based OWL tool vendors are unlikely to meet. For the Semantic Web community as a whole, interoperability between RDF and OWL is much more valuable than ease of use of an advanced OWL construct. The WG has made a bad design choice by including these.
SelfRestriction and the Schneider variant of the Patel-Schneider paradox: As the WG is well aware, the self-restriction increases expressive power in ways that introduce further paradoxes with OWL Full semantics. While these are addressed in the WG's technical work, the use cases motivating the new construct fall well short of what we would expect as needed for motivating risky theoretical changes.
QCRs: We believe these to be a useful addition to OWL. Several TopQuadrant customers would use this feature.
reflexive, irreflexive, asymmetric and disjoint properties: Our general comment concerning failure to consider the cost of change applies to these features.
Property chain inclusion axioms: These appear to be quite widely useful. We have some concerns with the use of blank nodes in the subPropertyOf triple corresponding to a RIA. These are likely to cause problems for RDF implementations which expect all predicates to be URI nodes. We think that drilling down and fixing all instances where RDF software makes this assumption is costly and unlikely to happen, and that the blank nodes are therefore likely to introduce incompatibility between OWL2 and RDF. We believe introducing a new property in the RDF mapping for RIA, and avoiding the use of subPropertyOf, is probably a better trade-off here. We are not wholly convinced that such behaviors are best addressed in the reasoner rather than using some other approach, such as rules.
EasyKeys: explicitly no opinion
unary datatype: Many TopQuadrant customers require these features. These make sense as part of the main Semantic Web Recommendations.
N-ary datatype: This feature appears to have been dropped, and to be in this document by accident; if not, we would like to comment.
Punning: We have mixed opinions ... and make no formal comment, but we are uneasy with this change.
Annotations: In general, improvements in the expressiveness of the OWL1 annotation system are to be welcomed. However, the detail of the solution in OWL2 is worrying. The use of reification for mapping some of the axioms is suspect. The change from RDF reification to OWL2 reification is unmotivated. The underlying problems with reification are not addressed by renaming. A practical worry for a Semantic Web editor (supporting both RDF and OWL), as opposed to an OWL editor or an RDF editor, is that maintaining the link between the reified triple and the triple itself is another indirection point that must be considered in many places. Thus, the implementation of extended annotations embedded within the current OWL2 design will likely lead to further bifurcation of OWL from RDF, with people either using OWL tools (that support such features) or RDF tools (that do not, and are likely to interoperate poorly with the OWL tools, e.g. by renaming a triple without considering the impact on the reified triple). In some of our consultancy engagements we have been applying annotations to any convenient blank node, e.g. the blank node of a restriction: we believe this is a better compromise between the need to annotate ontologies and the need for RDF interop.
Profiles: We are generally supportive, but would not oppose a scaling back of this effort in light of comments from other consortium members.
Appendix: Many of the use cases come from the HCLS domain.
TopQuadrant has several HCLS customers. We do not, in general, find their needs qualitatively different from our other customer needs, and find the choice of HCLS applications referenced from this document to not reflect our experience of the business area.
Thus, we guess that the HCLS applications discussed might be characterized as scientific HCLS apps, as opposed to the more business focussed apps which typically motivate our customers.
We believe it would be a mistake to impose significant costs on the whole Semantic Web community merely to support use cases from a relatively narrow subsection of one business segment.
Tables - dependency on HCLS: Without the HCLS use cases, (UC#1, UC#2, UC#3, UC#5, UC#, UC#8, UC#9), the first table seems much closer to our experience of the need for new general purpose constructs within OWL. Essentially there isn't much: QCRs and datatype improvements seem to be it. Thus, we find the tables to reveal a lack of motivation for change, and so OWL2 is a change we can't believe in.
UC#11, UC#10: Presumably are legacy and need deleting.
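
To illustrate the two DisjointUnion comments above, the following is a minimal sketch in Python, not taken from any TopQuadrant product or from the OWL2 drafts. It assumes triples are held as plain tuples of prefixed names; the class names (ex:Food, ex:Meat, ex:Fish, ex:Vegetable) and the blank node label _:members are hypothetical. The first function applies the RDFS subproperty rule (rdfs7), so that a tool which knows only owl:unionOf still sees a union once the suggested subPropertyOf assertion is present. The second writes a set of mutually disjoint classes in the OWL1 style, as pairwise owl:disjointWith triples.

```python
from itertools import combinations

SUBPROP = "rdfs:subPropertyOf"


def apply_rdfs7(triples):
    """One pass of RDFS rule rdfs7: (p subPropertyOf q), (s p o) => (s q o)."""
    triples = set(triples)
    # Map each property to its directly asserted super-properties.
    supers = {}
    for s, p, o in triples:
        if p == SUBPROP:
            supers.setdefault(s, set()).add(o)
    # Re-state every triple under each super-property of its predicate.
    inferred = {(s, q, o) for s, p, o in triples for q in supers.get(p, ())}
    return triples | inferred


def pairwise_disjoint(classes):
    """OWL1-style encoding: one owl:disjointWith triple per pair of classes."""
    return {(a, "owl:disjointWith", b) for a, b in combinations(classes, 2)}


# Hypothetical input: the suggested axiom plus one OWL2-style statement.
graph = {
    ("owl:disjointUnionOf", SUBPROP, "owl:unionOf"),
    ("ex:Food", "owl:disjointUnionOf", "_:members"),
}
closure = apply_rdfs7(graph)
pairwise = pairwise_disjoint(["ex:Meat", "ex:Fish", "ex:Vegetable"])

print(("ex:Food", "owl:unionOf", "_:members") in closure)
print(len(pairwise))
```

The sketch covers a single pass of one rule only; a real RDFS implementation would iterate to a fixed point and handle the rdf:List structure behind the blank node.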

Comments on Manchester Syntax

Document reviewed: Manchester Syntax

Typo: TopBraid Composer: Please capitalize correctly on all uses (TBC all in caps).
Trademark: Please acknowledge our trademark in TopBraid Composer.
Error in Appendix: TopBraid Composer does not support Manchester Syntax as an I/O format, and we do not intend it to, so it should not be referenced in the IETF application. The earlier statement correctly expresses the intent: TopBraid Composer uses Manchester Syntax for displaying and entering descriptions associated with classes.
Informative?: Given that the Manchester Syntax should not be normative (by W3C process) it seems mistaken to put it through an IETF process that would make it so. Please drop the mimetype registration.

Comments on OWL/XML

Document reviewed: XML Serialization

GRDDL: Our understanding of the WG charter is that a GRDDL transform, in XSLT1, will be provided. We will raise this issue again at PR review if necessary. Our preferred fix to the lack of a GRDDL transform is to drop the OWL/XML serialization.
Informative?: Our view, expressed elsewhere, is that this document is unmotivated and divisive, and has unconsidered costs; and it should simply be dropped. We note that Frank van Harmelen suggests downgrading it to informative. While this would be an improvement, it would then be mistaken to put it through an IETF process that would make it normative by IETF. Please drop the mimetype registration.

Comments on datatypes in Structural Specification

These comments are on Structural Specification and Functional-Style Syntax; and concern the datatype mapping section, and the features at risk.

Lack of rationale and motivation: The changes in the approach to XSD datatypes (such as the introduction of owl:real, and the redefinition of the value spaces of the XSD built-in datatypes) are not motivated in the Rationale document. Since this change risks interoperability failure, this seems like a significant oversight.
owl:real: A possible motivation for owl:real is to allow a property which can be used with any numeric datatype. TopQuadrant customers have such use cases, when merging data from various sources: however, the use of a simple XSD union datatype is an alternative solution, which we prefer.
owl:real: A possible motivation for use of owl:real is to permit integration of numeric reasoning services with ontological reasoning services. While this may be useful for some Semantic Web applications, we do not find this to be useful for our business. We do find it critical that numbers in Semantic Web applications interoperate with numbers in databases, and with numbers in programming languages. We hence suspect that this proposed change to the semantics of datatypes in OWL is a further example of a clean theoretical solution that does not make practical business sense. We suggest that the value spaces of the XSD datatypes should remain unchanged from OWL1.
owl:rational: There is a typo (owl:datetime) in the description of this datatype. Otherwise, we explicitly have no comment.
rdf:XMLLiteral: While we have no comment at this time, please let us know if the WG is minded to drop support for this datatype; we may wish to speak up in its defence.
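
As a small illustration of the interoperability point in the owl:real comments above, the following Python sketch shows why a tool that maps literals onto programming-language numbers needs to know which datatype reading applies. It is a sketch under stated assumptions and does not reproduce the semantics of any OWL2 draft: the membership set NUMERIC_UNION and both helper functions are hypothetical names of our own, and xsd:double is approximated by Python's float.

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical stand-in for a simple XSD union of numeric datatypes.
NUMERIC_UNION = {"xsd:integer", "xsd:decimal", "xsd:double"}


def in_numeric_union(datatype):
    """Accept a literal if its datatype is any member of the union."""
    return datatype in NUMERIC_UNION


def exact_value(lexical, datatype):
    """Exact rational number denoted by a literal, per its datatype."""
    if datatype == "xsd:double":
        # Binary floating point: the nearest representable double.
        return Fraction(float(lexical))
    # xsd:integer and xsd:decimal: the lexical form read as a decimal.
    return Fraction(Decimal(lexical))


same_tenth = exact_value("0.1", "xsd:double") == exact_value("0.1", "xsd:decimal")
same_half = exact_value("0.5", "xsd:double") == exact_value("0.5", "xsd:decimal")

print(in_numeric_union("xsd:double"), same_tenth, same_half)
```

The union check lets one property accept any of the listed datatypes without requiring that their values be compared on a single number line; the two comparisons show that the same lexical form can denote different exact numbers once such a comparison is made.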