Re: Revisiting structured datatypes was: Re: Revisiting AllDisjoint (was Proposed (parital) response to Ken Laskey and questions for WG)

Jim Hendler wrote:
[[
Jonathan - specific questions about the NCI ontology seem to me to be out of
scope for this WG unless they relate to LC comments -  For what it is worth,
the developers of this ontology need these CDATA things this specific way
(details can be provided elsewhere) and did not ask for structured datatypes
in their LC comments [1].  (Also the new version of the ontology has defined
all these as annotation types, so there's no semantic issue here per se)
 The WG agreed structured datatypes were desirable but that we would not be
able to add them ourselves (in OWL) instead of in XSD or RDF [2] -- are you
offering new evidence, preferably based on LC comments, that we should
change our minds and reopen the issue?
 -JH
]]

The 'new evidence' is that a serious attempt to convert a proprietary
ontology into OWL has resulted in the introduction of a horrible XML idiom
(XML within CDATA section). Either this is due to a poorly written piece of
software or this is strong evidence that the current mechanisms for datatype
properties which are intended to contain XML are so seriously lacking as to
be essentially unusable.

Since you point to this particular ontology as one of sufficient importance
that the WG ought to consider the introduction of a new language element at
this stage in the process [1,2], I think that it is totally on topic to look
at this particular ontology as an example of OWL in the real world.

People should not be forced to use objectionable techniques to get real work
done with OWL.

Jonathan

[1] http://lists.w3.org/Archives/Public/www-webont-wg/2003Jul/0119.html
[2] http://lists.w3.org/Archives/Public/www-webont-wg/2003Jul/0123.html


[1]
http://lists.w3.org/Archives/Public/public-webont-comments/2003May/0068.html
[2]
http://www.w3.org/2001/sw/WebOnt/webont-issues.html#I4.3-Structured-Datatypes




At 4:37 PM -0400 7/15/03, Jonathan Borden wrote:
Jim,

With all due respect, if we are going to modify OWL based upon the needs of
the NCI Ontology (which *is* reasonable) then let's look at an example class
definition:



[snip]


Pardon me, but this use of XML inside CDATA sections is, to be blunt, a
horrible hack whose necessity raises serious concerns about the lack of
structured datatypes in OWL. Since this seems essential to the ontology --
this idiom is repeated over and over -- I think that we need to readdress
the issue of structured datatypes. Such use of 'XML' (and I mean the quotes,
as the XML is inside CDATA sections!) must raise a most serious
interoperability issue.
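
To make the interoperability concern concrete, here is a minimal sketch in
Python. The class definition was snipped above, so the element names used
here (Class, note, source, text) are hypothetical stand-ins and are not taken
from the NCI ontology. The sketch shows that a standard XML parser sees
markup inside a CDATA section as one opaque string with no child elements,
so a consumer must guess that a second parse is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are illustrative assumptions,
# not the actual NCI ontology markup (which was snipped from the thread).
DOC = """<Class>
  <note><![CDATA[<source>EXAMPLE</source><text>some text</text>]]></note>
</Class>"""


def cdata_payload(xml_text):
    """Return (number of child elements, text content) of the <note> element."""
    root = ET.fromstring(xml_text)
    note = root.find("note")
    return len(list(note)), note.text


def reparse(payload):
    """Second parse: wrap the string in a dummy root to recover the structure."""
    wrapper = ET.fromstring("<wrapper>" + payload + "</wrapper>")
    return [(child.tag, child.text) for child in wrapper]


if __name__ == "__main__":
    children, payload = cdata_payload(DOC)
    print(children)          # the parser reports no child elements
    print(payload)           # the markup arrives as a plain string
    print(reparse(payload))  # structure only appears after re-parsing
```

Nothing in the first parse tells a generic tool that the string is meant to
be XML, which is the point of the complaint above.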

This is aside from other properties such as <code>, <id> and <CUI>, which I
suspect contain identifiers -- ought not these be resources, i.e. URIs?

It looks to me as though these types of 'graphs' have all sorts of extra-RDF
arrows between the nodes.

Jonathan

From: Jim Hendler

To: Jonathan Borden ; webont
Sent: Monday, July 14, 2003 6:17 PM
Subject: Re: Revisiting AllDisjoint (was Proposed (parital) response to Ken
Laskey and questions for WG)


At 6:05 PM -0400 7/14/03, Jonathan Borden wrote:
Different kinds of cancer, for example, are hardly disjoint: one person
might have any number of different cancers -- really. If we are going to
revisit anything that might actually improve something like GALEN, how about
qualified cardinalities?

Jonathan



Jonathan - there are sets of diseases that do not/cannot co-occur, there are
sets of genetic loci that are disjoint and others that aren't, and, in fact,
any reasoning system that is going to use any sort of medical ontology for
classifying diseases (or patients, or lifestyles, or anything else) is going
to have to represent these disjoints or the fact that its reasoning is
complete and sound won't be worth much.  Sometimes these disjoints happen at
a "high level" and our argument that there won't be many of them holds
(Human v. non-human categorizes a lot of stuff), but sometimes these happen
at a low level (all the different mammals are speciated in a disjoint way by
nature, except a couple of odd cases like mules) and one would need to
express them.   I'd bet you that there are a lot more disjoint statements
needed for the NCI ontology [1] than QCRs - will you take that bet??
 -JH
p.s. And I'll just count the disjoint classes themselves, so I won't take
unfair advantage of the N^2 to 1 that disjoints have over QCRs.

[1] http://www.mindswap.org/2003/CancerOntology/

----- Original Message -----

From: Jim Hendler
To: webont
Sent: Monday, July 14, 2003 5:32 PM
Subject: Revisiting AllDisjoint (was Proposed (parital) response to Ken
Laskey and questions for WG)


I would like to take a moment to see what people think about having to
reopen this issue (or possibly move forward over an objection):


In a conversation (non electronic) with Ken Laskey, who has again raised the
issue of having an owl:allDisjoint construct (mirroring the allDifferent
construct), I pointed him to Dan Connolly's [1] earlier response to this
issue.  Ken indicated that he was not likely to accept this answer, and in
conversation he brought up many use cases where this would be needed.
Basically, he disputes our contention that since this occurs in "class
space" it is likely to be just a small number -- as he points out,  we
already have a number of ontologies in OWL that are quite large (the NCI
ontology and the GALEN ontology, are two examples).  In these ontologies,
there are numerous cases where one would want to take a large set of classes
(for example the different kinds of cancers) and make it explicit that some
of these are disjoint (and thus others aren't necessarily) -- even though N
is comparatively small, say 100 (remember the total number of classes in NCI
is about 17000), this takes ~(N^2)/2 = 5,000 (!!) OWL statements.
  Further, Ken points out that even in some of the smaller ontologies we've
created, the number of classes and the number of disjoint classes can be
almost identical (for example, military ranks are mutually disjoint within
services, but not always between - someone cannot be an Army Lieutenant and
an Army Captain, but there are rare cases where someone is "dual hatted" as
an Army Colonel and a Navy Captain, for example).  Here's the odd thing --
the number of classes in the military ranks ontology would be about 50 (if
one includes officers and non-officers), which would require on the order
of 1000 disjoint statements -- significantly dwarfing the size of the
original ontology!
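
The arithmetic behind these figures can be checked with a short sketch in
Python. It assumes one owl:disjointWith statement per unordered pair of
classes; the function names and the ex:Rank identifiers are illustrative
only and do not come from any of the ontologies mentioned.

```python
from itertools import combinations


def pairwise_disjoint_count(n):
    """Number of unordered pairs among n mutually disjoint classes: n(n-1)/2."""
    return n * (n - 1) // 2


def pairwise_disjoint_axioms(classes):
    """Yield one (a, 'owl:disjointWith', b) triple per unordered pair."""
    for a, b in combinations(classes, 2):
        yield (a, "owl:disjointWith", b)


if __name__ == "__main__":
    print(pairwise_disjoint_count(100))  # 4950, i.e. roughly (N^2)/2 = 5,000
    print(pairwise_disjoint_count(50))   # 1225, i.e. on the order of 1000

    # Hypothetical class names, used only to show the shape of the output.
    ranks = ["ex:Rank%d" % i for i in range(5)]
    for triple in pairwise_disjoint_axioms(ranks):
        print(triple)
```

The count of 1225 for 50 classes is an upper bound that treats all 50 as
mutually disjoint; if some pairs across services are allowed to overlap, as
described above, the number of statements is somewhat lower.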
 Note that in none of these use cases are we talking disjoint unions per se
(although I suppose one could create a workaround if one had a disjointunion
construct).
 I think the N^2 problem in the size of the ontologies we're already seeing
might be evidence to reopen this issue.  Alternatively, if there is a decent
workaround, we might want to document that workaround and not add this
construct.  We can also try to move forward over Ken's objection, although
my preference would be to look for a way not to.

 -JH
p.s. Please note that this would NOT require any new semantics or any major
new syntax - semantically we already have the ability to assert the pairwise
disjoints, and syntactically this could be done using the same construct we
created for allDifferent.




[1]
http://lists.w3.org/Archives/Public/public-webont-comments/2003Jun/0038.html


--
Professor James Hendler                           hendler@cs.umd.edu
Director, Semantic Web and Agent Technologies         301-405-2696
Maryland Information and Network Dynamics Lab.      301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742      *** 240-277-3388 (Cell)
http://www.cs.umd.edu/users/hendler      *** NOTE CHANGED CELL NUMBER ***





Received on Tuesday, 15 July 2003 18:37:34 UTC