Re: LANG: owl:import - Two Proposals from Jim Hendler on 2002-09-30 (www-webont-wg@w3.org from September 2002)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Mon, 30 Sep 2002 12:26:54 -0400
To: Jeff Heflin <heflin@cse.lehigh.edu>
Cc: WebOnt <www-webont-wg@w3.org>
Message-Id: <p05111734b9be26261235@[10.0.1.2]>
At 10:46 AM -0400 9/30/02, Jeff Heflin wrote:
>Jim,
>
>Thanks for the arguments in favor of and against proposal 2. I think it
>is important that all the pros and cons be identified and we have a
>debate on this so that the WG can truly make the best decision, whether
>that be in favor of proposal 1, 2 or something as yet undetermined.
>
>That said, I'd like to discuss your points:
>
>Proposal #1 requires a new MIME type
>-------------------------------------
>I find this an interesting point. Does the W3C have any documentation
>that says when a new MIME type is required or recommended? On one hand, I
>don't see why we need a new one because we are just using our own XML
>schema to describe the Ontology, imports, etc. tags. Thus, it would seem
>we could just use the XML MIME type. Certainly, the W3C doesn't require
>a new MIME type for each schema? However, on the other hand, our
>language does have special semantics that most XML schemas don't have,
>and perhaps the MIME type is used to indicate to applications that they
>should process it in a different way. This makes sense, but then it
>seems to me, OWL should have its own MIME type regardless. After all, we
>have a different semantics from RDF (even if it is just additive). So,
>it seems to me either both proposals or neither require the new MIME
>type, and I'm leaning toward both of them needing one.

I'm not the expert on this stuff (I hope Dan Connolly or Massimo will
correct me if I'm wrong), but I think moving to a separate MIME type
would require much more motivation than this.  If you insist OWL can
only be used through an XML schema, then I will point out that this
disagrees with the f2f decisions taken by the WG.  If you say no, we
want to be RDF-parsable, then we need to go by RDF rules.  In my
opinion, the location of our metadata is not an important enough
issue to justify reopening those decisions.

>
>Proposal 1 would require RDF tools to read a document twice to get
>import information
>--------------------------------------------------------------------------
>Any tools that care about import information would use an XML parser to
>extract it, and then pass the RDF subtree of the document to an RDF
>parser.
>There's absolutely no reason to read the document twice. If the
>application is a plain-old RDF application that doesn't realize this,
>then it will never have heard of imports in the first place and won't
>care about imports information.

exactly - so maybe I'm now in favor of your solution because it means 
most people won't care about imports.
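To make the single-read claim concrete, here is a minimal Python sketch
of the split Jeff describes: one XML parse that pulls out the import
information and hands the rdf:RDF subtree on to an RDF parser.
Proposal #1's actual element names are not given in this thread, so
the owlx:Document/owlx:Ontology/owlx:imports wrapper and its namespace
URI are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace standing in for Proposal #1's XML-level header.
OWLX_NS = "http://example.org/owl-xml#"

SAMPLE = f"""<owlx:Document xmlns:owlx="{OWLX_NS}" xmlns:rdf="{RDF_NS}">
  <owlx:Ontology>
    <owlx:imports resource="http://example.org/base"/>
  </owlx:Ontology>
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/Dog"/>
  </rdf:RDF>
</owlx:Document>"""


def split_document(text):
    """Parse the XML once; return (import URIs, serialized rdf:RDF subtree)."""
    root = ET.fromstring(text)
    # Import information comes from the XML header, not from the RDF graph.
    imports = [el.get("resource") for el in root.iter(f"{{{OWLX_NS}}}imports")]
    # The rdf:RDF element is passed on, unchanged, to an ordinary RDF parser.
    rdf = root.find(f"{{{RDF_NS}}}RDF")
    rdf_xml = ET.tostring(rdf, encoding="unicode") if rdf is not None else ""
    return imports, rdf_xml
```

Calling split_document(SAMPLE) gives the list of import URIs plus a
string holding only the rdf:RDF element; a plain RDF application that
reads that string never sees the imports at all, which is exactly the
point of contention.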

>I can see that there is a slight cost in tools because now all of your
>RDF tools need an extra 10 lines of code to write out proper OWL, but I
>think that cost is negligible, because the OWL tools that users will
>find easiest are those that have some built-in support for OWL. That is,
>ontologies will be a central aspect (as opposed to just another class),
>the "parseType:Collection" ugliness will be handled automatically for
>you, etc.
>In other words, in order for OWL to succeed, there will have to be OWL
>specific tools anyway.

Well, many current DAML tools, including all of my group's open
source stuff, are RDF tools to which some OWL support has been added;
the goal is to get at least OWL-Lite into all of them.  So far we have
been able to do this easily.  Changing the initial parsing, rather
than the interpretation of the graph as we process it, would be a
bigger change than you imply.  I agree there will be some
OWL-specific tools, but if all OWL use requires OWL-specific tools
then I fear it will never catch on.  In fact, the great success of
your SHOE system in gaining acceptance came from your recognition
that it was important to keep it interoperable with HTML tools --
otherwise it would have been just another KR language on the web and
gotten about as much attention as several others that never made it
to the starting gate.  OWL will gain immediate penetration by playing
nicely with RSS, RDF-XMP (Adobe's metadata system) and other existing
RDF tools, and I fear that imports, only partly defined as it is, is
not worth breaking this over.

>If people think it is important for RDF to process imports information,
>then I suggest we ask RDF Core to consider extending RDF to handle it
>correctly. This could be done by first allowing RDF to make statements
>about graphs (as opposed to about a resource that we pretend represents
>a document or a graph), and then adding an official imports property
>that has a new thing called Graph as its domain and range. We would
>also need a way to give an identifier to a graph, which could probably
>be done by adding an ID attribute in the <rdf:RDF> tag.

I suspect we could discuss with RDF Core the idea of there being
something in the <rdf:RDF> element to help, perhaps an "RDF-Profile"
or a pointer to an (optional) "RDF Header" graph.  If it appears we
need such mechanisms, I'll be happy to bring that up in the SWCG, but
I'm not yet convinced we couldn't do this with RDF as is.

>
>Proposal #1 can't have instances and classes in the same document
>------------------------------------------------------------------
>Not necessarily. Although my proposal said "class and property
>definitions go here," there is no reason why instance statements
>couldn't go in the same place, particularly if the instances were
>important to the ontology. I don't follow your argument about having to
>import the instances if you import the ontology. Why wouldn't you want
>to? If someone decides the instances are part of the ontology, then when
>you import it, you should import the whole ontology. Note, that in
>proposal #2 the same thing is true, because there an imports means you
>import the whole document. Thus, if you had classes and instances in the
>document, you import both as well.

OK, so how do I have instances in a separate document?  Can it do an
import?  If not, how do I know what semantics to impart?  If yes, are
you saying they must be in an ontology definition?  That would
definitely break a lot of tools that output RDF as triples, not as
XML documents.

>Proposal #2 will make it easier to convert DAML+OIL to OWL
>------------------------------------------------------------
>This might be true to some extent, because I believe that as it stands
>now, the conversion is simply a series of find and replaces so you could
>do it all with a simple Unix script. However, I do not believe that
>proposal #1 would require you to save a temporary file in order to do
>the conversion. In the worst case, you'd have to do two reads of the
>DAML+OIL document: one to collect up the ontology and imports
>information, and one to create the OWL document with it in the right
>place. However, since the DAML+OIL convention is to put the ontology
>stuff at the top, I think a one pass program would work in most if not
>all cases. Even so, since conversion tools only need to be used once per
>document, the two pass algorithm isn't that expensive.

Well, we use a simple Perl script to import all sorts of things into
OWL, and we also have a Python front end for RIC so we can read N3,
plus an RDF-triples-to-Parka converter and a few other things that
would have to be rewritten either to create documents or to do
explicit imports.  All of those could be rewritten, of course, or we
could ignore imports...

>
>I look forward to your counter-arguments. I think this is a very useful
>and important discussion.

The real problem is that I think you've missed the key argument: not
having the imports statement in the graph means we cannot have
non-document-based tools for handling OWL unless they ignore imports
(which is actually okay with me).  If we say the owl:Ontology
statements do go in the graph, then we can put them there.  If we say
they don't, then we lose interoperability.  Your approach cannot have
it both ways.  You think it can because you're starting from the
assumption that everything lives in documents, but that isn't true.
Once my crawler grabs stuff and pulls it into ParkaSW, for example,
all we keep around are the triples (including an extra one with a
pointer back to the original document, if we started from one).  Mike
Dean does the same with his DAML crawler [1] (5,000,000 DAML
assertions found so far on 20k+ pages): the assertions go into his
DAML DB, and thus you could not query for imports statements once
things were in the graph -- unless he puts them there, in which case
why don't we just do it in the first place?
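The triple-store point can be illustrated with a small Python sketch:
a store that keeps only (subject, predicate, object) triples, roughly
what ParkaSW or the DAML DB hold after a crawl, and a query for
owl:imports.  The example URIs are hypothetical, and the provenance
triple mentioned above is left out for brevity.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def imports_of(store, ontology):
    """Answer the query: ontology owl:imports ?x."""
    return sorted(t.o for t in store if t.s == ontology and t.p == OWL + "imports")


A = "http://example.org/a"  # hypothetical ontology URIs
B = "http://example.org/b"

# Proposal #2: the header is part of the RDF graph, so it survives loading.
store_2 = {
    Triple(A, RDF_TYPE, OWL + "Ontology"),
    Triple(A, OWL + "imports", B),
    Triple(A + "#Dog", RDF_TYPE, OWL + "Class"),
}
# Proposal #1: imports live outside the graph and vanish once only triples remain.
store_1 = {Triple(A + "#Dog", RDF_TYPE, OWL + "Class")}
```

Here imports_of(store_2, A) finds B, while imports_of(store_1, A)
comes back empty: once the document is gone, nothing in the store
records the import.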


[1] http://www.daml.org/crawler/
-- 
Professor James Hendler				  hendler@cs.umd.edu
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742	  240-731-3822 (Cell)
http://www.cs.umd.edu/users/hendler
Received on Monday, 30 September 2002 12:27:08 UTC