Re: LANG: owl:import - Two Proposals from Jeff Heflin on 2002-10-01 (www-webont-wg@w3.org from October 2002)

From: Jeff Heflin <heflin@cse.lehigh.edu>
Date: Tue, 01 Oct 2002 11:29:45 -0400
To: Jim Hendler <hendler@cs.umd.edu>
CC: WebOnt <www-webont-wg@w3.org>
Message-ID: <3D99BF69.166E327F@cse.lehigh.edu>

Please see my responses inline...

Jim Hendler wrote:
> 
> At 10:46 AM -0400 9/30/02, Jeff Heflin wrote:
> >Jim,
> >
> >Thanks for the arguments in favor of and against proposal 2. I think it
> >is important that all the pros and cons be identified and we have a
> >debate on this so that the WG can truly make the best decision, whether
> >that be in favor of proposal 1, 2 or something as yet undetermined.
> >
> >That said, I'd like to discuss your points:
> >
> >Proposal #1 requires a new MIME type
> >-------------------------------------
> >I find this an interesting point. Does the W3C have any documentation
> >that say when a new MIME type is required or recommended? On one hand, I
> >don't see why we need a new one because we are just using our own XML
> >schema to describe the Ontology, imports, etc. tags. Thus, it would seem
> >we could just use the XML MIME type. Certainly, the W3C doesn't require
> >a new MIME type for each schema? However, on the other hand, our
> >language does have special semantics that most XML schemas don't have,
> >and perhaps the MIME type is used to indicate to applications that they
> >should process it in a different way. This makes sense, but then it
> >seems to me, OWL should have its own MIME type regardless. After all, we
> >have a different semantics from RDF (even if it is just additive). So,
> >it seems to me either both proposals or neither require the new MIME
> >type, and I'm leaning toward both of them needing one.
> 
> I'm not the expert on this stuff, I hope Dan Connolly or Massimo will
> correct me if I'm wrong -- but I think going to a separate mime type
> would require much more motivation than this.  If you insist OWL can
> only be used through an XML schema, then I will point out this
> disagrees with the f2f decisions taken by the WG.  If you say no, we
> want to be RDF parsable, then we need to go by RDF rules.   I think
> the location of our metadata is not such an important issue that it
> is worth reopening the decisions (that's my opinion)

I will await Dan or Massimo's response on the issue of MIME types.

As for proposal #1, it does play by the RDF rules. As we've discussed,
even the RDF syntax documents say it is perfectly okay for the RDF to be
embedded in another XML document. Thus, I do not see this as going
against the WG's prior decision regarding RDF. I also do not think this
is just an issue of "where the metadata goes," I think it is a critical
issue about what can and cannot be expressed in RDF.

> >
> >Proposal 1 would require RDF tools to read a document twice to get
> >import information
> >--------------------------------------------------------------------------
> >Any tools that care about import information would use an XML parser to
> >extract it, and then pass the RDF subtree of the document to an RDF
> >parser.
> >There's absolutely no reason to read the document twice. If the
> >application is a plain-old RDF application that doesn't realize this,
> >then it will never have heard of imports in the first place and won't
> >care about imports information.
> 
> exactly - so maybe I'm now in favor of your solution because it means
> most people won't care about imports.

That's a risk I'd be willing to take. I think what the OWL specs say and
the OWL tools do will be more important to users than what some old RDF
tools do. So can we just settle this and go with Proposal #1? ;-) I
guess not, huh.
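
To make the single-read claim concrete, here is a minimal sketch in
Python. It assumes a Proposal #1 style document in which an outer XML
element carries the imports and the RDF sits in an embedded <rdf:RDF>
subtree. The element layout, the example.org URIs, and the function name
split_owl_document are illustrative assumptions, not the syntax from the
actual proposal.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"  # assumed namespace for the wrapper tags

# Hypothetical document layout: wrapper element, imports children, embedded RDF.
SAMPLE = """\
<owl:Ontology xmlns:owl="http://www.w3.org/2002/07/owl#"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:ex="http://example.org/univ#">
  <owl:imports rdf:resource="http://example.org/base-ontology"/>
  <owl:imports rdf:resource="http://example.org/people-ontology"/>
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/univ#Student">
      <ex:label>Student</ex:label>
    </rdf:Description>
  </rdf:RDF>
</owl:Ontology>
"""


def split_owl_document(xml_text):
    """Parse the text once; return (import URIs, rdf:RDF subtree or None)."""
    root = ET.fromstring(xml_text)
    # Imports are read from direct children of the wrapper element.
    imports = [
        el.get(f"{{{RDF_NS}}}resource")
        for el in root.findall(f"{{{OWL_NS}}}imports")
    ]
    # The embedded RDF is located in the same in-memory tree, no second read.
    rdf_subtree = root.find(f"{{{RDF_NS}}}RDF")
    return imports, rdf_subtree


imports, rdf_subtree = split_owl_document(SAMPLE)
print(imports)

# The subtree can be re-serialized and handed to an ordinary RDF parser.
rdf_xml = ET.tostring(rdf_subtree, encoding="unicode")
```

A plain RDF tool that knows nothing about the wrapper would only ever
be given rdf_xml, which matches the point above that such a tool never
sees or needs the imports information.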

> 
> >I can see that there is a slight cost in tools because now all of your
> >RDF tools need an extra 10 lines of code to write out proper OWL, but I
> >think that cost is negligible, because the OWL tools that users will
> >find easiest are those that have some built-in support for OWL. That is,
> >ontologies will be a central aspect (as opposed to just another class),
> >the "parseType:Collection" ugliness will be handled automatically for
> >you, etc.
> >In other words, in order for OWL to succeed, there will have to be OWL
> >specific tools anyway.
> 
> well, many current DAML tools, including all of my group's open
> source stuff, are RDF tools to which some OWL has been added - goal
> is to get to at least OWL-Lite for all of them.  So far we have been
> able to do this easily - changing the initial parsing, rather than
> the interpretation of the graph as we process it, would be a bigger
> change than you imply.   I agree there will be some OWL specific
> tools, but if all OWL use requires OWL specific tools then I fear it
> will never catch on -- in fact, the great success of your SHOE system
> in gaining acceptance was your recognition that it was important to
> keep it so HTML tools interoperated with it -- otherwise it would
> have just been some KR language on the web and gotten much of the
> same attention as several others that didn't get to the starting
> gate.  OWL will gain immediate penetration from playing nice with
> RSS, RDF-XMP (Adobe's metadata system) and other existing RDF tools,
> and I fear that the import thing, being only somewhat defined as it
> is, is not worth breaking this over.

Look, if RDF had the penetration of XML, then I might agree with you. In
that case, maximum compatibility with existing RDF tools would be a
critical issue. But the fact is, when compared to XML, RDF is barely
even on the radar screen. If RDF really takes off, it will be because
people want to use OWL, and not the other way around.

Still, I will grant you that there is some RDF data out there that it
might be nice to bring into the OWL world, and this could be seen as
a minor con against proposal #1. However, if we went with that proposal,
we could develop an appendix about how to work with plain-old RDF data.
For example, we might say that you assume that any schema is an ontology
and that all RDF documents import schemas whose namespaces they use.
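
One possible reading of that rule, sketched in Python: treat the
namespace of every property, and of every class used as an rdf:type
value, as an implied import. The helper names (namespace_of,
implied_imports), the split-at-last-'#'-or-'/' heuristic, and the
example.org data are assumptions for illustration only; an actual
appendix might define the rule differently.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"


def namespace_of(uri):
    """Return everything up to and including the last '#' or '/' (heuristic)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri


def implied_imports(triples, ignore=(RDF_NS,)):
    """Collect namespaces of predicates and of classes named by rdf:type."""
    found = set()
    for subject, predicate, obj in triples:
        found.add(namespace_of(predicate))
        if predicate == RDF_TYPE:
            found.add(namespace_of(obj))
    # The RDF namespace itself is not treated as an imported ontology.
    return sorted(found - set(ignore))


# Plain RDF data with no explicit imports information (made-up URIs).
triples = [
    ("http://example.org/data#jeff", RDF_TYPE,
     "http://example.org/univ#Professor"),
    ("http://example.org/data#jeff", "http://example.org/univ#worksFor",
     "http://example.org/data#lehigh"),
    ("http://example.org/data#jeff", "http://purl.org/dc/elements/1.1/title",
     "Dr."),
]

print(implied_imports(triples))
# ['http://example.org/univ#', 'http://purl.org/dc/elements/1.1/']
```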

BTW, your SHOE comparison doesn't actually work. Sure, HTML tools could
read SHOE pages without being adversely affected, and the same would go
for RDF tools reading OWL pages under proposal #1. In SHOE, I had to
create a whole suite of tools to do anything useful with the language,
and I certainly didn't get anything out of HTML pages that weren't
already marked up with SHOE.

> >If people think it is important for RDF to process imports information,
> >then I suggest we ask RDF Core to consider extending RDF to handle it
> >correctly. This could be done by first allowing RDF to make statements
> >about graphs (as opposed to about a resource that we pretend represents
> >a document or a graph), and then adding an official imports property
> >that has a new thing called Graph as its domain and range. We would
> >also need a way to give an identifier to a graph, which could probably
> >be done by adding an ID attribute in the <rdf:RDF> tag.
> 
> I suspect we could discuss with RDF Core the idea of there being
> something in the <rdf:RDF> to help - perhaps an "RDF-Profile" or a
> pointer to an (optional) "RDF Header" graph - if it appears we need
> such mechanisms I'll be happy to bring that up in the SWCG - I'm not
> yet convinced we couldn't do this with RDF as is.

Such an approach may alleviate many of my concerns with proposal #2. If
the group is leaning strongly that way, we may wish to investigate this
further.

> >
> >Proposal #1 can't have instances and classes in the same document
> >------------------------------------------------------------------
> >Not necessarily. Although my proposal said "class and property
> >definitions go here," there is no reason why instance statements
> >couldn't go in the same place, particularly if the instances were
> >important to the ontology. I don't follow your argument about having to
> >import the instances if you import the ontology. Why wouldn't you want
> >to? If someone decides the instances are part of the ontology, then when
> >you import it, you should import the whole ontology. Note that in
> >proposal #2 the same thing is true, because there an imports means you
> >import the whole document. Thus, if you had classes and instances in the
> >document, you import both as well.
> 
> OK, so how do I have instances in a separate document?  Can it do an
> import?  If not, how do I know what semantics to impart?  If yes, are
> you saying they must be in an ontology definition?  That would
> definitely break a lot of tools that output RDF as triples - not as
> XML documents.

If you look at Proposal #1, it does require an additional <owl:Data> tag
around RDF instances so that you can specify the imports information. So
you can either put the instances in an ontology (if they belong there)
or in a separate document. Now this does mean that existing RDF
documents would not be valid OWL, which I admitted above is an argument
against proposal #1, but as I said there, I think this can be mitigated
by saying how RDF data could be used in OWL.

As for tools outputting RDF as triples, I can't really speak to that
without knowing what tools you're talking about. I admit I haven't used
many RDF tools, but I think most RDF parsers have something like an
RdfGraph object or data structure that contains the triples. It should
be easy enough to subclass this with (or embed in) an OwlGraph
object/data structure which has methods for retrieving imports
information.
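
A minimal sketch of that subclassing idea in Python. RdfGraph here is a
stand-in written for this example, not the class of any particular RDF
parser, and the method names (add, match, add_import, imports) are
assumptions.

```python
class RdfGraph:
    """Stand-in for the triple container an RDF parser might return."""

    def __init__(self, triples=()):
        self.triples = list(triples)

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching the pattern; None is a wildcard."""
        return [
            (s, p, o)
            for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        ]


class OwlGraph(RdfGraph):
    """An RdfGraph that also remembers where it came from and what it imports."""

    def __init__(self, triples=(), source=None, imports=()):
        super().__init__(triples)
        self.source = source
        self._imports = list(imports)

    def add_import(self, uri):
        if uri not in self._imports:
            self._imports.append(uri)

    def imports(self):
        """Return the URIs this graph imports, kept outside the triples."""
        return list(self._imports)


def count_triples(graph):
    """Code written against RdfGraph keeps working on an OwlGraph."""
    return len(graph.triples)


g = OwlGraph(source="http://example.org/univ-data")
g.add("http://example.org/data#jeff",
      "http://example.org/univ#worksFor",
      "http://example.org/data#lehigh")
g.add_import("http://example.org/univ-ontology")

print(count_triples(g), g.imports())
```

Existing code that only knows about RdfGraph can ignore the extra
methods, while OWL-aware code can ask for the imports.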

> >Proposal #2 will make it easier to convert DAML+OIL to OWL
> >------------------------------------------------------------
> >This might be true to some extent, because I believe that as it stands
> >now, the conversion is simply a series of find and replaces so you could
> >do it all with a simple Unix script. However, I do not believe that
> >proposal #1 would require you to save a temporary file in order to do
> >the conversion. In the worst case, you'd have to do two reads of the
> >DAML+OIL document: one to collect up the ontology and imports
> >information, and one to create the OWL document with it in the right
> >place. However, since the DAML+OIL convention is to put the ontology
> >stuff at the top, I think a one pass program would work in most if not
> >all cases. Even so, since conversion tools only need to be used once per
> >document, the two pass algorithm isn't that expensive.
> 
> well, we use a simple PERL script to import all sorts of things into
> OWL, and also have a python front end for RIC so we can read N3, and
> an RDF triples to Parka converter and a few other things that would
> have to be rewritten to either create documents or to do explicit
> imports -- but all those could be rewritten, or we could ignore
> imports...

I don't see what the problem is. OWL is an evolving language. Once it is
set in stone, we'll all have to modify our tools to work with it,
regardless of whether or not it has imports.

> >
> >I look forward to your counter-arguments. I think this is a very useful
> >and important discussion.
> 
> The real problem is I think you've missed the key argument - not
> having the imports statement in the graph means we cannot have
> non-document-based tools for handling OWL unless they ignore imports
> (which is actually okay with me).  If we say the owl:ontology
> statements do go in the graph, then we can put them there.  If we say
> they don't, then we lose interoperability.  Your approach cannot have
> it both ways -- you think you can because you're starting from the
> assumption that everything lives in documents -- but that isn't true
> - once my crawler grabs stuff and pulls it into ParkaSW, for example,
> all we keep around are the triples (including an extra one with a
> pointer back to an original document if we started from one).  Mike
> Dean does the same with his Daml crawler [1]
> (5,000,000 DAML assertions found so far on 20k+ pages) -- the
> assertions go into his DAML DB, and thus you could not query for
> imports statements once things were in the graph -- unless he puts
> them there, in which case why don't we just do it in the first place?
>

No, I don't think I missed the argument. I just have a different idea of
what it means to parse an OWL document. You think the only result is a
set of RDF triples. I think the result is a data structure which
consists mostly of RDF triples, but there might be a few extra
components as well (such as imports information or your pointer back to
the original document). If you are storing this in a database (or Parka
for that matter), then you might have one or more tables for storing the
RDF triples and then another "meta"-table for the imports information
(this is basically what I did with SHOE in Parka). If you then want to
exchange this information with another application then you should
either use the OWL presentation syntax or define custom data structures
and/or file formats that preserve all relevant aspects of the language,
whatever they may be.
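
As a rough illustration of the triples-plus-meta-table layout, here is
a sketch using Python's built-in sqlite3 module. The table and column
names, the helper functions, and the example.org URIs are all
assumptions made for this example; it is not the schema used with SHOE
in Parka.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (source TEXT, subject TEXT, predicate TEXT, object TEXT);
-- "Meta" table: imports are stored beside the triples, not as triples.
CREATE TABLE imports (source TEXT, imported TEXT);
""")


def store(conn, source, triples, imports):
    """Store one parsed document: its triples plus its imports information."""
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?, ?)",
        [(source, s, p, o) for (s, p, o) in triples],
    )
    conn.executemany(
        "INSERT INTO imports VALUES (?, ?)",
        [(source, uri) for uri in imports],
    )


def imports_of(conn, source):
    """Imports stay queryable after the document itself is discarded."""
    rows = conn.execute(
        "SELECT imported FROM imports WHERE source = ? ORDER BY imported",
        (source,),
    )
    return [row[0] for row in rows]


store(
    conn,
    "http://example.org/univ-data",
    [("http://example.org/data#jeff",
      "http://example.org/univ#worksFor",
      "http://example.org/data#lehigh")],
    ["http://example.org/univ-ontology"],
)

print(imports_of(conn, "http://example.org/univ-data"))
# ['http://example.org/univ-ontology']
```

The source column plays the role of the pointer back to the original
document, so a crawler that keeps only database rows can still answer
questions about imports.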

Look, I'm really not trying to be difficult here. I'm not on some
anti-RDF crusade, although I'll admit I'm not a big fan of the language.
In [1], I listed what I consider to be a number of problems with
proposal #2. I haven't heard anyone address any of these concerns. If
these were addressed satisfactorily (perhaps by an alternate proposal)
then I would be happy to endorse that approach.

Jeff

[1] http://lists.w3.org/Archives/Public/www-webont-wg/2002Sep/0473.html
Received on Tuesday, 1 October 2002 11:29:53 UTC