Fwd: Re: LANG: owl:import - Two Proposals

>Sender: heflin@EECS.Lehigh.EDU
>Date: Wed, 02 Oct 2002 12:06:00 -0400
>From: Jeff Heflin <heflin@cse.lehigh.edu>
>Organization: Lehigh University
>X-Accept-Language: en
>To: Jim Hendler <hendler@cs.umd.edu>
>Subject: Re: LANG: owl:import - Two Proposals
>
>Jim,
>
>I think you sent this only to me, but I imagine you meant to send it to
>the mailing list. If so, please do and I will respond there.
>
>Jeff

thanks Jeff - all - here's a message I should have sent to you all - 
esp the call at the end for other people to weigh in on this issue!

>
>Jim Hendler wrote:
>>
>>  >Please see my responses inline...
>>  >
>>  >Jim Hendler wrote:
>>  >>
>>  >>  At 10:46 AM -0400 9/30/02, Jeff Heflin wrote:
>>  >>  >Jim,
>>  >>  >
>>  >>  >Thanks for the arguments in favor of and against proposal 2. I think it
>>  >>  >is important that all the pros and cons be identified and we have a
>>  >>  >debate on this so that the WG can truly make the best decision, whether
>>  >>  >that be in favor of proposal 1, 2 or something as yet undetermined.
>>  >>  >
>>  >>  >That said, I'd like to discuss your points:
>>  >>  >
>>  >>  >Proposal #1 requires a new MIME type
>>  >>  >-------------------------------------
>>  >>  >I find this an interesting point. Does the W3C have any documentation
>>  >>  >that says when a new MIME type is required or recommended? On one
>>  >>  >hand, I don't see why we need a new one because we are just using our
>>  >>  >own XML schema to describe the Ontology, imports, etc. tags. Thus, it
>>  >>  >would seem we could just use the XML MIME type. Certainly, the W3C
>>  >>  >doesn't require a new MIME type for each schema? However, on the other
>>  >>  >hand, our language does have special semantics that most XML schemas
>>  >>  >don't have, and perhaps the MIME type is used to indicate to
>>  >>  >applications that they should process it in a different way. This
>>  >>  >makes sense, but then it seems to me, OWL should have its own MIME
>>  >>  >type regardless. After all, we have a different semantics from RDF
>>  >>  >(even if it is just additive). So, it seems to me either both
>>  >>  >proposals or neither require the new MIME type, and I'm leaning toward
>>  >>  >both of them needing one.
>>  >>
>>  >>  I'm not the expert on this stuff, I hope Dan Connolly or Massimo will
>>  >>  correct me if I'm wrong -- but I think going to a separate mime type
>>  >>  would require much more motivation than this.  If you insist OWL can
>>  >>  only be used through an XML schema, then I will point out this
>>  >>  disagrees with the f2f decisions taken by the WG.  If you say no, we
>>  >>  want to be RDF parsable, then we need to go by RDF rules.   I think
>>  >>  the location of our metadata is not such an important issue that it
>>  >>  is worth reopening the decisions (that's my opinion)
>>  >
>>  >I will await Dan or Massimo's response on the issue of MIME types.
>>  >
>>  >As for proposal #1, it does play by the RDF rules. As we've discussed,
>>  >even the RDF syntax documents say it is perfectly okay for the RDF to be
>>  >embedded in another XML document. Thus, I do not see this as going
>>  >against the WG's prior decision regarding RDF. I also do not think this
>>  >is just an issue of "where the metadata goes," I think it is a critical
>>  >issue about what can and cannot be expressed in RDF.
>>
>>  Jeff - it may seem like I'm quibbling - but it is important.  If we put
>>  the RDF in an XML document, then it is no longer a "RDF/XML" document
>>  (by definition) - it is an XML document.  This violates the decision
>>  made by the WG earlier that our transfer protocol will be RDF/XML,
>>  and that XML will be non-normative.  I think reopening that issue is
>>  problematic at this late date, and the above would require the WG to
>>  rethink earlier decisions, which is a problem to me given we have an
>>  alternative.
>>
>>  >>  >
>>  >>  >Proposal 1 would require RDF tools to read a document twice to get
>>  >>  >import information
>>  >>  >--------------------------------------------------------------------------
>>  >>  >Any tools that care about import information would use an XML parser to
>>  >>  >extract it, and then pass the RDF subtree of the document to an RDF
>>  >>  >parser.
>>  >>  >There's absolutely no reason to read the document twice. If the
>>  >>  >application is a plain-old RDF application that doesn't realize this,
>>  >>  >then it will never have heard of imports in the first place and won't
>>  >>  >care about imports information.
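
A minimal sketch of the single-pass handling described above, in Python. The
wrapper layout is an assumption made for illustration only: the element
placement, the unqualified "resource" attribute, the example.org URIs and the
function name split_owl_document are hypothetical and are not taken from
Proposal #1 itself.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hypothetical Proposal-#1-style document: imports live in the XML wrapper,
# outside the embedded rdf:RDF subtree.
SAMPLE = """<owl:Ontology xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:imports resource="http://example.org/base-ontology"/>
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/Person"/>
  </rdf:RDF>
</owl:Ontology>"""


def split_owl_document(text):
    """Parse the text once; return (import URIs, rdf:RDF element or None)."""
    root = ET.fromstring(text)
    rdf_tag = f"{{{RDF_NS}}}RDF"
    # Collect imports from the wrapper (assumed attribute name: "resource").
    imports = [el.get("resource") for el in root.iter(f"{{{OWL_NS}}}imports")]
    # A plain RDF/XML document has rdf:RDF as its root; otherwise look inside.
    rdf = root if root.tag == rdf_tag else root.find(f".//{rdf_tag}")
    return imports, rdf


imports, rdf = split_owl_document(SAMPLE)
```

The rdf element returned here is the already-parsed subtree that would be
handed to an RDF parser, so the document text is read only once.
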
>>  >>
>>  >>  exactly - so maybe I'm now in favor of your solution because it means
>>  >>  most people won't care about imports.
>>  >
>>  >That's a risk I'd be willing to take. I think what the OWL specs say and
>>  >the OWL tools do will be more important to users than what some old RDF
>>  >tools do. So can we just settle this and go with Proposal #1? ;-) I
>>  >guess not, huh.
>>
>>  how well Jeff knows me :->
>>
>>  >>  >
>>  >>  >I can see that there is a slight cost in tools because now all of your
>>  >>  >RDF tools need an extra 10 lines of code to write out proper OWL, but I
>>  >>  >think that cost is negligible, because the OWL tools that users will
>>  >>  >find easiest are those that have some built-in support for OWL. That
>>  >>  >is, ontologies will be a central aspect (as opposed to just another
>>  >>  >class), the "parseType:Collection" ugliness will be handled
>>  >>  >automatically for you, etc.
>>  >>  >In other words, in order for OWL to succeed, there will have to be OWL
>>  >>  >specific tools anyway.
>>  >>
>>  >>  well, many current DAML tools, including all of my group's open
>>  >>  source stuff, are RDF tools to which some OWL has been added - goal
>>  >>  is to get to at least OWL-Lite for all of them.  So far we have been
>>  >>  able to do this easily - changing the initial parsing, rather than
>>  >>  the interpretation of the graph as we process it, would be a bigger
>>  >>  change than you imply.   I agree there will be some OWL specific
>>  >>  tools, but if all OWL use requires OWL specific tools then I fear it
>>  >>  will never catch on -- in fact, the great success of your SHOE system
>>  >>  in gaining acceptance was your recognition that it was important to
>>  >>  keep it so HTML tools interoperated with it -- otherwise it would
>>  >>  have just been some KR language on the web and gotten much of the
>>  >>  same attention as several others that didn't get to the starting
>>  >>  gate.  OWL will gain immediate penetration from playing nice with
>>  >>  RSS, RDF-XMP (Adobe's metadata system) and other existing RDF tools,
>>  >>  and I fear that the import thing, being only somewhat defined as it
>>  >>  is, is not worth breaking this over.
>>  >
>>  >Look, if RDF had the penetration of XML, then I might agree with you. In
>>  >that case, maximum compatibility with existing RDF tools would be a
>>  >critical issue. But the fact is, when compared to XML, RDF is barely
>>  >even on the radar screen. If RDF really takes off, it will be because
>>  >people want to use OWL, and not the other way around.
>>  >
>>  >Still, I will grant you that there is some RDF data out there that it
>>  >might be nice to bring into the OWL world, and this could be seen as
>>  >a minor con against proposal #1. However, if we went with that proposal,
>>  >we could develop an appendix about how to work with plain-old RDF data.
>>  >For example, we might say that you assume that any schema is an ontology
>>  >and that all RDF documents import schemas whose namespaces they use.
>>  >
>>  >BTW, your SHOE comparison doesn't actually work. Sure, HTML tools could
>>  >read SHOE pages without being adversely affected, and the same would go
>>  >for RDF tools reading OWL pages under proposal #1. In SHOE, I had to
>>  >create a whole suite of tools to do anything useful with the language,
>>  >and I certainly didn't get anything out of HTML pages that weren't
>>  >already marked up with SHOE.
>>
>>  okay, analogy wasn't great - but I think you underestimate RDF use -
>>  however, that's an external issue.  I think the important thing to me
>>  is that the ontologies and instances we create in OWL get pulled into
>>  a graph, that can be manipulated separately from using a reasoner on
>>  the ontology per se. We have LOTS of examples of this sort of OWL
>>  tool growing - and thus whether we say "RDF" per se, or RDF-like
>>  graphs, is indeed quibbling - the idea of saying you have to
>>  reprocess graphs to use them in OWL makes me nervous indeed.
>>
>>  >
>>  >>  >If people think it is important for RDF to process imports information,
>>  >>  >then I suggest we ask RDF Core to consider extending RDF to handle it
>>  >>  >correctly. This could be done by first allowing RDF to make statements
>>  >>  >about graphs (as opposed to about a resource that we pretend represents
>>  >>  >a document or a graph), and then adding an official imports property
>>  >>  >that has a new thing called Graph as its domain and range. We would
>>  >>  >also need a way to give an identifier to a graph, which could probably
>>  >>  >be done by adding an ID attribute in the <rdf:RDF> tag.
>>  >>
>>  >>  I suspect we could discuss with RDF Core the idea of there being
>>  >>  something in the <rdf:RDF> to help - perhaps an "RDF-Profile" or a
>>  >>  pointer to an (optional) "RDF Header" graph - if it appears we need
>>  >>  such mechanisms I'll be happy to bring that up in the SWCG - I'm not
>>  >>  yet convinced we couldn't do this with RDF as is.
>>  >
>>  >Such an approach may alleviate many of my concerns with proposal #2. If
>>  >the group is leaning strongly that way, we may wish to investigate this
>>  >further.
>>
>>  We can discuss this at the f2f - I'm hoping Dave Beckett will join us
>>  for at least the social events, and we (either you and I, or the WG)
>>  might wish to discuss this with him.
>>
>>  >>  >
>>  >>  >Proposal #1 can't have instances and classes in the same document
>>  >>  >------------------------------------------------------------------
>>  >>  >Not necessarily. Although my proposal said "class and property
>>  >>  >definitions go here," there is no reason why instance statements
>>  >>  >couldn't go in the same place, particularly if the instances were
>>  >>  >important to the ontology. I don't follow your argument about having to
>>  >>  >import the instances if you import the ontology. Why wouldn't you want
>>  >>  >to? If someone decides the instances are part of the ontology, then
>>  >>  >when you import it, you should import the whole ontology. Note that in
>>  >>  >proposal #2 the same thing is true, because there an imports means you
>>  >>  >import the whole document. Thus, if you had classes and instances in
>>  >>  >the document, you import both as well.
>>  >>
>>  >>  OK, so how do I have instances in a separate document?  Can it do an
>>  >>  import?  If not, how do I know what semantics to impart?  If yes, are
>>  >>  you saying they must be in an ontology definition?  That would
>>  >>  definitely break a lot of tools that output RDF as triples - not as
>>  >>  XML documents.
>>  >
>>  >If you look at Proposal #1, it does require an additional <owl:Data> tag
>>  >around RDF instances so that you can specify the imports information. So
>>  >you can either put the instances in an ontology (if they belong there)
>>  >or in a separate document. Now this does mean that existing RDF
>>  >documents would not be valid OWL, which I admitted above is an argument
>>  >against proposal #1, but as I said there, I think this can be mitigated
>>  >by saying how RDF data could be used in OWL.
>>
>>  yes, but again I think this is moving backwards against previous
>>  WG decisions
>>
>>  >As for tools outputting RDF as triples, I can't really speak to that
>>  >without knowing what tools you're talking about. I admit I haven't used
>>  >many RDF tools, but I think most RDF parsers have something like an
>>  >RdfGraph object or data structure that contains the triples. It should
>>  >be easy enough to subclass this with (or embed in) an OwlGraph
>>  >object/data structure which has methods for retrieving imports
>>  >information.
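
The subclassing idea above could look roughly like the following Python
sketch. RdfGraph here is only a stand-in for whatever triple container a given
RDF parser provides; both class names come from the paragraph above, but the
constructors and the add() and imports() methods are hypothetical and do not
describe any real library.

```python
class RdfGraph:
    """Stand-in for a parser's triple container (hypothetical)."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))


class OwlGraph(RdfGraph):
    """An RdfGraph that also carries imports information beside the triples."""

    def __init__(self, triples=(), imports=()):
        super().__init__(triples)
        self._imports = list(imports)

    def imports(self):
        # Return a copy so callers cannot alter the stored list by accident.
        return list(self._imports)


graph = OwlGraph(imports=["http://example.org/base-ontology"])
graph.add("http://example.org/jim",
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
          "http://example.org/Person")
```

Code written against RdfGraph keeps working on an OwlGraph unchanged; only
import-aware code needs to call imports().
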
>>
>>  Jeff - we take RDF stuff (RDF, RDFS, OWL) and run it through the W3C
>>  validator to turn into Ntriples which are read into PARKA and
>>  accessible over the web (with Parka inferencing) - I don't see any way
>>  to do this given the above without having to build new tools or build
>>  complex processes.  By the way, one of the things we are playing with
>>  grabbing this way is the results of an Expose-based crawler, so we
>>  want to work on large numbers of triples with no preprocessing.
>>  >
>>  >
>>  >>  >Proposal #2 will make it easier to convert DAML+OIL to OWL
>>  >>  >------------------------------------------------------------
>>  >>  >This might be true to some extent, because I believe that as it stands
>>  >>  >now, the conversion is simply a series of find and replaces so you
>>  >>  >could do it all with a simple Unix script. However, I do not believe
>>  >>  >that proposal #1 would require you to save a temporary file in order to
>>  >>  >do the conversion. In the worst case, you'd have to do two reads of the
>>  >>  >DAML+OIL document: one to collect up the ontology and imports
>>  >>  >information, and one to create the OWL document with it in the right
>>  >>  >place. However, since the DAML+OIL convention is to put the ontology
>>  >>  >stuff at the top, I think a one pass program would work in most if not
>>  >>  >all cases. Even so, since conversion tools only need to be used once
>>  >>  >per document, the two pass algorithm isn't that expensive.
>>  >>
>>  >>  well, we use a simple PERL script to import all sorts of things into
>>  >>  OWL, and also have a python front end for RIC so we can read N3, and
>>  >>  an RDF triples to Parka converter and a few other things that would
>>  >>  have to be rewritten to either create documents or to do explicit
>>  >>  imports -- but all those could be rewritten, or we could ignore
>>  >>  imports...
>>  >
>>  >I don't see what the problem is. OWL is an evolving language. Once it is
>>  >set in stone, we'll all have to modify our tools to work with it,
>>  >regardless of whether or not it has imports.
>>
>>  I agree that this was a minor issue - it's why I had it in a "p.s."
>>  instead of in the main body of my first response - I agree to ignore
>>  this as an important issue about OWL for now...
>>
>>  >>  >
>>  >>  >I look forward to your counter-arguments. I think this is a very useful
>>  >>  >and important discussion.
>>  >>
>>  >>  The real problem is I think you've missed the key argument - not
>>  >>  having the imports statement in the graph means we cannot have
>>  >>  non-document-based tools for handling OWL unless they ignore imports
>>  >>  (which is actually okay with me).  If we say the owl:ontology
>>  >>  statements do go in the graph, then we can put them there.  If we say
>>  >>  they don't, then we lose interoperability.  Your approach cannot have
>>  >>  it both ways -- you think you can because you're starting from the
>>  >>  assumption that everything lives in documents -- but that isn't true
>>  >>  - once my crawler grabs stuff and pulls it into ParkaSW, for example,
>>  >>  all we keep around are the triples (including an extra one with a
>>  >>  pointer back to an original document if we started from one).  Mike
>>  >>  Dean does  the same with his Daml crawler [1]
>>  >>  (5,000,000 DAML assertions found so far on 20k+ pages) -- the
>>  >>  assertions go into his DAML DB, and thus you could not query for
>>  >>  imports statements once things were in the graph -- unless he puts
>>  >>  them there, in which case why don't we just do it in the first place?
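
To illustrate the point about triple-only stores, here is a small Python
sketch of imports kept as ordinary triples, so that a tool holding nothing but
the graph can still query them. The example.org URIs and the function name
imports_in_graph are hypothetical; the set of tuples stands in for a real
store such as the ones mentioned above.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Everything the crawler keeps is a (subject, predicate, object) triple.
triples = {
    ("http://example.org/onto", OWL_IMPORTS, "http://example.org/base-ontology"),
    ("http://example.org/jim", RDF_TYPE, "http://example.org/Person"),
}


def imports_in_graph(graph, doc):
    """Return the URIs that doc imports, read straight from the triples."""
    return sorted(o for s, p, o in graph if s == doc and p == OWL_IMPORTS)


found = imports_in_graph(triples, "http://example.org/onto")
```

No document needs to be kept around for this query; if the imports statement
never entered the graph, the same call would simply return an empty list.
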
>>  >>
>>  >
>>  >No, I don't think I missed the argument. I just have a different idea of
>>  >what it means to parse an OWL document. You think the only result is a
>>  >set of RDF triples. I think the result is a data structure which
>>  >consists mostly of RDF triples, but there might be a few extra
>>  >components as well (such as imports information or your pointer back to
>>  >the original document). If you are storing this in a database (or Parka
>>  >for that matter), then you might have one or more tables for storing the
>>  >RDF triples and then another "meta"-table for the imports information
>>  >(this is basically what I did with SHOE in Parka). If you then want to
>>  >exchange this information with another application then you should
>>  >either use the OWL presentation syntax or define custom data structures
>>  >and/or file formats that preserve all relevant aspects of the language,
>>  >whatever they may be.
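
A rough sketch of the "meta"-table layout described above, using Python's
built-in sqlite3 module. The table names, column names and helper functions
are assumptions chosen for illustration; they are not the schema used with
SHOE or Parka.

```python
import sqlite3


def make_store():
    """Create an in-memory store: one table for triples, one for imports."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (doc TEXT, s TEXT, p TEXT, o TEXT)")
    db.execute("CREATE TABLE imports (doc TEXT, imported TEXT)")
    return db


def load(db, doc, triples, imports):
    """Store a parsed document: its triples plus its imports information."""
    db.executemany("INSERT INTO triples VALUES (?, ?, ?, ?)",
                   [(doc, s, p, o) for s, p, o in triples])
    db.executemany("INSERT INTO imports VALUES (?, ?)",
                   [(doc, uri) for uri in imports])


def imports_of(db, doc):
    """Look up imports in the meta-table without touching the triples."""
    rows = db.execute(
        "SELECT imported FROM imports WHERE doc = ? ORDER BY imported", (doc,))
    return [row[0] for row in rows]


db = make_store()
load(db, "http://example.org/onto",
     [("http://example.org/Person",
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
       "http://www.w3.org/2000/01/rdf-schema#Class")],
     ["http://example.org/base-ontology"])
```

Calling imports_of(db, "http://example.org/onto") reads only the meta-table.
Exchanging this store with another application would still require a format
that carries both tables, which is the trade-off discussed in this thread.
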
>>
>>  i.e. we should lose interoperability, defeating the whole purpose of
>>  moving towards standards
>>
>>  >
>>  >Look, I'm really not trying to be difficult here. I'm not on some
>>  >anti-RDF crusade, although I'll admit I'm not a big fan of the language.
>>  >In [1], I listed what I consider to be a number of problems with
>>  >proposal #2. I haven't heard anyone address any of these concerns. If
>>  >these were addressed satisfactorily (perhaps by an alternate proposal)
>>  >then I would be happy to endorse that approach.
>>
>>  fair enough, but I thought those were valid disadvantages, just not
>>  as important as the disadvantages of proposal 1.
>>
>>  >Jeff
>>  >
>>  >[1] http://lists.w3.org/Archives/Public/www-webont-wg/2002Sep/0473.html
>>
>>  --
>>  Professor James Hendler                           hendler@cs.umd.edu
>>  Director, Semantic Web and Agent Technologies     301-405-2696
>>  Maryland Information and Network Dynamics Lab.    301-405-6707 (Fax)
>>  Univ of Maryland, College Park, MD 20742          240-731-3822 (Cell)
>>  http://www.cs.umd.edu/users/hendler


-- 
Professor James Hendler				  hendler@cs.umd.edu
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742	  240-731-3822 (Cell)
http://www.cs.umd.edu/users/hendler

Received on Wednesday, 2 October 2002 12:51:01 UTC