Re: XML vs. RDF from William Bug on 2006-07-10 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Mon, 10 Jul 2006 10:20:48 -0400
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>
Message-Id: <A30DE111-1B1E-4479-BECC-3AD77950F974@DrexelMed.edu>
Dear Phillip,

I agree with every comment you make - 100%.

Thanks for taking the time to address some of these issues.

See below for very brief addenda.

Cheers,
Bill

On Jul 10, 2006, at 6:36 AM, Phillip Lord wrote:

>
>>>>>> "WB" == William Bug <William.Bug@DrexelMed.edu> writes:
>
>   WB> 	5) OWL isn't perfect for representing formal ontological
>   WB> frameworks - besides we're just representing terminologies, not
>   WB> building an ontology
>
>
> OWL is sufficient for representing terminologies as far as I can tell.
> To suggest that it isn't perfect for representing formal ontologies is
> true, but slightly misleading. We don't have a perfect methodology for
> representing formal ontologies. That OWL is not perfect is therefore a
> relatively trivial statement.
>
>
>   WB> 		a) Even when assembling a terminology, you will be
>   WB> hard pressed not to represent some implicit semantic relations
>   WB> in your graph.
>
> Not sure how this relates to OWL.

It doesn't.  That's just another aspect of the debate from which this  
whole constellation of issues has arisen.

>
>
>
>   WB> 		b) Work is ongoing to expand the semantic expressivity
>   WB> of OWL (see Chris M.'s comment re: including a formalism to
>   WB> accommodate time).
> 	
> It's worth mentioning that there are some difficult constraints with
> respect to time. Don't quote me on this, as I am well out of my area
> of knowledge. However, within the constraints of a decidable logic,
> we do not yet know how to represent time based statements, while still
> maintaining expressivity in other ways.
>
> The point is that the limitations in OWL expressivity are often
> deliberate, not an over-sight.

Agreed on all points.

My hope is that there is some "low-hanging fruit" on the issue of time as
it relates to development and disease processes, where achieving even
partial expressivity for time will let us make some progress.
In the context of several projects I'm working on - especially the
BIRN project, where the focus is neurodegenerative disease - small
steps toward formal representation of durations and points in time in
an ontological context will be a big help, so long as they don't lead
us down a path that precludes later re-use of what we do now, once
formalisms to support the larger steps become available.

>
>
>   WB> 	6) We can leave it to others to create XSLT converters to move
>   WB> the XML-only resources into the RDF++ space
>   WB> 		Philip & Chris M. have both given clear answers to
>   WB> this ill-advised use of XSLT.
>
> I think you may have misinterpreted this. My point is that XSLT is not
> good for operating on RDF because there are many syntactic ways of
> representing the same thing. In general, I wouldn't use XSLT at all as
> I hate it, but that's a different issue.

This was exactly how I took your point.  Sorry for any confusion.
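
A minimal sketch of the syntactic-variety point, in Python rather than XSLT. The ex: vocabulary and the example.org URIs are made up for illustration. Both documents below encode the same single triple, yet a path-based lookup of the kind an XSLT template would rely on matches only the first form:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms#"  # hypothetical vocabulary, for illustration only

# Form A: the property is written as a child element.
DOC_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://example.org/neuron">
    <ex:label>neuron</ex:label>
  </rdf:Description>
</rdf:RDF>"""

# Form B: the same triple, abbreviated as a property attribute.
DOC_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://example.org/neuron" ex:label="neuron"/>
</rdf:RDF>"""


def labels_by_path(doc):
    """Find labels the way a path-matching template would: by element path only."""
    root = ET.fromstring(doc)
    path = f"{{{RDF}}}Description/{{{EX}}}label"
    return [el.text for el in root.findall(path)]


print(labels_by_path(DOC_A))  # finds the label in the element form
print(labels_by_path(DOC_B))  # finds nothing in the attribute form
```

A stylesheet would need a separate rule for each such variant, which is why working on the parsed triples is usually the safer choice.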

I think Chimezie's follow-up point to your comment is also important
to take into account here.  XSLT is a means to this end: so long as
you have a very clear application in mind and are willing to invest
the manual labor required to map the semantics in the body of your
XSLT document, this is a viable route.  I believe all those who've
weighed in on this topic so far would agree there's little point in
going in the opposite direction, unless you are only looking to take
the RDF instances you created through the forward process - exactly
as they were, with no changes of any kind - and turn them back into
their XML-only equivalent.  The overwhelming majority of
modifications you'd likely make to the RDF instances would leave you
with semantic ambiguities or enriched expressions no longer supported
by the XSD you are trying to translate back to.
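
A rough sketch of why the return trip breaks, under the assumption that the XML-only schema allows just a fixed set of single-valued elements. The field names, terms, and the back_to_record helper are all hypothetical:

```python
# Hypothetical: the only elements the XML-only schema (XSD) allows, one value each.
SCHEMA_FIELDS = {"label", "is_a"}


def back_to_record(subject, triples):
    """Map triples about `subject` back to schema fields; collect what won't fit."""
    record, unsupported = {}, []
    for s, p, o in triples:
        if s != subject:
            continue
        if p in SCHEMA_FIELDS and p not in record:
            record[p] = o
        else:
            # Either the schema has no such element, or the slot is already taken.
            unsupported.append((s, p, o))
    return record, unsupported


# Instances exactly as produced by the forward (XML -> RDF) conversion.
original = [
    ("CA1", "label", "CA1 field"),
    ("CA1", "is_a", "hippocampal_region"),
]
# The same instances after one enrichment made on the RDF side.
enriched = original + [("CA1", "part_of", "hippocampus")]

print(back_to_record("CA1", original))
print(back_to_record("CA1", enriched))
```

The untouched instances map back cleanly, while the single added part_of statement has nowhere to go in the old schema.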

As Chimezie pointed out, the XML binding for semantic web formalisms
is no accident but a deliberate design goal.  My fear is that the
availability of this XML --> XSD ==> XSLT --> RDF route leads some to
believe XML-only is a "safe" option for expressing semantic
information now, even if you may eventually want to move that
information into the RDF++ space.  This perception is especially
likely when assembling a strictly hierarchical terminological
resource.  My concerns are:
	1) In representing complex biomedical terminologies, you can rarely
stick to a pure hierarchical graph, one where you are careful to use
only one type of relation - e.g., 'is_a' subsumption only, ONE of the
MANY types of mereological relation, etc.  There are definitely ways
of representing terminologies that include this additional complexity
in XML-only; however, once you include that additional
semantic subtlety, you are really making much more work for yourself
by not moving into RDF++.
	2) If you are expressing the terminology without any lexical variety
and focussed on strict 'is_a' subsumption (i.e., hypernym--hyponym
relations ONLY - no synonyms, no homographic homonyms, no meronyms,
etc.), then the terms you pick - "preferred" terms, if you will -
really just become stand-in, human-readable labels in an ontological
graph, where the graph itself is purely semantic in nature.  You
would again be better off representing that now in RDF++, where the
syntax is specifically designed to support concise representation of
such a graph.  If you extend that approach by creating separate
graphs for 'is_a' subsumption, specific mereological graphs to
represent different manners of "parthood", etc., you are then really
making MUCH MORE work for yourself by avoiding RDF++ technologies for
expressing your terminology.
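
To make the two concerns above a little more concrete, here is a small sketch in plain Python with no RDF library. It assumes statements are held as (subject, relation, object) triples; the term and relation names are illustrative only. Once relations are explicit in the triples, 'is_a' and 'part_of' can share one graph and still be traversed separately:

```python
# Each statement is a (subject, relation, object) triple; names are illustrative.
TRIPLES = {
    ("hippocampus", "is_a", "brain_region"),
    ("brain_region", "is_a", "anatomical_structure"),
    ("hippocampus", "part_of", "temporal_lobe"),
    ("temporal_lobe", "part_of", "cerebrum"),
    ("CA1", "part_of", "hippocampus"),
}


def ancestors(term, relation, triples):
    """Follow a single relation type transitively, starting from `term`."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, r, o in triples:
            if s == current and r == relation and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(ancestors("hippocampus", "is_a", TRIPLES)))
print(sorted(ancestors("CA1", "part_of", TRIPLES)))
```

Adding a further kind of "parthood" is then just another relation name in the same set, rather than another parallel hierarchy to maintain.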

To express my opinion on this issue with a bit more nuance:
RDF++ specs+tools (I include OWL in this "universe") are not the
most complete - or the only - means to formally express semantic
info.  However, if you are aware of the limits of the technology,
determine you can still accommodate the KE/KR/KD problem at hand, and
plan your design with these limits in mind, RDF++ can be
extremely concise and semantically expressive and, from my point
of view, of much greater value for expressing semantic information
than the XML-only approach.

>
>
> Phil

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu







Received on Monday, 10 July 2006 14:46:04 UTC