XML vs. RDF from William Bug on 2006-07-08 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Sat, 8 Jul 2006 00:42:12 -0400
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>
Message-Id: <19239B1D-4082-474C-9AF8-663C78955440@DrexelMed.edu>
Dear Phillip,

Many thanks for this concise and accessible qualification to
Chimezie's explanation.  I was a little crestfallen when I saw his
original answer to Trish, and thought I really had misunderstood an
issue that is becoming very important to several projects with which
I'm involved.

There have been several debates recently in the neuroinformatics
community as to whether an XML-only approach (XML, XSD, XSLT, XLink)
will suffice when creating sub-domain knowledge resources,
especially if you are just collecting terminologies, as opposed to
creating a full-blown, well-founded ontology.  The question is whether
it is really necessary to go to Semantic Web tech - i.e., the
constellation of RDF-associated specs (RDF++ - sorry to add to the
acronym soup - this is just a shorthand for this email) and the
growing number of utilities for manipulating RDF/OWL and all the
other RDF-related formalisms.

The general arguments against moving on to RDF++ seem to be:
	1) It's extra work to fashion the assembled terminologies in such a  
way so as to be able to represent them in RDF++
	2) RDF++ is relatively new and unproven on a large scale (i.e., has  
limited adoption)
	3) The RDF++ toolset is consequently small, of questionable  
robustness, and not ubiquitous (in the sense the Xerces parser is  
ubiquitous);
	4) RDF++ are all XML-based.  Whatever you do with them, you could do  
yourself with a little extra work;
	5) OWL isn't perfect for representing formal ontological frameworks  
- besides we're just representing terminologies, not building an  
ontology
	6) We can leave it to others to create XSLT converters to move the  
XML-only resources into the RDF++ space
	7) XLink can provide typed relations not unlike the predicate in an  
RDF triplet
	8) RDF syntax is more opaque to a human than XML/XSD - e.g., more  
difficult for a human to read.
	9) Proponents of RDF++ argue that XML has limited semantic  
expressivity, but that's just not true.

I've really only started using RDF++ technologies myself over the
last year or so, but my naive answers have typically been:

	1) It's extra work to fashion the assembled terminologies in such a  
way so as to be able to represent them in RDF++
		a) The work you'd do in order to correctly represent your knowledge
resource in RDF++ doesn't really add a very significant percentage of
time to the overall effort, and doing so will force you to be more
explicit about the semantic relations between the terms.  There would
be a moderate amount of work just developing a working knowledge of
the technologies associated with RDF++, but you'll be better off for
it in the end.  I think it's this latter issue that is really at the
bottom of most of the concern.  Folks have invested heavily in the
XML-only array of technologies over the last 10 years.  They are
somewhat knowledgeable regarding RDF++ technologies but don't yet
have a complete working knowledge of that space.

	2) RDF++ is relatively new and unproven on a large scale (i.e., has  
limited adoption)
		It is neither new, nor is it unproven on a large scale.  The number
of applications in biomedical informatics is growing fast
([http://www.w3.org/2005/04/swls/],
[http://esw.w3.org/topic/SemanticWebForLifeSciences]), though
admittedly it's much less visible in neuroinformatics right now.  This
too is changing rapidly, where the focus is on semantically-based
information processing
([http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Neuroscience_Semantic_Web_Projects],
[http://sciencecommons.org/data/neurocommons], and several projects in
development by other participants on the HCLSIG
[http://esw.w3.org/topic/SemanticWebForLifeSciencesPeople] - e.g., Kei
Cheung, Don Doherty, etc.).  Still, many of the projects are just in
the offing.
	
	3) The RDF++ toolset is consequently small, of questionable  
robustness, and not ubiquitous (in the sense the Xerces parser is  
ubiquitous);
		This is also not true, as best I can tell.  Consider CWM,
Protégé-OWL, and others [http://esw.w3.org/topic/], as well as those
more specific to the HC/LS space (http://www.w3.org/2001/sw/hcls/#resources).
	
	4) RDF++ are all XML-based.  Whatever you do with them, you could do  
yourself with a little extra work;
		This seems to contradict point 1 above.  The "little extra work"  
is not trivial, and an overwhelming amount of it will need to meet  
general requirements for manipulating semantic information  
effectively - the driver behind the creation of RDF++.  You'll have a  
lot more code to write and maintain, if you don't take advantage of  
Semantic Web tech.  Also, as Eric N. has mentioned: "An example of a  
classic SW Myth: RDF is not based at all on XML -- it was defined as  
a graph-relational model outside of XML, and can be represented  
TRIPLES, TURTLES, N3, and XML.  By definition, it is broader than any  
XML schema-- it can live as a meta-definition for a RDBM or even a KB."
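
Eric's point that RDF is a graph model rather than an XML format can be
illustrated with a small sketch.  It uses only the Python standard
library rather than a real RDF toolkit; the example.org URIs and the
helper names to_ntriples and from_ntriples are made up for illustration,
and the parser handles only the URI-only subset of N-Triples.

```python
# Hypothetical example: a graph is just a set of (subject, predicate, object)
# triples; N-Triples is one of several concrete syntaxes for writing it down.
EX = "http://example.org/"  # made-up namespace, for illustration only

graph = {
    (EX + "neuron1", EX + "locatedIn", EX + "hippocampus"),
    (EX + "hippocampus", EX + "partOf", EX + "brain"),
}

def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

def from_ntriples(text):
    """Parse the URI-only subset of N-Triples produced by to_ntriples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        s, p, o, dot = line.split()
        assert dot == "."
        # Strip the surrounding angle brackets from each URI
        triples.add((s[1:-1], p[1:-1], o[1:-1]))
    return triples

text = to_ntriples(graph)
# Round-tripping through the concrete syntax gives back the same graph
assert from_ntriples(text) == graph
```

The graph itself never mentions XML; the serialization is a separate,
swappable step.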
	
	5) OWL isn't perfect for representing formal ontological frameworks  
- besides we're just representing terminologies, not building an  
ontology
		a) Even when assembling a terminology, you will be hard pressed not  
to represent some implicit semantic relations in your graph.  This is  
even true for some flat lists of terms - e.g., 'driver', 'iron', and  
'putter' are all types of 'golf club'.
		b) Work is ongoing to expand the semantic expressivity of OWL (see  
Chris M.'s comment re: including a formalism to accommodate time).
	
	6) We can leave it to others to create XSLT converters to move the  
XML-only resources into the RDF++ space
		Phillip & Chris M. have both given clear answers to this ill-advised  
use of XSLT.  The other issue Eric N. has described clearly is the  
N**2 problem - the combinatorial proliferation of XSLTs as more XSDs  
are added to the mix.  Eric: "Data assembly from new sources and  
modalities is virtually impossible via XML schemas , after the  
schemas have been defined."
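
The N**2 growth is easy to work out.  As a back-of-the-envelope sketch
(the function names are hypothetical, and it assumes one one-way XSLT
per ordered pair of schemas, compared against one converter into and
one out of a single shared model per schema):

```python
def pairwise_converters(n_schemas: int) -> int:
    """One-way converters needed to translate directly between every pair."""
    return n_schemas * (n_schemas - 1)

def hub_converters(n_schemas: int) -> int:
    """Converters needed if each schema maps to and from one shared model."""
    return 2 * n_schemas

# Print: number of schemas, direct converters, converters via a shared model
for n in (2, 5, 10, 20):
    print(n, pairwise_converters(n), hub_converters(n))
```

Under these assumptions, 10 schemas need 90 direct converters but only
20 when everything passes through a shared model.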
	
	7) XLink can provide typed relations not unlike the predicate in an  
RDF triplet.  There's nothing special about the use of URIs to  
provide these links in RDF
		Yes - but:
			a) Using XLink to do this forces you to reference semantic  
entities in relation to an entire document structure, as opposed to a  
more direct, simple URI based link;
			b) This was not the design intention of XLink, so it's likely to
be a problematic mechanism to rely on for representing complex
semantic networks.
			c) Again, as Eric N. has said: "URI's are a central part of the  
"node definition" in RDF, but not in XML. You can do what you want  
with URI's in XML, and that becomes a problem. RDF says all URI's  
must merge, while XML says "you need to explicitly define that in  
your parser and tree handler"-- yuck! No guarantee others will do  
that when they read your XML content. URI meta-semantics are only  
defined in RDF/OWL."
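
A minimal sketch of what "all URI's must merge" buys you, using plain
Python sets of triples rather than an RDF library.  The URIs and the
describe helper are made up for illustration, and the sketch ignores
blank nodes, which need to be kept apart in a real RDF merge.

```python
# Hypothetical data: two independently produced graphs (sets of triples)
# that happen to use the same URI for the same brain region.
REGION = "http://example.org/region/hippocampus"  # made-up URI

lab_a = {(REGION, "http://example.org/label", "hippocampus")}
lab_b = {(REGION, "http://example.org/partOf", "http://example.org/region/brain")}

# Merging is plain set union: no schema-specific code is needed.
merged = lab_a | lab_b

def describe(node, triples):
    """Collect every (predicate, object) pair asserted about `node`."""
    return sorted((p, o) for s, p, o in triples if s == node)

# Both statements now hang off the single node named by REGION.
for predicate, obj in describe(REGION, merged):
    print(predicate, obj)
```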
	
	8) RDF syntax is more opaque to a human than XML/XSD - e.g., more  
difficult for a human to read.
		Not true.  There's the N3 formalism and many tools providing a  
much easier way for humans to review formal, semantically specified  
data sets than sorting through XML/XSD/XSLT mappings to ontologies,  
for instance.  Eric: "Tim B-L himself says, RDF predicates serve  
human expressivity first, machines second! It's easy enough to write  
an RDF to english viewer for those addicted to reading XML."

	9) Proponents of RDF++ argue that XML has limited semantic  
expressivity, but that's just not true.
		I think this argument is completely inverted.  The problem is XML  
has nearly unlimited expressivity, but any semantic meaning you want  
to imbue your XML with must be made explicit in the parsers you  
write.  When you hope to align your semantic content to others, they  
must also represent equivalent semantic entities and relations  
according to the logic in your parser - not a very scalable  
approach.  I think the Nature Biotech article by Xiaoshu and his  
colleagues clearly explains that issue.
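
To make the "semantics live in the parser" point concrete, here is a
small sketch using Python's standard xml.etree.ElementTree.  Both XML
encodings and both reader functions are invented for illustration; they
are not taken from any real schema.

```python
import xml.etree.ElementTree as ET

# Two made-up XML encodings of the same fact: "putter is a kind of golf club".
DOC_A = '<term name="putter"><isA>golf club</isA></term>'
DOC_B = '<category name="golf club"><member>putter</member></category>'

def parse_a(xml_text):
    """Reader for encoding A: the child element names the parent class."""
    root = ET.fromstring(xml_text)
    return {(root.get("name"), "isA", root.findtext("isA"))}

def parse_b(xml_text):
    """Reader for encoding B: the nesting direction is reversed."""
    root = ET.fromstring(xml_text)
    return {(m.text, "isA", root.get("name")) for m in root.findall("member")}

# Only the hand-written readers know that both documents mean the same thing.
assert parse_a(DOC_A) == parse_b(DOC_B) == {("putter", "isA", "golf club")}
```

Every new encoding needs another reader like these, and anyone
consuming the data has to reproduce the same logic.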
		
A lot of the counter arguments to these statements come down to:
	I) if you try to perform semantically-based KE/KR/KD with XML-only,  
you will have a lot more code to write & maintain YOURSELF - and much  
of it will reproduce what you'd get automatically using RDF++.  	
	II) You just can't provide the flexibility, guaranteed resolvability  
of resources, and efficient expression required when representing  
semantic relations in the rigid, strictly hierarchical document- 
oriented world of XML-only, so you'll likely fall short on a lot of  
your requirements.

I'd really appreciate hearing the views both pro & con on these  
issues from others on this list.

Thanks again, Phillip, for your lucid and concise explanation.

Cheers,
Bill

On Jul 7, 2006, at 6:35 AM, Phillip Lord wrote:

>
>>>>>> "TW" == Trish Whetzel <whetzel@pcbi.upenn.edu> writes:
>
>   TW> Hi all,
>
>   TW> As a terribly simple question, is it possible to take the actual
>   TW> FuGE-ML that is generated on a per instance reporting of an
>   TW> experiment/study/investigation and then convert that to RDF for
>   TW> use with semantic web technologies?
>
>
> Converting between one syntax and another is fairly simple, and there
> are some reasonable tools for it. XSLT would work for converting XML
> into RDF. I wouldn't like to use it for converting the other way
> (actually I wouldn't like to use it at all, but this is personal
> prejudice!).
>
> This is assuming, however, that the semantics of the two
> representations are compatible. To give an example, syntactically it
> is possible to convert between the GO DAG and an OWL representation of
> GO. However, the GO part-of relationship doesn't distinguish
> universal and existential, while OWL forces you to make this
> distinction; you can't sit on the fence.
>
> So, the simple answer to a simple question is: it depends. I wouldn't
> assume that FuGE-ML will be convertible into a given
> ontology or representation in RDF, unless a reasonable amount of care
> is taken in the design of FuGE-ML or the ontology to ensure that it
> can happen.
>
> Course, you could always hack it with some rules and a bit of human
> intervention. That works as well.
>
> Cheers
>
> Phil
>
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Saturday, 8 July 2006 04:42:21 UTC