Re: XML vs. RDF from William Bug on 2006-07-08 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Sat, 8 Jul 2006 11:51:42 -0400
To: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>
Message-Id: <870BFC80-8E54-475A-A51E-B57A3673F5BA@DrexelMed.edu>
Hi Chimezie,

I would say we vigorously agree, and any appearance to the contrary  
was my inability to more clearly describe the issue at hand.

To be honest, OS X Mail crashed on me as I tried to send out the  
first version of this post which was much more nuanced.  I should  
have taken that as a sign from the almighty Leibniz this was an email  
better left unsent.  ;-)

In particular, I'm a very heavy user of XML-only technologies -  
couldn't live without them - for all the reasons you state below.

My concern is that, given the nature of the problem under discussion in  
the debate to which I refer - semantic representation, semantic-based  
integration, and supporting semantic queries on federated  
neuroscientific data repositories - the problem is just the opposite:  
RDF++ technologies are not yet being properly vetted in the larger  
neuroinformatics community against the requirements at hand, and  
some are inclined to go with XML-only technologies because it's what  
they know and what they are invested in.  There are many  
counterexamples to this statement - projects being worked on by folks on  
this list and others - but they still have limited visibility in the  
larger community of neuroinformatics researchers.

I should also state I didn't mean to indicate I interpreted what you  
said as implying XSD/XSLT --> RDF was the preferred way of keeping  
the XML-only space in sync with the RDF++ space - when such a  
specific need arises - but merely a potential route when confronted  
with the specific task of moving XML-only based data representations  
into the RDF++ space.  My concern was that this simple answer,  
without the caveats added by Phillip & Chris M., could give  
some the impression this was what you were saying.

When it comes to performing KE on existing data sources, tools such  
as GRDDL are - and will continue to be - invaluable.  XSLT-based  
translation will also be required, but as Phillip has indicated in his  
response, this can be fraught with problems and is not really an  
invertible process.  Yes - you can certainly perform the translation  
in the opposite direction, but if you are seeking to move the  
semantic information into the XML-only space - as opposed to merely  
moving data back to an XML-only representation to lean on the uniform,  
explicit syntax provided by some constellation of XSDs & XSLTs used  
to interoperate amongst them - you probably shouldn't even be going  
back to XML-only space at all.
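
One way to see why the round trip is lossy, as a minimal sketch only:  
the element names, the http://example.org/ namespace, and the  
to_triples helper below are all hypothetical and not taken from any  
project mentioned here, and Python's standard library stands in for  
XSLT.  Two XML documents that differ only in syntax (attribute vs.  
child element, element order) map to the same set of triples, so  
nothing in the RDF tells you which document to regenerate.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def to_triples(xml_text):
    """Map every <neuron> record to a set of (subject, predicate, object) triples.

    Attributes (other than id) and child elements are treated alike, and the
    result is an unordered set, mirroring the fact that an RDF graph has no
    document order and no attribute-vs-element distinction.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for record in root.findall("neuron"):
        subject = EX + record.get("id")
        for name, value in record.attrib.items():
            if name != "id":
                triples.add((subject, EX + name, value))
        for child in record:
            triples.add((subject, EX + child.tag, (child.text or "").strip()))
    return triples


# Same information, two different syntactic choices.
doc_a = ('<cells><neuron id="n1" region="hippocampus">'
         '<type>pyramidal</type></neuron></cells>')
doc_b = ('<cells><neuron id="n1"><type>pyramidal</type>'
         '<region>hippocampus</region></neuron></cells>')

same_graph = to_triples(doc_a) == to_triples(doc_b)
print(same_graph)  # True: the mapping forgets which syntax was used
```

Going back from the triples to XML therefore means choosing a syntax  
again, which is a design decision rather than an inversion.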

Part of the confusion in the debate, I believe, results from the fact  
the work being done needs to support both KE from existing sources,  
as well as providing tools and "best practices" for how we'd want to  
see researchers encapsulate semantic information going forward.  Such  
broad requirements will profit from using BOTH XML-only and  
RDF++ specific technologies.

As you say, it all comes down to the Use Cases and user requirements  
you seek to support.

Please see additional, brief inline comments below.

Cheers,
Bill

On Jul 8, 2006, at 8:23 AM, Chimezie Ogbuji wrote:

>
>
> On Sat, 8 Jul 2006, William Bug wrote:
>
>> Dear Phillip,
>>
>> Many thanks for this concise and accessible qualification to  
>> Chimezie's explanation.  I was a little crest-fallen when I saw  
>> his original answer to Trish, and thought I really had  
>> misunderstood an issue that is becoming of very significant  
>> importance to several projects with which I'm involved.
>>
>> There have been several debates recently in the neuroinformatics  
>> community as to whether an XML-only approach (XML, XSD, XSLT, XLink) will  
>> suffice when creating sub-domain knowledge resources -  
>> especially if you are just collecting terminologies, as opposed to  
>> creating a full-blown, well-founded ontology.  Whether it really  
>> isn't necessary to go to Semantic Web tech - i.e., the  
>> constellation of RDF-associated specs (RDF++ - sorry to add to the  
>> acronym soup - this is just a shorthand for this email) and the  
>> growing number of utilities for manipulating RDF/OWL and all the  
>> other RDF-related formalisms.
>
> Here is the crux of the issue.  I think there is a misunderstanding  
> of my original response.  In suggesting that XSLT makes such a  
> transformation relatively painless (from an established XML format  
> to one or more RDF representations),  I wasn't suggesting this as  
> an argument *for* XML-only representation but as a consideration  
> that shouldn't be disregarded.  I think one of the biggest  
> misconceptions among people who debate whether to go for XML-only  
> solutions versus RDF++ (as you put it) is that the two technologies  
> are mutually exclusive - which the ability to write such XSLT  
> transforms shows is not the case at all.  After all, XML *is* in the  
> semantic web stack and for good reason as well.
>
> I think too much time is often invested in comparison and contrast  
> of two representation languages that each address a different set  
> of problems rather than in focusing on asking the more important  
> question of what the requirements for representation are:
>
> 1) Is the data you wish to represent subject to lots of  
> interpretation?
> 2) Is uniform syntax more important than semantics?
> 3) Is the domain being modelled subject to expansion in a semantic  
> way?
> 4) What is the nature of systems with which interoperability is  
> important?
>
> etc..

I should have explained more fully at the outset that the answers to  
all of these requirements questions - for the specific issue in which  
this debate has arisen - are:

	1) yes
	2) no
	3) yes
	4) semantic integration is what's under question

>
> I think a handful of your points below fall more along the line of  
> direct comparison and contrast, which I don't think is as useful for  
> answering the questions the neuroinformatics community may be  
> grappling with as focusing on what specific problems are being  
> solved and what the short and long term requirements / goals  
> for representation are.

Admittedly, I should have been more clear about the focus of the  
debate, as I've done above.

>
> Cross-technological debates with well established trenches often do  
> very little to answer the original questions but only further  
> misconceptions - which is why the subject of this post (XML vs RDF)  
> concerns me.

Sorry - I intended it to catch folks' attention.  I agree stating the  
topic so broadly doesn't help provide guidance on how to implement  
specific solutions to well defined requirements.

The problem I'm having is if you go to Google and enter that very  
string "XML vs RDF", you get a myriad of answers all coming from  
different directions.  In this particular context - semantic  
representation, integration, manipulation for the life science space  
- it would be very helpful to have this group present the pros AND  
the cons for the rest of the community to use as a set of "best  
practices".  One might then ask how it helps to achieve that  
goal to just post an email to this list with that generic title.   
My answer would be - it gets the attention of the people who care -  
and have cogitated over this issue - thereby bootstrapping the  
process of creating such a resource.

>
> Both representation languages bring with them a set of well  
> established tools that become readily available once you express  
> your content with them and you have more to gain in leveraging dual- 
> representation between both (where it's feasible - I agree with the  
> qualification of the use of XSLT that emphasizes that it's  
> contingent on having a well defined mapping in the first place) via  
> XSLT.

The word I'd focus on in this comment is BOTH.  I agree with you  
completely.  My concern - as some of the points in my previous email  
point to - is some don't feel the RDF++ space has the required tools.

>
> Consider for instance XForms (which we are using quite heavily for  
> instance data entry).  XForms is an XML dialect that addresses  
> specific and well known pitfalls with legacy browser-based user  
> interface dialects and does so in a *very* powerful and promising  
> way.  If a dynamic, expressive means of data entry is an important  
> requirement for your data (as it is in our case) then you already  
> have a good argument for having representation in XML for which  
> there is no equivalent alternative in an RDF++ only approach.  The  
> main difficulty is that with forms-based user interfaces uniform  
> syntax and declarative structure is of more concern than  
> semantics.  I've chatted about this before, see this thread:
>
> http://www.dehora.net/journal/2005/08/automated_mapping_between_rdf_and_forms_part_i.html
>
> Of course, you don't get your lunch for free and the price for  
> leveraging uniform syntactical representation in order to simplify  
> your use of forms for data entry is the effort up front in devising  
> a mapping that provides the level of semantic grounding (if you  
> will) sufficient for your needs and express such a mapping in an  
> XSLT transform.

Yes - absolutely - HTTP-based data entry technologies have advanced  
considerably.  XForms, XHTML, AJAX - not that they are mutually  
exclusive - can all add considerable flexibility and responsiveness  
to a data entry environment.  I totally agree.

As these continue to mature, it would be wonderful if specific  
extensions were designed to interface with the RDF++ space, where  
representation of semantic specificity and relations is an important  
part of the data entry process.  This is critical to what we are  
trying to do in BIRN with BIRNLex, its use for semantic annotation of  
data, and the use of the resulting annotation by the BIRN query  
mediator.  Daniel Rubin has indicated there are projects underway at  
NCBO that will be specifically valuable in the area of semantically- 
based data entry.  There are many tools already developed in the  
context of working with the Gene Ontology that also support many of  
these requirements.

>
>> driver behind the creation of RDF++.  You'll have a lot more code  
>> to write and maintain, if you don't take advantage of Semantic Web  
>> tech.
>
> This depends more on what it is you are trying to achieve with  
> representation than by the technologies by themselves, so I don't  
> agree with this very broad assessment.

Again - sorry - I meant this statement to be limited to the still  
general requirements laid out in my answer to your 4 questions above  
and should have stated this more clearly.

>
>> 		6) We can leave it to others to create XSLT converters to move  
>> the XML-only resources into the RDF++ space
>> 		Phillip & Chris M. have both given clear answers to this ill- 
>> advised use of XSLT.
>
> I don't see how use of XSLT in this way can be considered 'ill- 
> advised' and I don't think that was the point.  The issue is that a  
> necessary prerequisite for using XSLT in this way is a well defined  
> mapping (if such a mapping exists) to begin with.  Once you have a well  
> established mapping, XSLT *does* render the remaining mechanics a  
> non-issue and it's for this particular reason that I think  
> disregarding such a possibility is more ill-advised, especially if  
> there is already a large and valuable body of existing XML content  
> - this is precisely one of the main motivations for technologies  
> such as GRDDL.

See above.  I don't feel it's ALWAYS ill-advised, I just think - as  
you are clearly stating throughout - one shouldn't rely on this  
solution to be appropriate for all movement of data between the XML- 
only space and RDF++ space.  Where it's specifically ill-advised is  
in assuming the existence of this option precludes your having to  
even think about direct use of RDF++ to meet your formal semantic  
information representation needs now.  As I believe you state below,  
it's not that this is an "ill-advised use of XSLT", but rather  
it's the assumption the availability of this option precludes having  
to consider RDF++ technologies in designing your specific solution.   
That's what's ill-advised.

>
>> The other issue Eric N. has described clearly is the N**2 problem  
>> - the combinatorial proliferation of XSLTs as more XSDs are added  
>> to the mix.
>
> Once again, a misunderstanding of what I was suggesting.  The  
> ability to use XSLT in such a fashion isn't an endorsement of XML- 
> only representation solutions but an effective way to leverage  
> dual representation where there is value in doing so.

Agreed.
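
To put rough numbers on the N**2 point, here is a back-of-the-envelope  
sketch.  It assumes one stylesheet per ordered pair of schemas in the  
XML-only case, and one stylesheet per schema when everything is mapped  
one-way into a shared RDF model; both function names are made up for  
illustration.

```python
def pairwise_stylesheets(n_schemas):
    """Stylesheets needed if every schema is translated directly to every other."""
    return n_schemas * (n_schemas - 1)


def hub_stylesheets(n_schemas):
    """Stylesheets needed if each schema is mapped once into a shared model."""
    return n_schemas


# Compare the two strategies for a few community sizes.
counts = {n: (pairwise_stylesheets(n), hub_stylesheets(n)) for n in (2, 5, 10, 20)}
for n, (pairwise, hub) in counts.items():
    print(f"{n:>2} schemas: {pairwise:>3} pairwise vs {hub:>2} via a shared model")
```

With 10 schemas that is 90 stylesheets against 10.  The shared model  
only pays off, of course, where a well defined mapping into it exists  
in the first place.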

>
>> 	9) Proponents of RDF++ argue that XML has limited semantic  
>> expressivity, but that's just not true.
>> 		I think this argument is completely inverted.  The problem is  
>> XML has nearly unlimited expressivity, but any semantic meaning  
>> you want to imbue your XML with must be made explicit in the  
>> parsers you write.
>
> An XML parser interprets at the syntactic level (not at the  
> semantic level).  Semantic mapping from XML dialects typically  
> occurs directly via XSLT (written perhaps by those familiar with the  
> XML schema) to RDF or by other more novel means.  See:
>
> http://copia.ogbuji.net/blog/2006-04-03/_Semantic_
>
> Of course, such mappings will not be sufficient if your original  
> needs for representation go above and beyond what XML provides  
> (with regards to semantic expressiveness), but it's worth noting  
> that there *is* a spectrum of opportunity between both technologies.

Again - agreed.  Sorry for the generality that makes this statement  
ambiguous.  When I said parser above, I was including all the code  
you write - or make use of - which includes the low-level syntactic  
parser such as one gets from Xerces, the XSLT mapping from the XSD  
into the specific semantic representation you require, and any other  
code you need to write to fully realize your semantic extraction/ 
representation requirements.
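
As a minimal sketch of that middle layer, assuming a made-up <terms>  
document and made-up predicate URIs (nothing here comes from an actual  
XSD), and with Python's standard-library ElementTree standing in for  
Xerces plus XSLT: the syntactic parser only hands back elements, and  
every bit of meaning comes from the mapping table you write yourself.   
Elements with no entry in the table are reported rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs; a real project would take these from its own vocabulary.
BASE = "http://example.org/term/"
PREDICATES = {
    "name": "http://example.org/vocab#name",
    "partOf": "http://example.org/vocab#part_of",
}


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def xml_to_ntriples(xml_text):
    """Return (N-Triples lines, element names that had no mapping)."""
    root = ET.fromstring(xml_text)  # purely syntactic step
    lines, unmapped = [], []
    for term in root.findall("term"):
        subject = "<" + BASE + term.get("id") + ">"
        for child in term:
            predicate = PREDICATES.get(child.tag)  # the semantic step
            if predicate is None:
                unmapped.append(child.tag)
                continue
            ref = child.get("ref")
            if ref is not None:
                obj = "<" + BASE + ref + ">"  # link to another term
            else:
                obj = '"' + escape_literal((child.text or "").strip()) + '"'
            lines.append(subject + " <" + predicate + "> " + obj + " .")
    return lines, unmapped


SAMPLE = """<terms>
  <term id="T1"><name>CA1 field</name><partOf ref="T2"/></term>
  <term id="T2"><name>hippocampus</name><comment>draft</comment></term>
</terms>"""

triples, skipped = xml_to_ntriples(SAMPLE)
print("\n".join(triples))
print("no mapping for:", skipped)
```

The <comment> element parses without complaint but never reaches the  
RDF side, because nobody wrote down what it means.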

>
>> 	I) if you try to perform semantically-based KE/KR/KD with XML- 
>> only, you will have a lot more code to write & maintain YOURSELF -  
>> and much of it will reproduce what you'd get automatically using  
>> RDF++.
>
> XML was never meant to address Knowledge Representation and  
> attempts to use it in such a fashion are the fault of the author, not  
> the technology being misused.

My point exactly.  Sorry I didn't state this more clearly.

>
>> 		II) You just can't provide the flexibility, guaranteed  
>> resolvability of resources, and efficient expression required when  
>> representing semantic relations in the rigid, strictly  
>> hierarchical document-oriented world of XML-only, so you'll likely  
>> fall short on a lot of your requirements.
>
> Only with those requirements that have more to do with KR and  
> ubiquitous semantics than uniform, interoperable syntax.  Once  
> again, the more constructive questions are about the nature of the  
> requirements not the two technologies by themselves - there *is*  
> always a context with their use.
>
> Ask yourself why message protocols such as REST / POX and Web  
> Services are expressed in XML and not in RDF.  Ask yourself why the  
> same is true of user interface dialects (such as XHTML and its  
> derivatives - XForms), syndication formats, etc.. and perhaps the  
> value of context and the nature of the problem being solved becomes  
> more evident.
>
> Polarizing comparison and contrast of both ends of the  
> representation strata does more harm than good to both technologies  
> and the more constructive questions should *first* be about what  
> the requirements for representation are.
>
>
>>
>> I'd really appreciate hearing the views both pro & con on these  
>> issues from others on this list.
>>
>> Thanks again, Phillip, for your lucid and concise explanation.
>>
>> Cheers,
>> Bill
>>
>> On Jul 7, 2006, at 6:35 AM, Phillip Lord wrote:
>>
>>>>>>>> "TW" == Trish Whetzel <whetzel@pcbi.upenn.edu> writes:
>>>   TW> Hi all,
>>>   TW> As a terribly simple question, is it possible to take the  
>>> actual
>>>   TW> FuGE-ML that is generated on a per instance reporting of an
>>>   TW> experiment/study/investigation and then convert that to RDF  
>>> for
>>>   TW> use with semantic web technologies?
>>> Converting between one syntax and another is fairly simple, and  
>>> there
>>> are some reasonable tools for it. XSLT would work for converting XML
>>> into RDF. I wouldn't like to use it for converting the other way
>>> (actually I wouldn't like to use it at all, but this is personal
>>> prejudice!).
>>> This is assuming, however, that the semantics of the two
>>> representations are compatible. To give an example, syntactically it
>>> is possible to convert between the GO DAG and an OWL  
>>> representation of
>>> GO. However, the GO part-of relationship doesn't distinguish
>>> universal and existential, while OWL forces you to make this
>>> distinction; you can't sit on the fence.
>>> So, the simple answer to a simple question is: it depends. I  
>>> wouldn't
>>> assume that FuGE-ML will be convertible into a given
>>> ontology or representation in RDF, unless a reasonable amount of  
>>> care
>>> is taken in the design of FuGE-ML or the ontology to ensure that it
>>> can happen.
>>> Course, you could always hack it with some rules and a bit of human
>>> intervention. That works as well.
>>> Cheers
>>> Phil
>>
>> Bill Bug
>> Senior Analyst/Ontological Engineer
>>
>> Laboratory for Bioimaging  & Anatomical Informatics
>> www.neuroterrain.org
>> Department of Neurobiology & Anatomy
>> Drexel University College of Medicine
>> 2900 Queen Lane
>> Philadelphia, PA    19129
>> 215 991 8430 (ph)
>> 610 457 0443 (mobile)
>> 215 843 9367 (fax)
>>
>>
>> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>>
>
> Chimezie Ogbuji
> Lead Systems Analyst
> Thoracic and Cardiovascular Surgery
> Cleveland Clinic Foundation
> 9500 Euclid Avenue/ W26
> Cleveland, Ohio 44195
> Office: (216)444-8593
> ogbujic@ccf.org
>
>
>> This email and any accompanying attachments are confidential. This  
>> information is intended solely for the use of the individual to  
>> whom it is addressed. Any review, disclosure,  
>> copying, distribution, or use of this email communication by others  
>> is strictly prohibited. If you are not the intended recipient  
>> please notify us immediately by returning this message to the  
>> sender and delete all copies. Thank you for your cooperation.
>

Received on Saturday, 8 July 2006 15:52:03 UTC