Re: XML vs. RDF from Chimezie Ogbuji on 2006-07-08 (public-semweb-lifesci@w3.org from July 2006)

From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Date: Sat, 8 Jul 2006 08:23:32 -0400 (EDT)
To: w3c semweb hcls <public-semweb-lifesci@w3.org>
cc: Phillip Lord <phillip.lord@newcastle.ac.uk>
Message-ID: <Pine.GSO.4.60.0607080727050.29955@joplin.bio.ri.ccf.org>
On Sat, 8 Jul 2006, William Bug wrote:

> Dear Philip,
>
> Many thanks for this concise and accessible qualification to Chimezie's 
> explanation.  I was a little crest-fallen when I saw his original answer to 
> Trish, and thought I really had misunderstood an issue that is becoming of 
> very significant importance to several projects with which I'm involved.
>
> There have been several debates recently in the neuroinformatics community as 
> to whether an XML-only approach (XML, XSD, XSLT, XLink) will suffice when 
> creating sub-domain knowledge resources - especially if you are just 
> collecting terminologies, as opposed to creating a full-blown, well-founded 
> ontology.  Whether it really isn't necessary to go to Semantic Web tech - 
> i.e., the constellation of RDF-associated specs (RDF++ - sorry to add to the 
> acronym soup - this is just a shorthand for this email) and the growing 
> number of utilities for manipulating RDF/OWL and all the other RDF-related 
> formalisms.

Here is the crux of the issue.  I think there is a misunderstanding of my 
original response.  In suggesting that XSLT makes such a transformation 
relatively painless (from an established XML format to one or more RDF 
representations), I wasn't offering an argument *for* XML-only 
representation but pointing out a consideration that shouldn't be 
disregarded.  I think one of the biggest misconceptions among people who 
debate whether to go for XML-only solutions versus RDF++ (as you put it) is 
that the two technologies are mutually exclusive - which the ability to 
write such XSLT transforms shows is not the case at all.  After all, XML 
*is* in the semantic web stack, and for good reason.
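
As a rough illustration of what such a transform encodes, here is a minimal 
sketch written in Python rather than XSLT, purely to keep it self-contained. 
Everything in it is an assumption made for the example: the element names, 
the example.org namespace, the PREDICATES table and the xml_to_ntriples 
helper are hypothetical and not taken from any real schema or library.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance; the element names are invented for illustration.
SAMPLE = """<experiments>
  <experiment id="exp1">
    <title>Cortex imaging</title>
    <investigator>B. Example</investigator>
  </experiment>
</experiments>"""

# The mapping itself is the real design work: each child element name is
# tied to a predicate URI (the example.org namespace is an assumption).
BASE = "http://example.org/"
PREDICATES = {
    "title": BASE + "vocab#title",
    "investigator": BASE + "vocab#investigator",
}


def xml_to_ntriples(xml_text):
    """Return N-Triples lines for every mapped child of each <experiment>."""
    root = ET.fromstring(xml_text)
    lines = []
    for exp in root.iter("experiment"):
        subject = "<%sexperiment/%s>" % (BASE, exp.get("id"))
        for child in exp:
            predicate = PREDICATES.get(child.tag)
            if predicate is None:
                continue  # no agreed mapping for this element, so emit nothing
            # Escape backslashes and double quotes only; enough for this sketch.
            literal = (child.text or "").replace("\\", "\\\\").replace('"', '\\"')
            lines.append('%s <%s> "%s" .' % (subject, predicate, literal))
    return lines


if __name__ == "__main__":
    print("\n".join(xml_to_ntriples(SAMPLE)))
```

The same XML document stays usable by XML tooling while the triples feed RDF 
tooling, which is the sense in which the two representations coexist.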

I think too much time is often invested in comparison and contrast of two 
representation languages that each address a different set of problems 
rather than in focusing on asking the more important question of what the 
requirements for representation are:

1) Is the data you wish to represent subject to lots of interpretation?
2) Is uniform syntax more important than semantics?
3) Is the domain being modelled subject to expansion in a semantic way?
4) What is the nature of the systems with which interoperability is important?

etc.

I think a handful of your points below fall more along the line of direct 
comparison and contrast, which I don't think is as useful for answering the 
questions the neuroinformatics community may be grappling with as focusing 
on the specific problems being solved and the short and long term 
requirements / goals for representation.

Cross-technological debates with well established trenches often do very 
little to answer the original questions and only further misconceptions - 
which is why the subject of this post (XML vs RDF) concerns me.

Both representation languages bring with them a set of well established 
tools that become readily available once you express your content with 
them, and you have more to gain by leveraging dual representation across 
both via XSLT (where it's feasible - I agree with the qualification that 
the use of XSLT is contingent on having a well defined mapping in the 
first place).

Consider XForms, which we are using quite heavily for instance data 
entry.  XForms is an XML dialect that addresses specific and well known 
pitfalls of legacy browser-based user interface dialects and does so in a 
*very* powerful and promising way.  If a dynamic, expressive means of data 
entry is an important requirement for your data (as it is in our case) then 
you already have a good argument for having a representation in XML, for 
which there is no equivalent alternative in an RDF++-only approach.  The 
main difficulty is that with forms-based user interfaces uniform syntax and 
declarative structure are of more concern than semantics.  I've chatted 
about this before, see this thread:

http://www.dehora.net/journal/2005/08/automated_mapping_between_rdf_and_forms_part_i.html

Of course, you don't get your lunch for free, and the price for leveraging 
uniform syntactical representation in order to simplify your use of 
forms for data entry is the effort up front in devising a mapping that 
provides the level of semantic grounding (if you will) sufficient for your 
needs and in expressing that mapping as an XSLT transform.

> driver behind the creation of RDF++.  You'll have a lot more code to write 
> and maintain, if you don't take advantage of Semantic Web tech.

This depends more on what it is you are trying to achieve with 
representation than on the technologies themselves, so I don't agree with 
this very broad assessment.

> 		6) We can leave it to others to create XSLT converters to 
> move the XML-only resources into the RDF++ space
> 		Philip & Chris M. have both given clear answers to this 
> ill-advised use of XSLT.

I don't see how use of XSLT in this way can be considered 'ill-advised' 
and I don't think that was the point.  The issue is that a necessary 
prerequisite for using XSLT in this way is a well defined mapping (if such 
a mapping exists) to begin with.  Once you have a well established 
mapping, XSLT *does* render the remaining mechanics a non-issue, and it's 
for this particular reason that I think disregarding such a possibility is 
more ill-advised, especially if there is already a large and valuable body 
of existing XML content - this is precisely one of the main motivations 
for technologies such as GRDDL.
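
One way to picture the prerequisite is a check that the mapping actually 
covers the vocabulary of the XML before any transform is written.  The 
sketch below is only an illustration in Python: the document, the MAPPING 
table, the example.org URIs and the unmapped_elements helper are all 
hypothetical.

```python
import xml.etree.ElementTree as ET


def unmapped_elements(xml_text, mapping):
    """Return the sorted element names in the document that the mapping omits."""
    root = ET.fromstring(xml_text)
    seen = {el.tag for el in root.iter()}
    return sorted(seen - set(mapping))


# Hypothetical document and mapping, for illustration only.
DOC = "<record><name>x</name><dose unit='mg'>5</dose><note>free text</note></record>"
MAPPING = {
    "record": "http://example.org/vocab#Record",
    "name": "http://example.org/vocab#name",
    "dose": "http://example.org/vocab#dose",
}

if __name__ == "__main__":
    # Any name listed here still needs an agreed meaning before conversion.
    print(unmapped_elements(DOC, MAPPING))
```

An empty result says only that every element has some mapping, not that the 
mapping is semantically sound; that judgement remains with whoever knows 
both the schema and the target vocabulary.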

> The other issue Eric N. has described clearly is 
> the N**2 problem - the combinatorial proliferation of XSLTs as more XSDs are 
> added to the mix.

Once again, a misunderstanding of what I was suggesting.  The ability to 
use XSLT in such a fashion isn't an endorsement of XML-only 
representation solutions but an effective way to leverage dual 
representation where there is value in doing so.

> 	9) Proponents of RDF++ argue that XML has limited semantic 
> expressivity, but that's just not true.
> 		I think this argument is completely inverted.  The problem is 
> XML has nearly unlimited expressivity, but any semantic meaning you want to 
> imbue your XML with must be made explicit in the parsers you write.

An XML parser interprets at the syntactic level (not at the semantic 
level).  Semantic mapping from XML dialects to RDF typically occurs 
directly via XSLT (written perhaps by those familiar with the XML schema) 
or by other more novel means.  See:

http://copia.ogbuji.net/blog/2006-04-03/_Semantic_

Of course, such mappings will not be sufficient if your original needs for 
representation go above and beyond what XML provides (with regards to 
semantic expressiveness), but it's worth noting that there *is* a spectrum 
of opportunity between both technologies.

> 	I) if you try to perform semantically-based KE/KR/KD with XML-only, 
> you will have a lot more code to write & maintain YOURSELF - and much of it 
> will reproduce what you'd get automatically using RDF++.

XML was never meant to address Knowledge Representation, and attempts to 
use it in such a fashion are the fault of the author, not of the technology 
being misused.

> 		II) 
> You just can't provide the flexibility, guaranteed resolvability of 
> resources, and efficient expression required when representing semantic 
> relations in the rigid, strictly hierarchical document-oriented world of 
> XML-only, so you'll likely fall short on a lot of your requirements.

Only with those requirements that have more to do with KR and ubiquitous 
semantics than uniform, interoperable syntax.  Once again, the more 
constructive questions are about the nature of the requirements not the 
two technologies by themselves - there *is* always a context with their 
use.

Ask yourself why message protocols such as REST / POX and Web Services 
are expressed in XML and not in RDF.  Ask yourself why the same is true of 
user interface dialects (such as XHTML and its derivatives - XForms), 
syndication formats, etc., and perhaps the value of context and the nature 
of the problem being solved becomes more evident.

Polarizing comparison and contrast of both ends of the representation 
strata does more harm than good to both technologies and the more 
constructive questions should *first* be about what the requirements for 
representation are.


>
> I'd really appreciate hearing the views both pro & con on these issues from 
> others on this list.
>
> Thanks again, Philip, for your lucid and concise explanation.
>
> Cheers,
> Bill
>
> On Jul 7, 2006, at 6:35 AM, Phillip Lord wrote:
>
>> 
>>>>>>> "TW" == Trish Whetzel <whetzel@pcbi.upenn.edu> writes:
>> 
>>   TW> Hi all,
>> 
>>   TW> As a terribly simple question, is it possible to take the actual
>>   TW> FuGE-ML that is generated on a per instance reporting of an
>>   TW> experiment/study/investigation and then convert that to RDF for
>>   TW> use with semantic web technologies?
>> 
>> 
>> Converting between one syntax and another is fairly simple, and there
>> are some reasonable tools for it. XSLT would work for converting XML
>> into RDF. I wouldn't like to use it for converting the other way
>> (actually I wouldn't like to use it at all, but this is personal
>> prejudice!).
>> 
>> This is assuming, however, that the semantics of the two
>> representations are compatible. To give an example, syntactically it
>> is possible to convert between the GO DAG and an OWL representation of
>> GO. However, the GO part-of relationship doesn't distinguish
>> universal and existential, while OWL forces you to make this
>> distinction; you can't sit on the fence.
>> 
>> So, the simple answer to a simple question is: it depends. I wouldn't
>> assume that FuGE-ML will be convertible into a given
>> ontology or representation in RDF, unless a reasonable amount of care
>> is taken in the design of FuGE-ML or the ontology to ensure that it
>> can happen.
>> 
>> Course, you could always hack it with some rules and a bit of human
>> intervention. That works as well.
>> 
>> Cheers
>> 
>> Phil
>> 
>> 
>
> Bill Bug
> Senior Analyst/Ontological Engineer
>
> Laboratory for Bioimaging  & Anatomical Informatics
> www.neuroterrain.org
> Department of Neurobiology & Anatomy
> Drexel University College of Medicine
> 2900 Queen Lane
> Philadelphia, PA    19129
> 215 991 8430 (ph)
> 610 457 0443 (mobile)
> 215 843 9367 (fax)
>
>
> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org


Received on Saturday, 8 July 2006 12:23:55 UTC