RE: Update on RDF in XHTML from Butler, Mark on 2003-11-24 (public-rdf-in-xhtml-tf@w3.org from November 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Mon, 24 Nov 2003 17:55:20 -0000
To: "'Joseph Reagle'" <reagle@w3.org>, Dominique Hazaël-Massieux <dom@w3.org>, public-rdf-in-xhtml-tf@w3.org, connolly@w3.org
Message-ID: <E864E95CB35C1C46B72FEA0626A2E8082062CD@0-mail-br1.hpl.hp.com>

Hi Dan, Joseph

On SIMILE (http://web.mit.edu/simile/www) we have been doing quite a bit of
work on using XSLT to create RDF from XML. Some of the conclusions I have
reached may be relevant here:

1. Letting people create RDF from XML using XSLT, rather than encouraging
everyone to use RDF/XML regardless, is probably a good thing, and may lead to
better RDF data models.

When people create XML, they are concerned about their own needs, rather
than how to model their data so it is globally unambiguous. Allowing people
to create data as XML, and then using XSLT to style it to RDF when required,
means that they only have to concentrate on global unambiguity when they
need to, so they stand a better chance of getting it right.

In SIMILE, we are still refining the RDF model for our data, but as we are
using XML as the canonical format, changing the model is currently just a
matter of changing an XSLT transform, rather than hand editing a lot of
data, and this is rather attractive. So creating RDF from XHTML via XSLT
might have similar benefits.

2. XSLT 2.0 is much better at this task than XSLT 1.0, and ideally XSLT
needs a function for URI encoding

XSLT 1.0 doesn't have many functions to manipulate the contents of elements
or attributes, but when dealing with RDF it is highly desirable to be able
to do this. One reason is you may want to assign unique URIs to data
objects, so you may synthesise these URIs from the contents of elements or
attributes. This means you have to ensure that the URIs are valid. In SIMILE
this means we are using the XSLT 2.0 replace and regular expression
functionality.  
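To illustrate the idea of synthesising a URI from element content with a
regular-expression replace, here is a minimal sketch in Python rather than
XSLT 2.0. It is not the SIMILE transform: the base namespace, the function
name mint_uri and the choice of underscore as a separator are all
assumptions made for the example.

```python
import re

# Hypothetical namespace for minted identifiers (an assumption, not SIMILE's).
BASE = "http://example.org/id/"


def mint_uri(text, base=BASE):
    """Build a URI from free text by replacing unsafe character runs.

    Every run of characters other than ASCII letters and digits is
    collapsed to a single underscore, mirroring what a regex replace
    in a stylesheet might do.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_").lower()
    if not slug:
        # Nothing usable survived the replacement, so refuse to mint a URI.
        raise ValueError("cannot mint a URI from %r" % text)
    return base + slug


print(mint_uri("Starry Night, The (1889)"))
```

Note that a replacement like this is lossy: two different strings can
collapse to the same URI, so it only suits data where that cannot happen
or where merging is acceptable.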

EXSLT includes proposals for functions for encoding and decoding URIs
(http://www.exslt.org/str/functions/encode-uri/), and they were implemented
by Fourthought in 4Suite. I note the Fourthought folks have been quite
active in using RDF, so my guess is that such functions would be
potentially useful here.
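For comparison, a reversible alternative to regex replacement is
percent-encoding, which is broadly the kind of operation a URI encoding
function performs. The sketch below uses Python's standard urllib.parse
rather than EXSLT; the base namespace and the helper name encode_segment
are assumptions for the example.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace for minted identifiers (an assumption).
BASE = "http://example.org/person/"


def encode_segment(text):
    """Percent-encode text for use as a single URI path segment.

    safe="" means reserved characters such as "/" and "," are escaped too;
    non-ASCII characters are encoded as UTF-8 before escaping.
    """
    return quote(text, safe="")


name = "Hazaël-Massieux, Dominique"
uri = BASE + encode_segment(name)
print(uri)

# Unlike a lossy regex replace, the original text can be recovered.
print(unquote(uri[len(BASE):]))
```

Because the encoding round-trips, distinct input strings always yield
distinct URIs.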

3. Having to transform using XSLT to RDF/XML, and then deserialize is a pain
- in an ideal world the reader would process the transform on the fly.

I realise this is a bit like trying to run before you can walk, but
particularly when dealing with very large models it is desirable to stream
them directly into the RDF model rather than transform them, save them to
disk, and then deserialize them. Some groups work with Python or Perl to
convert XML to RDF/XML, so similar approaches could be used with XHTML, with
the advantage that you could convert it directly to the RDF model. However
it would be good if it were possible to do this with XSLT as well, so
perhaps there is a need for a variant of XSLT that could support this - an
area for future work.
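As a rough sketch of the streaming approach in a scripting language, the
Python below reads XML incrementally and yields triples as it goes, with no
intermediate RDF/XML file. The element names (records, record, title,
creator), the id attribute and both namespaces are hypothetical, and triples
are plain tuples of strings; a real reader would add them to whatever RDF
model or store is in use.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespaces (assumptions for this example).
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab#"


def stream_triples(source):
    """Yield (subject, predicate, object) tuples while parsing the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "record":
            continue
        record_id = elem.get("id")
        if record_id is None:
            continue  # no identifier to build a subject URI from
        subject = BASE + record_id
        for child in elem:
            if child.text and child.text.strip():
                yield (subject, VOCAB + child.tag, child.text.strip())
        # Drop the record's children and text once its triples are emitted.
        elem.clear()


SAMPLE = b"""<records>
  <record id="r1"><title>Starry Night</title><creator>Vincent van Gogh</creator></record>
  <record id="r2"><title>Water Lilies</title></record>
</records>"""

for triple in stream_triples(io.BytesIO(SAMPLE)):
    print(triple)
```

The generator never holds more than one record's content at a time,
although the emptied record elements stay attached to the root; for very
large inputs those would also need to be removed from their parent.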

I plan to write up the SIMILE work in more detail in the future, but I'm
afraid it will be the new year before this happens. 

hope this is of interest, kind regards

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Monday, 24 November 2003 12:55:45 UTC