Re: XSLT for screen-scraping RDF out of real-world data

On Tue, 21 Mar 2000, Dan Connolly wrote:
> I believe that one of the best ways to transition into RDF,
> if not a long-term deployment strategy for RDF, is to manage the
> information in human-consumable form (XHTML) annotated with just
> enough info to extract the RDF statements that the human info
> is intended to convey. In other words: using a relational
> database or some sort of native RDF data store, and spitting
> out HTML dynamically, is a lot of infrastructure to operate
> and probably not worth it for lots of interesting cases. We all know
> that we have to produce a human-readable version
> of the thing... why not use that as the primary source?

As part of automating more of our publication process, and with the
help of Dan Connolly, I've written an XSLT stylesheet to extract
metadata from W3C technical reports:
	http://www.w3.org/2001/10/trdoc2rdf

For instance, this stylesheet applied to "RDF/XML Syntax Specification
(Revised)" http://www.w3.org/TR/rdf-syntax-grammar/ outputs this:
<?xml version="1.0" encoding="utf-8"?>
<!--Produced by $Id: trdoc2rdf.xslt,v 1.29 2001/12/20 16:17:44 dom Exp $-->
<rdf:RDF xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2001/02pd/rec54">
  <WD rdf:about="http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218">
    <dc:date>2001-12-18</dc:date>
    <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
    <doc:versionOf rdf:resource="http://www.w3.org/TR/rdf-syntax-grammar"/>
    <editor rdf:parseType="Resource">
      <contact:fullName>Dave Beckett</contact:fullName>
    </editor>
  </WD>
</rdf:RDF>
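
As a rough illustration of how a consumer might read this output, here is
a minimal Python sketch using only the standard library. The function name
summarize_trdoc and the "rec" prefix for the default namespace are
assumptions made for this example; they are not part of the stylesheet.
The sample document is the output shown above, minus the XML declaration
and the comment.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the output above. The "rec" prefix is an
# assumption of this sketch: the output declares that URI as the default
# namespace, with no prefix.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "doc": "http://www.w3.org/2000/10/swap/pim/doc#",
    "contact": "http://www.w3.org/2000/10/swap/pim/contact#",
    "rec": "http://www.w3.org/2001/02pd/rec54",
}

SAMPLE = """\
<rdf:RDF xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2001/02pd/rec54">
  <WD rdf:about="http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218">
    <dc:date>2001-12-18</dc:date>
    <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
    <doc:versionOf rdf:resource="http://www.w3.org/TR/rdf-syntax-grammar"/>
    <editor rdf:parseType="Resource">
      <contact:fullName>Dave Beckett</contact:fullName>
    </editor>
  </WD>
</rdf:RDF>
"""


def summarize_trdoc(rdf_xml):
    """Return the metadata of the first WD element as a plain dict."""
    root = ET.fromstring(rdf_xml)
    wd = root.find("rec:WD", NS)
    rdf_ns = NS["rdf"]
    return {
        # Attributes are looked up by their expanded {namespace}name form.
        "about": wd.get("{%s}about" % rdf_ns),
        "date": wd.findtext("dc:date", namespaces=NS),
        "title": wd.findtext("dc:title", namespaces=NS),
        "version_of": wd.find("doc:versionOf", NS).get("{%s}resource" % rdf_ns),
        "editors": [
            e.findtext("contact:fullName", namespaces=NS)
            for e in wd.findall("rec:editor", NS)
        ],
    }


summary = summarize_trdoc(SAMPLE)
print(summary["title"], summary["date"], summary["editors"])
```

This only handles Working Drafts (the WD element seen in the example); other
document types would need their own element names.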

Note that the stylesheet includes a small inline HTML form that lets you
run it through the W3C XSLT service. For instance, the output above can
be obtained at:
http://www.w3.org/2000/06/webdata/xslt?xmlfile=http%3A%2F%2Fwww.w3.org%2FTR%2Frdf-syntax-grammar%2F&xslfile=http%3A%2F%2Fwww.w3.org%2F2001%2F10%2Ftrdoc2rdf
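
The URL above is the service address followed by two percent-encoded query
parameters, xmlfile and xslfile. A minimal Python sketch of how such a URL
could be assembled for another document follows; the helper name
xslt_service_url is an assumption of this example, and the code only
builds the string without contacting the service.

```python
from urllib.parse import urlencode

# Service address and parameter names are taken from the URL above.
XSLT_SERVICE = "http://www.w3.org/2000/06/webdata/xslt"


def xslt_service_url(xmlfile, xslfile):
    """Build a request URL asking the service to apply xslfile to xmlfile."""
    # urlencode percent-encodes ':' and '/' in the values, as in the URL above.
    query = urlencode([("xmlfile", xmlfile), ("xslfile", xslfile)])
    return XSLT_SERVICE + "?" + query


url = xslt_service_url(
    "http://www.w3.org/TR/rdf-syntax-grammar/",
    "http://www.w3.org/2001/10/trdoc2rdf",
)
print(url)
```

Passing the address of a different technical report as xmlfile gives the
corresponding request for that document.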

Any feedback is appreciated.

Regards,

Dom
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C's Webmaster
mailto:dom@w3.org

Received on Friday, 21 December 2001 07:09:26 UTC