Re: XSLT for screen-scraping RDF out of real-world data

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Fri, 21 Dec 2001 07:09:26 -0500
To: www-rdf-interest@w3.org
Cc: Dan Connolly <connolly@w3.org>
Message-ID: <20011221070926.M11693@w3.org>
On Tue, 21 Mar 2000, Dan Connolly wrote:
> I believe that one of the best ways to transition into RDF,
> if not a long-term deployment strategy for RDF, is to manage the
> information in human-consumable form (XHTML) annotated with just
> enough info to extract the RDF statements that the human info
> is intended to convey. In other words: using a relational
> database or some sort of native RDF data store, and spitting
> out HTML dynamically, is a lot of infrastructure to operate
> and probably not worth it for lots of interesting cases. We all know
> that we have to produce a human-readable version
> of the thing... why not use that as the primary source?

As part of automating more of our publication process, and with the
help of Dan Connolly, I've written an XSLT stylesheet to extract
metadata from W3C technical reports:

For instance, this stylesheet applied to "RDF/XML Syntax Specification
(Revised)" http://www.w3.org/TR/rdf-syntax-grammar/ outputs this:
<?xml version="1.0" encoding="utf-8"?>
<!--Produced by $Id: trdoc2rdf.xslt,v 1.29 2001/12/20 16:17:44 dom Exp
<rdf:RDF xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
oc#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
<dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
<doc:versionOf rdf:resource="http://www.w3.org/TR/rdf-syntax-grammar"/>
<editor rdf:parseType="Resource">
<contact:fullName>Dave Beckett</contact:fullName>
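
To illustrate the general idea of using the human-readable page as the
primary source, here is a minimal Python sketch. It is not the
stylesheet itself, only a rough analogue using the standard library.
Several things in it are assumptions made for illustration: the
function name extract_rdf, the convention that editors appear as <dd>
items following a <dt> starting with "Editor", the use of
rdf:Description with rdf:about, and the http://example.org/terms#
namespace for the editor property. It also assumes well-formed XHTML
without undefined entities.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
EX = "http://example.org/terms#"  # hypothetical namespace for the editor property


def extract_rdf(xhtml_text, doc_uri):
    """Return an rdf:RDF element describing the given XHTML document."""
    page = ET.fromstring(xhtml_text)
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": doc_uri})

    # The document title becomes dc:title.
    title = page.find(f"{{{XHTML}}}head/{{{XHTML}}}title")
    if title is not None and title.text:
        ET.SubElement(desc, f"{{{DC}}}title").text = title.text.strip()

    # Assumed markup convention: each <dd> that follows a <dt> whose text
    # starts with "Editor" names one editor.
    for dl in page.iter(f"{{{XHTML}}}dl"):
        in_editors = False
        for child in dl:
            if child.tag == f"{{{XHTML}}}dt":
                label = "".join(child.itertext()).strip()
                in_editors = label.startswith("Editor")
            elif child.tag == f"{{{XHTML}}}dd" and in_editors:
                editor = ET.SubElement(desc, f"{{{EX}}}editor",
                                       {f"{{{RDF}}}parseType": "Resource"})
                name = ET.SubElement(editor, f"{{{CONTACT}}}fullName")
                # Collapse internal whitespace in the editor's name.
                name.text = " ".join("".join(child.itertext()).split())
    return rdf


# Made-up sample page, loosely modelled on a technical report header.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>RDF/XML Syntax Specification (Revised)</title></head>
<body><dl><dt>Editor:</dt><dd>Dave Beckett</dd></dl></body>
</html>"""

if __name__ == "__main__":
    for prefix, uri in (("rdf", RDF), ("dc", DC),
                        ("contact", CONTACT), ("ex", EX)):
        ET.register_namespace(prefix, uri)
    result = extract_rdf(SAMPLE, "http://www.w3.org/TR/rdf-syntax-grammar")
    print(ET.tostring(result, encoding="unicode"))
```

The sketch only covers the title and editors; an XSLT stylesheet can
express the same matching declaratively with templates, which is why
XSLT is a natural fit for this kind of extraction.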

Note that there is a small inline HTML form interface in the stylesheet
which allows it to be run through the W3C XSLT service. For instance,
the above output can be obtained at:

Any feedback is appreciated.


Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C's Webmaster
Received on Friday, 21 December 2001 07:09:26 UTC
