
Dublin core HTML->RDF (more semantic screen scraping with XSLT)

From: Dan Connolly <connolly@w3.org>
Date: Fri, 09 Jun 2000 13:54:25 -0500
Message-ID: <39413D61.B66A7998@w3.org>
To: www-rdf-interest@w3.org, dc@oclc.org
Share and enjoy...

----------
http://www.w3.org/2000/06/dc-extract/form.html

Dublin Core Extraction Service

XSL file: 

XML data: 



How does it work?

The form invokes a generic XSLT service that takes

an XSLT transformation 
     The default transformation for this form, dc-extract.xsl, converts
     from the format given in "Encoding Dublin Core Metadata in HTML"
     (December 1999, by J. Kunze) to RDF.
some XML data 
     Try the tidy service if you have HTML that isn't well-formed.

     For example, the ADAM page isn't well-formed (i.e. it isn't
     XHTML), but the result of running the ADAM page through tidy is.

and returns the result.
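
As a rough illustration of the same idea outside XSLT, here is a minimal
Python sketch. It is not dc-extract.xsl and makes no claim to match its
output. It assumes the metadata appears as <meta name="DC.xxx"
content="..."> elements, that the input is already well-formed, and that
the Dublin Core namespace is http://purl.org/dc/elements/1.1/. The
function name extract_dc and its "about" parameter are made up for this
example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the namespace used here may differ from the one the
# original stylesheet emits.
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_dc(xhtml_text, about):
    """Return RDF/XML (a string) describing `about`, built from DC.* meta elements.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)

    # Raises ET.ParseError if the input is not well-formed XML,
    # which is why ill-formed HTML has to go through tidy first.
    doc = ET.fromstring(xhtml_text)

    rdf = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(
        rdf, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: about}
    )

    for el in doc.iter():
        # Strip any namespace so both <meta> and <xhtml:meta> match.
        local = el.tag.rsplit("}", 1)[-1]
        parts = el.get("name", "").split(".")
        if (
            local == "meta"
            and len(parts) >= 2
            and parts[0].lower() == "dc"
            and parts[1]
        ):
            # Only the element name is kept; qualifiers such as
            # "DC.Date.created" are dropped in this sketch.
            prop = ET.SubElement(desc, "{%s}%s" % (DC_NS, parts[1].lower()))
            prop.text = el.get("content", "")

    return ET.tostring(rdf, encoding="unicode")
```

Calling extract_dc(page_text, "http://example.org/doc") on a well-formed
page returns a single rdf:Description with one dc: property per DC.* meta
element. The point of the XSLT version is that the same mapping fits in a
short stylesheet that any generic XSLT service can run.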

Inspiration

I wrote the guts of dc-extract.xsl on my Palm Pilot, over drinks with
Eric Miller and Dan Brickley in Amsterdam after WWW9, in an effort to
show them how easy it is to use XSLT to extract RDF from real-world
data.


Dan Connolly
$Revision: 1.4 $ of $Date: 2000/06/09 18:52:10 $ by $Author: connolly $ 
----------


I copy dc@oclc.org per:

	"Additions, deletions and changes to
	this list are welcomed. Please submit all
	information to dc@oclc.org"

	-- http://purl.org/dc/tools/index.htm

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Friday, 9 June 2000 14:53:16 GMT
