For GRDDL-heads: XSLT+Tidy from Mark Nottingham on 2005-10-19 (semantic-web@w3.org from October 2005)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 18 Oct 2005 18:35:12 -0700
To: semantic-web@w3.org
Message-Id: <262FC4DB-8B09-4E56-9009-42999B24C543@mnot.net>

This may be of interest:
   http://www.mnot.net/blog/2005/10/18/libxslt_web

Basically, it's HTML Tidy and XSLT glued together, along with access  
to HTTP headers from XSLT, automatic cookie handling, etc.
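
Tidy's job in a pipeline like this is to turn real-world tag soup into well-formed markup, because an XSLT processor only accepts XML. Below is a toy sketch of that step using Python's standard library. It is not HTML Tidy's algorithm and not part of libxslt_web, and the names ToyTidy and tidy are hypothetical. It only balances tags and escapes text. Python's standard library has no XSLT engine, so the sketch stops at parsing the cleaned result with ElementTree.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content; emitted as self-closing XML tags.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class ToyTidy(HTMLParser):
    """Rewrite tag-soup HTML as well-formed XML (hypothetical toy, not Tidy).

    It only guarantees well-formedness: balanced tags, escaped text and
    attributes. It does not apply HTML's implied-end rules, so "<p>a<p>b"
    becomes nested paragraphs. Assumes the page has a single root element
    (normally <html>). DOCTYPEs and comments are dropped.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the XML output
        self.stack = []  # names of elements opened but not yet closed

    def handle_starttag(self, tag, attrs):
        # Boolean attributes (value None) get their own name as value, XML-style.
        attr_text = "".join(
            f' {name}="{escape(name if value is None else value, quote=True)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Drop stray end tags; otherwise close every element opened since the match.
        if tag not in self.stack:
            return
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()    # flush any buffered text first
        while self.stack:  # then close whatever is still open
            self.out.append(f"</{self.stack.pop()}>")


def tidy(markup: str) -> str:
    parser = ToyTidy()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


soup = ("<html><head><title>Demo</title></head>"
        "<body><p>Unclosed <b>bold<br>text &amp; more<p>next")
xml_text = tidy(soup)
root = ET.fromstring(xml_text)  # parsing the raw soup directly would raise ParseError
print(root.find("head/title").text)  # prints "Demo"
```

The cleaned string could then be handed to any XSLT processor. A real cleaner also applies HTML's implied-end rules (so a new <p> closes the previous one) and handles many other cases, which is why delegating this step to Tidy makes sense.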

I don't know if it's directly compatible with GRDDL, because it
doesn't run Tidy on the input document; instead, it defines a
document()-like extension function. Still, I don't think it would be
hard to get from here to there.

Let the scraping begin...

--
Mark Nottingham     http://www.mnot.net/

Received on Wednesday, 19 October 2005 01:35:30 UTC