For GRDDL-heads: XSLT+Tidy

This may be of interest:
   http://www.mnot.net/blog/2005/10/18/libxslt_web

Basically, it's HTML Tidy and XSLT glued together, along with access  
to HTTP headers from XSLT, automatic cookie handling, etc.

I don't know whether it's directly compatible with GRDDL: it doesn't
run Tidy on the input document itself, but instead defines a
document()-like extension function. Still, I don't think it would be
hard to get from here to there.
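
For anyone who wants a feel for the "clean up the HTML, then query it as
a tree" idea, here is a rough Python sketch. It is not the tool linked
above and does not use its API: the standard library's html.parser stands
in for HTML Tidy (it only closes unclosed tags, nothing smarter), and
plain ElementTree iteration stands in for XSLT. The names SoupToTree,
tidy and fetch_tidied are made up for illustration.

```python
from html.parser import HTMLParser
import urllib.request
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML (partial list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToTree(HTMLParser):
    """Hypothetical stand-in for HTML Tidy: tag soup in, well-formed tree out."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: ignore it
        # Close everything left open inside the element being closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.stack:  # drop text outside the root element
            self.builder.data(data)

    def close_tree(self):
        self.close()
        while self.stack:  # close whatever the page never closed
            self.builder.end(self.stack.pop())
        return self.builder.close()


def tidy(html_text):
    """Parse possibly malformed HTML into an ElementTree element."""
    parser = SoupToTree()
    parser.feed(html_text)
    return parser.close_tree()


def fetch_tidied(url):
    """Hypothetical document()-like helper: returns (tree, response headers)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
        return tidy(body), dict(resp.headers.items())


# Usage: neither <a> is closed, and <html> never ends.
page = '<html><body><a href="/a">First<br><a href="/b">Second</body>'
links = [a.get("href") for a in tidy(page).iter("a")]
```

A real pipeline would hand the tidied tree to an XSLT processor rather
than walking it by hand; the point here is only the order of operations.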

Let the scraping begin...

--
Mark Nottingham     http://www.mnot.net/

Received on Wednesday, 19 October 2005 01:35:30 UTC