This may be of interest:
  http://www.mnot.net/blog/2005/10/18/libxslt_web

Basically, it's HTML Tidy and XSLT glued together, along with access to HTTP headers from XSLT, automatic cookie handling, etc.

I don't know if it's directly compatible with GRDDL, because it doesn't use tidy for the input document, but instead defines a document()-like extension function. Still, I don't think it would be hard to get from here to there.

Let the scraping begin...

--
Mark Nottingham http://www.mnot.net/

Received on Wednesday, 19 October 2005 01:35:30 GMT