- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 18 Oct 2005 18:35:12 -0700
- To: semantic-web@w3.org

This may be of interest:
  http://www.mnot.net/blog/2005/10/18/libxslt_web

Basically, it's HTML Tidy and XSLT glued together, along with access to HTTP headers from XSLT, automatic cookie handling, etc.

I don't know if it's directly compatible with GRDDL, because it doesn't use tidy for the input document, but instead defines a document()-like extension function. Still, I don't think it would be hard to get from here to there.

Let the scraping begin...

--
Mark Nottingham     http://www.mnot.net/

Received on Wednesday, 19 October 2005 01:35:30 UTC