- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 06 Dec 2005 07:26:02 +0100
- To: Sunil Sharma <sunil.sharmaa@gmail.com>
- Cc: html-tidy@w3.org
* Sunil Sharma wrote:
>How can I extract information from pages developed in ASP, JSP using cookies
>and sessions?

Well, what you need to understand is that HTML Tidy operates on HTML-like
input documents and transforms them, if you wish, to X(HT)ML. You can use
this output to extract "information" using XQuery, XPath, XSLT, whatever;
use `tidy -asxml` to do this. That's about all I can tell you, at least on
this list.

If you are concerned with scraping web sites that depend on cookies,
JavaScript, etc., Tidy is not the right tool for you; the Perl frameworks
WWW::Mechanize or Win32::IE::Mechanize would be more appropriate for this.
These are off-topic on html-tidy, however.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Phone: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
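As a rough illustration of the Tidy-then-query route described in the reply, here is a minimal Python sketch. It assumes a `tidy` binary is on the PATH and uses the standard library's ElementTree, which understands only a limited subset of XPath, in place of a full XPath/XQuery/XSLT processor. The function names and the sample page are hypothetical. It does nothing about cookies, sessions or JavaScript; as the reply says, that is a job for a tool such as WWW::Mechanize.

```python
import subprocess
import xml.etree.ElementTree as ET

# Tidy's -asxml output puts every element in the XHTML namespace.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def tidy_to_xhtml(html: str) -> str:
    """Convert HTML to XHTML with HTML Tidy (assumes `tidy` is on PATH)."""
    # -asxml: emit well-formed X(HT)ML; -quiet: fewer messages;
    # -numeric: numeric character references instead of named entities,
    # so a plain XML parser does not need the XHTML DTD (e.g. for &nbsp;).
    result = subprocess.run(
        ["tidy", "-asxml", "-quiet", "-numeric"],
        input=html, capture_output=True, text=True,
    )
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def extract_title_and_links(xhtml: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the page title and (href, text) pairs using XPath-style paths."""
    root = ET.fromstring(xhtml)
    title_el = root.find(".//h:head/h:title", XHTML_NS)
    title = (title_el.text or "").strip() if title_el is not None else ""
    # ElementTree supports a limited XPath subset, including [@attr] tests.
    links = [
        (a.get("href"), "".join(a.itertext()).strip())
        for a in root.iterfind(".//h:a[@href]", XHTML_NS)
    ]
    return title, links


# Hypothetical, already-tidied page; in practice pass tidy_to_xhtml(raw_html).
page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Orders</title></head>
<body>
<p><a href="/order?id=1">Order <b>1</b></a></p>
<p><a href="/order?id=2">Order 2</a></p>
<p><a name="top">no href, skipped</a></p>
</body>
</html>"""

title, links = extract_title_and_links(page)
print(title)  # Orders
print(links)  # [('/order?id=1', 'Order 1'), ('/order?id=2', 'Order 2')]
```

Splitting the Tidy step from the query step keeps the extraction code testable without Tidy installed; `-numeric` is requested because a generic XML parser would otherwise choke on named entities such as `&nbsp;` that are only defined in the XHTML DTD.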
Received on Tuesday, 6 December 2005 06:26:04 UTC