
Re: Regarding XQuery

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 06 Dec 2005 07:26:02 +0100
To: Sunil Sharma <sunil.sharmaa@gmail.com>
Cc: html-tidy@w3.org
Message-ID: <cebap11ej8s8m06921pepbjttjcr72ac2r@hive.bjoern.hoehrmann.de>

* Sunil Sharma wrote:
>How can I extract information from pages developed in asp, jsp using cookies
>and sessions?

HTML Tidy operates on HTML-like input documents and, if you wish,
transforms them into X(HT)ML. Run `tidy -asxml` to get that output, then
extract "information" from it with XQuery, XPath, XSLT, or whatever you
prefer. That's about all I can tell you, at least on this list. If you
are concerned with scraping web sites that depend on cookies,
JavaScript, etc., Tidy is not the right tool for you; the Perl
frameworks WWW::Mechanize or Win32::IE::Mechanize would be more
appropriate. These are off-topic on html-tidy, however.
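
For illustration, here is a rough sketch of the Tidy-then-XPath route in
Python, using only the standard library. The function names and the
sample document are made up for this example. The extra `-q`, `-numeric`
and `-utf8` switches are my assumption about a sensible invocation (quiet
output, numeric character references so the XML parser does not trip
over entities like `&nbsp;`, and UTF-8 in and out); only `-asxml` is
essential. Note that ElementTree supports just a subset of XPath.

```python
import subprocess
import xml.etree.ElementTree as ET

# Tidy's XHTML output puts every element in the XHTML namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical stand-in for what `tidy -asxml` might emit for a small page.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body>
<p><a href="/a.html">First</a> and <a href="/b.html">Second <em>link</em></a></p>
<p><a name="anchor">no href</a></p>
</body>
</html>"""


def tidy_to_xhtml(html: str) -> str:
    """Pipe HTML through `tidy -asxml` and return what it writes to stdout."""
    result = subprocess.run(
        ["tidy", "-q", "-asxml", "-numeric", "-utf8"],
        input=html,
        capture_output=True,
        encoding="utf-8",
    )
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (href, link text) for every <a href="..."> in the document."""
    root = ET.fromstring(xhtml)
    return [
        (a.get("href"), "".join(a.itertext()).strip())
        for a in root.iterfind(".//x:a[@href]", XHTML_NS)
    ]
```

With Tidy installed and on the PATH, `extract_links(tidy_to_xhtml(page))`
would chain the two steps for a page you have already fetched;
`extract_links(SAMPLE_XHTML)` exercises the XPath half on its own.
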
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Tuesday, 6 December 2005 06:26:04 UTC
