Re: Regarding Tidy and XQUERY

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 05 Dec 2005 07:07:05 +0100
To: Sunil Sharma <sunil.sharmaa@gmail.com>
Cc: html-tidy@w3.org
Message-ID: <31m7p111v89i69sttcfs48qavm3sum5rb4@hive.bjoern.hoehrmann.de>

* Sunil Sharma wrote:
>Can we use tidy and xquery to extract information from an html page which
>has a lot of javascript functions.

You can use Tidy to turn HTML-like documents into well-formed XHTML
documents and then run XQuery on the result as on any other XML
document. The only special thing here is that the elements will be in
the XHTML namespace, which you have to declare in the query if you use
element names in it. I'm not sure how JavaScript is relevant here;
Tidy won't execute any script. If you need that, you might want to use
one of the web browsers to convert to XHTML (you'd inject a script that
serializes the document once it is in the state you are interested in).
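
To illustrate the namespace point, here is a minimal sketch in Python
rather than XQuery, using the standard library's ElementTree. The
sample document, the prefix "h", and both function names are made up
for the illustration; the string only stands in for what Tidy might
emit when asked for XHTML (for example with -asxhtml). In XQuery the
corresponding step would be a prolog declaration along the lines of
declare namespace h = "http://www.w3.org/1999/xhtml"; followed by a
path such as //h:p.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample standing in for Tidy's XHTML output.
# The script element is ordinary markup here; nothing runs it.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title>
<script type="text/javascript">/* not executed */</script>
</head>
<body><p>First</p><p>Second</p></body>
</html>"""


def paragraph_texts(xhtml):
    """Return the text of every XHTML p element, using a bound prefix."""
    root = ET.fromstring(xhtml)
    ns = {"h": XHTML_NS}  # the prefix "h" is arbitrary
    return [p.text or "" for p in root.findall(".//h:p", ns)]


def unqualified_count(xhtml):
    """Count matches for a bare element name, ignoring the namespace."""
    root = ET.fromstring(xhtml)
    return len(root.findall(".//p"))


if __name__ == "__main__":
    print(paragraph_texts(TIDIED))
    print(unqualified_count(TIDIED))
```

The second function is there to show the usual pitfall: a bare element
name matches nothing, because every element in the document is in the
XHTML namespace.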
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Monday, 5 December 2005 06:13:45 GMT