Re: question: GetToken returns the contents of <script> as Text from Bjoern Hoehrmann on 2004-05-14 (html-tidy@w3.org from April to June 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 14 May 2004 23:29:33 +0200
To: Sasha P Caskey <sashac@us.ibm.com>
Cc: html-tidy@w3.org
Message-ID: <40a538d1.967795936@smtp.bjoern.hoehrmann.de>

* Sasha P Caskey wrote:
>I figured I would ask before I changed the behavior of GetToken in lexer.c.
>I've been using the latest version of tidy to parse html by calling
>GetToken. However, I noticed that I keep receiving the content of the
>script elements as text nodes (perhaps that is the proper behavior), is
>there any way I can change that through a configuration parameter perhaps?

Note that GetToken(...) is an internal function and thus unsupported;
we might, and likely will, change it without much consideration of the
implications for other applications...

I've recently changed GetToken(...) for CDATA elements like <script>,
<style>, etc. so that you should be able to use the CdataContent mode
for these elements. This would require that you check for CDATA start
tags, assign the token to lexer->parent, and call GetToken(...) with
CdataContent mode. Have a look at parser.c:ParseScript(...), which does
this.
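
To illustrate the calling pattern only, here is a rough, self-contained
sketch. It is not tidy's code: every Toy* / toy_* name below is a
hypothetical stand-in for the real lexer types and for GetToken(...),
and the toy lexer knows nothing about attributes, comments or entities.
It just shows the three steps: spot a CDATA start tag, point the lexer's
parent at it, and ask for the next token in a raw-content mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT tidy's real types. */
typedef enum { TOY_START_TAG, TOY_END_TAG, TOY_TEXT, TOY_EOF } ToyType;
typedef enum { TOY_NORMAL, TOY_CDATA_CONTENT } ToyMode;

typedef struct {
    ToyType type;
    char text[64];           /* element name for tags, characters for text */
} ToyToken;

typedef struct {
    const char *in;          /* remaining input */
    const ToyToken *parent;  /* element whose raw content is being read */
} ToyLexer;

/* Copy at most cap-1 characters and always terminate the result. */
static void copy_span(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Step 1: is this start tag one whose content must be read raw? */
static int toy_is_cdata_element(const ToyToken *t)
{
    return t->type == TOY_START_TAG &&
           (strcmp(t->text, "script") == 0 || strcmp(t->text, "style") == 0);
}

static ToyToken toy_get_token(ToyLexer *lx, ToyMode mode)
{
    ToyToken tok = { TOY_EOF, "" };
    const char *p = lx->in;
    const char *end;

    if (*p == '\0')
        return tok;

    if (mode == TOY_CDATA_CONTENT) {
        /* Raw mode: everything up to "</parent" is a single text token. */
        char closer[70] = "</";
        assert(lx->parent != NULL);
        strcat(closer, lx->parent->text);
        end = strstr(p, closer);
        if (end == NULL)
            end = p + strlen(p);
        tok.type = TOY_TEXT;
        copy_span(tok.text, sizeof tok.text, p, (size_t)(end - p));
        lx->in = end;
        return tok;
    }

    if (*p == '<') {
        /* Normal mode: "<name>" or "</name>". */
        int is_end = (p[1] == '/');
        const char *name = p + (is_end ? 2 : 1);
        end = strchr(name, '>');
        if (end == NULL)
            end = name + strlen(name);
        tok.type = is_end ? TOY_END_TAG : TOY_START_TAG;
        copy_span(tok.text, sizeof tok.text, name, (size_t)(end - name));
        lx->in = (*end == '>') ? end + 1 : end;
        return tok;
    }

    /* Normal mode: text runs up to the next '<'. */
    end = strchr(p, '<');
    if (end == NULL)
        end = p + strlen(p);
    tok.type = TOY_TEXT;
    copy_span(tok.text, sizeof tok.text, p, (size_t)(end - p));
    lx->in = end;
    return tok;
}

/* Steps 2 and 3: set the parent, fetch the content in raw mode, reset. */
static ToyToken toy_read_raw_content(ToyLexer *lx, const ToyToken *start)
{
    ToyToken content;
    lx->parent = start;
    content = toy_get_token(lx, TOY_CDATA_CONTENT);
    lx->parent = NULL;
    return content;
}
```

A caller would fetch tokens in the normal mode, and whenever
toy_is_cdata_element(...) is true for a start tag, call
toy_read_raw_content(...) before going back to the normal mode for the
end tag. That way a '<' inside the script is not mistaken for markup.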

What change to GetToken(...) did you have in mind to do this?

Received on Friday, 14 May 2004 17:30:14 UTC