- From: Dave Raggett <dsr@w3.org>
- Date: Fri, 1 Sep 2000 11:21:11 +0100 (GMT Daylight Time)
- To: cateen <cateen@turboweb.net.au>
- cc: html-tidy@w3.org, gerald@w3.org
On Tue, 22 Aug 2000, cateen wrote: > Having used JTidy to produce an xml file, I then use a SAX based > XML parser on it and get the following exceptions: 'Whitespace > required before attributes.' caused by the following line in a > javascript function: for (i=0;i<y;i++) OR 'The content begining > "<" is not legal markup. Perhaps the "" () character > should be a letter.' caused by : for ( i = 0; i < y; i++) > > Can anyone tell me why this is happening and if there is a > configuration setting that will change the way the XML parser > treats the '<' sign. Alternatively is there a way that I can > tell JTidy to simply strip out all the information between the > <script> and </script> tags as it writes the xml file?? When I looked into this I discovered a bug in tidy.c lines 1075 and 1093 which should be testing XmlOut and not XmlTags. This fix now causes < and & in script element content to be escaped. This however doesn't solve the problem for XHTML. JavaScript uses < and & both of which need to be escaped in XML content unless you wrap the content in a CDATA marked section. Unfortunately, current HTML browsers don't recognize CDATA marked sections, and furthermore they expect script elements to have CDATA content, and hence expect < and & to be unescaped. The XHTML 1.0 standard therefore recommends you move your scripts to external files. The onmouseover and other event attributes are however ok, as browsers will deal with entities in attribute values. In the future as true XHTML browsers are deployed this problem will go away, since when parsed as XML, you will be able to use script elements with < for < or with the use of an enclosing CDATA marked section. I am uncertain as what HTML Tidy should do about this problem. If it wraps the contents of a script element in a CDATA marked section, it will stop the pages working in existing browsers. Ditto if it escapes the problem characters. It could write the contents of the script element to a new file, but what file name should it use? One possibility is simply to warn if Tidy finds < and & within script elements, placing the burden on the user to decide how to deal with the problem. What do people think about this? Regards, -- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile) World Wide Web Consortium (on assignment from HP Labs)
Received on Friday, 1 September 2000 06:21:21 UTC