- From: Evan Lenz <elenz@xyzfind.com>
- Date: Wed, 6 Sep 2000 11:53:02 -0700
- To: <html-tidy@w3.org>
After further review, I dislike option #3. The W3C advises against continuing the practice of "hiding" scripts and stylesheets within comments. Plus, even though I'm using a comment-aware XML parser, not everyone will be. For XHTML, I think Tidy's best option would be to strip any open and close comment delimiters when they are right after the opening tag and before the closing tag, respectively, and then replacing them with CDATA section delimiters. Tidy can then issue a warning that the script may not work in current browsers and that perhaps the user should include the script from another file. Let's vote! :) Evan Lenz elenz@xyzfind.com http://www.xyzfind.com XYZFind Corp. "Building Better Search" -----Original Message----- From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of Evan Lenz Sent: Wednesday, September 06, 2000 11:20 AM To: html-tidy@w3.org Subject: RE: xml parsing error for < character I, for one, am in favor of at least having an option for either 1) nesting script content into CDATA sections, 2) escaping < to < and & to &, 3) always nesting script contents in a comment and escaping --. In short, I don't care how it's done, but well-formedness is well-formedness. One of the primary advantages of (and reasons for) XHTML is the ability to do further XML processing with languages such as XSLT. Perhaps my situation is unique, but I don't need Tidy's output to display correctly in a browser, because it's just serving as input to my XSLT processor. The XSLT processor can then worry about the HTML output method, allowing the result to work in current browsers. So, ultimately, that isn't much of a solution for those that want XHTML that will work in browsers as their final result, and, yes, it's a cop-out to the real XHTML problem. But my opinion is that if Tidy is to advertise the ability to output XHTML (or any kind of well-formed XML), it should do it (or at least try its best). As for instances of the literal string ]]> in script content, that's easy. Simply escape it with the following string: ]]]]><![CDATA[> Okay, I didn't come up with that, but it certainly works (by breaking the CDATA section into two). Thanks for all the great work you've done so far!! Evan Lenz elenz@xyzfind.com http://www.xyzfind.com XYZFind Corp. "Building Better Search" -----Original Message----- From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of Dave Raggett Sent: Friday, September 01, 2000 3:21 AM To: cateen Cc: html-tidy@w3.org; gerald@w3.org Subject: Re: xml parsing error for < character On Tue, 22 Aug 2000, cateen wrote: > Having used JTidy to produce an xml file, I then use a SAX based > XML parser on it and get the following exceptions: 'Whitespace > required before attributes.' caused by the following line in a > javascript function: for (i=0;i<y;i++) OR 'The content begining > "<" is not legal markup. Perhaps the "" () character > should be a letter.' caused by : for ( i = 0; i < y; i++) > > Can anyone tell me why this is happening and if there is a > configuration setting that will change the way the XML parser > treats the '<' sign. Alternatively is there a way that I can > tell JTidy to simply strip out all the information between the > <script> and </script> tags as it writes the xml file?? When I looked into this I discovered a bug in tidy.c lines 1075 and 1093 which should be testing XmlOut and not XmlTags. This fix now causes < and & in script element content to be escaped. This however doesn't solve the problem for XHTML. JavaScript uses < and & both of which need to be escaped in XML content unless you wrap the content in a CDATA marked section. Unfortunately, current HTML browsers don't recognize CDATA marked sections, and furthermore they expect script elements to have CDATA content, and hence expect < and & to be unescaped. The XHTML 1.0 standard therefore recommends you move your scripts to external files. The onmouseover and other event attributes are however ok, as browsers will deal with entities in attribute values. In the future as true XHTML browsers are deployed this problem will go away, since when parsed as XML, you will be able to use script elements with < for < or with the use of an enclosing CDATA marked section. I am uncertain as what HTML Tidy should do about this problem. If it wraps the contents of a script element in a CDATA marked section, it will stop the pages working in existing browsers. Ditto if it escapes the problem characters. It could write the contents of the script element to a new file, but what file name should it use? One possibility is simply to warn if Tidy finds < and & within script elements, placing the burden on the user to decide how to deal with the problem. What do people think about this? Regards, -- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile) World Wide Web Consortium (on assignment from HP Labs)
Received on Wednesday, 6 September 2000 14:50:32 UTC