- From: Lee Passey <lee@novonyx.com>
- Date: Fri, 01 Feb 2002 16:08:27 -0700
- To: Tidy Development <tidy-develop@lists.sourceforge.net>, "html-tidy@w3.org" <html-tidy@w3.org>
OK, I have come up with a workable implementation of this. I'll give everybody a couple more days to yell at me before I check it in. As I was doing it, though, a couple of issues presented themselves:

1. I replaced a long multi-test conditional with isspace(). Is this C library call available on all platforms?

2. The escaping of '/' after '<' in CDATA is done in lexer.c during parsing. Should it be postponed to the pretty-print phase? I am in favor of this, because my goal is to use the parsing routines as the basis for a DOM-1-compliant set of libraries (which, I believe, is one of the stated goals of the group).

Charles Reitzel wrote:
>
> Hi Lee,
>
> I like the Tar Baby analogy. I think Dave's original fix, escaping the
> forward slash, is a good one for JavaScript. For other scripting languages,
> emitting a warning is probably a good idea for HTML output. For XHTML
> output, the ugly but effective commented-out CDATA section marker is
> OK. So far, so good.
>
> If I understand it correctly, as long as you avoid looking inside quoted
> strings, your new refinement has no conflict with either of these
> "fixes". I like your approach of limiting the tag name check to the
> current ancestor stack. It strikes a nice balance between adaptability and
> avoiding unintended side effects.
>
> Those specs are mostly good reading (thanks, Dave). Once you get into them,
> they can be your best reference. Keep it up, and you'll learn SGML. Don't
> say I didn't warn you...
>
> take it easy,
> Charlie
>
> At 12:48 PM 1/31/2002 -0700, you wrote:
> >p.s.
> >
> >I couldn't find an explicit discussion of this in the 4.x spec, but
> >apparently this is the behavior mandated by the 3.2 spec, which states:
> >
> >"All markup characters or delimiters are ignored and passed as data to
> >the application, except for ETAGO ("</") delimiters followed immediately
> >by a name character [a-zA-Z]. This means that the element's end-tag (or
> >that of an element in which it is nested) is recognized, while an error
> >occurs if the ETAGO is invalid."
> >
> >
> >Lee Passey wrote:
> >
> > > So I reworked the logic of GetCDATA() a bit, so that when a presumed
> > > end-tag is encountered, and it does not match the container's tag, it
> > > would climb the parse tree. If the tag matches _any_ parent tag, it
> > > pushes the end-tag token, and stops parsing CDATA. This has the effect
> > > of adding an implied end-tag for an unterminated <script>, and causes
> > > parsing to continue correctly.
> > >
> > > Is this the right solution?
> >
> >_______________________________________________
> >Tidy-develop mailing list
> >Tidy-develop@lists.sourceforge.net
> >https://lists.sourceforge.net/lists/listinfo/tidy-develop
Received on Friday, 1 February 2002 18:01:12 UTC