- From: Lee Passey <lee@novonyx.com>
- Date: Fri, 01 Feb 2002 16:08:27 -0700
- To: Tidy Development <tidy-develop@lists.sourceforge.net>, "html-tidy@w3.org" <html-tidy@w3.org>
OK, I have come up with a workable implementation of this. I'll give
everybody a couple more days to yell at me before I check it in. As I
was doing it though, a couple of issues presented themselves:
1. I replaced a long multi-test conditional with isspace(). Is this
C library call available on all platforms?
2. The escaping of '/' after '<' in CDATA is done in lexer.c during
parsing. Should it be postponed to the pretty-print phase? I am in
favor of this, as I am working with the goal in mind of using the
parsing routines as a basis for a DOM-1 compliant set of libraries
(which, I believe, is one of the stated goals of the group).
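For reference on both points: isspace() is declared in <ctype.h> and has been part of standard C since C89, so any hosted ANSI C compiler should provide it. Two caveats: its argument must be representable as an unsigned char (or be EOF), and its result depends on the current locale. The sketch below shows a safe wrapper plus one way the "</" escaping could be done at output time instead of in the lexer. The names is_white() and escape_etago() are hypothetical and are not taken from the Tidy sources.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: casting to unsigned char avoids undefined
   behavior when plain char is signed and the byte is >= 0x80. */
static int is_white(char c)
{
    return isspace((unsigned char)c) != 0;
}

/* Hypothetical output-phase helper: copy script text from src to dst,
   writing "<\/" wherever the source contains "</". At most dstsize - 1
   characters plus a terminating NUL are stored. The return value is
   the length the full escaped text needs, excluding the NUL. */
static size_t escape_etago(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;
    char prev = '\0';

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; ++src) {
        if (prev == '<' && *src == '/') {
            if (out + 1 < dstsize)
                dst[out] = '\\';   /* insert the escaping backslash */
            ++out;
        }
        if (out + 1 < dstsize)
            dst[out] = *src;
        ++out;
        prev = *src;
    }
    if (dstsize > 0)
        dst[out < dstsize ? out : dstsize - 1] = '\0';
    return out;
}
```

With this split, the parse tree would keep the script text exactly as it was read, and only the pretty-printer would call something like escape_etago() when writing it out. That seems to fit the DOM goal, since a DOM consumer would see the unescaped source.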
Charles Reitzel wrote:
>
> Hi Lee,
>
> I like the Tar Baby analogy. I think Dave's original fix, escaping the
> forward slash, is a good one for JavaScript. For other scripting languages,
> emitting a warning is probably a good idea for HTML output. For XHTML
> output, the ugly but effective, commented-out-CDATA-section-marker is
> OK. So far, so good.
>
> If I understand it correctly, as long as you avoid looking inside quoted
> strings, your new refinement has no conflict with either of these
> "fixes". I like your approach of limiting the tag name check to the
> current ancestor stack. It strikes a nice balance between adaptability and
> avoiding unintended side effects.
>
> Those specs are mostly good reading (Thanks, Dave). Once you get into it,
> they can be your best reference. Keep it up, and you'll learn SGML. Don't
> say I didn't warn you...
>
> take it easy,
> Charlie
>
> At 12:48 PM 1/31/2002 -0700, you wrote:
> >p.s.
> >
> >I couldn't find an explicit discussion of this in the 4.x spec, but
> >apparently this is the behavior mandated by the 3.2 spec, which states:
> >
> >"All markup characters or delimiters are ignored and passed as data to
> >the application, except for ETAGO ("</") delimiters followed immediately
> >by a name character [a-zA-Z]. This means that the element's end-tag (or
> >that of an element in which it is nested) is recognized, while an error
> >occurs if the ETAGO is invalid."
> >
> >
> >Lee Passey wrote:
> >
> > > > So I reworked the logic of GetCDATA() a bit, so that when a presumed
> > > > end-tag is encountered and it does not match the container's tag, it
> > > > climbs the parse tree. If the tag matches _any_ parent tag, it
> > > > pushes the end-tag token and stops parsing CDATA. This has the effect
> > > > of adding an implied end-tag for an unterminated <script>, and causes
> > > > parsing to continue correctly.
> > >
> > > Is this the right solution?
> >
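The end-tag handling described in the quoted messages (treat "</" followed by a letter as a candidate end-tag, then climb from the CDATA container through its ancestors looking for a matching name) can be sketched roughly as follows. This is only an illustration under assumptions: the Node struct and the functions starts_end_tag(), name_equals(), and closing_ancestor() are hypothetical and do not reflect Tidy's actual data structures or the real GetCDATA() code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical minimal parse-tree node: element name plus parent link. */
typedef struct Node {
    const char *name;
    struct Node *parent;
} Node;

static int is_ascii_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* True when p points at ETAGO ("</") followed immediately by a letter,
   the only markup the quoted HTML 3.2 text says is recognized in CDATA. */
static int starts_end_tag(const char *p)
{
    return p[0] == '<' && p[1] == '/' && is_ascii_alpha(p[2]);
}

/* Case-insensitive comparison of a NUL-terminated element name against
   the first len characters of a tag name found in the input. */
static int name_equals(const char *tag, const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i) {
        if (tag[i] == '\0' ||
            tolower((unsigned char)tag[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return tag[len] == '\0';
}

/* Climb from the CDATA container toward the root. Returns the first
   node (the container itself or an ancestor) whose name matches the
   end-tag name, or NULL when no open element matches. */
static const Node *closing_ancestor(const Node *container,
                                    const char *name, size_t len)
{
    const Node *n;
    for (n = container; n != NULL; n = n->parent) {
        if (n->name != NULL && name_equals(n->name, name, len))
            return n;
    }
    return NULL;
}
```

In this sketch, a non-NULL result would mean "stop reading CDATA and push the end-tag token", and NULL would mean the "</name" text is kept as script data. Restricting the check to open ancestors is what keeps strings such as "</b>" inside a script from ending it when no <b> element is open.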
Received on Friday, 1 February 2002 18:01:12 UTC