Re: [Tidy-dev] Bug list from Lee Passey on 2002-02-01 (html-tidy@w3.org from January to March 2002)

From: Lee Passey <lee@novonyx.com>
Date: Fri, 01 Feb 2002 16:08:27 -0700
To: Tidy Development <tidy-develop@lists.sourceforge.net>, "html-tidy@w3.org" <html-tidy@w3.org>
Message-ID: <3C5B1FEB.BF0B2EDC@novonyx.com>

OK, I have come up with a workable implementation of this.  I'll give
everybody a couple more days to yell at me before I check it in.  While
I was doing it, though, a couple of issues presented themselves:

1.  I replaced a long multi-test conditional with isspace().  Is this
C library call available on all platforms?

2.  The escaping of '/' after '<' in CDATA is done in lexer.c during
parsing.  Should it be postponed to the pretty-print phase?  I am in
favor of this, since my goal is to use the parsing routines as the
basis for a DOM-1 compliant set of libraries (which, I believe, is one
of the stated goals of the group).
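
On point 1, here is a minimal sketch of the substitution, assuming the
original conditional tested for the usual ASCII whitespace characters
(the conditional itself is not shown in this message).  isspace() is
declared in <ctype.h> as part of standard C.  Its argument must be
representable as an unsigned char or be EOF, so the sketch range-checks
before calling it.  IsWhite is a hypothetical helper name, not a
function from the Tidy sources.

```c
#include <ctype.h>

/* Hypothetical helper, not taken from the Tidy sources. */
static int IsWhite(int c)
{
    /* isspace() is undefined for values that do not fit in an
       unsigned char (other than EOF), so reject those up front. */
    if (c < 0 || c > 255)
        return 0;

    /* isspace() returns an unspecified non-zero value for whitespace;
       normalize it to 1. */
    return isspace((unsigned char)c) != 0;
}
```

Note that isspace() depends on the current locale, so a hand-written
test and isspace() may disagree outside the default "C" locale.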

Charles Reitzel wrote:
> 
> Hi Lee,
> 
> I like the Tar Baby analogy.  I think Dave's original fix, escaping the
> forward slash, is a good one for JavaScript.  For other scripting languages,
> emitting a warning is probably a good idea for HTML output.  For XHTML
> output, the ugly but effective commented-out CDATA section marker is
> OK.  So far, so good.
> 
> If I understand it correctly, as long as you avoid looking inside quoted
> strings, your new refinement has no conflict with either of these
> "fixes".  I like your approach of limiting the tag name check to the
> current ancestor stack.  It strikes a nice balance between adaptability and
> avoiding unintended side effects.
> 
> Those specs are mostly good reading (Thanks, Dave). Once you get into it,
> they can be your best reference.  Keep it up, and you'll learn SGML.  Don't
> say I didn't warn you...
> 
> take it easy,
> Charlie
> 
> At 12:48 PM 1/31/2002 -0700, you wrote:
> >p.s.
> >
> >I couldn't find an explicit discussion of this in the 4.x spec, but
> >apparently this is the behavior mandated by the 3.2 spec, which states:
> >
> >"All markup characters or delimiters are ignored and passed as data to
> >the application, except for ETAGO ("</") delimiters followed immediately
> >by a name character [a-zA-Z]. This means that the element's end-tag (or
> >that of an element in which it is nested) is recognized, while an error
> >occurs if the ETAGO is invalid."
> >
> >
> >Lee Passey wrote:
> >
> > > So I reworked the logic of GetCDATA() a bit, so that when a presumed
> > > end-tag is encountered, and it does not match the container's tag, it
> > > would climb the parse tree.  If the tag matches _any_ parent tag, it
> > > pushes the end-tag token, and stops parsing CDATA.  This has the effect
> > > of adding an implied end-tag for an unterminated <script>, and causes
> > > parsing to continue correctly.
> > >
> > > Is this the right solution?
> >
> >_______________________________________________
> >Tidy-develop mailing list
> >Tidy-develop@lists.sourceforge.net
> >https://lists.sourceforge.net/lists/listinfo/tidy-develop
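
The ancestor check described in the quoted messages above can be
sketched roughly as follows.  This is an illustration only, not the
actual GetCDATA() code: SketchNode, NameMatches and MatchEndTag are
hypothetical names, the tag name is assumed to consist of ASCII letters
and digits, and the sketch does not skip quoted strings inside the
script text.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for a parse-tree node; Tidy's real node
   structure is not reproduced here. */
typedef struct SketchNode {
    const char *name;            /* element name */
    struct SketchNode *parent;   /* NULL at the root */
} SketchNode;

/* Case-insensitive comparison of name against the first len
   characters of text. */
static int NameMatches(const char *name, const char *text, size_t len)
{
    size_t i;

    if (strlen(name) != len)
        return 0;
    for (i = 0; i < len; ++i)
        if (tolower((unsigned char)text[i]) !=
            tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* p points into NUL-terminated CDATA content.  Returns the container
   or the nearest ancestor whose end-tag starts at p, or NULL if the
   text should remain part of the CDATA. */
static SketchNode *MatchEndTag(SketchNode *container, const char *p)
{
    const char *start;
    size_t len = 0;
    SketchNode *node;

    /* ETAGO ("</") only counts when a letter follows immediately. */
    if (p[0] != '<' || p[1] != '/' || !isalpha((unsigned char)p[2]))
        return NULL;

    /* Collect the candidate tag name. */
    start = p + 2;
    while (isalnum((unsigned char)start[len]))
        ++len;

    /* Climb from the container towards the root. */
    for (node = container; node != NULL; node = node->parent)
        if (NameMatches(node->name, start, len))
            return node;

    return NULL;
}
```

With a <script> inside a <body>, "</script>" matches the container,
"</body>" matches the parent and so implies the missing </script>, and
"</b>" matches nothing and stays in the script data.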

Received on Friday, 1 February 2002 18:01:12 UTC