W3C home > Mailing lists > Public > html-tidy@w3.org > July to September 1999

RE: How do I get Tidy to ignore my PHP code blocks

From: Dave Raggett <dsr@w3.org>
Date: Thu, 22 Jul 1999 14:29:58 +0100 (GMT Daylight Time)
To: Richard Allsebrook <Richard.Allsebrook@easysoft.com>
cc: html-tidy@w3.org
Message-ID: <Pine.WNT.4.10.9907221417470.-417461@hazel.hpl.hp.com>
On Thu, 22 Jul 1999, Richard Allsebrook wrote:

> Ah, yes but:
> 
> <title>test</title>
> <?
> echo "This ia a <test>";
> echo "This is another test...";
> ?>

Ah, I see the problem. The issue is that in SGML processing
instructions end with the first '>' character. XML 1.0
altered this to require processing instructions to end with ?>

I could modify Tidy to require ?> as the terminator of all
processing instructions, but this would break its ability
to parse legal html. I guess a reasonable approach would
to make this a new configuration option that is automatically
set if the input is in XML. This would be easy to do for the
next release.

The patch is in the file lexer.c. The code to change is
just after the case statement

Old:

            case LEX_PROCINSTR:  /* seen <? so look for '>' */
                if (c != '>')
                    continue;

New:

            case LEX_PROCINSTR:  /* seen <? so look for '?>' */
                if (c != '?')
                    continue;

		/* now look for '>' */

		c = ReadChar(lexer->in);

                if (c == EndOfStream)
                {
                    ReportWarning(lexer, null, null,
					UNEXPECTED_END_OF_FILE);
                    UngetChar(c, lexer->in);
                    continue;
                }

                AddCharToLexer(lexer, c);

                if (c != '>')
		    continue;

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
phone: +44 122 578 2984 (or 2521) +44 385 320 444 (gsm mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Thursday, 22 July 1999 09:30:03 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:42 GMT