- From: Dave Raggett <dsr@w3.org>
- Date: Thu, 22 Jul 1999 14:29:58 +0100 (GMT Daylight Time)
- To: Richard Allsebrook <Richard.Allsebrook@easysoft.com>
- cc: html-tidy@w3.org
On Thu, 22 Jul 1999, Richard Allsebrook wrote:

> Ah, yes but:
>
> <title>test</title>
> <?
> echo "This ia a <test>";
> echo "This is another test...";
> ?>

Ah, I see the problem. The issue is that in SGML, processing
instructions end with the first '>' character. XML 1.0 altered
this to require processing instructions to end with ?>

I could modify Tidy to require ?> as the terminator of all
processing instructions, but this would break its ability to
parse legal HTML. I guess a reasonable approach would be to make
this a new configuration option that is automatically set if
the input is in XML. This would be easy to do for the next release.

The patch is in the file lexer.c. The code to change is just
after the case statement:

Old:

    case LEX_PROCINSTR:  /* seen <? so look for '>' */

        if (c != '>')
            continue;

New:

    case LEX_PROCINSTR:  /* seen <? so look for '?>' */

        if (c != '?')
            continue;

        /* now look for '>' */
        c = ReadChar(lexer->in);

        if (c == EndOfStream)
        {
            ReportWarning(lexer, null, null, UNEXPECTED_END_OF_FILE);
            UngetChar(c, lexer->in);
            continue;
        }

        AddCharToLexer(lexer, c);

        if (c != '>')
            continue;

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
phone: +44 122 578 2984 (or 2521) +44 385 320 444 (gsm mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Thursday, 22 July 1999 09:30:03 UTC