
Use of HTML tidy with PHP

From: Justin Farnsworth <jef@eyeintegrated.com>
Date: Sun, 25 Jul 1999 08:14:31 -0400
Message-ID: <379AFFA7.8E7A3CB@eyeintegrated.com>
To: html-tidy@w3.org
Is there anyone on this list who can help add the
capability of using HTML tidy on PHP files?  Many of us
would be appreciative.  There are about 500,000 sites
with PHP, according to Netcraft.  I have been trying to
hack lexer.c for PHP, but have not been successful.
I have sent the following to Dave Raggett.


I think that the best thing to do would be to go visit:


where things are explained far better than I could do.

Basically, there are three types of tags that are
recognized by the mod_php3.o module that is compiled into
Apache:

1.  So-called long tags, where the tags are
        opening         <?php
        closing         ?>

2.  So-called short tags, where the tags are
        opening         <?
        closing         ?>

3.  So-called ASP type tags, where the tags are
        opening         <%
        closing         %>

Which of the above are recognized is configurable in a
php.ini file, which is read at Apache start-up time.
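
As an illustration of the three tag styles listed above, here is a
minimal sketch in C of how a scanner might classify an opening tag and
look up its closing delimiter. The names PhpTagStyle, php_open_tag and
php_close_tag are hypothetical; they do not come from tidy or from PHP.
The sketch also assumes all three styles are enabled, and it would
misclassify an XML declaration such as "<?xml" as a short tag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three tag styles; not taken from tidy or PHP. */
typedef enum {
    PHP_TAG_NONE,   /* not a PHP opening tag */
    PHP_TAG_LONG,   /* <?php ... ?> */
    PHP_TAG_SHORT,  /* <?    ... ?> */
    PHP_TAG_ASP     /* <%    ... %> */
} PhpTagStyle;

/* Classify the PHP opening tag at the start of s, if there is one. */
PhpTagStyle php_open_tag(const char *s)
{
    /* Test the long tag first, because "<?" is a prefix of "<?php". */
    if (strncmp(s, "<?php", 5) == 0)
        return PHP_TAG_LONG;
    if (strncmp(s, "<?", 2) == 0)
        return PHP_TAG_SHORT;   /* assumption: also matches "<?xml" */
    if (strncmp(s, "<%", 2) == 0)
        return PHP_TAG_ASP;
    return PHP_TAG_NONE;
}

/* Closing delimiter that matches a tag style, or NULL for PHP_TAG_NONE. */
const char *php_close_tag(PhpTagStyle style)
{
    switch (style) {
    case PHP_TAG_LONG:
    case PHP_TAG_SHORT:
        return "?>";
    case PHP_TAG_ASP:
        return "%>";
    default:
        return NULL;
    }
}
```

The long tag is tested before the short tag because every long tag also
begins with the short tag's two characters.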

NOW, we at Eye Integrated use the "long tags".  I did
go into lexer.c and fiddle a bit: I changed the switch
for the lexer state LEX_PROCINST to detect the
long tag, and then tried to change it to detect the
short tag.  The problem remains that in the
lexer state LEX_PROCINST, inside the PHP-delimited
stanza, tidy still tries to parse what it thinks
is HTML, and consequently mangles the PHP code.

WHAT IS NEEDED is that as soon as the PHP opening
tag is detected, the rest of the code is simply
left alone and put out exactly as it was found, up to
and including the closing tag.
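
The pass-through behaviour described above can be sketched in C as
follows. This is not tidy's lexer and not a patch to lexer.c; the
function names php_block_len and filter_keep_php are hypothetical, and
collapsing whitespace merely stands in for whatever tidy does to the
HTML portions. The sketch assumes the closing delimiter never occurs
inside the PHP code itself (for example inside a string literal), and
it treats any "<?" as the start of a PHP block.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Length in bytes of the PHP block that starts at s, closing tag
 * included, or 0 if s does not start with "<?" or "<%".  An
 * unterminated block runs to the end of the input.
 * Hypothetical helper, not part of tidy.
 */
size_t php_block_len(const char *s)
{
    const char *close_tag;
    const char *end;

    if (strncmp(s, "<?", 2) == 0)       /* covers "<?php" and "<?" */
        close_tag = "?>";
    else if (strncmp(s, "<%", 2) == 0)
        close_tag = "%>";
    else
        return 0;

    end = strstr(s + 2, close_tag);
    return end ? (size_t)(end - s) + 2 : strlen(s);
}

/*
 * Copy in to out (capacity outsz, terminating NUL included).  Outside
 * PHP blocks, runs of whitespace are collapsed to one space as a
 * stand-in for real HTML processing; PHP blocks are copied byte for
 * byte.  Returns 0 on success, -1 if out is too small.
 */
int filter_keep_php(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;

    while (*in != '\0') {
        size_t n = php_block_len(in);

        if (n > 0) {                    /* PHP: leave it alone */
            if (o + n >= outsz)
                return -1;
            memcpy(out + o, in, n);
            o += n;
            in += n;
        } else if (isspace((unsigned char)*in)) {
            while (isspace((unsigned char)*in))
                in++;                   /* HTML: collapse the run */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = ' ';
        } else {                        /* HTML: ordinary character */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

The point of the design is that the check for a PHP opening tag comes
before any HTML handling, so nothing between the opening and closing
tags is ever examined as markup.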

I don't want to get into advocacy, but PHP is used on
many sites, and you can get some idea at


which shows PHP second in popularity of all modules used
in Apache (16 percent).  I am certain that many web
developers would be deeply appreciative of this
"PHP" mode, should it be incorporated into tidy.

This is not flattery for encouraging you, but tidy
is by far the best parser/formatter for HTML that
I have come across.

Justin Farnsworth - Technical Director
Eye Integrated Communications
106 East Victoria Court - Suite A
Greenville, NC 27858 | Tel: (252) 353-0722
Received on Sunday, 25 July 1999 08:18:32 UTC
