W3C home > Mailing lists > Public > html-tidy@w3.org > July to September 2003

new PHP Bindings for Tidylib

From: John Coggeshall <john@coggeshall.org>
Date: 30 Jul 2003 08:39:05 -0400
To: tidy-develop@lists.sourceforge.net
Cc: html-tidy@w3.org
Message-Id: <1059568748.19291.4.camel@coogle.localdomain>

Hey all:

I hope this is the right list :) I just wanted to announce that I have
written a Tidylib binding for PHP5 (Zend Engine2 only). I'm be happy to
hear feedback about the extension and my implementation.

I've put the extension, basic PHPDocs, working examples, tests, and of
course the extension itself on my web site:

http://www.coggeshall.org/tidy/

Although I haven't written true PHPDocs yet, README_TIDY in the tidy/
directory outlines the API including a description of the OO methods and
proprieties available to accessing the parsed HTML document tree.

I am interested in hearing from the tidy community on my extension
and finding out what everyone thinks of it. There are memleaks which
still need to be tracked down (which I believe either have to do with
ZE2 itself or because I am missing something in my OO implementation),
and I know there are probably bugs.. I'd welcome suggestions, patches, 
and even just "it broke doing this". 

Regards,

John

PS -- Here is a paste of one of the examples for those who are curious
to how the OO stuff works (pulls all <A HREF> links out):

<?php

    /* Create a Tidy Resource */
    $tidy = tidy_create();
    
    /* Parse the document */
    tidy_parse_file($tidy, $_SERVER['argv'][1]);
    
    /* Fix up the document */
    tidy_clean_repair($tidy);
    
    /* Get an object representing everything from the <HTML> tag in */
    $html = tidy_get_html($tidy);
    
    /* Traverse the document tree */
    print_r(get_links($html));
    
    function get_links($node) {
        $urls = array();
        
        /* Check to see if we are on an <A> tag or not */
        if($node->id == TIDY_TAG_A) {
            /* If we are, find the HREF attribute */
            $attrib = $node->get_attr_type(TIDY_ATTR_HREF);
            if($attrib) {
                /* Add the value of the HREF attrib to $urls */
                $urls[] = $attrib->value;
            }
            
        }
        
        /* Are there any children? */
        if($node->has_children()) {
            
            /* Traverse down each child recursively */
            foreach($node->children as $child) {
                   
                /* Append the results from recursion to $urls */
                foreach(get_links($child) as $url) {
                    
                    $urls[] = $url;
                    
                }
                
            }
        }
        
        return $urls;
    }

?>
-- 
-~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~-
John Coggeshall
john at coggeshall dot org                 http://www.coggeshall.org/
-~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~-
Received on Wednesday, 30 July 2003 08:31:39 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:38:54 UTC