
[whatwg] 9.1.2.1: trailing slash and atheism

From: Christian Schmidt <whatwg.org@chsc.dk>
Date: Sun, 03 Dec 2006 03:17:39 +0100
Message-ID: <457233C3.6060504@chsc.dk>
Charles Iliya Krempeaux wrote:
> Sometimes web developers parse (non-XML) HTML with an XML parser 
> because it's the tool they have on hand.
> 
> Consider a PHP developer trying to analyse an HTML page.
> 
> If a PHP developer wants to analyse an HTML page, that developer may 
> try to use SimpleXML <http://php.net/simplexml> because that's what
> they have on hand and know how to use.  There's no SimpleHTML
> available in PHP.
> 
> And while none of this is our fault, this is a situation 
> some web developers are going to run into.  (What else are they going
>  to use?)

PHP developers can parse HTML using DOMDocument::loadHTML(). If they
want, they can then convert the DOMDocument to SimpleXML:

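// Parse tag-soup HTML (implied <head>, unclosed <br>) with the HTML parser.
$doc = new DOMDocument();
$doc->loadHTML(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    . '"http://www.w3.org/TR/html4/loose.dtd">'
    . '<title>Foo</title><body>Foo<br>bar'
);
// Import the DOM tree into SimpleXML; the result is rooted at <html>.
$simpleXml = simplexml_import_dom($doc);
print $simpleXml->head->title;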


Christian
Received on Saturday, 2 December 2006 18:17:39 UTC
