PHP code only allowed in XHTML 5?

Summary: Please add back support for SGML processing instructions, as 
they are well supported and widely used.

Suppose one wants to validate a page that embeds PHP, and wants to do 
so /prior/ to the execution of the PHP script:

        <?php Print "Hello, World!"; ?>

Validator.nu will then tell you that the page is _invalid_ as an HTML 5 
page, but that it is valid as an XHTML 5 page ... Also, checking the 
HTML 5 draft, you will find nothing about the <?> syntax.

And this despite the fact that <?php ... ?> is valid both as an SGML/HTML 
processing instruction [1][2] and as an XML/XHTML processing 
instruction [3]. Thus the code is also valid as an HTML 4 or an XHTML 1 
fragment. User agents (IE, Opera, Firefox, WebKit, Lynx etc.) support 
processing instructions ("PIs") too: just as HTML 4 and XML require them 
to do, they ignore them. In Firefox and WebKit, PIs simply disappear 
(they are not reflected in the DOM at all). In IE they are treated as 
comments. In Opera they are treated as either "unnamed" (<?>) or named 
(e.g. <?php >, <?biferno >) comment nodes. See the Live DOM viewer 
demo [4].
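
As a rough illustration of the XML side of this claim, here is a small 
sketch using the minidom parser from Python's standard library. It is 
not taken from any of the specs cited here, and the <p> wrapper element 
is an assumption made only so that the fragment has a root element. It 
shows an XML parser accepting the Hello World line as a well-formed 
processing instruction with the target "php":

```python
from xml.dom import minidom

# The <p> wrapper is only there because an XML document needs a root element.
source = '<p><?php Print "Hello, World!"; ?></p>'

doc = minidom.parseString(source)
node = doc.documentElement.firstChild

# An XML parser exposes the PI as its own node type, split into a
# target ("php") and the remaining data (the PHP statement itself).
is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
print(is_pi, node.target, node.data.strip())
```

Note that this only demonstrates XML well-formedness; it says nothing 
about how any particular browser builds its DOM.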

There is a slight difference between the minimal SGML/HTML syntax, which 
is <? ... >, and the XML/XHTML syntax, which is <?instruction ?>. HTML 
user agents accept the SGML syntax. However, as we can see from the 
Hello World example above, PHP prefers the XML syntax ("?>"), probably 
because it is valid both as XML and as HTML.
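
The difference between the two closers can be observed with an 
SGML-style HTML parser. The sketch below uses html.parser from Python's 
standard library, which ends a PI at the first ">". The PICollector 
class and the collect_pis helper are hypothetical names invented for 
this example, and the <p> wrapper is likewise an assumption:

```python
from html.parser import HTMLParser


class PICollector(HTMLParser):
    """Collects the raw content of every processing instruction seen."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_pi(self, data):
        # data is everything between "<?" and the first ">"
        self.instructions.append(data)


def collect_pis(markup):
    """Parse markup as HTML and return the content of each PI found."""
    parser = PICollector()
    parser.feed(markup)
    parser.close()
    return parser.instructions


# SGML/HTML style: closed by a bare ">"
sgml_style = collect_pis("<p><?php echo 1 ></p>")

# XML/XHTML style: closed by "?>"; the "?" ends up inside the PI content
xml_style = collect_pis('<p><?php Print "Hello, World!"; ?></p>')

print(sgml_style)
print(xml_style)
```

In both cases the parser recognizes a PI; with the XML-style closer the 
trailing "?" is simply reported as part of the PI's content, which is 
why the "?>" form is usable in HTML as well as in XML.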

Due to HTML 5's failure to specify the <? > syntax (and due to XML 
paranoia ... ?), Validator.nu currently throws an error, blaming XML, as 
soon as it sees a "<?":

    "Error: Saw <?. Probable cause: Attempt to use an XML processing 
instruction in HTML. (XML processing instructions are not supported in 
HTML.)"

PHP is not the only language that takes advantage of HTML's support for 
processing instructions, however. Another example is Biferno [5], and 
there most certainly are others. Of course, "real" XML processing 
instructions are an example of their own - allowing PIs would make it 
easier to create pages that are valid both as XHTML 5 and HTML 5.

If the HTML 5 spec ends up not defining the PI syntax, then one will 
have to rely on the HTML 4 specification for information about it. 
Additionally, it becomes difficult to use an HTML validator as an 
authoring tool if it throws an error as soon as it sees a "<?". More 
fundamentally, PIs are one of the extension mechanisms of HTML 4 - not 
including them in HTML 5 makes HTML 5 less extensible than HTML 4 and 
removes an important non-UA interface of HTML.

[1] http://www.w3.org/TR/html401/conform#h-4.2
[2] http://www.w3.org/TR/html401/appendix/notes#h-B.3.6
[3] http://www.w3.org/TR/REC-xml/#sec-pi
[4] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/180
[5] http://www.tabasoft.it/biferno/index.bfr
-- 
Leif Halvard Silli

Received on Wednesday, 22 July 2009 01:07:49 UTC