Re: PHP code only allowed in XHTML 5? from Leif Halvard Silli on 2009-07-22 (public-html@w3.org from July 2009)

From: Leif Halvard Silli <lhs@malform.no>
Date: Wed, 22 Jul 2009 19:50:21 +0200
To: Thomas Broyer <t.broyer@ltgt.net>
CC: HTMLWG <public-html@w3.org>
Message-ID: <4A67515D.6040400@malform.no>
Thomas Broyer On 09-07-22 15.56:

> On Wed, Jul 22, 2009 at 12:42 PM, Leif Halvard Silli wrote:
>> Thomas Broyer On 09-07-22 12.03:
>>
>>> On Wed, Jul 22, 2009 at 11:30 AM, Leif Halvard Silli wrote:
>>>> Do we want the anomaly that <?php ... ?> is valid XHTML 5, but invalid
>>>> HTML 5?
>>> Yes, just like:
>>>  - <feed xmlns="http://www.w3.org/2005/Atom">
>>>  - xmlns:foo="http://example.net"
>>>  - <foo:bar />
>>>  - <p />
>> I thought HTML 5 was about "HTML, in his own right" ... Those are all
>> specific XHTML syntax examples
> 
> Given that HTML5's text/html serialization isn't SGML (because you can
> count on the fingers of one hand the number of UAs having ever parsed
> HTML as SGML),


HTML 5 isn't defined as SGML. That - and not the fact that UAs don't 
parse it like that - is the reason why it isn't SGML. If we look 
at HTML4 from a UA point of view, then even HTML4 isn't SGML.

> and that most text/html UAs parse <?php ?> as a "bogus
> comment", why wouldn't it be different?

Whether <?php ?> and <? > are SGML or a "custom format inspired by 
SGML" (as HTML 5 roughly puts it in the introduction), is 
uninteresting. You might just as well tell me that "<p>" is SGML 
syntax, and that we therefore should not support it. What matters 
is that both <p> and <? > are supported with a predictable parsing 
in UAs [1].

By 'parse <?php ?> as a "bogus comment"' I suppose you refer to 
the wording "bogus comment" in HTML 5. But that doesn't really 
make anything much clearer. E.g. "PI-comment" would be a better name.

>> (although many of us want xmlns="" to be valid in HTML 5 as well).

   [...]

> Note: In HTML, the xmlns attribute has absolutely no effect. It is
> basically a talisman. [...]


There is a proposal to incorporate RDFa as part of HTML 5 - for 
that the xmlns attribute is necessary, regardless. At least, that 
seems to be the rough consensus amongst the RDFa supporters.

>> What a strange message to send to PHP users, that they should use XHTML5.
>> ;-)

    [...]

> (html5.validator.nu says
> that you tried to use an XML PI in an HTML document; however it is
> phrased, it means the same: "<?php" isn't HTML).


And that is the problem - the false message - it /is/ HTML, as 
understood by UAs, validators and WYSIWYG editors.

 
> [...] they should know that the validator not reporting any warning
> or error doesn't mean their document will, after PHP processing, 
> be valid (or even  well-formed).

Of course. This is (too) basic. Is the validator supposed to be a 
pedagogue and not a validator, perhaps?

> My opinion is that <?php being flagged as a parse error (freedom to
> the validator to show it as a warning or error, and describe it that
> way it wants) is better for PHP users than not showing anything at
> all.


If the validator is supposed to be a pedagogue that gives friendly 
advice, then that should be a task that is separate from the 
error checking. Otherwise it only becomes annoying to use.

 
> If you're using PHP in an HTML document, you're likely to output HTML
> markup (elements, attributes; not just attribute values) and therefore
> have issues with validation before PHP processing (">" from HTML
> markup within PHP strings being seen as the end of the "bogus
> comment", "<?php" found within start tags)


Again, this depends on the coding style. This is allowed in HTML 4 
and in XHTML:

<table>
<tr><td>content
<?php code to insert more rows ?>
</table>

 
>>>> What about the UA support, should it be ignored?
>>> Which UA support?
>> See below.
>>
>>> In text/html, <?php is parsed as a comment
>> According to Live DOM viewer, only Opera and IE render it as a comment in
>> the DOM. (See my first message.) And this UA behavior doesn't seem to be
>> documented in the HTML 5 draft - despite its said parsing focus ...
> 
> It is, but you have to dig deep into the parsing algorithms.

Thanks for addressing this subject!

> In the tokenization stage:
> §9.2.4.1 Data state: "<" when content model flag is PCDATA state =>
> switch to tag open state
> §9.2.4.3 Tag open state: if content model flag is PCDATA state, "?" =>
> parse error, switch to bogus comment state
> §9.2.4.16 Bogus comment state: consume everything up to ">" or EOF
> (whichever comes first) and emit a comment token whose data is the
> "everything" between the "<" and ">" (i.e. it includes the "?")
> 
> Then in the tree construction state, AFAICT, every comment token leads
> to inserting a Comment node in the tree.
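
The tokenization steps quoted above can be sketched roughly as follows. This is only an illustration of the "bogus comment" path as described in the quoted text, not the full HTML 5 tokenizer: the function name `tokenize` and the token tuples are hypothetical, and every other kind of markup (including ordinary tags) is simply passed through as text.

```python
def tokenize(html):
    """Split markup into ('text', s) and ('comment', s) tokens.

    Only the "<?" path is modelled: "<" followed by "?" enters the
    bogus comment state, which consumes up to the first ">" or EOF.
    """
    tokens = []
    pos = 0
    while pos < len(html):
        start = html.find("<?", pos)
        if start == -1:
            # No further "<?": the rest is plain text in this sketch.
            tokens.append(("text", html[pos:]))
            break
        if start > pos:
            tokens.append(("text", html[pos:start]))
        end = html.find(">", start)
        if end == -1:
            # EOF reached: everything after "<" becomes the comment data.
            tokens.append(("comment", html[start + 1:]))
            break
        # Comment data is what lies between "<" and ">", so it keeps the "?".
        tokens.append(("comment", html[start + 1:end]))
        pos = end + 1
    return tokens
```

Under these assumptions, feeding it `a<?php x ?>b` gives a text token, a comment token whose data is `?php x ?`, and another text token.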


Is "bogus comment" used for anything other than the (specified) 
effect of "<? comment >" ? In other words, is it anything but a 
negative word for the effect of <? comment > ?

The Live DOM viewer does not detect any comments for Firefox and 
Webkit. Is Live DOM Viewer wrong? Or do Firefox and Webkit not 
fulfill the spec yet? The way I read Live DOM Viewer, we have 3 
different interpretations of <? > when we consider the result in 
the DOM (but one result if we consider the result for the user).

 
>>> (and ends at the first ">"
>> Already noted in my first message - the SGML/HTML PI syntax starts with "<?"
>> and ends with the first occurrence of ">".
>>
>>> on at least Firefox and Opera: try it with <?php echo "hello
>>> <b>world</b>!"; ?>)
>> So it ends with the ">" in "<b>". Where is the news? UAs support the
>> SGML/HTML PI syntax - that is why it works like this - and it is also in
>> accordance with how the W3 validator sees it.
> 
> You'll note that in WebKit and IE, it ends at the "?>", not the first
> ">" (even a "-->" wouldn't end the "bogus comment" in these UAs)


Maybe you are colored by your attitude here: I am unable to 
verify your claim. All I see is that IE and Webkit - in text/html 
mode - end the PI at the first ">". In other words, I don't see 
the behavior that you describe. E.g. see this Live DOM viewer demo 
[2].
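
To make the two competing claims concrete, here is a small sketch that applies both termination rules to the example string from earlier in the thread. The helper `pi_end` is hypothetical and does not reproduce any particular browser's parser; it only shows where the construct would end under each rule.

```python
def pi_end(markup, closer):
    """Return the first "<?" construct in markup, ended by `closer`."""
    start = markup.index("<?")
    end = markup.find(closer, start + 2)
    if end == -1:
        # No closer found: the construct runs to the end of the input.
        return markup[start:]
    return markup[start:end + len(closer)]


sample = '<?php echo "hello <b>world</b>!"; ?>'

# SGML/HTML PI rule: the construct ends at the first ">".
sgml_style = pi_end(sample, ">")

# XML PI rule: the construct ends at the first "?>".
xml_style = pi_end(sample, "?>")
```

With the first rule the construct stops inside `<b>`, leaving `world</b>!"; ?>` as ordinary content; with the second rule it covers the whole sample.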

 
>>>> What about current
>>>> validators - Validator.w3.org and HTML Tidy?
> 
> Forgot to say:
>  - validator.w3.org uses an SGML parser (except of course for HTML5),
> which isn't on par with "HTML as she is spoke" (i.e. <meta />
> generates an error that character cannot appear within <head>, because
> the "/" closes the element and the ">" is then parsed as character
> data; of course this construct isn't valid HTML4, but the error
> message isn't any clearer than an HTML5 validator saying "<?php" is a
> "bogus comment")


The thing is that the SGML interpretation of the W3 validator is 
in line with how UAs work - they begin and end the PI/"bogus 
comment" at the same place.

>  - HTML Tidy has explicit, limited support for ASP (<% %>), JSTE (<#
> #>) and PHP (<?php ?> only, not the <? ?> syntax)

Using e.g. an online version of TIDY [3], I am unable to confirm 
that it doesn't accept the <? ... ?> syntax. When configured to 
output XHTML, then it will correct <? ... > to <? ... ?>. 
Otherwise, it doesn't touch it. (But HTML Tidy is very configurable.)

>>>> And so on. Should we pretend that support for <? > doesn't exist?
>>> In "HTML as she is spoke"? yes.
>> I disagree that we should pretend.
> 
> Given that on the 4 main browser engines (Gecko, Trident, Presto,
> WebKit), some parse it as a comment and others ignore it altogether
> (and this depends on the content of the PHP code too: both IE and
> WebKit seem to look for paired quotes with the <?php > construct);


If you give an example, then perhaps I'll understand what you 
refer to w.r.t. IE and Webkit ...

> I don't understand how you could say there is any "UA support".


Because you can insert <? > into your code and be certain that, as 
long as you do not place another ">" in between, UAs will not 
render the content to the user, *and* they will parse it as the 
W3 validator does.
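
An author who wants to rely on this could check a source file for blocks that break the "no other '>' in between" condition. The following is a minimal sketch under that assumption; the name `blocks_with_early_close` is hypothetical, and it only looks at blocks written with an explicit "?>" closer.

```python
import re

# Matches a block from "<?" up to the nearest "?>", across line breaks.
PHP_BLOCK = re.compile(r"<\?.*?\?>", re.DOTALL)


def blocks_with_early_close(source):
    """Return the blocks that contain a ">" before their closing "?>"."""
    flagged = []
    for match in PHP_BLOCK.finditer(source):
        block = match.group(0)
        inner = block[2:-2]  # strip the leading "<?" and trailing "?>"
        if ">" in inner:
            flagged.append(block)
    return flagged
```

Note that this flags not only HTML markup inside PHP strings but also PHP's own `->` operator, since both put a ">" before the closing "?>".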

As for whether it is correct, according to HTML 4, to render 
<?...> as some kind of comment as Opera and IE do, or whether it is 
correct to ignore it entirely, as Firefox/Webkit do, I am not 
certain. This is one of the things that HTML 5 could specify.

> (it seems like Opera 9.6 parses it as a ProcessingInstruction !?)

Opera renders <?php > as a node named "php", and inserts the 
content as a comment, is that what you mean?

> ...and HTML Tidy has explicit support for it besides HTML, as has been
> suggested by others.


Again, your interpretation of HTML Tidy seems here to be quite 
colored by your attitude to the issue - Tidy doesn't treat <?php 
?> in any special way. You can even write <?whatever ... >.

> On Wed, Jul 22, 2009 at 1:48 PM, Leif Halvard Silli wrote:
>> Anne van Kesteren On 09-07-22 12.53:
>>> I agree with Simon that if you want stuff like this to work
>>> dedicated editor support is needed (and there is to some
>>> extent) and potentially modified validators.
> 
> +1 (see above, this is the case for HTML Tidy)


Again, it isn't the case for HTML Tidy w.r.t. PI. (How it treats 
<% %> etc, is another issue.) And to repeat, once more: It is PHP 
and Biferno that use HTML syntax - not the other way around. 
Hence, future HTML specifications, such as HTML 5, are responsible 
for not breaking things that other languages and tools depend on.

 
>>> and also does not work in common PHP scenarios
>>> such as
>>>
>>> <div<?php if($foo) { echo " class='bar'"; }?>>
>>
>> But since we are talking HTML and not XHTML, you could use
>>
>>    <div class=' <?php if($foo) { echo " bar"; }?>' >
>>
>> and be valid.
> 
> How about this highly common case:
> <input type=checkbox <?php if($foo) { echo "checked"; } ?> >


An example of when shorttag is useful - from the W3 validator:

<input type=checkbox <?php if($foo) { echo "checked"; } ?> >

The construct <foo<bar> is valid in HTML (it is an example of the 
rather obscure “Shorttags” feature) but its use is not 
recommended. In most cases, this is a typo that you will want to 
fix. If you really want to use shorttags, be aware that they are 
not well implemented by browsers.

> <input type=submit <?php if ($bar) { echo "disabled"; } ?> >
> <select><option <?php if($baz=='quux') { echo "selected"; } ?> >quux</option>...
> 
> Would you suggest writing them as:
> <?php if ($foo) { ?>
> <input type=checkbox checked>
> <?php } else { ?>
> <input type=checkbox>
> <?php } ?>
> 
> or in some weird mind:
> <input type='checkbox<?php if ($foo) { echo "\x27 checked=\x27"; } ?>' >
> 
> just for the sake of validating before PHP processing?

Making PHP pages that are valid before execution is a choice of 
the author or the authoring tool.

>>> et cetera.
>> As the HTML 5 draft says: What is possible to do in the DOM vs in text/HTML
>> vs XHTML etc, may differ.
> 
> So I wonder which point you're trying to make; just that <? > does not
> generate a parse error?


I'm not "making a point". I'm saying that the PI syntax should 
remain a part of HTML because it is widely used and supported. Of 
course it should not generate a parse error.

[1] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/180
[2] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/181
[3] http://infohound.net/tidy/
-- 
leif halvard silli
Received on Wednesday, 22 July 2009 17:51:08 UTC