Re: PHP code only allowed in XHTML 5? from Leif Halvard Silli on 2009-07-23 (public-html@w3.org from July 2009)

From: Leif Halvard Silli <lhs@malform.no>
Date: Thu, 23 Jul 2009 02:06:07 +0200
To: Thomas Broyer <t.broyer@ltgt.net>
CC: HTMLWG <public-html@w3.org>
Message-ID: <4A67A96F.40808@malform.no>
Thomas Broyer On 09-07-22 23.45:

> On Wed, Jul 22, 2009 at 7:50 PM, Leif Halvard Silli wrote:
>> What matters is that both <p> and <? > are supported with a
>> predictable parsing in UAs [1].
> 
> The parsing is not predictable [2]


Thanks for that demo. You are right - there are some glitches in 
IE and Webkit, as you said. The trouble seems to be related to the 
presence of unpaired quote characters (" or ' or [in IE] `) inside 
the <?...> construct. (I.e. the ?> in the first paragraph is 
outside the PI construct.)

 
>> Is "bogus comment" used about other things than the (specified) effect of
>> "<? comment >" ? In other words, is it anything but a negative word for the
>> effect of <? comment > ?
> 
> It is also triggered when "</" is followed by neither [A-Za-z>] or EOF
> when the content model flag is set to the PCDATA state (§9.2.4.4), and
> when "<!" is followed by neither "--", "DOCTYPE" (case-insensitive
> match) or "[CDATA[" (case-sensitive match, only if the insertion mode
> is "in foreign content" and the current node is not an element in the
> HTML namespace) (§9.2.4.17).

Thanks. So, it is not treated in its own right.
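
As a rough illustration of the three triggers listed above, here is 
a minimal Python sketch. It only restates that summary and is not 
the spec's tokenizer: the function name and the single 
in_foreign_content flag are assumptions made for the example (the 
flag stands in for both the insertion mode and the current-node 
condition), and the "</" case assumes the PCDATA content model.

```python
def starts_bogus_comment(src: str, i: int, in_foreign_content: bool = False) -> bool:
    """Return True if the markup at src[i] (a '<') would be tokenized as a
    bogus comment, following the three cases summarized in the quoted text.

    Hypothetical helper for illustration; real tokenizers are state machines.
    """
    rest = src[i + 1:]

    # Case 1: "<?" always falls into the bogus comment state.
    if rest.startswith("?"):
        return True

    # Case 2: "</" followed by something other than a letter, ">" or EOF.
    if rest.startswith("/"):
        after = rest[1:2]  # empty string means end of input
        is_letter = after.isascii() and after.isalpha()
        return not (after == "" or after == ">" or is_letter)

    # Case 3: "<!" followed by something other than "--", "DOCTYPE"
    # (case-insensitive) or, in foreign content only, "[CDATA[".
    if rest.startswith("!"):
        after = rest[1:]
        if after.startswith("--"):
            return False
        if after[:7].upper() == "DOCTYPE":
            return False
        if in_foreign_content and after.startswith("[CDATA["):
            return False
        return True

    return False
```

With this sketch, "<?php ... ?>" and "<!foo>" both count as bogus 
comments, while "</p>" and "<!-- x -->" do not.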

>> The Live DOM viewer do not detect any comments for Firefox and Webkit. Is
>> Live DOM Viewer wrong? Or do Firefox and Webkit not fulfill the spec yet?
>> The way I read Live DOM Viewer, we have 3 different interpretation of <? >
>> when we consider the result in the DOM (but one result if we consider result
>> to the user).
> 
> Actually, three results if we consider result to the user [1].

This can be easily fixed with a requirement to be careful with how 
one inserts quotes.

>>> You'll note that in WebKit and IE, it ends at the "?>", not the first
>>> ">" (even a "-->" wouldn't end the "bogus comment" in these UAs)
>> May be you are colored by your attitude here: I am unable to verify your
>> claim. All I see is that IE and Webkit - in text/html mode - ends the PI at
>> the first ">". In other words, I don't see the behavior that you describe.
>> E.g. see this Live DOM viewer demo [2].
> 
> See this Live DOM viewer demo [1] (compare the second and first
> paragraphs, in WebKit; this sample doesn't demo this behavior in IE)


Your demo [1] confirms that it is the unpaired quote character 
that is the problem, both in IE and in Webkit. Both IE and Webkit 
expect the PI to end at the first ">". However, an unpaired quote 
character causes IE and Webkit to postpone looking for the ">" 
and sends them in search of the matching quote character instead. 
Thus, they do not, as I think you said somewhere earlier, prefer 
"?>" over ">". For instance, this explains the treatment of the 
2nd and 3rd paragraphs in IE.

(Btw, please always include a <body> tag in such demos, or else 
the UA, especially IE, may place bits of the elements inside the 
<head> element.)

 
>>>  - HTML Tidy has explicit, limited support for ASP (<% %>), JSTE (<#
>>> #>) and PHP (<?php ?> only, not the <? ?> syntax)
>> Using e.g. an online version of TIDY [3], I am unable to confirm that it
>> doesn't accept the <? ... ?> syntax. When configured to output XHTML, then
>> it will correct <? ... > to <? ... ?>. Otherwise, it doesn't touch it. (But
>> HTML Tidy is very configurable.)
> 
> So it's a documentation omission (the doc only deals with the <?php
> ... ?> syntax when talking about PHP)

OK - I see.

>>> Given that on the 4 main browser engines (Gecko, Trident, Presto,
>>> WebKit), some parse it as a comment and others ignore it altogether
>>> (and this depends on the content of the PHP code too: both IE and
>>> WebKit seem to look for paired quotes with the <?php > construct);
>> If you give an example, then perhaps I'll understand what you refer to
>> w.r.t. IE and Webkit ...
> 
> Compare the 1st and 2nd, and 3rd and 4th paras in [1] (in IE, beware,
> the third <p> is actually parsed as part of the comment from the 2nd
> paragraph, so the fourth <p> ends up being the third paragraph in the
> DOM).

Yup - as noted above.

>>> I don't understand how you could say there is any "UA support".
>> Because you can insert <? > into your code and be certain that, as long as
>> you do not place another ">" in between, then UAs will not render the
>> content to the user, *and* they will parse them as the W3 validator does.
> 
> Hopefully my simple example [1] proves it wrong.

Unfortunately, IE and Webkit have a quote character bug, yes. How 
often one will run into this error is another issue - usually 
one will pair one's quotes (except when one writes "one's", but 
then one should write "oneʼs" ...) It would be better if a 
validator eventually only warned whenever one failed to pair a 
quote, rather than issuing the current error message for any 
presence of a PI.
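
Such a check could look roughly like the following Python sketch. 
It is hypothetical and does not describe any existing validator: 
the function name is invented, and "failed to pair a quote" is 
interpreted here as "a quote is still open when the first '>' 
after '<?' is reached", which is exactly the situation where the 
browsers disagree.

```python
def unpaired_quote_warnings(src: str, quotes: str = "\"'`") -> list:
    """Return (offset, quote_char) for each '<?' construct in which a
    quote is still open at the first '>' that follows it."""
    warnings = []
    pos = src.find("<?")
    while pos != -1:
        end = src.find(">", pos)
        if end == -1:
            end = len(src)
        open_quote = None
        for ch in src[pos + 2:end]:
            if open_quote is None:
                if ch in quotes:
                    open_quote = ch
            elif ch == open_quote:
                open_quote = None  # quote has been paired
        if open_quote is not None:
            warnings.append((pos, open_quote))
        pos = src.find("<?", end)
    return warnings


print(unpaired_quote_warnings("<p><?php echo 'ok' ?></p>"))
print(unpaired_quote_warnings("<p><?php it's ?></p>"))
```

Note that this also flags a properly paired quote that contains a 
">", since there the first ">" falls inside the quoted string.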

>> As for whether it is correct, according to HTML 4, to render <?...> as some
>> kind of comment as Opera and IE do, or if it is correct to ignore them
>> entirely, as Firefox/Webkit do, that I am not certain of. This is of the
>> things that HTML 5 could specify.
>>
>>> (it seems like Opera 9.6 parse it as a ProcessingInstruction !?)
>> Opera renders <?php > as a node named "php", and inserts the content as a
>> comment, is that what you mean?
> 
> No, I mean a ProcessingInstruction node [2] (also change it to end the
> PI with "?>" and notice that there's no difference). Tested in Opera
> 9.64.


OK, interesting.

 
>>> ...and HTML Tidy has explicit support for it besides HTML, as has been
>>> suggested by others.
>> Again, your interpretation of HTML Tidy seems here to be quite colored by
>> your attitude to the issue - Tidy doesn't treat <?php ?> in any special way.
>> You can even write <?whatever ... >.
> 
> I was confused by the documentation.

OK.

>>> On Wed, Jul 22, 2009 at 1:48 PM, Leif Halvard Silli wrote:
>>>> Anne van Kesteren On 09-07-22 12.53:
>>>>> I agree with Simon that if you want stuff like this to work
>>>>> dedicated editor support is needed (and there is to some
>>>>> extent) and potentially modified validators.
>>> +1 (see above, this is the case for HTML Tidy)
>>
>> Again, it isn't the case for HTML Tidy w.r.t. PI. (How it treats <% %> etc,
>> is another issue.) And to repeat, once more: It is PHP and Biferno that use
>> HTML syntax - not the other way around. Hence, future HTML specifications,
>> such a HTML 5, are responsible for not breaking things that other languages
>> and tools depends on.
> 
> PHP is text-based (byte-based actually, unfortunately), not
> HTML-based. W.r.t HTML it is a *pre*processor, there's no real
> relation between PHP and HTML. The fact that PHP uses a PI-like
> construct is to accommodate (some) existing tools 


I don't think this contradicts my standpoint. It is also not clear 
to me how "real" the relationship between HTML and PI is in the 
HTML 4.01 spec. There is a relationship - whether "real" or not.

> (w.r.t. XML, it
> allows generating XHTML+PHP with XSLT using
> <xsl:processing-instruction/> rather than <xsl:text
> disable-output-escaping="yes" />)
> ...but that's another debate...


Certainly interesting to those that are into XSLT ... !

 
> [1] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/182
> [2] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/183
-- 

leif halvard silli
Received on Thursday, 23 July 2009 00:06:49 UTC