Re: PHP code only allowed in XHTML 5?

On Wed, Jul 22, 2009 at 12:42 PM, Leif Halvard Silli wrote:
> Thomas Broyer On 09-07-22 12.03:
>
>> On Wed, Jul 22, 2009 at 11:30 AM, Leif Halvard Silli wrote:
>>>
>>> Do we want the anomaly that <?php ... ?> is valid XHTML 5, but invalid
>>> HTML 5?
>>
>> Yes, just like:
>>  - <feed xmlns="http://www.w3.org/2005/Atom">
>>  - xmlns:foo="http://example.net"
>>  - <foo:bar />
>>  - <p />
>
> I thought HTML 5 was about "HTML, in his own right" ... Those are all
> specific XHTML syntax examples

Given that HTML5's text/html serialization isn't SGML (because you can
count on the fingers of one hand the number of UAs having ever parsed
HTML as SGML), and that most text/html UAs parse <?php ?> as a "bogus
comment", why wouldn't it be different?

> (although many of us want xmlns="" to be
> valid in HTML 5 as well).

It is (§3.3.3 "Global attributes"):
"""In HTML documents, elements in the HTML namespace may have an xmlns
attribute specified, if, and only if, it has the exact value
"http://www.w3.org/1999/xhtml". This does not apply to XML documents.

Note: In HTML, the xmlns attribute has absolutely no effect. It is
basically a talisman. It is allowed merely to make migration to and
from XHTML mildly easier. When parsed by an HTML parser, the attribute
ends up in no namespace, not the "http://www.w3.org/2000/xmlns/"
namespace like namespace declaration attributes in XML do."""

(unless you were talking about an xmlns attribute with an empty value)

> What a strange message to send to PHP users, that they should use XHTML5.
> ;-)

Absolutely not: that they should not use a validator before PHP
processing, or if they do, understand the error/warning ("parse
error") about <?php being a "bogus comment" (html5.validator.nu says
that you tried to use an XML PI in an HTML document; however it is
phrased, it means the same: "<?php" isn't HTML).

If PHP users use XHTML5 and validate their page before PHP processing,
and they're doing any output from within their PHP code, they should
know that the validator not reporting any warning or error doesn't
mean their document will, after PHP processing, be valid (or even
well-formed).

My opinion is that <?php being flagged as a parse error (leaving the
validator free to show it as a warning or an error, and to describe it
the way it wants) is better for PHP users than not showing anything at
all.

If you're using PHP in an HTML document, you're likely to output HTML
markup (elements, attributes; not just attribute values) and therefore
have issues with validation before PHP processing (">" from HTML
markup within PHP strings being seen as the end of the "bogus
comment", "<?php" found within start tags)

>>> What about the UA support, should it be ignored?
>>
>> Which UA support?
>
> See below.
>
>> In text/html, <?php is parsed as a comment
>
> According to Live DOM viewer, only Opera and IE render it as a comment in
> the DOM. (See my first message.) And this UA behavior doesn't seem to
> documented in the HTML 5 draft - despite its said parsing focus ...

It is, but you have to dig deep into the parsing algorithms.

In the tokenization stage:
§9.2.4.1 Data state: "<" when content model flag is PCDATA state =>
switch to tag open state
§9.2.4.3 Tag open state: if content model flag is PCDATA state, "?" =>
parse error, switch to bogus comment state
§9.2.4.16 Bogus comment state: consume everything up to ">" or EOF
(whichever comes first) and emit a comment token whose data is the
"everything" between the "<" and ">" (i.e. it includes the "?")

Then in the tree construction stage, AFAICT, every comment token leads
to inserting a Comment node in the tree.
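
To make those steps concrete, here is a rough Python sketch of the
"bogus comment" path. It is my own simplification, not the spec's
algorithm: the function name is made up, it only recognises "<?", and
everything else (including real tags such as </b>) is lumped together
as plain text.

```python
def tokenize_bogus_comments(text):
    """Split text into ("text", ...) and ("comment", ...) tokens.

    Only the "<?" case is handled: everything from just after the "<"
    up to the first ">" (or EOF) becomes the comment data, so the data
    includes the leading "?". All other markup is treated as text.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        start = text.find("<?", pos)
        if start == -1:
            # No more "<?": the rest is plain text.
            tokens.append(("text", text[pos:]))
            break
        if start > pos:
            tokens.append(("text", text[pos:start]))
        end = text.find(">", start + 1)
        if end == -1:
            # EOF reached before any ">": the comment runs to the end.
            tokens.append(("comment", text[start + 1:]))
            break
        tokens.append(("comment", text[start + 1:end]))
        pos = end + 1
    return tokens


sample = '<?php echo "hello <b>world</b>!"; ?>'
for kind, data in tokenize_bogus_comments(sample):
    print(kind, repr(data))
```

With that sample, the comment data stops at the ">" of "<b>" and the
rest of the PHP source (starting at "world") is left over as ordinary
content.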

>> (and ends at the first ">"
>
> Already noted in my first message - the SGML/HTML PI syntax starts with "<?"
> and ends with the first occurrence of ">".
>
>> on at least Firefox and Opera: try it with <?php echo "hello
>> <b>world</b>!"; ?>)
>
> So it ends with the ">" in "<b>". Where is the news? UAs support the
> SGML/HTML PI syntax - that is why it works like this - and it is also in
> accordance with how the W3 validator sees it.

You'll note that in WebKit and IE, it ends at the "?>", not the first
">" (even a "-->" wouldn't end the "bogus comment" in these UAs)

>>> What about current
>>> validators - Validator.w3.org and HTML Tidy?

Forgot to say:
 - validator.w3.org uses an SGML parser (except of course for HTML5),
which isn't on par with "HTML as she is spoke" (e.g. <meta />
generates an error that character data cannot appear within <head>,
because the "/" closes the element and the ">" is then parsed as
character data; of course this construct isn't valid HTML4, but the
error message isn't any clearer than an HTML5 validator saying "<?php"
is a "bogus comment")
 - HTML Tidy has explicit, limited support for ASP (<% %>), JSTE (<#
#>) and PHP (<?php ?> only, not the <? ?> syntax)

> And so on. Should we pretend
>>> that support for <? > doesn't exist?
>>
>> In "HTML as she is spoke"? yes.
>
> I disagree that we should pretend.

Given that on the 4 main browser engines (Gecko, Trident, Presto,
WebKit), some parse it as a comment and others ignore it altogether
(and this depends on the content of the PHP code too: both IE and
WebKit seem to look for paired quotes within the <?php > construct), I
don't understand how you could say there is any "UA support".
(it seems like Opera 9.6 parses it as a ProcessingInstruction !?)
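
To illustrate how far apart the behaviours are, here is a small Python
sketch comparing the two termination rules mentioned earlier in this
message: ending at the first ">" (what I see in Firefox and Opera)
versus ending at "?>" (what I see in WebKit and IE). The helper name
is made up, and the sketch ignores the paired-quote heuristics that IE
and WebKit seem to apply.

```python
def split_at_terminator(markup, terminator):
    """Return (swallowed, leaked) for the first "<?" construct in markup.

    "swallowed" is what would be treated as the comment-like construct,
    "leaked" is what remains afterwards and ends up as page content.
    """
    start = markup.index("<?")
    end = markup.find(terminator, start + 2)
    if end == -1:
        # Terminator never found: everything to EOF is swallowed.
        return markup[start:], ""
    stop = end + len(terminator)
    return markup[start:stop], markup[stop:]


sample = '<?php echo "hello <b>world</b>!"; ?>'
print(split_at_terminator(sample, ">"))   # rule: first ">"
print(split_at_terminator(sample, "?>"))  # rule: "?>"
```

Under the first rule the tail of the PHP source leaks into the page;
under the second nothing does. Same input, two different documents.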

...and HTML Tidy has explicit support for it besides HTML, as has been
suggested by others.



On Wed, Jul 22, 2009 at 1:48 PM, Leif Halvard Silli wrote:
> Anne van Kesteren On 09-07-22 12.53:
>> I agree with Simon that if you want stuff like this to work
>> dedicated editor support is needed (and there is to some
>> extent) and potentially modified validators.

+1 (see above, this is the case for HTML Tidy)

>> and also does not work in common PHP scenarios
>> such as
>>
>> <div<?php if($foo) { echo " class='bar'"; }?>>
>
>
> But since we are talking HTML and not XHTML, you could use
>
>    <div class=' <?php if($foo) { echo " bar"; }?>' >
>
> and be valid.

How about this highly common case:
<input type=checkbox <?php if($foo) { echo "checked"; } ?> >
<input type=submit <?php if ($bar) { echo "disabled"; } ?> >
<select><option <?php if($baz=='quux') { echo "selected"; } ?> >quux</option>...

Would you suggest writing them as:
<?php if ($foo) { ?>
<input type=checkbox checked>
<?php } else { ?>
<input type=checkbox>
<?php } ?>

or in some weird mind:
<input type='checkbox<?php if ($foo) { echo "\x27 checked=\x27"; } ?>' >

just for the sake of validating before PHP processing?
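
To show why the first form cannot validate before PHP processing,
here is a very rough Python approximation of what an HTML parser makes
of such a start tag: inside a tag, "<?php" does not open a bogus
comment, it is just more attribute text, and the tag ends at the first
">". The function name is made up, and splitting on whitespace is only
an approximation that happens to fit this particular input (it would
not cope with "/" or quoted values containing spaces).

```python
def rough_start_tag(markup):
    """Approximate the tag name and attributes of an unprocessed start tag.

    Returns (tag_name, [(attr_name, attr_value), ...], leftover_text).
    Assumes markup starts with "<" and that attributes are separated by
    whitespace, which is a simplification of real HTML tokenization.
    """
    end = markup.find(">")
    inside = markup[1:end] if end != -1 else markup[1:]
    leftover = markup[end + 1:] if end != -1 else ""
    tag_name, *pieces = inside.split()
    attributes = []
    for piece in pieces:
        name, _, value = piece.partition("=")
        attributes.append((name.lower(), value))
    return tag_name.lower(), attributes, leftover


sample = '<input type=checkbox <?php if($foo) { echo "checked"; } ?> >'
name, attributes, leftover = rough_start_tag(sample)
print(name)
for attribute in attributes:
    print(attribute)
print(repr(leftover))
```

Besides type=checkbox, the element ends up with junk attribute names
such as "<?php", "{" and "echo", and the final " >" is left over as
text.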

>> et cetera.
>
> As the HTML 5 draft says: What is possible to do in the DOM vs in text/HTML
> vs XHTML etc, may differ.

So I wonder which point you're trying to make; just that <? > does not
generate a parse error?

-- 
Thomas Broyer

Received on Wednesday, 22 July 2009 13:57:15 UTC