- From: Leif Halvard Silli <lhs@malform.no>
- Date: Wed, 22 Jul 2009 19:50:21 +0200
- To: Thomas Broyer <t.broyer@ltgt.net>
- CC: HTMLWG <public-html@w3.org>
Thomas Broyer On 09-07-22 15.56:
> On Wed, Jul 22, 2009 at 12:42 PM, Leif Halvard Silli wrote:
>> Thomas Broyer On 09-07-22 12.03:
>>
>>> On Wed, Jul 22, 2009 at 11:30 AM, Leif Halvard Silli wrote:
>>>> Do we want the anomaly that <?php ... ?> is valid XHTML 5, but invalid
>>>> HTML 5?
>>> Yes, just like:
>>> - <feed xmlns="http://www.w3.org/2005/Atom">
>>> - xmlns:foo="http://example.net"
>>> - <foo:bar />
>>> - <p />
>> I thought HTML 5 was about "HTML, in his own right" ... Those are all
>> specific XHTML syntax examples
>
> Given that HTML5's text/html serialization isn't SGML (because you can
> count on the fingers of one hand the number of UAs having ever parsed
> HTML as SGML),
HTML 5 isn't defined as SGML. That - and not the fact that UAs don't
parse it like that - is the reason why it isn't SGML. If we look
at HTML4 from a UA point of view, then even HTML4 isn't SGML.
> and that most text/html UAs parse <?php ?> as a "bogus
> comment", why wouldn't it be different?
Whether <?php ?> and <? > are SGML or a "custom format inspired by
SGML" (as HTML 5 roughly puts it in the introduction), is
uninteresting. You might just as well tell me that "<p>" is SGML
syntax, and that we therefore should not support it. What matters
is that both <p> and <? > are supported with a predictable parsing
in UAs [1].
By 'parse <?php ?> as a "bogus comment"' I suppose you refer to
the wording "bogus comment" in HTML 5. But that doesn't really
make anything much clearer. E.g. "PI-comment" would be a better name.
>> (although many of us want xmlns="" to be valid in HTML 5 as well).
[...]
> Note: In HTML, the xmlns attribute has absolutely no effect. It is
> basically a talisman. [...]
There is a proposal to incorporate RDFa as part of HTML 5 - for
that the xmlns attribute is necessary, regardless. At least, that
seems to be the rough consensus amongst the RDFa supporters.
>> What a strange message to send to PHP users, that they should use XHTML5.
>> ;-)
[...]
> (html5.validator.nu says
> that you tried to use an XML PI in an HTML document; however it is
> phrased, it means the same: "<?php" isn't HTML).
And that is the problem - the false message - it /is/ HTML, as
understood by UAs, validators and WYSIWYG editors.
> [...] they should know that the validator not reporting any warning
> or error doesn't mean their document will, after PHP processing,
> be valid (or even well-formed).
Of course. This is (too) basic. Is the validator supposed to be a
pedagogue and not a validator, perhaps?
> My opinion is that <?php being flagged as a parse error (freedom to
> the validator to show it as a warning or error, and describe it that
> way it wants) is better for PHP users than not showing anything at
> all.
If the validator is supposed to be a pedagogue that gives friendly
advice, then that should be a task that is separate from the
error checking. Otherwise it only becomes annoying to use.
> If you're using PHP in an HTML document, you're likely to output HTML
> markup (elements, attributes; not just attribute values) and therefore
> have issues with validation before PHP processing (">" from HTML
> markup within PHP strings being seen as the end of the "bogus
> comment", "<?php" found within start tags)
Again, this depends on the coding style. This is allowed in HTML 4
and in XHTML:
<table>
<tr><td>content
<?php code to insert more rows ?>
</table>
>>>> What about the UA support, should it be ignored?
>>> Which UA support?
>> See below.
>>
>>> In text/html, <?php is parsed as a comment
>> According to Live DOM viewer, only Opera and IE render it as a comment in
>> the DOM. (See my first message.) And this UA behavior doesn't seem to be
>> documented in the HTML 5 draft - despite its said parsing focus ...
>
> It is, but you have to dig deep into the parsing algorithms.
Thanks for answering this subject!
> In the tokenization stage:
> §9.2.4.1 Data state: "<" when content model flag is PCDATA state =>
> switch to tag open state
> §9.2.4.3 Tag open state: if content model flag is PCDATA state, "?" =>
> parse error, switch to bogus comment state
> §9.2.4.16 Bogus comment state: consume everything up to ">" or EOF
> (whichever comes first) and emit a comment token whose data is the
> "everything" between the "<" and ">" (i.e. it includes the "?")
>
> Then in the tree construction state, AFAICT, every comment token leads
> to inserting a Comment node in the tree.
Is "bogus comment" used about other things than the (specified)
effect of "<? comment >" ? In other words, is it anything but a
negative word for the effect of <? comment > ?
The Live DOM Viewer does not detect any comments for Firefox and
Webkit. Is Live DOM Viewer wrong? Or do Firefox and Webkit not
fulfill the spec yet? The way I read Live DOM Viewer, we have 3
different interpretations of <? > when we consider the result in
the DOM (but one result if we consider the result to the user).
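To make the tokenizer steps quoted above concrete, here is a minimal
sketch in Python. It is only my reading of the three quoted steps, not
the draft's algorithm: the function name tokenize is made up for
illustration, it handles nothing but "<?", and it lumps all other
markup into plain text tokens.

```python
def tokenize(html):
    """Split html into ("text", s) and ("comment", s) tokens.

    Hypothetical, simplified sketch: only "<?" is recognised. Real tags
    are left inside the text tokens, and parse errors are not reported.
    """
    tokens = []
    pos = 0
    while pos < len(html):
        start = html.find("<?", pos)
        if start == -1:
            # No further "<?": the remainder is plain text.
            tokens.append(("text", html[pos:]))
            break
        if start > pos:
            tokens.append(("text", html[pos:start]))
        # Bogus comment state: consume up to the first ">" or EOF.
        end = html.find(">", start + 1)
        if end == -1:
            # EOF reached first; the data still starts at the "?".
            tokens.append(("comment", html[start + 1:]))
            break
        # The comment data is everything between "<" and ">",
        # so it includes the leading "?".
        tokens.append(("comment", html[start + 1:end]))
        pos = end + 1
    return tokens


print(tokenize('<?php echo "hello <b>world</b>!"; ?>'))
```

Under these assumptions the comment ends at the ">" of "<b>", and the
rest of the line comes out as text:

[('comment', '?php echo "hello <b'), ('text', 'world</b>!"; ?>')]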
>>> (and ends at the first ">"
>> Already noted in my first message - the SGML/HTML PI syntax starts with "<?"
>> and ends with the first occurrence of ">".
>>
>>> on at least Firefox and Opera: try it with <?php echo "hello
>>> <b>world</b>!"; ?>)
>> So it ends with the ">" in "<b>". Where is the news? UAs support the
>> SGML/HTML PI syntax - that is why it works like this - and it is also in
>> accordance with how the W3 validator sees it.
>
> You'll note that in WebKit and IE, it ends at the "?>", not the first
> ">" (even a "-->" wouldn't end the "bogus comment" in these UAs)
Maybe you are colored by your attitude here: I am unable to
verify your claim. All I see is that IE and Webkit - in text/html
mode - end the PI at the first ">". In other words, I don't see
the behavior that you describe. E.g. see this Live DOM viewer demo
[2].
>>>> What about current
>>>> validators - Validator.w3.org and HTML Tidy?
>
> Forgot to say:
> - validator.w3.org uses an SGML parser (except of course for HTML5),
> which isn't on par with "HTML as she is spoke" (i.e. <meta />
> generates an error that character cannot appear within <head>, because
> the "/" closes the element and the ">" is then parsed as character
> data; of course this construct isn't valid HTML4, but the error
> message isn't any clearer than an HTML5 validator saying "<?php" is a
> "bogus comment")
The thing is that the SGML interpretation of the W3 validator is
in line with how UAs work - they begin and end the PI/"bogus
comment" at the same place.
> - HTML Tidy has explicit, limited support for ASP (<% %>), JSTE (<#
> #>) and PHP (<?php ?> only, not the <? ?> syntax)
Using e.g. an online version of TIDY [3], I am unable to confirm
that it doesn't accept the <? ... ?> syntax. When configured to
output XHTML, then it will correct <? ... > to <? ... ?>.
Otherwise, it doesn't touch it. (But HTML Tidy is very configurable.)
>>>> And so on. Should we pretend that support for <? > doesn't exist?
>>> In "HTML as she is spoke"? yes.
>> I disagree that we should pretend.
>
> Given that on the 4 main browser engines (Gecko, Trident, Presto,
> WebKit), some parse it as a comment and others ignore it altogether
> (and this depends on the content of the PHP code too: both IE and
> WebKit seem to look for paired quotes with the <?php > construct);
If you give an example, then perhaps I'll understand what you
refer to w.r.t. IE and Webkit ...
> I don't understand how you could say there is any "UA support".
Because you can insert <? > into your code and be certain that, as
long as you do not place another ">" in between, then UAs will not
render the content to the user, *and* they will parse them as the
W3 validator does.
As for whether it is correct, according to HTML 4, to render
<?...> as some kind of comment as Opera and IE do, or if it is
correct to ignore them entirely, as Firefox/Webkit do, that I am
not certain of. This is one of the things that HTML 5 could specify.
> (it seems like Opera 9.6 parse it as a ProcessingInstruction !?)
Opera renders <?php > as a node named "php", and inserts the
content as a comment, is that what you mean?
> ...and HTML Tidy has explicit support for it besides HTML, as has been
> suggested by others.
Again, your interpretation of HTML Tidy seems here to be quite
colored by your attitude to the issue - Tidy doesn't treat <?php
?> in any special way. You can even write <?whatever ... >.
> On Wed, Jul 22, 2009 at 1:48 PM, Leif Halvard Silli wrote:
>> Anne van Kesteren On 09-07-22 12.53:
>>> I agree with Simon that if you want stuff like this to work
>>> dedicated editor support is needed (and there is to some
>>> extent) and potentially modified validators.
>
> +1 (see above, this is the case for HTML Tidy)
Again, it isn't the case for HTML Tidy w.r.t. PI. (How it treats
<% %> etc, is another issue.) And to repeat, once more: It is PHP
and Biferno that use HTML syntax - not the other way around.
Hence, future HTML specifications, such as HTML 5, are responsible
for not breaking things that other languages and tools depend on.
>>> and also does not work in common PHP scenarios
>>> such as
>>>
>>> <div<?php if($foo) { echo " class='bar'"; }?>>
>>
>> But since we are talking HTML and not XHTML, you could use
>>
>> <div class=' <?php if($foo) { echo " bar"; }?>' >
>>
>> and be valid.
>
> How about this highly common case:
> <input type=checkbox <?php if($foo) { echo "checked" }; ?> >
An example of when shorttag is useful - from the W3 validator:
<input type=checkbox <?php if($foo) { echo "checked" }; ?> >
The construct <foo<bar> is valid in HTML (it is an example of the
rather obscure “Shorttags” feature) but its use is not
recommended. In most cases, this is a typo that you will want to
fix. If you really want to use shorttags, be aware that they are
not well implemented by browsers.
> <input type=submit <?php if ($bar) { echo "disabled" }; ?> >
> <select><option <?php if($baz=='quux') { echo "selected"; } ?> >quux</option>...
>
> Would you suggest writing them as:
> <?php if ($foo) { ?>
> <input type=checkbox checked>
> <?php } else { ?>
> <input type=checkbox>
> <?php } ?>
>
> or in some weird mind:
> <input type='checkbox<?php if ($foo) { echo "\x27 checked=\x27"; } ?>' >
>
> just for the sake of validating before PHP processing?
Making PHP pages that are valid before execution is a choice of
the author or the authoring tool.
>>> et cetera.
>> As the HTML 5 draft says: What is possible to do in the DOM vs in text/HTML
>> vs XHTML etc, may differ.
>
> So I wonder which point you're trying to make; just that <? > does not
> generate a parse error?
I'm not "making a point". I'm saying that the PI syntax should
remain a part of HTML because it is widely used and supported. Of
course it should not generate a parse error.
[1] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/180
[2] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/181
[3] http://infohound.net/tidy/
--
leif halvard silli
Received on Wednesday, 22 July 2009 17:51:08 UTC