- From: Leif Halvard Silli <lhs@malform.no>
- Date: Wed, 22 Jul 2009 19:50:21 +0200
- To: Thomas Broyer <t.broyer@ltgt.net>
- CC: HTMLWG <public-html@w3.org>
Thomas Broyer On 09-07-22 15.56:
> On Wed, Jul 22, 2009 at 12:42 PM, Leif Halvard Silli wrote:
>> Thomas Broyer On 09-07-22 12.03:
>>
>>> On Wed, Jul 22, 2009 at 11:30 AM, Leif Halvard Silli wrote:
>>>> Do we want the anomaly that <?php ... ?> is valid XHTML 5, but invalid
>>>> HTML 5?
>>> Yes, just like:
>>> - <feed xmlns="http://www.w3.org/2005/Atom">
>>> - xmlns:foo="http://example.net"
>>> - <foo:bar />
>>> - <p />
>> I thought HTML 5 was about "HTML, in its own right" ... Those are all
>> specific XHTML syntax examples.
>
> Given that HTML5's text/html serialization isn't SGML (because you can
> count on the fingers of one hand the number of UAs having ever parsed
> HTML as SGML),

HTML 5 isn't defined as SGML. That, and not the fact that UAs don't parse it as SGML, is the reason why it isn't SGML. If we look at HTML4 from a UA point of view, then even HTML4 isn't SGML.

> and that most text/html UAs parse <?php ?> as a "bogus
> comment", why wouldn't it be different?

Whether <?php ?> and <? > are SGML or a "custom format inspired by SGML" (as HTML 5 roughly puts it in the introduction) is uninteresting. You might just as well tell me that "<p>" is SGML syntax, and that we therefore should not support it. What matters is that both <p> and <? > are supported, with predictable parsing, in UAs [1].

By 'parse <?php ?> as a "bogus comment"' I suppose you refer to the wording "bogus comment" in HTML 5. But that doesn't make anything much clearer. "PI-comment", for example, would be a better name.

>> (although many of us want xmlns="" to be valid in HTML 5 as well).
[...]
> Note: In HTML, the xmlns attribute has absolutely no effect. It is
> basically a talisman.
[...]

There is a proposal to incorporate RDFa as part of HTML 5, and for that the xmlns attribute is necessary regardless. At least, that seems to be the rough consensus amongst the RDFa supporters.

>> What a strange message to send to PHP users, that they should use XHTML5.
>> ;-)
[...]
> (html5.validator.nu says
> that you tried to use an XML PI in an HTML document; however it is
> phrased, it means the same: "<?php" isn't HTML).

And that is the problem, the false message: it /is/ HTML, as understood by UAs, validators and WYSIWYG editors.

> [...] they should know that the validator not reporting any warning
> or error doesn't mean their document will, after PHP processing,
> be valid (or even well-formed).

Of course. This is (too) basic. Is the validator supposed to be a pedagogue and not a validator, perhaps?

> My opinion is that <?php being flagged as a parse error (freedom to
> the validator to show it as a warning or error, and describe it the
> way it wants) is better for PHP users than not showing anything at
> all.

If the validator is supposed to be a pedagogue that gives friendly advice, then that should be a task separate from the error checking. Otherwise it only becomes annoying to use.

> If you're using PHP in an HTML document, you're likely to output HTML
> markup (elements, attributes; not just attribute values) and therefore
> have issues with validation before PHP processing (">" from HTML
> markup within PHP strings being seen as the end of the "bogus
> comment", "<?php" found within start tags)

Again, this depends on the coding style. This is allowed in HTML 4 and in XHTML:

<table>
<tr><td>content
<?php code to insert more rows ?>
</table>

>>>> What about the UA support, should it be ignored?
>>> Which UA support?
>> See below.
>>
>>> In text/html, <?php is parsed as a comment
>> According to Live DOM viewer, only Opera and IE render it as a comment in
>> the DOM. (See my first message.) And this UA behavior doesn't seem to be
>> documented in the HTML 5 draft - despite its said parsing focus ...
>
> It is, but you have to dig deep into the parsing algorithms.

Thanks for addressing this point!
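Thomas's concern above (a ">" inside the PHP code ending the "bogus comment" early) can be checked mechanically, because a text/html parser ends "<?" at the first ">" while PHP ends its block at "?>". A minimal Python sketch, assuming PHP blocks run from "<?" to the first "?>", are not nested, and never contain "?>" inside a string literal; the helper name premature_pi_ends is hypothetical, not part of any validator or of HTML Tidy.

```python
import re

# A PHP block as PHP sees it: from "<?" to the first "?>".
# Assumption: blocks are not nested and "?>" never occurs inside a string.
PHP_BLOCK = re.compile(r"<\?.*?\?>", re.DOTALL)


def premature_pi_ends(template: str) -> list[str]:
    """Return the PHP blocks that a text/html parser would cut short.

    An HTML parser ends "<?" at the first ">", while PHP ends the block at
    "?>". If that first ">" comes before the block's final ">", the rest
    of the PHP code is exposed to the HTML parser as markup or text.
    """
    flagged = []
    for match in PHP_BLOCK.finditer(template):
        block = match.group(0)
        # Every match ends with ">", so index() always finds one.
        if block.index(">") < len(block) - 1:
            flagged.append(block)
    return flagged


template = """<table>
<tr><td>content
<?php code to insert more rows ?>
<?php echo "hello <b>world</b>!"; ?>
</table>"""
print(premature_pi_ends(template))
```

Only the second block is reported: the ">" of "<b>" ends the HTML-side comment before PHP's "?>". The first block, like the table example above, is unaffected.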
> In the tokenization stage:
> §9.2.4.1 Data state: "<" when content model flag is PCDATA state =>
> switch to tag open state
> §9.2.4.3 Tag open state: if content model flag is PCDATA state, "?" =>
> parse error, switch to bogus comment state
> §9.2.4.16 Bogus comment state: consume everything up to ">" or EOF
> (whichever comes first) and emit a comment token whose data is the
> "everything" between the "<" and ">" (i.e. it includes the "?")
>
> Then in the tree construction stage, AFAICT, every comment token leads
> to inserting a Comment node in the tree.

Is "bogus comment" used for anything other than the (specified) effect of "<? comment >"? In other words, is it anything but a negative word for the effect of <? comment >?

The Live DOM viewer does not detect any comments for Firefox and WebKit. Is Live DOM Viewer wrong? Or do Firefox and WebKit not fulfill the spec yet? The way I read Live DOM Viewer, we have 3 different interpretations of <? > when we consider the result in the DOM (but one result if we consider the result for the user).

>>> (and ends at the first ">"
>> Already noted in my first message - the SGML/HTML PI syntax starts with "<?"
>> and ends with the first occurrence of ">".
>>
>>> on at least Firefox and Opera: try it with <?php echo "hello
>>> <b>world</b>!"; ?>)
>> So it ends with the ">" in "<b>". Where is the news? UAs support the
>> SGML/HTML PI syntax - that is why it works like this - and it is also in
>> accordance with how the W3 validator sees it.
>
> You'll note that in WebKit and IE, it ends at the "?>", not the first
> ">" (even a "-->" wouldn't end the "bogus comment" in these UAs)

Maybe your view is colored by your attitude here: I am unable to verify your claim. All I see is that IE and WebKit, in text/html mode, end the PI at the first ">". In other words, I don't see the behavior that you describe. See, for example, this Live DOM viewer demo [2].

>>>> What about current
>>>> validators - Validator.w3.org and HTML Tidy?
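The three tokenizer steps Thomas quotes above can be modelled in a few lines. This is a minimal Python sketch, not the spec's algorithm: it assumes the PCDATA content model throughout, treats only "<?" as special (every other "<", including real tags, is left in the text), and the function name is hypothetical.

```python
def tokenize_pi_as_bogus_comment(markup: str) -> list[tuple[str, str]]:
    """Simplified model of the data / tag open / bogus comment states.

    Only "<?" is handled; all other characters, including other tags,
    are collected into ("text", ...) tokens.
    """
    tokens: list[tuple[str, str]] = []
    text_start = 0  # start of the pending run of ordinary text
    i = 0
    while i < len(markup):
        # Data state: "<" switches to the tag open state; in the tag open
        # state, "?" is a parse error that switches to the bogus comment state.
        if markup.startswith("<?", i):
            if i > text_start:
                tokens.append(("text", markup[text_start:i]))
            # Bogus comment state: consume up to the first ">" or EOF.
            end = markup.find(">", i + 1)
            if end == -1:
                end = len(markup)
            # The comment data is everything between "<" and ">",
            # so it still includes the leading "?".
            tokens.append(("comment", markup[i + 1:end]))
            i = end + 1  # skip the ">" (harmless when we hit EOF)
            text_start = i
        else:
            i += 1
    if text_start < len(markup):
        tokens.append(("text", markup[text_start:]))
    return tokens


print(tokenize_pi_as_bogus_comment('<?php echo "hello <b>world</b>!"; ?>'))
```

With Thomas's example, the comment data stops at ?php echo "hello <b and the remainder, world</b>!"; ?>, is left over as ordinary content; a full tokenizer would go on to parse </b> as an end tag.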
>
> Forgot to say:
> - validator.w3.org uses an SGML parser (except of course for HTML5),
> which isn't on par with "HTML as she is spoke" (e.g. <meta />
> generates an error that character data cannot appear within <head>, because
> the "/" closes the element and the ">" is then parsed as character
> data; of course this construct isn't valid HTML4, but the error
> message isn't any clearer than an HTML5 validator saying "<?php" is a
> "bogus comment")

The thing is that the SGML interpretation of the W3 validator is in line with how UAs work: they begin and end the PI/"bogus comment" at the same place.

> - HTML Tidy has explicit, limited support for ASP (<% %>), JSTE (<#
> #>) and PHP (<?php ?> only, not the <? ?> syntax)

Using an online version of Tidy [3], I am unable to confirm that it doesn't accept the <? ... ?> syntax. When configured to output XHTML, it corrects <? ... > to <? ... ?>. Otherwise, it doesn't touch it. (But HTML Tidy is very configurable.)

>>>> And so on. Should we pretend that support for <? > doesn't exist?
>>> In "HTML as she is spoke"? yes.
>> I disagree that we should pretend.
>
> Given that on the 4 main browser engines (Gecko, Trident, Presto,
> WebKit), some parse it as a comment and others ignore it altogether
> (and this depends on the content of the PHP code too: both IE and
> WebKit seem to look for paired quotes with the <?php
> construct);

If you give an example, perhaps I'll understand what you are referring to w.r.t. IE and WebKit.

> I don't understand how you could say there is any "UA support".

Because you can insert <? > into your code and be certain that, as long as you do not place another ">" in between, UAs will not render the content to the user, *and* they will parse it as the W3 validator does. As for whether, according to HTML 4, it is correct to render <?...> as some kind of comment, as Opera and IE do, or to ignore it entirely, as Firefox and WebKit do: that I am not certain of.
This is one of the things that HTML 5 could specify.

> (it seems like Opera 9.6 parses it as a ProcessingInstruction !?)

Opera renders <?php > as a node named "php" and inserts the content as a comment; is that what you mean?

> ...and HTML Tidy has explicit support for it besides HTML, as has been
> suggested by others.

Again, your interpretation of HTML Tidy seems here to be quite colored by your attitude to the issue: Tidy doesn't treat <?php ?> in any special way. You can even write <?whatever ... >.

> On Wed, Jul 22, 2009 at 1:48 PM, Leif Halvard Silli wrote:
>> Anne van Kesteren On 09-07-22 12.53:
>>> I agree with Simon that if you want stuff like this to work
>>> dedicated editor support is needed (and there is to some
>>> extent) and potentially modified validators.
>
> +1 (see above, this is the case for HTML Tidy)

Again, it isn't the case for HTML Tidy w.r.t. PIs. (How it treats <% %> etc. is another issue.)

And to repeat, once more: it is PHP and Biferno that use HTML syntax, not the other way around. Hence, future HTML specifications, such as HTML 5, are responsible for not breaking things that other languages and tools depend on.

>>> and also does not work in common PHP scenarios
>>> such as
>>>
>>> <div<?php if($foo) { echo " class='bar'"; }?>>
>>
>> But since we are talking HTML and not XHTML, you could use
>>
>> <div class=' <?php if($foo) { echo " bar"; }?>' >
>>
>> and be valid.
>
> How about this highly common case:
> <input type=checkbox <?php if($foo) { echo "checked"; } ?> >

An example of when shorttag is useful, from the W3 validator:

<input type=checkbox <?php if($foo) { echo "checked"; } ?> >
The construct <foo<bar> is valid in HTML (it is an example of the rather obscure “Shorttags” feature) but its use is not recommended. In most cases, this is a typo that you will want to fix. If you really want to use shorttags, be aware that they are not well implemented by browsers.
> <input type=submit <?php if ($bar) { echo "disabled"; } ?> >
> <select><option <?php if($baz=='quux') { echo "selected"; } ?> >quux</option>...
>
> Would you suggest writing them as:
> <?php if ($foo) { ?>
> <input type=checkbox checked>
> <?php } else { ?>
> <input type=checkbox>
> <?php } ?>
>
> or in some weird mind:
> <input type='checkbox<?php if ($foo) { echo "\x27 checked=\x27"; } ?>' >
>
> just for the sake of validating before PHP processing?

Making PHP pages that are valid before execution is a choice for the author or the authoring tool.

>>> et cetera.
>> As the HTML 5 draft says: What is possible to do in the DOM vs in text/HTML
>> vs XHTML etc, may differ.
>
> So I wonder which point you're trying to make; just that <? > does not
> generate a parse error?

I'm not "making a point". I'm saying that the PI syntax should remain a part of HTML because it is widely used and supported. Of course it should not generate a parse error.

[1] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/180
[2] http://software.hixie.ch/utilities/js/live-dom-viewer/saved/181
[3] http://infohound.net/tidy/
--
leif halvard silli
Received on Wednesday, 22 July 2009 17:51:08 UTC