Comments on XHTML Media Types Note - Second Edition from Simon Pieters on 2009-02-13 (public-xhtml2@w3.org from February 2009)

From: Simon Pieters <simonp@opera.com>
Date: Fri, 13 Feb 2009 10:23:04 +0100
To: "public-xhtml2@w3.org" <public-xhtml2@w3.org>
Message-ID: <op.uo99oqgwidj3kv@zcorpandell.linkoping.osa>
I noticed that a second edition of the XHTML Media Types Note was published -- congratulations, it's a much improved edition compared to the previous one and compared to Appendix C of XHTML 1.0.

Speaking of Appendix C, have you considered updating XHTML 1.0 so that Appendix C points to this note instead of including obsolete guidelines?


I sent comments on the note while it was an Editor's draft last year:

   http://lists.w3.org/Archives/Public/public-xhtml2/2008Sep/0000.html

...which had some follow-ups:

   http://lists.w3.org/Archives/Public/public-xhtml2/2008Oct/thread.html#msg25

This email never got a reply, though:

   http://lists.w3.org/Archives/Public/public-xhtml2/2008Oct/0028.html

Why did you not get back to me asking if the changes were ok before publishing?


Looking at the new edition, I see the following comments from my original email still apply:


On Mon, 01 Sep 2008 19:21:02 +0200, Simon Pieters <simonp@opera.com> wrote:

> > Also, some user agents interpret the XML declaration to mean that the  
> > document is unrecognized XML rather than HTML.
>
> Which ones? I don't know of any.


> > Such user agents may not render the document as expected. For  
> > compatibility with these types of HTML browsers, you should avoid using  
> > processing instructions and XML declarations.
>
> You forgot the most important rationale: it makes IE6 trigger quirks mode.


> > A.2. Elements that can never have content
> >
> > If an element has an EMPTY content model DO use the minimized tag syntax  
> > permitted by XML (e.g., <br />). DO NOT use the alternative syntax  
> > (e.g., <br></br>) allowed by XML, since this may be unsupported by HTML  
> > user agents.
>
> What do you mean by "unsupported"? AFAIK, </br> is treated as <br> in  
> HTML UAs and other end tags are ignored.


> > Also, DO include a space before the trailing / and >.
>
> Why?
>
> AFAIK, this is only a problem for NS4 when not using any attributes (<br/>  
> would be treated as an element "br/" instead of "br"). Considering that  
> NS4 is irrelevant, this advice could well be dropped.


> > A.3. Elements that have no content
> >
> > If an element permits content (e.g., the p element) but an instance of  
> > that element has no content (e.g., an empty paragraph), DO NOT use the  
> > "minimized" tag syntax (e.g., <p />).
> >
> > Rationale: HTML user agents may give uncertain results when using the  
> > minimized syntax permitted by XML when an element has no content.
>
> They give very certain results, AFAIK: they uniformly ignore the slash.


> > A.4. Embedded Style Sheets and Scripts
> >
> > DO use external style sheets if your style sheet uses < or & or ]]> or  
> > --. DO NOT use an internal stylesheet if the style rules contain any of  
> > the above characters.
> > DO use external scripts if your script uses < or & or ]]> or --. DO NOT  
> > embed a script in a document if it contains any of these characters.
>
> Why?
>
> If you use < and/or &, you could just use
>
>     <script>//<![CDATA[
>     ...
>     //]]></script>
>
> or
>
>     <style>/*<![CDATA[*/
>     ...
>     /*]]>*/</style>
>
> If you use ]]> you could just escape it as ]]\>.
>
> Why is -- a problem?

It still doesn't say what to do if you want to include an inline script that contains ]]>, nor why -- is a problem (AFAIK it's not a problem).
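
To make the point concrete, here is a minimal Python sketch that checks the CDATA wrapping with an XML parser. It is an illustration only, not part of the Note: wrap_inline_script and is_well_formed are hypothetical helpers, and the escaping step assumes that any ]]> in the script sits inside a string or regex literal, where \> still evaluates to >.

```python
import xml.etree.ElementTree as ET


def wrap_inline_script(js: str) -> str:
    # Hypothetical helper. Assumes any "]]>" in the script sits inside a
    # string or regex literal, where "\>" still evaluates to ">".
    safe = js.replace("]]>", "]]\\>")
    return "<script>//<![CDATA[\n" + safe + "\n//]]></script>"


def is_well_formed(xml_text: str) -> bool:
    # True if the XML parser accepts the text without error.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


js = 'var end = "]]>"; if (a < b && n-- > 0) { run(); }'

# Escaped and wrapped: the parser accepts it, including "<", "&" and "--".
print(is_well_formed(wrap_inline_script(js)))
# Wrapped without escaping: the CDATA section ends early at the first "]]>".
print(is_well_formed("<script>//<![CDATA[\n" + js + "\n//]]></script>"))
```

The sketch only tests XML well-formedness; how an HTML user agent treats the same bytes is a separate question.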


> > A.5. Line Breaks within Attribute Values
> >
> > DO ensure that attribute values are on a single line and only use single  
> > whitespace characters. DO NOT use line breaks and multiple consecutive  
> > white space characters within attribute values.
>
> I understand linebreaks but why only a single whitespace? Also it would be  
> good to give advice about what to do if you actually want a linebreak or a  
> tab in an attribute value (i.e. use character references).
>
>
> > Rationale: These are handled inconsistently by user agents.
>
> Or rather: XML requires whitespace to be normalized to spaces. Or is there  
> inconsistency that I don't know about?
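
A small Python sketch of the normalization in question, assuming a non-validating XML parser (here the standard library's ElementTree) and an attribute with no DTD declaration. The attr helper and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def attr(xml_text: str, name: str = "title") -> str:
    # Parse a one-element document and return one attribute value.
    return ET.fromstring(xml_text).get(name)


# Literal newline and tab are normalized to spaces by the XML parser.
literal = attr('<a title="one\ntwo\tthree"/>')
# Character references survive as the actual characters.
referenced = attr('<a title="one&#10;two&#9;three"/>')
# Without a DTD the attribute is CDATA, so runs of spaces are not collapsed.
spaces = attr('<a title="one  two"/>')

print(repr(literal), repr(referenced), repr(spaces))
```

This covers the XML side only; it says nothing about what HTML user agents do with the same attribute values.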


> > A.8. Fragment Identifiers
> >
> > DO use the id attribute to identify elements.
> >
> > DO ensure that the values used for the id attribute are limited to the  
> > pattern [A-Za-z][A-Za-z0-9:_.-]*.
> >
> > DO NOT use the name attribute to identify elements, even in languages  
> > that permit the use of name such as XHTML 1.0.
>
> Why not allow both to be used?
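
For reference, the id pattern quoted above can be checked mechanically. A minimal Python sketch; is_portable_id is a hypothetical name and the candidate values are invented examples.

```python
import re

# Pattern quoted from the Note's guideline A.8.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_portable_id(value: str) -> bool:
    # fullmatch requires the whole value to fit the pattern.
    return ID_PATTERN.fullmatch(value) is not None


for candidate in ["section-2.1", "a", "2nd-section", "näme", "has space", ""]:
    print(repr(candidate), is_portable_id(candidate))
```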


> > A.10. Boolean Attributes
> >
> > DO use the full form for boolean attributes, as required by XML (e.g.,  
> > disabled="disabled"). Such attributes include: compact, nowrap, ismap,  
> > declare, noshade, checked, disabled, readonly, multiple, selected,  
> > noresize, and defer.
>
> Isn't valid XHTML the baseline?


> > A.13. Cascading Style Sheets (CSS) and XHTML
> >
> > DO use lower case element and attribute names in style sheets. DO create  
> > rules that include inferred elements (e.g., the tbody element in a  
> > table).
> >
> > Rationale: These simple rules will help increase the portability of CSS  
> > rules regardless of the media type the document is processed as.
>
> Hmm. Including inferred elements seems like a way to be *in*compatible,  
> since they aren't inferred in application/xhtml+xml.
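
A short Python sketch of the XML side of this, using the standard library's ElementTree as a stand-in for an XML user agent (an assumption for illustration; the table markup is invented). Since no tbody is inferred, a rule written against the inferred element, such as table > tbody > tr, has nothing to match here.

```python
import xml.etree.ElementTree as ET

# Markup an author might serve under either media type.
table = ET.fromstring("<table><tr><td>cell</td></tr></table>")

# An XML parser adds nothing, so tr is a direct child of table.
direct_rows = table.findall("./tr")
tbody_rows = table.findall("./tbody/tr")

print(len(direct_rows), len(tbody_rows))
```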


> > A.14. Referencing Style Elements when serving as XML
> >
> > DO NOT use xml stylesheet declarations to identify style sheets.
> >
> > DO use the style or link elements to define stylesheets.
> >
> > Rationale: Since XML processing instructions may be rendered by some  
> > HTML user agents, using the standard XML stylesheet declaration  
> > mechanism may not work well. However, since XHTML user agents are  
> > required to process style and link elements and interpret stylesheets  
> > referenced from those elements, documents constructed to use them will  
> > work as expected.
>
> Or more likely, XML processing instructions are dropped or parsed into  
> comments in HTML UAs.

That is, the fact that some irrelevant HTML UAs render XML PIs is beside the point. The point is that xml-stylesheet doesn't work in HTML at all.


> > A.15. Formfeed Character in HTML vs. XML
> >
> > DO NOT use the formfeed character (U+000C).
> >
> > Rationale: This character is recognized as white space in HTML 4, but is  
> > NOT considered white space in XML.
>
> Where is it said that U+000C is whitespace in HTML 4?

(I still disagree that it's "recognized as white space in HTML 4".)

> Also, not only is it not considered whitespace in XML, it's not  
> well-formed XML.

(This should be the real rationale for the guideline.)
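
The well-formedness claim is easy to check. A minimal Python sketch, assuming the standard library's ElementTree (backed by expat) as the XML parser; parse_error is a hypothetical helper and the sample text is invented.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text: str):
    # Return the parser's error message, or None if the text is well-formed.
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


with_formfeed = parse_error("<p>page one\x0cpage two</p>")
with_space = parse_error("<p>page one page two</p>")

print(with_formfeed)
print(with_space)
```

The first call reports a parse error because U+000C is not a legal character in XML 1.0; the second returns None.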

Isn't well-formed XML (and even valid XHTML) the baseline? Or do you expect that authors start with HTML and apply the guidelines when they move to XHTML? I have assumed the former but I keep getting confused because of the existence of guidelines such as this one. If it is the latter then there are many things missing. It would be useful if the note said which is intended.


 * * *

Even once the points above are addressed, there are still various issues with the Note (introduced after I sent my original comments). If I find time and motivation I might point them out.

Cheers,
-- 
Simon Pieters
Opera Software
Received on Friday, 13 February 2009 09:23:51 UTC