Re: Comments on XHTML Media Types Note 20080827 from Shane McCarron on 2008-10-22 (public-xhtml2@w3.org from October 2008)

From: Shane McCarron <shane@aptest.com>
Date: Wed, 22 Oct 2008 10:23:22 -0500
To: Simon Pieters <simonp@opera.com>
CC: "public-xhtml2@w3.org" <public-xhtml2@w3.org>
Message-ID: <48FF456A.8000902@aptest.com>

Simon,

Thanks very much for your thorough review of the draft XHTML Media Types 
Note.  The XHTML 2 Working Group continues to make progress on this 
document, and expects to update the published Note in the near future.  
Many of your changes have been included in the current editor's draft - 
some notes on your comments are below.  A few of your comments raised 
further questions.  I am going to split those out into a separate thread. 

As always, you can find the current editor's draft via 
http://www.w3.org/MarkUp/Drafts#xhtmlmime

Thanks again for your comments - they were a big help!


Simon Pieters wrote:
>
> This abstract sucks. It shouldn't use RFC2119 terms. It shouldn't 
> summarize the spec. It shouldn't give notes or advice about things. It 
> shouldn't contain references or pointers.
>
> It should describe in abstract terms what the Note does and why it 
> exists.
>
> e.g. "This Note contains advice about how to serve XHTML markup to 
> different UAs and advice on how such markup should look in order to 
> work as intended in common UAs when served with different media 
> types." would be a better abstract. Better still would be to also 
> explain why anyone would want to do so (instead of just using HTML or 
> just XHTML).

Changed.
>
>
>> 2. Terms and Definitions
>>
>> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
>> "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
>> document are to be interpreted as described in RFC 2119 [RFC2119].
>
> This document isn't normative. Why reference RFC2119 at all? I'd 
> suggest to remove and use non-RFC2119 terms throughout to avoid 
> confusion.

Removed.
>> 3. Recommended Media Type Usage
>>
>> This section summarizes which Internet media type SHOULD be used for 
>> which XHTML Family document for which purpose.
>>
>> A combination of these rules, in conjunction with a careful 
>> examination of the HTTP Accept header, can be useful in determining 
>> which media type to use when a document adheres to the guidelines in 
>> Appendix A. Specifically:
>>
>>     1. if the Accept header explicitly contains application/xhtml+xml 
>> deliver the document using that media type.
>
> This is not appropriate since it doesn't consider the q parameter, nor 
> does it consider wildcards. Consider:
>
>    Accept: text/html, application/xhtml+xml; q=0
>
> ...or:
>
>    Accept: application/*, text/*; q=0.5

Changed.
>
>
>>     2. Otherwise, if the Accept header contains text/html, deliver 
>> the document using that media type.
>>     3. Otherwise, deliver the document using media type text/html.
>
> Step 2 can be struck.
>
>
>> In other words, requestors that advertise they support XHTML family 
>> documents will receive the document in the XHTML media type, and all 
>> other requestors will receive the document using the HTML media type.
>
> This is not appropriate when the UA Accepts neither (should give a 406).

Expanded upon to try to clarify when this should be delivered.  However, 
I don't think this note should be a comprehensive document on content 
negotiation.  I would prefer that we reference TAG findings or other 
relevant sources for more details.
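
For illustration only, here is a minimal Python sketch of the kind of 
selection logic your comments point at: it honors the q parameter and 
wildcards, and signals a 406 when neither media type is acceptable.  
The function names are hypothetical, and the tie-breaking rule (prefer 
application/xhtml+xml only when it is named explicitly) and the 
fallback for a missing Accept header are assumptions of this sketch, 
not text from the Note.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q of the most specific range matching media_type, else 0.0."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_media_type(accept_header):
    """Pick a media type for a guideline-conforming document; None means 406."""
    if accept_header is None or not accept_header.strip():
        return "text/html"  # assumption: no Accept header, fall back to HTML
    ranges = parse_accept(accept_header)
    q_xhtml = quality("application/xhtml+xml", ranges)
    q_html = quality("text/html", ranges)
    if q_xhtml == 0.0 and q_html == 0.0:
        return None  # neither type is acceptable: respond with 406
    explicit = any(r == "application/xhtml+xml" for r, _ in ranges)
    if q_xhtml > q_html or (q_xhtml == q_html and explicit):
        return "application/xhtml+xml"
    return "text/html"
```

With your two example headers, this sketch picks text/html for the 
first (q=0 rules out application/xhtml+xml) and application/xhtml+xml 
for the second (application/* outranks text/* at q=0.5).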

>
>
>> When a document does NOT adhere to the guidelines, it SHOULD NOT be 
>> delivered as media type text/html. If such documents need to be 
>> delivered to requestors who do not explicitly support the XHTML 
>> family, those documents should be transformed into valid HTML and 
>> then delivered as such.
>
> Documents that *do* adhere to the guidelines aren't valid HTML. Why do 
> documents that don't need to be transformed into valid HTML instead 
> of, say, be transformed into XHTML that adheres to the guidelines?

Interesting point.  I have added an issue to the agenda for this week to 
discuss it.  Certainly you are correct that such a document would not 
validate as HTML - its DOCTYPE is wrong for one thing.  I don't think 
the goal of the compatibility guidelines has ever been that people 
deliver valid HTML when falling back - rather that they deliver valid 
XHTML that will "work" in current HTML user agents.

>
>
>> Note: It is possible that in the future XHTML Modularization will 
>> define rules for indicating which specific XHTML family members are 
>> supported by a requestor (e.g., via the profile parameter of the 
>> media type in the Accept header). Such rules, when used in 
>> conjunction with the "quality" parameter of the media type could help 
>> a server determine which of several versions of a document to deliver.
>
> Well we could start with getting the q parameter right... :-)
>
> In any case, why would it be useful to know if a UA claims to support 
> a specific XHTML family member? What would you do with that information?

Added some clarifying text.
>
>
>> 3.1. 'text/html'
>>
>>
>> "5.2.2 Specifying the character encoding" of the HTML 4 specification 
>> [HTML4] also notes that "user agents must not assume any default 
>> value for the "charset" parameter". Therefore, authors SHOULD NOT 
>> assume any default value for an XHTML document served as 'text/html', 
>> and as mentioned in [RFC2854], the use of an explicit charset 
>> parameter is STRONGLY RECOMMENDED. When it is difficult to specify an 
>> explicit charset parameter through a higher-level protocol (e.g., 
>> HTTP), authors SHOULD include the XML declaration (e.g., <?xml 
>> version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement 
>> (e.g. <meta http-equiv="Content-Type" content="text/html; 
>> charset=EUC-JP" />). See guideline 9 for details.
>
> This is giving the opposite advice from A.1, which says to omit the 
> XML declaration and, as a consequence, use UTF-8 or UTF-16 when it is 
> difficult to specify an explicit charset parameter through a 
> higher-level protocol.
>
> Which advice is correct?

Appendix A is updated.  Nice catch.
>
>
>> 3.2. 'application/xhtml+xml'
>>
>> The 'application/xhtml+xml' media type [RFC3236] is the primary media 
>> type for XHTML Family document types, and in particular it is 
>> suitable for all XHTML Host Language document types. XHTML Family 
>> document types suitable for this media type include [XHTML1], 
>> [XHTMLBasic], [XHTML11] and [XHTML+MathML]. An XHTML Host Language 
>> document type that adds elements and attributes from foreign 
>> namespaces MAY identify its profile with the 'profile' optional 
>> parameter or other means such as the "Content-features" MIME header 
>> described in RFC 2912 [RFC2912]. Each namespace SHOULD be explicitly 
>> identified through namespace declaration [XMLNS]. This document does 
>> not preclude the registration of its own media type for specific 
>> XHTML Host Language document type.
>>
>> In general, this media type is NOT suitable for XHTML Integration Set 
>> document types. This document does not define which media type should 
>> be used for XHTML Integration Set document types.
>
> Why mention XHTML Integration Set document types at all?

Because someone will ask.
>
>> Generic XML processors might recognize it as just an XML document 
>> which includes elements and attributes from the XHTML namespace (and 
>> others), and may not have a priori knowledge what to do with such a 
>> document beyond what they can do for generic XML documents.
>
> I think "XML processors" isn't what is meant here. An XML processor 
> alone wouldn't constitute a UA and by definition has no knowledge of 
> XHTML.
>
> Assuming s/processors/UAs/, how is this different from generic XML UAs 
> processing application/xhtml+xml? Why do authors need to know this?

We changed the term - we are talking about user agents that process XML 
natively.
>
>> Authors SHOULD explicitly identify the XHTML namespace through the 
>> namespace declaration when they serve an XHTML Family document as 
>> 'application/xml' to facilitate the chance for reliable processing.
>
> Um. Isn't this always required? "facilitate the chance for reliable 
> processing"? Is there a chance that it will fail? What is unreliable? 
> If you don't include it, it won't be interpreted as XHTML; if you do, 
> it will.

Expanded.
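
To make the point concrete, here is a small Python sketch using the 
standard library's ElementTree.  It shows that a namespace-aware XML 
parser only identifies the root element as XHTML when the namespace is 
declared.  The helper name and the sample documents are made up for 
this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xhtml_root(document):
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(document)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    return root.tag == "{%s}html" % XHTML_NS


# Hypothetical sample documents, identical apart from the xmlns declaration.
with_ns = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)
without_ns = "<html><head><title>t</title></head><body/></html>"

print(is_xhtml_root(with_ns))
print(is_xhtml_root(without_ns))
```

Without the declaration the root is just an element named html in no 
namespace, so a generic XML user agent has nothing to tie it to XHTML.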
>
>
>> The XML stylesheet PI SHOULD be used to associate style sheets.
>
> Why?

Expanded.
>
>
>> Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather 
>> than 'application/xml'.
>
> Why?

Added some text.
>
>
>> As for character encoding issues, "3.2 Application/xml Registration" 
>> of [RFC3023] says that "the use of the charset parameter is STRONGLY 
>> RECOMMENDED", and also specifies a rule that "[i]f an application/xml 
>> entity is received where the charset parameter is omitted, no 
>> information is being provided about the charset by the MIME 
>> Content-Type header". This means that conforming XML processors MUST 
>> follow the requirements described in section 4.3.3 of [XML10].
>>
>> Therefore, while it is STRONGLY RECOMMENDED to specify an explicit 
>> charset parameter through a higher-level protocol, authors SHOULD 
>> include the XML declaration (e.g. <?xml version="1.0" 
>> encoding="EUC-JP"?>). Note that a meta http-equiv statement will not 
>> be recognized by XML processors, and while authors MAY include such a 
>> statement in an XHTML document served as 'application/xml' it will 
>> not affect processing of the document since the higher level protocol 
>> and the XML PI both take precedence.
>
> "Take precedence" makes it sound like the meta would do something when 
> the higher level protocol doesn't say anything and the XML declaration 
> is absent. It does not.
>

Fixed.

>
> Why is application/xhtml+xml "MAY" for XHTML Family (HTML 4 
> compatible) but "SHOULD" for other XHTML?

Because for other XHTML, delivering as anything else doesn't make much 
sense.  There will be other namespaces included, or it will be otherwise 
incompatible with HTML user agents so there is no reason for it to be 
delivered as "text/html".

>
>> A.8. Fragment Identifiers
>>
>> DO use the id attribute to identify elements.
>>
>> DO ensure that the values used for the id attribute are limited to 
>> the pattern [A-Za-z][A-Za-z0-9:_.-]*.
>>
>> DO NOT use the name attribute to identify elements, even in languages 
>> that permit the use of name such as XHTML 1.0.
>
> Why not allow to use both?

As explained in the rationale, it is redundant and unnecessary. 
Moreover, @name is not supported in XHTML Family markup languages other 
than XHTML 1.0.
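
For anyone who wants to check id values against the pattern quoted 
from A.8, a minimal Python sketch follows.  The function name is 
hypothetical; the regular expression is the one given in the guideline.

```python
import re

# Pattern from guideline A.8: a letter, then letters, digits, ':', '_', '.', '-'.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_portable_id(value):
    """Return True if value matches the A.8 pattern in its entirety."""
    return ID_PATTERN.fullmatch(value) is not None


for candidate in ("section-1", "fig:2.1_a", "1section", "a b", ""):
    print(repr(candidate), is_portable_id(candidate))
```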
>
>
>> Rationale: In HTML 3.2 and earlier the name attribute on some 
>> elements could be used to define an anchor, but HTML 4 introduced the 
>> id attribute. In an XML dialect, only attributes with type ID are 
>> permitted to be used as anchors, and the id attribute is defined to 
>> be of type ID. Relying upon the id attribute as an anchor will work 
>> well in modern HTML and XHTML-aware user agents.
>
>> A.15. Formfeed Character in HTML vs. XML
>>
>> DO NOT use the formfeed character (U+000C).
>>
>> Rationale: This character is recognized as white space in HTML 4, but 
>> is NOT considered white space in XML.
>
> Where is it said that U+000C is whitespace in HTML 4?
>
> In the SGML declaration for HTML 4 I find:
>
>          FUNCTION
>                   RE            13
>                   RS            10
>                   SPACE         32
>                   TAB SEPCHAR    9
>
> ...which seems to suggest that only U+000A, U+000D, U+0020 and U+0009 
> are whitespace.
>
>
> Also, not only is it not considered whitespace in XML, it's not 
> well-formed XML.

The SGML declaration for HTML 4 is inconsistent with the HTML 4 
recommendation, which explicitly states that form feed is a whitespace 
character - http://www.w3.org/TR/html401/struct/text.html#didx-white_space-1

So we tell people to not use it at all - that way there is no 
incompatibility risk. 
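
As a quick illustration of the XML side of this, the Python sketch 
below feeds a document containing U+000C to the standard library's XML 
parser, which rejects it as not well-formed.  The helper names are 
hypothetical, and replacing the form feed with a space is just one 
possible way to clean up existing content.

```python
import xml.etree.ElementTree as ET

FORM_FEED = "\u000C"


def is_well_formed(document):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def replace_form_feeds(text):
    """Replace each form feed with a space (one possible cleanup)."""
    return text.replace(FORM_FEED, " ")


sample = "<p>one" + FORM_FEED + "two</p>"
print(is_well_formed(sample))
print(is_well_formed(replace_form_feeds(sample)))
```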


-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Wednesday, 22 October 2008 15:24:18 UTC