Re: let authors choose text/html or application/xhtml+xml (detailed review of section 1. Introduction)

Hi Dean,

On Aug 31, 2007, at 7:50 AM, Dean Edridge wrote:

>
> Dan Connolly wrote:
>> Regarding this paragraph:
>>
>> "XHTML documents (XML documents using elements from the HTML namespace)
>> that use the new features described in this specification and that are
>> served over the wire (e.g. by HTTP) must be sent using an XML MIME type
>> such as application/xml or application/xhtml+xml and must not be served
>> as text/html."
>>
>> XHTML documents served as text/html result in interoperable behavior
>> in typical cases, so that constraint is too strong. Please change
>> it to "SHOULD be sent..." and "SHOULD NOT be served...".
>>
>> for reference:
>>
>> http://www.w3.org/html/wg/html5/
>> 24 August 2007
>> 1.218 Fri Aug 24 22:56:42 2007 UTC
>>
>> and
>>
>> [[
>> 6. Guidance in the use of these Imperatives
>>
>>    Imperatives of the type defined in this memo must be used with care
>>    and sparingly.  In particular, they MUST only be used where it is
>>    actually required for interoperation or to limit behavior which has
>>    potential for causing harm (e.g., limiting retransmisssions)  For
>>    example, they must not be used to try to impose a particular method
>>    on implementors where the method is not required for
>>    interoperability.
>> ]]
>>  -- http://www.ietf.org/rfc/rfc2119.txt
>>
> How much longer do we need to go on pretending that XHTML can be
> sent as text/html, Dan? This is ridiculous. Hasn't the W3C learnt
> its lesson from XHTML's failure over the last 8 years?
>
> Exactly who benefits from the myth of XHTML being able to be sent
> as text/html? Not you, me, the W3C or anyone, and certainly not
> XHTML itself.

I'm not sure why you call it a myth. I'm sure we can find countless
sites that serve valid XHTML files as text/html. This discussion
keeps popping up, but so far no one has been able to articulate what
the dangers are in doing so.

> So what's the incentive for misleading people like this? Is it  
> because the W3C doesn't want to admit it got it wrong?  
> Unfortunately, as of today, XHTML can be described as a failure and  
> all the valid XHTML sites in the world could be listed on one small  
> piece of paper.

Authors clearly want to move to XHTML. I would imagine the main thing
holding them back is that well over half of the visitors to their
site would not be able to view their content if they served it as
application/xhtml+xml. So I wouldn't say the slowness to deploy
XHTML has necessarily been due to any failure on the W3C's part. The
W3C can provide these recommendations, but it's up to the implementors
to implement the feature that authors have clearly shown an interest
in (and I expect many users would appreciate it if they knew of the
capabilities).

> One of the main reasons for this is that the W3C hasn't made it
> clear to developers and browser manufacturers that it's the media-type
> ("application/xhtml+xml") that people need to get used to, not
> just the XML syntax of XHTML, and it's the media-type that makes
> the document XHTML.

We've been discussing this at length on the "review of content type
rules by IETF/HTTP community" thread (see also the wiki page [1]). I
think a more accurate way to think of it is that a file's type is
determined by the internals of the file and the authoring tool. There
is a separate issue in that files of certain types can be handled as
files of other certain types. For example, a file of type text/html
can be handled as type text/plain. However, that does not make the
file a plain text file; the UA merely handles it that way. Imagine if
a UA had a menu to change a file's handling. You wouldn't say the file
magically changed from one type to another without any editing.
Instead you would understand that the UA is alternately handling the
same file, unedited, as one type and then another.
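To make the distinction between a file's type and its handling concrete, here is a minimal Python sketch. The helpers `sniff_format` and `describe` are hypothetical, and the content checks are a rough heuristic assumed for illustration, not the WHATWG MIME-sniffing algorithm or what any real UA does. The declared Content-Type is only recorded as the requested handling, while the format is read from the unedited bytes.

```python
# Hypothetical sketch: the internal format comes from the bytes themselves,
# while the declared Content-Type only says how a UA is asked to handle them.

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte signature every PNG file starts with
XHTML_NS = b'xmlns="http://www.w3.org/1999/xhtml"'


def sniff_format(data: bytes) -> str:
    """Rough guess at the internal format, looking only at the file's contents."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    head = data.lstrip()[:1024].lower()
    if XHTML_NS in head:
        return "application/xhtml+xml"
    if head.startswith(b"<?xml"):
        return "application/xml"
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "text/html"
    return "text/plain"


def describe(data: bytes, declared_type: str) -> dict:
    """Report the handling requested by the server next to the sniffed format."""
    return {"handled_as": declared_type, "internal_format": sniff_format(data)}


# Truncated PNG: signature plus the start of an IHDR chunk, enough for sniffing.
png_bytes = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
xhtml_bytes = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    b"<body><p>Hello<br /></p></body></html>"
)

print(describe(png_bytes, "text/html"))
# {'handled_as': 'text/html', 'internal_format': 'image/png'}
print(describe(xhtml_bytes, "text/html"))
# {'handled_as': 'text/html', 'internal_format': 'application/xhtml+xml'}
```

Serving either file as text/html changes only the `handled_as` field; nothing in the bytes, and therefore nothing in the sniffed format, is altered.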

> That's right, because of this millions of people out there are
> thinking that their web documents are valid XHTML, but they're not;
> they are in fact invalid HTML.

They are often valid XHTML. Many who use XHTML also have a stronger
tendency to use validators. You may be right that these files may not
validate as HTML either. But then again, any document that uses EMBED
will not validate as HTML. Clearly browsers use a different notion of
"valid HTML" than the validators do. That's a problem that could
easily be fixed by making an HTML 4.02 DTD that includes EMBED and
disables null end tags. Voila! Now all of the invalid XHTML-like HTML
files suddenly become valid HTML.

> If you think that an XHTML document can be sent as text/html and
> still be XHTML, then can you please tell me what exactly makes that
> document XHTML?

A document is XHTML when it adheres to the norms laid out in the  
XHTML recommendation.

> It's not the Doctype, and it's not the solidus in the br tags
> either, is it? So what is it?

Among those norms are the DOCTYPE and the solidus for self-closing
elements.

> Answer: Nothing, it's not XHTML. As soon as the document is given
> the media type "text/html" it becomes an HTML document, simple as that.

As soon as the document is given the media type "text/html", the UA is
directed to handle the file as "text/html"; however, simply changing a
filename extension or a server Content-Type header cannot change the
internal format of a document. If I change my "image/png" files to be
delivered as "text/html", does that make them no longer valid
"image/png" files? No, they're still every bit as valid PNG as they
were before; it's just that the UA cannot recognize them as such
(though it looks like Firefox 3 probably will [1]). There's been a lot
of misinformation on the web about XHTML 1.0’s Appendix C. The current
draft of HTML5 proposes allowing this Appendix C syntax to continue.
When done right, an HTML5 document can then potentially be handled as
either text/html or application/xhtml+xml and still be valid either
way (though there are some areas where the parse to DOM will differ).

Take care,
Rob


[1]: <http://esw.w3.org/topic/HTML/ContentTypeIssues#preview>

Received on Friday, 31 August 2007 15:01:39 UTC