Re: XHTML namespace, DOCTYPE, and other versioning mechanisms from Maciej Stachowiak on 2009-02-16 (public-html@w3.org from February 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Mon, 16 Feb 2009 13:29:19 -0800
To: Larry Masinter <masinter@adobe.com>
Cc: HTML WG <public-html@w3.org>
Message-id: <9F780494-5677-4F3B-94AA-41FE022F1123@apple.com>

On Feb 16, 2009, at 12:33 PM, Larry Masinter wrote:

> In reply to various messages:
>
>> ... unless I'm missing something, the doctype issue won't help with
>> the use case of compound documents (say, HTML used in
>> <svg:foreignObject>) no matter what. You can't put a doctype inside
>> <svg:foreignObject> because that wouldn't be well-formed XML.
>
> Yes, I agree; DOCTYPE doesn't seem to help much.
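The well-formedness point can be checked with any conforming XML parser. Below is a minimal Python sketch that uses the standard library's xml.etree.ElementTree on made-up test documents. An XHTML fragment inside <foreignObject> parses fine on its own. Adding a DOCTYPE at that position makes the document ill-formed, because XML permits a document type declaration only in the prolog, before the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical test documents: an XHTML <p> inside SVG's <foreignObject>,
# first without and then with a DOCTYPE placed inside the element.
WITHOUT_DOCTYPE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<foreignObject width="100" height="50">'
    '<p xmlns="http://www.w3.org/1999/xhtml">Hello</p>'
    '</foreignObject></svg>'
)
WITH_DOCTYPE = WITHOUT_DOCTYPE.replace(
    '<foreignObject width="100" height="50">',
    '<foreignObject width="100" height="50"><!DOCTYPE html>',
)

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(WITHOUT_DOCTYPE))                   # True
print(is_well_formed(WITH_DOCTYPE))                      # False: DOCTYPE not in the prolog
print(is_well_formed("<!DOCTYPE svg>" + WITHOUT_DOCTYPE))  # True: DOCTYPE in the prolog
```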
>
>> ... since user agents in practice support XHTML namespace
>> elements in arbitrary XML compound documents
>
> The fact that some software implements something (table summaries or
> header profiles, for example) hasn't been given as strong a weight
> in this group as the actual practice in real documents on the web.
> Is there actually any significant deployment of real web content that
> uses XHTML namespace elements in arbitrary XML compound documents?

Apparently it is fairly common for Atom documents to embed XHTML  
fragments. Here is an example of a site that uses XHTML combined with  
SVG in the same document: <http://burningbird.net/>.
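For reference, the Atom format (RFC 4287) defines a type="xhtml" content construct whose body must be a single XHTML <div>. Feeds that use it are XML compound documents in exactly this sense. The following is a small Python sketch, built on a made-up entry, that extracts the embedded XHTML by namespace.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical, minimal Atom entry with type="xhtml" content.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>world</em>.</p>
    </div>
  </content>
</entry>"""

def xhtml_fragments(atom_xml):
    """Return the XHTML elements that are direct children of xhtml-typed Atom content."""
    root = ET.fromstring(atom_xml)
    found = []
    for content in root.iter(f"{{{ATOM_NS}}}content"):
        if content.get("type") == "xhtml":
            # Keep only children whose namespace URI is the XHTML one.
            found.extend(c for c in content if c.tag.startswith(f"{{{XHTML_NS}}}"))
    return found

frags = xhtml_fragments(ENTRY)
print([f.tag for f in frags])  # ['{http://www.w3.org/1999/xhtml}div']
```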


>> Further, since user agents in practice support XHTML namespace
>> elements in arbitrary XML compound documents, this would amount to
>> requiring XHTML5 user agents to implement XHTML2, which is not a
>> reasonable requirement.
>
> Requiring that senders identify the language they're using does
> not require that receivers be able to interpret every language
> a sender might send.

The snippet you quoted was in response to your proposal that any XHTML  
namespace element in a compound document must be treated as an XHTML2  
element: "XHTML5 *MUST NOT* be used in arbitrary generic XML compound  
documents. Any use of the namespace in such contexts denotes XHTML2,  
not XHTML5." This does not seem to be imposing a labeling requirement  
on senders; rather it appears to be a processing requirement on  
receivers.


>
>> As for XHTML5 and incompatible changes, those would in fact need to
>> be avoided, as they have generally been for HTML, unless we have a
>> per-element versioning scheme in place....
>
> The per-element versioning scheme has been the norm, either by adding
> new elements or new attributes or new ways in which old elements can
> be combined that were previously not supported and are supported now.
>
> I think a specific element/attribute combination seems like the
> only choice given the reasons for keeping the same namespace for
> XHTML1, XHTML2, XHTML5, and XHTML5.1.
>
> I've read about a version="" proposal as an attribute, but that
> seems like a poor device specifically *because* of the need to
> promote versions.
>
> As an alternative, we could ask that, if XHTML2 keeps the same
> namespace as XHTML1 and XHTML5, that all XHTML2 fragments be
> identified by wrapping them in an <xhtml2> element.  We could
> require that HTML user agents that do not support XHTML2 not
> attempt to interpret <xhtml2>-wrapped content any more than
> they would attempt to treat SVG content as HTML.

That would help somewhat. It's not exactly the same thing as SVG,  
because SVG content will not be treated as HTML due to use of the SVG  
namespace, regardless of whether it is wrapped in an SVG element.
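The sketch below illustrates the point about namespaces. A hypothetical receiver classifies each element by its namespace URI, so an SVG element counts as SVG whether or not it sits under an <svg> element. An XHTML element counts as XHTML whatever version its author intended, because XHTML 1.x, XHTML2, and XHTML5 all share http://www.w3.org/1999/xhtml. The <xhtml2> wrapper is only the proposal quoted above and is not defined by any specification. Here it is assumed to live in the XHTML namespace, and the receiver skips it wholesale.

```python
import xml.etree.ElementTree as ET

# XHTML 1.x, XHTML2 and XHTML5 share this single namespace URI.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
LANGUAGES = {XHTML_NS: "xhtml", SVG_NS: "svg"}

def namespace_of(tag):
    """Extract the URI from an ElementTree '{uri}local' tag, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

def classify(elem, out):
    """Append (local name, language) pairs, skipping hypothetical <xhtml2> subtrees."""
    ns = namespace_of(elem.tag)
    local = elem.tag.split("}", 1)[-1]
    if ns == XHTML_NS and local == "xhtml2":
        return  # a receiver without XHTML2 support leaves this subtree alone
    out.append((local, LANGUAGES.get(ns, "unknown")))
    for child in elem:
        classify(child, out)

DOC = (
    '<div xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:svg="http://www.w3.org/2000/svg">'
    '<svg:circle r="5"/>'                      # SVG by namespace, no <svg> wrapper
    '<xhtml2><p>XHTML2 content</p></xhtml2>'   # skipped via the wrapper
    '<p>Other XHTML content</p>'
    '</div>'
)
result = []
classify(ET.fromstring(DOC), result)
print(result)  # [('div', 'xhtml'), ('circle', 'svg'), ('p', 'xhtml')]
```

The namespace alone reports both <p> elements only as "xhtml". The wrapper, not the namespace, is what separates them.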

>
>
>
>> HTML 4.01 differs from HTML 3.2 and XHTML 1.0 yet all three can be
>> served with the text/html MIME type.
>
> The XHTML 1.0 specification was careful to *not* allow
> text/html except in transitional cases. So saying that
> "XHTML 1.0 can be served with the text/html MIME type"
> is misleading.

I acknowledge your correction. It would be more accurate to say that  
only some XHTML 1.0 documents can be served with the text/html MIME  
type. My point is simply that MIME type is not necessarily sufficient  
to identify language version.

> MIME type labeling is a way for a sender to communicate to
> a receiver the identity of the language of the message --
> i.e., how the sender wishes the receiver to interpret
> the message. This language identification process has
> meaning, independently of the advice you might give
> senders on how to label their message so that receivers
> are likely to understand it, and independent of the
> advice you give to receivers on how to deal with the
> reality of mis-configured senders.
>
>> More importantly, though, user
>> agents will likely process application/xhtml+xml (and indeed
>> application/xml or text/xml) documents under XHTML5 processing rules
>> rather than XHTML2 or XHTML1, so the new MIME type won't provide
>
> If there is no chance that the implementers of user agents
> might actually listen to the advice of the consensus of
> members of "public-html@w3.org", then there's no point
> in this conversation at all.

The reasons I make this prediction are as follows:

1) Browsers in particular and user agents in general would like to  
share implementation between HTML and XHTML as much as possible.

2) It seems likely that many user agents will implement HTML5.

3) Due to code complexity considerations and pragmatic constraints  
(e.g. download size) user agents do not generally include  
implementations of multiple incompatible versions of the same language  
to use in different contexts; this has always been the case. For  
example, after HTML4 was published, software generally updated to  
support it and did not retain separate code for HTML 3.2 processing.

Combining these conditions, many user agent developers are likely to  
implement HTML5, share that code with XHTML to implement XHTML5, and  
have extreme reluctance to have an additional separate XHTML  
implementation.

> So I'm unsure how much weight to give to an independent
> prediction of what "user agents will likely process"
> as input to what it is that the committee should recommend
> to user agents.

To be fair, my prediction is not independent, since I am a browser  
developer.

> And of course, there are more players in the tool chain than
> "user agents".
>
> Can you make this argument in a less circular form?

I think my explanation above should clarify the pragmatic constraints  
that HTML user agents (at least those running on end-user systems) face.

Regards,
Maciej
Received on Monday, 16 February 2009 21:30:05 UTC