Re: HTML 5 and XHTML 2 combined from Benjamin Hawkes-Lewis on 2009-01-08 (www-html@w3.org from January 2009)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Thu, 08 Jan 2009 11:41:57 +0000
To: Mark Birbeck <mark.birbeck@webbackplane.com>
CC: Brett Patterson <inspiron.pattersonb@gmail.com>, David Woolley <forums@david-woolley.me.uk>, Molte <molte93@gmail.com>, Shavkat Karimov <shavkat@seomanager.com>, HTML Working Group Discussion Mailing-List <www-html@w3.org>
Message-ID: <4965E685.9080605@googlemail.com>
On 7/1/09 22:13, Mark Birbeck wrote:
> Many
> organisations choose to generate documents that are technically XHTML,
> but deliver them to browsers using an HTML MIME type, so as to 'force'
> the browser to use its HTML parser for rendering.

Yep. They send output in such a way that its processing has no detailed
conformance requirements, save for those that HTML5 will hopefully provide.

> This gives them the best of both worlds; they can use one or more of
> the enormous number of XML tools around to generate their documents,

Since a serialization to HTML could be appended to any toolchain 
producing XHTML, I don't agree that serving text/html gives them this 
option.

> but they can still have these documents rendered in existing browsers,
> without having to worry about whether the browser supports XHTML or
> not.

But they do need to worry about all the ways in which text/html differs 
from application/xhtml+xml (without any conformance criteria), and they 
do need to forgo any benefits of having their markup processed by 
browsers as XML.
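
One concrete difference is error handling. The sketch below is only an
illustration: it uses Python's standard library as a stand-in for browser
parsers (an assumption; neither xml.etree.ElementTree nor html.parser is a
browser implementation, and the helper names are made up for this
example). The same markup is rejected by an XML parser yet accepted by a
lenient HTML tokenizer:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An XML processor must stop at the first well-formedness error.
        return False


class TagCollector(HTMLParser):
    """Hypothetical helper: collect start tag names seen by a lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Tokenize the markup as HTML and return the start tags encountered."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    tag_soup = "<p>one<br>two</p>"       # unclosed <br>: fine as HTML, fatal as XML
    xml_style = "<p>one<br/>two</p>"     # self-closed <br/>: accepted by both

    for doc in (tag_soup, xml_style):
        print(doc, parses_as_xml(doc), html_start_tags(doc))
```

So a document that only ever gets tested as text/html can carry errors
that would stop it dead if it were ever served as application/xhtml+xml.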

> The important thing here is that this technique also means that in
> principle, even if a 'new' language is created, it could still be
> processed by existing browsers, provided that the new language paid
> attention to HTML processing rules.

Yes, but I didn't mean HTML5 wasn't a new language; I'm saying XHTML 2
is moving beyond the constraints of the text/html serialization in
devising a new language.

> So XHTML 2 could be delivered with an HTML MIME type, just as HTML5
> could be delivered with an XHTML MIME type -- in both cases the
> languages are distinct from the delivery mechanism.

Yes. You could deliver any byte stream as text/html.

>> HTML5 is premised on the constraints of supporting the existing web with the
>> same specification; XHTML 2 is premised on ignoring those constraints.
>
> I think this is a little misleading.
>
> First, HTML5 adds new features that are not backwards-compatible with
> HTML 4, but it just so happens that the close relationship between
> some of the browser implementers and the spec writers mean that
> features are being added quite quickly. In effect, the 'existing web'
> is changing, even as we discuss it.
>
> Second, XHTML 2 is not based on ignoring those constraints, although
> it would probably be true to say that it was at its inception.

While HTML5 adds new features with backwards-compatibility problems, it's
a requirement that the new features are at least not incompatible with
supporting the current web corpus.

AFAIK the feedback from browser vendors like Opera seems to be that 
implementing XHTML 2 even in text/html is not compatible with supporting 
the current web corpus. I would of course welcome a correction on this 
point from popular browser vendors. :)

Under "Backwards compatibility", the draft clearly states that XHTML 2 
depends on XML parsing:

"Because earlier versions of HTML were special-purpose languages, it was 
necessary to ensure a level of backwards compatibility with new versions 
so that new documents would still be usable in older browsers. However, 
thanks to XML and style sheets, such strict element-wise backwards 
compatibility is no longer necessary, since an XML-based browser, of 
which at the time of writing means more than 95% of browsers in use, can 
process new markup languages without having to be updated."

http://www.w3.org/TR/2005/WD-xhtml2-20050527/introduction.html#backCompat

If XHTML 2 is not taking advantage of XML to break free of the past, 
perhaps this needs rephrasing?

> For a
> long time now XHTML 2 has had a modular architecture, which means that
> language designers can create languages that use one or more of the
> XHTML 2 modules, and implementers can provide support for whichever
> modules they deem appropriate. This makes XHTML 2 useful not just in
> browsers and constrained devices, but also for creating Docbook-style
> languages, news formats, and so on.

I don't really see what this has to do with text/html backwards 
compatibility. If you mean some XHTML 2 modules could be reconciled with 
text/html processing, that's probably true. The following seem like 
possible examples that might more or less work as is:

* XHTML Document Module
* XHTML Structural Module
* XHTML Text Module
* XHTML Hypertext Module
* XHTML I18N Attribute Module
* XHTML Bi-directional Text Attribute Module
* XHTML Role Attribute Module
* Ruby Module
* XHTML Style Attribute Module
* XHTML Tables Module

These modules reflect features in existing text/html implementations
(multiple web engines support 'role'; Trident and a Firefox plugin
support Ruby).

Then there are modules that would definitely require new implementation
work, but aren't obviously incompatible with supporting the existing
text/html corpus without code branching, and where the fallback might be
acceptable:

* XHTML List Module
* XHTML Edit Attributes Module
* XHTML Image Map Attributes Module
* XHTML Metainformation Attributes Module
* XHTML Media Attribute Module: media-specific content would be shown in
every medium
* XHTML Object Module: fallback content
* XHTML Style Sheet Module: disabled attribute wouldn't work.


Here are some simple examples, drawn from the list of important changes,
that would be difficult to make work in text/html:

"in earlier versions of HTML, a p element could only contain simple 
text. It has been improved to bring it closer to what people perceive as 
a paragraph, now being allowed to include such things as lists and tables."

This wouldn't work in text/html because a table or list start tag must 
close a paragraph for compatibility with existing content.
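
As a rough sketch of that behaviour (not a real text/html parser: the
ParagraphTracker class and the CLOSES_P set are hypothetical, and the set
is only a small subset of the start tags that actually close a paragraph),
one can track which elements end up inside a <p> when a list or table
start tag implicitly ends it:

```python
from html.parser import HTMLParser

# Assumption: illustrative subset of start tags that implicitly close an
# open <p> in text/html; the real list is longer.
CLOSES_P = {"table", "ul", "ol", "dl", "div", "p"}


class ParagraphTracker(HTMLParser):
    """Record which elements are opened while a <p> is still open."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # stack of currently open tag names
        self.children_of_p = []   # tags opened while a <p> was on the stack

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and "p" in self.open_elements:
            # Pop everything up to and including the open <p>.
            while self.open_elements.pop() != "p":
                pass
        if "p" in self.open_elements:
            self.children_of_p.append(tag)
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags, such as the </p> left over after the
        # paragraph was already closed implicitly.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


if __name__ == "__main__":
    tracker = ParagraphTracker()
    tracker.feed("<p>Intro <em>text</em><ul><li>item</li></ul> tail</p>")
    print(tracker.children_of_p)   # ['em']: the list never becomes a child of <p>
```

Under these assumptions the XHTML 2 style markup loses its intended
structure: the <ul> ends up as a sibling of the paragraph rather than
part of it.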

"XHTML 2 takes a completely different approach, by taking the premise 
that all images have a long description"

This wouldn't work in text/html because text following an <image> tag 
must be treated as text following an image, not alternate text, for 
compatibility with existing content.

I agree some XHTML2 modules that represent a change from the 
functionality provided by HTML5 could perhaps be implemented in 
text/html without breaking support for the existing web corpus. Examples 
may include:


Non-functional:



Non-examples may include:

* XHTML Embedding Attributes Module: Browser vendors say too hard to
implement src on every element.
* XHTML Handler Module: This doesn't look backwards compatible since 
downlevel text/html browsers will display raw script on the page.
* XHTML Image Module: Unacceptable degradation plus impossible to implement.
* XHTML Hypertext Attributes Module: Browser vendors say too hard to 
implement href on every element.
* XHTML Metainformation Module: <meta> with content ends the <head>
element in text/html; there may even be web corpus content depending on
the behavior.
* XForms Module: associations between fields and labels wouldn't work; 
"select" contents can't even be parsed into the DOM
* XML Events Module: event handling wouldn't work since it depends on 
the Handler Module.


>
> Best regards,
>
> Mark
>
Received on Monday, 12 January 2009 17:45:04 UTC