Re: Choosing name for XML serialization from Lachlan Hunt on 2007-06-25 (public-html@w3.org from June 2007)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Mon, 25 Jun 2007 11:53:05 +1000
To: mark.birbeck@x-port.net
CC: HTML WG <public-html@w3.org>
Message-ID: <467F2001.7050402@lachy.id.au>
Mark Birbeck wrote:
> Everyone I've heard talk on HTML 5 is actually critical of XML.
> Usually it's along the lines that namespaces are a mess, or that
> browsers shouldn't have to use an XML processing model (I agree with
> both, by the way). But also we've been told for years that we
> shouldn't actually use XML representations of HTML.

Personal opinions about the usage of XHTML are not really relevant to 
discussion of its name.

> That's all fair enough, and people are entitled to pursue things
> however they think best. But it's a little rich now to come from this
> viewpoint and say that you want to create version 5 of XHTML.

The fact is, whether the XHTML2 WG likes it or not, we are creating a 
revision of XHTML by extending XHTML 1.x.  Therefore, it is correct for 
it to be called XHTML.  The XHTML2 WG, on the other hand, has been 
creating an entirely new language that is unrelated to XHTML 1.x in 
reality.  So if only one language is to retain the name "XHTML", it 
should be the one that actually resembles XHTML and uses the XHTML 
namespace.  Clearly, that would be XHTML5, because XHTML2 in its current state 
simply cannot use the XHTML 1.x namespace because that will make it 
impossible to implement.

However, it doesn't really bother me if the XHTML2 WG continues to call 
their language XHTML.  It doesn't bother me if they try to reuse the 
XHTML namespace either.  I don't think continuing to debate the issue 
will be particularly productive because, as far as I'm concerned, it 
won't be a problem in reality.  The market will eventually decide upon 
the relevance of XHTML5 and XHTML2.  One will succeed, and the other 
will fall into disuse and be quickly forgotten.

>> This is against precedent of the HTML WG - HTML 4.01 and XHTML 1.0
>> were two serializations of the same language with different names.
> 
> Not at all. XHTML 1.0 was very consciously an XML version of HTML 4.01
> that could be built on. XHTML 1.1 however was not simply an XML
> serialisation of HTML 4.01, since it:
> ...
>  * and broke the language up into modules that could be used to create
>    other XHTML-based languages.

The whole modularisation issue, which in practice just means splitting 
up the DTD into separate files, has no practical benefit and is not 
particularly relevant.

> So that's where the W3C is at, with the HTML and XHTML languages. My 
> understanding of the original motivation for HTML 5 was that there was 
> a feeling amongst some browser manufacturers that HTML needed 
> updating. But no-one said anything about XHTML.

Where did you get the impression that no-one said anything about XHTML? 
  HTML5 has been developing both serialisations since its inception.

> In short, HTML and XHTML were forked quite a while ago, and whilst I
> also believe they should never have been split, if they are now to be
> merged it's going to need to be done on a far more substantial basis
> than the one currently being proposed (i.e., HTML 5).

In what ways do you think HTML and XHTML could be merged better than 
they currently are in the HTML5 spec?

>>> "Rich: All existing XHTMLs have been modular, and HTML5 is not.
>>> It's a mess."
>>
>> This is false, XHTML 1.0 is not modular (in the Modularization sense).
> 
> Yes, Rich was incorrect in the first part of his
> statement--modularisation came about in version 1.1. You don't seem to
> be disputing the second part of his statement though. ;)

The second part is a baseless allegation that is nothing more than 
flamebait.  It's not really worth addressing with a counter argument.

>> XHTML2 has whole subsystems like
>> forms and events handling that are redone in completely different
>> ways; there's very little chance of an XHTML1 document functioning
>> correctly when processed as XHTML2.
> 
> Yes, you are right, on that, although it's worth looking at these
> issues separately. First, XML Events is a separate spec, and all it
> does is provide a mark-up version of DOM 2 Events. It's used in other
> languages too, has been around for a long time, and it really would be
> a bit weird for HTML 5 not to make use of it.

HTML5 doesn't need to make use of XML Events because it doesn't provide 
sufficient benefits and isn't backwards compatible.  The existing 
onevent attributes in HTML provide basic declarative markup for event 
handlers in a backwards compatible way, and XBL will add enhanced 
features for event handling in the future.  Until then, widely used DOM 
scripting techniques are working quite well.

> But the key point that Steven was making is that although people try
> to claim that HTML 5 is backwards-compatible, it really isn't.

Care to elaborate?  Perhaps provide some evidence to support that claim?

> And at the same time, since most of XHTML 2 is about semantics and 
> structure, there is very little to implement beyond the forms stuff.

Either that means there's no conformance criteria, which makes it 
impossible to implement, or your statement is wrong.  Let's take a look 
and see exactly what will need to be implemented, and what will be 
difficult or impossible to implement:

* The version attribute:

There's a note in the spec that states:
| The version attribute needs a machine processable format so that
| document processors can reliably determine that the document is an
| XHTML Family conforming document.

http://www.w3.org/TR/xhtml2/mod-document.html#adef_document_version

It was also suggested that if XHTML2 reuses the XHTML1.x namespace, the 
version attribute would be used as a switch to differentiate between 
XHTML2 and XHTML5.  That suggests that there would have to be some 
processing requirements, which would have to be implemented.  Using it 
as a switch for choosing different processing models would be 
impractical for various reasons which were already discussed in this 
thread.

* The Document Title.

XHTML2 tries to make <title> and <meta property="title"> equivalent.

* <title>Document Title</title>
* <meta property="title" content="Document Title"/>
* <meta property="title">Document Title</meta>

* <title>Document Title <em>Containing Extra Markup</em></title>
* <meta property="title" content="Document Title A">Document Title B</meta>

There would need to be some processing requirements to define how to 
extract the title and resolve conflicts if more than one occurs.

How does it interact with DOM APIs like HTMLDocument.title?  What's the 
return value, particularly for the example that contains markup?  What 
happens when the value gets set?

The last example I provided would also need some error handling to 
determine which takes precedence out of the attribute or content.
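
To make the ambiguity concrete, here is a rough sketch in Python that collects every string a UA could plausibly treat as the title. The namespace URI assumes XHTML2 reuses the XHTML 1.x namespace, and title_candidates is a hypothetical helper, not anything the draft defines.

```python
import xml.etree.ElementTree as ET

# Assumption: XHTML2 content served in the XHTML 1.x namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def title_candidates(source):
    """Collect every string that could plausibly be 'the' document title."""
    root = ET.fromstring(source)
    found = []
    for el in root.iter():
        if el.tag == f"{{{XHTML_NS}}}title":
            # Flattening child markup such as <em> is a guess; the draft
            # does not say whether the markup is kept or dropped.
            found.append(("title element", "".join(el.itertext())))
        elif el.tag == f"{{{XHTML_NS}}}meta" and el.get("property") == "title":
            if el.get("content") is not None:
                found.append(("meta content attribute", el.get("content")))
            text = "".join(el.itertext())
            if text:
                found.append(("meta element content", text))
    return found


doc = f"""<html xmlns="{XHTML_NS}"><head>
<title>Document Title <em>Containing Extra Markup</em></title>
<meta property="title" content="Document Title A">Document Title B</meta>
</head></html>"""

for origin, value in title_candidates(doc):
    print(f"{origin}: {value}")
```

The sketch finds three competing values in one small document and has no basis for choosing between them, which is exactly the gap the processing requirements would need to close.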

* Headings

How does a UA construct a table of contents from the headings in the 
document?  How does <h> interact with <h1> to <h6>?  How do <h1> to <h6> 
interact with the <section> element?

* The <abbr> Element

The full="" attribute would need processing requirements that specify how to 
extract the expansion from the referenced element.  What happens if it 
references itself, or references an ID of an element like, for example, 
<separator id="foo"/> or maybe even <html id="foo">?

Which takes precedence if both full="" and title="" are specified?
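
A rough sketch in Python of one way a UA might resolve the reference, just to show how many cases need a defined answer. It assumes the full="" value is a bare ID in the same document, and it guesses that full="" wins over title=""; resolve_full and its status strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_full(root, abbr):
    """Return (status, text) for an abbr's full="" reference under one guessed policy."""
    ref = abbr.get("full")
    if ref is None:
        # Guess: fall back to title="" only when full="" is absent.
        return ("no-reference", abbr.get("title"))
    # ElementTree keeps no ID index, so scan for a matching id attribute.
    target = next((el for el in root.iter() if el.get("id") == ref), None)
    if target is None:
        return ("dangling", None)
    if target is abbr:
        return ("self-reference", None)
    text = "".join(target.itertext()).strip()
    if not text:
        return ("empty-target", None)
    return ("ok", text)


doc = """<html id="root"><body>
<p><dfn id="w3c">World Wide Web Consortium</dfn></p>
<p><abbr full="w3c">W3C</abbr></p>
<p><abbr id="loop" full="loop">X</abbr></p>
<p><abbr full="foo">Y</abbr><separator id="foo"/></p>
<p><abbr full="root">Z</abbr></p>
</body></html>"""

root = ET.fromstring(doc)
for abbr in root.iter("abbr"):
    status, text = resolve_full(root, abbr)
    print(abbr.text, status)
```

Note that the last case "succeeds": pointing full="" at the root element yields the text of the entire document as the expansion, which is presumably not what anyone wants.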

* The cite attribute

How should the UA process it?

* The <label> Element

If you intend to reuse the XHTML namespace, how do you intend to deal 
with the incompatibility with XHTML 1?  In XHTML2, it represents a label 
for a list.  In XHTML1 and XHTML5, it's a form control label.  How does 
this change in semantics affect the HTMLLabelElement DOM API?

* id="" and xml:id=""

The XHTML2 draft states that these must not be specified on the same 
element together.  What if they are? How do you process the value, 
particularly if it contains spaces?  What does it mean if the value is 
empty?  How do they affect the HTMLElement.id DOM API?  For <p 
xml:id="foo"> does getting p.id return foo?  Does setting p.id change 
the xml:id value or add a separate id attribute?  You need to define 
error handling.
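
As an illustration, this Python sketch flags the cases above that currently have no defined handling. It only detects them; what a UA should then do is precisely the open question. id_problems and its messages are made up for the example.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def id_problems(source):
    """List (tag, problem) pairs for id/xml:id usage that needs error handling."""
    problems = []
    for el in ET.fromstring(source).iter():
        plain, xml_id = el.get("id"), el.get(XML_ID)
        if plain is not None and xml_id is not None:
            problems.append((el.tag, "both id and xml:id"))
        for value in (plain, xml_id):
            if value == "":
                problems.append((el.tag, "empty value"))
            elif value is not None and any(ch.isspace() for ch in value):
                problems.append((el.tag, "value contains whitespace"))
    return problems


doc = """<body>
<p id="a" xml:id="b">both</p>
<div xml:id="foo bar">space</div>
<span id="">empty</span>
<em xml:id="ok">fine</em>
</body>"""

for tag, problem in id_problems(doc):
    print(tag, problem)
```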

* The href, src and xml:base Attributes

These can now be specified on nearly every element.  How does one 
implement that?  Several UA vendors, including Mozilla, Apple and Opera, 
have indicated that implementing that would be difficult or impossible. 
  The href and src would clearly be affected by the xml:base attribute. 
  That doesn't appear to be defined well.
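
For reference, here is a sketch in Python of one possible reading, in which every href and src is resolved against the nearest inherited xml:base, following the usual XML Base rules. That this is what XHTML2 intends is an assumption; resolved_links is a hypothetical helper and the example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolved_links(el, base):
    """Yield (tag, attribute, absolute URL) for every href/src, honouring nested xml:base."""
    # An element's own xml:base is applied before resolving its own links.
    base = urljoin(base, el.get(XML_BASE, ""))
    for attr in ("href", "src"):
        if el.get(attr) is not None:
            yield (el.tag, attr, urljoin(base, el.get(attr)))
    for child in el:
        yield from resolved_links(child, base)


doc = """<body xml:base="http://example.org/docs/">
<p href="intro.html">Intro</p>
<div xml:base="img/" href="index.html"><span src="logo.png">logo</span></div>
</body>"""

for tag, attr, url in resolved_links(ET.fromstring(doc), "http://example.org/"):
    print(tag, attr, url)
```

Even in this simple form, the UA has to track a base URI for every element in the tree, since any of them may now carry a link.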

* The hreflang, hrefmedia, hreftype and srctype Attributes

How should the UA process these attributes?  Should they do anything 
specific with them?

* The encoding Attribute

The spec has some non-trivial processing requirements specified for this.

* The nextfocus and prevfocus Attributes

These seem to have a few processing requirements.  They're certainly not 
trivial to implement.

* The target Attribute

The spec states:

| This specification does not define how this attribute gets used

Ha! :-D  No further comments necessary.

* The <access> Element

This would clearly need some sort of non-trivial implementation.

* The edit and datetime attributes

How should the UA process these attributes?  Should it expose the 
datetime value to the user somehow?

* The <handler> Element

... I couldn't be bothered continuing.  I think I've made my point by 
now.  XHTML2 is clearly not "[mostly] about semantics and structure". 
There is *a lot more* to implement beyond the forms stuff!

> In other words, HTML 5 claims to be backwards-compatible but isn't, 
> whilst XHTML 2 doesn't claim to be backwards-compatible, but is not 
> as big a leap as people try to claim.

I also pointed out several instances where there are compatibility 
issues with XHTML2, and have previously pointed out several others. 
Your claim about HTML5 not being backwards compatible is still unsupported.

> Don't get me wrong, no-one is saying that there is anything _wrong_
> with being non-backwards--compatible at some point in a language's
> evolution,

I'm aware of the XHTML2 WG's stance on backwards compatibility, but I 
think many of us in the HTML WG have been saying that there is something 
wrong with not being backwards compatible for a long time.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Monday, 25 June 2007 08:21:30 UTC