Re: Fwd: HTML5 and XHTML2 combined (a new approach)

2009/1/24 Benjamin Hawkes-Lewis <>

> On 22/1/09 21:48, Giovanni Campagna wrote:
>>        It helps the implementors, because all modules have strictly
>>        defined dependencies, so changes don't affect other modules.
>>    I'm not an implementor. But the impression I have from talking to
>>    implementors is that formal distinctions between modules by spec
>>    writers make little practical difference to how difficult it is to
>>    introduce changes such as adding hyperlinking capability to every
>>    element.
>> If I want to implement XHTML Lists, I don't need to implement XHTML
>> Metainformation Attributes. This is modularization.
> But we were discussing "changes" to existing implementations, not selective
> implementation. If you have an implementation including Lists and you _do_
> implement Metainformation Attributes, you will likely need to revisit the
> code implementing Lists, which is my point.
Why? Changes to Metainformation Attributes don't change List features.
Obviously, for the sake of performance, you could hardcode everything inline,
but that is an implementation issue.

>  Btw, hyperlinking can be "easily" achieved in Gecko using currently
>> available technology (XBL1.0), without adding any feature to the browser.
> href-on-any-element is a feature and would require code changes, even in
> Gecko - never mind in user agents that don't implement XBL.
Href-on-any-element in Gecko could be implemented with code changes or with
XBL. This is, again, an implementation issue. (Btw, XBL2 is a W3C Candidate
Recommendation, so all UAs are asked to implement it, sooner or later.)

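For what it's worth, a minimal sketch of such a binding under XBL1 (Gecko's
implementation; the binding file name and the use of xlink:href here are
illustrative assumptions, not anything specified) could look like:

```xml
<!-- bindings.xml: an XBL1 binding that makes any bound element
     follow its xlink:href attribute when clicked -->
<bindings xmlns="http://www.mozilla.org/xbl">
  <binding id="hyperlink">
    <handlers>
      <handler event="click"><![CDATA[
        var href = this.getAttributeNS("http://www.w3.org/1999/xlink", "href");
        if (href) document.location = href;
      ]]></handler>
    </handlers>
  </binding>
</bindings>
```

attached from CSS with something like
*[xlink|href] { -moz-binding: url(bindings.xml#hyperlink); }
(after declaring the xlink namespace prefix) - no changes to the browser's
own code required.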
>>        Yes. More importantly, if UAs get markup they cannot handle,
>>        you're quite sure they'll process only features in supported
>>        modules and ignore the rest, without breaking it all.
>>    That's more the result of a process for handling unrecognized
>>    features than XHTML modularization:
>>    * insert unknown attributes into the DOM
>>    * insert unknown elements into the DOM
>>    * apply CSS to unknown elements
>>    * render content of unknown elements
>> No. DOM (and DOM depending models, like CSS rendering model or XPath
>> data model) is not created after XHTML processing, it is created after
>> Infoset Generation (that is, pure XML parsing). You don't need to know
>> about XHTML to successfully build a DOM (but you need to know about HTML
>> and its extended interfaces, quirks, etc.)
> Sorry, I can't really follow what you're saying here, or how it's a reply
> to what I said.
You said that XHTML needs to define how to handle unknown elements. I
replied that building the Infoset, and then the DOM, CSS rendering tree,
XPath data model, etc., is not part of XHTML processing. The XML 1.0 and XML
Information Set RECs define how to handle any element, with any attribute,
in any namespace, provided the document is well-formed.
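For example, a conforming XML parser builds a complete Infoset (and thus a
DOM) for this document without knowing anything about the vocabulary involved
(the namespace URI below is made up for illustration):

```xml
<?xml version="1.0"?>
<doc xmlns:foo="http://example.org/unknown-vocabulary">
  <foo:widget label="anything">
    <foo:gadget/>
  </foo:widget>
</doc>
```

The unknown elements and attributes simply appear in the tree with their
namespace and local names; no vocabulary-specific processing is involved.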

>     Lynx works fine with
>>    clientside image maps because HTML 4.01 provides a mechanism for
>>    providing text equivalents for your navigation. Text alternative
>>    interfaces can be provided for many of the potential uses of
>>    serverside image maps too. (Look at T. V. Raman's work with an
>>    Emacspeak interface to Google Maps, for example.)
>> Yes, but it doesn't work with image maps (and will never work).
> A text browser's lack of useful support for serverside image maps is just
> one of many reasons why depending on serverside image maps is a bad idea. Of
> course, browsers can support the feature as required by the HTML 4.01
> specification - submitting "0,0" if the user cannot select particular
> coordinates.
> As I explained, clientside image maps work fine.
> Typically, XHTML modularization fails to allow user agents to declare
> support for one and not the other:
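(To illustrate the quoted point about clientside maps: with HTML 4.01 markup
like the following - the URLs and coordinates are hypothetical - a text
browser such as Lynx can present the area elements' alt texts as an ordinary
list of links:)

```html
<img src="nav.png" usemap="#nav" alt="Site navigation">
<map name="nav">
  <area shape="rect" coords="0,0,80,30" href="/home" alt="Home">
  <area shape="rect" coords="80,0,160,30" href="/news" alt="News">
</map>
```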

>  Similarly printers will never work with scripting
> I'm not sure about that. It seems to me a printer could download a (X)HTML
> resource, apply included scripts to it, and print the result. (Granted,
> you'd need to decide _when_ to print - e.g. what to do with timer
> functions.)
> Rather amusingly, the actual attempt to define a profile of XHTML for
> printers (XHTML-Print) actually includes the Scripting module! Not because
> conforming agents were to execute scripts (on the contrary, they were
> ultimately required not to do so), but because the Scripting module includes
> the "noscript" element and conforming agents were required to show that
> alternative content instead:
> This is thus yet another example of how XHTML modules are insufficiently
> fine-grained to express critical differences between implementations. The
> only thing that clarified what the Scripting module's inclusion meant was
> additional specification text:
>  If I don't use Scripting, why should I care about the DOM / scripting
>> execution context (window object) / script elements?
> There are plenty of browsers that don't implement scripting and plenty of
> users that disable it, with the result that scripts on a page are not
> executed. Authors who want their content/functionality to work in those
> scenarios don't make their content/functionality rely on scripts. XHTML
> modularization doesn't help there (as we've seen above, it doesn't even let
> user agent developers declare they don't execute scripts!); progressive
> enhancement/unobtrusive scripting does.
I asked a different question: why should an author who doesn't rely on
scripts (or an implementation that cannot, for various reasons, implement
scripts) have to learn plenty of DOM interfaces and APIs?

> > If I don't use XBL, why should I care about XBL elements and PIs?
> What's the relevance of this to XHTML modularization? XBL is a different
> language.
It was just an example of added languages.

>     Well for one thing, an SGML compliant processor would have to
>>    interpret "<br />" in HTML 4.01 documents according to HTML 4.01's
>>    SGML declaration (that is, as equivalent to "<br>&gt;") - sprinkling
>>    the web corpus with misplaced "greater than" signs. Being
>>    incompatible with the Web in that way is not viable for software
>>    attempting to compete with rival browsers or search engines.
>> No: setting up SGML correctly, <br/ becomes a NET-enabling start tag,
>> and > its corresponding end tag.
> When the HTML4 SGML declaration is applied to a document validating as HTML
> 4.01 containing "<br />", "<br /" is the null-end start tag. Since the DTDs
> define BR as "EMPTY", the end tag /must/ be omitted. Therefore the
> subsequent > cannot be parsed as the element's end tag.
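(A concrete illustration of the quoted behaviour, assuming the HTML 4.01
SGML declaration with SHORTTAG YES:)

```html
<!-- the author writes: -->
<p>line one<br /></p>

<!-- "<br /" is taken as a null-end start tag for BR; since BR is declared
     EMPTY its end tag must be omitted, so the trailing ">" is left over
     as character data. The markup is therefore equivalent to: -->
<p>line one<br>&gt;</p>
```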
> See also:
> Now *perhaps* it would be legitimate to require HTML processors to use a
> different SGML declaration even with documents that validate to HTML 4.01
> DTDs and perhaps a different SGML declaration could define "<br />" as
> equivalent to "<br>"? At any rate, this doesn't seem much more of a
> departure than requiring HTML processors to use completely new processing
> rules. The full requirements of acceptably processing the tag soup web
> corpus are so tangled, however, that it is not obvious that _any_ SGML
> declaration could express them and it seems likely that even if it did, it
> would allow syntax to be validated even when it is non-conforming and broken
> - which is indeed one of the problems the original attempt to apply SGML to
> HTML ran into:
I'm not asking to get SGML back. I'm asking to separate syntax from
vocabulary, and possibly apply this new syntax to any W3C or externally
defined language based on XML, providing an appropriate way to switch
between languages (the old DTD)

>     Hmm. This doesn't really answer my question; how does putting these
>>    serialization definitions into separate documents allow additional
>>    "modularization and extensibility"? (I don't have any particular
>>    bias against the proliferation of technical documents - I just don't
>>    see any necessary connection between this and allowing
>>    modularization or allowing extensibility.)
>> Documents after REC cannot be modified other than errata-ed. If you need
>> a new language, you need a new spec. Better to define a new spec only
>> for that language, than for all languages implemented so far, isn't it?
> How do ten new RECs allow "modularization and extensibility" more than one
> new REC?
Because not all ten RECs must be released at the same time. Actually, the
most important stage is CR: implementations should wait till CR to add new
features, otherwise they risk having them changed at any time (or they block
changes because they have already implemented them). This means that a
feature that is dubious, or still under heavy discussion, cannot block the
other features.

>  Yes, the fact is that they needed HTML5, a completely new and huge
>> specification, to add a few new features (on the core vocabulary side):
>> video - audio - canvas - datalist - section - etc.
>> If HTML4 had been modularized, HTML5 would have used HTML4's table
>> module, text module (b-i-span-strong..), metainformation module
>> (html-head-body-meta) etc.
> Some of HTML5 defines processing for existing features that was undefined
> in previous specifications (e.g. munging "image" to "img").
> Some of HTML5 changes processing for existing features (e.g. the algorithm
> for associating table cells and table headers)
> So I cannot agree that if HTML4 had defined image and table features in
> formally separate "modules", HTML5 as it stands could have merely reused
> them.
Well, maybe the Image module or the Table module needed a new version, but
I'm sure there are features just copied from HTML4 / XHTML1 / DOM2HTML etc.

> --
> Benjamin Hawkes-Lewis

In addition, I've thought more about the XHTML2 vs HTML5 problem, and I see
two possible approaches:

- if you think that XHTML2 and HTML5 have the same use cases and target
audiences, you surely cannot stand two different and non-interoperable
specifications for the same thing. So you must choose one and put all
features in it.
Assuming that the choice were HTML5 (preferred by authors and implementers),
you would need to port XHTML2 features back to HTML5: i.e. you're stuck with
RDFa in HTML (and the XHTML Access Module, and XForms in HTML, etc.)
- if you instead think that XHTML2 is targeted at documents (hypertextual
collections of data) while HTML5 is targeted at web applications (binary
serializations of user interfaces), as its original name (Web
Applications 1.0) suggests, then you should purify both specs: remove what
is not strictly necessary for a document from XHTML2 and remove what is not
needed by a user interface from HTML5:
GMail doesn't need a List module (not in standard mode, at least), while a
cookbook doesn't need client-side databases.
What you would get is a very light version of XHTML that actually would mean
no XHTML processing at all (just some CSS and XBL). On the other hand, the
resulting purified HTML5 would look very much like an XHTML Web Apps module
(the one I proposed in my first mail).

But the current situation is not sustainable: the W3C cannot maintain two
WGs, producing two different and non-interoperable technologies, with
overlapping features, implementations, use cases, and target audiences.


Received on Sunday, 25 January 2009 17:17:38 UTC