Re: Fwd: HTML5 and XHTML2 combined (a new approach) from Benjamin Hawkes-Lewis on 2009-01-24 (www-html@w3.org from January 2009)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Sat, 24 Jan 2009 22:30:07 +0000
To: Giovanni Campagna <scampa.giovanni@gmail.com>
CC: www-html@w3.org
Message-ID: <497B966F.4070604@googlemail.com>

On 22/1/09 21:48, Giovanni Campagna wrote:
>         It helps the
>         implementors, because all modules have strict defined
>         dependencies, so
>         changes don't affect other modules.
>
>
>     I'm not an implementor. But the impression I have from talking to
>     implementors is that formal distinctions between modules by spec
>     writers make little practical difference to how difficult it is to
>     introduce changes such as adding hyperlinking capability to every
>     element.
>
> If I want to implement XHTML Lists, I don't need to implement XHTML
> Metainformation Attributes. This is modularization.

But we were discussing "changes" to existing implementations, not 
selective implementation. If you have an implementation including Lists 
and you _do_ implement Metainformation Attributes, you will likely need 
to revisit the code implementing Lists, which is my point.

> Btw, hyperlinking can be "easily" achieved in Gecko using currently
> available technology (XBL1.0), without adding any feature to the browser.

href-on-any-element is a feature and would require code changes, even in 
Gecko - never mind in user agents that don't implement XBL.

>         Yes. More important, if UA get markup they cannot handle, you're
>         quite
>         sure they'll process only features in supported modules and
>         ignore the
>         rest, without breaking it all.
>
>
>     That's more the result of a process for handling unrecognized
>     features than XHTML modularization:
>
>     * insert unknown attributes into the DOM
>     * insert unknown elements into the DOM
>     * apply CSS to unknown elements
>     * render content of unknown elements
>
> No. DOM (and DOM depending models, like CSS rendering model or XPath
> data model) is not created after XHTML processing, it is created after
> Infoset Generation (that is, pure XML parsing). You don't need to know
> about XHTML to successfully build a DOM (but you need to know about HTML
> and its extended interfaces, quirks, etc.)

Sorry, I can't really follow what you're saying here, or how it's a 
reply to what I said.

>     Lynx works fine with
>     clientside image maps because HTML 4.01 provides a mechanism for
>     providing text equivalents for your navigation. Text alternative
>     interfaces can be provided for many of the potential uses of
>     serverside image maps too. (Look at T. V. Raman's work with an
>     Emacspeak interface to Google Maps, for example.)
>
> Yes, but it doesn't work with image maps (and will never work).

A text browser's lack of useful support for serverside image maps is 
just one of many reasons why depending on serverside image maps is a bad 
idea. Of course, browsers can support the feature as required by the 
HTML 4.01 specification - submitting "0,0" if the user cannot select 
particular coordinates.
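
To make that fallback concrete, here is a minimal sketch in Python. It assumes the HTML 4.01 rule that the user agent appends "?" plus the comma-separated x and y coordinates to the link's href; the function name and the example URL are hypothetical.

```python
def ismap_request_uri(href, coords=None):
    """Build the URI requested for a serverside image map click.

    `coords` is an (x, y) tuple, or None when the user agent cannot
    let the user pick a point (for example, a text browser).
    """
    # Fall back to "0,0" when no coordinates can be selected.
    x, y = coords if coords is not None else (0, 0)
    return f"{href}?{x},{y}"


# A graphical browser reports the clicked point; a text browser cannot.
graphical = ismap_request_uri("http://example.org/cgi-bin/map", (120, 45))
textual = ismap_request_uri("http://example.org/cgi-bin/map")
```

The server receives a well-formed request either way, but a text browser user always lands on whatever the map does at "0,0", which is why the feature is of little use there.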

As I explained, clientside image maps work fine.

Typically, XHTML modularization fails to allow user agents to declare 
support for one and not the other:

http://www.w3.org/TR/xhtml2/mod-csImgMap.html#s_csImgMapmodule

> Similarly printers will never work with scripting

I'm not sure about that. It seems to me a printer could download a 
(X)HTML resource, apply included scripts to it, and print the result. 
(Granted, you'd need to decide _when_ to print - e.g. what to do with 
timer functions.)

Rather amusingly, the actual attempt to define a profile of XHTML for 
printers (XHTML-Print) includes the Scripting module! Not 
because conforming agents were to execute scripts (on the contrary, they 
were ultimately required not to do so), but because the Scripting module 
includes the "noscript" element and conforming agents were required to 
show that alternative content instead:

http://www.w3.org/TR/xhtml-print/#s3.12

This is thus yet another example of how XHTML modules are insufficiently 
fine-grained to express critical differences between implementations. 
The only thing that clarified what the Scripting module's inclusion 
meant was additional specification text:

http://www.w3.org/MarkUp/2006/xhtml-print-pr-doc.html#ssec1
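
The behaviour required of such an agent (do not execute scripts, show the "noscript" content) can be sketched roughly as below. This is only an illustration using Python's standard html.parser module, not anything taken from XHTML-Print itself; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NonScriptingTextExtractor(HTMLParser):
    """Collect the text a non-scripting agent would show."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script source is neither executed nor shown; everything else,
        # including the content of "noscript", is kept.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


def printable_text(markup):
    parser = NonScriptingTextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.chunks


sample = (
    "<p>Total:</p>"
    "<script>document.write(1 + 1)</script>"
    "<noscript><p>Scripts are not run.</p></noscript>"
)
result = printable_text(sample)
```

Note that nothing in the module name "Scripting" tells you this is the intended behaviour; the sketch encodes the extra specification text.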

> If I don't use Scripting, why should I care of Dom / Scripting execution
> context (window object) / script elements?

There are plenty of browsers that don't implement scripting and plenty 
of users that disable it, with the result that scripts on a page are not 
executed. Authors who want their content/functionality to work in those 
scenarios don't make their content/functionality rely on scripts. XHTML 
modularization doesn't help there (as we've seen above, it doesn't even 
let user agent developers declare they don't execute scripts!); 
progressive enhancement/unobtrusive scripting does.

> If I don't use XBL, why should I care of XBL elements and PIs?

What's the relevance of this to XHTML modularization? XBL is a different 
language.

>     Well for one thing, an SGML compliant processor would have to
>     interpret "<br />" in HTML 4.01 documents according to HTML 4.01's
>     SGML declaration (that is, as equivalent to "<br>&gt;") - sprinkling
>     the web corpus with misplaced "greater than" signs. Being
>     incompatible with the Web in that way is not viable for software
>     attempting to compete with rival browsers or search engines.
>
> No: setting up correctly SGML, <br/ becomes a NET-enabling start tag,
> and > its corresponding end tag.

When the HTML4 SGML declaration is applied to a document validating as 
HTML 4.01 containing "<br />", "<br /" is the null-end start tag. Since 
the DTDs define BR as "EMPTY", the end tag /must/ be omitted. Therefore 
the subsequent > cannot be parsed as the element's end tag.

See also:

http://www.cs.tut.fi/~jkorpela/html/empty.html

http://www.is-thought.co.uk/book/sgml-9.htm

http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.3.7
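
As a toy illustration of that reading (a regular-expression rewrite, not an SGML parser), the following Python sketch shows what is left over once "<br /" has been consumed as the start tag of an EMPTY element. The function name is hypothetical, the element list is a partial one, and attribute values containing "/" are deliberately not handled.

```python
import re

# Some elements the HTML 4.01 DTDs declare EMPTY (partial list).
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}

# Matches XML-style empty-element syntax such as <br /> or <hr/>.
# Attribute values containing "/" are out of scope for this toy.
_XML_STYLE_EMPTY = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>/]*?)\s*/>")


def sgml_reading(markup):
    """Rewrite markup to show how the source text would be read when
    "/" closes the start tag and the trailing ">" is character data."""

    def rewrite(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() not in EMPTY_ELEMENTS:
            return match.group(0)
        # An EMPTY element has no end tag, so ">" becomes literal text.
        return f"<{name}{attrs}>&gt;"

    return _XML_STYLE_EMPTY.sub(rewrite, markup)


reading = sgml_reading("first line<br />second line")
```

Run over a document full of "<br />", this leaves a visible "greater than" sign after every line break, which is the web-compatibility problem described above.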

Now *perhaps* it would be legitimate to require HTML processors to use a 
different SGML declaration even with documents that validate to HTML 
4.01 DTDs and perhaps a different SGML declaration could define "<br />" 
as equivalent to "<br>"? At any rate, this doesn't seem much more of a 
departure than requiring HTML processors to use completely new 
processing rules. The full requirements of acceptably processing the tag 
soup web corpus are so tangled, however, that it is not obvious that 
_any_ SGML declaration could express them and it seems likely that even 
if it did, it would allow syntax to be validated even when it is 
non-conforming and broken - which is indeed one of the problems the 
original attempt to apply SGML to HTML ran into:

http://www.w3.org/TR/REC-html40/sgml/intro.html#h-19.1

>     Hmm. This doesn't really answer my question; how does putting these
>     serialization definitions into separate documents allow additional
>     "modularization and extensibility"? (I don't have any particular
>     bias against the proliferation of technical documents - I just don't
>     see any necessary connection between this and allowing
>     modularization or allowing extensibility.)
>
> Documents after REC cannot be modified other than errata-ed. If you need
> a new language, you need a new spec. Better to define a new spec only
> for that language, than for all language implemented so far, isn't it?

How do ten new RECs allow more "modularization and extensibility" than 
one new REC?

> Yes, the fact is that they needed HTML5, a completely new and huge
> specification, to add a few new features (on the core vocabulary side):
> video - audio - canvas - datalist - section - etc.
> If HTML4 had been modularized, HTML5 would have used HTML4's table
> module, text module (b-i-span-strong..), metainformation module
> (html-head-body-meta) etc.

Some of HTML5 defines processing for existing features that was 
undefined in previous specifications (e.g. munging "image" to "img").

Some of HTML5 changes processing for existing features (e.g. the 
algorithm for associating table cells and table headers).

So I cannot agree that if HTML4 had defined image and table features in 
formally separate "modules", HTML5 as it stands could have merely reused 
them.
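
The "image" case gives a flavour of the kind of processing rule involved. A minimal sketch in Python, assuming only the HTML5 parsing rule that a start tag named "image" is treated as "img"; the function name is hypothetical and this is nowhere near a full tree builder.

```python
def normalize_start_tag_name(name):
    """Return the tag name a parser following this rule would use."""
    lowered = name.lower()
    # Legacy content uses <image>; it is processed as <img>.
    return "img" if lowered == "image" else lowered


names = [normalize_start_tag_name(n) for n in ("IMAGE", "img", "table")]
```

A rule like this belongs to parsing rather than to an "image module" vocabulary definition, so a modularized HTML4 would not have supplied it.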

--
Benjamin Hawkes-Lewis
Received on Saturday, 24 January 2009 22:30:54 UTC