Re: New editor's draft of the HTML/XML TF Report

On 9/29/2011 4:28 AM, Henri Sivonen wrote:
> On Wed, Sep 28, 2011 at 9:45 PM, Robert Leif <rleif@rleif.com> wrote:
>> My proposals were for XHTML5, which I believe, as of yet, has neither
>> been implemented nor described in any detail.
>
> Your belief is incorrect. The HTML5 spec describes XHTML5 and Firefox,
> Opera, Chrome, Safari and IE9 implement XHTML5 to the extent they
> implement the corresponding features on the non-X HTML side.
>

At the risk of causing more confusion, let me offer a theory as to why
everyone is talking past each other here. Earlier, Robert Leif wrote:

> I believe that possibility of developing a working interface between XML
> and HTML5 can be maximized by concentrating on XHTML5. The simplest
> solution would be to encourage Microsoft to make its HTML5 schema fully
> implement HTML5 and make this schema(s) available to the rest of the
> software community at no charge. Microsoft’s and other schema and XML
> validation tools including browsers would then have to accept interleave
> elements, such as the one in XSD1.1.

This and other writings by Robert suggest that he takes a very 
schema-centric view of what XML is, and of how schemas can be fundamental 
to improving the HTML/XML story. I don't think that premise is shared by 
most of the others in this discussion, and so confusion is resulting.

The first two sentences quoted above, taken together, imply that blessing 
as standard an HTML5 schema from Microsoft would somehow cause HTML and XML 
to interoperate in a way that they would not without the schema. I think
that's true only in a very particular sense. Obviously, if you are using an
XML stack that happens to use XSD or some other XML schema language
(e.g. RelaxNG), then having a useful schema that captures some of the 
constraints required for valid HTML5 will help you get useful validations 
out, or might drive some tooling in an automated way, but that's about all 
it gets you. I've tried to word that quite carefully. In particular:

* Even many users who do want to use XML to process their HTML will not 
want to run schema validation, especially in production (as opposed to 
debugging). For those users, any schema will at most serve as 
documentation, and that documentation will almost surely be redundant with 
the normative documentation for HTML5 and XHTML5, both of which are 
contained in the draft specification for HTML5.

* Even for those who do wish to run XML Schema validation, no schema from 
Microsoft or anyone else can capture more than some of the constraints that 
you need for HTML validation. Yes, having that level of checking automated 
using XSD may be useful, but it can't answer questions such as: how
compatible are the DOMs you get when you parse the same document as XML
vs. HTML? That question is fundamental to having scripts run compatibly.
Indeed, much of the polyglot discussion in this thread is aimed at dealing
with or avoiding such incompatibilities, and XSD has little to say about
them.
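
To make the second point concrete, here is a minimal sketch, not a
definitive comparison, of how the same bytes can yield different results
under XML and HTML processing. It assumes Python's standard-library
xml.etree.ElementTree and html.parser; the latter is only a tokenizer, not
an HTML5 tree builder, so it understates the real differences. The helper
names (TagCollector, xml_tag_names, html_tag_names, xml_parses) are made up
for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag names as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_tag_names(markup):
    # XML is case-sensitive: element names come back exactly as written.
    root = ET.fromstring(markup)
    return [el.tag for el in root.iter()]


def html_tag_names(markup):
    # The HTML tokenizer lowercases tag names before reporting them.
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_parses(markup):
    # True if the markup is well-formed XML, False otherwise.
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


MIXED_CASE = "<DIV><P>hello</P></DIV>"
VOID_ELEMENT = "<p>one<br>two</p>"

print(xml_tag_names(MIXED_CASE))    # names preserved as written
print(html_tag_names(MIXED_CASE))   # names lowercased
print(xml_parses(VOID_ELEMENT))     # unclosed <br> is not well-formed XML
print(html_tag_names(VOID_ELEMENT)) # the HTML tokenizer accepts it
```

A script that looks up elements by name would behave differently against
the two results, and no schema is involved in either path.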

By the way, these limitations on XML schemas are not confined to HTML 
validation. Almost any XML vocabulary will have levels of validity 
requirements that XSD can't capture. For example, a mathematician using XSD 
might wish to constrain the value of some attribute to be a prime integer. 
XSD can only check "integerness", and maybe something like 
{1,2,odd-number}. At a yet higher level, a schema for a purchase order can 
check that an element value resembles a credit-card number in format, but 
almost surely can't check that the card number is valid (e.g. has been 
issued and is not stolen).
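
The layering described above can be sketched as two separate checks. This
is an illustration only, not an XSD implementation: the regular expression
merely stands in for the kind of lexical constraint a schema might express,
and the function names (looks_like_integer, is_prime,
validate_prime_attribute) are hypothetical.

```python
import re

# Stand-in for a schema-level lexical check: optional sign, then digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def looks_like_integer(text):
    # The kind of "integerness" test a schema validator can automate.
    return INTEGER_LEXICAL.fullmatch(text) is not None


def is_prime(n):
    # Application-level rule, checked by trial division.
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def validate_prime_attribute(text):
    # First the schema-like layer, then the application layer.
    if not looks_like_integer(text):
        return "not an integer"
    if not is_prime(int(text)):
        return "integer, but not prime"
    return "ok"


for value in ["abc", "9", "7"]:
    print(value, "->", validate_prime_attribute(value))
```

The value "9" passes the first layer and fails only in the second, which
is the part that has to live in application code.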

I'm a fan of XSD and RelaxNG for what they're good at. Once the community 
figures out some good way of writing HTML as XML, and once we figure out 
which users want to do that, then by all means we should write schemas that 
capture as many useful constraints as we can. That will allow tools such as 
XSD to automate some checking, more strongly type some fields, perhaps 
facilitate automatic binding into document management systems, etc.

What I don't think is that writing such schemas is a first step, or is in 
any way fundamental to answering the questions confronting this task force: 
who wants to write HTML that is processable as XML?; what are their
requirements?; what XML/HTML format can we specify that would be convenient
to use and would meet those needs well?; are there alternate modes (e.g.
XML5) of processing XML-like documents that will result in DOMs that are
more usefully compatible with what HTML users want and expect?; etc. Or,
having for the moment mostly failed to find ways to do all of that, at
least document what the requirements are and what good practice is given
the specifications and tools as they exist. I think that last bit is
where the report is today. A compromise, but a useful step.

Robert: I don't know if the above makes sense to you, but the net of it is 
that I think you haven't yet convinced others that a schema-centric 
approach is the right way to tackle these issues. My apologies if I've 
misunderstood your ideas.

Noah

Received on Friday, 30 September 2011 13:16:30 UTC