
RE: New editor's draft of the HTML/XML TF Report

From: Robert Leif <rleif@rleif.com>
Date: Sat, 1 Oct 2011 10:10:46 -0700
To: "'Noah Mendelsohn'" <nrm@arcanedomain.com>, "'Henri Sivonen'" <hsivonen@iki.fi>
Cc: <public-html-xml@w3.org>
Message-ID: <052f01cc805d$0fb7a830$2f26f890$@rleif.com>
Noah,

First, I have at least tried to indicate that it is necessary to use XSD1.1, which includes assertions and a means to have multiple namespaces that are oblivious to each other. The use of assertions should extend the capacity of a schema to check validity. I do not know if this will be sufficient, but I do believe that it would help. Since this assertion technology is used in mission-critical applications, there is hope that it will be extended and optimized.
I have two use cases: 
1) I want to use XML that can be validated against one or more schemas in a manner where I can also use the functionality of an HTML5 browser. The obvious approach is to make the XML and HTML oblivious to each other. This involves developing a means to tell the XML parser to ignore the HTML5 tags. One means to do this is to create a simple schema for HTML5 whose sole purpose is to permit XSD1.1 schemas to ignore the HTML tags. It also means that HTML5 has to live with prefixes and, in this case, must not check the data-types of elements based upon the schemas. This use of prefixes will probably happen anyway, because it is necessary for RDFa to function.
2) I want to use HTML5 forms and similar technology with XML data-types that are validated by my schemas. 
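
As a rough illustration of use case 1, and not a description of any existing tool, the sketch below parses a prefixed document and pulls out only the elements in a data namespace, skipping the XHTML wrapper around them. The namespace URI urn:example:cytometry, the element names, and the helper data_islands are all hypothetical; a real pipeline would hand each extracted element to a schema validator, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data namespace, used only for this illustration.
DATA_NS = "urn:example:cytometry"

# A prefixed document: XHTML markup wrapping schema-governed data elements.
DOC = """\
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:c="urn:example:cytometry">
  <h:body>
    <h:p>Sample <c:count>1200</c:count> cells</h:p>
    <c:instrument><c:name>FlowA</c:name></c:instrument>
  </h:body>
</h:html>
"""


def data_islands(root, data_ns):
    """Yield the outermost elements in data_ns, ignoring wrapper markup."""
    prefix = "{" + data_ns + "}"

    def walk(elem):
        for child in elem:
            if child.tag.startswith(prefix):
                # Found a data element: hand it over whole, do not descend.
                yield child
            else:
                # Wrapper (for example XHTML) element: look inside it.
                yield from walk(child)

    if root.tag.startswith(prefix):
        yield root
    else:
        yield from walk(root)


root = ET.fromstring(DOC)
# Strip the "{namespace}" part to show local names only.
islands = [e.tag.split("}")[1] for e in data_islands(root, DATA_NS)]
print(islands)
```

The point of the sketch is only that namespace prefixes give the data side a reliable way to find its own elements without knowing anything about the HTML around them.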

I had suggested that a true XSD schema for XHTML5 would be the ideal solution, since it would permit checking the validity of the data in a web page by the recipient. However, most of what I have written has suggested that there are other ways to accomplish this. I have also suggested that there is no one-size-fits-all solution and that a graded approach, such as minimal, transitional, and strict, would help. Since IDL does interface with strongly typed languages, such as Ada, there is no reason to believe that this would be difficult to do with XML schema. In fact, it may be feasible to automatically translate some of the IDL to XSD1.1.

The bottom line is that a means has to be found to permit HTML5 and XML that is based upon schemas to work together. Since the functionality of HTML5 is extremely powerful and comprehensive, a means must be found to permit an efficient interface with XML. My own interest is that this combined functionality would be an informatics technology game changer that would greatly facilitate the development of a seamless health informatics technology including the small part of it that is the field of Cytometry. 
Yours,
Bob Leif

-----Original Message-----
From: Noah Mendelsohn [mailto:nrm@arcanedomain.com] 
Sent: Friday, September 30, 2011 6:16 AM
To: Henri Sivonen
Cc: public-html-xml@w3.org; Robert Leif
Subject: Re: New editor's draft of the HTML/XML TF Report



On 9/29/2011 4:28 AM, Henri Sivonen wrote:
> On Wed, Sep 28, 2011 at 9:45 PM, Robert Leif<rleif@rleif.com>  wrote:
>> My proposals were for XHTML5, which I believe, as of yet, has neither 
>> been implemented nor described in any detail.
>
> Your belief is incorrect. The HTML5 spec describes XHTML5 and Firefox, 
> Opera, Chrome, Safari and IE9 implement XHTML5 to the extent they 
> implement the corresponding features on the non-X HTML side.
>

At the risk of causing more confusion, let me offer a theory as to why everyone is talking past each other here. Earlier, Robert Leif wrote:

> I believe that possibility of developing a working interface between 
> XML and HTML5 can be maximized by concentrating on XHTML5. The 
> simplest solution would be to encourage Microsoft to make its HTML5 
> schema fully implement HTML5 and make this schema(s) available to the 
> rest of the software community at no charge. Microsoft’s and other 
> schema and XML validation tools including browsers would then have to 
> accept interleave elements, such as the one in XSD1.1.

This and other writings by Robert suggest that he takes a very schema-centric view of what XML is, and of how schemas can be fundamental to improving the HTML/XML story. I don't think that premise is shared by most of the others in this discussion, and so confusion is resulting.

The first two sentences quoted above, taken together, imply that blessing as standard an HTML5 schema from Microsoft would somehow cause HTML and XML to interoperate in a way that they would not without the schema. I think that's true only in a very particular sense. Obviously, if you are using an XML stack that happens to use XSD or some other XML schema language (e.g. RelaxNG), then having a useful schema that captures some of the constraints required for valid HTML5 will help you get useful validations out, or might drive some tooling in an automated way, but that's about all it gets you. I've tried to word that quite carefully, in particular:

* Even many users who do want to use XML to process their HTML will not want to run schema validation, especially in production (as opposed to debugging). For those users, any schema will at most serve as documentation, and that documentation will almost surely be redundant with the normative documentation for HTML5 and XHTML5, both of which are contained in the draft specification for HTML5.

* Even for those who do wish to run XML Schema validation, no schema from Microsoft or anyone else can capture more than some of the constraints that you need for HTML validation. Yes, having that level of checking automated using XSD may be useful, but it can't fundamentally answer questions like: how compatible are the DOMs you get when you parse as XML vs. HTML? That question is fundamental to having scripts run compatibly. Indeed, much of the polyglot discussion in this thread is aimed at dealing with or avoiding such incompatibilities, and XSD has little to say about them.

By the way, these limitations on XML schemas are not confined to HTML validation. Almost any XML vocabulary will have levels of validity requirements that XSD can't capture. For example, a mathematician using XSD might wish to constrain the value of some attribute to be a prime integer. XSD can only check "integerness", and maybe something like {1,2,odd-number}. At a yet higher level, a schema for a purchase order can check that an element value resembles a credit-card number in format, but almost surely can't check that the card number is valid (e.g. has been issued and is not stolen).
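
The layering in the prime-integer example can be sketched in a few lines. This is only an illustration under stated assumptions: is_integer_lexical is a hypothetical stand-in for the kind of lexical check a schema's integer type performs (it is not an XSD processor), and is_prime and check_attribute are hypothetical application-level helpers.

```python
def is_integer_lexical(text):
    """Stand-in for a schema-level check: does the text look like an integer?"""
    s = text.strip()
    if s[:1] in ("+", "-"):
        s = s[1:]
    # ASCII digits only, and at least one of them.
    return s.isascii() and s.isdigit()


def is_prime(n):
    """Application-level check, using simple trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_attribute(text):
    """Run the structural check first, then the semantic one."""
    if not is_integer_lexical(text):
        return "not an integer"
    if not is_prime(int(text)):
        return "integer, but not prime"
    return "ok"


for value in ["abc", "9", "7"]:
    print(value, "->", check_attribute(value))
```

The first function is the part a datatype declaration can automate; the second is the kind of domain rule that here lives in code layered on top of validation.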

I'm a fan of XSD and RelaxNG for what they're good at. Once the community figures out some good way of writing HTML as XML, and once we figure out which users want to do that, then by all means we should write schemas that capture as many useful constraints as we can. That will allow tools such as XSD to automate some checking, more strongly type some fields, perhaps facilitate automatic binding into document management systems, etc.

What I don't think is that writing such schemas is a first step, or is in any way fundamental to answering the questions confronting this task force: who wants to write HTML that is processable as XML?; what are their requirements?; what XML/HTML format can we specify that would be convenient to use and would meet those needs well?; are there alternate modes (e.g. XML5) of processing XML-like documents that will result in DOMs that are more usefully compatible with what HTML users want and expect?; etc. Or, having for the moment mostly failed to find ways to do all of that, at least document what the requirements are and what good practice is given the specifications and tools as they exist. I think that last bit is where the report is today. A compromise, but a useful step.

Robert: I don't know if the above makes sense to you, but the net of it is that I think you haven't yet convinced others that a schema-centric approach is the right way to tackle these issues. My apologies if I've misunderstood your ideas.

Noah
Received on Saturday, 1 October 2011 17:11:16 GMT
