Re: HTML/XML TF report introductory text from Noah Mendelsohn on 2011-10-12 (public-html-xml@w3.org from October 2011)

From: Noah Mendelsohn <noah@arcanedomain.com>
Date: Tue, 11 Oct 2011 22:18:44 -0400
To: Norman Walsh <ndw@nwalsh.com>
CC: public-html-xml@w3.org
Message-ID: <4E94F904.3000607@arcanedomain.com>
On 9/27/2011 10:11 AM, Norman Walsh wrote:
> Noah,
>
> A few weeks ago, you set out to draft some new introductory text for
> our report. I realize that I've also redrafted that section slightly.
> Are you satisfied with the result, or are you still working on a
> proposal for additional changes?
>
>                                          Be seeing you,
>                                            norm

OK, here's a cut at it:

-----------------------------------

HTML and XML share a common ancestor in SGML. The precise details of that 
ancestry are not strictly important; its significant consequence is that 
HTML and XML have a quite similar surface syntax. Both use angle brackets 
and ampersands to distinguish "markup" characters from "content" 
characters. Both have elements which contain other content and elements 
which are empty.

This high level of surface similarity suggests, at least to some and at 
least at first, that there should be a high level of interoperability 
between HTML and XML systems. This notion is amplified by the fact that 
when XML arrived on the scene, well after HTML was widely deployed, efforts 
were made to recast HTML as an XML application rather than an SGML 
application. HTML was never broadly implemented as an "SGML application", 
but it was defined as one in the early HTML specifications.

However, if you look beyond those high-level generalities, the languages 
are quite different and serve quite different purposes. Where HTML is a 
single language, XML is a framework for defining languages. Where HTML 
defines how a tree is constructed from any input, XML only defines tree 
construction for a small subset of all possible inputs. Where HTML defines 
explicit extension points within a single vocabulary, XML encourages the 
use of multiple vocabularies defined in a distributed fashion. Where HTML 
is in a small, explicit set of namespaces, XML provides for an unbounded 
number of namespaces.
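
To make the tree-construction contrast concrete, here is a minimal Python
sketch using only the standard library. It is an illustration under
assumptions, not part of the Task Force's work: the helper names
(TagCollector, xml_accepts, html_start_tags) and the sample inputs are
hypothetical, and Python's html.parser is a lenient tokenizer rather than
an implementation of the HTML5 tree-construction algorithm. It only shows
that an XML parser rejects input that is not well-formed, while an
HTML-style parser accepts the same input and keeps going.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_accepts(text):
    """Return True if the XML parser can build a tree from text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Input that is not well-formed is a fatal error for XML.
        return False


def html_start_tags(text):
    """Return the start tags seen by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


# Assumed sample inputs: unclosed <p> elements versus a well-formed fragment.
SLOPPY = "<p>one<p>two"
WELL_FORMED = "<div><p>one</p><p>two</p></div>"

if __name__ == "__main__":
    print(xml_accepts(SLOPPY), html_start_tags(SLOPPY))
    print(xml_accepts(WELL_FORMED), html_start_tags(WELL_FORMED))
```

The same well-formed input is accepted by both parsers; only the input
with unclosed elements separates them, which is the "small subset of all
possible inputs" point made above.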

Nonetheless, there are a number of potential benefits that might result if 
XML and HTML were more compatible and interoperable. These include:

  * XML tools, including database and content-management systems, as well
    as the export/import capabilities provided in many programs such as
    spreadsheets, might be directly usable with HTML.

  * The same XML markup, e.g. for content management or for vector graphics
    (SVG), might be usable in HTML as well as other XML container
    documents. Such shared markup might be supported by common code and
    tooling, and copy/paste scenarios might be supported.

  * HTML fragments might more easily be copied for use in XML container
    documents. Syntax rules learned for use in one context would work in
    the other.

  * Some overlap might be eliminated from specifications, e.g. rules for
    embedding SVG into XML and HTML container documents might be specified
    just once, rather than in duplicate and slightly differently.

  * Etc., etc.

Against the backdrop of this tension, the TAG formed this Task Force in 
order to explore how interoperability between HTML and XML could be 
improved. The Task Force began by collecting use cases to focus its 
efforts. The original expectation was that the use cases would highlight 
those areas where changes to XML and/or HTML specifications could 
usefully improve interoperability between XML and HTML. However, the 
Task Force could not identify any such changes that would provide 
practical benefit and that would likely be widely deployed in practice. 
All of the use cases do appear to have at least plausible solutions using 
XML as deployed today and HTML5 as planned, and those solutions do not 
appear amenable to significant improvement. So, it appears that there is 
little that can usefully be done now beyond documenting these 
circumstances.

In the following section, we'll describe a set of use cases that the Task 
Force considered, and how the needs of those use cases can be met today. 
Readers are particularly encouraged to report additional use cases that 
they feel are not represented or specific examples where the solutions 
outlined are not appropriate.

A note about terminology: there are a great many ways to represent the 
"object model" of an HTML or XML document. There are specifications for 
both abstract and concrete representations. As a simplification, we use the 
term "DOM" (Document Object Model) throughout as a general term for any of 
these possible representations.

-----------------------------------


Noah
Received on Wednesday, 12 October 2011 02:19:14 UTC