Draft description of new TAG issue TagSoupIntegration-54

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At its telephone conference earlier today, the TAG agreed to open a
new issue, TagSoupIntegration-54.  This message contains a first draft
of the description of this issue for the issues list [1]. Comments and
suggested changes are invited, as experience to date suggests that
getting a satisfactory definition of exactly what's at issue here is
tricky.

- --------

Is the indefinite persistence of 'tag soup' HTML* consistent with a
sound architecture for the Web?  If so (and the going-in assumption
is that it _is_ so), what changes, if any, to fundamental Web
technologies are necessary to integrate 'tag soup' with SGML-valid
HTML and well-formed XML?

Heretofore W3C official policy has been not only to encourage the
'withering away' of non-XML content on the Web, but to insist on it.
The possibility of a change in this policy, towards one of at least
tolerance of, and perhaps even support for, SGML-valid HTML and even
'tag soup', has recently been advocated and taken seriously in various
quarters.  The TAG does not make policy, and it is off-topic for this
list to discuss policy issues, but the TAG definitely _does_ consider
architectural issues, and such a change would undoubtedly raise a number
of questions for Web architecture.

The TAG is interested in exploring ways in which 'tag soup' HTML and
SGML-valid HTML can be thoroughly integrated with the XML-orientated
Web, enjoying its many benefits as much as is possible.  Among the
topics to be explored in this connection are:

 * Can we standardize a series of "as if" propositions for non-XML HTML:
   1) Treat it "as if" it had been processed by [some formalization
      of] 'tidy -asxhtml';
   2) Treat it "as if" it had a default namespace declaration
      determined by its media type;
   3) Treat it "as if" it were the serialization of the DOM produced by
      [some formalization of] common browser error recovery
      strategies?

 * Can we successfully apply to non-XML web content the modularization
   and composition stories under development for well-formed XML
   documents which mix namespaces (e.g. SVG, MathML, RDF) with XHTML?

 * In particular, can our rather more tentative understanding of what
   is meant by "self-describing documents" likewise be applied to
   non-XML plus SVG. . .?

 * Should "as if" number (2) above be extended (contra recent TAG
   finding Authoritative Metadata [2]) to include some form of
   'sniffing'?

 * Can we leverage the common-sense understanding of the phrase "the
   HTML P element" as some kind of abstraction over language/version
   details and exploit some of our developing understanding of
   versioning to manage the relationship between 'tag soup',
   SGML-valid HTML and well-formed XHTML?
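
To make the "as if" propositions above a little more concrete, here is
a minimal Python sketch of numbers (2) and (3) taken together.  It is
not tidy and not any browser's actual recovery algorithm.  The recovery
rules (a new <p> closes an open one, unclosed elements are closed at an
enclosing end tag or at end of input, stray end tags are dropped), the
VOID set, the DEFAULT_NS_BY_MEDIA_TYPE table and the names SoupToXml
and as_if_xhtml are all assumptions made for illustration, and the
input is assumed to have a single root element.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: a media type selects a default namespace ("as if" 2).
DEFAULT_NS_BY_MEDIA_TYPE = {"text/html": "http://www.w3.org/1999/xhtml"}
# Assumption: a small, incomplete set of elements that never have content.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToXml(HTMLParser):
    """Toy error recovery that emits well-formed XML ("as if" 3)."""

    def __init__(self, namespace):
        super().__init__(convert_charrefs=True)
        self.namespace = namespace
        self.out = []        # pieces of the XML serialization
        self.stack = []      # currently open elements
        self.root_seen = False

    def _close_through(self, tag):
        # Close open elements, innermost first, up to and including `tag`.
        while self.stack:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "p" in self.stack:
            self._close_through("p")  # a new <p> implicitly ends the open one
        unique = {}
        for name, value in attrs:     # keep the first of any duplicate attribute
            unique.setdefault(name, name if value is None else value)
        if not self.root_seen:        # declare the default namespace on the root
            unique["xmlns"] = self.namespace
            self.root_seen = True
        text = "".join(f" {n}={quoteattr(v)}" for n, v in unique.items())
        if tag in VOID:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:         # stray or void end tags are dropped
            self._close_through(tag)

    def handle_data(self, data):
        self.out.append(escape(data))


def as_if_xhtml(soup, media_type="text/html"):
    """Return a well-formed XML serialization of a tag-soup string."""
    parser = SoupToXml(DEFAULT_NS_BY_MEDIA_TYPE[media_type])
    parser.feed(soup)
    parser.close()                    # flush any buffered text
    while parser.stack:               # close whatever is still open
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

Given '<html><body><p>one<p>two <b>bold<br>text</body>', as_if_xhtml
returns a string in which both paragraphs and the b element are closed,
br is an empty element, and the root carries the XHTML default
namespace.  Any real formalization would need far more recovery rules
than this.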

*'Tag soup' HTML here means documents which are not well-formed
XHTML, or even SGML-valid HTML, but which are nonetheless
more or less successfully and consistently rendered by some HTML
browsers.  Estimates of the percentage of HTML-family web pages
currently being served which are neither well-formed XML nor
SGML-valid HTML vary widely: a quick sample of reports gives 1.5%,
80%, 82%, 91%, 97.8%, 99% and 99.3% for different sample spaces and
different times!
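
As a rough illustration of how one half of such an estimate might be
computed, here is a minimal Python sketch that checks a sample of
documents for XML well-formedness with the standard-library parser.
The function names are hypothetical.  It does not test SGML validity,
so it counts SGML-valid HTML as a failure too, and it will reject
XHTML that relies on entities declared only in an external DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(document):
    """True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def percent_not_well_formed(documents):
    """Percentage of the sample that fails the well-formedness check."""
    documents = list(documents)
    if not documents:
        return 0.0
    failures = sum(not is_well_formed_xml(d) for d in documents)
    return 100.0 * failures / len(documents)
```

The result depends entirely on how the sample is drawn, which is one
likely reason the reported figures differ so much.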

- ------
Please note: in so far as it's appropriate to discuss on this
_public_ list the relationship of the TAG's interests in this area to
the ongoing discussion about the W3C's stewardship of HTML (see e.g.
[3]), do so only with reference to information which is
likewise public.  Having said that, I'd much prefer to see discussion
about the architectural issue itself. . .

ht

[1] http://www.w3.org/2001/tag/issues.html
[2] http://www.w3.org/2001/tag/doc/mime-respect.html
[3] http://lists.w3.org/Archives/Public/www-forms/2006Aug/0153.html
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFFPoLpkjnJixAXWBoRAoctAJkBQT1OPqErOs3dH4EJUl4ll3zFQACcDsDq
0ILyAAsg0HQVXXw/wNM5H2Q=
=L+yI
-----END PGP SIGNATURE-----

Received on Tuesday, 24 October 2006 21:17:39 UTC