- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Tue, 24 Oct 2006 22:17:24 +0100
- To: www-tag@w3.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On its telephone conference earlier today, the TAG agreed to open a new issue, TagSoupIntegration-54. This message contains a first draft of the description of this issue for the issues list [1]. Comments and suggested changes are invited, as experience to date suggests that arriving at a satisfactory definition of exactly what is at issue here is tricky.

- --------

Is the indefinite persistence of 'tag soup' HTML* consistent with a sound architecture for the Web? If so (and the going-in assumption is that it _is_ so), what changes, if any, to fundamental Web technologies are necessary to integrate 'tag soup' with SGML-valid HTML and well-formed XML?

Heretofore, W3C official policy has been not only to encourage the 'withering away' of non-XML content on the Web, but to insist on it. A change in this policy, towards at least tolerance of, and perhaps even support for, SGML-valid HTML and even 'tag soup', has recently been advocated and taken seriously in various quarters. The TAG does not make policy, and discussing policy is off-topic for this list, but the TAG definitely _does_ consider architectural issues, and such a change would undoubtedly raise a number of questions for Web architecture.

The TAG is interested in exploring ways in which 'tag soup' HTML and SGML-valid HTML can be thoroughly integrated with the XML-orientated Web, enjoying as many of its benefits as possible. Among the topics to be explored in this connection are:

* Can we standardize a series of "as if" propositions for non-XML HTML:
   1) Treat it "as if" it had been processed by [some formalization of] 'tidy -asxhtml';
   2) Treat it "as if" it had a default namespace declaration determined by its media type;
   3) Treat it "as if" it were the serialization of the DOM produced by [some formalization of] common browser error-recovery strategies?
* Can we successfully apply to non-XML Web content the modularization and composition stories under development for well-formed XML documents that mix namespaces (e.g. SVG, MathML, RDF) with XHTML?
* In particular, can our rather more tentative understanding of what is meant by "self-describing documents" likewise be applied to non-XML HTML plus SVG. . .?
* Should "as if" number (2) above be extended (contra the recent TAG finding Authoritative Metadata [2]) to include some form of 'sniffing'?
* Can we leverage the common-sense understanding of the phrase "the HTML P element" as some kind of abstraction over language/version details, and exploit some of our developing understanding of versioning to manage the relationship between 'tag soup', SGML-valid HTML and well-formed XHTML?

*By 'tag soup' HTML is meant documents which are not well-formed XHTML, or even SGML-valid HTML, but which are nonetheless more or less successfully and consistently rendered by some HTML browsers. Estimates of the percentage of HTML-family Web pages currently being served which are neither well-formed XML nor SGML-valid HTML vary widely: a quick sample of reports gives 1.5%, 80%, 82%, 91%, 97.8%, 99% and 99.3% for different sample spaces and different times!

- ------

To the extent that it is appropriate to discuss on this _public_ list the relationship of the TAG's interests in this area to the ongoing discussion about the W3C's stewardship of HTML (see e.g. [3]), please do so only with reference to information which is likewise public. Having said that, I'd much prefer to see discussion about the architectural issue itself. . .

ht

[1] http://www.w3.org/2001/tag/issues.html
[2] http://www.w3.org/2001/tag/doc/mime-respect.html
[3] http://lists.w3.org/Archives/Public/www-forms/2006Aug/0153.html

- -- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFFPoLpkjnJixAXWBoRAoctAJkBQT1OPqErOs3dH4EJUl4ll3zFQACcDsDq
0ILyAAsg0HQVXXw/wNM5H2Q=
=L+yI
-----END PGP SIGNATURE-----
Received on Tuesday, 24 October 2006 21:17:39 UTC