
Re: Revised HTML/XML Task Force Report

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 17 Jul 2011 23:53:46 +0200
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <egi627tbtgbn99lqslf8dluaiecvklr01r@hive.bjoern.hoehrmann.de>
* Larry Masinter wrote:
>But the polyglot specification referenced above does not address the questions I asked:
>
>* How common are HTML documents that cannot be reasonably represented in XML?

That's not a very meaningful question. Suppose there were only two web
pages, one visited by one person once a year, the other visited by one
billion people every day. Or suppose there were a million pages, one
handcrafted and 999 999 automatically derived from a template. If you
associate some property with one group but not the other, how would you
determine how "common" documents with that property are, and how would
knowing that be useful?

If you made a proxy server that transforms text/html responses into
application/xhtml+xml, using an "HTML5 parser" and an "XHTML5
serializer", before they hit your fully "HTML5 compliant" client, then
you would probably encounter web sites that do not work properly on a
daily if not hourly basis (read "fully compliant" as meeting all must-
and should-level requirements and then optimizing for compatibility
with this setup).
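As a minimal, stdlib-only Python illustration of why the round-trip
through an HTML5 parser is needed at all (the markup string is a made-up
example of mine, not anything from the discussion): an XML parser
rejects the kind of tag soup that an HTML5 parser is required to accept.

```python
import xml.etree.ElementTree as ET

# Typical text/html markup: a void <br> with no closing slash, and a
# named entity (&eacute;) that XML does not predefine.
html = '<p>caf&eacute;<br>one &lt; two</p>'

try:
    ET.fromstring(html)
    parses_as_xml = True
except ET.ParseError:
    parses_as_xml = False

print(parses_as_xml)  # False: an HTML5 parser accepts this, an XML parser does not
```

Serving such content as application/xhtml+xml without first re-parsing
and re-serializing it would simply yield a parse error in the client.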

One issue is that, last I heard, document.write is not supposed to work
with application/xhtml+xml under the "HTML5" requirements (the reasoning
is likely that it could be made to work for some content, but it's ugly
and needs a lot of work with little utility). There are other issues.

>* What kinds of practical difficulties would arise for those documents
>(and how serious are those difficulties)?

Any number of "doesn't work properly" issues would arise. It should be
easy to implement the setup I mentioned above and find out, if you can
get a couple of volunteers to do their daily browsing through such a
proxy.
It's not feasible to answer this in an automated manner with any kind
of numeric clarity (halting problem, too many possibilities to interact
with web sites, hard to automate, plus the problem I mentioned at the
beginning of this message).

My impression is that you would like to know whether there are things
left that could be done to make "HTML" and "XHTML" more similar, or you
would like to have some evidence that most of what could reasonably be
done has been done (you might also like to understand all the individual
points; I don't think you would get that without a lot of research on
your own part, as there is a lot of bias surrounding these issues).

I do not think there is much in that regard where you could make a
convincing argument for change, so in that sense we are "fine" to some
extent. We would be better off if "we" had fixed issues surrounding
XHTML when they were raised, when there was interest in and inertia
around it. As it is, browsers and other applications that do not
support XHTML are dying. If you have an interest in XHTML, then make
sure they are actually gone soon. We can then re-discover XHTML at some
point and fix issues as people with an actual interest in using it
encounter them. As it is, I've not seen so much as an attempt at
running new implementations through any kind of test suite to check for
common bugs or whatever...
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Sunday, 17 July 2011 21:54:03 GMT
