Re: XHTML 2.0 User Agent Conformance from David Woolley on 2003-11-01 (www-html@w3.org from November 2003)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Sat, 1 Nov 2003 09:34:41 +0000 (GMT)
To: www-html@w3.org
Message-Id: <200311010934.hA19YfV05991@djwhome.demon.co.uk>
> What are the central goals behind the development of XHTML 2?  How is

I believe the key goal is the creation of a tool for the "semantic
web".  It has to eliminate presentational features and provide a
simple set of core functions (not usurp more sophisticated document
formats).  It has to describe the nature of its contents, not their
presentation.

> it intended to differ from conforming XML + CSS?  For example, is it
> the intention that XHTML 2 user agents will have default knowledge
> of its vocabulary so that they can provide default rendering in the
> absence of CSS?

You are making the fundamental mistake of assuming that the purpose
of user agents is to blindly render a document for a human to *view*.
Scooter and the Google equivalent are user agents but they do not
render.  With high quality documents, other user agents could
analyse the full text of web pages to extract information useful to
their particular users.  A lot of work has been done on intelligent
agents by organisations like BT's Martlesham Labs.
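
As a rough illustration of a user agent that never renders anything,
here is a minimal sketch that pulls a heading outline and a link list
out of a document.  The sample document, the function name
extract_outline, and the use of the XHTML 1.x namespace and element
names as a stand-in for XHTML 2 are all my assumptions, not anything
taken from the draft.

```python
import xml.etree.ElementTree as ET

# Assumption: XHTML 1.x namespace and element names used as a stand-in.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Valve maintenance</title></head>
  <body>
    <h1>Valve maintenance</h1>
    <p>Inspect the <em>seal</em> every six months.</p>
    <h2>Safety</h2>
    <p>See <a href="safety.html">the safety notes</a>.</p>
  </body>
</html>"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def extract_outline(xhtml_text):
    """Return (outline, links) without rendering anything.

    outline is a list of (level, heading text) pairs in document order;
    links is a list of (href, link text) pairs.
    """
    root = ET.fromstring(xhtml_text)
    prefix = "{" + XHTML_NS + "}"
    outline, links = [], []
    for el in root.iter():
        # Skip anything that is not an element in the XHTML namespace.
        if not isinstance(el.tag, str) or not el.tag.startswith(prefix):
            continue
        local = el.tag[len(prefix):]
        text = "".join(el.itertext()).strip()
        if local in HEADINGS:
            outline.append((int(local[1]), text))
        elif local == "a" and el.get("href"):
            links.append((el.get("href"), text))
    return outline, links


outline, links = extract_outline(SAMPLE)
```

The agent relies only on what the elements mean (heading, link), so
it works whether or not the author supplied any style sheet.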

Also, it just doesn't happen that authors provide CSS for all media.

Also, semantics free XML plus CSS only makes sense if the author
has total control of presentation, but having semantics rich XHTML
allows the reader to control the presentation to make all sources of
data consistent, easing their task.  It also allows a re-publisher
of syndicated content to impose their own house style on the material
whilst leaving editorial control with the originator.
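
A small sketch of that idea: the same semantically marked up fragment
rendered under two different "house styles", with the originator's
content untouched.  The style tables, the function names and the
sample fragment are hypothetical, chosen only to illustrate the point;
plain text output stands in for whatever medium the reader prefers.

```python
import xml.etree.ElementTree as ET

# Hypothetical syndicated fragment; the markup says what things are,
# not how they should look.
DOC = ("<body><h1>Pump failure</h1>"
       "<p>Check the <em>impeller</em> first.</p></body>")

# Two hypothetical house styles: element name -> text formatter.
HOUSE_A = {
    "h1": lambda t: t.upper(),
    "em": lambda t: "*" + t + "*",
}
HOUSE_B = {
    "h1": lambda t: "== " + t + " ==",
    "p": lambda t: "  " + t,
    "em": lambda t: "_" + t + "_",
}


def render(el, style):
    """Render one element and its children using the given style."""
    text = el.text or ""
    for child in el:
        text += render(child, style) + (child.tail or "")
    formatter = style.get(el.tag)
    return formatter(text) if formatter else text


def restyle(xhtml_text, style):
    """Render each top-level block on its own line."""
    root = ET.fromstring(xhtml_text)
    return "\n".join(render(child, style) for child in root)


styled_a = restyle(DOC, HOUSE_A)
styled_b = restyle(DOC, HOUSE_B)
```

Neither style table needs any cooperation from the author; with
semantics free XML the re-publisher would first have to guess what
each element was for.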

If you take a longer-term view (certainly science fiction, at least at
the moment), documents may not be rendered in any tangible way, but
injected straight into the recipient's brain.

There seem to be three main ways of using HTML:

1) As an advertising copy language (for which I think a page description
  language would better fit the designers' wants as they typically want
  total control) - such pages often have little information content;

2) As a language for writing thin client data entry and database
  applications (when sold as third party products, or used on the public
  internet, these often have a significant element of item (1));

3) As a language for describing knowledge that is sufficiently weakly
  structured that it is dominated by plain (or at least technical jargon)
  language, but, nonetheless, contains much real information.

I don't believe that XHTML 2.0 has any pretensions about fulfilling niche
number (1).  I haven't looked far enough to see if it is attempting to
address item (2).

Item (3) tends not to be strongly obvious on the public internet,
because real knowledge is valuable intellectual property[A], but there
are some organisations that are heavy users of such documentation.
Large engineering based companies (but not software systems houses, who
tend to be more in the business of selling people and fuzzy feelings
than knowledge) and pharmaceutical companies are examples.  ICI were a
very early example: decades pre-web, they had a leading free text
search engine, which they had developed to meet an in-house need.
Academics also need it.

Most of the companies that the public deal with are not knowledge based,
but, in computing terms, data based; their data is highly structured so
they are more likely to be interested in thin client type uses.  Their
applications don't fit well with a hypertext model.

It was, however, an early, if largely unfulfilled, promise that the web would
give the general public access to the world's knowledge wealth and the
ability to contribute to it.

Historically, the web was created for use (3).  However, within most
companies, it was not the research and engineering but the marketing
departments that became the big users.  Subsequently there has been
a move into area (2), but to reduce wealth loss rather than to create
wealth.  Netscape particularly addressed area (1) and Microsoft
have always addressed area (2).  Whilst most businesses that make their
money supporting the web are in these areas, I think that content
provision will become more important as these areas saturate.  If there
is a support industry for area (3), its skills are going to be in
librarianship, not graphics design or programming.

From a visual rendering point of view, I imagine, especially if you
have a CSS engine, supporting HTML use (3) will more or less come
free for a browser.

> Have such goals been publicly stated in a W3C document?

It's interesting to note who asks this sort of question.

[A] and where it is available, the ability to machine process it is often
considered an undesirable property and outlawed by the site's terms of
use because the business model is based on the site being read by humans,
who also see the advertising.
Received on Saturday, 1 November 2003 05:19:05 UTC