Re: Proposed issue: site metadata hook from Chris Lilley on 2003-02-11 (www-tag@w3.org from February 2003)

From: Chris Lilley <chris@w3.org>
Date: Tue, 11 Feb 2003 23:32:44 +0100
To: tag@w3.org, Tim Berners-Lee <timbl@w3.org>
CC: www-tag@w3.org
Message-ID: <14463734593.20030211233244@w3.org>
On Monday, February 10, 2003, 5:01:43 PM, TimBL wrote:


TBL> The architecture of the web is that the space of identifiers
TBL> on an http web site is owned by the owner of the domain name.
TBL> The owner, "publisher",  is free to allocate identifiers
TBL> and define how they are served.

TBL> Any variation from this breaks the web.

Sorry, I am going to have to disagree right there. If those are your
axioms, then the analysis is flawed. (Portions of the analysis might
be correct, but if so, not for those reasons).

I agree that, when the Web was young, a one-to-one mapping between
'content providers' and 'server administrators' (and, at that point,
'users') was a realistic model. This soon changed.

The first change was the introduction of the ~username convention, to
allow for university sites with many staff (or students) all on one
server.

The second change was the broadening of Web access from academia to
the general public, using modems rather than direct Internet access.
This required the notion of the ISP and, for that proportion of users
who also became content providers, the notion of hosting.

The third change was the introduction of virtual hosting, which does
help but tends to apportion multiple domains to a single physical
server rather than multiple users to a domain.

The last significant change that I can think of was the rise of 'free'
(as in beer with advertising) web hosting.

In consequence, for at least the past five years, the vast majority of
individuals who post content on the Web do not own the server, do not
control in any way how it is set up, and can only control content in
their own directories.

This undeniable fact has to be taken into account in the Web
Architecture. An Architecture that attempts to pretend that all
content providers have control of the entire server up to the root and
the config files is severely broken.

With that in mind, here are some of the consequences, and the breakage,
that arise from failing to take this into account:

XML 1.0 wisely added an encoding declaration to the actual content, to
allow content creators to express the character set used for encoding
their XML. This was then unwisely overridden with the charset
parameter in the XML media type RFC, thus ensuring that most
individual users could not correctly serve XML documents that were not
encoded as ASCII. (TAG is already looking at some of this breakage.)
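
A minimal sketch of that precedence, assuming a simplified reading of RFC 3023 (the XML media types RFC current at the time) and ignoring byte-order-mark detection. The function names are illustrative only. It shows why an author who correctly declares ISO-8859-1 inside the document still loses when the server, which the author cannot configure, labels the file text/xml with no charset, or with the wrong one.

```python
import re
from typing import Optional

# Simplified: only recognizes an encoding pseudo-attribute in an XML
# declaration at the very start of the document (no BOM handling).
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the document's XML declaration, or None."""
    m = _ENCODING_DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None


def effective_encoding(media_type: str, charset: Optional[str], body: bytes) -> str:
    """Encoding a recipient should use, per a simplified reading of RFC 3023."""
    if charset:
        # The HTTP charset parameter is authoritative: it overrides
        # whatever the author wrote in the encoding declaration.
        return charset.lower()
    if media_type == "text/xml":
        # text/xml with no charset defaults to us-ascii; the in-document
        # declaration is ignored.
        return "us-ascii"
    # application/xml with no charset falls back to XML 1.0's own rules:
    # the encoding declaration, else UTF-8.
    return declared_encoding(body) or "utf-8"


# An author on a shared server writes Latin-1 and declares it correctly...
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'

# ...but cannot control the Content-Type header the server sends.
print(effective_encoding("text/xml", None, doc))         # us-ascii
print(effective_encoding("application/xml", None, doc))  # iso-8859-1
print(effective_encoding("text/xml", "UTF-8", doc))      # utf-8
```

Only the application/xml case honours what the author wrote; in the other two, the byte 0xE9 is invalid in the label the server chose, so a conforming recipient fails to decode the document.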

Individual users, unable to add new media types, tend to serve up less
common media with the wrong type. This results in attempts to fix this
up with hints (OK), overrides (bad), and so on. A general solution that does
not penalize individuals who do not have the financial or technical
resources to host and maintain their own web server should be found,
so that such content can be served correctly. (TAG is already looking
at consequences of widespread server misconfiguration.)

Robots.txt and the like are another example of ignoring this issue
(which, to be fair, was much less visible at the time the convention
was developed). If robots.txt says the whole site can be crawled, it
is not possible for an individual user to restrict access to some or
all of their part; likewise if robots.txt says the whole site cannot
be crawled, it is not possible for an individual user to allow access
to some or all of their part. The proposed solution should address
this. Similarly, a proposed solution that relies on server configuration
(adding new HTTP headers) is unsuitable unless there is a general way
for individual users to add such information.
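
To make the robots.txt limitation concrete, here is a small sketch using Python's standard urllib.robotparser module. The host www.example.org, the user directory ~alice and the agent name ExampleBot are hypothetical. It shows that a crawler resolves robots.txt only at the root of the host, so every page in every user's directory is governed by the single file the server owner controls.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    # Crawlers look for robots.txt only at the root of the host, so every
    # page on the server, including every ~user directory, maps to one file.
    return urljoin(page_url, "/robots.txt")


def allowed(robots_txt: str, page_url: str, agent: str = "ExampleBot") -> bool:
    # Parse the site-wide file and ask whether this agent may fetch the page.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# The server owner's file either permits crawling everything...
owner_allows_all = "User-agent: *\nDisallow:\n"
# ...or forbids crawling anything.
owner_blocks_all = "User-agent: *\nDisallow: /\n"

page = "http://www.example.org/~alice/drafts/notes.html"
print(robots_url(page))                 # http://www.example.org/robots.txt
print(allowed(owner_allows_all, page))  # True
print(allowed(owner_blocks_all, page))  # False
```

Whichever file the owner publishes decides the outcome for Alice's pages; a file she places at /~alice/robots.txt is never consulted by a conforming crawler.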

TBL> [... snip ...]
TBL> If, when these features were designed, there had been a
TBL> general way of attaching metadata to a web site, it would
TBL> not have been necessary.

Yes.

TBL> The TAG should address this issue and find a solution,
TBL> or put in place steps for a solution to be found,
TBL> which allows the metadata about a site, including that for
TBL> later applications, to be found with the minimum overhead
TBL> and no use of reserved URIs within the server space.

I agree that the TAG should do this.

The solution should also allow a distributed and scalable architecture
that allows multiple users to share one server, rather than the wildly
unrealistic and unfair constraint of the server owner controlling
everything.



-- 
 Chris                            mailto:chris@w3.org
Received on Tuesday, 11 February 2003 17:32:57 UTC