- From: Chris Lilley <chris@w3.org>
- Date: Tue, 11 Feb 2003 23:32:44 +0100
- To: tag@w3.org, Tim Berners-Lee <timbl@w3.org>
- CC: www-tag@w3.org
On Monday, February 10, 2003, 5:01:43 PM, TimBL wrote:

TBL> The architecture of the web is that the space of identifiers
TBL> on an http web site is owned by the owner of the domain name.
TBL> The owner, "publisher", is free to allocate identifiers
TBL> and define how they are served.
TBL> Any variation from this breaks the web.

Sorry, I am going to have to disagree right there. If those are your axioms, then the analysis is flawed. (Portions of the analysis might be correct, but if so, not for those reasons.)

When the Web was young, I agree that a one-to-one mapping between 'content providers' and 'server administrators' (and, at that point, 'users') was a realistic model. This soon changed.

The first change was the introduction of the ~username convention, to allow for university sites with many staff (or students) all on one server.

The second change was the broadening of Web access from academia to the general public, using modems rather than direct Internet access. This required the notion of the ISP and, for that proportion of users who also became content providers, the notion of hosting.

The third change was the introduction of virtual hosting, which does help, but tends to apportion multiple domains to a single physical server rather than multiple users to a domain.

The last significant change that I can think of was the rise of 'free' (as in beer, with advertising) web hosting.

In consequence, for at least the past five years, the vast majority of individuals who post content on the Web do not own the server, do not control in any way how it is set up, and can only control content in their own directories. This undeniable fact has to be taken into account in the Web Architecture. An architecture that pretends that all content providers have control of the entire server, up to the root and the config files, is severely broken.
That being said, some consequences and breakage arising from failing to take this into account:

XML 1.0 wisely added an encoding declaration to the actual content, to allow content creators to express the character encoding used for their XML. This was then unwisely overridden by the charset parameter in the XML media type RFC, thus ensuring that most individual users could not correctly serve XML documents that were not encoded as ASCII. (The TAG is already looking at some of this breakage.)

Individual users, unable to add new media types, tend to serve up less common media with the wrong type. This results in attempts to fix things up with hints (acceptable), overrides (bad), and so on. A general solution should be found that does not penalize individuals who lack the financial or technical resources to host and maintain their own web server, so that such content can be served correctly. (The TAG is already looking at the consequences of widespread server misconfiguration.)

Robots.txt and the like are another example of ignoring this issue (which, to be fair, was much less visible at the time the convention was developed). If robots.txt says the whole site can be crawled, an individual user cannot restrict access to some or all of their part; likewise, if robots.txt says the whole site cannot be crawled, an individual user cannot allow access to some or all of their part. The proposed solution should address this. Likewise, a proposed solution that relies on server configuration (such as adding new HTTP headers) is unsuitable unless there is a general way for individual users to add such information.

TBL> [... snip ...]
TBL> If, when these features were designed, there had been a
TBL> general way of attaching metadata to a web site, it would
TBL> not have been necessary.

Yes.
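The charset breakage above can be shown concretely. The sketch below (a made-up example document, using Python's standard-library XML parser as a stand-in for any conforming processor) shows that a parser honouring the in-document encoding declaration reads the document fine, while a receiver that instead obeys the media type's charset — which the XML media type RFC makes authoritative, with us-ascii as the default for text/xml — fails on the very same bytes:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the XML declaration says ISO-8859-1 and the
# body contains a non-ASCII character (e-acute).
xml_bytes = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
             '<p>caf\u00e9</p>').encode("iso-8859-1")

# A conforming XML processor honours the in-document declaration:
root = ET.fromstring(xml_bytes)
assert root.text == "caf\u00e9"

# But per the XML media type RFC, the charset on the HTTP Content-Type
# (or its us-ascii default for text/xml with no charset) overrides that
# declaration.  A receiver obeying the override fails on the same bytes:
try:
    xml_bytes.decode("us-ascii")
    served_correctly = True
except UnicodeDecodeError:
    served_correctly = False

print(served_correctly)  # False -- broken unless the user can
                         # configure the server's charset, which
                         # most hosted users cannot
```

The point is not the parser but the division of control: the encoding declaration is in the part of the system the content author controls, while the charset parameter lives in server configuration that most hosted users cannot touch.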
TBL> The TAG should address this issue and find a solution,
TBL> or put in place steps for a solution to be found,
TBL> which allows the metadata about a site, including that for
TBL> later applications, to be found with the minimum overhead
TBL> and no use of reserved URIs within the server space.

I agree that the TAG should do this. The solution should also allow a distributed and scalable architecture in which multiple users share one server, rather than the wildly unrealistic and unfair constraint of the server owner controlling everything.

-- 
Chris  mailto:chris@w3.org
Received on Tuesday, 11 February 2003 17:32:57 UTC