- From: Jim Davis <jdavis@parc.xerox.com>
- Date: Tue, 7 Jul 1998 14:46:45 PDT
- To: www-webdav-dasl@w3.org
Most of the discussion about scope has assumed that a scope is the URL of a collection, or at least that it is the URL of a WebDAV resource. I want to examine this assumption and suggest broadening it. Although I expect scopes to be URLs of collections in the typical case, I claim it also makes sense for them to be general URIs in some other cases.

Consider a Web crawler such as AltaVista or Lycos. Such crawlers have metadata from millions of Web resources, none of which reside on the crawler. (The metadata is on the crawler, but not the resources themselves.) Now suppose you wanted to implement DASL on such a crawler, and you wanted the ability to limit a search to a certain subset of the full Web, e.g. to search only documents from the US Department of Justice.

One answer might be: don't use scope to do this. Instead, the scope is the whole crawler, and the query should use a pattern match on a property that holds the URL of the resource, e.g.

    <where>
      <and>
        <like><prop><theurl/></prop> <pattern>//*.doj.gov</pattern></like>
        <eq><prop><author/></prop> <literal>Sculley</literal></eq>
      </and>
    </where>

But I think it's also reasonable to want to use scope to do this. Why? One reason is that the crawler might have different access to different scopes, depending on institutional relations between the crawler site and the remote site, and hence have different or better metadata for some scopes. Another reason is that, at least to me, it just seems natural.

So what would such scopes look like? The notion you want to capture is a pattern on the domain names of hosts, e.g. *.doj.gov, so you might express this as x-scope://*.doj.gov. Even if you expressed it as http://doj.gov (to make it *look like* a URL), there's no reason to assume that there's a web server at http://doj.gov running WebDAV whose root collection contains all documents on *any* machine under doj.gov.

So I propose:

1) A scope is named by a URI. A scope consists of a set of Web resources.
2) If the scope name is the URI of a WebDAV collection, then every resource in that collection (depending on the value of depth) is in the scope.
3) If the scope name is the URI of a different kind of Web resource, the scope is just that resource.
4) Otherwise, the set of resources is defined by the server.

We don't need to provide (in DASL 1.0) a means to discover the scopes supported by an arbiter. Crawlers will find some other way to express this. (Besides, given the current business model of crawlers, which is based on selling eyeballs to advertisers, it's not clear any commercial crawler will support DASL.)

See also the next message, "broaden 'scope' to be any kind of WebDAV resource".
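
To make the four proposed rules a little more concrete, here is a rough sketch in Python of how an arbiter might turn a scope name into a set of resources. Everything in it is an assumption for illustration: the `Arbiter` class, its in-memory tables, the treatment of depth ("0", "1", "infinity", with the collection itself counted as part of its own scope), and the reading of `x-scope://` as a shell-style pattern on host names. None of this is part of DASL.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

PREFIX = "x-scope://"  # hypothetical scheme for server-defined scopes


@dataclass
class Arbiter:
    """Hypothetical in-memory stand-in for a DASL search arbiter."""
    # Collection URI -> URIs of its direct members.
    collections: dict = field(default_factory=dict)
    # URIs of single, non-collection resources known to the arbiter.
    resources: set = field(default_factory=set)
    # URLs a crawler has indexed; consulted only for server-defined scopes.
    indexed: set = field(default_factory=set)

    def resolve_scope(self, scope_uri, depth="infinity"):
        """Return the set of resource URIs named by scope_uri (rule 1)."""
        # Rule 2: a collection URI; members are included according to depth.
        if scope_uri in self.collections:
            return self._walk(scope_uri, depth)
        # Rule 3: some other kind of resource; the scope is just that resource.
        if scope_uri in self.resources:
            return {scope_uri}
        # Rule 4: anything else is server-defined. This sketch assumes a
        # pattern on host names, matched against the indexed URLs.
        if scope_uri.startswith(PREFIX):
            pattern = scope_uri[len(PREFIX):]
            return {
                url for url in self.indexed
                if fnmatchcase(urlsplit(url).hostname or "", pattern)
            }
        raise ValueError(f"unsupported scope: {scope_uri}")

    def _walk(self, uri, depth):
        # Assumes the collection hierarchy contains no cycles.
        found = {uri}
        if depth == "0":
            return found
        for member in self.collections[uri]:
            if depth == "infinity" and member in self.collections:
                found |= self._walk(member, depth)
            else:
                found.add(member)
        return found
```

With this sketch, `Arbiter(indexed={"http://www.doj.gov/a"}).resolve_scope("x-scope://*.doj.gov")` would pick out the indexed documents whose host falls under doj.gov, without requiring any WebDAV server at doj.gov to exist.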
Received on Tuesday, 7 July 1998 17:50:02 UTC