- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Mon, 26 Jan 2004 10:41:01 +0200
- To: "ext Sandro Hawke" <sandro@w3.org>
- Cc: "ext Hammond, Tony (ELSLON)" <T.Hammond@elsevier.com>, "Thomas B. Passin" <tpassin@comcast.net>, ext Jeremy Carroll <jjc@hplb.hpl.hp.com>, "Phil Dawes" <pdawes@users.sourceforge.net>, www-rdf-interest@w3.org
On Jan 24, 2004, at 22:06, ext Sandro Hawke wrote:

> >>> info:lccn/n78890351
> vs
> >>> http://info-uri.info/lccn/n78890351
>>
>> But if these are non-dereferenceable URIs, how do you stop every RDF
>> web-crawler, information gatherer and clueless agent on the planet
>> from attempting to HTTP-GET/MGET the billions of URIs in the
>> namespace?
>> Unless I'm missing something, as the number of these scales up, so too
>> does the amount of resources used in tackling 404'd requests.
>
> I'll be surprised if that turns out to be the big inefficiency of the
> Semantic Web. Retrieving megabytes of data that turns out not to be
> what you want -- that's much worse. Especially if you have to compute
> for a long time to know if it's useless stuff. So I expect metadata
> to be very valuable, saying which URIs are useful for what. Kind of
> like the stuff search engines already store about each URI.

Exactly.

A lot of thought about the scalability and efficiency of SW agents went
into the design of URIQA (after all, we make mobile phones, which don't
have gobs of memory or processing power to crunch data). The benefit of
being able to obtain a concise description of a particular resource
(such as a vocabulary term) is that a given SW agent, at a given time,
may only need information about a handful of resources -- so it makes
no sense to force the agent to download, process, and store (even
temporarily) entire schemas or knowledge bases defining whole
vocabularies (which can be *huge*).

Yes, there will be some core vocabularies of which the agent may have
full knowledge to start with, but as the SW grows and more and more
folks define vocabularies, SW agents will frequently encounter terms
they do not recognize and will need to obtain, at run time, the
information they need to understand those particular terms. And if (by
keeping track of newly encountered terms) the agent determines that
particular definitions, or even entire schemas, should be cached for
improved future processing, fine.

The key, though, is that when an agent encounters a URI, it is able to
find out what it means with *nothing more* than that URI and some
standardized protocol(s). The description of the term can then lead to
identification of the vocabulary to which it belongs, the entire
schema(s) or model(s) where it is officially defined, etc. URIQA is a
starting point for resource discovery, which may lead to
syndication/manipulation of entire schemas/models, but does not have to.

And if the URIs denoting resources are not dereferenceable, then they
offer far less utility to the software agents which can potentially
make our lives easier.
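As a rough illustration only (a sketch, assuming the authority behind the
URI answers the URIQA MGET method with a concise RDF description; the
example URI is purely hypothetical), the whole interaction an agent needs
boils down to something like:

    # Sketch: learn what a URI denotes from nothing more than the URI,
    # assuming the server behind it speaks the URIQA MGET method.
    import http.client
    from urllib.parse import urlparse

    def mget_description(uri):
        """Request a concise description of the resource denoted by `uri`."""
        parts = urlparse(uri)
        conn = http.client.HTTPConnection(parts.netloc)
        # MGET is not a standard HTTP method; http.client will send
        # arbitrary method names, so no special library support is needed.
        conn.request("MGET", parts.path or "/")
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        conn.close()
        if response.status != 200:
            raise RuntimeError("MGET failed: %d %s"
                               % (response.status, response.reason))
        return body  # e.g. RDF/XML describing the resource

    # Hypothetical usage -- the agent knows only the term's URI:
    # print(mget_description("http://example.org/vocab/someTerm"))

The point is not the particular code, but that the agent needs nothing
beyond the URI itself and one standardized request to begin learning what
that URI means.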
>> The only solution I can think of is to invent a dud subdomain (that
>> doesn't exist) and let the DNS infrastructure deal with the 'doesn't
>> exist' load (which it's much better placed to do).
>>
>> But then if you are going to do that, why not just invent a
>> non-dereferenceable URI scheme... Doh!
>
> Because it lets you change your mind later. And if each organization
> (lccn, not info-uri.info) provides their own domain, it can be changed
> on a per-organization basis.
>
> Mostly I think it'll turn out to be useful and cost-effective to make
> all these URIs dereferenceable. When people really understand how this
> all works, they'll realize it's often dumb to make a non-dereferenceable
> identifier. If I'm going to go to the trouble to create and publish an
> identifier for something, I want to leverage my owning that identifier
> by getting into the dereference loop among folks who choose to
> dereference.

Precisely.

Cheers,

Patrick

> -- sandro

--
Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com