Re: URI: Name or Network Location?

On Jan 24, 2004, at 22:06, ext Sandro Hawke wrote:

>
>>>     info:lccn/n78890351
> vs
>>>     http://info-uri.info/lccn/n78890351
>>
>> But if these are non-dereferenceable URIs, how do you stop every RDF
>> web-crawler, information gatherer and clueless agent on the planet
>> from attempting to HTTP-GET/MGET the billions of URIs in the
>> namespace?
>> Unless I'm missing something, as the number of these scales up, so
>> too does the amount of resources used in tackling 404'd requests.
>
> I'll be surprised if that turns out to be the big inefficiency of the
> Semantic Web.  Retrieving megabytes of data that turns out not to be
> what you want -- that's much worse.  Especially if you have to compute
> for a long time to know if it's useless stuff.  So I expect metadata
> to be very valuable, saying which URIs are useful for what.  Kind of
> like the stuff search engines already store about each URI.

Exactly.

A lot of thought about the scalability and efficiency of SW agents went
into the design of URIQA (after all, we make mobile phones, which don't
have gobs of memory or processing power to crunch data). The benefit
of being able to obtain a concise description of a particular resource
(such as a vocabulary term) is that a given SW agent at a given time
may only need information about a handful of resources -- so it
makes no sense to force the agent to download, process, and store
(even temporarily) entire schemas or knowledge bases defining entire
vocabularies (which can be *huge*).

Yes, there will be some core vocabularies of which an agent may have
full knowledge to start with, but as the SW grows and more and
more folks define vocabularies, SW agents will frequently encounter
terms they do not recognize and will need to obtain, at run time,
the information required to understand those particular terms. And
if (by keeping track of newly encountered terms) the agent determines
that particular definitions or even entire schemas should be cached
for improved future processing, fine.
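To make that caching idea concrete, here is a minimal sketch in Python.
It is purely illustrative -- URIQA itself prescribes no particular cache
design, and the class and URIs below are hypothetical -- but it shows the
basic pattern: fetch a term's description only on the first encounter,
and keep a tally the agent could later use to decide what is worth
retaining.

```python
class TermCache:
    """Hypothetical agent-side cache of term descriptions.

    Remembers the description fetched for each newly encountered
    term URI, and counts how often each term is needed so the agent
    can decide which definitions (or whole schemas) to keep around.
    """

    def __init__(self):
        self._descriptions = {}  # term URI -> cached description
        self.hits = {}           # term URI -> number of lookups

    def lookup(self, uri, fetch):
        # 'fetch' is called only on a cache miss, so repeat
        # encounters with the same term cost no network traffic.
        self.hits[uri] = self.hits.get(uri, 0) + 1
        if uri not in self._descriptions:
            self._descriptions[uri] = fetch(uri)
        return self._descriptions[uri]
```

A second lookup of the same URI returns the stored description without
calling `fetch` again, which is exactly the saving the paragraph above
is after.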

The key, though, is that when that agent encounters a URI, it is
able to find out what it means, with *nothing more* than that URI
and some standardized protocol(s). The description of the term
can then lead to identification of the vocabulary to which it
belongs, the entire schema(s) or models where it is officially
defined, etc.
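To sketch what "nothing more than that URI and some standardized
protocol(s)" looks like on the wire: URIQA's MGET is the metadata
counterpart of GET, asking the server for a concise description of
what the URI denotes rather than a representation of the resource.
The helper below (a Python sketch; the URI is illustrative) just
assembles such a request from a bare http URI -- a real agent would
send it through an HTTP client that supports custom methods.

```python
from urllib.parse import urlparse

def build_mget_request(uri: str) -> str:
    """Assemble the raw request line and headers for a URIQA MGET.

    Sketch only: assumes an http URI, and requests RDF/XML as the
    description format. Everything the agent needs -- host and path --
    comes straight out of the URI itself.
    """
    parts = urlparse(uri)
    path = parts.path or "/"
    return (
        f"MGET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        f"Accept: application/rdf+xml\r\n"
        f"\r\n"
    )
```

The point is that no out-of-band registry lookup is involved: the URI
alone tells the agent where to ask, and the protocol tells it how.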

URIQA is a starting point for resource discovery, which may lead
to syndication/manipulation of entire schemas/models, but does
not have to.

And if the URIs denoting resources are not dereferenceable, then
they offer far less utility to the software agents which can
potentially make our lives easier.

>
>> The only solution I can think of is to invent a dud subdomain (that
>> doesn't exist) and let the DNS infrastructure deal with the 'doesn't
>> exist' load (which it's much better placed to do).
>>
>> But then if you are going to do that, why not just invent a
>> non-dereferenceable URI scheme... Doh!
>
> Because it lets you change your mind later.   And if each organization
> (lccn, not info-uri.info) provides their own domain, it can be changed
> on a per-organization basis.
>
> Mostly I think it'll turn out to be useful and cost-effective to make
> all these URIs dereferenceable.  When people really understand how
> this all works, they'll realize it's often dumb to make a
> non-dereferenceable identifier.  If I'm going to go to the trouble to
> create and publish an identifier for something, I want to leverage my
> owning that identifier by getting into the dereference loop among
> folks who choose to dereference.

Precisely.

Cheers,

Patrick


>
>      -- sandro
>
>

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com

Received on Monday, 26 January 2004 03:41:01 UTC