Re: URI: Name or Network Location? from Patrick Stickler on 2004-01-26 (www-rdf-interest@w3.org from January 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 26 Jan 2004 10:26:50 +0200
To: "ext Phil Dawes" <pdawes@users.sourceforge.net>
Cc: ext Sandro Hawke <sandro@w3.org>, "ext Hammond, Tony (ELSLON)" <T.Hammond@elsevier.com>, "Thomas B. Passin" <tpassin@comcast.net>, ext Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-rdf-interest@w3.org
Message-Id: <635B7CDA-4FD9-11D8-8079-000A95EAFCEA@nokia.com>

On Jan 23, 2004, at 06:28, ext Phil Dawes wrote:

> Hi Patrick,
>
> Patrick Stickler writes:
>>
>> http: based PURLs work just fine. As I've pointed out before, you
>> can accomplish all that you aim to accomplish with the info: URI
>> scheme by simply using http: URIs grounded in your top level
>> domain, delegating control of subtrees of that namespace to the
>> various managing entities per each subscheme (the same is true
>> of urn: URIs). Then each http: URI can be associated with an
>> alias to which it redirects, as well as allow for access to
>> metadata descriptions via solutions such as URIQA[1]. E.g.
>> rather than
>>
>>     info:lccn/n78890351
>>
>> you'd have
>>
>>     http://info-uri.info/lccn/n78890351
>>
>
> But if these are non-dereferenceable URIs,

They are not non-dereferenceable. They simply may not resolve to
any representations. There's a difference.

> how do you stop every RDF
> web-crawler, information gatherer and clueless agent on the planet
> from attempting to HTTP-GET/MGET the billions of URIs in the
> namespace?

Why would you care to?

And what if key authoritative knowledge *is*, by common good
practice, made available via resolution of those URIs? Why
would you want to prohibit such an effective means of
knowledge interchange?

> Unless I'm missing something, as the number of these scales up, so too
> does the amount of resources used in tackling 404'd requests.
>

I'm sorry, but it seems to me that you are seeing ghosts and
phantoms where there are none.

If a GET or MGET to some URI fails to provide a useful response,
then it just does. And crawlers will just move on. But if those
requests *do* provide a useful response, then those indices are
all the more useful.

My "vision" is that there would arise a new breed of crawler that
would be gathering knowledge about resources (rather than merely
indexing textual content of representations) allowing for far
more precise, effective searching of web-accessible content.

With URIQA[1] in conjunction with HTTP, one URI can (potentially)
provide you representations and/or a formal (RDF) description. Note
the "and/or". For any given resource, there may only be available
a representation, or a description (e.g. for vocabulary terms), or
both.
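
The "and/or" above can be sketched as a small classification step that a
crawler might apply to each URI. This is only an illustration in Python:
the function name, and the rule that any 2xx status counts as a useful
response, are assumptions made for the example and are not taken from
the URIQA specification.

```python
def classify(get_status: int, mget_status: int) -> str:
    """Say what a URI offers, given the final status codes of GET and MGET.

    Assumption (not from URIQA): any 2xx status means something useful
    came back; everything else is treated as "nothing available".
    """
    has_representation = 200 <= get_status < 300
    has_description = 200 <= mget_status < 300
    if has_representation and has_description:
        return "both"
    if has_representation:
        return "representation only"
    if has_description:
        return "description only"
    return "neither"
```

A vocabulary term would typically come out as "description only", and a
crawler that gets "neither" simply moves on to the next URI.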

Thus, because I consider knowledge about resources (even terms
in various controlled vocabularies such as are a primary focus
of the info: URI scheme) to be highly valuable information that
should be accessible to software agents in a consistent, efficient
manner, I simply can't fathom any real benefit to having a URI
which, by definition, cannot be used to access such knowledge.

Here's a simple use case to illustrate: a software agent encounters
some URI info:foo:blargh. It has no idea what it means. It's stuck.
Or it has to rely on proprietary, hard coded means to discover what
that term means. Alternatively, suppose there is instead an analogous URI
http://info.org/foo/blargh, whose owner has made an RDF description
of that resource accessible whenever a request

    MGET http://info.org/foo/blargh HTTP/1.0

is issued (the details of the resolution being based on redirection
from the root info.org server to the subnamespace owner's server, etc.).

Now, that software agent has a formal definition of what that URI
denotes, and can (possibly/hopefully) do something useful with that
information in its subsequent processing.
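
The use case above might look roughly like this from the agent's side.
This is a hedged Python sketch, not a reference URIQA client: the helper
names info_to_http and send_mget, the rule of turning ":" separators into
"/", and the Accept header value are all assumptions made for
illustration. Note also that Python's http.client speaks HTTP/1.1 rather
than the HTTP/1.0 shown in the request line above, and that the call only
returns a description if the server actually implements MGET.

```python
import http.client
from urllib.parse import urlsplit


def info_to_http(info_uri: str, base: str = "http://info.org/") -> str:
    """Map an info: URI onto an http: URI under `base` (hypothetical rule).

    'info:foo:blargh' becomes 'http://info.org/foo/blargh'.
    """
    scheme, _, rest = info_uri.partition(":")
    if scheme != "info" or not rest:
        raise ValueError("not an info: URI: " + repr(info_uri))
    # Assumption: remaining ':' separators become path segments.
    return base.rstrip("/") + "/" + rest.replace(":", "/")


def send_mget(uri: str, timeout: float = 10.0):
    """Issue an MGET for `uri`; return (status, body bytes).

    http.client passes the method string through unchanged, so the
    URIQA extension method can be sent like any other.
    """
    parts = urlsplit(uri)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("MGET", parts.path or "/",
                     headers={"Accept": "application/rdf+xml"})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

An agent holding info:foo:blargh would call
send_mget(info_to_http("info:foo:blargh")) and, on a non-2xx status,
just carry on without the description.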

Now, maybe most folks who would mint http://info.org/* URIs won't
care or bother to provide either representations or descriptions
of their resources -- but for those who do, we can all then exploit
a globally deployed, consistent, and efficient solution for accessing
those representations and descriptions.

It is for this reason that I am against solutions such as the info:
URI scheme which deliberately hobble the web rather than leaving it
up to the users to decide (especially since such decisions can change,
and if your technology precludes changing your mind, then you're just
out of luck).

> The only solution I can think of is to invent a dud subdomain (that
> doesn't exist) and let the DNS infrastructure deal with the 'doesn't
> exist' load (which it's much better placed to do).
>
> But then if you are going to do that, why not just invent a
> non-dereferenceable URI scheme... Doh!
>

Again, I just don't see why folks would be opposed to (potentially)
dereferenceable URIs.

Patrick

> Cheers,
>
> Phil
>
>
>

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com

Received on Monday, 26 January 2004 03:35:59 UTC