Re: SemWeb Non-Starter -- Distributed URI Discovery from Patrick Stickler on 2005-04-04 (semantic-web@w3.org from April 2005)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 4 Apr 2005 09:17:39 +0300
To: "ext Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
Cc: <www-rdf-interest@w3.org>, "Stephen Rhoads" <rhoadsnyc@mac.com>, <semantic-web@w3.org>
Message-Id: <f91022d5da93ed7e53d08b4732fd3d09@nokia.com>
On Apr 1, 2005, at 17:10, ext Miles, AJ (Alistair) wrote:

>
>> As far as I can tell, there is no formal, generalized
>> mechanism to reliably query the owner of a URI in order to
>> obtain an RDF Description of that URI.  And this is a serious
>> impediment to the Semantic Web.
>
> I think this hits the nail on the head.
>
> A couple of thoughts ...
>
> First this extends beyond HTTP - how might one implement this for URN 
> for example?  Considering HTTP URIs only, URIQA is the only thing I 
> know of that actually satisfies this requirement fully, but it would 
> take a lot of time and effort to migrate to an http web that supports 
> URIQA.


Actually, because URIQA is based at the lowest architectural layer of
the web, the HTTP protocol itself, adoption of URIQA is orders of
magnitude easier and less costly than other "best practice" solutions
(e.g. special headers, embedded metadata, content negotiation, etc.).
Implementation and deployment of the fundamental URIQA functionality
can be constrained to the web server platform itself, either natively
or by plug-in, so each web site owner does not have to introduce,
police, and manage the practices of each user; rather, each user is
free to exploit the standardized functionality made available for
describing resources.
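To make the server-platform point concrete, here is a minimal sketch (not the URIQA reference implementation) of a server that answers the URIQA MGET method for every resource it hosts, using Python's standard `http.server`, which dispatches a request with method `MGET` to a `do_MGET` handler. The in-memory `DESCRIPTIONS` store, the helper names `start_server` and `mget`, and the placeholder RDF/XML body are assumptions for illustration; a real deployment would generate descriptions from the site's own metadata.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory store of descriptions, keyed by request path.
# A real URIQA-enabled server would draw these from the site's own metadata.
DESCRIPTIONS = {
    "/rdfdata/music/classical/resource": (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        '<rdf:Description rdf:about="http://www.somemediacompany.com'
        '/rdfdata/music/classical/resource"/>'
        "</rdf:RDF>"
    ),
}


class URIQAHandler(BaseHTTPRequestHandler):
    """Answers MGET once, at the server level, for every hosted resource.

    http.server dispatches a request with method MGET to do_MGET, so no
    per-user or per-page configuration is involved.
    """

    def do_MGET(self):
        body = DESCRIPTIONS.get(self.path)
        if body is None:
            # Unknown resource: a definite failure, signalled by status code.
            self.send_error(404, "No description for this resource")
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/rdf+xml")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server(handler=URIQAHandler):
    """Start a server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def mget(port, path):
    """Send an MGET request to the local server; return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("MGET", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8")
    finally:
        conn.close()
```

For example, on a server returned by `start_server()`, `mget(server.server_port, "/rdfdata/music/classical/resource")` yields status 200 and the RDF/XML body, while an unknown path yields 404. The description logic lives in one handler on the server, which is the deployment property argued for above.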

It is my long-held view that bootstrapping knowledge discovery for the
semantic web, as an interface between the web and semantic web layers,
must be an integral part of the foundational web machinery.

And it should have the same level of robustness as the traditional web
methods, such that clients can confidently determine the success or
failure of a request for a description of a resource based on the
response code alone, just as for a request for a representation of a
resource. E.g. with a header-based approach, if the special header is
not understood by the server, the server can simply ignore it, and the
client cannot be sure its request was properly understood (or the
solution has to add yet another mechanism for request validation);
whereas with URIQA, if a server does not implement the method, a 5xx
response (e.g. 501 Not Implemented) is returned, and if it does
implement the method, the behavior is strictly defined. Thus, the sw
client is able to utilize the existing request validation machinery of
HTTP the same as it does for requests for representations.
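A minimal client-side sketch of that argument, using only Python's standard library: `describe()` sends MGET to the host named in the URI and decides the outcome from the status code alone. `LegacyHandler` stands in for a server without URIQA support; Python's `BaseHTTPRequestHandler` answers any method it has no `do_<METHOD>` handler for with 501, whereas an unrecognized request header on a GET is silently ignored and yields an ordinary 200. The names `describe`, `classify`, `MethodNotSupported`, and the mapping of 404 to "no description" and 405 to "unsupported" are assumptions for illustration, not part of the URIQA specification.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


class MethodNotSupported(Exception):
    """Raised when the server signals it does not implement MGET."""


def classify(status):
    """Decide the outcome of an MGET request from the status code alone."""
    if 200 <= status < 300:
        return "description"
    if status in (405, 501):  # method not allowed / not implemented
        return "unsupported"
    if status == 404:
        return "no-description"
    return "error"


def describe(uri, timeout=5.0):
    """Ask the server named in `uri` for a description of `uri` via MGET."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("MGET", target)
        resp = conn.getresponse()
        body = resp.read()
    finally:
        conn.close()
    outcome = classify(resp.status)
    if outcome == "description":
        return body
    if outcome == "unsupported":
        raise MethodNotSupported(f"{uri}: HTTP {resp.status}")
    if outcome == "no-description":
        return None
    raise RuntimeError(f"{uri}: HTTP {resp.status}")


class LegacyHandler(BaseHTTPRequestHandler):
    """An ordinary server with no URIQA support: it only implements GET.

    BaseHTTPRequestHandler answers any method lacking a do_<METHOD>
    handler with 501 Not Implemented, so MGET fails visibly.
    """

    def do_GET(self):
        # Any extra request header is simply ignored here.
        data = b"<html>an ordinary representation</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo
```

Against a `LegacyHandler` server, `describe()` raises `MethodNotSupported` because the MGET comes back 501, while a GET carrying some made-up "please describe" header still returns 200 with an ordinary representation, leaving the client unable to tell whether the header was understood.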


> Any other ideas?
>
> Second, there is a more general discovery requirement which can be 
> loosely phrased as, 'I want to find out what x said about y,' or, 'who 
> said what about what?'  I have no ideas for how to solve that.

This question nicely illustrates that there are various forms/levels
of knowledge discovery, with different requirements and likely different
solutions.

The question "what resource is identified by this URI and what is it
like" is a more fundamental question than the questions above.

We will need a small, standardized, but likely multifaceted toolbox of
solutions for answering the various forms of questions that will need
to be asked by sw agents.

SPARQL, and centralized, third party knowledge stores (ideally employing
named graphs allowing for trust/source management) will surely be part
of that toolbox.

But I also contend that URIQA, providing simple, bootstrapping
discovery of authoritative knowledge (surely utilized in building those
centralized third party knowledge stores), will also be an essential
component of that toolbox.
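One practical difference between the two toolbox entries, sketched in Python: a SPARQL Protocol request must be addressed to an endpoint the client already knows, whereas a URIQA MGET request is addressed to the URI's own host. The endpoint `http://sparql.example.org/query` and both helper names are hypothetical; the `query` parameter follows the SPARQL Protocol as later standardized (it was still a draft when this was written), and what a `DESCRIBE` returns is left to the endpoint's implementation.

```python
from urllib.parse import urlencode, urlsplit


def sparql_describe_url(endpoint, uri):
    """Build a SPARQL Protocol GET request URL asking `endpoint` to DESCRIBE `uri`.

    The client must already know `endpoint`; nothing in `uri` reveals it.
    """
    return endpoint + "?" + urlencode({"query": f"DESCRIBE <{uri}>"})


def uriqa_request(uri):
    """Return (method, host, path) for a URIQA MGET request about `uri`.

    The request goes to the URI's own host, so no endpoint discovery is needed.
    """
    parts = urlsplit(uri)
    return "MGET", parts.netloc, parts.path or "/"
```

For `http://www.somemediacompany.com/rdfdata/music/classical/resource`, `uriqa_request` targets `www.somemediacompany.com` directly, while `sparql_describe_url` only works once some convention or registry has supplied the endpoint.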

> Finally, scoping this problem to just http uris used to denote 
> non-information resources, my currently favoured position is that all 
> such resources (whether denoted by hash or slash URIs) should support 
> the SPARQL protocol (when it's done :)

Agreed. It's important to point out, though, that even as solutions
such as SPARQL and URIQA support querying about all resources,
regardless of the lexical nature of their URIs, the lexical nature of
URIs is nevertheless significant to HTTP: URIs with fragment
identifiers (hash) will always incur more processing overhead for
various functions, and non-HTTP URIs pose more fundamental
resolvability challenges. Thus there will still be usability-related
choices to be made as to what form of URIs are used to identify
resources.
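To illustrate the overhead for hash URIs, a short sketch using Python's `urllib.parse.urldefrag`: a client strips the fragment before sending an HTTP request, so the server only ever sees the document URI, and the whole document must be fetched and searched for the statements about the fragment's URI. The helper `requests_needed` is hypothetical, and `#other` is an invented fragment placed alongside the example URIs from the quoted message.

```python
from urllib.parse import urldefrag


def requests_needed(uris):
    """Group URIs by the document URI an HTTP client actually requests.

    The fragment is stripped before the request is sent, so every hash URI
    sharing a document maps to one fetch of the whole document; the client
    must then locate the statements about each fragment itself.
    """
    groups = {}
    for uri in uris:
        document, fragment = urldefrag(uri)
        groups.setdefault(document, []).append(fragment)
    return groups
```

Both `...classical#resource` and `...classical#other` collapse to a single request for `...classical`, whereas the non-hash `...classical/resource` keeps its full path, so a server (e.g. via MGET) can answer for exactly that resource.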

(my advice is, unless otherwise governed by special requirements, to
always use non-hash http: URIs, which will provide for the greatest
utility and fewest headaches)

Regards,

Patrick


>
> Cheers,
>
> Al.
>
>
>
>
>>
>> "hashing" at least gets you part of the way because -- given
>> an HTTP URI containing a hash and frag ID -- it is *likely*
>> that one can dereference the URI into a document containing
>> (amongst other things) an RDF description of the URI in question.
>>
>> For example, if I encounter the URI
>>
>> http://www.somemediacompany.com/rdfdata/music/classical#resource
>>
>> chances are I can dereference
> "http://www.somemediacompany.com/rdfdata/music/classical" and find 
> within that document an RDF description of "#resource".
>
> If, on the other hand, I encounter
>
> http://www.somemediacompany.com/rdfdata/music/classical/resource
>
> then I can't make any assumptions about whether or not this URI refers 
> to some sort of document containing an RDF description of "resource".  
> The URI owner may just have chosen to mint URIs using some logical 
> hierarchy.
>
> So, given an arbitrary URI, how can I obtain an RDF Description of 
> that URI?
>
> I suppose I could crawl the domain "containing" the URI with a spider 
> and harvest RDF data until I find the description I'm looking for, but 
> that's a bit of a mess.  And it certainly doesn't scale.
>
> I read up a bit on SPARQL -- particularly the "SPARQL Protocol for 
> RDF" -- and, unless I'm misunderstanding, it seems to be the intended 
> long term solution to the problem described herein.  Is that correct?  
> Is it expected that URI owner/minters will operate some sort of SPARQL 
> server for providing RDF Descriptions of their URIs?  Will there be 
> some convention as to the location of these servers such that one can 
> *reliably* and automatically query for an RDF Description of a URI?
>
> Have I framed this problem correctly?  Are there solutions or angles 
> which I have missed?  Input would be greatly appreciated.
>
> --- Stephen
>
Received on Monday, 4 April 2005 06:18:20 UTC