RE: URIQA questions from Patrick.Stickler@nokia.com on 2003-07-07 (www-rdf-interest@w3.org from July 2003)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 7 Jul 2003 09:39:40 +0300
To: <b.fallenstein@gmx.de>
Cc: <www-rdf-interest@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBBED@trebe006.europe.nokia.com>
> -----Original Message-----
> From: ext Benja Fallenstein [mailto:b.fallenstein@gmx.de]
> Sent: 06 July, 2003 02:30
> To: Stickler Patrick (NMP/Tampere)
> Cc: rdf-i
> Subject: URIQA questions
> 
> 
> 
> Dear Patrick,
> 
> reading the URIQA spec at
> 
> http://216.239.39.100/search?q=cache:HHP1uPSr1EUJ:sw.nokia.com
> /URIQA.html+uriqa&hl=en&ie=UTF-8
> 
> (for some reason I cannot access sw.nokia.com at the moment), 

Our firewalls got upgraded last weekend and for some reason the
routing to that particular server got munged. It's being fixed.
Sorry.

> I'm left 
> wondering about two points. I would expect to be able to use 
> the URIQA 
> interface to query any information I need about a resource from its 
> authoritative server, 

Well, any authoritative information, at least. Not necessarily
any information that is available anywhere on the SW, on every
server.

> but if I'm not missing something, this 
> would not be 
> possible.
> 
> 
> First, the directionality in concise bounded resource 
> descriptions seems 
> an arbitrary restriction that means I can query many, but not all, 
> statements about a resource. 

A concise bounded description is entirely resource-centric. It's
not meant to replace or compete with more general query services
or functionality, even those operating on the same knowledge
base.

It is a lowest common denominator which can be broadly and 
effectively implemented by all semantic web enabled servers,
regardless of what additional functionality those servers might
offer.

We can't expect every web server on the planet to implement an
identical high level query API. Even if such a standard API
existed (and there are folks working on one), we couldn't expect
every web server to provide it. The URIQA model, however, is both
simple enough and powerful enough to provide a common, standardized
core of functionality for resource discovery that will enable
SW agents to obtain authoritative knowledge about resources
based solely on their URIs.

> Unless the vocabulary in use has been 
> designed to be used with URIQA, I may be missing out on something 
> important. For example, consider this graph:
> 
>      <http://example.org/>   homePageOf   
> <http://example.org/institute>.
>      <http://example.org/institute>   rdfs:label  "The Foo institute" .
>      <http://example.org/institute>   founded     "2003-03-17" .
> 
> If I start from http://example.org/, all is fine; I can find the 
> institute URI and then query its own description to get the name and 
> founding date.
> 
> But if I start at http://example.org/institute, I have no way 
> of finding 
> out about its home page, even though the semweb server knows about it 
> and this can certainly be considered crucial information 
> regarding the 
> institute.

With most SW and knowledge representation operations, ontology counts
for a lot.

> There could of course be an inverse hasHomePage property, but why add 
> this if it isn't necessary, except for URIQA?

Well, this of course presumes that you know about the property homePageOf.


> It would seem much better to make the concise bounded description go 
> both forward and backward-- i.e., also include statements in 
> which the 
> queried URI is an *object*, and so on.

A concise bounded description is intended to be an optimal amount
of knowledge for efficient interchange. If you include such a reverse
lookup, the result graphs can quickly grow to be huge, depending
on the resource. 

E.g., if you want a description about a URI denoting a classificatory
term, in order to get its labels, description, etc., including all
statements where it is the object could make the results jump from
a dozen or so statements to millions of statements, particularly
if inference is used (which is to be expected in many/most cases).

E.g. if subclass inference is employed, and one were to submit
a query for rdfs:Resource, one would get a statement for every
resource that exists in the knowledge base (in addition to the
rest of the relevant statements) since it can be inferred that

  Ax ( x rdf:type rdfs:Resource )

etc.

Even if inference is not employed, the resulting graphs can be
very large (unexpectedly large) and could easily overload smaller
agents.
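
A minimal Python sketch of the size argument above. It assumes a
simplified reading of a concise bounded description (statements with the
resource as subject, expanding blank-node objects); the toy data, the
ex: names, and both function names are hypothetical and not part of
URIQA.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# The data and the helper names are illustrative assumptions only.
TRIPLES = [
    ("ex:Term", "rdfs:label", "Term"),
    ("ex:Term", "rdfs:comment", "A classificatory term"),
] + [(f"ex:doc{i}", "ex:classifiedAs", "ex:Term") for i in range(1000)]


def forward_description(resource, triples):
    """Statements with the resource as subject, following blank nodes."""
    result, pending, seen = [], [resource], set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                # Blank-node objects (written "_:x" here) are expanded too.
                if o.startswith("_:"):
                    pending.append(o)
    return result


def with_reverse_statements(resource, triples):
    """Forward description plus every statement with the resource as object."""
    forward = forward_description(resource, triples)
    reverse = [t for t in triples if t[2] == resource]
    return forward + reverse


forward_size = len(forward_description("ex:Term", TRIPLES))
both_size = len(with_reverse_statements("ex:Term", TRIPLES))
print(forward_size, both_size)
```

With only a thousand classified documents the reverse lookup already
dominates the result; the forward-only description stays at the
term's own two statements.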


> Second, the concise bounded description may include URIs not 
> under the 
> authority of the same web server, and yet the knowledge this 
> server has 
> about that URI may be crucial in interpreting the meaning of the 
> statements served by the server. We can query this 
> information using the 
> URIQA servlet, but the reason for the HTTP extension to exist in the 
> first place is that we may not know the location of the authoritative 
> servlet in advance-- yet the HTTP extension doesn't seem to offer any 
> way to discover it.

Sure it does, if the URI includes a web authority component, such as
an http: URI.

If you have http://example.com/foo and you execute a GET with
URI-Resolution-Mode: Description (per the URIQA model) and the
resulting description references http://anotherexample.com/bar
then you can execute another GET with URI-Resolution-Mode:
Description to get the description of the second resource, etc.
ad nauseam until you have all the information you need.
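
The hop-by-hop process above could be sketched in Python roughly as
follows. Only the URI-Resolution-Mode: Description header and the two
example URIs come from this message; the function names, the FAKE_WEB
table, and the injected fetch function are hypothetical stand-ins so
that the sketch runs without network access.

```python
import urllib.request


def description_request(uri):
    """Build (but do not send) a GET asking for a description of uri."""
    return urllib.request.Request(
        uri, headers={"URI-Resolution-Mode": "Description"}, method="GET"
    )


def crawl(start, fetch, limit=10):
    """Follow references from description to description.

    fetch(uri) must return the list of http: URIs referenced by the
    description of uri; it is injected so the sketch runs offline.
    """
    seen, pending = [], [start]
    while pending and len(seen) < limit:
        uri = pending.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        pending.extend(fetch(uri))
    return seen


# Stand-in for real network access: a canned map of URI -> referenced URIs.
FAKE_WEB = {
    "http://example.com/foo": ["http://anotherexample.com/bar"],
    "http://anotherexample.com/bar": [],
}

req = description_request("http://example.com/foo")
visited = crawl("http://example.com/foo", lambda u: FAKE_WEB.get(u, []))
print(req.get_method(), req.full_url)
print(visited)
```

A real agent would replace the lambda with a function that sends
description_request(uri), parses the returned RDF, and extracts the
referenced URIs; the limit argument is one simple way to stop before
"all the information you need" turns into the whole web.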

Note that the real core of URIQA is not the Servlet API that is
defined for the reference implementation, but the extensions to
the Web architecture that allow one to query a semantic web
enabled server based on the URI alone.

> As an example, consider this graph:
> 
>      <http://example.org>   ex:hasCreator   _:x .
> 
>      _:x   foaf:name       "Jane Tripleson" .
>      _:x   foaf:mailbox    <mailto:triples@example.org> .
> 
> This is straightforward enough, but consider this graph:
> 
>      <http://example.org> ex:hasCreator
>      <urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi> .
> 
>      <urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi>
>          foaf:name   "Jane Tripleson" .
> 
>      <urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi>
>          foaf:mailbox   <mailto:triples@example.org> .
> 
> (A urn-5 is just a random number which is long enough to be 
> unique, see 
> http://www.iana.org/assignments/urn-informal/urn-5 .)
> 
> This graph is perfectly valid-- but using URIQA, as far as I 
> can see I 
> can only retrieve a triple that means exactly nothing to me, 
> without a 
> way to find out more.

Well, something like DDDS along with HTTP+URIQA would be needed
if you are working with URNs.

Alternately, and what I and others recommend, you would use PURLs.

The combination of PURLs with URIQA makes for a powerful solution
that IMO nearly eliminates any need for the urn: naming scheme
(with the single exception of those who need/want URIs that are
entirely independent of any identifiable organization).

If you were to use, instead of urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi,
something akin to http://purl.example.com/urn-5/gK0wObL42bRyFllUsU+8cPL5cQBi
then you could simply do

GET http://purl.example.com/urn-5/gK0wObL42bRyFllUsU+8cPL5cQBi HTTP/1.1
URI-Resolution-Mode: Description

to get a description of the resource, which could also include
statements about which representations that URI would redirect to,
etc. in addition to all other authoritative knowledge about the
resource.
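
As a rough Python sketch of that substitution: the PURL base is the
"something akin to" example host used above, not a real service, and
both function names are hypothetical.

```python
URN5_PREFIX = "urn:urn-5:"
PURL_BASE = "http://purl.example.com/urn-5/"  # hypothetical PURL service


def urn5_to_purl(urn):
    """Map a urn-5 URN to the corresponding (hypothetical) PURL."""
    if not urn.startswith(URN5_PREFIX):
        raise ValueError("not a urn-5 URN: " + urn)
    return PURL_BASE + urn[len(URN5_PREFIX):]


def description_request_text(uri):
    """Request text in the form shown above.

    A real HTTP/1.1 client would also send a Host header.
    """
    return f"GET {uri} HTTP/1.1\r\nURI-Resolution-Mode: Description\r\n\r\n"


purl = urn5_to_purl("urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi")
print(description_request_text(purl))
```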

Services could be founded for the creation and maintenance of PURLs
and their associated metadata, which have (nearly) all of the desirable
qualities of URNs but exploit the web architecture, with semantic
web extensions such as URIQA, to the fullest.

> Ok, you could use
> 
>      GET urn:urn-5:gK0wObL42bRyFllUsU+8cPL5cQBi HTTP/1.1
>      Host: something
> 
> using Host: to get this through proxies, but this would seem 
> to violate 
> the HTTP spec--
> 
>      The Host field value MUST represent the naming authority
>      of the origin server or gateway given by the original URL.
> 
> Besides, this wouldn't help with URIs containing fragids.

Don't get me started about fragids ;-)

> IMHO URIQA should simply provide the URI of its servlet in its HTTP 
> responses. (If this were also included in the response to 
> OPTIONS, the 
> use of URI-Resolution-Mode could even be completely avoided, using 
> OPTIONS to find the servlet URI, then using the web service 
> interface; 
> it seems to me that if URIQA clients supported this, servers could be 
> more easily configured for URIQA-- just add one header to the OPTIONS 
> response and plug in a normal servlet or CGI.)

This would, though, result in SW agents having to make multiple calls
to the server for every single request for information.

SW agents should be able to surf the semantic web, from description
to description, just as easily as web agents can surf the traditional
web, from representation to representation.

This requires analogous behavior defined for semantic web enabled
servers by the architecture. The header URI-Resolution-Mode: provides
this in an elegant manner, by simply indicating to the server whether it
should resolve requests in terms of descriptions or representations.

Otherwise, interaction between agents and servers is roughly identical.

SW agents should not be made second-class citizens of the web by having
to make two system calls where other traditional agents only need make
one.

Cheers,

Patrick

--
Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Monday, 7 July 2003 02:39:43 UTC