Re: RDF query and Rules - my two cents

On Wednesday, Nov 19, 2003, at 17:40 Europe/Helsinki, ext Danny Ayers wrote:

>
>> Well, having a deployment platform that is sensitive to efficiency
>> issues (i.e. mobile phones), I'd just as soon not leave such issues
>> for "later work".
>
> Fair enough, but wouldn't this route lead straight to a binary 
> serialization
> of RDF?

Binary serialization does not address the volume issue.

Yes, compression of some sort can be used, but if you have a model
with 10 million triples and only need 34 of them, why would you GET
the whole darn thing?! Regardless of whether it was compressed, etc....

>> Use the existing web architecture, definitely. Trying to extend/bend
>> the semantics of the existing verbs, no.
>>
>> If you're not familiar with URIQA, have a look at
>> http://sw.nokia.com/uriqa/URIQA.html
>> to see how I've been approaching this problem.
>
> If you can GET why is there a need for MGET?

> The introduction of new HTTP methods doesn't strike me as being 
> consistent
> with extending the present web architecture, rather it seems like 
> creating a
> new, albeit similar architecture.

Well, firstly, do you consider WebDAV a web-based application, or
something else?

Adding new verbs should be done with great fear and trepidation, but
nevertheless is an acceptable means of extending the web architecture.

It is similar to adding new URI schemes. One should be very hesitant
to do so, but when the case is justified, it's perfectly acceptable.

See my comments on why this is necessary in
http://lists.w3.org/Archives/Public/www-rdf-interest/2003Nov/0146.html

> Personally I think it's important to
> deploy Semantic Web systems without having to modify the servers - in 
> fact
> it's probably critical for adoption.

Well, show me the code...  Demonstrate how that can be done...

IMO, for the SW to reach critical mass, we have to (1) provide a simple,
effortless way to get descriptions of resources having only a URI, and
(2) get away from GETing explicit RDF/XML instances (files) in favor of
querying knowledge bases.

The present situation not only leaves far too much implementational and
organizational detail exposed, but requires knowledge of those details
to get anything done. We need a standardized layer of abstraction away
from files, models, databases, etc. that is tuned for SW operations,
and IMO the present Web architecture doesn't quite do the job.

It is my hope that standardized protocols for RDF Query, both for
bootstrapping discovery based on a sole URI and for general query,
irrespective of internal organization, will fill this gap and provide
what is needed for the SW to reach critical mass.

>> I'm not entirely sure what point you're trying to make here. Yes, it's
>> true that a KB could be a single triple, or could be virtual --
>> corresponding
>> to a distributed query space across multiple physical databases. But I
>> don't see how that has to be relevant to the query language or 
>> protocol.
>> So I agree, it's an implementational issue. I.e. *which* KB or set of
>> KBs (however implemented) that a given query service employs in order 
>> to
>> respond to queries should not be relevant to the core standard. 
>> Clients
>> should not *have* to know which KB the service should use. I.e. the KB
>> (virtual or physical) is exposed as a service. And one service may 
>> very
>> well use *other* query services as components.
>
> The point I was trying to make was that if http methods are used
> intelligently, then there is no need for the access to a KB 
> necessarily to
> be that coarse-grained, even if you limit yourself to http GET.
>
> For example, consider my blog as a KB. In that KB there is information 
> about
> recent posts, and information about me.
>
> Ok, I understand the need for fine granularity, and the 
> resource-centric
> approach you suggest makes sense, so you might want to make queries 
> like:
>
> GET http://dannyayers.com/q?muri="http://dannyayers.com/"

> for information about the posts and
>
> GET http://dannyayers.com/q?muri="http://dannyayers.com/misc/foaf/foaf.rdf"
>
>

Here's the bootstrapping problem. You have "http://dannyayers.com/"
and you want to find out some information about the resource denoted
by that URI. But how? From where? Using which parameters? Which
server? Which service? etc. etc. And even when you figure all that out,
what if the publisher of that information later wants to change the
server, or the service, etc. etc. Then you have to start all over again.

To allow the SW to function and scale as efficiently as the Web, there
needs to be the same degree of transparency in the requests for
descriptions as there is in the requests for representations. For the
web, all you need to do to get a representation is use GET with a URI
that is meaningful to the HTTP protocol. For the SW, all you need to do
to get a description is use MGET with a URI that is meaningful to the
HTTP protocol.

And any changes/reorganizations/etc. by the authoritative server
responding to that MGET request will have zero impact on any SW agents.

Much better, IMO, to simply be able to ask

MGET http://dannyayers.com/ HTTP/1.1
MGET http://dannyayers.com/misc/foaf/ HTTP/1.1

and not have to know that you have some service http://dannyayers.com/q
that needs some parameter 'muri=', etc. insofar as the retrieval of
a basic, authoritative description is concerned.
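
To make the contrast concrete, here's a rough sketch (Python, purely
illustrative -- URIQA doesn't prescribe any client code) of everything
an agent needs to know to obtain an authoritative description: the
resource URI itself. The host is taken from the URI's authority
component and the MGET request is directed there; no service URI, no
'muri=' parameter, no knowledge of how the knowledge base is organized.

  # Rough sketch: fetch an authoritative description given only a URI.
  # Assumes the authoritative server implements the URIQA MGET method.
  from http.client import HTTPConnection
  from urllib.parse import urlsplit

  def mget_description(resource_uri):
      parts = urlsplit(resource_uri)    # host comes from the URI itself
      conn = HTTPConnection(parts.netloc)
      conn.request("MGET", parts.path or "/",
                   headers={"Accept": "application/rdf+xml"})
      resp = conn.getresponse()
      return resp.status, resp.read()   # e.g. 200 + RDF/XML description

  status, description = mget_description("http://dannyayers.com/misc/foaf/")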

> for stuff about me. As it is, the data is served using
>
> GET http://dannyayers.com/index.rdf
> and
> GET http://dannyayers.com/misc/foaf/foaf.rdf
>
> The statements aren't partitioned quite as cleanly as they could be, 
> but
> this could easily be rectified.
> The use of the two URIs gives finer-grained access to my KB than just 
> using
> a single URI. The way in which this is implemented behind the scenes is
> irrelevant, but conveniently a simple implementation is already 
> supported by
> the standard http server ;-)
>

Well, transparency of implementation is always good, and usually
necessary, but your implementation is *not* completely transparent,
since you have to know the service name and its parameters to be able
to get descriptions of resources.

Now, to be fair, URIQA also defines such a service, for which you must
know the service name and the URIQA-defined parameters, but that is a
minor, secondary, side feature of the model and not its heart -- which
is the ability to interact with authoritative descriptions based solely
on a URI.

If a generalized query protocol were standardized, which included the
ability to request concise bounded descriptions of particular resources,
then the need for URIQA to specify this service interface would be
eliminated -- but not the need for the fundamental bootstrapping
protocol based on MGET, etc.

> There is of course a built-in limitation that the subject resources 
> being
> examined must be within this domain, but as far as I can see this 
> limitation
> is exactly the same with GET as MGET.
>

I'm not sure what you mean by "within this domain". Do you mean, the
resource URI must have as its web authority component the server to
which the request is directed? That is true for MGET, but not for GET,
if that URI is being provided as a parameter value.

>> Each service then is a portal to a particular body of knowledge, and
>> whether other portals to subsets of that knowledge are provided by
>> other services is irrelevant to clients using *that* service.
>>
>> Explicit specification of KB/model/database/etc. should only be via 
>> the
>> URI denoting the query service to which the client interacts. That
>> allows
>> for maximal opacity regarding implementation and minimal impact to
>> clients
>> when things change.
>
> Ok, but I would favour:
>
> http://example.org/ask-me-about-ham
> http://example.org/ask-me-about-eggs
> http://example.org/single-statement-concerning-poptarts
>
> over
>
> http://example.org/ask-me-about-breakfast
>

Well, as in many cases in a free-market economy, those who provide
services that most closely correspond to the desires/needs of the
majority will be more likely to succeed -- and everyone is free to
choose whichever services most closely correspond to their preferences,
etc.

The point is that rather than having two URIs, one for a service and
one for a model, you have *one* URI, which is a standardized interface
to a given model (virtual or actual). And when one is doing service
discovery based on descriptions of such services, the URI used to
describe that service is the same URI used to access that service,
rather than having to define (and find) a relation between some model
and the actual service where it is hosted.

Again, as I've said earlier in these discussions, a standard should
absolutely minimize the amount of knowledge a given agent must possess
to accomplish a given task.

>> Not so much separately, but sequentially. I.e., the WG would keep in
>> mind
>> the push functionality during the first standardization round, to 
>> ensure
>> that both push and pull share an integrated conceptual core, but alot 
>> of
>> the details can be deferred to a second round.
>
> Fair enough.
>
>>> But what I'm suggesting doesn't actually conflict with the
>>> requirements you
>>> propose, in fact passing of RDF/XML (and perhaps other syntaxes) over
>>> HTTP
>>> is exactly what I had in mind, along with "Existing web standards
>>> should be
>>> employed as much as possible;". My main concern is that "overloading
>>> of the
>>> semantics of existing web protocols should be avoided" be
>>> misinterpreted as
>>> a need for alternatives to GET, POST etc, or perhaps worse still that
>>> everything be opaquely encoded into POSTs.
>>>
>>
>> Well, as you'll see from URIQA, I believe that there *is* a need for
>> alternatives to GET, PUT and DELETE -- insofar as a bootstrapping SW
>> protocol is concerned, as there are special issues/problems in 
>> ensuring
>> correct SW behavior based solely and exlusively on a URI alone (rather
>> than two URIs, one denoting a target resource and another denoting a
>> web service).
>>
>> However, GET, PUT, and DELETE *are* used and should be used by SW
>> services (which is the case with URIQA) wherever possible, so I think
>> that for the most part, we are in agreement.
>
> I think we are largely in agreement, although I'm not convinced on the 
> need
> for http extensions.

Here are two use cases which, I think, sufficiently show that additional
headers alone are insufficient:

1. What did I GET?

A SW agent submits the following request to a server, wishing to obtain
an authoritative description of a resource:

GET http://example.com/blargh HTTP/1.1
URI-Resolution-Mode: description

The agent gets back an RDF/XML instance. Does that RDF/XML instance
constitute the authoritative concise bounded description of the resource
in question, or simply some representation of the resource that by
coincidence happens to be an RDF/XML instance?

The result is indeterminable. If the server understood the SW header,
it is likely the description. If the server did not understand the SW
header, it is likely a representation.

Yes, one could submit an OPTIONS request and hope to resolve whether
the server understands the SW header, but that places undue overhead
on SW agents for *every* server they must interact with.

With MGET, it's clear whether the request is understood, the semantics
of the request are clear, and the behavior expected of the server is
clear.
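
To illustrate (hypothetical agent code, same caveats as above): with
GET plus an extra header, the agent cannot tell from a 200 response
whether the header was honored; with MGET, a server that doesn't
implement the method is expected to reject it outright (typically
501 Not Implemented), so a successful response is unambiguous.

  from http.client import HTTPConnection

  # GET + extra header: a 200 carrying application/rdf+xml is ambiguous;
  # it may be the authoritative description, or just a representation
  # served by a server that silently ignored the header.
  conn = HTTPConnection("example.com")
  conn.request("GET", "/blargh",
               headers={"URI-Resolution-Mode": "description",
                        "Accept": "application/rdf+xml"})
  maybe_description = conn.getresponse().read()

  # MGET: a non-SW server typically answers 501 Not Implemented (or
  # 405), so a 200 response can only come from a server that actually
  # understood the SW semantics of the request.
  conn = HTTPConnection("example.com")
  conn.request("MGET", "/blargh")
  is_description = (conn.getresponse().status == 200)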

2. Oops, there went the representation!

A SW agent wishes to submit some knowledge about a resource and sends
a request

PUT http://example.com/blargh HTTP/1.1
URI-Resolution-Mode: description

with an RDF/XML instance as the input stream. The server decides that
the submitting agent can be trusted to modify content (e.g. the user
is an employee of the company and the server is a shared server, but
the user doesn't realize the server is not SW enabled) and replaces
the primary representation for that resource with the submitted RDF/XML
instance. The next time someone sends a GET request, they get back
an RDF/XML instance rather than e.g. an Excel spreadsheet as expected.

What went wrong was that the server didn't understand the special
SW header, but the submitting user had the right to modify the server
content, so the results were not what was intended. Rather than adding
some metadata to the description of the resource, the entire primary
representation of the resource was replaced by the partial description.
And the server returned a status indicating success, so the user thinks
all is well, until folks start complaining that they can't get the
representation they expect, etc.

With MPUT (and MDELETE), it's clear whether the request is understood,
the semantics of the request are clear, and the behavior expected of
the server is clear.
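
Again, purely as a hypothetical sketch: the agent intends only to add
a few statements about the resource. Sent as MPUT, the request either
does exactly that (on a URIQA-aware server) or fails with an
unimplemented-method error; it can never silently replace the
spreadsheet the way the PUT above did.

  from http.client import HTTPConnection

  # A few statements to *add* to the description of
  # http://example.com/blargh (the property values are made up).
  rdf = b"""<?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://example.com/blargh">
      <dc:title>Quarterly figures</dc:title>
    </rdf:Description>
  </rdf:RDF>"""

  conn = HTTPConnection("example.com")
  conn.request("MPUT", "/blargh", body=rdf,
               headers={"Content-Type": "application/rdf+xml"})
  status = conn.getresponse().status  # 2xx: statements added; 501: not SW-enabled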

--

In short, deploying the SW using the present web architecture is a
good thing, but that does not mean that the present Web architecture
as it stands is sufficient. Nor does adding distinct SW methods to
HTTP mean that one is *not* deploying the SW using the present web
architecture.

The SW is not, IMO, a layer above the web architecture, but a parallel
dimension that has substantial (though not perfect) intersection with
the web architecture. And this is because the needs and functionality
of the SW are distinct from the needs and functionality of the web in
several areas, and those distinctions are reflected in the special
methods proposed and introduced by the URIQA model.

> As with the rest of the web "correct behaviour" cannot
> be guaranteed, so I can't see this as justification for something that 
> would
> require retooling the web.

No, no, no! Not *re*tooling the *web*. But tooling the SW for the
first time!

Does the installation of a WebDAV server require you to re-tool the
entire web? No. Nor does the addition of support for SW methods require
any change to existing web server behavior.

> Things could perhaps be kept relatively tidy by
> taking advantage of the "application/rdf+xml" mimetype - in the 
> bootstrap
> period at least this is comparatively virgin territory.
>

As example use case 1 above shows, this is not sufficient. Just because
you ask for, and get, RDF/XML does *not* guarantee that the server
actually understood that you wanted an authoritative concise bounded
description rather than just a representation in RDF/XML.

Extra headers simply *do not work*. I speak from long and painful
experience.

The SW architecture should be more than just a bunch of web hacks.

Regards,

Patrick


> Cheers,
> Danny.
>
>
>
