Re: URIQA from Patrick Stickler on 2003-07-02 (www-tag@w3.org from July 2003)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 2 Jul 2003 14:20:44 +0300
To: "ext Paul Prescod" <paul@prescod.net>
Cc: <www-tag@w3.org>
Message-ID: <003d01c3408b$fe5b6460$e00ea20a@NOE.Nokia.com>
----- Original Message -----
From: "ext Paul Prescod" <paul@prescod.net>
To: "Patrick Stickler" <patrick.stickler@nokia.com>
Cc: <www-tag@w3.org>
Sent: 02 July, 2003 02:30
Subject: Re: URIQA


> Patrick Stickler wrote:
> >
> > ...
> >
> > This works for GET, but not for PUT or DELETE.
> >
> > An early stage of URIQA development actually defined special MIME types
> > for concise bounded descriptions (in an attempt to try to accomplish what
> > was needed without any extensions to the present Web architecture) -- but
> > ambiguity arises when performing e.g. a PUT because the behavior of the
> > server differs depending on whether the input is a
> > representation or description.
>
> I'm not sure that the definition of PUT is that clear. Let's say you
> have a resource representing a "white paper" with representations in
> XML, HTML and PDF. Does a PUT to PDF necessarily obliterate the XML? Or
> does it just replace the PDF rendition?

Ultimately, it's up to the server to determine which representation, if
any, or all, is replaced by a new representation being PUT onto the server.
In that regard, you are correct that it is somewhat unclear what a given
server might do with regard to multiple representations of a resource.

But that is beside the point (or perhaps precisely the point of URIQA ;-)

The key issue is that when PUTting knowledge, as opposed to a
representation, one is adding to a single body of knowledge, not entirely
replacing that body of knowledge with the input.

The fact that one might also interact with a description of a resource as
a kind of representation, using traditional Web methods, is simply an added
extra, but not central to the fundamental SW behavior defined by URIQA.

So whether the server supports conneg or not, whether the server is able to
maintain multiple representations or not, if the server is dealing with
representations, the server is *monolithically* replacing one
representation with another.

It's the issue of monolithic replacement versus modification that
matters here.

> The safest thing is to have a separate URI for the PDF rendition and PUT
> that.

I agree. But again, I think you are missing the essential problem. I'll
try to be clearer.

Let's say we use PUT to update entire descriptions of resources managed
individually as RDF/XML instances -- where the entire body of knowledge
known about that resource is contained in that RDF/XML instance.

Let's use as the URI of the resource http://example.com/someResource
and of the RDF/XML instance containing the concise bounded description
of that resource http://example.com/someResource.rdf

We also define a special MIME type application/rdf+xml+uriqa corresponding
to a URIQA Concise Bounded Description encoded in RDF/XML.

So, any of the following should allow us to store a new revision of the
complete description contained in an RDF/XML instance.

PUT http://example.com/someResource.rdf HTTP/1.1

PUT http://example.com/someResource.rdf HTTP/1.1
Content-Type: application/rdf+xml

PUT http://example.com/someResource.rdf HTTP/1.1
Content-Type: application/rdf+xml+uriqa

and using conneg, with the necessary MIME-type-to-suffix bindings,
the following also accomplish the same:

PUT http://example.com/someResource HTTP/1.1
Content-Type: application/rdf+xml

PUT http://example.com/someResource HTTP/1.1
Content-Type: application/rdf+xml+uriqa

Note that either application/rdf+xml or application/rdf+xml+uriqa is valid
since the input content conforms to both MIME types, the latter being
a specialization of the former, just as many XML encodings with distinct
MIME types are specializations of text/xml and all use the suffix '.xml'.

Since conneg can be used in conjunction with the more general URI denoting
the resource in question rather than its representation, the MIME type
cannot serve as a flag to indicate the shift in behavior between dealing
with representations versus descriptions. Or at best, the conneg model
has to be changed to give special meaning to certain MIME types so
that it doesn't get in the way of the SW behavior (hardly a good idea).

Now, we also have some knowledge about the RDF/XML instance itself
such as the owner, title, creation date, status, etc.

How do we indicate PUTting knowledge about the representation, if that
knowledge is also encoded as the same MIME type as the representation?

I.e. does


PUT http://example.com/someResource.rdf HTTP/1.1
Content-Type: application/rdf+xml+uriqa

mean to completely replace the presently stored representation with a
new version or to update the body of knowledge about the representation
with the statements in the input? The server can't know. It's completely
ambiguous. What if our input only adds a single statement to the total
description of the RDF/XML instance? We'd end up losing all the other
knowledge about that instance!
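
To make the ambiguity concrete, here is a minimal Python sketch. It is purely
illustrative: modelling a description as a set of (subject, predicate, object)
tuples, the property names, the values, and both function names are
assumptions made for this example and are not taken from the URIQA spec. It
applies the same single-statement input to the same stored description under
the two possible readings of the PUT.

```python
# Illustrative only: a description is modelled as a set of
# (subject, predicate, object) statements. All names are hypothetical.
DOC = "http://example.com/someResource.rdf"

stored = {
    (DOC, "owner", "example-owner"),
    (DOC, "title", "example-title"),
    (DOC, "status", "example-status"),
}

# Input that adds one statement about the RDF/XML instance.
incoming = {(DOC, "created", "example-date")}


def put_as_representation(current, new):
    """Reading 1: monolithic replacement; whatever was stored is discarded."""
    return set(new)


def put_as_description(current, new):
    """Reading 2: the input is added to the existing body of knowledge."""
    return set(current) | set(new)


replaced = put_as_representation(stored, incoming)
merged = put_as_description(stored, incoming)

print(len(replaced))  # 1: owner, title and status are gone
print(len(merged))    # 4: the new statement joins the existing three
```

The request on the wire is identical in both cases; only the server's
interpretation differs, which is exactly what the MIME type alone cannot
signal.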

Now, you might say, just use yet another URI to denote the description
about the representation which is a description about the resource. E.g.
http://example.com/someResource.rdf.rdf

A major problem with that (and there are several others) is it still
precludes partial modification of any given description and forces an
application to first (a) lock the body of knowledge, (b) check out the
full body of knowledge about a resource, (c) modify the body of knowledge
accordingly, (d) commit the new complete body of knowledge to the server,
and (e) unlock the body of knowledge.
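
As a rough sketch of what steps (a) through (e) impose on a client, consider
the following Python. The MonolithicStore class and all of its method names
are hypothetical, invented only to illustrate the workflow; no real server
API is implied.

```python
import threading


class MonolithicStore:
    """Hypothetical store that only hands out and accepts whole descriptions."""

    def __init__(self, statements):
        self._statements = set(statements)
        self._lock = threading.Lock()

    def lock(self):
        self._lock.acquire()

    def unlock(self):
        self._lock.release()

    def check_out(self):
        # Returns a copy of the complete body of knowledge.
        return set(self._statements)

    def commit(self, statements):
        # Replaces the complete body of knowledge.
        self._statements = set(statements)


def add_one_statement(store, statement):
    """Add a single statement using the lock/checkout/modify/commit/unlock cycle."""
    store.lock()                      # (a) lock
    try:
        body = store.check_out()      # (b) check out everything
        body.add(statement)           # (c) modify locally
        store.commit(body)            # (d) commit everything back
    finally:
        store.unlock()                # (e) unlock
    # Number of statements the client had to handle to add just one.
    return len(body)
```

Even in this toy form, the client must hold and resend the entire body of
knowledge to add one statement, and it must be allowed to see all of it.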

While this might work for some trivial scenarios, there are many where it
does not work, particularly when access to the complete body of
knowledge is multileveled, where not all users have access to all
knowledge but still must be able to add/modify/delete that portion
of knowledge that they do have access rights to.

Not to mention that it is typical and advisable practice to capture
knowledge about multiple resources in the same RDF/XML instance,
yet no one wants to GET an RDF/XML instance describing thousands
of resources just to get a description of a single resource.

Only by allowing for both resource-specific as well as partial
access/modifications to managed knowledge of resources can anything
even closely resembling a global world wide semantic web of knowledge
interchange succeed.

Monolithic, file based views of knowledge storage and interaction
simply cannot meet the scalability and efficiency needs of the SW,
which is why we need a solution such as URIQA which allows one
to interact with (frequently partial) bodies of knowledge about
specific resources rather than merely files.

Your (and others') suggestion that SW agents interact with knowledge
in terms of monolithic files is just as unworkable as suggesting that folks
interact with RDBMS data in terms of complete databases, or even complete
tables. It just won't work.

With regard to the nature and interaction of content, the fundamental
character of the Web and SW are very different, even if we can get
them to share a common infrastructure and set of resource identifiers,
and extensions to the current Web architecture are necessary in order
to capture and exploit these (complementary) differences effectively.

> As far as DELETE, I don't understand why you would DELETE (as opposed to
> clear) the description of an object. Either the resource exists or it
> does not. If nothing is known about it then it should have an empty
> description. But the description doesn't have to be deleted.

I may want to delete a single particular statement about that resource, but
not want to delete the entire body of knowledge known about that
resource stored in the repository.

It is similar (though not identical) to deleting a particular element of
an XML instance without deleting the entire XML instance. That comparison
is of course grossly imperfect because XML was not designed to allow for
such micromanagement of internal content -- though it can be and is done,
but usually "cheating" with relational databases ;-) and not with files.
XML is all about static structures, and while one certainly can manipulate
subcomponents of an XML instance, one does not easily manage XML encoded
knowledge on an element by element basis, such that PUT and GET are
operating on individual elements of an XML instance. One must deal with
entire XML instances, or at best, fragments.

For RDF, on the other hand, its XML encoding is just a means for
interchange and is a means to an end, that end being an RDF graph via which
one can interact with individual statements or sets of statements about
resources in a highly effective manner, including PUTting, GETting, and
DELETEing subsets of that graph irrespective of any RDF/XML serializations
(files) that might be used to otherwise interchange, archive, or modify
that knowledge.
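
A small Python sketch of statement-level interaction, under loud
assumptions: the graph is modelled as a plain set of tuples, the property
names and values are invented, and selecting statements by subject is a
deliberate simplification of what a concise bounded description actually
contains.

```python
# Illustrative only: an RDF graph modelled as a set of
# (subject, predicate, object) tuples. All names are hypothetical.
RES = "http://example.com/someResource"
OTHER = "http://example.com/otherResource"

graph = {
    (RES, "title", "example-title"),
    (RES, "status", "example-status"),
    (OTHER, "title", "other-title"),
}


def statements_about(g, resource):
    """Select only the statements whose subject is the given resource."""
    return {st for st in g if st[0] == resource}


def delete_statements(g, statements):
    """Remove just the given statements; everything else is left untouched."""
    return g - set(statements)


about_res = statements_about(graph, RES)
graph_after = delete_statements(graph, {(RES, "status", "example-status")})
```

No file is fetched, edited, or rewritten here: one statement is removed and
the rest of the knowledge about the resource, and about other resources,
stays in place.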

The SW needs to be able to operate in terms of bodies of knowledge, not
files, and SW server behavior must provide an efficient means of working
with bodies of knowledge rather than files.

And SW agents should be able to consistently interact with bodies of
knowledge irrespective of how that knowledge is maintained on a given
server. Those descriptions *might* be managed as monolithic RDF/XML
instances using GET and PUT to edit them. But they might also be managed
via a proprietary database interface specific to that server. The agent should
not have to worry about that. It shouldn't have to know how each server
stores its descriptions and have to be able to GET, modify, and PUT those
server-specific representations in order to interact with resource
descriptions.

>  >...
> > To do this, the best solution that I've been able to come up with which
> > requires the least modification to the existing Web architecture is a
> > single header (serving as a flag) which allows us to differentiate between
> > dealing with representations from dealing with descriptions.
>
> This is actually quite a big change to web architecture.

Err. It seems to follow the most politically correct and recommended method
of extending the present web architecture, specifically by using headers. In
fact, the only reason it uses headers is because of all the heat and
resistance I got to my proposals based on a new set of methods. So much for
trying to be politically correct...

Still, the extensions needed for SW behavior are far more
important and far reaching than most application-specific extensions, and to
that end, should be accommodated in ways that would not normally be
encouraged for all extensions.

> It will confuse
> vast amounts of technology like caches.

Eh? Do caches discard request headers? If so, then you may have a point.

In fact, if caching discards the header distinguishing a request relating
to a representation from a request relating to a description, then I would
consider the header approach presently taken by URIQA to be unworkable. Note
that that does *not* mean URIQA is unworkable. The URIQA model is not just
the addition of a new header, but a model for SW enabled server behavior,
and the "flag" by which the server is triggered to process a request in
terms of descriptions rather than representations is a minor point to the
overall model.

My preferred solution has long been to use three new methods, MGET, MPUT,
and MDELETE, rather than the header approach, where the 'M' prefix serves
the same role as the header, acting as a flag to indicate the SW behavior of
the operation.

I.e. the following would be semantically equivalent:

MGET     =  GET    + URI-Resolution-Mode: Description
MPUT     =  PUT    + URI-Resolution-Mode: Description
MDELETE  =  DELETE + URI-Resolution-Mode: Description
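
The equivalence can be written down as a small translation function. This
is only a sketch: the function name and the lookup table are hypothetical,
while the header name and value are the ones given above.

```python
# Header name and value as given in the equivalences above.
URIQA_HEADER = "URI-Resolution-Mode"
M_METHODS = {"MGET": "GET", "MPUT": "PUT", "MDELETE": "DELETE"}


def to_header_form(method, headers=None):
    """Rewrite an M* request as the standard method plus the URIQA header.

    Requests using any other method are returned unchanged.
    """
    headers = dict(headers or {})  # copy so the caller's dict is not mutated
    if method in M_METHODS:
        headers[URIQA_HEADER] = "Description"
        return M_METHODS[method], headers
    return method, headers
```

For example, to_header_form("MPUT") yields the method "PUT" together with
the header URI-Resolution-Mode: Description, while an ordinary PUT passes
through untouched.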

The key benefit to these new methods over the header approach is that a
client has more reliable feedback whether the server is or is not SW
enabled. E.g. if a server doesn't understand the MPUT method, it barfs.
Whereas if it doesn't understand the URI-Resolution-Mode: Description header
sent along with PUT, it might completely replace a representation with the
description input rather than update the description of the resource. And
even though the URIQA spec requires a SW enabled server to return a header
indicating that it understood the request in SW terms, realization of the
erroneous action of a non-SW enabled server only comes after the act, and
there still remains the question of whether the server actually is SW
enabled, did the right thing, but simply failed to include the header in
the response indicating that all is well.
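
The difference in failure modes can be simulated with a toy server that
knows nothing about URIQA. Everything here is an assumption made for
illustration: the LegacyServer class, its simplified status codes, and the
request bodies. The one behavior borrowed from HTTP/1.1 itself is that an
unrecognized method is answered with 501 (Not Implemented).

```python
class LegacyServer:
    """Hypothetical server with no knowledge of URIQA (status codes simplified)."""

    def __init__(self):
        self.representations = {}

    def handle(self, method, uri, headers=None, body=None):
        # Request headers this server does not know are silently ignored.
        if method == "PUT":
            self.representations[uri] = body  # monolithic replacement
            return 200
        if method == "GET":
            return 200 if uri in self.representations else 404
        if method == "DELETE":
            removed = self.representations.pop(uri, None)
            return 200 if removed is not None else 404
        return 501  # unrecognized method, e.g. MPUT


URI = "http://example.com/someResource"
server = LegacyServer()
server.handle("PUT", URI, body="<html>the white paper</html>")

# New-method form: the request is rejected and nothing is touched.
status_mput = server.handle("MPUT", URI, body="<rdf:RDF>one statement</rdf:RDF>")
after_mput = server.representations[URI]

# Header form: the request "succeeds", but the representation is overwritten.
status_put = server.handle(
    "PUT",
    URI,
    headers={"URI-Resolution-Mode": "Description"},
    body="<rdf:RDF>one statement</rdf:RDF>",
)
after_put = server.representations[URI]
```

With MPUT the client learns immediately that the server is not SW enabled
and the stored representation survives. With PUT plus the header, the
client gets a success status and the damage is already done.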

The header approach is not an example of optimal engineering design, and
IMO seems more like a hack than a proper extension of the Web architecture.
But it does work.

If certain folks weren't so thoroughly opposed to new methods, I would have
adopted the M* methods for URIQA. The header approach is an entirely
political compromise.

I've come to terms with the header approach based on the view that anyone
who has the right to PUT to and DELETE from a server is known to the server
and typically bound to certain usage constraints, and also will know the
server and whether it is or is not SW enabled, so in practice it should not
be a huge problem, just an inconvenience and an ugly aspect of the header
based solution.

Do caches also discard the request method? If not, then IMO that would
constitute a deciding argument in favor of the use of the new methods rather
than the header approach.

If caches discard both headers and the request method, then clearly caching
will be a major obstacle to tight integration of Web and SW behavior which
will have to be addressed. And if that is the case, then I think that
storing the method of the request would be a much smaller and more elegant
fix than having to store all (or even worse, specific) headers.

> > ...
> > I agree, and if you look at the behavior of the reference implementation
> > of URIQA (and this is also noted in the URIQA spec) all concise bounded
> > descriptions are distinct resources in their own right, and are given
> > distinct URIs.
>
> Then that's where you should PUT.

No. This still does not allow for interaction with subsets of knowledge.

A description/representation unique URI also doesn't work for GET, by
itself, because a client won't know what URI denotes a particular
description until it is returned.

If you don't know what the URI of the description is, how can you GET it?

If all I have is a URI http://example.com/someResource that denotes, er,
some resource, and I want a description of it, how do I find out what the
URI is that denotes the description of that resource? And should I need to?

A key assertion at the heart of the URIQA model is the following:

     A resource is denoted by a URI, and that URI should be all that a
     client needs to obtain either a representation or a description of
     that resource, in a single server request.

A client should not have to first execute a HEAD or GET to obtain the URI
of a description, in order to then do a subsequent GET to obtain that
description.

This imposes a two-step process for the most fundamental operation on the SW
and makes SW agents second class citizens of the Web. The Web/SW
architecture, if it is to share a common core/foundation, must provide for
one-step access to resource descriptions.

--

My own view of how the Web and SW architectures interrelate is as follows:

The traditional Web can be seen as a set of representations, interrelated by
resource references.

The SW can be seen as a set of descriptions, interrelated by resource
references.

Their intersection is the set of URIs which denote resources, for which
there are both representations and descriptions.

HTTP+URIQA provides a standardized means to provide global access to both
representations and descriptions using a common infrastructure based solely
on URIs based on a URI scheme that is meaningful to HTTP servers.

To that end, the complementary methods GET/MGET, PUT/MPUT, and
DELETE/MDELETE more clearly and elegantly reflect how the same protocol
can be used to navigate the Web and SW respectively. The header approach
tends to hide this fundamental distinction.

> > This also allows one to use PUT/DELETE to interact with those distinct
> > representations, including the use of conneg, in a traditional fashion.
>
> So you've solved the problem without the need for those headers!

No. I haven't. As the above explanations hopefully now make clear,
PUT/DELETE on their own do not provide a means to interchange/manage bodies
of knowledge about a resource which may be subsets of the complete body of
knowledge about that resource maintained by a given server. We need
something extra to do that.

Patrick

patrick.stickler@nokia.com
Received on Wednesday, 2 July 2003 07:21:11 UTC