RE: Valid representations, canonical representations, and what the SW needs from the Web... from Patrick.Stickler@nokia.com on 2003-02-04 (www-tag@w3.org from February 2003)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 4 Feb 2003 13:02:02 +0200
To: <paul@prescod.net>, <sandro@w3.org>, <www-tag@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBB03@trebe006.europe.nokia.com>
> > Google or a few well-run directory services would provide
> > documentation links, and could actually lead to better updated &
> > maintained documentation after the term-coiner has lost interest
> > (and his domain name :-)
> 
> Sure, we could depend on registry services if we want the Semantic
> Web to be a centralized rather than decentralized system. Personally,
> I think that that would be to misunderstand the very strength of the
> Web.

I agree with Paul here.

I think that what is needed, as I suggested before, is to extend
the Web architecture to explicitly differentiate between access
to representations of resources and access to descriptions of
resources, as these are very different things. Specifically, to
add to HTTP something akin to the following:


New Methods:

MGET     Retrieve RDF/XML instance containing all information 
         known about resource

MPUT     Add all statements expressed in the input RDF/XML instance 
         to resource metadata

MDELETE  Delete all statements expressed in the input RDF/XML instance 
         from resource metadata

New Response Codes:

600  Unknown resource
601  No information available about the known resource
602  RDF/XML instance containing all information known about resource
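
A minimal sketch of how a server might model the methods and response
codes listed above, kept entirely in memory. The MetadataStore class,
the use of (predicate, object) pairs in place of a real RDF/XML
instance, and the idea that the server tracks "known" URIs separately
from their statements are all assumptions made for illustration; the
proposal itself does not specify any of them.

```python
# Status codes as proposed in this message.
UNKNOWN_RESOURCE = 600
NO_INFORMATION = 601
KNOWLEDGE_RETURNED = 602


class MetadataStore:
    """Hypothetical in-memory model of a server's resource metadata."""

    def __init__(self, known=()):
        # Assumption: the server can know a URI without holding any
        # statements about it, so each known URI starts with an empty set.
        self.statements = {uri: set() for uri in known}

    def mput(self, uri, statements):
        """Add (predicate, object) pairs to the metadata of uri."""
        self.statements.setdefault(uri, set()).update(statements)

    def mdelete(self, uri, statements):
        """Remove the given pairs from the metadata of uri, if present."""
        self.statements.get(uri, set()).difference_update(statements)

    def mget(self, uri):
        """Return (status, sorted statements) for uri."""
        if uri not in self.statements:
            return UNKNOWN_RESOURCE, []
        found = sorted(self.statements[uri])
        if not found:
            return NO_INFORMATION, []
        return KNOWLEDGE_RETURNED, found
```

What MPUT and MDELETE themselves should return is not stated above, so
the sketch leaves those methods without a status.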


Knowledge about a resource is not a representation of that
resource (though this is very hard to defend given the vague
and informal definition of what a representation actually is
and can be in terms of the inherent qualities of the resource)
so one must be able to access this knowledge by some means
other than HTTP GET/PUT/POST, all of which deal specifically
with representations.

We can agree that an http: URI denotes a resource, any resource,
whether or not it has any web-accessible representations. 

The HTTP/REST interpretation of that http: URI per GET/PUT/POST 
is to interact with and manipulate one or more representations 
of the resource. 

The (proposed) HTTP/SW interpretation of that URI per the new
MGET/MPUT/MDELETE extensions is to interact with and manipulate
knowledge about the resource. 

In neither case does the denotation of the http: URI differ or 
become ambiguous. It always denotes the resource. We are 
simply defining a means by which HTTP can allow access of
either representations or knowledge, in terms of that resource.

Being able to differentiate between representations of versus
knowledge about a given resource resolves the ambiguity which
impacts the SW application of http: URIs per HTTP/REST as it is
now defined.

Now, some applications may wish to explicitly identify each
representation, or the body of knowledge about a resource, but
again, that does not affect the denotation of the resource
itself by the http: URI in question. In fact, obtaining first
the knowledge about the resource may very well help the
application determine *which* representation it prefers and
enable it to ask for that representation by name per standard
content negotiation mechanisms. It could also tell it which,
if any, representation can be considered canonical, in the
case of digital resources for which bit-equal copies can be
obtained.
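
As a rough illustration of that last point, an agent could inspect the
knowledge it obtained and turn it into an ordinary content negotiation
request. The predicate name ex:hasFormat and the function
accept_header_for are made up for this sketch; no such vocabulary is
defined by the proposal.

```python
def accept_header_for(statements, preferences):
    """Pick the first preferred media type that the metadata says exists.

    statements  -- iterable of (predicate, object) pairs about one resource
    preferences -- media types in the agent's order of preference
    """
    # ex:hasFormat is a hypothetical predicate used only in this sketch.
    available = {obj for pred, obj in statements if pred == "ex:hasFormat"}
    for media_type in preferences:
        if media_type in available:
            return {"Accept": media_type}
    # Nothing matched: send no Accept header and let the server decide.
    return {}
```

The returned dictionary would then be sent as headers on a normal GET
for the same URI.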

Thus:
                     a resource
                          ^
                          |
                      (denotes)
                          |
                          |
                http://example.com/foo
                    |            |
                    |            |
                    |    (Web Interpretation)
                    |            |
            (SW Interpretation)  |
                    |            |
                    |            v
                    |     [Representation+]?
                    v
               [RDF Graph]? 
                 

And either the SW interpretation or Web interpretation would be
optional -- as there might exist representations without metadata
and metadata without representations.

A 404 response to GET simply means that no representations are 
available.

A 601 response to MGET would mean that no knowledge is available.

The above solution also clarifies the role of fragment IDs per
either Web or SW interpretations. Under the Web interpretation,
a fragment ID denotes a subcomponent of the resource denoted by
the URI and, in terms of Web access, serves as an internal addressing
mechanism specific to the MIME encoding of a particular representation.
Under the SW interpretation, a fragment ID denotes a subcomponent of the
resource denoted by the URI, with the nature of that subcomponent and
its actual relation to the superordinate resource described in
the metadata associated with and accessible in terms of that URIref.

Thus, if http://example.com/myBook denotes a book (a work) and
http://example.com/myBook#chapter1 denotes a logical subcomponent
of that book (a chapter), then the Web interpretation of GET for
http://example.com/myBook#chapter1 is to obtain a representation
per http://example.com/myBook and for the requesting agent to focus
on the internal component of the representation identified by
#chapter1, per the MIME type of the representation. The SW
interpretation of MGET for http://example.com/myBook#chapter1
is to return an RDF/XML instance containing all statements
where http://example.com/myBook#chapter1 is the subject.
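
The two treatments of the same URIref can be sketched as follows. The
function names and the triple layout (subject, predicate, object) are
assumptions for illustration only; urldefrag is the standard library
helper for splitting off a fragment.

```python
from urllib.parse import urldefrag


def web_get_target(uriref):
    """Web interpretation: split a URIref for a GET.

    Returns (uri, fragment). Only uri goes to the server; the fragment
    stays with the agent, which applies it to the representation it gets.
    """
    uri, fragment = urldefrag(uriref)
    return uri, fragment


def sw_mget_statements(uriref, graph):
    """SW interpretation: the URIref is opaque and matched as a whole.

    graph is an iterable of (subject, predicate, object) triples.
    """
    return [t for t in graph if t[0] == uriref]
```

For example, with a graph holding one statement about the book and one
about the chapter, sw_mget_statements for the #chapter1 URIref returns
only the chapter statement, while web_get_target reduces the same
URIref to the book URI plus "chapter1".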

If we are dealing with resources which will never conceivably
have any representation or constitute an addressable component
in the representation of some superordinate resource, then
there is no need to use a URIref, and all will still work fine.
Thus, whether to use fragment IDs and create URIrefs is left
up to the content owner (of both representation and knowledge)
and the actual extended HTTP architecture functions the same
either way, since in the case of GET, HTTP doesn't actually
concern itself with the fragment ID and in the case of MGET,
all URIs (including URIrefs) are opaque.

Thus any http: URI can denote any resource, and one can interact with
either representations of that resource or knowledge about that
resource, or both, and the machinery for interacting with 
representations and knowledge remains non-centralized, distributed,
and scalable, just as the present Web is.

And current crawlers could be converted to be SW crawlers simply
by changing GET to MGET, and harvesting the returned knowledge,
which would be precise, rather than having to analyze representations
and guess.
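
On the client side the change really is that small. A sketch using
Python's standard urllib, where crawl_request is a hypothetical helper
and MGET is of course only the proposed method, not one any existing
server is known to implement:

```python
from urllib.request import Request


def crawl_request(url, semantic=False):
    """Build the request a crawler would send for url.

    With semantic=True the crawler asks for knowledge about the resource
    (proposed MGET) instead of a representation of it (GET).
    """
    method = "MGET" if semantic else "GET"
    return Request(url, method=method)
```

One caveat for anyone trying this: urllib.request.urlopen treats any
status outside the 2xx range as an error, so a 602 response would
surface as an HTTPError and a real crawler would have to catch it and
read the body from there.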

Caching machinery would possibly need to be extended/optimized to
deal with partial changes to metadata knowledge, but the overall
set of proven web caching methods should apply (or, alternatively,
metadata could simply not be cached).

> ... You can't know whether "http://www.prescod.net" refers to
> "Paul's homepage", "The Prescod Family Homepage", "Paul's business",
> "A set of links endorsed by Paul" etc. unless I tell you. 

Exactly. Not with the present HTTP architecture, at least. 

Yet the above solution would provide you a means to tell us. Just
define that knowledge for the resource on the same server that
provides access to the representations of that resource, and agents
can then inquire about the resource and know exactly what it is
and what the representations portray, and probably which representation,
if any, is optimal for that agent's needs, if it even needs a
representation after getting all the knowledge about the resource
(maybe all the agent wants is knowledge, not representations).

>... we can argue about each and every one to figure out WHAT 
> concept it is about. Is [1]  a web page, a corporation, a kind of car, a 
> family of cars, a family of car companies??? Only the owner of the URI 
> can answer the question. They can only answer it in a 
> machine-processable way with a machine-processable syntax like RDF. So 
> why not put up an RDF document that answers the question? And heck, why 
> not use that RDFs URI to represent the "thing".

Precisely. But doing so in a scalable and intuitive manner requires
extending the functionality of HTTP to explicitly provide for the
needs of SW-relevant knowledge independent of representations.

It must be possible to define and provide that RDF without calling it
a representation, which it is not.

The above solution, by extending HTTP, allows existing Web
applications to remain unchanged, allows the current REST concepts
of resource and representation to remain unchanged, and yet allows
SW agents to obtain the knowledge they need about resources using
the very same URIs that denote those resources, without being 
confused by the Web-specific layer of representations.

The URI is the point of intersection between the Web and Semantic
Web. A URI denotes a resource (any arbitrary resource). The Web provides 
access to representations. The SW provides access to knowledge.
The extended HTTP design addresses both equally well, without
confusion or conflict, in a globally distributed, non-centralized,
scalable manner.

Problem solved. Back to the fun stuff...

Patrick

--
Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com
Received on Tuesday, 4 February 2003 06:02:08 UTC