RE: working around the identity crisis from Patrick.Stickler@nokia.com on 2004-11-23 (www-rdf-interest@w3.org from November 2004)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 23 Nov 2004 11:03:12 +0200
To: <dirkx@webweaving.org>
Cc: <A.J.Miles@rl.ac.uk>, <www-rdf-interest@w3.org>, <public-esw-thes@w3.org>
Message-ID: <1E4A0AC134884349A21955574A90A7A50ADDBB@trebe051.ntc.nokia.com>
> -----Original Message-----
> From: ext Dirk-Willem van Gulik [mailto:dirkx@webweaving.org]
> Sent: 23 November, 2004 00:48
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: A.J.Miles@rl.ac.uk; www-rdf-interest@w3.org; public-esw-thes@w3.org
> Subject: RE: working around the identity crisis
> 
> On Fri, 19 Nov 2004 Patrick.Stickler@nokia.com wrote:
> 
> > If you're going to restrict each term to being
> > defined by a single document, where that document
> > only describes one term, then why not just
> > have a URI for the document and URI for the term
> > and use conneg to relate them. E.g.
> >
> > http://my.org/knowlegebase/chemistry/water      the concept
> > http://my.org/knowlegebase/chemistry/water.html HTML representation
> > http://my.org/knowlegebase/chemistry/water.rdf  RDF representation
> > http://my.org/knowlegebase/chemistry/water.txt  text representation
> > http://my.org/knowlegebase/chemistry/water.jpg  JPG representation
> 
> Of course if you are -this- http inclined (and you must have a
> protocol like ftp or http which at the very least have some
> assumptions on the hierarchy - which may not be warranted, say,
> for a z39.50 URI) then note that you can also use:
> 
> 	http://my.org/knowlegebase/chemistry/water
> 	Accept: text/plain
> ...
> and/or use a multiple choice response as a server.

Sure. I was merely trying to reflect a set of URIs, not
how one might obtain the same representation via different
HTTP requests.

Certainly you can ask for particular variant representations
of e.g. the concept resource via the URI identifying the concept
resource -- but you still need distinct URIs for each
representation if you want to talk about the representation
distinctly from the concept resource.

Thus, both of the requests

   GET /knowlegebase/chemistry/water
   Host: my.org
   Accept: text/plain

and

   GET /knowlegebase/chemistry/water.txt
   Host: my.org

would presumably return the same representation (in the
latter case, since the request is for the actual representation,
the representation returned in the response would constitute a 
bit-equal copy of the representation resource).

The response for the first request, specifying the concept and
utilizing conneg to indicate the variant representation preferred, 
would presumably include the header

   Content-Location: http://my.org/knowlegebase/chemistry/water.txt

which tells you which representation of the concept
http://my.org/knowlegebase/chemistry/water you have been
provided in the response.

(it's unfortunate, albeit historically understandable, that the header 
which identifies the representation returned is called 'Content-Location:'
rather than e.g. 'Representation-Identifier:' or similar, but I digress...)
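
As a rough illustration of the exchange described above, here is a minimal
Python sketch of a server-side handler. It is not how any particular server
does it: the function names (parse_accept, respond), the variant table, the
media type chosen for each suffix, the HTML default for "*/*" and the
simplified Accept parsing (type wildcards such as text/* are ignored) are all
my own assumptions for the sake of the example.

```python
# Hypothetical variant table for one concept: media type -> representation URI.
CONCEPT = "http://my.org/knowlegebase/chemistry/water"
VARIANTS = {
    "text/html": CONCEPT + ".html",
    "application/rdf+xml": CONCEPT + ".rdf",
    "text/plain": CONCEPT + ".txt",
    "image/jpeg": CONCEPT + ".jpg",
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, parts[0]))
    prefs.sort(key=lambda p: -p[0])  # stable sort keeps header order on ties
    return [media_type for q, media_type in prefs if q > 0]


def respond(request_uri, accept="*/*"):
    """Return (status, headers) for a GET on request_uri."""
    # A request for a representation URI needs no negotiation.
    for media_type, rep_uri in VARIANTS.items():
        if request_uri == rep_uri:
            return 200, {"Content-Type": media_type,
                         "Content-Location": rep_uri}
    if request_uri != CONCEPT:
        return 404, {}
    # A request for the concept URI: pick a variant and say which one it is.
    for wanted in parse_accept(accept):
        if wanted == "*/*":
            wanted = "text/html"  # assumed default variant
        if wanted in VARIANTS:
            return 200, {"Content-Type": wanted,
                         "Content-Location": VARIANTS[wanted],
                         "Vary": "Accept"}
    return 406, {}


print(respond(CONCEPT, "text/plain"))
print(respond(CONCEPT + ".txt"))
```

Both calls name the same representation URI in Content-Location, which is the
point: the concept URI and the representation URI stay distinct even when they
yield the same bytes.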

Thus, there is no ambiguity, as all resources are provided distinct URIs,
and clients are free to refer to resources at whatever resolution they like. 

E.g.

http://my.org/knowlegebase   [1]                  the entire knowledgebase [2][3]
http://my.org/knowlegebase.html                   HTML rep. of knowledgebase
http://my.org/knowlegebase.rdf                    RDF rep. of knowledgebase
...
http://my.org/knowlegebase/chemistry              the concept 'chemistry'
http://my.org/knowlegebase/chemistry.html         HTML rep. of concept 'chemistry'
...
http://my.org/knowlegebase/chemistry/water        the concept 'water'
http://my.org/knowlegebase/chemistry/water.html   HTML rep of concept 'water'
http://my.org/knowlegebase/chemistry/water.rdf    RDF rep of concept 'water'
http://my.org/knowlegebase/chemistry/water.txt    text rep of concept 'water'
http://my.org/knowlegebase/chemistry/water.jpg    JPG rep of concept 'water'
...

etc.

[1] Note that the above example set of URIs need not correspond to an
actual filesystem; rather, all requests beginning with the URI prefix
http://my.org/knowlegebase could be handled by a specialized, automated
web service portal which maps request URIs to representations based on an
RDF-savvy database and autogenerates all representations on the fly
(that's how I'd do it).

[2] Note that the representations of the knowledgebase and higher level
concepts could be huge, if representations are "exhaustive" in conveying the
state/substance/definition of the resource. Though, it may also be the case 
that the representation at each 'level' in the conceptual hierarchy would
merely refer to its immediate super- and subconcepts, and thus be fairly
concise -- the HTML representation providing links, the RDF representation 
simply referring by URI, etc. -- with clients able to traverse the
knowledgebase via subsequent requests about related/referenced resources.

[3] Note that it is useful to make a distinction between a "traditional" 
representation, which likely would convey an information resource
in its entirety, and a description, which will provide more concise
information about the resource itself, regardless of the type of resource
and, for information resources, regardless of the substance of that
resource.

Thus, the response provided to the request

   GET /knowlegebase
   Host: my.org

is likely to be much larger than for

   MGET /knowlegebase
   Host: my.org

Also, the description provided in response to the latter
request may very well include a lot of information which is not
included in the representation provided in response to the 
first request, as the purposes of the two responses are
functionally quite distinct.
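
To make the contrast concrete, here is a small Python sketch which only
constructs the two requests and sends nothing. MGET is not a standard HTTP
method, so a server would have to be built to understand it; the helper name
build_requests is made up for this example.

```python
from urllib.request import Request


def build_requests(uri):
    """Return (representation_request, description_request) for one URI."""
    representation = Request(uri, method="GET")
    # MGET is the non-standard method used in the message above; it asks for
    # a description of the resource rather than a representation of it.
    description = Request(uri, method="MGET")
    return representation, description


rep, desc = build_requests("http://my.org/knowlegebase")
print(rep.get_method(), rep.full_url)
print(desc.get_method(), desc.full_url)
```

The request URI is identical in both cases; only the method differs, which is
what lets one URI yield either the representation or the description.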

--

Whew!

That said, it's also, I think, important to point out that the following
naming schemes would be just as architecturally valid, and in practice 
would provide just as workable a solution (apart from human users probably
feeling more comfortable with mnemonic URIs -- though I think a lot of
users simply won't care one way or another):

http://my.org/0           the entire knowledgebase
http://my.org/0/b         HTML rep. of knowledgebase
http://my.org/0/d         RDF rep. of knowledgebase
http://my.org/0/13        the concept 'chemistry'
http://my.org/0/13/b      HTML rep. of concept 'chemistry'
http://my.org/0/13/43     the concept 'water'
http://my.org/0/13/43/b   HTML rep of concept 'water'
http://my.org/0/13/43/d   RDF rep of concept 'water'
http://my.org/0/13/43/a   text rep of concept 'water'
http://my.org/0/13/43/w   JPEG rep of concept 'water'

or 

http://my.org/cf756bb7-cec1-48ca-b82c-5cee11e23987 the entire knowledgebase 
http://my.org/44a98ff0-8017-4eaf-a091-43640dd340b0 HTML rep. of knowledgebase
http://my.org/a8ab91ff-2d62-4112-8443-b6fcf219d553 RDF rep. of knowledgebase
http://my.org/dae7c7a2-8aea-4014-b191-52c2d2fd934f the concept 'chemistry'
http://my.org/c3644aae-1751-43c6-bfad-010b40b7d3ce HTML rep. of concept 'chemistry'
http://my.org/5120e80e-d1cb-4865-9b9b-6f1016444add the concept 'water'
http://my.org/a25c1e45-6507-41fa-98f7-4da7aa54b3af HTML rep of concept 'water'
http://my.org/07aee8cf-a176-45aa-b574-d66a84b1703a RDF rep of concept 'water'
http://my.org/a9221908-0df0-49b6-a164-e38fe53fe383 text rep of concept 'water'
http://my.org/78fa28b3-aab7-4551-b9b0-99e28fa87ecf JPEG rep of concept 'water'


I.e. it is *entirely* up to the publisher to decide which URIs will be
used to identify which resources, and whether or not relationships between
those resources should be reflected lexically in the URIs. 

The key is that there is no ambiguity/overloading of URIs, and when known,
the distinct URI of each representation should be indicated in the response.

Web clients should not presume anything about the identity of the resources
or their relationships based on the lexical nature of the URIs, and even
if the client is aware of the proprietary URI management methodologies used
to construct URIs, it is unwise and IMO poor engineering to base client
behavior on the lexical structure of those URIs. Any client that does so
will be fragile and inherently non-portable/scalable to other applications.

A web client that is properly designed should be able to interact with
concept representations and descriptions per all of the above examples 
regardless of what proprietary naming methodology is employed by the
publisher.
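
A small Python sketch of the difference, using the 'water' URIs from the
three naming schemes above. Everything else is assumed for illustration: the
in-memory tables stand in for three publishers, fake_get stands in for a
real HTTP GET with conneg, and the media types assigned to the RDF and text
variants are my choice.

```python
RDF, TEXT = "application/rdf+xml", "text/plain"

# One table per naming scheme: concept URI -> {media type: representation URI}.
MNEMONIC = {"http://my.org/knowlegebase/chemistry/water": {
    RDF: "http://my.org/knowlegebase/chemistry/water.rdf",
    TEXT: "http://my.org/knowlegebase/chemistry/water.txt"}}
NUMERIC = {"http://my.org/0/13/43": {
    RDF: "http://my.org/0/13/43/d",
    TEXT: "http://my.org/0/13/43/a"}}
OPAQUE = {"http://my.org/5120e80e-d1cb-4865-9b9b-6f1016444add": {
    RDF: "http://my.org/07aee8cf-a176-45aa-b574-d66a84b1703a",
    TEXT: "http://my.org/a9221908-0df0-49b6-a164-e38fe53fe383"}}


def fake_get(site, uri, accept):
    """Stand-in for an HTTP GET with conneg; returns headers or None."""
    rep_uri = site.get(uri, {}).get(accept)
    if rep_uri is None:
        return None
    return {"Content-Type": accept, "Content-Location": rep_uri}


def robust_rep_uri(site, concept_uri, media_type):
    """Learn the representation URI from what the server says."""
    headers = fake_get(site, concept_uri, media_type)
    return headers["Content-Location"] if headers else None


def fragile_rep_uri(concept_uri, media_type):
    """Guess the representation URI from the lexical form of the concept URI."""
    return concept_uri + {RDF: ".rdf", TEXT: ".txt"}[media_type]


for site in (MNEMONIC, NUMERIC, OPAQUE):
    concept = next(iter(site))
    told = robust_rep_uri(site, concept, RDF)
    guessed = fragile_rep_uri(concept, RDF)
    print(concept, "guess correct:", guessed == told)
```

The guessing client happens to be right only for the mnemonic scheme; the
client that reads Content-Location works unchanged for all three.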

Cheers,

Patrick
Received on Tuesday, 23 November 2004 09:04:10 UTC