Re: URI: Name or Network Location? from Benjamin Nowack on 2004-02-19 (www-rdf-interest@w3.org from February 2004)

From: Benjamin Nowack <bnowack@appmosphere.com>
Date: Thu, 19 Feb 2004 17:30:18 +0100
To: Hamish Harvey <david.harvey@bristol.ac.uk>
Cc: www-rdf-interest@w3.org
Message-ID: <PM-EH.20040219173018.A4D13.3.1D@192.168.27.2>

On 19.02.2004 09:37:09, Hamish Harvey wrote:
>
>On Wednesday 18 Feb 2004 8:26 pm, Benjamin Nowack wrote:
>
>> there are lots of approaches (all with their own advantages and
>> disadvantages) that try to allow you to distinguish between the described
>> resource, the rdf representation of the description and maybe an additional
>> human-friendly representation, but at the moment they are still all
>> proprietary..
>
>Have any pointers to these, even just googleable names?
>
>What do you mean by "proprietary" here? That they aren't "standard", be it de 
>facto or de jure? Or that they,
>
>   "protected by trademark or patent or copyright; made or
>           produced or distributed by one having exclusive rights;" (WordNet)?

ah, no. not in the biz sense, I meant the former case (no de facto/jure
standard). you may try searching this list's archive. maybe something along 

"ok no yes never uri uriqa agree disagree why description want not acceptable"

will bring up the related threads ;-)

the discussions always became kind of "political" ("don't convince. persuade!")
very quickly, which may be a sign that a) there is no single perfect solution to
this problem and/or b) people see a business case for their solution and try to
set a de-facto standard. ;-)

I'll try to give a short summary of what I can remember, but it will not even
be close to a good summary as we could probably fill a mailing list of its own
with this stuff. (there surely are some wiki pages at esw.w3.org but most of the
approaches have been discussed on this list as well.)

(the following questions exclude the use case where we have a URIref'd offline
object and would like to provide some kind of representation at that URI. one 
solution to this is [1]. it also excludes WoT issues ("who wrote this" etc.)
and assumes that there is no standard/agreed-on method yet. and I consider
rdf/xml only.)

here we go:
Q1:   given a URIref of a resource, how can we get a representation of that
      resource?
Q2:   given a URIref of a resource, how can we get a description of that
      resource?
Q3:   given a URIref of a resource, when we dereference that URI, how do we
      know if we get a description or a (non-formal) representation?
Q4:   after receiving a description, how do we know if it describes the
      dereferenced URI?
Q5:   given a URIref of a resource, when we dereference that URI, how can
      we be sure that we get a description only?
Q6:   given that representations have a dereferencable URIref on the web,
      how do we make a description of that resource available?
Q7:   the description of a resource is a representation on its own, how
      can we talk about descriptions (= descriptions of descriptions)?

(here, I'm -maybe incorrectly- using "description" for a machine-oriented
doc and "representation" as a rather general word for the document you get 
when you dereference the URI. hope this is ok.)

---------------

re Q1 (getting some representation):
(at this stage we are not interested in what we actually get back.)
1) do a simple HTTP GET on the URI. follow http redirects etc.
2) utilize a cache, caching service, or whatever, if you like
no conflicting opinions on that I guess. 
classical Web approach.
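
a minimal python sketch of the two steps above, under some assumptions: the
cache is just an in-memory dict, and the names get_representation and _cache
are made up for illustration. urllib follows http redirects on its own.

```python
import urllib.request

# hypothetical in-memory cache: URI -> (content type, body)
_cache = {}


def get_representation(uri, opener=urllib.request.urlopen):
    """Plain HTTP GET on the URI, answered from the cache when possible.

    urlopen follows HTTP redirects by default. The opener argument only
    exists so the function can be tried without network access.
    """
    if uri in _cache:
        return _cache[uri]
    with opener(uri) as response:
        result = (response.headers.get("Content-Type"), response.read())
    _cache[uri] = result
    return result
```

at this stage the caller does not look at what comes back; that is left to
the later questions.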

---------------

re Q2 (trying to get a description):
(at this stage we still don't really look at the resulting data.)
1) URIQA approach:
   do an MGET on the URI 

2) client header approach
   do a GET and send an "Accept: application/rdf+xml" header or
   a special type of User-Agent header that the server can detect.

3) rdf autodiscovery approach
   do a GET, extract a <link rel="meta" type="application/rdf+xml" 
   href="" /> (or rel="alternate") tag in the resulting data and 
   dereference this/these URI/URIs.

4) remote registry approach
   query some (central) service that stores descriptions (SOAP/GET/...)

5) local service approach
   query a service hosted at the server of the URIref.

6) local "metadata file" approach
   GET an agreed-on file that carries a description (something similar to 
   robots.txt)

7) URI extension approach
   Add an agreed-on parameter (e.g. ?format=rdfxml) to the URI in question
   and do a GET on the resulting URI.

8) Embedded RDF approach(es)
   do a GET and extract embedded RDF from the resulting data

9) server header approach
   do a GET/HEAD on the URI and look for an agreed-on HTTP header in
   the resulting data, which points to the description(s). do a GET on
   this/these URI/URIs then.

10) offline approach
    buy the book "1000 essential SemWeb addresses" and look up the 
    description via the alphabetically sorted URI index..
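
approaches 1, 2 and 7 differ mainly in how the request is built, so they can
be sketched side by side. this is only an illustration under assumptions: the
function names are made up, "format=rdfxml" is just the example parameter
from approach 7, and nothing is actually sent over the network here.

```python
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RDF_XML = "application/rdf+xml"


def mget_request(uri):
    # approach 1 (URIQA): same URI, but the MGET method instead of GET
    return urllib.request.Request(uri, method="MGET")


def accept_request(uri):
    # approach 2 (client header): a normal GET that asks for rdf/xml
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def extended_uri(uri, param="format", value="rdfxml"):
    # approach 7 (URI extension): append an agreed-on query parameter,
    # keeping any query string the URI already has
    parts = urlsplit(uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

each Request object could then be passed to urllib.request.urlopen; whether
the server understands MGET, the Accept header or the parameter is exactly
what is not standardized yet.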

---------

re Q3 (description or representation)

for any of the approaches above, we can (have to?) check the returned
data for (valid) RDF.
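
a rough python sketch of such a check, with loud caveats: it only tests that
the data is well-formed xml with an rdf:RDF root element, which is much
weaker than validating the RDF (and rdf/xml may legally omit the rdf:RDF
wrapper). the function name looks_like_rdfxml is made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdfxml(data):
    """Return True if data parses as XML and its root element is rdf:RDF."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # not well-formed XML at all (e.g. tag-soup HTML, an image, ...)
        return False
    return root.tag == "{%s}RDF" % RDF_NS
```

a real client would hand the data to a proper RDF parser instead and treat a
parse failure as "this was a (non-formal) representation".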

---------

re Q4 (wanted description)

there is some rdf:about= or rdf:ID="[dereferenced URI]" in the description.
for approach 1, 2, 3, 4, 5, 7, 9 a solution, which makes sure that only 
related rdf is served, _can_ be implemented (or is central part of the 
approach, e.g. 1, 4, 5, maybe 9 as well). for approach 1, 2, 4, 5, 7, one
_can_ implement a solution that doesn't need multiple requests, approach
1, 4, 5 could generally offer/standardize such a feature.
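
the rdf:about / rdf:ID test can be sketched in python as below. assumptions:
the function name describes is made up, xml:base is ignored, and an rdf:ID
is resolved as "retrieval URI + # + ID", which is a simplification of the
rdf/xml rules.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "{%s}about" % RDF_NS
ID = "{%s}ID" % RDF_NS


def describes(data, uri):
    """Return True if some element in the rdf/xml data is about uri."""
    base = uri.split("#", 1)[0]  # the document we dereferenced
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    for element in root.iter():
        about = element.get(ABOUT)
        # rdf:about may be relative, so resolve it against the base first
        if about is not None and urljoin(base, about) == uri:
            return True
        rdf_id = element.get(ID)
        if rdf_id is not None and base + "#" + rdf_id == uri:
            return True
    return False
```

this only answers "is the URI mentioned as a subject"; it says nothing about
whether the rest of the returned rdf is related.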

---------

re Q5 (request description only)

this _can_ be implemented with approaches 1, 2, 4, 5, 6, 7, 9.
for 3 and 8, we have to read at least a part of the representation (e.g.
the html head tag). not knowing which approach (if any) a 
semantic site follows, we can never be sure that we get back rdf. http 
headers can help save bandwidth (not found, not implemented, etc.)
approaches 1, 4, 5 (can) have an integrated save-bandwidth feature.
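
to illustrate why approach 3 needs at least part of the representation, here
is a python sketch that pulls the autodiscovery links out of already fetched
html. the class and function names are made up, and the sketch simply scans
the whole document instead of stopping after the head.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RDFLinkFinder(HTMLParser):
    """Collects href values of <link> tags that point to rdf/xml."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if ("meta" in rels or "alternate" in rels) \
                and mime == "application/rdf+xml" and href:
            # hrefs may be relative to the page they were found in
            self.links.append(urljoin(self.base_uri, href))


def find_description_links(html_text, base_uri):
    finder = RDFLinkFinder(base_uri)
    finder.feed(html_text)
    return finder.links
```

each returned URI still has to be dereferenced with a second GET, which is
the extra request that approaches 1, 4 and 5 can avoid.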

some of the approaches (2, 3, 7, 8) above get complicated when a URI
identifies binary resources (imgs, etc.). one solution to this could
be url rewriting.

---------

re Q6 (server perspective, descriptions of deref'able resources)

1, 4, 5, 6, (and 10!) work for any resource.
2, 7, 9 work for dynamically generated/rewritten representations
3 works for text documents
8 works for certain resources (xmp, exif etc.)

---------

re Q7 (descriptions of descriptions)
(assuming that we don't combine different approaches, which
just moves the problem to another level)

I'm not sure, but I think 1 returns some sort of URI for
the MGET which can be used for a separate MGET. a similar method
could be used for 4 and 5.
7 is a little bit more complicated as we can't use
uri?format=rdfxml&format=rdfxml. an alternative is to use a
changing argument, e.g. 
uri?format=rdfxml describes uri
uri?format=rdfxml2 describes uri?format=rdfxml
uri?format=rdfxml3 describes uri?format=rdfxml2
...
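
the changing-argument idea can be written down as a small python function.
this is a sketch under assumptions: the name describe is made up, and it
only handles URIs whose sole query string is the format=rdfxml parameter
from the example above.

```python
import re

# matches "<uri>?format=rdfxml" followed by an optional level number
_PATTERN = re.compile(r"^(?P<uri>.*)\?format=rdfxml(?P<level>\d*)$")


def describe(uri):
    """Return the URI of the description of uri."""
    match = _PATTERN.match(uri)
    if match is None:
        # a plain resource: its description is level 1 (no number)
        return uri + "?format=rdfxml"
    # already a description: bump the level instead of repeating the argument
    level = int(match.group("level") or 1)
    return "%s?format=rdfxml%d" % (match.group("uri"), level + 1)
```

calling describe repeatedly walks up the chain, so the description of a
description always gets a URI of its own.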

hm, 9 could work with dynamically generated headers, too.

10: "reviews of '1000 essential SemWeb addresses'". perfect.

---------

there are lots of additional questions, limitations, or requirements
one might have, e.g.
- not wanting to use content-negotiation
- not wanting to replace apache (hm, maybe there will be a mod_uriqa?)
- having a hosted server with no root access
- not being able to use url rewriting
- wanting an authoritative description
- static pages only
- (x)html should validate
- ...

but I think I already wrote too much. hope this helps someone (or me
a few months later) a little bit and does not lead to another endless
thread ;-)

/me needs a break now.

benjamin

--
Benjamin Nowack

Kruppstr. 82-100
45145 Essen, Germany


[1] http://esw.w3.org/topic/SlashRedirection
Received on Thursday, 19 February 2004 11:32:04 UTC