RE: Subclass of Thing/Resource from Jeff Sussna on 2000-03-04 (www-rdf-interest@w3.org from March 2000)

From: Jeff Sussna <jeff.sussna@quokka.com>
Date: Fri, 3 Mar 2000 16:18:44 -0800
To: "'Dan Brickley'" <Daniel.Brickley@bristol.ac.uk>, Guha <guha@epinions-inc.com>
Cc: www-rdf-interest <www-rdf-interest@w3.org>
Message-ID: <E19A882C6CD5D211A8A70008C75B6AF40122D027@pcmail.quokka.com>
It's interesting that, while there are two major forms of URI, URN and URL,
only URL has been strongly used so far. It is designed to tell you where
something is rather than what it is. I think this reflects the overall focus
on physicality rather than semantics on the current web. As we are seeing, the
semantic web will need to deal with what things are. It remains to be seen
how we implement reliable protocols for whatness. It's one thing to use ISBN
to identify books. ISBN refers to some external, canonical, centrally
managed identification mechanism. Do we refer to people by their Social
Security numbers? That obviously doesn't work for anyone who's not an
American. Or by a GUID constructed from the GPS location, date, and time at
which they were born?

A whole other approach, that might be more feasible, that comes out of my
previous comments about reliable vs. canonical identification, is to build
up an identification. For example, "the person who was born at such and such
a time in such and such a place, was CEO of such and such a corporation at a
given point in time, etc." RDF could certainly help here. 
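
As a rough sketch of "building up" an identification: treat a description as a set of facts, and check how many known individuals satisfy all of them. Everything below (the record keys, property names, and values) is invented for illustration; it is not RDF syntax, just the shape of the idea in Python.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str]  # (property, value)

# Hypothetical knowledge base; keys and facts are made up for this sketch.
KNOWN: Dict[str, Set[Fact]] = {
    "a": {("bornIn", "Springfield"), ("bornOn", "1960-01-01"), ("ceoOf", "ExampleCorp")},
    "b": {("bornIn", "Springfield"), ("bornOn", "1960-01-01")},
    "c": {("bornIn", "Shelbyville"), ("bornOn", "1960-01-01"), ("ceoOf", "ExampleCorp")},
}


def candidates(description: Set[Fact]) -> Set[str]:
    """Return every individual whose known facts include the whole description."""
    return {key for key, facts in KNOWN.items() if description <= facts}


def identifies(description: Set[Fact]) -> bool:
    """A description works as an identifier only if exactly one individual matches."""
    return len(candidates(description)) == 1
```

A single clause such as the birth date matches all three records here; adding clauses narrows the set until one individual remains, which is the sense in which the identification is reliable without being canonical.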

By the way, this latter approach correlates with a particular concern I have
in the privacy arena. Even if your identity isn't known, if it's known
that a single individual engaged in some set of activity on the net, at some
point it becomes possible to "triangulate" your identity from that activity.

Jeff

-----Original Message-----
From: Dan Brickley [mailto:Daniel.Brickley@bristol.ac.uk]
Sent: Friday, March 03, 2000 4:02 PM
To: Guha
Cc: www-rdf-interest
Subject: Re: Subclass of Thing/Resource


On Fri, 3 Mar 2000, Guha wrote:

> Tim,
> 
>  I think many of these questions center around
> precisely defining what an RDF Resource Identifier
> is supposed to be.
> 
>    I agree that we need to distinguish between RDF
> Resource identifiers and URIs. A URI is a pretty formal object
> (protocol + host + opaque string) whose definition pretty
> concretely  constrains what can have a URI. By
> this definition, people, places, etc. cannot have URIs.

Sorry Guha, you're quite definitively wrong on this last claim. I agree
that we need more clarifications in this area, but the URI spec (as
referenced from RDF Model and Syntax) is very clear on this point:


From http://www.isi.edu/in-notes/rfc2396.txt

[begin excerpt]

Network Working Group                                     T. Berners-Lee
Request for Comments: 2396                                       MIT/LCS
Updates: 1808, 1738                                          R. Fielding
Category: Standards Track                                    U.C. Irvine
                                                             L. Masinter
                                                       Xerox Corporation
                                                             August 1998
           Uniform Resource Identifiers (URI): Generic Syntax

[...]

	Abstract
	   A Uniform Resource Identifier (URI) is a compact string of characters
	   for identifying an abstract or physical resource.  This document
	   defines the generic syntax of URI, including both absolute and
	   relative forms, and guidelines for their use; it revises and replaces
	   the generic definitions in RFC 1738 and RFC 1808.

[...]
      Resource
         A resource can be anything that has identity.  Familiar
         examples include an electronic document, an image, a service
         (e.g., "today's weather report for Los Angeles"), and a
         collection of other resources.  Not all resources are network
         "retrievable"; e.g., human beings, corporations, and bound
         books in a library can also be considered resources.

         The resource is the conceptual mapping to an entity or set of
         entities, not necessarily the entity which corresponds to that
         mapping at any particular instance in time.  Thus, a resource
         can remain constant even when its content---the entities to
         which it currently corresponds---changes over time, provided
         that the conceptual mapping is not changed in the process.

[end excerpt]

> 
>  On the other hand, it would be very convenient to have
> a unique canonical identifier for referring to the one TimBL
> or one RalphSwick. In my reading, this is what the RDF
> Resource ID is. Everything (including literals, URIs, ...) could
> potentially have one of these.

Maybe, though I don't see any scenario whereby we'll end up with unique
canonical identifiers for persons. Social/political/privacy issues
aside, it's just too hard to do. That said, mailboxes, national
insurance numbers etc allow us to say things like 'the person whose
util:personalMailbox is mailto:guha@epinions...', uniquely picking out a
flesh and blood person without (a) giving them a URI, (b) making a
category mistake and conflating them with their mailbox URI.
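
A minimal sketch of that pattern, using plain (subject, predicate, object) tuples rather than any real RDF library. The util:personalMailbox property name comes from the text above; the ex: class names, the "_:x" anonymous node label, and the example.org address are placeholders assumed for illustration.

```python
MAILBOX = "util:personalMailbox"  # property name from the message; prefix left unexpanded

# "_:x" stands for an anonymous node: the person gets no URI of their own.
# The mailbox is a separate resource with its own (mailto:) URI.
triples = [
    ("_:x", "rdf:type", "ex:Person"),
    ("_:x", MAILBOX, "mailto:someone@example.org"),
    ("mailto:someone@example.org", "rdf:type", "ex:Mailbox"),
]


def person_with_mailbox(graph, mailbox):
    """Return the single subject whose personal mailbox is `mailbox`, else None."""
    owners = {s for s, p, o in graph if p == MAILBOX and o == mailbox}
    return owners.pop() if len(owners) == 1 else None
```

The lookup returns the anonymous node, not the mailto: URI, so the person and the mailbox stay distinct resources. It only picks out one person under the assumption that a personal mailbox has a single owner.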

> 
>   I do think it would be nice if an application can assume
> some kind of structure to these identifiers, but not being
> able to do so would not be fatal.
> 
>  I agree with you that http://foo.org/bar.rdf#xyz is a lousy
> identifier for an object. To me, it just represents a position
> in a file.

'#' is a downright broken bit of web architecture. The '#' fragment/view
semantics are defined as being relative to the mime type of the
object. Since mime types can be content-negotiated, that's hairy since
a single URI plus '#' doesn't mean much without additional assumptions
about mime types.

For example,

http://www.w3.org/Icons/WWW/w3c_main
has both GIF and PNG mime-typed variants. So the semantics of 
http://www.w3.org/Icons/WWW/w3c_main#foo can't be considered outside the
context of some HTTP transaction, since the mime type of the resource
isn't an intrinsic property of the resource identified. 

details:

@mail:/users/pldab> telnet www.w3.org 80
	Trying...
	Connected to www.w3.org.
	Escape character is '^]'.
	HEAD /Icons/WWW/w3c_main HTTP/1.1
	Host: mybox.ilrt.bris.ac.uk
	Accept: application/x-fictional-mimetype

	HTTP/1.0 406 Not Acceptable
	Date: Fri, 03 Mar 2000 23:58:48 GMT
	Server: Apache/1.3.6 (Unix) PHP/3.0.11
	Alternates: {"w3c_main.png" 0.7 {type image/png} {length 5904}},
	 {"w3c_main.gif" 0.5 {type image/gif} {length 5684}}
	Vary: negotiate, accept
	TCN: list
	Content-Type: text/html
                        
Again, the URI spec has words to say on this (unfortunately...):

	4.1. Fragment Identifier

	   When a URI reference is used to perform a retrieval action on the
	   identified resource, the optional fragment identifier, separated from
	   the URI by a crosshatch ("#") character, consists of additional
	   reference information to be interpreted by the user agent after the
	   retrieval action has been successfully completed.  As such, it is not
	   part of a URI, but is often used in conjunction with a URI.

	      fragment      = *uric

	   The semantics of a fragment identifier is a property of the data
	   resulting from a retrieval action, regardless of the type of URI used
	   in the reference.  Therefore, the format and interpretation of
	   fragment identifiers is dependent on the media type [RFC2046] of the
	   retrieval result.
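
To make the dependency concrete, here is a small Python sketch. urllib.parse.urldefrag is the standard-library call that separates the fragment from the part of the reference that is actually retrieved; the FRAGMENT_RULES table and the describe_fragment helper are hypothetical, invented here only to show that the same fragment needs a media type before it can mean anything.

```python
from urllib.parse import urldefrag

# Illustrative assumption: the media types this sketch knows a fragment rule for.
# This table is not taken from any specification.
FRAGMENT_RULES = {
    "text/html": "element with a matching name or id",
}


def describe_fragment(uri_reference, media_type):
    """Split off the fragment, then say how it would be read for media_type."""
    url, fragment = urldefrag(uri_reference)
    rule = FRAGMENT_RULES.get(media_type)
    if not fragment:
        meaning = "no fragment"
    elif rule is None:
        meaning = "no rule in this sketch for " + media_type
    else:
        meaning = rule
    return url, fragment, meaning
```

Calling describe_fragment("http://www.w3.org/Icons/WWW/w3c_main#foo", "image/png") and then again with "image/gif" retrieves the same URL both times, but what "foo" refers to can only be answered once the negotiated media type is known.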

>  In the long run, the object identifier namespace will have
> to be like the DNS namespace.
> 
>  Reactions?

We could do with a URI activity to fix some of this...

>  guha

dan

Received on Friday, 3 March 2000 19:12:57 UTC