Solutions to the Identification Problem, was Re: URIs / URLs

> As I mentioned in an earlier post, the question is, Do we need identifiers
> for non-retrievable items such as concepts, people, etc. or do we really
> only need to identify descriptions and other resources that are retrievable
> over the Internet?
> 
> Hell, I don't know. You tell me.

Yes, we need to identify non-retrievable items.  I see no real
question there.  To change your terms a little, we need to identify
things (lots and lots of things) which are not web pages.

There are several decent and workable ways to do it.  We can probably
use all of them.  Here's a partial list:

1.  Let web addresses (web page identifiers) be semantically
"overloaded" to identify both the page and some object or objects
which are strongly related to the page.  People can handle this
ambiguity just fine ("I work at w3.org" vs "...the last line on
w3.org") and I believe computers can too, if we're careful.  The exact
kind of "careful" you need is based, I think, on type inference.
Basically, avoid using an overloaded term with a predicate whose
domain or range (whichever applies to the position the term occupies)
includes more than one meaning of the term.  If you do, you're in
multiple-solutions territory, which is a whole other ballgame.
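
A minimal sketch of that type-inference check, in Python.  Everything
here is hypothetical: the ex: predicates, the type names, and the two
lookup tables are made up for illustration and are not taken from any
RDF library or schema.

```python
# Hypothetical sketch: none of these names come from a real vocabulary.

# Each overloaded term maps to the things it might denote, keyed by type.
DENOTATIONS = {
    "http://www.w3.org": {
        "WebPage": "the page served at http://www.w3.org",
        "Organization": "the World Wide Web Consortium",
    },
}

# Each predicate declares the types it accepts as subject (domain)
# and as object (range).
PREDICATES = {
    "ex:worksAt": {"domain": {"Person"}, "range": {"Organization"}},
    "ex:lastLine": {"domain": {"WebPage"}, "range": {"Literal"}},
    "ex:mentions": {"domain": {"WebPage"}, "range": {"WebPage", "Organization"}},
}


def resolve(term, predicate, position):
    """Return the denotations of `term` that the predicate allows.

    `position` is "domain" when the term is the subject and "range"
    when it is the object.  Exactly one result means the overloading is
    harmless; more than one means multiple-solutions territory.
    """
    allowed = PREDICATES[predicate][position]
    candidates = DENOTATIONS.get(term, {})
    return {t: d for t, d in candidates.items() if t in allowed}
```

With these tables, resolve("http://www.w3.org", "ex:worksAt", "range")
leaves only the organization, while the made-up ex:mentions predicate
accepts both meanings and so stays ambiguous.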

2.  Use descriptions and anonymous/local terms, like (in n3)
      [ foaf:homepage <http://www.w3.org> ] 
which turns into some local (anonymous) term such as #anon102 and the
RDF sentence (words in subject-predicate-object order)
      #anon102 http://xmlns.com/foaf/0.1/homepage http://www.w3.org

This kind of mechanism can be used with Type 1 identifiers (above) to
get you into un-overloaded territory pretty quickly:
      #w3-the-org type1:has_identifier "http://www.w3.org"
      #w3-the-org rdf:type irstype:non-profit-organization
(This formulation is unfortunate in that it requires the identifier to
appear as a literal rather than as an identifier, but I'm having
trouble thinking of a clear way to do otherwise.  What is the
relationship of an object to each of the objects it might be?  It can
be hard to get out of multiple-solutions territory.)

This mechanism can also be used to move into your own namespace
(naming convention, naming contract) as you like, as long as you have
some bootstrapping, as I did with "type1:has_identifier".  I need some
way to name that predicate, but once I do, I can use literals to name
whatever I want.

This can also give multiple solutions, of course.  Another way you can
use it is
    #something [ rdfs:label "my predicate foo" ] #something_else
although you might want something more precisely defined than
rdfs:label for real use.  (I think SemEnglish uses this kind of
construct a lot; I'm not sure if there is working code using it.)
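
The expansion of a bracketed description into a local term plus
sentences, as described at the start of item 2, can be sketched in
Python.  This is an illustration under stated assumptions, not how any
n3 parser actually works: new_anon and describe are hypothetical
helpers, and the counter starts at 102 only to echo the #anon102
example.

```python
import itertools

# Assumption: local terms are numbered from a simple counter.
_counter = itertools.count(102)


def new_anon():
    """Mint a fresh local (anonymous) term such as '#anon102'."""
    return f"#anon{next(_counter)}"


def describe(properties):
    """Expand a description like [ pred obj ] into a term and its triples.

    `properties` is a list of (predicate, object) pairs.  Returns the
    freshly minted term and the subject-predicate-object sentences that
    describe it.
    """
    term = new_anon()
    triples = [(term, pred, obj) for pred, obj in properties]
    return term, triples


# [ foaf:homepage <http://www.w3.org> ]
term, triples = describe([("foaf:homepage", "http://www.w3.org")])
```

The returned term can then be used as the subject or object of further
sentences, which is what lets you step out of overloaded territory.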

3.  We can, of course, use a non-resolvable URI scheme or make up a
new one (as I'm guilty of doing myself).  This makes the identifier
sort of free-floating: it gets its meaning from context, so to a
reader it means whatever it has been defined to mean across all the
messages that reader has read so far (plus any initial out-of-band
knowledge, I guess).  But maybe this isn't so
different from all the other schemes -- depending on what useful
things people figure out to do with being able to fetch data from an
authoritative source.


Recognizing that it's rather controversial, I'll suggest that
  http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate
is really just a form of Type 1 overloaded address where the
web-page denotation happens to be absent/erroneous at the moment
because the base URI doesn't happen to serve up any content with a
MIME type for which fragment identifiers have a defined meaning.
If it served up text/rdf defined the right way, the two denotations
could happen, at the moment, to be identical.
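
To make the dependence on the served media type concrete, here is a
small Python sketch.  It uses the standard library's urldefrag to split
the URI; the FRAGMENT_SEMANTICS table and the fragment_denotation
helper are hypothetical, and the single table entry is an illustrative
assumption rather than a statement about any media type registry.

```python
from urllib.parse import urldefrag

# Hypothetical table: media types for which a fragment identifier has a
# defined meaning.  The entry is illustrative only.
FRAGMENT_SEMANTICS = {
    "text/html": "an anchor within the page",
}


def fragment_denotation(uri, served_type):
    """Split `uri` and look up what its fragment means for `served_type`.

    Returns (base, fragment, meaning).  `meaning` is None when nothing
    is served, or when the served type gives fragments no defined
    meaning, which is the "absent/erroneous" case described above.
    """
    base, fragment = urldefrag(uri)
    meaning = FRAGMENT_SEMANTICS.get(served_type)
    return base, fragment, meaning


base, fragment, meaning = fragment_denotation(
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate", None
)
```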

   -- sandro

Received on Thursday, 12 April 2001 07:43:04 UTC