URIs as identifiers again (was: Re: CURIEs: A proposal) from Paul Prescod on 2006-06-29 (www-tag@w3.org from June 2006)

From: Paul Prescod <paul@prescod.net>
Date: Wed, 28 Jun 2006 20:46:36 -0700
To: "Pat Hayes" <phayes@ihmc.us>
Cc: "Harry Halpin" <hhalpin@ibiblio.org>, www-tag@w3.org
Message-ID: <1cb725390606282046r4f696500md5990b7dceb1f0ed@mail.gmail.com>
On 6/28/06, Pat Hayes <phayes@ihmc.us> wrote:
>
> >I'm sure it can often help, but a problem arises when someone insists
> >that there *must* be something there, because there are going to be
> >many cases where it is hard to impossible to provide anything useful,
> >so what will be provided will in fact not be useful, but providing it
> >will nevertheless absorb a lot of effort, the cost of which is a
> >brake on development and deployment.
> >
> >
> >This is the heart of the argument. What examples do you have?
>
> Take almost any URI reference in any OWL ontology, for example
>
> http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#madeFromGrape
>
> Now, what that *means* is a binary relation
> between wines and grape types. There is no way to
> put that meaning at the other end of a
> dereferencing process.


I don't think it will be possible to put the complete meaning of the
referent of an identifier online in my lifetime. What does the Constitution
of the United States mean?

But you should put as much there as you can express in the time you have
available. And in this particular case, whoever created that document DID
put something that they know about it there. I can traverse the link you
gave me and see that the identifier is for a binary relation and infer some
other stuff from other things on that page.

So I don't understand how this is an example of something that it is "hard
or impossible to provide a useful description of." There is a URI. It is
used as a name. It is possible to dereference it in a web browser or program
and get useful information. The system is working exactly as it should. Do I
misunderstand how this is a counter-example, or do you have a different
counter-example that would be better?

> No, that is not the point. I know a lot about it
> (let us suppose) which is why I'm writing an
> ontology. But the connection between this name
> and its ontology is not that the latter is at the
> other end of an HTTP GET starting with the
> former, it is that the ontology *contains* the
> name itself.


What's wrong with the example you cited (other than that it seems to
strengthen my argument)? The ontology contains the name (as you want) AND
the ontology is at the other end of a GET (as I want).

But, more important: the system is not closed. That ontology is not the sum
total of universal knowledge about grapes. People will make other documents
served by other URLs and sent around through emails and written on napkins.
If I discover one of these and see the string #madeFromGrape I'd like to get
whatever information I can about it. Google may help, but it might also not
have indexed the key document that I would want to read: the one that
describes what the person who minted the identifier #madeFromGrape thought
it was about.

When I find one of those might it not be helpful for me to copy a URI out of
a document, paste it in my web browser address bar and learn something
(anything!) about it?

Similarly, might it not be useful if I load one of these documents into my
semantic web browser for it to say: "This document indicates that there
might be other useful information about #madeFromGrape at a particular URI.
I've checked that URI and it turns out that there is information there.
Would you like me to incorporate it into your triple database?"
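
As a rough illustration of the first step such a browser might take (a sketch
only; `split_reference` is a hypothetical helper name, not part of any semantic
web toolkit): HTTP clients do not send the fragment to the server, so the
document to fetch is the URI with the `#madeFromGrape` part removed. Python's
standard `urllib.parse.urldefrag` performs that split.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_reference(uri_reference: str) -> Tuple[str, str]:
    """Split a URI reference into (document URL to GET, fragment naming the term).

    Hypothetical helper for illustration; it performs no network access.
    """
    document_url, fragment = urldefrag(uri_reference)
    return document_url, fragment


doc, term = split_reference(
    "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#madeFromGrape"
)
print(doc)   # http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine
print(term)  # madeFromGrape
```

A client would then GET the first value and look in whatever comes back for
statements about the second.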

> It is the surrounding text which
> embodies the meaning, not something at the other
> end of a Web dereferencing process. The name, in
> cases like this, gets its meaning from the way it
> is used inside what amounts to a large data
> structure, which is the RDF graph of which the
> OWL/XML text document is a handy rendering
> (representation, in the REST sense?). The Web is
> relevant to this only insofar as it allows these
> graphs/texts to be transmitted, combined and
> used, but it adds nothing to the way that the
> graphs/texts determine the *meaning* of the names
> which occur in them.


If you found a print-out of one of these documents by the side of the road,
how would you begin the process of reconstructing more meaning than was
available in that single document? Google would be one tool. Wouldn't you
use a browser's address bar as another?

Recall a few important facts about Google:

 * it is owned by a third party: neither the minter of the URI nor the
person trying to learn about it.

 * it is not designed to be reliable for any invariant definition of
reliable -- they constantly tweak the algorithm.

 * it is not meant to be used by computer processes like semantic web
browsers

> To make the point more forcefully: Imagine an OWL
> ontology located at http://ex.place/foo.html
> which when you look at it you discover that all
> the names in it have the base URI
> http://ex:otherplace/baz. This might be slightly
> discourteous, but it is perfectly legal and would
> not cause any SWeb engines to miss a beat. In
> fact, most of them wouldn't even be aware of it.
> And as for human readers, if they are looking at
> the name, then they are already looking at the
> text which tells them as well as anything can
> tell them what the name means, viz. the text of
> the ontology itself.


Fine, so why not put a copy or a redirect at http://ex:otherplace/baz? When
someone emails me the document I will have lost the URI
http://ex.place/foo.html . When I want to see whether you've updated it and
provided more information I will go to "otherplace"? I don't want to argue
that the system CANNOT work as you describe. It is just less convenient to
the recipient and therefore impolite.

You've demonstrated that the web does not depend upon what I might call
"link-oriented self-descriptiveness" (described further down). But you
haven't shown that such a property is not valuable _when it is used_.

> >>It helps to make the Web be "self-describing", although the notion of
> >>"self-describing" is something I think is another notion that could
> >>really use some inspection.
> >
> >I'd sure like to know what it means, myself :-) Can you elaborate?
> >
> >
> >Self describing means that a reader can start by
> >looking at some data and follow links backwards
> >to the specifications that define the intended
> >meaning of the data.
>
> Yes, I thought that was perhaps what it was
> supposed to mean. Tim BL explained this idea to
> me a few years ago. I don't buy it. First, it's
> just not true, and the Web seems to work just
> fine whether it's true or not, which suggests it
> is more dogma than theory.


No, the Web has always used SGML-based markup and the term "self-describing"
(as used in this context) comes from the markup world (AFAIK). The meaning
seems very intuitive to me. You probably have never heard of the document
type I am about to reference but you can figure out something about it
pretty easily.

<locality xmlns="urn:oasis:names:tc:ciq:xsdschema:xNAL:2.0"/>

That URN is not dereferenceable in any software I know of, but any document
embedding it is self-descriptive insofar as the creator of the vocabulary
put in information specifically designed to help you find out what the
element means.

It would be a lot better if there were some straightforward way to get from
the namespace to the human description and machine-readable schemas for the
vocabulary, but the URN is better than nothing (which is typically what you
got in binary file formats). It should only take you an hour or so of poking
around the OASIS site to figure out what this locality element is all about.
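
To make the contrast concrete, here is a small sketch (Python standard library
only; the function names are made up for illustration) that pulls the namespace
out of an element carrying the xNAL URN and checks whether its scheme is one
that an ordinary web client could fetch. It says nothing about whether a useful
document actually lives at the other end.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namespace_of(xml_text):
    """Return (namespace URI, local name) of the root element."""
    root = ET.fromstring(xml_text)
    tag = root.tag  # ElementTree spells qualified names as "{namespace}local"
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def is_http_dereferenceable(namespace):
    """True if the namespace uses a scheme a plain web client can GET."""
    return urlparse(namespace).scheme in ("http", "https")


ns, name = namespace_of(
    '<locality xmlns="urn:oasis:names:tc:ciq:xsdschema:xNAL:2.0"/>'
)
print(name, ns)                     # locality urn:oasis:names:tc:ciq:xsdschema:xNAL:2.0
print(is_http_dereferenceable(ns))  # False
```

With the URN, a reader gets a string to research; with an HTTP namespace, the
same two lines of code would hand back something that can be pasted into an
address bar.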

> Second, are we talking
> here about human readers or software?


Both.

> The SWeb is
> supposed to be usable by software agents which
> are not usually capable of reading a W3C spec
> document and wouldn't be able to do anything with
> it even if they could. In fact, most human
> readers are in the same position most of the time.


"Most" human readers of raw RDF data are in that position? If we put aside
those that click on the wrong link and end up somewhere confusing, I would
posit that most readers of RDF would appreciate at least the opportunity of
reading a specification of what they are looking at. Or at least a web page
that would point them towards the spec and other tutorials. (This raises an
interesting point: maybe specs should point to pages that index tutorials,
so that people lost in a spec can back out and find something more helpful
for them.)

> >With raw XML, the tags are "links" to English word meanings
>
> XML tags are linked to English word meanings???
> Where are these word meanings, and how does one
> link to them? Do they have URIs?


No. Self-describing does not depend on the Web. I'm pretty sure the concept
predates the Web. The web enhances self-descriptiveness.

> >which are much more helpful than bit patterns.
> >With (for example) HTTP-identified namespaces
> >you have actual links to resources that might
> >describe the meanings of the words in a human or
> >machine-processable language.
>
> Might, yes. In fact do, only rarely. And as I
> say, it doesn't seem to matter a tinkers toss
> whether they do or not.


I disagree. Going back to the URN example above. I specifically asked the
creator of that URN to create a resource that would allow me to efficiently
research the meaning of the document. I did that just last week because I
was having a tremendous amount of trouble finding the meaning just using the
usual tools. As you've probably already noticed, Google gives you very
little helpful information. I wasted many hours clicking links, searching
specs, unzipping files. Now the first step would be for a decent resource to
exist AT ALL. Then the next logical thing would be for the resource to exist
at an HTTP URI pointed to from the referring document. (in the end I was
pointed to comments in a schema in a zip file, which is not my idea of a
very discoverable resource...)

> >In short, a self-describing message or document
> >points from the message towards the spec whereas
> >most messages or documents require you to find
> >the spec using some out-of-band
> >mechanism. "This file starts with the characters
> >MZ. I wonder what file type this is?"
>
> I find the best way to find out is usually to try
> Google. So, is this a Web architectural principle
> at work? Is Googling a kind of link following?


Yes. Google is an often inefficient kind of link-following tool appropriate
to human beings. Googling for MZ or PK will find you nothing useful because
the file formats that start with those characters are not even
self-describing in the pre-web standard of "providing enough information for
you to research it easily." I would say that there is kind of a hierarchy of
politeness when it comes to self-description:

1. Make it easy to Google or research the message's syntax and meaning

2. Make it possible to learn more about the message's syntax and meaning by
following URLs (that's what the Web is for, after all)

3. Make it possible for machines to learn something useful for processing
the message (whether it be a schema, or a stylesheet or anything else that
can be fairly safely downloaded and evaluated).
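
The MZ and PK remark can be illustrated with a minimal sketch. The lookup table
below is exactly the out-of-band knowledge in question: nothing in the bytes
themselves says where to learn more, so the reader (or program) has to arrive
already knowing. The table and the `sniff` name are illustrative assumptions,
not a real format registry; MZ is the signature of DOS/Windows executables and
PK that of ZIP archives.

```python
# Out-of-band knowledge: a hand-maintained table of leading bytes.
# Illustrative only; real sniffers carry far larger tables.
KNOWN_SIGNATURES = {
    b"MZ": "DOS/Windows executable",
    b"PK": "ZIP archive (or a ZIP-based format)",
}


def sniff(data: bytes) -> str:
    """Guess a file type from its first bytes, or admit defeat."""
    for magic, description in KNOWN_SIGNATURES.items():
        if data.startswith(magic):
            return description
    return "unknown"


print(sniff(b"MZ\x90\x00"))  # DOS/Windows executable
print(sniff(b"PK\x03\x04"))  # ZIP archive (or a ZIP-based format)
print(sniff(b"\x00\x01"))    # unknown
```

A format at level 2 or 3 of the list would not need the table: the message
itself would carry a URL leading to its description.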

One can easily define "self-descriptive" in a way that makes neither XML nor
any other language in the universe self-descriptive:

http://72.14.207.104/search?q=cache:ZwMXp9SCsvgJ:www.oceaninformatics.biz/publications/e2.pdf+self-describing&hl=en&ct=clnk&cd=1&client=firefox-a


But why would you want to define a phrase into uselessness?

XML is self-descriptive in a clearly defined sense. The syntax is designed
to make it possible to learn more about a document or message's syntax or
meaning both through research (1) AND through following URIs (namespace
URIs) (2). For some reason the frequent attempts at (3) are continually
stymied.

 Paul Prescod
Received on Thursday, 29 June 2006 03:46:48 UTC