Re: [URI vs. URIViews] draft-frags-borden-00.txt from Pat Hayes on 2002-02-25 (www-rdf-comments@w3.org from January to March 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Sun, 24 Feb 2002 21:44:30 -0600
To: "Jonathan Borden" <jonathan@openhealth.org>
Cc: <www-rdf-comments@w3.org>
Message-Id: <p0510144eb89f4c68012c@[65.212.118.219]>
>With apologies to Brian: either Pat or I am deeply confused about some
>fundamental issues central to RDF. This most likely means there is a problem
>in the specification that desperately needs to be clarified.
>
>In any case my responses to this round include specific points that I would
>like clarified by the RDF WG:

OK, I'm CCing this as before. Sorry, Brian.

>
>Pat Hayes wrote:
>>  >
>>  >Careful, RDF uses frags in two ways:
>>  >
>>  >1) as you say
>>  >2) any subject, predicate or object of any statement may be identified by a
>>  >URI reference.
>>
>>  May BE a uriref, actually; but OK.
>
>In the current RDF REC, Section 5 says "sub is a resource ...", which
>indicates to me that the _subject_ is a _resource_, not a URIreference; hence
>my specific language.
>
>Does the current MT say that a "subject is a URIref"? If so, this seems to
>be a significant change rather than a clarification.

I believe the MT has always said this. Certainly that is my 
understanding of the basic graph syntax: triples consist of a 
subject, a property and an object, all of which can be urirefs. We 
are talking about the actual graph syntax here, right? Not what it 
denotes. So in this sense of 'subject' the subject of a sentence is a 
word, not what the word names.

(Now, of course, urirefs are themselves resources, since everything 
is a resource....)

>
>>
>>  >Such URI references may have a fragment id.
>>
>>  Sure, but what that *means* is not specified. It could well be
>>  meaningless. RDF syntax allows arbitrary urirefs to occur - it
>>  provides no constraints forbidding any URI combinations as illegal or
>>  ill-formed -  but RDF provides no semantic guarantees that any such
>>  usage is meaningful. In particular, the one you provide seems
>>  nonsensical to me:
>
>Precisely my point. Nowhere in any RDF specification have I read anything to
>suggest that a URI reference has any _meaning_ other than what can be
>determined by the RDF statements made about the referenced resource. That is
>to say, there is nothing to suggest that one can determine any meaning from
>the syntactic structure of the URI ref. The example that I provide is
>supposed to be "nonsensical" _only_ if you presume to interpret what the URI
>ref 'means' based on its syntax. I am suggesting that RDF treat URI
>references as opaque identifiers, and that it ought not be possible to
>derive meaning by parsing the structure of the URI ref.
>
>To the WG: does RDF mean to say otherwise?

Good question. I will respond for myself, not in the name of the WG.

Answer: Yes and no.

Yes, as far as RDF semantics is concerned, urirefs are opaque 
identifiers, and their internal structure is of no consequence as far 
as their referential semantics is concerned. All that matters to the 
MT is identity of the uriref, so that two urirefs in two distinct 
documents can be compared for syntactic equality. RDF assumes only 
that they are the same name, and have the same denotation wherever 
they occur.

However, that identity test means that RDF needs to be able to 
discover coincidence between a uriref used in one document, 
consisting of an absolute URL plus a fragId, and the uriref
consisting of that fragId used in the RDF document which is 
retrievable by conventional web transfer protocols using the absolute 
URL. So to the extent that RDF inference depends on this ability to 
cross-identify urirefs in various documents, the answer is No.

Notice that this is not a contradiction, but it is an equivocation 
upon 'meaning'. As far as RDF *meaning* is concerned, urirefs are 
opaque. But as far as what might be called the RDF global *syntax* is 
concerned, they are not opaque. RDF (and all web ontology languages) 
depend on a global agreement about the ability to recognize identity 
of *symbols* across documents, and that in turn - although simply 
considered a 'primitive' feature of the syntax and hence of the model 
theory - depends on the internal structure of urirefs being treated 
in a certain coherent way.
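A minimal Python sketch of that identity test, assuming that "same name" means plain string equality of absolute urirefs after each relative reference has been resolved against the URL of the document it appears in. `urllib.parse.urljoin` performs standard relative-reference resolution; the URL used for document A is hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def absolutize(uriref: str, doc_url: str) -> str:
    """Resolve a possibly relative uriref against the URL of its containing document."""
    return urljoin(doc_url, uriref)


def same_name(ref_a: str, doc_a: str, ref_b: str, doc_b: str) -> bool:
    """Treat two urirefs as one name iff their absolute forms are identical strings."""
    return absolutize(ref_a, doc_a) == absolutize(ref_b, doc_b)


DOC_A = "http://example.org/A"        # hypothetical URL of a document using the full uriref
DOC_U = "http://example.org/Unicorn"  # the document that uses the bare fragId

# Document A writes the full uriref; the Unicorn document writes only "#Bottock".
print(same_name("http://example.org/Unicorn#Bottock", DOC_A, "#Bottock", DOC_U))  # True

# The "opaque" name still has internal structure: an absolute URL plus a fragId.
url, frag = urldefrag("http://example.org/Unicorn#Bottock")
print(url, frag)  # http://example.org/Unicorn Bottock

# Without the '#', a relative ref resolves against the base's directory instead.
print(absolutize("Bottock", DOC_U))  # http://example.org/Bottock
```

The comparison itself is opaque string equality, but getting both sides into comparable form already depends on parsing the URL/fragId structure in an agreed way.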

For example, if document A contains

<http://example.org/Unicorn#Bottock> rdf:type foo:Bar .

and the document at the URL  <http://example.org/Unicorn>  contains

<#Bottock> rdf:type <#Bra> .

then I would want A to be able to infer that 
http://example.org/Unicorn#Bra and foo:Bar had a nonempty 
intersection.
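That inference can be sketched in Python by merging the two graphs as sets of triples and looking for a subject typed by two classes. The sketch assumes the Unicorn document's names have already been resolved to http://example.org/Unicorn#Bottock and http://example.org/Unicorn#Bra, and it expands foo:Bar to a hypothetical namespace, since no foo: prefix is declared in the message.

```python
from collections import defaultdict
from itertools import combinations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOO_BAR = "http://example.org/foo#Bar"  # hypothetical expansion of foo:Bar

# Triples as (subject, predicate, object) strings, all absolute urirefs.
graph_a = {("http://example.org/Unicorn#Bottock", RDF_TYPE, FOO_BAR)}
graph_unicorn = {
    ("http://example.org/Unicorn#Bottock", RDF_TYPE, "http://example.org/Unicorn#Bra")
}


def merge(*graphs):
    """Union of triple sets: identical urirefs automatically become the same node."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def overlapping_classes(graph):
    """Return pairs of classes known to share at least one instance."""
    types = defaultdict(set)
    for s, p, o in graph:
        if p == RDF_TYPE:
            types[s].add(o)
    pairs = set()
    for classes in types.values():
        pairs.update(combinations(sorted(classes), 2))
    return pairs


print(overlapping_classes(merge(graph_a, graph_unicorn)))
```

Plain set union is only adequate here because there are no blank nodes; merging graphs that contain blank nodes requires renaming them apart first.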

And although this is not specified formally, I would expect to be 
able to use the absolute URL as a likely place to locate RDF 
assertions which use the uriref. However, the rest of the WG might 
shoot me down on that.

>
>>
>>  >e.g.
>>  >
>>  ><http://example.org/Unicorn#Bottock> rdf:type foo:Bar
>>  ><http://example.org/Unicorn> rdf:type foo:Unicorn
>>  >
>>  >does not imply any relationship between foo:Bar and foo:Unicorn
>>
>>  Agreed; precisely my point. But the reason why it does not is that
>>  there is no implied relationship between those two urirefs, either,
>>  other than that the *very use* of the first one implicitly assumes
>>  that the absolute URI is a URL of a document which contains some RDF
>>  using the fragID 'Bottock' as a name.
>
>According to the current RDF rec this is not true, there is no assumption
>that a URIref used by an RDF application 'points to' anything in an RDF
>document, explicitly:
>
>[[
>Resources
>
>All things being described by RDF expressions are called resources. A
>resource may be an entire Web page; such as the HTML document
>"http://www.w3.org/Overview.html" for example. A resource may be a part of a
>Web page; e.g. a specific HTML or XML element within the document source. A
>resource may also be a whole collection of pages; e.g. an entire Web site. A
>resource may also be an object that is not directly accessible via the Web;
>e.g. a printed book. Resources are always named by URIs plus optional anchor
>ids (see [URI]). Anything can have a URI; the extensibility of URIs allows
>the introduction of identifiers for any entity imaginable.
>]]
>
>Note in particular: "A resource may be a part of a Web page; e.g. a specific
>HTML or XML element ..." This seems to indicate that a URIref _when used by
>RDF_ is NOT intended to point ONLY to RDF documents.

We have to distinguish here between two senses of 'point to'. The 
quoted passage is talking about the sense 'mean' or 'refer to' (AKA 
'denote'), which is the RDF semantic notion of naming. I was 
referring to the notion of 'point to' meaning 'indicate the source of 
(the name)'.

>Are URIrefs used in RDF statements assumed to point to locations in RDF
>documents? If so, this is a big change.

The convention that I have been talking about is implicit in every 
use of RDF in every document on the web. Why else would one include 
things like this in RDF headers?

<RDF
   xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:s="http://www.w3.org/2000/01/rdf-schema#">

Those URLs don't *denote* anything in RDF, but it sure is important
not to mistype them.
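A small sketch of why those namespace URLs matter even though they denote nothing: RDF/XML forms a property's uriref by concatenating the namespace URI with the local name, so a mistyped namespace silently yields a different name. The helper `expand` is hypothetical and only handles prefixed names.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def expand(qname: str, namespaces: dict) -> str:
    """Concatenate the namespace URI bound to the prefix with the local name."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local


good = {"rdf": RDF_NS, "s": RDFS_NS}
# Same bindings, but the rdf namespace is missing its trailing '#'.
typo = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns", "s": RDFS_NS}

print(expand("rdf:type", good))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("rdf:type", typo))  # http://www.w3.org/1999/02/22-rdf-syntax-nstype
```

Nothing flags the second result as an error; it is simply a different, unrelated name.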

>
>>  If there is no such document,
>>  or no such use of that fragId, then RDF has no way to make sense of
>>  the first triple, and would probably generate a 404 error.
>
>This confuses me. Does an RDF application need to follow each URIref? What
>about non-"http" URI schemes, e.g. "urn"s? Are non-resolvable URIrefs
>illegal in RDF?

No, sorry if I gave that impression. But those that are resolvable 
are often used in a way that presupposes that they are resolved 
'properly'.

>
>>  >
>>  >The URI reference that identifies the subject of the first statement has a
>>  >fragment identifier.
>>  >
>>  >>  If http://example.org/Unicorn
>>  >>  really means a unicorn, then it should never have a fragId attached
>>  >>  to it in RDF.
>>  >
>>  >Really! This is exactly Aaron's argument.
>>
>>  ?? It is? Then I REALLY have not understood what Aaron is saying.
>>
>>  >A unicorn is an example of what
>>  >some people call an "abstract resource".
>>
>>  A unicorn is, sure. But the URI is a name, not what is named. Nobody
>>  is talking about adding a fragId to a unicorn, right?
>
>Right. Hmm. Perhaps you are using the term "mean" in a technical sense and I
>am using it in an English sense. The URIref http://example.org/Unicorn
>doesn't 'mean' Unicorn, but the URIref may be used to name the concept
>"Unicorn". When dereferencing the URI a document entity of type text/plain
>may be returned reading: "Unicorns are mythical creatures ..."

If that ever happens on the semantic web, it ought to generate an 
error. Plain text is meaningless to software.

>
>>  >
>>  >No, this is the whole point. If RDF treats URI references as opaque
>>  >identifiers, then one can make any statement about any URI reference.
>>
>>  What does 'can' mean? RDF syntax does not forbid it, sure. However,
>>  it does make some implicit assumptions about how to interpret it,
>>  which are really part of the syntax of RDF, though implicitly so:
>>  they are incorporated into the very notion of 'merging' two RDF
>>  graphs. Those assumptions were sketched above.
>
>Well I guess what is important is that such assumptions may not be
>reasonable. Because my reading of the current RDF REC says that I can make
>statements about parts of XML or HTML documents. I interpret this to mean
>that the URIref http://example.org/Unicorn#LeftButtock either may not
>resolve at all, or may resolve to a piece of HTML:

I agree this is an ambiguity which we have not resolved or even
discussed properly (at least not since I've been on the WG; maybe they
did earlier). Of course it MAY resolve to a piece of HTML, and indeed
that would not make it unusable in RDF as a name; but it would not 
automatically make it into the RDF name of that piece of HTML. We 
could adopt this as a convention, I guess, but then we would have 
serious problems with use/mention ambiguities.

[Later. It occurs to me that there is a quick-and-dirty way around 
the use/mention problem that might actually be just what we need. A 
URL-plus-fragID uriref is assumed to *denote* the relevant part of 
the document (where the fragID is interpreted according to the mime 
type), except when that part of the document consists of RDF, in 
which case it is interpreted as being the same identifier as that 
identified by the fragId in the document. In other words, RDF *uses* 
all the RDF it can find, but it treats all other fragIds as *names 
of* parts of documents. The only thing this can't do is refer to RDF 
in RDF, but that's what we have reification for, right? Highly 
unofficial proposal, needless to say.]
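A rough Python sketch of that unofficial proposal, under two simplifying assumptions: the media type of the whole retrieved document stands in for "that part of the document consists of RDF", and `application/rdf+xml` is taken to be the only RDF media type. `lookup_media_type` is a hypothetical stand-in for actually retrieving the URL, and the URLs in `fake_web` are made up for illustration.

```python
from urllib.parse import urldefrag

RDF_MEDIA_TYPES = {"application/rdf+xml"}  # assumption: which media types count as RDF


def interpret(uriref, lookup_media_type):
    """Classify a uriref under the proposed convention.

    lookup_media_type(url) is hypothetical: it returns the media type of the
    document retrieved from url, or None if nothing can be retrieved.
    """
    url, frag = urldefrag(uriref)
    if not frag:
        return ("name", uriref)  # no fragId: the proposal says nothing special
    media_type = lookup_media_type(url)
    if media_type is None:
        return ("name", uriref)  # unresolvable: still usable as an opaque name
    if media_type in RDF_MEDIA_TYPES:
        # RDF is *used*: the uriref is the same identifier the RDF document uses.
        return ("rdf-name", uriref)
    # Anything else is *mentioned*: the uriref names part of the document,
    # with the fragId to be read according to the media type.
    return ("document-part", url, frag, media_type)


fake_web = {  # hypothetical retrieval results
    "http://example.org/Unicorn": "application/rdf+xml",
    "http://example.org/Unicorn.html": "text/html",
}
print(interpret("http://example.org/Unicorn#Bottock", fake_web.get))
print(interpret("http://example.org/Unicorn.html#LeftButtock", fake_web.get))
```

The sketch also shows the proposal's stated limitation: a uriref that lands in RDF never comes out as a "document-part", so RDF cannot refer to RDF this way.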

>
><div id="LeftButtock">
>     <p>This is a description of the Left Buttock of the mythical Unicorn
></div>
>
>(note the use of non-well-formed, i.e. SGML-based, HTML)
>
>>  Now of course one
>>  might want to say something in RDF about a document with a URL, and
>>  it allows one to do that. But that use of an absolute URI as an RDF
>>  name is a very special use.
>
>Why is that a special case? Where does it say that? I assert it is not a
>special case.

It's special because in all the examples I've seen, such use has been
taken to mean that the *document* is the thing named by the URI.  I 
agree this is not stated anywhere, but it seems to be universally 
understood.

>
>>
>>  >This
>>  >is the whole argument. Should RDF treat URI references as opaque or not?
>>  >Should all URIs that use the "http" scheme identify _documents_, or might
>>  >the URI http://example.org/Unicorn identify a Unicorn?
>>
>>  I would say that if someone wants to try to use it in that way, then
>>  nothing should prevent them from doing so, but they should be ready
>>  to take the consequences of doing something that makes such fragile
>>  semantic sense. Probably what they write will have ludicrous
>>  consequences.
>
>I dearly hope that RDF is not designed to make such usage ludicrous,
>otherwise we may have huge problems for RDF's usability. At the very least
>this would be a large architectural hole.

Well, as I understand it, it would amount to saying that a unicorn 
had an http URL. (After all, that URI *is* a URL, right?) And that is 
ludicrous, right?

>
>>
>>  >For example, does your model theory contain anything pertaining to the
>>  >syntactic substructure of a URI reference? scheme, authority, hierarchical
>>  >part, fragment id? I don't see it.
>>
>>  No, it does not, because the WG consciously decided to avoid going
>>  into that territory. It would have been fun to try it, but it was
>>  outside our charter. But an adequate semantics for a web language
>>  should address such issues, eventually.
>
>Well, that is the issue. I will argue strongly that OWL be able to make
>statements about parts of arbitrary XML and HTML documents.

I agree that would be great. Also parts of images, sound files, parts 
of all kinds of things.

But hold on a second. You want it to be able to REFER TO parts of 
documents. OK, fine: but what I was talking about earlier was a 
global convention that allows RDF/DAML/OWL to USE names which are 
USED in other OWL documents. I wasn't talking about *reference to* 
the documents at all, which is another issue altogether. As far as I 
know, RDF has no official means for referring to documents (though 
absolute URLs are often interpreted that way) let alone parts of 
documents. We seem to have a use/mention disconnect here.

BTW, I would predict that most of OWL isn't going to be ABOUT 
documents, but it's all going to be WRITTEN IN documents.

>
>>
>>  >>  But the referring
>>  >>  thing here is the whole uriref, not the absolute URI. That doesn't
>>  >>  refer to anything but the document. The relationship between
>>  >>  http://example.org/Unicorn and http://example.org/Unicorn#LeftButtock
>>  >>  is not one of resource to subresource;
>>  >
>>  >Read the internet draft carefully. There is no _relationship_ defined
>>  >between _resource_ and _subresource_. A document does contain fragments.
>>  >One might consider a sub resource to be contained by a resource but one can
>>  >make entirely independent assertions about a resource and any of the
>>  >subresources that it supposedly contains.
>>
>>  I've read this several times and it still seems incoherent to me, I
>>  think because it applies 'sub' to 'resource' rather than 'network
>>  entity'.
>
>Suppose I change the term "subresource" to "node", does that make more
>sense?

Maybe. It was the 'sub' that was puzzling me.

>
>>  >What is returned is not a resource
>>  >but, _by definition_, a network entity.
>>
>>  Why is a network entity not a resource? Surely *anything* can be a
>>  resource.
>
>True, but the network entity returned by an HTTP GET on a URI _is not the
>same as the resource identified by the URI_.
>
>This needs to be totally clear.

Agreed in principle, though in many cases they might well be the 
same. Certainly that would seem to be a useful and harmless 
convention: how else is one supposed to refer to a web document, 
other than by using its URL? I agree this isn't formally stated 
anywhere in the RDF specs, but it's often assumed, e.g. in the 'Ora
said' examples in the original M&S.

But now you have me puzzled, by the way. You seem to be *wanting* to
use urirefs to identify parts of web documents, yet you are insistent
that they do not refer to them. (Or is your point that RDF doesn't
provide a way to re

Just as a general point, RDF is a very 'weak' language in a strict 
logical sense, but it can be used in the context of what might be 
called extra-logical assumptions which, if mutually understood by all
users of the RDF, can impose a much more precise 'meaning'. The use 
of fragIds to refer to parts of documents might be one such 
convention, and datatyping conventions are another.

>A URIref which _identifies_ a network resource would use the "data" scheme:
>e.g.
>
>data:text/plain,A "Unicorn" is a mythical creature ...

I fail to follow this. How does plain text identify, say, my CV, or 
the front page of the NYT for 13 October 1989?
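For reference, a data: URI (RFC 2397) carries its content inside the identifier itself, so "resolving" it needs no network access. A minimal Python sketch, handling only the non-base64 form; strictly, the space and quote characters in the example above would need percent-encoding to form a valid URI. The function names are hypothetical.

```python
from urllib.parse import quote, unquote


def make_data_uri(text: str, media_type: str = "text/plain") -> str:
    """Build a non-base64 data: URI, percent-encoding characters not allowed in URIs."""
    return f"data:{media_type},{quote(text)}"


def resolve_data_uri(uri: str):
    """Return (media type, content) for a non-base64 data: URI; no network access."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing ',' separator")
    if header.endswith(";base64"):
        raise NotImplementedError("base64 payloads are not handled in this sketch")
    # RFC 2397 default when the media type is omitted.
    media_type = header or "text/plain;charset=US-ASCII"
    return media_type, unquote(payload)


uri = make_data_uri('A "Unicorn" is a mythical creature ...')
print(uri)
print(resolve_data_uri(uri))
```

Whether such a URI *identifies* the network entity or merely *contains* it is the question at issue; the sketch only shows that the content is recoverable from the identifier alone.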

>>
>>  >So yes the _document fragment_ obtained by _resolving_
>>  >http://www.w3.org/1999/02/22-rdf-syntax-ns#Class is a piece of XML. And the
>>  >_document fragment_ is indeed contained in the document (entity).
>>
>>  We seem to agree.
>
>finally ...
>
>>  ...So in your example, the document fragment obtained
>>  by resolving http://example.org/Unicorn#LeftButtock had better be a
>>  piece of XML (well, RDF in any case).
>
>again, no it could be (non XML) HTML for example.

I meant, if it is not a piece of RDF, then an RDF inference engine 
might get very confused trying to figure out where the identifiers 
are in it. There is certainly no official RDF assumption that the 
intermediate hash is in any way concerned with *referring to* a part 
of a document.

>
>>  In other words,
>>  http://example.org/Unicorn had better be the URL of a document.  The
>>  RDF semantics might *interpret* it as anything at all, but that's
>>  completely irrelevant to its role in making connections across the
>>  semantic web; and it is only the latter role that is relevant to how
>>  fragIds are treated by an RDF engine.
>
>This is exactly why "rdf:type" is a special kind of property, because the
>resource that an rdf:type points to really does need to be RDF (perhaps),

?? I fail to follow this. As far as I can see, rdf:type is on a par 
with the rest of the RDF vocabulary and is not particularly special.

>but otherwise, an RDF 'engine' whatever that may be, generally won't even
>try to dereference a URI so this should be a non-issue. Correct?

Well, a DAML or OWL engine certainly will, since URIs are used to 
import one ontology into another. Even in RDF, engines like CWM and 
Euler often assume that some absolute URIs identify pieces of 
well-formed RDF, and act on that assumption, though this is not 
'official'.
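A sketch of why an ontology engine ends up dereferencing URIs: following import statements is a transitive walk over retrieved documents. `fetch_imports` is a hypothetical placeholder, assumed to retrieve the document at a URL and return the URLs it imports; no particular import vocabulary or retrieval API is implied, and the example URLs are made up.

```python
from collections import deque


def import_closure(start_url, fetch_imports):
    """Return every ontology URL reachable from start_url via imports, in BFS order.

    fetch_imports(url) is hypothetical: it retrieves the document at url and
    returns the list of URLs it imports. The visited set makes cycles harmless.
    """
    visited = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for imported in fetch_imports(url):
            if imported not in visited:
                visited.add(imported)
                queue.append(imported)
    return order


fake_imports = {  # hypothetical ontologies; note the A <-> C cycle
    "http://example.org/A": ["http://example.org/B", "http://example.org/C"],
    "http://example.org/B": [],
    "http://example.org/C": ["http://example.org/A"],
}
print(import_closure("http://example.org/A", lambda u: fake_imports.get(u, [])))
```

Every step of this walk presupposes that the absolute URI really is the URL of a retrievable document, which is exactly the convention under discussion.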

>
>>
>>  >It is very common to conflate a resource and the entity that represents it
>>  >at any point in time. But whether you agree or not, this is how the language
>>  >is defined. It is not possible to understand anything about "REST" until
>>  >this distinction is understood at least from a terminological point of view.
>>
>>  I think we are in violent agreement here.
>>
>
>Yes and perhaps this is why RDF needs to very precisely define what a
>"Resource" is,

I think we do. Anything and everything is a resource. "Resource" 
simply means "entity", ie anything that the human mind can imagine or 
give a name to, and maybe some other things as well.

>to the point, perhaps, of stating that there is (?) no
>relationship between the RFC 2396 resource identified by a URI, and the RDF
>resource identified by the URI. RDF can then define what it means by a
>fragment identifier etc.
>
>The thorny issue, however, gets back to the fact that RDF needs to be able
>to make assertions about Web pages and parts of Web pages e.g. arbitrary XML
>and HTML documents. So try as you like, you are probably stuck with RFC 2396
>resources,

??? But that explicitly says that resources are NOT just things like 
web pages, but include off-web entities like books and people.

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Sunday, 24 February 2002 22:44:43 UTC