RE: [ALL] RDF/A Primer Version from Pat Hayes on 2006-01-26 (public-rdf-in-xhtml-tf@w3.org from January 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 26 Jan 2006 15:02:41 -0600
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
Cc: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, "Ben Adida" <ben@mit.edu>, "SWBPD list" <public-swbp-wg@w3.org>, "public-rdf-in-xhtml task force" <public-rdf-in-xhtml-tf@w3.org>
Message-Id: <p06230908bffd8cabd25c@[10.100.0.23]>
>Comment on
>
>http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer
>RDF/A Primer $Id: 2006-01-24-rdfa-primer.xml,v 1.7 2006/01/24 
>16:43:20 adida Exp $
>
>The example:
>
><html>
>     <head>
>         <title>Jo Lamda's Home Page</title>
>     </head>
>     <body>
>         <p>
>             Hello. This is <span property="foaf:name">Jo Lamda</span>'s
>             home page.
>             <h2>Work</h2>
>             If you want to contact me at work, you can
>             either <a rel="foaf:mbox" 
>href="mailto:jo.lambda@example.org">email
>                 me</a>, or call <span property="foaf:phone">+1 777 
>888 9999</span>.
>         </p>
>     </body>
></html>
>
>... gives the following triples (assuming that the above is the 
>content of Jo's home page):
>
><http://jo-lamda.blogspot.com/>
>   foaf:name "Jo Lambda";
>   foaf:mbox <mailto:jo.lambda@example.org>;
>   foaf:phone "+1 777 888 9999".
>
>What is the URI <http://jo-lamda.blogspot.com/> being used to 
>denote? Jo? Jo's home page? Both?

Not a meaningful question. The technical answer is that it denotes 
whatever it denotes in each possible interpretation of this RDF (plus 
any other RDF that happens to be in use at the time). This might be 
Jo in some interpretations and Jo's homepage in others, and it might 
be something else entirely in yet others. Moreover, this "ambiguity" 
is not a problem to be solved. All that actually matters is that 
useful conclusions can be drawn from all this RDF.

>How is it possible for Terri's contact software to 'extract' the 
>'information':
>
>foaf:homepage = "http://jo-lamda.blogspot.com/"
>
>... where no such triple is given in the content, unless it 'knows' 
>to handle home pages containing RDF/A in a special way?

Well, it isn't possible, without knowing some more information. But 
why is this question relevant?

>
>If the example were to be:
>
><html>
>     <head>
>         <title>Jo Lamda's Home Page</title>
>     </head>
>     <body>
>         <p>
>             Hello. This is <span property="foaf:name">Jo Lamda</span>'s
>             <a rel="foaf:homepage" 
>href="http://jo-lamda.blogspot.com/">home page</a>.
>         </p>
>     </body>
></html>
>
>... it would give the additional triple:
>
><http://jo-lamda.blogspot.com/> foaf:homepage <http://jo-lamda.blogspot.com/>.
>
>How does Terri's contact software handle this? How should other 
>applications handle this?

If that software is handling RDF, it just draws conclusions that 
contain the URI. That is about all the 'handling' that is required to 
happen to URIs in semantic web inferential processing.

It might for example do the following. Since the domain of 
foaf:homepage is foaf:Person, it concludes that

<http://jo-lamda.blogspot.com/> rdf:type foaf:Person .

Now we want to send a message to that person's emailbox, so we might 
pose this as a SPARQL query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x WHERE { <http://jo-lamda.blogspot.com/> foaf:mbox ?x }

and if Jo had also used the RDF/A from your first example above, this 
would succeed with the binding ?x = <mailto:jo.lambda@example.org>, 
and from here on out it should be plain sailing.
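
To make those two steps concrete, here is a minimal sketch in plain Python (no RDF library), assuming triples are simple tuples and prefixed names are kept as strings. The helper names apply_domains and select_objects are hypothetical, and the domain table just repeats the domain claim made above; it is an illustration, not how any particular RDF engine does it.

```python
# Triples are plain (subject, predicate, object) tuples; prefixed names are
# kept as strings for brevity.
PAGE = "http://jo-lamda.blogspot.com/"

graph = {
    (PAGE, "foaf:name", "Jo Lambda"),
    (PAGE, "foaf:mbox", "mailto:jo.lambda@example.org"),
    (PAGE, "foaf:phone", "+1 777 888 9999"),
    (PAGE, "foaf:homepage", PAGE),
}

# Domain declaration as stated in the message (an assumption of this sketch).
DOMAINS = {"foaf:homepage": "foaf:Person"}


def apply_domains(triples, domains):
    """Add (s rdf:type C) for every triple whose property has domain C."""
    inferred = {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}
    return set(triples) | inferred


def select_objects(triples, subject, predicate):
    """Rough equivalent of: SELECT ?x WHERE { <subject> predicate ?x }"""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


closed = apply_domains(graph, DOMAINS)
mailboxes = select_objects(closed, PAGE, "foaf:mbox")
```

Nothing in either step needs to decide whether the URI 'really' denotes the page or the person; the type triple is added and the mailbox lookup succeeds regardless.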

The point being that although you or I might find the claim 
(<websiteURI> rdf:type foaf:Person .) to be a bit peculiar, this 
doesn't bother the software. It's not even bothered by using the same 
URI to denote the website and also the owner of the website, just as 
long as those two 'roles' for the URI don't interfere with one 
another; and although I haven't checked all the details, I don't think 
they do anywhere in FOAF. This general multiple-use technique is 
common in theorem-provers, where it is often called 'punning'; it is 
a bit like the overloading of identifiers in typed programming 
languages (though simpler). The logics we are developing for the 
intelligence community, and the Common Logic spec, use this technique 
widely throughout their formal semantics, so a single name can refer to 
a class, a property, a function, a proposition and an individual all 
at the same time, because the logical syntax keeps the various roles 
clearly separated.

BTW, this is what I meant by 'local context' in my other message: 
here, the FOAF domain and range information is enough to establish 
that some occurrences of http://jo-lamda.blogspot.com/ refer to the 
person, while others refer to a homepage; which is all that is needed 
in order to make the 'semantic' machinery work properly.

>As I understand it, if you want to use the URI 
><http://jo-lamda.blogspot.com/> to _indirectly identify_ Jo, you 
>have to do something like:
>
>_:aaa foaf:homepage <http://jo-lamda.blogspot.com/>;
>   _:aaa foaf:name "Jo Lambda".

The TAG architecture document doesn't mention blank node 
constructions. This isn't indirect, anyway: all the references here 
are direct.

But look, compare this with the 'clashing' usage:

<http://jo-lamda.blogspot.com/> foaf:homepage <http://jo-lamda.blogspot.com/> .
<http://jo-lamda.blogspot.com/> foaf:name "Jo Lambda" .

This simply entails the bnode version, so anything you can infer 
using that can also be inferred using this. So using bnodes certainly 
does not *add* any functionality. The case for it, then, must be that 
it avoids making some inferences that would follow from the second 
rendering and that would cause a problem somewhere. This would have 
to be an RDF subgraph containing <http://jo-lamda.blogspot.com/> 
that is actively harmful (probably one which produces a contradiction) 
but where the analogous subgraph with a blank node is not. Well, 
maybe, but I haven't seen any actual examples.
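
The entailment claim can be checked mechanically. The following is a brute-force sketch of simple entailment in plain Python, assuming the same tuple encoding of triples and the convention that blank nodes are strings starting with "_:"; the function name simply_entails is hypothetical. It looks for an assignment of the blank nodes in one graph to terms of the other that turns the first into a subgraph of the second.

```python
from itertools import product

PAGE = "http://jo-lamda.blogspot.com/"

# The 'clashing' version: the same URI is subject and object.
ground = {
    (PAGE, "foaf:homepage", PAGE),
    (PAGE, "foaf:name", "Jo Lambda"),
}

# The blank-node version.
with_bnode = {
    ("_:aaa", "foaf:homepage", PAGE),
    ("_:aaa", "foaf:name", "Jo Lambda"),
}


def is_bnode(term):
    return term.startswith("_:")


def simply_entails(g, e):
    """True if some assignment of e's blank nodes to terms of g
    makes e a subgraph of g (exhaustive search, fine for tiny graphs)."""
    bnodes = sorted({t for triple in e for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if instance <= g:
            return True
    return False


forward = simply_entails(ground, with_bnode)
backward = simply_entails(with_bnode, ground)
```

Here forward holds (map _:aaa to the page URI) while backward does not, which is the asymmetry described above: the URI version says everything the bnode version says, and a little more.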

>Similarly, if you want to use the URI of Jo's internet mail box 
><mailto:jo.lambda@example.org> to _indirectly identify_ Jo, you have 
>to do something like:
>
>_:aaa foaf:mbox <mailto:jo.lambda@example.org>;
>   _:aaa foaf:name "Jo Lambda".
>
>Then, because foaf:mbox is an inverse functional property (as is 
>foaf:homepage), if it were declared that:
>
>_:bbb foaf:mbox <mailto:jo.lambda@example.org>.
>
>... we could arrive at the conclusion:
>
>_:bbb owl:sameAs _:aaa.
>
>... which in turn would lead to the conclusion:
>
>_:bbb foaf:name "Jo Lambda".
>
>(Is this correct?)

Yes, provided that this was all inside the same graph.
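
As a rough illustration of that chain of inference, here is a sketch in plain Python, assuming all three triples sit in one graph (so the blank node labels share a scope) and that foaf:mbox and foaf:homepage are treated as inverse functional. The helper names same_as_pairs and propagate are hypothetical, and propagate does a single copying pass rather than a full owl:sameAs closure.

```python
MBOX = "mailto:jo.lambda@example.org"

# All three triples are in one graph, so _:aaa and _:bbb share a scope.
graph = {
    ("_:aaa", "foaf:mbox", MBOX),
    ("_:aaa", "foaf:name", "Jo Lambda"),
    ("_:bbb", "foaf:mbox", MBOX),
}

INVERSE_FUNCTIONAL = {"foaf:mbox", "foaf:homepage"}


def same_as_pairs(triples, ifps):
    """Subjects sharing a value of an inverse functional property are identified."""
    by_value = {}
    for s, p, o in triples:
        if p in ifps:
            by_value.setdefault((p, o), set()).add(s)
    return {
        (a, b)
        for group in by_value.values()
        for a in group
        for b in group
        if a != b
    }


def propagate(triples, pairs):
    """One pass: for each (a owl:sameAs b), copy a's statements onto b."""
    result = set(triples)
    for a, b in pairs:
        result |= {(b, p, o) for s, p, o in triples if s == a}
    return result


pairs = same_as_pairs(graph, INVERSE_FUNCTIONAL)
merged = propagate(graph, pairs)
```

If the third triple lived in a different graph, its _:bbb would be a different blank node scope and the sketch's assumption would no longer hold without first merging the graphs.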

>This was what I understood by 'indirect identification' as 
>implemented in RDF/OWL.

It's not what I understand by that term, since this style of working 
doesn't use indirection at all: all the URIs and bnodes are being 
used unambiguously here without any overloading. This is not 
analogous to the use of "Downing Street" to refer to the UK 
government (the example given in [4]).

>To say:
>
><http://jo-lamda.blogspot.com/> foaf:name "Jo Lambda".
>
>... is to use the URI <http://jo-lamda.blogspot.com/> to _directly 
>identify_ both Jo and her home page. I.e. this is a URI collision.

I'm not sure what the TAG architecture document means by this term, to 
be honest, but this term "directly identify" is, ironically, 
ambiguous, and is used ambiguously throughout the architecture 
document (and through a number of previous documents). It can be 
understood to mean, roughly, 'is a locator of'; or it can be 
understood to mean 'refers to'. These are not the same notion. It is 
perfectly possible for a URI to locate one thing and refer to 
another, and no harm is thereby caused, provided only that one keeps 
in mind that to refer to something is not the same as to access it 
over the Web. To have a URI locating two different things is 
architecturally impossible. To have it referring (ambiguously) to two 
or more different things is inevitable (though you can imagine an 
ideal fantasy world where it never happens: one is described in book 
three of Gulliver's Travels, and another is described in the Earthsea 
novels of Ursula Le Guin). But to have it referring to one thing while 
being used to locate something else (such as a description of the 
referent, or something used conventionally to refer to or indicate 
the referent) should not, it seems to me, be called a 'clash' at all.

>If both Jo's home page URI and her internet mail box URI were used 
>to _directly identify_ Jo, you could end up drawing the conclusion 
>that:
>
><http://jo-lamda.blogspot.com/> owl:sameAs <mailto:jo.lambda@example.org>.
>
>Surely we want to avoid that?

Well, that does look kind of silly, I will concede, and might even 
produce a contradiction if we had enough OWL content to prove that 
pages and emailboxes were disjoint. But my own reaction to this would 
be that such an OWL ontology would in fact be wrong, given the state 
of actual usage, since such sameAs assertions can indeed be treated 
as true, using widely used conventions about reference. Of course 
both of these URIs also have network-functional roles, and to read 
this owl:sameAs as identifying those roles would indeed be a mistake; 
and therein lies one danger. The moral I would draw is that it is 
risky to use owl:sameAs reasoning to conclude that any kind of Web 
operations can be performed on a URI (such as using mailto:foo in an 
http GET); but this would be a pretty poor design in any case. It's a 
bit like using
    Pat_Hayes = Patrick_J_Hayes
    "Pat_Hayes" contains an abbreviation
to conclude that
    "Patrick_J_Hayes" contains an abbreviation.

By the way, both of the webpage/emailbox conventions can be 
summarized as a single one: the referential use of something that 
identifies a reference to, or description of, an entity should be 
treated as a reference to that entity. That is, the composition of 
identification and reference should be treated as reference. I don't 
justify this on theoretical semantic grounds, only empirically, as an 
observation that human readers do this spontaneously and apparently 
quite successfully. And then we just have to say that homepages and 
emailboxes conventionally describe or refer to - not identify - their 
owners, and everything works smoothly. This amounts to a systematic 
blurring of the use/mention distinction that logicians get so 
up-tight about (I speak as one, BTW, and reflexive ad hominem isn't 
impolite.)

I'll concede that there is a perfectly reasonable objection to my 
position, along these lines: if http://jo-lamda.blogspot.com/ refers to 
Jo, how can we refer to the website when we want to talk about it 
rather than just GET something from it? That is, how do we 'turn off' 
an indirect convention when it is not wanted? In simple cases (like 
FOAF) there isn't any reason why we need to turn it off, since we can 
use the URI in both ways at once, but in more complicated (and more 
tightly constrained) cases this could be a problem. This is one 
reason why I like the idea of explicitly naming RDF graphs instead of 
using the http: URI, which gets a document containing a 
representation of it, as the name for the graph. The same line of 
thinking would reject the use of the website URI as a referring name 
for the website, and would therefore require that we adopt some other 
convention to refer to the website or mailbox (as opposed to access 
it, using a transfer protocol such as http). For example, to adapt 
one of the TAG suggestions, we might require that all URIs that are 
being used referentially should give 303s when used with an http GET. 
Not that I like this 303 redirect idea, but at least this version of 
it would have the merit of not requiring anyone to come up with a 
clear account of what exactly is the difference between a web 
resource and a non-web resource, and it would place the distinction 
where it belongs, between reference and access, rather than where it 
does not belong, in an under-defined ontological distinction. There 
is a transition cost, but not very high: previously published OWL/RDF 
could be fixed just by changing the Qnames in all the headers to 
point to redirecting URIs; whereas with the present muddle, someone 
has to ask whether or not a resource is a web resource, which is 
probably an unanswerable question in general. For example, if a website 
is closed down so that a GET on the old URI gives a 404 error, does 
that convert the resource from a Web resource to a non-web resource? 
After all, I can still *refer to* the website, perhaps in order to 
say explicitly that it no longer exists.
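
For what it's worth, the 303 variant mentioned above might be sketched as follows, in plain Python with no real HTTP server involved. Everything here is hypothetical: the DESCRIBED_BY table, the URI http://example.org/id/jo and the function respond_to_get are invented for illustration, and the only thing borrowed from HTTP is the 303 status with a Location header.

```python
# Hypothetical table: referring URI -> a document that describes the referent.
DESCRIBED_BY = {
    "http://example.org/id/jo": "http://jo-lamda.blogspot.com/",
}


def respond_to_get(uri):
    """Return (status, headers) for a GET under the 303 convention sketched here."""
    if uri in DESCRIBED_BY:
        # Used referentially: serve no representation, redirect to a description.
        return 303, {"Location": DESCRIBED_BY[uri]}
    # Anything else is treated as an ordinary locator and served directly.
    return 200, {}


status, headers = respond_to_get("http://example.org/id/jo")
```

The table records only how a name is being used (to refer, or to access); it never has to answer whether the thing referred to is a 'web resource'.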

Pat Hayes

-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 26 January 2006 21:02:59 UTC