RE: [ALL] RDF/A Primer Version

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 26 Jan 2006 15:27:15 -0600
Message-Id: <p0623090abffdb6719d0a@[10.100.0.23]>
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
Cc: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, "Ben Adida" <ben@mit.edu>, "SWBPD list" <public-swbp-wg@w3.org>, "public-rdf-in-xhtml task force" <public-rdf-in-xhtml-tf@w3.org>

>Pat Hayes said:
>
><quote>
>[4] has a clear and explicit description (at
>http://www.w3.org/TR/webarch/#indirect-identification
>) of a condition which seems to apply almost
>perfectly to the situation which arises in RDF/A
>and which Alistair deplores, and which is
>correctly described as not constituting a URI
>collision. Using the same name to refer both to a
>thing, and to a piece of a document which itself
>refers to the same thing, seems clearly to be an
>example of indirect reference. As [4] says,
>somewhat pithily, "Identifiers are commonly used
>in this way."
></quote>
>
>I understood [4] to be referring to 'indirect identification' as 
>expressed in RDF via properties of type 
>owl:InverseFunctionalProperty. I.e. the following triple:
>
>_:aaa foaf:homepage <http://jo-lamda.blogspot.com/>.
>
>... uses the URI <http://jo-lamda.blogspot.com/> to 'indirectly 
>identify' the blank node _:aaa because the property foaf:homepage is 
>declared by the FOAF ontology [1] to be an inverse functional 
>property.

That isn't how I read [4]. The kind of usage you describe is not 
'indirect', since all the identifiers are being used with a single 
referent in mind: "http://jo-lamda.blogspot.com/" denotes a web page 
and "_:aaa" denotes its owner.
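
For concreteness, here is a minimal sketch of what identification via an inverse functional property amounts to operationally: subjects that share a value for such a property are taken to denote one thing. It assumes, as stated in the quoted text, that foaf:homepage is inverse functional. The nodes _:bbb and _:ccc and the second homepage URL are invented for illustration, and the grouping is deliberately naive (one property, no transitive merging).

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption from the quoted text: FOAF declares foaf:homepage inverse functional.
INVERSE_FUNCTIONAL = {FOAF + "homepage"}

triples = [
    ("_:aaa", FOAF + "homepage", "http://jo-lamda.blogspot.com/"),
    # Hypothetical extra nodes, not in the original message:
    ("_:bbb", FOAF + "homepage", "http://jo-lamda.blogspot.com/"),
    ("_:ccc", FOAF + "homepage", "http://example.org/other"),
]

def same_individuals(triples):
    """Return sorted lists of subjects that share a value for an inverse functional property."""
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL:
            by_value[(p, o)].add(s)
    # Only groups with more than one subject say anything new.
    return [sorted(group) for group in by_value.values() if len(group) > 1]

merged = same_individuals(triples)
print(merged)
```

Note that the page URI keeps a single referent throughout (the page); only the blank nodes are identified with each other.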

>
>If this is indeed the intended meaning of 'indirect identification' 
>at [4] then I strongly suggest the RDF/A primer does NOT use the 
>term 'indirect identification' to refer to the practice of using 
>URIs to denote both a piece of XML (effectively a part of a 
>document) and an entity in the 'real world' (e.g. a person).
>
>See also related email [2].
>
>Pat Hayes said:
>
><quote>
>It is impossible, both practically and
>theoretically, to completely avoid all ambiguity
>in using referential names. Reference is not
>access. While URLs must be unambiguous locators,
>in the sense of resolving unambiguously to a
>particular Web resource, referential names -
>which is how URI references are used in RDF -
>cannot possibly be specified so exactly as to
>refer uniquely and unambiguously in all
>circumstances. Even globally recognizable proper
>names like "Mount Everest" do not have unique
>referents in all possible circumstances, since
>the exact referent depends on the ontological
>framework being mutually assumed (Where is the
>exact edge of a mountain? Are we talking about
>people as agents or as medical cases? At a
>particular time or as endurants? etc..) Under
>these circumstances, to view every referential
>ambiguity as a Bad Thing is about as useful as
>trying to stamp out breathing.
>
>Like words in human language, URIs can be safely
>overloaded under conditions which allow possible
>misunderstandings to be securely resolved by
>their local context, without requiring
>negotiation: and this need not even require that
>the resolution be actually done, provided that
>the necessary context - which in the case under
>discussion is likely to be the ontology
>identified by the root URI of the RDF property -
>can be accessed when required. In English we
>safely use "bank" to refer to a side of a river,
>a turning motion or a building, in part because
>these meanings are so divergent that the
>ambiguity can almost always be immediately
>resolved by the immediate context. Similarly, an
>email address can be safely used to refer to its
>owner in part because almost anything that can be
>coherently said about a person could not possibly
>apply to an email account, and vice versa. Even
>the use of a literal string in a context which
>requires a reference to a named agent can be
>interpreted as making sense, since it clearly
>requires a coercion, and it would be natural to
>use the string as a referring name. Whether or
>not this is in some fundamental sense 'correct'
>or 'proper' is not worth discussing: what matters
>is only that a community of agents all agree to
>use the same kind of coercion strategy when it is
>required, which allows strings to be used to
>refer to agents; and to the extent they do, then
>they thereby become genuinely referring names.
>This is how the world comes to use language, both
>in the large and in the small
>(http://www.economist.com/science/displayStory.cfm?story_id=5135495).
></quote>
>
>OK. Tell me what 'local context' is exactly.

I apologize for using the c-word, which I normally try to avoid. I 
didn't mean to imply that there are actual things called 'contexts'. 
The context for a URI occurring in some RDF content is that RDF 
itself, plus any other relevant RDF that can reasonably be presumed 
to be accessible, e.g. the ontologies accessible from the base URIs 
of other identifiers in the transmitted RDF, or in imported 
ontologies. I meant only that if the identifier is transmitted from 
A to B, then there is enough information available at B to do the 
necessary disambiguation, without having to go back to A and ask for 
clarification. In ordinary conversation, this corresponds to not 
having to say something like "what sense of 'bank' did you mean, 
exactly?".

>How do I as a publisher ensure that sufficient 'context' is 
>available for the applications I intend to support?

You don't. But how, as a publisher, do you ensure that there is enough 
of anything to support the processing you hope will happen at the 
other end? You cannot establish this absolutely, in all cases. The 
best you can do is to provide pointers to anything that you feel is 
relevant, and in many cases rely on a presumption that both you and 
your readers share some common ground. It seems to me that there is 
absolutely no way to avoid making assumptions like this.

>What about unforeseen applications? As a consuming application, how 
>do I get at the 'context'

see above

>, and how do I use it to resolve ambiguities?

Well, my point is that most of these apparent ambiguities will either 
not in fact need to be resolved, or their resolution will be done by 
applying conventions that have evolved within a community of use, 
which in a Web context means a community which uses a particular 
vocabulary consistently in a certain way. The use of webpage address 
URIs to denote people in FOAF is an excellent example. But the basic 
point is that inferential processing (drawing conclusions, querying, 
checking consistency, etc.) can all be done without needing to 
'resolve' ambiguities. The ambiguity of a URI's reference, if 
present, can usually simply be left ambiguous. The logical semantics 
underlying inference presumes that identifiers are ambiguous in this 
way: ambiguity is the norm. In fact, the reduction (not total 
elimination) of ambiguity is often one of the main reasons for doing 
inference.

>  Where are these issues addressed in current specifications?
>
>Surely it is good practice for publishers to clearly understand how 
>and when ambiguities can arise, to be aware of each and every action 
>that could lead to ambiguity, and to undertake such actions in full 
>knowledge of the consequences.

Well, yes, it is hard to argue with that. But if 10^6 websites, say, 
already use webpage URIs to refer to their owners, and if the 
normative semantic theories in the specifications do not prohibit 
this (as they do not) and all the machinery that processes this 
information works (as it does) why is it considered 'good practice' 
to set out to re-educate everyone and to oblige them to change? 
Seems to me it might be more productive to take a more empirical and 
less judgmental stance, and ask why and how this situation, which 
theory predicts should lead to confusion, apparently does not lead to 
confusion. The TAG recommendations seem to be based on an implicit 
theory of ambiguity and communication. Projects like FOAF seem to me 
to be empirical refutations of this theory.

>Surely it is also good practice for publishers in the majority of 
>cases to design systems that do not lead to ambiguity

No. Ambiguity is inherent in the very idea of using names to refer in 
a descriptive formalism. This is the point I tried to get across to 
the TAG. There is a common presumption that ambiguity is a Bad Thing, 
and so we should make every reasonable effort to Stamp It Out. But 
this is nonsense: ambiguity *of reference* is not only not a bad 
thing, it is a *necessary* thing. There are theorems which show that 
only an uncomputable amount of assertional effort could ever 
completely remove it. Even trying to remove it in realistic cases is 
unfeasible. Take an ordinary unambiguous name: what *exactly* does 
"Mount Everest" refer to? (What is its volume? Where are its edges? 
etc..) Or take my name, and ignore the fact that there are many people 
named Pat Hayes in the world: does the "Pat Hayes" that identifies me refer to 
me now, me throughout my lifetime; me considered as a social agent, 
me considered as an organism, etc.? These are all distinctions that 
formal ontologies regularly make. So these are ambiguities too: the 
fact that they are not acknowledged by linguists doesn't make them 
any less real, it is just a testament to our human ability to 
communicate successfully using ambiguous notations. Almost all names 
are referentially ambiguous; and ironically, every attempt to remove 
this ambiguity by imposing more exactly defined lexica 
(mountain-as-physical-object, mountain-as-geographical-entity, 
mountain-as-climbing-peak, etc.) actually makes the ambiguity worse 
for all other names, since it provides for making finer and finer 
ontological distinctions elsewhere, thereby creating (or perhaps 
revealing) ambiguity where none was previously noticed. If there are 
ten distinct referents for "Pat Hayes" and also for "Jackie Hayes" 
then there are a hundred different types of binary relation between 
us that could all be described as "marriedTo". There is no final end 
state where every name is unambiguous: this vision is a chimera.

One reaction I meet when I try to point this out is along the lines: 
even if what you say is true, it is like saying that the world is 
full of sin: but still, we should all strive to be good. But this 
misses my point. I'm not saying that the problem is unsolvable. I'm 
saying that there is no problem. Ambiguity does not get in the way of 
communication or inference. Setting out to remove all ambiguity is 
like setting out to walk to the moon: it's a futile goal since it 
can't be done, and also because there is absolutely no need to even 
try to do it. It is certainly not good practice in general.

Of course there are cases (medicine, biology, science generally, law, 
international standards) where 'ordinary' identifiers are not 
precisely defined enough for some technical usage, and specialized 
lexica are necessary, often requiring careful management, because 
certain kinds of ambiguity must be caught and corrected. I don't mean 
to imply that this kind of effort is pointless: only that to assert 
as a general property of the Web architecture that all identifiers 
should be unambiguous, is nonsensical.

>, or that minimise the potential for ambiguity, because in doing so 
>they simplify the management of change, and increase the ease with 
>which their data can be repurposed in unforeseen contexts? I.e. by 
>acting to minimise the potential for ambiguity, a publisher 
>increases the value of its published data, because the data is more 
>portable.

Well, that might be a good case, but the conclusion isn't obvious. 
I'd like to see a really good (mathematical?) account of why and how 
less ambiguity makes for improved portability. I can see good 
informal arguments both ways.

>A practical question: If I operate under the assumption that the 
>same URI will commonly be used to denote both a person and their 
>home page, doesn't this make the notion of logical consistency 
>effectively useless?

No. Absolutely not.

>Don't domains and ranges become effectively useless also?

No. Although, to be fair, one common way to understand domains and 
ranges, as 'constraints' on what can be said, which can be 'checked' 
to detect 'errors', would indeed be in opposition to what I'm saying 
here. But none of those words I've highlighted have any natural place 
in an inferential framework: they all come from thinking about 
programming language design.

>E.g. if I have:
>
><http://jo-lamda.blogspot.com/> foaf:mbox <mailto:jo.lambda@example.org>.
>
>... and I also have:
>
>_:aaa foaf:homepage <http://jo-lamda.blogspot.com/>.
>
>... then via the domain of foaf:mbox and the range of foaf:homepage 
>I may conclude:
>
><http://jo-lamda.blogspot.com/> a foaf:Agent, foaf:Document.
>
>What is the usefulness of this new information?

I don't vouch for its usefulness, but I would argue that it is a 
reasonable statement of exactly the overloading or punning condition 
that I have no problems with, which is that a single URI can usefully 
play several referential roles at the same time. So, it might not be 
particularly useful, but it can be harmlessly true.
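
A minimal sketch of the inference in question, assuming only the domain and range mentioned in the exchange above (foaf:mbox with domain foaf:Agent, foaf:homepage with range foaf:Document). The schema tables and the function name are illustrative, not a real reasoner or library API.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed schema fragment, taken from the discussion above.
DOMAIN = {FOAF + "mbox": FOAF + "Agent"}
RANGE = {FOAF + "homepage": FOAF + "Document"}

data = [
    ("http://jo-lamda.blogspot.com/", FOAF + "mbox", "mailto:jo.lambda@example.org"),
    ("_:aaa", FOAF + "homepage", "http://jo-lamda.blogspot.com/"),
]

def infer_types(triples):
    """Apply domain/range as inference rules: they add type triples, they reject nothing."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))
    return inferred

types = infer_types(data)
page = "http://jo-lamda.blogspot.com/"
page_types = sorted(c for s, _, c in types if s == page)
print(page_types)
```

Nothing in this process raises an error: the URI simply ends up with both types, which is the overloading described above. A contradiction would arise only if some further axiom, not assumed here, declared the two classes disjoint.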

Pat

>
>Cheers,
>
>Al.
>
>[1] http://xmlns.com/foaf/0.1/
>[2] http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0145.html
>[4] http://www.w3.org/TR/webarch/#indirect-identification
>
>
>-----Original Message-----
>From: public-rdf-in-xhtml-tf-request@w3.org on behalf of Pat Hayes
>Sent: Wed 25/01/2006 05:30
>To: Booth, David (HP Software - Boston)
>Cc: Ben Adida; SWBPD list; public-rdf-in-xhtml task force
>Subject: RE: [ALL] RDF/A Primer Version
>
>
>>I hate to say this, but I think the URI identity issues that Alistair
>>raised in email[3] after yesterday's teleconference are important enough
>>to delay publication until they are either fixed or visibly marked as
>>problems.  The WebArch document is clear that URI collisions[4] are A
>>Bad Thing.  It would seem wrong to endorse such collisions, even
>>implicitly.
>
>I beg to differ.
>
>[4] has a clear and explicit description (at
>http://www.w3.org/TR/webarch/#indirect-identification
>) of a condition which seems to apply almost
>perfectly to the situation which arises in RDF/A
>and which Alistair deplores, and which is
>correctly described as not constituting a URI
>collision. Using the same name to refer both to a
>thing, and to a piece of a document which itself
>refers to the same thing, seems clearly to be an
>example of indirect reference. As [4] says,
>somewhat pithily, "Identifiers are commonly used
>in this way."
>
>It is impossible, both practically and
>theoretically, to completely avoid all ambiguity
>in using referential names. Reference is not
>access. While URLs must be unambiguous locators,
>in the sense of resolving unambiguously to a
>particular Web resource, referential names -
>which is how URI references are used in RDF -
>cannot possibly be specified so exactly as to
>refer uniquely and unambiguously in all
>circumstances. Even globally recognizable proper
>names like "Mount Everest" do not have unique
>referents in all possible circumstances, since
>the exact referent depends on the ontological
>framework being mutually assumed (Where is the
>exact edge of a mountain? Are we talking about
>people as agents or as medical cases? At a
>particular time or as endurants? etc..) Under
>these circumstances, to view every referential
>ambiguity as a Bad Thing is about as useful as
>trying to stamp out breathing.
>
>Like words in human language, URIs can be safely
>overloaded under conditions which allow possible
>misunderstandings to be securely resolved by
>their local context, without requiring
>negotiation: and this need not even require that
>the resolution be actually done, provided that
>the necessary context - which in the case under
>discussion is likely to be the ontology
>identified by the root URI of the RDF property -
>can be accessed when required. In English we
>safely use "bank" to refer to a side of a river,
>a turning motion or a building, in part because
>these meanings are so divergent that the
>ambiguity can almost always be immediately
>resolved by the immediate context. Similarly, an
>email address can be safely used to refer to its
>owner in part because almost anything that can be
>coherently said about a person could not possibly
>apply to an email account, and vice versa. Even
>the use of a literal string in a context which
>requires a reference to a named agent can be
>interpreted as making sense, since it clearly
>requires a coercion, and it would be natural to
>use the string as a referring name. Whether or
>not this is in some fundamental sense 'correct'
>or 'proper' is not worth discussing: what matters
>is only that a community of agents all agree to
>use the same kind of coercion strategy when it is
>required, which allows strings to be used to
>refer to agents; and to the extent they do, then
>they thereby become genuinely referring names.
>This is how the world comes to use language, both
>in the large and in the small
>(http://www.economist.com/science/displayStory.cfm?story_id=5135495).
>
>I suggest that if current real-world usage of a
>metadata vocabulary seems to be causing no actual
>operational problems, it might be better to study
>this real-world usage carefully with a view to
>learning something about how symbols actually are
>being used on the Web, than to set out to take
>great pains to improve it.
>
>In the meantime, I also suggest that RDF/A might
>usefully use the term "indirect identification"
>to point out that subjects of RDF triples can
>both be pieces of XML markup and also refer to
>entities in the real world, and that this need
>not be deplored as harmful ambiguity.
>
>Pat Hayes
>
>>David Booth
>>
>>[3] Identity issues raised by Alistair:
>>http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0113.html
>>[4] TAG's Web Architecture:
>>http://www.w3.org/TR/webarch/#URI-collision
>>
>>
>>>   -----Original Message-----
>>>   From: public-swbp-wg-request@w3.org
>>>   [mailto:public-swbp-wg-request@w3.org] On Behalf Of Ben Adida
>>>   Sent: Tuesday, January 24, 2006 12:03 PM
>>>   To: SWBPD list
>>>   Cc: public-rdf-in-xhtml task force
>>>   Subject: [ALL] RDF/A Primer Version
>>>
>>>
>>>
>>>
>>>   Hi all,
>>>
>>>   I made a mistake in the version of the RDF/A Primer that I presented
>>>   at the telecon yesterday. I have just finished uploading the right
>>>   version, which you can find here:
>>>
>>>   http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer
>>>
>>>   With the WG and specifically the reviewers' approval (DBooth,
>>>   GaryNg,
>>>   and also "unofficial" reviewers), I am hoping that we can rapidly
>>>   agree that this latest version should be the one that becomes our
>>>   first published WD.
>>>
>>>   The only difference in content is that the new version has an extra
>>>   section (section #2), and the old sections 2 and 3 are merged into
>>>   the new section 3 for purely organizational purposes (no text
>>>   is lost
>>>   or added in those sections, just reorganized.) The point of the new
>>>   section 2 is to add an even simpler introductory example. We believe
>>>   this additional section is in line with the comments we
>>>   received from
>>>   reviewers, both official and earlier, unofficial reviews. In
>>>   fact, we
>>>   began writing it in part to respond to some of these early
>>>   comments 2
>>>   weeks ago.
>>>
>>>   The already-approved version is still at the old URL for
>>>   comparison:
>>>   http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-15-rdfa-primer
>>>
>>>   I want to stress that this is entirely *my* mistake: the TF had
>>>   agreed [1,2] that this second version would be presented to the WG
>>>   yesterday, and I simply forgot. Publishing these additional examples
>>>   now is quite important for getting the word out about RDF/A and
>>>   making it competitive against other metadata inclusion proposals,
>>>   outside of W3C, that are gaining traction.
>>>
>>>   Apologies for my mistake. I hope you'll see that these edits do not
>>>   constitute a substantive change to the document, rather they help
>>>   make the same points more appealing to and understandable by
>>>   a larger
>>>   audience.
>>>
>>>   -Ben Adida
>>>   ben@mit.edu
>>>
>>>   [1] Discussion during last segment of January 10th TF
>>>   telecon: http://www.w3.org/2006/01/10-swbp-minutes
>>>
>>>   [2] Discussion, at beginning, of Mark's new examples during January
>>>   17th TF telecon:
>>>   http://www.w3.org/2006/01/17-swbp-minutes
>>>
>>>
>
>
>--
>---------------------------------------------------------------------
>IHMC		(850)434 8903 or (650)494 3973   home
>40 South Alcaniz St.	(850)202 4416   office
>Pensacola			(850)202 4440   fax
>FL 32502			(850)291 0667    cell
>phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes


Received on Thursday, 26 January 2006 21:27:54 GMT
