Re: Working without being ambushed by Ambiguity (was: issue-57 background reading for F2F (short required reading) from David Booth on 2012-10-20 (www-tag@w3.org from October 2012)

From: David Booth <david@dbooth.org>
Date: Fri, 19 Oct 2012 22:11:14 -0400
To: Tim Berners-Lee <timbl@w3.org>, www-tag <www-tag@w3.org>
Message-ID: <1350699074.27835.15893.camel@dbooth-laptop>
Hi Tim,

Thanks for your ever-insightful comments.  My pedantry may be naive, but
in engineering fundamental aspects of the Semantic Web, such as the use
of URIs to denote things, I do not see how we can avoid certain details.
I am sorry if my comments, suggestions and periodic attempts to keep our
engineering on track seem like ambushing, but if I am only seeing a
document for the first time when it is posted for public review, then I
see no way to avoid that appearance.

Your overall point seems to be that we all understand these pedantic
details about ambiguity, and therefore we should stop getting
sidetracked by them and focus on the larger engineering issues.  But
AFAICT, we do *not* all understand these details, and this is causing
the TAG to get sidetracked in a different way, due to some assumptions
that I believe are incorrect.  This is evidenced by the TAG's continued
focus on the web-page-versus-its-subject distinction, as though this
particular distinction is somehow more important to Semantic Web
architecture than any other distinction, such as the distinction between
a gene and the protein that it encodes.

The explanation that you posted below is excellent as far as it goes,
and I believe I agree with *all* of the principles that you expressed
and your description of how the Semantic Web works.  We are definitely
on the same page in this regard, and I could not have expressed them
better.  But to my mind there are some important elements still missing
from this explanation, and it seems skewed as a result, thus causing me
to disagree with some of the conclusions and characterizations.  

1. The examples you gave nicely illustrate the fact that models can be
"good enough" in some sense, just as URI definitions can be "good
enough".  Your explanation also outlines some social processes that are
used to refine URI definitions to make them "good enough" when they
initially were not.
I completely agree that these social processes work, and this notion of
"good enough" is *exactly* what allows us to use URIs to (uniquely)
denote things, share common understandings, etc., and that this works in
a scale-free manner.  Furthermore, it is awesomely cool that we can do
this.  :-)

But what exactly does "good enough" mean?   Why is one URI definition
"good enough" but not another?  How exactly do these notions of "good
enough" and "not good enough" play out in terms of Semantic Web
applications, and the conventions that URI owners, RDF authors and RDF
consumers should follow to best enable the Semantic Web?  Perhaps we can
still make the right architectural choices without figuring out these
details, but I for one think they're important -- not for all
architectural choices, but certainly for any that deal with such
fundamental issues as determining what a URI denotes.

To my mind, a critical insight about this notion of "good enough" is
that it has virtually *nothing* intrinsically to do with the URI
definition itself.  Rather, it is determined by the *application* (or
class of applications) that consumes the RDF that uses that definition.
A URI definition that is good enough (i.e., unambiguous) for one
application may not be good enough for another.  (This class of
applications corresponds to what you call the "community" that is
concerned with that URI, and you rightly point out that there are
different communities concerned with different URIs, some bigger, some
smaller.)  The reason this insight is important is that it determines
the criteria for "good enough": *first* you decide what applications you
wish to support, *then* you can say whether a URI definition is good
enough.  One *cannot* do so in isolation.  This flies in the face of the
conventional (naive?) view that a URI should be application independent,
so it ought to get some attention.  
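
One way to picture this application-dependence is the following minimal
Python sketch.  It is an illustration under stated assumptions, not
something taken from the TAG draft: a URI definition is modeled as the
set of candidate referents it still admits, and an application is modeled
as a partition of the things it does not need to tell apart.  The
function name `good_enough` and the referent labels are hypothetical.

```python
def good_enough(admitted_referents, app_partition):
    """Return True if a URI definition is unambiguous for one application.

    admitted_referents: set of things the definition could still denote
        (a hypothetical way of modeling a definition).
    app_partition: list of sets; members of one set are things this
        application never needs to distinguish from each other.
    """
    # Blocks of the partition that contain at least one admitted referent.
    blocks_hit = [block for block in app_partition if block & admitted_referents]
    # Good enough: all admitted referents fall inside a single block.
    return len(blocks_hit) == 1 and admitted_referents <= blocks_hit[0]


# Hypothetical definition that does not say whether the URI denotes
# a gene or the protein that the gene encodes.
definition = {"gene_X", "protein_X"}

# Application 1 (assumed): a literature index that lumps gene and protein together.
literature_index = [{"gene_X", "protein_X"}, {"gene_Y", "protein_Y"}]

# Application 2 (assumed): a pathway model that must keep them apart.
pathway_model = [{"gene_X"}, {"protein_X"}, {"gene_Y"}, {"protein_Y"}]

print(good_enough(definition, literature_index))  # True
print(good_enough(definition, pathway_model))     # False
```

The same definition passes for one application and fails for the other,
although the definition itself has not changed.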

The architecture of the Semantic Web should support *any* application.
Furthermore, my intuition tells me that it should not be biased toward
any particular application, though I admit that my intuition could be
wrong.  But it seems to me that if there *is* a compelling reason that
the Semantic Web architecture should have such a bias, then there should
be a solid rationale for it.  It is not enough to merely point out that
some applications would break without that bias, because that would be
true of *any* bias.

Returning now to the draft at
http://www.w3.org/2001/tag/doc/uri-usage-primer-2012-10-03/ 
it seems to me that this document fails to appreciate this point.  Yes,
some applications would break without the web-page-versus-its-subject
distinction.  Big deal.  So too would some applications break without
the gene-versus-its-encoded-protein distinction.  Obviously the TAG
cannot issue an endless series of guidelines, one for each kind of
ambiguity, pointing out how that kind of ambiguity breaks some
applications and recommending techniques to avoid it.  Furthermore, we
already have a simple, well-established technique that allows RDF
authors to avoid this and any other kind of ambiguity that they choose
to avoid: use different URIs.  So it seems to me that this
web-page-versus-its-subject fixation is an unproductive diversion, and
the burden of proof lies with its proponents to justify why it should
have special treatment.

2. Another important element that I think your explanation omitted was
the question of what an RDF author should do when a URI definition is
going through a social process of disambiguation, and how this impacts
downstream RDF consumers and authors.  Let us assume that at any point
in time, a URI has a particular definition that an RDF author can
obtain.  First off, how is that author to even *know* that the existing
definition will later prove to have been ambiguous to an RDF consumer's
application?  Alternatively, how is the RDF author to know that his/her
RDF is not *accidentally* constraining that URI's resource identity in a
way that later proves to be in conflict with the next version of the
URI's definition?  This is what I've been calling the resource identity
guessing game.  

Perhaps one may claim that the RDF author should have anticipated the
ambiguity and sought clarification before using the URI, but that does
not seem realistic to me.  In human communication people are blind to
their own ambiguities all the time.  They only discover them when
there's a problem.

Or perhaps the RDF author should be able to tell the difference between
properties that constrain the URI's resource identity -- the "essential
properties" -- and other properties that one may wish to assert about
the resource.  (For example, essential identifying properties for Alice
might say that "Alice Draper was born on 12-Jan-1997 in Boston, MA USA
to parents Bob and Carol Draper", whereas the RDF author may wish to
state that "Alice had scrambled eggs for breakfast on 15-Oct-2012".)
But in RDF, such properties look remarkably similar.  I certainly do not
know of any algorithmic way to tell them apart.  So AFAICT, that option
is a dead end also.

Thus, unless I'm mistaken, well-intentioned RDF authors *will* write RDF
that either knowingly (because they cannot get, or do not want to wait
for, a disambiguation from the URI owner or some social process) or
unknowingly further constrains the URI's resource identity.  And since
different RDF authors may constrain the URI's resource identity
differently, this will inevitably lead to well-intentioned RDF authors
writing RDF datasets that are compatible individually with the URI's
definition, but *incompatible* with each other, as illustrated in Figure
26 here:
http://dbooth.org/2010/ambiguity/paper.html#inconsistent-merge
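
A minimal Python sketch of this kind of conflict follows.  It is written
in the spirit of that figure but does not reproduce it: the URI, the
`ex:` class names, the disjointness assumption and the helper
`is_consistent` are all hypothetical, and triples are plain tuples rather
than the output of any RDF library.

```python
URI = "http://example.org/id/x"  # hypothetical URI

# Hypothetical URI definition: it only says the URI denotes something biological.
definition = {(URI, "rdf:type", "ex:BiologicalEntity")}

# Two independent authors each add a further constraint of their own.
author_a = definition | {(URI, "rdf:type", "ex:Gene")}
author_b = definition | {(URI, "rdf:type", "ex:Protein")}

# Assumed background knowledge: nothing is both a gene and a protein.
DISJOINT = [("ex:Gene", "ex:Protein")]


def is_consistent(graph):
    """Check a set of triples against the disjointness assumptions only."""
    types = {}
    for s, p, o in graph:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    # Inconsistent if any subject carries both classes of a disjoint pair.
    return not any(a in ts and b in ts for ts in types.values() for a, b in DISJOINT)


merged = author_a | author_b

print(is_consistent(author_a))  # True
print(is_consistent(author_b))  # True
print(is_consistent(merged))    # False
```

Each dataset extends the definition without contradicting it, yet the
merge fails as soon as a consumer's application cares about the
gene-versus-protein distinction.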

In other words, AFAICT it is not possible to merely '"look up the
meaning" of something without having to have a notion that meaning is
unambiguous' and expect that this will allow one author's RDF to be
merged with another author's RDF without conflict.  This is a direct,
practical consequence of ambiguity -- not merely a pedantic annoyance --
and we should be understanding it and discussing practical ways to deal
with it rather than dismissing it as a distraction.  To my mind, this is
about learning to accept and live with inevitable ambiguity, rather than
pretending that we can avoid it because we have disambiguation processes
available.

Merging independently authored RDF is hard -- *much* harder than the
vision of the Semantic Web would lead one to believe.  (And one irony is
that the more precisely we define our resource identities, the *less*
widely useful they become!)  Ambiguity does not render useless the idea
that URIs denote things.   But it does *constrain* that idea, and I
think it is important to understand exactly how, because it has
ramifications that are relevant to Semantic Web architecture.  So to my
mind, figuring out how to make the Semantic Web work in spite of
ambiguity is not a rat hole, but essential engineering!

3. This point is not directly about ambiguity, but is about issue-57 and
URI definitions.  I have repeatedly seen the issue-57 problem
characterized as one of *communication* from a sender to a receiver,
including in your explanation below.  But this characterization is
insufficient to properly address issue-57.  The reason is that issue-57
is intended to support the Semantic Web.  And to achieve the vision of
the Semantic Web, an RDF consumer needs to be able to merge two RDF
datasets that were authored *independently*.  This means that two RDF
authors, acting independently, should be able to use URIs according to
the same URI definitions, i.e., without communicating with each other.
Since the authors cannot communicate with each other, they *must* use a
common convention to obtain the same URI definition, and this is the
purpose of the role of URI owner.  Thus there are *three* essential
parties in this scenario: two RDF authors and a URI owner.  

If the issue-57 problem is framed as one of communication
between two parties, this essential requirement -- that two RDF authors
acting *independently* should be able to obtain the same URI definition
-- is lost.  Indeed, if the problem were merely one of
communication between two parties then the sender could simply choose
his/her desired URI definition, use some common convention to tell the
receiver what definition was used (perhaps as message metadata), and --
voila -- the problem of obtaining the correct definition would be
solved, and there would be no need for the role of URI owner in Semantic
Web architecture!  
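
The three-party scenario can be sketched in Python as follows.  This is
only an illustration under assumptions: the dictionary `OWNER_PUBLISHED`
stands in for whatever the URI owner publishes, `lookup` stands in for
the common convention (for example, dereferencing the URI), and all
names and values are hypothetical.  No network access is performed.

```python
# Hypothetical stand-in for what the URI owner has published.
OWNER_PUBLISHED = {"http://example.org/id/x": "definition v1 of x"}


def lookup(uri):
    """What any party obtains by following the common convention."""
    return OWNER_PUBLISHED[uri]


def author(name, uri):
    # Each author works alone: the only shared input is the URI itself.
    return {"author": name, "uri": uri, "definition_used": lookup(uri)}


a = author("A", "http://example.org/id/x")
b = author("B", "http://example.org/id/x")

# The authors never exchanged a message, yet both used the same definition.
same_definition = a["definition_used"] == b["definition_used"]
print(same_definition)  # True
```

In a purely two-party framing the `definition_used` field could simply
be chosen by the sender and shipped as message metadata, and the
`OWNER_PUBLISHED` role would not be needed at all.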

AFAICT, this too has not been fully appreciated in the issue-57
discussions to date.
                              -----

To sum up, I absolutely agree with the architectural principles that you
describe below, but I think they omit some elements -- particularly
around unavoidable ambiguity -- that are important in properly framing
and addressing issue-57.  So if it seems like I'm ambushing a discussion
of these issues, it is precisely because I am trying to keep the
engineering task *on* track instead of getting derailed by what appear
to me to be false assumptions or improper problem framing.  

I hope these comments have been helpful in shedding more light on the
reasons these issues keep popping up.  On the other hand, if I've got
this all wrong I hope someone will explain exactly where I've gone
wrong, so that I can become a slightly less naive pedant.

Thanks!
David


On Mon, 2012-10-15 at 14:53 -0400, Tim Berners-Lee wrote: 
> (I guess this is one of these things which is perennial.  I have not
> studied much of the history of philosophy but I do find one
> needs to be prepared to jump in in order to keep the  course
> of what I otherwise regard as engineering still on track…
> as I have said before, this is philosophical engineering we are doing...)
> 
> The point which David Booth has brought up, not for the
> first time, and which Pat has expounded very well, that 
> no symbol can ever have completely unambiguous meaning
> is, yes, quite valid.  There are several such points which
> we have to go over every now and again (preferably out of the critical path of
> working group work) and agree we all understand it and
> agree that we can all continue in practice without it.
> And indeed continue in theory without it as well.
> And Pat, you have led us through that journey from 
> philosophical foundationlessness to logical foundations
> before and maybe you can help us again or just point
> to where you did before.  And Graham, you make an
> important distinction.
> 
> There are lots of models, I am sure, one can make of
> ambiguity and language and communication which will
> allow us to do this, and they may differ in how they work
> and it probably is best that we agree they exist but not get 
> hung up arguing about which one is "right". They
> will all be imperfect, but good enough. 
> 
> PHYSICS ANALOGY
> 
> I have before and will now compare this with classical and 
> quantum physics.  We go through our young lives with 
> classical physics, and are taught that a billiard ball
> has a given diameter, a given mass, and a given position
> and a velocity, all of which we can measure.
> We learn how to build houses and drive cars
> all based on this physics. And then we get older and people
> tell us that actually a billiard ball does not have a well defined
> diameter.  Not only, if you look closely at its edge,
> is it a mass of atoms, but also those atoms in fact have only
> a probability of being in any one place at any one time.
> And even the billiard ball itself, if we measure its position too
> accurately in principle we can only do it by losing knowledge of
> its momentum.
> 
> Now the naively pedantic response may be to insist that
> everything we learned in Classical Physics be
> thrown away.  This is the response which says
> that it is no use talking about the position of a ball anyway,
> as its atoms could in fact just randomly move 3 inches east 
> at the same time.  So it is that those who see that 
> in a deep enough analysis almost any given term admits of ambiguity
> might say that the "Architecture of the WWW" is useless as
> it says URIs should only be used to denote one thing.
> 
> But in fact we really need to use the physics we have learned.
> We need to keep all we know about the way billiard balls
> interact at human scale.  Even though we have to be aware of
> quantum effects every now and again, when we find light
> being diffracted through a grating instead of being scattered,
> or electrons tunneling through a thin layer,
> we have ways of going into the details of the quantum effects
> where appropriate, and interfacing that thinking with the 
> classical thinking.  So it is with denotation by names.  We need to 
> keep the models of ambiguity in our back pocket  and
> bring them out when we need them, but not use them
> to ambush any discussion in the classical form.
> We should not use them to suggest that any use of the idea of a name
> having something it denotes is to be thrown away.
> 
> Ok, so in physics there is maths which allows you to show that 
> in the large scale, the quantum model of the world in fact gives
> rise, to a very high degree of approximation, to the classic model.
> 
> VARIOUS WAYS OF DEALING WITH AMBIGUITY
> 
> So now how do we construct a practical ability to use
> terms like the thing that a string denotes from the morass 
> of ambiguity which is communication?
> There are a number of models, none of which is perfect.
> What have we?
> 
> 1) The Authoritative Dictionary model.  The guy who puts together
> the Oxford English Dictionary just knows more than anyone else 
> about how people use words, and we all make sure we use words
> just as they are described there.  If we don't find a use in it we want,
> we send him a note.
> 
> (This is perhaps the model we have in kindergarten)
> 
> 2) The naive "meaning as use" model, sometimes blamed on Wittgenstein. 
> You use terms however you like, as meaning is use, and so you can
> never be using them inconsistently with their meaning.
> 
> (Sometimes this may be -- who knows -- a response to realizing that
> the model 1 is not perfect) 
> 
> 3) The Expertise model.  The OED applies as above, but
> also we send lawyers to school for several years to agree on a set
> of terms which are more closely defined so we can use them
> in cases where we need unambiguity, like in contracts.
> To know what something means, ask a lawyer and if necessary 
> go to court to add enough extra definition to be able to continue.
> 
> Pat describes some of the great lengths to which lawyers sometimes
> have to go 
> 
> 4) The Areas of Expertise model. As above, but add in 
> groups of people with expertise in given areas. 
> Ask them to write anything you need in that area, and in 
> court  bring them in as expert witnesses.
> 
> 5) The Standards Committee model.
> A committee writes a standard for use in a particular area,
> using a mixture of words which it feels are well enough
> defined in models 1, 2 or 3, and terms which it defines
> specifically locally for its own use within the standard specification.
> It discusses and ruminates until it feels it has found a set
> of terms which are all mutually well defined and tight enough
> to make a standard which people will use without undesirable
> consequence through misunderstanding. (Not a standard
> which everyone will understand unambiguously in exactly the same way, note).
> 
> (From time to time, the group may share its work with others
> and be horrified to find that, in the now larger community involved,
> it has to go through much longer discussion and rumination.)
> 
> There is recourse in that others can, while the group is extant
> in some form, challenge it to resolve perceived ambiguities in
> the terms it uses or the things it writes.
> 
> A FRAMEWORK WHICH ALLOWS THESE WAYS TO MIX
> 
> A common facet of all these models is that they 
> do not give complete unambiguity at all, just a good enough 
> definition.  
> "Good enough for government work" as the saying goes.
> Where "government work" is defined within some community
> of some size (See http://www.w3.org/DesignIssues/Fractal).
> 
> We can continue listing these sorts of models.
> More importantly, we can engineer them.
> The initial philosophers seemed to treat language as a
> natural or god-made thing to be investigated, not an
> engineered, invented thing,
> but in fact dictionaries and court procedure and standards bodies
> are all engineered systems.  So we can design the ones we need.
> 
> We therefore can improve on these systems,
> and, given that there is so much violence and counter productivity
> in the world and that much of it one might imagine stems from
> misunderstandings of some sort, it may behoove us to improve 
> on them.  That said, let's talk about this for URIs and
> specifically the Semantic Web design.
> 
> The Semantic Web meta model.
> 
> In a way the semantic web out-metas the model question.
> By focussing on the interchange of data in a restricted
> normal form, it can treat mathematically the systems 
> above -- and other systems -- in a logical way impossible
> with natural language terms.
> 
> The semantic web itself is a design, not a philosophical observation
> about how language works anywhere else. 
> 
> It decrees that there should be terms defined in the 
> http: URI space, and decrees that the DNS
> be part of a system of delegation of Ownership  
> of each term.  (I'm not going to quibble here about 
> whether ownership of terms is delegated within domains.)
> By realizing that there are many communities of people
> using all sorts of combinations, and allowing people
> to create new terms very easily and being able to 
> avoid re-use of the same string, it allows us to set up
> a system where the participating parties agree:
> - The DNS, and further systems within many domain's http spaces,
> allow a social entity to allocate a name in HTTP space.
> That social entity is deemed the "Owner" of the name.
> Ownership is defined 
> - The network and HTTP allow a machine to look up
> the name and get information back
> - This information you get back provides elucidation in two forms,
> in natural language (with various models of ambiguity relief)
> and logic (where the core terms such as the syntax of turtle,
> and rdf:type are defined in model 5 by the W3C working groups
> etc).
> 
> Everyone who uses the semantic web has to then sign
> up to this meta-model, though they can pick and choose 
> models above.
> 
> Importantly, implicit or explicit in the information which is 
> returned is information about which mode is used 
> to relieve ambiguity.
> 
> So the crucial design, then, is that when one agent sends
> another a message, that agent will pick a set of 
> terms which have different owners who operate or curate
> different vocabularies using different models above or 
> indeed combination of models and new models.
> 
> The vocabularies are picked so that the disambiguation
> is good enough.   Good enough for the situation,
> for the sending and receiving agent.
> 
> (We tend to call the information which we get back over HTTP
> the definition of the term. Well, we would except that we 
> would be ambushed by people who want to use the word
> "definition" specifically for a definition using one or other
> particular model).
> 
> 
> Of course in parallel with the actual looking up 
> of stuff on the web, also people share understandings
> over beers in bars as they always have done,
> but the semantic web linked data system is cool in two ways:
> Firstly, it instantiates the models of disambiguation
> providing a way to "look up the meaning" of something
> without having to have a notion that meaning is unambiguous.
> Secondly, it gives us the ability to write programs to help us,
> because of the logic interchanged. That's really handy.
> 
> Now we have to, mainly, get on with the business of
> building systems, but we have to be aware of when the ambiguity
> case arises.  We need, in our discussions, to have things
> to point people to so that naive pedantic arguments don't
> derail perfectly good discussion and logic based on the idea that 
> names denote things.  But we need to be aware
> of when the pedantry is appropriate, and have avenues
> ready to go down.
> 
> Example 1.
> 
> In our semantic web based world,
> when you are using a form, you may fill in details
> about, say, a seminar you are organizing, and generally
> the prompts on the form allow you to fill in things
> like "Date", "Start time" and "End time" without likely
> damage due to misunderstanding.
> If you have to choose in a pull-down menu whether to categorize it
> as a talk or a class or a seminar or a concert, you might
> be more puzzled, but a good app will pull in comments
> from the ontology when you hover over it uncertainly,
> giving you enough more detailed information to make
> your decision. You can maybe even click off and follow
> a link to bring up the detailed information from the ontology,
> and also you can search for members of each subclass,
> to see what existing things have been categorized each way and so on.
> So a user can well use the meaning lookup system,
> and resolve the meaning well enough.
> 
> Example 2
> 
> Consider now the person who is creating the form.
> Each time they add a field, they will hopefully pick 
> an associated property for it.  And hopefully they 
> will pick a property from an existing ontology which 
> will give it wide interoperability.  You want the events defined
> by users of the form to appear on people's calendars, for example,
> and feeds of upcoming talks.
> So at this point the user as form creator is
> more aware of the different organizations, and the different
> disambiguation models, which apply to each.
> The user will at this point quite likely pick a number of
> very standard terms, a few from other ontologies,
> and then be stuck and have to make up a few properties.
> This is when the system needs ideally to be able to 
> give the user a feel for the cost of
> getting others to agree on the ontology, of keeping it up.
> 
> This is where there should be buttons to invite comments
> and buttons to form a group, and buttons to allow
> one to ask another group to collaborate, and so on.
> And depending on the sort of group formed
> and the sorts of groups to be collaborated with, 
> the social processes will be of all kinds.
> 
> End of examples.
> 
> So we can build systems which instantiate 
> and enhance the social processes which 
> we use to resolve ambiguity.
> 
> So yes there are many times when all the details of the
> way the semantic web resolves ambiguity are enough
> for us to be able to talk about names having a single
> thing they denote, and even having a definition.
> 
> And we understand the extent to
> which that breaks and where it affects us and we
> have a task of creating systems (technical and social)
> which behave appropriately and allow us to agree
> enough on the meaning of old terms and new ones
> to be able to collaborate better and better.
> 
> But right now these social systems are in place in various forms
> so we need not be ambushed by the many rat-holes
> around this, some of which need to be charted and left rarely visited.
> 
> Tim
> 
> 
> 
> 
> * "God created the Counting Numbers, and man invented the rest" -- @@@? 
> 
> ** We don't want to send all the naive pedantic arguments off
> on the B ark, and then die from an unsanitized telephone.
> 
> 
> 
> 
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Saturday, 20 October 2012 02:11:45 UTC