Working without being ambushed by Ambiguity (was: issue-57 background reading for F2F (short required reading) from Tim Berners-Lee on 2012-10-15 (www-tag@w3.org from October 2012)

From: Tim Berners-Lee <timbl@w3.org>
Date: Mon, 15 Oct 2012 14:53:55 -0400
To: Graham Klyne <GK@ninebynine.org>
Cc: Pat Hayes <phayes@ihmc.us>, David Booth <david@dbooth.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <3722231C-6F28-4C9F-9A92-AD4CFF734A2C@w3.org>

(I guess this is one of these things which is perennial. I have not
studied much of the history of philosophy but I do find one
needs to be prepared to jump in in order to keep the course
of what I otherwise regard as engineering still on track…
as I have said before, this is philosophical engineering we are doing...)

The point which David Booth has brought up, not for the
first time, and which Pat has expounded very well, that
no symbol can ever have completely unambiguous meaning
is, yes, quite valid. There are several such points which
we have to go over every now and again (preferably out of the critical path of
working group work) and agree we all understand it and
agree that we can all continue in practice without it.
And indeed continue in theory without it as well.
And Pat, you have led us through that journey from
philosophical foundationlessness to logical foundations
before and maybe you can help us again or just point
to where you did before. And Graham you make an
important distinction.

There are lots of models, I am sure, one can make of
ambiguity and language and communication which will
allow us to do this, and they may differ in how they work
and it probably is best that we agree they exist but not get
hung up arguing about which one is "right". They
will all be imperfect, but good enough.

PHYSICS ANALOGY

I have before and will now compare this with classical and
quantum physics. We go through our young lives with
classical physics, and are taught that a billiard ball
has a given diameter, a given mass, and a given position
and a velocity, all of which we can measure.
We learn how to build houses and drive cars
all based on this physics. And then we get older and people
tell us that actually a billiard ball does not have a well defined
diameter. Not only, if you look closely at its edge,
is it a mass of atoms, but also those atoms in fact have only
a probability of being in any one place at any one time.
And even the billiard ball itself, if we measure its position too
accurately in principle we can only do it by losing knowledge of
its momentum. Now the naively pedantic response may be to insist that
everything we learned in Classical Physics be
thrown away. This is the response which says
that it is no use talking about the position of a ball anyway,
as its atoms could in fact just randomly move 3 inches east
at the same time. So it is that those who see that
in a deep enough analysis almost any given term admits of ambiguity
might say that the "Architecture of the WWW" is useless as
it says URIs should only be used to denote one thing.

But in fact we really need to use the physics we have learned.
We need to keep all we know about the way billiard balls
interact at human scale. Even though we have to be aware of
quantum effects every now and again, when we find light
being diffracted through a grating instead of being scattered,
or electrons tunneling through a thin layer,
we have ways of going into the details of the quantum effects
where appropriate, and interfacing that thinking with the
classical thinking. So it is with denotation by names. We need to
keep the models of ambiguity in our back pocket and
bring them out when we need them, but not use them
to ambush any discussion in the classical form.
We should not use them to suggest that any use of the idea of a name
having something it denotes is to be thrown away.

Ok, so in physics there is maths which allows you to show that
in the large scale, the quantum model of the world in fact gives
rise, to a very high degree of approximation, to the classic model.

VARIOUS WAYS OF DEALING WITH AMBIGUITY

So now how do we construct a practical ability to use
terms like "the thing that a string denotes" from the morass
of ambiguity which is communication?
There are a number of models, none of which is perfect.
What have we?

1) The Authoritative Dictionary model. The guy who puts together
the Oxford English Dictionary just knows more than anyone else
about how people use words, and we all make sure we use words
just as they are described there. If we don't find a use in it we want,
we send him a note.

(This is perhaps the model we have in kindergarten.)

2) The naive "meaning as use" model, sometimes blamed on Wittgenstein.
You use terms however you like, as meaning is use, and so you can never be using them inconsistently with their meaning.

(Sometimes this may be -- who knows -- a response to realizing that the model 1 is not perfect)

3) The Expertise model. The OED applies as above, but
also we send lawyers to school for several years to agree on a set
of terms which are more closely defined so we can use them
in cases where we need unambiguity, like in contracts.
To know what something means, ask a lawyer and if necessary
go to court to add enough extra definition to be able to continue.

Pat describes some of the great lengths to which lawyers sometimes
have to go.

4) The Areas of Expertise model. As above, but add in
groups of people with expertise in given areas.
Ask them to write anything you need in that area, and in
court bring them in as expert witnesses.

5) The Standards Committee model.
A committee writes a standard for use in a particular area.
It writes it using a mixture of words which it feels are well enough
defined in models 1, 2 or 3, and terms which it defines
specifically locally for its own use within the standard specification.
It discusses and ruminates until it feels it has found a set
of terms which are all mutually well defined and tight enough
to make a standard which people will use without undesirable
consequence through misunderstanding. (Not a standard
which everyone will understand unambiguously in exactly the same way, note).

(From time to time, the group may share its work with others
and be horrified to find that, in the now larger community involved,
it has to go through much longer discussion and rumination.)

There is recourse in that others can, while the group is extant
in some form, challenge it to resolve perceived ambiguities in
the terms it uses or the things it writes.

A FRAMEWORK WHICH ALLOWS THESE WAYS TO MIX

A common facet of all these models is that they
do not give complete unambiguity at all, just a good enough
definition. "Good enough for government work" as the saying goes.
Where "government work" is defined within some community
of some size (See http://www.w3.org/DesignIssues/Fractal).

We can continue listing these sorts of models.
More importantly, we can engineer them.
The initial philosophers seemed to treat language as a
natural or god-made thing to be investigated, not as
an engineered, invented thing,
but in fact dictionaries and court procedure and standards bodies
are all engineered systems. So we can design the ones we need.

We therefore can improve on these systems,
and, given that there is so much violence and counter-productivity
in the world and that much of it, one might imagine, stems from
misunderstandings of some sort, it may behoove us to improve
on them. That said, let's talk about this for URIs and
specifically the Semantic Web design.

The Semantic Web meta model.

In a way the semantic web out-metas the model question.
By focussing on the interchange of data in a restricted
normal form, it can treat mathematically the systems
above -- and other systems -- in a logical way impossible
with natural language terms.

The semantic web itself is a design, not a philosophical observation
about how language works anywhere else.

It decrees that there should be terms defined in the
http: URI space, and decrees that the DNS
be part of a system of delegation of Ownership
of each term. (I'm not going to quibble here about
whether ownership of terms is delegated within domains.)
By realizing that there are many communities of people
using all sorts of combinations, and allowing people
to create new terms very easily and being able to
avoid re-use of the same string, it allows us to set up
a system where the participating parties agree:

- The DNS, and further systems within many domains' HTTP spaces,
allow a social entity to allocate a name in HTTP space.
That social entity is deemed the "Owner" of the name.
Ownership is defined
- The network and HTTP allow a machine to look up
the name and get information back
- This information you get back provides elucidation in two forms,
in natural language (with various models of ambiguity relief)
and logic (where the core terms such as the syntax of Turtle,
and rdf:type are defined in model 5 by the W3C working groups
etc).
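
A minimal sketch of the first two points, in Python with only the
standard library. The function names are invented for illustration.
Treating the host part of the URI as the "owner" is a simplifying
assumption, since ownership may be delegated further within a
domain's HTTP space. The look_up function needs network access and
is shown only to indicate the shape of the request.

```python
from urllib.parse import urldefrag, urlsplit
from urllib.request import Request, urlopen


def owner_of(term_uri: str) -> str:
    """Return the DNS name whose holder is deemed to own the term (simplified)."""
    host = urlsplit(term_uri).hostname
    if host is None:
        raise ValueError(f"not an absolute http(s) URI: {term_uri!r}")
    return host


def document_for(term_uri: str) -> str:
    """Return the URI actually requested over HTTP: the term minus any fragment."""
    return urldefrag(term_uri).url


def look_up(term_uri: str, accept: str = "text/turtle") -> bytes:
    """Fetch whatever the owner publishes about the term (needs network access)."""
    request = Request(document_for(term_uri), headers={"Accept": accept})
    with urlopen(request) as response:
        return response.read()


term = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # rdf:type
print(owner_of(term))      # www.w3.org
print(document_for(term))  # http://www.w3.org/1999/02/22-rdf-syntax-ns
```

What comes back from look_up is the third point: the natural-language
and logical elucidation that the owner chooses to publish.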

Everyone who uses the semantic web has to then sign
up to this meta-model, though they can pick and choose
among the models above.

Importantly, implicit or explicit in the information which is
returned is information about which mode is used
to relieve ambiguity.

So the crucial design, then, is that when one agent sends
another a message, that agent will pick a set of
terms which have different owners who operate or curate
different vocabularies using different models above or
indeed combinations of models and new models.

The vocabularies are picked so that the disambiguation
is good enough. Good enough for the situation,
for the sending and receiving agent.

(We tend to call the information which we get back over HTTP
the definition of the term. Well, we would except that we
would be ambushed by people who want to use the word
"definition" specifically for a definition using one or other
particular model).

Of course in parallel with the actual looking up
of stuff on the web, also people share understandings
over beers in bars as they always have done,
but the semantic web linked data system is cool in two ways:
Firstly, it instantiates the models of disambiguation
providing a way to "look up the meaning" of something
without having to have a notion that meaning is unambiguous.
Secondly, it gives us the ability to write programs to help us,
because of the logic interchanged. That's really handy.

Now we have to, mainly, get on with the business of
building systems, but we have to be aware of when the ambiguity
case arises. We need, in our discussions, to have things
to point people to so that naive pedantic arguments don't
derail perfectly good discussion and logic based on the idea that
names denote things. But we need to be aware
of when the pedantry is appropriate, and have avenues
ready to go down.

Example 1.

In our semantic web based world,
when you are using a form, you may fill in details
about, say, a seminar you are organizing, and generally
the prompts on the form allow you to fill in things
like "Date", "Start time" and "End time" without likely
damage due to misunderstanding.
If you have to choose in a pull-down menu whether to categorize it
as a talk or a class or a seminar or a concert, you might
be more puzzled, but a good app will pull in comments
from the ontology when you hover over it uncertainly,
giving you enough more detailed information to make
your decision. You can maybe even click off and follow
a link to bring up the detailed information from the ontology,
and also you can search for members of each subclass,
to see what existing things have been categorized each way and so on.
So a user can well use the meaning lookup system
to resolve the meaning well enough.
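
The hover behaviour might be sketched like this. It is a toy which
assumes the ontology's labels, comments and a few known members have
already been fetched into a dictionary. The http://example.org/events#
namespace, the class names and the comment texts are all invented
for illustration.

```python
# Hypothetical ontology extract: class URI -> (label, comment, known members).
EVENTS = "http://example.org/events#"  # invented namespace
ONTOLOGY = {
    EVENTS + "Talk": ("Talk", "One speaker presents to an audience.", ["talk-17"]),
    EVENTS + "Seminar": ("Seminar", "A small group discusses a topic.", ["sem-3", "sem-9"]),
    EVENTS + "Concert": ("Concert", "A live musical performance.", []),
}


def hover_text(class_uri: str, max_examples: int = 2) -> str:
    """Build the help text shown when the user hovers over a menu entry."""
    if class_uri not in ONTOLOGY:
        # Nothing cached: fall back to pointing the user at the term itself.
        return "No description available; follow the link: " + class_uri
    label, comment, members = ONTOLOGY[class_uri]
    text = f"{label}: {comment}"
    if members:
        text += " Examples: " + ", ".join(members[:max_examples])
    return text


print(hover_text(EVENTS + "Seminar"))
# Seminar: A small group discusses a topic. Examples: sem-3, sem-9
```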

Example 2

Consider now the person who is creating the form.
Each time they add a field, they will hopefully pick
an associated property for it. And hopefully they
will pick a property from an existing ontology which
will give it wide interoperability. You want the events defined by users of the form to appear on people's calendars, for example,
and feeds of upcoming talks.
So at this point the user as form creator is
more aware of the different organizations, and the different
disambiguation models, which apply to each.
The user will at this point quite likely pick a number of
very standard terms, a few from other ontologies,
and then be stuck and have to make up a few properties.
This is when the system needs ideally to be able to
give the user a feel for the cost of
getting others to agree on the ontology, of keeping it up.
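
A rough sketch of that choice, assuming the candidate vocabularies
are already known to the form builder. The vocabulary contents, the
example.org, example.net and example.com URIs, and the pick_property
function are all hypothetical. The order of the dictionary stands in
for "most widely shared first".

```python
from typing import Dict, Tuple

# Hypothetical vocabularies, most widely shared first: label -> property URI.
VOCABULARIES: Dict[str, Dict[str, str]] = {
    "standard": {"start time": "http://example.org/cal#dtstart",
                 "end time": "http://example.org/cal#dtend"},
    "community": {"speaker": "http://example.net/talks#speaker"},
}
LOCAL_NS = "http://example.com/myform#"  # where made-up properties would live


def pick_property(field_label: str) -> Tuple[str, str]:
    """Return (property URI, source) for a form field, minting one as a last resort."""
    key = field_label.strip().lower()
    for source, terms in VOCABULARIES.items():
        if key in terms:
            return terms[key], source
    # Nothing shared fits: a new term, which someone now has to maintain.
    return LOCAL_NS + key.replace(" ", "_"), "new"


for label in ("Start time", "Speaker", "Room temperature"):
    print(label, "->", pick_property(label))
```

The "new" result is the point at which a real system would want to
show the user what keeping up a term of their own is likely to cost.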

This is where there should be buttons to invite comments
and buttons to form a group, and buttons to allow
one to ask another group to collaborate, and so on.
And depending on the sort of group formed
and the sorts of groups to be collaborated with,
the social processes will be of all kinds.

End of examples.

So we can build systems which instantiate
and enhance the social processes which
we use to resolve ambiguity.

So yes, there are many times when all the details of the
way the semantic web works resolve ambiguity enough
for us to be able to talk about names having a single
thing they denote, and even having a definition.

And we understand the extent to
which that breaks and where it affects us and we
have a task of creating systems (technical and social)
which behave appropriately and allow us to agree
enough on the meaning of old terms and new ones
to be able to collaborate better and better.

But right now these social systems are in place in various forms
so we need not be ambushed by the many rat-holes
around this, some of which need to be charted and left rarely visited.

Tim

* "God created the Counting Numbers, and man invented the rest" -- @@@?

** We don't want to send all the naive pedantic arguments off
on the B ark, and then die from an unsanitized telephone.

Received on Monday, 15 October 2012 18:53:59 UTC