
RE: [SE] Composite Identification Schemes on the Semantic Web

From: Phil Tetlow <philip.tetlow@uk.ibm.com>
Date: Thu, 27 Jan 2005 05:18:05 -0500
To: "Bernard Vatant" <bernard.vatant@mondeca.com>
Cc: <public-swbp-wg@w3.org>, tom.croucher@sunderland.ac.uk, j.r.c.geldart@durham.ac.uk
Message-ID: <OF099062D4.EB3993B2-ON80256F96.00374A35-85256F96.00374331@uk.ibm.com>


Many thanks for sharing your thoughts publicly. I think it is also
important to point to the exceptional work that Tom Croucher and Joe
Geldart have been doing in this area. They have undoubtedly contributed
significantly and deserve due recognition. An early draft of their paper on
'Situation and Identity - A Generalisation of Inverse Functional
Properties' can be found at http://osiris.sund.ac.uk/~cs0tco/eswc2005.pdf
and is also referenced via http://dl-web.man.ac.uk/~panz/swse/. I'm sure we
will be hearing much more from both Tom and Joe in the future. They have
some particularly unique and valuable ideas.


Phil Tetlow
Senior Consultant
IBM Business Consulting Services
Mobile. (+44) 7740 923328

On 26/01/2005 17:46, "Bernard Vatant" wrote to <public-swbp-wg@w3.org>:
Subject: RE: [SE] Composite Identification Schemes on the Semantic Web

Dear all

I answered Phil privately, thinking that what I had to say was a bit outside
the scope of this WG/TF, but he suggested that it might be of interest to
all. So please find this answer below (slightly reworded and expanded in the
last sections).

Phil Tetlow wrote :

> One minor point - The URI system is the foundation on which the Web is
> built. So (to use your words) I think that the time may well be 'right' to
> consider the validity of identification schemes that
> or 'extend' this system, rather than 'shift from it'. A subtle change in
> words - I hope you do not mind?

Well, actually, I do, and think it is not minor :)
Though English is not my native language, when I write "shift" I mean

- As is well attested by the never-ending debates about the "meaning" of URIs
(social or otherwise), we are in a situation in which URIs share basically the same
as names in natural languages, or plain identifiers in various information
systems, like telephone numbers or credit card numbers, which have no meaning
outside the telephone network context or the bank network context. URIs are
used to identify resources, but there is not, and most likely there never
will be, any universal agreement on what a resource exactly is, neither in
general nor in particular for any identified resource - except through a very
recursive definition: "A resource is something identified by a URI" and "This
particular resource is what is identified by this particular URI" ... In
short, agreeing to use, say, a passport number and/or Family Name + First
Name + Birth Date + Birth Place to identify a person does not mean that you
know what a person is in general, or what/who this particular person,
identified in such a way, is. You only agree on some identification protocol
when checking in at the airport. That's why I keep saying: there is no
(absolute) identity, there are only identification

- Like it or not, people will use the same URI in different contexts to
identify different things, whatever the strength of recommendations saying:
"This is bad practice, you should not do that". People will do it anyway, for
various well-known reasons: because they are not aware that the URI they use
is already used, or they are aware of it but don't understand the semantics
already declared, or they don't care, or they think this very URI should mean
something else, or they deliberately want to screw up the system.

- People will create a proliferation of new URIs when there are already a lot
of them to represent the concepts they need - see the 399 "foo#Person" URIs
on Swoogle - because they want them in their own namespace, because they are
lazy, because they have not discovered the existing URIs, or they are not
sure the existing one(s) mean exactly what they need, or they don't trust the
source, etc.

- In short, URI-based languages, so to speak, are bound to evolve like all
languages, with a mess of homonymy, synonymy and ambiguity as the general
rule, and with identification contexts, situations, protocols and
conversations inside which ambiguity is resolved, so that the names used
hopefully identify the same thing for all the interlocutors in the
conversation (humans and machines). And, IMO, this reality is completely
orthogonal to whether URIs represent very formal elements in ontologies (say,
a class in a well-engineered OWL ontology) or loosely-defined plain RDF
resources.

- Outside URI-based identification, there are already a lot of identification
protocols taking place on the Web, based either on non-URI but non-ambiguous
identifiers such as ISBN numbers (see http://isbn.nu), airport codes, country
codes, language codes, etc., or on composite identification schemes, or on
full-text entity recognition performed by NL tools (see Google News). Some of
those protocols are pretty effective, some generate noise and silence, and so
far URI-based identification is just another of them, and it's no more 100%
proof than any of those, for the above reasons.
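
The points above can be illustrated with a minimal sketch in Python. It is
only an illustration under stated assumptions: the class name, the field
names and the sample values are all hypothetical and do not come from any
existing tool. It shows that two records are "the same" only relative to the
identification protocol chosen in a given context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable

Record = Dict[str, str]


@dataclass(frozen=True)
class IdentificationProtocol:
    """A named rule (hypothetical) that turns a record into a comparison key."""
    name: str
    key: Callable[[Record], Hashable]

    def same(self, a: Record, b: Record) -> bool:
        # Two records identify "the same thing" only relative to this protocol.
        return self.key(a) == self.key(b)


# Two assumed protocols, loosely following the airport check-in example.
by_passport = IdentificationProtocol("passport", lambda r: r["passport"])
by_composite = IdentificationProtocol(
    "composite",
    lambda r: (r["family_name"], r["first_name"],
               r["birth_date"], r["birth_place"]),
)

# Made-up sample data: the same traveller before and after a passport renewal.
old = {
    "passport": "X123",
    "family_name": "Doe",
    "first_name": "Jane",
    "birth_date": "1970-01-01",
    "birth_place": "Paris",
}
renewed = dict(old, passport="Y456")

print(by_passport.same(old, renewed))   # False
print(by_composite.same(old, renewed))  # True
```

Neither answer is "the" identity of the traveller; each one is only the
outcome of the protocol agreed upon in that context.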

Seeking dynamic and seamless integration of all the various existing,
foreseen and unforeseen identification protocols is IMO the way to go, and
yes, somehow it "augments" the URI-based identity system, if you like to see
it like that. But in fact it's not as if URIs were the only identification
tools, with other ones still to be invented and added: they are already here,
and what we need is integration. If we don't look for integration, we will
keep on having on one side the so-called semantic technologies, seen as the
academic, AI, KR and logic camp, and on the other side the full-text,
linguistic, fuzzy-but-efficient algorithms of Google et al. We really need
both, not on two sides of a no man's land, but working seamlessly together.
Would that not be a "shift" from the current state of things?
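
One possible reading of such integration, sketched in Python under loud
assumptions: the three small indexes, the example URI, the ISBN digits and
the function names below are all invented for illustration, and the label
lookup is only a crude stand-in for real NL entity recognition. Each
identification protocol is just one resolver among others, and a mention is
accepted from whichever protocol recognises it.

```python
from typing import Callable, List, Optional, Tuple

# Each resolver maps a raw mention to an internal entity id, or None.
Resolver = Callable[[str], Optional[str]]

# Made-up indexes, one per identification protocol.
URI_INDEX = {"http://example.org/book/1": "book-1"}
ISBN_INDEX = {"9780000000002": "book-1"}
LABEL_INDEX = {"an example book": "book-1"}


def by_uri(mention: str) -> Optional[str]:
    # URI-based identification: exact match on the identifier.
    return URI_INDEX.get(mention.strip())


def by_isbn(mention: str) -> Optional[str]:
    # Non-URI identifier: keep the digits only, then look them up.
    digits = "".join(ch for ch in mention if ch.isdigit())
    return ISBN_INDEX.get(digits)


def by_label(mention: str) -> Optional[str]:
    # Crude stand-in for full-text recognition: normalised label match.
    return LABEL_INDEX.get(" ".join(mention.lower().split()))


RESOLVERS: List[Tuple[str, Resolver]] = [
    ("uri", by_uri),
    ("isbn", by_isbn),
    ("label", by_label),
]


def identify(mention: str) -> List[Tuple[str, str]]:
    """Return every (protocol, entity) pair that recognises the mention."""
    return [(name, hit) for name, fn in RESOLVERS
            if (hit := fn(mention)) is not None]


print(identify("http://example.org/book/1"))  # [('uri', 'book-1')]
print(identify("ISBN 978-0-00-000000-2"))     # [('isbn', 'book-1')]
print(identify("An  Example Book"))           # [('label', 'book-1')]
```

In this sketch no protocol is privileged: the URI lookup sits in the same
list as the others, and a new protocol can be added without touching them.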

For the record, at Mondeca we've been working for a while with linguistic
tools connected to our semantic databases, both with Danish and Italian
research groups in Computational Linguistics in the framework of the European
project MOSES, and with our partner Temis [1], including, in customer
projects, both indexing assistance and entity and relationship extraction.
Matching the settings of NL processing components with formal ontologies is a
challenging task, but the results we have obtained so far in domains like
legal documentation or economic intelligence are really


[1] http://www.temis-group.com/


Bernard Vatant
Senior Consultant
Knowledge Engineering

"Making Sense of Content" :  http://www.mondeca.com
"Everything is a Subject" :  http://universimmedia.blogspot.com

Received on Thursday, 27 January 2005 10:14:30 UTC
