- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Wed, 26 Jan 2005 23:46:18 +0100
- To: <public-swbp-wg@w3.org>
Dear all,

I replied privately to Phil, thinking that what I had to say was a bit outside the scope of this WG/TF, but he suggested that it might be of interest to all. So please find that answer below (slightly reworded, and expanded in the last sections).

Phil Tetlow wrote:

> One minor point - The URI system is the foundation on which the Web is
> built. So (to use your words) I think that the time may well be 'right' to
> consider the validity of identification schemes that 'augment','complement'
> or 'extend' this system, rather than 'shift from it'. A subtle change in
> words - I hope you do not mind?

Well, actually, I do, and I think it is not minor :) Though English is not my native language, when I write "shift" I mean "shift".

- As is well attested by the never-ending debates about what URIs "mean" (socially or otherwise), we are in a situation in which URIs share basically the same characteristics as names in natural languages, or plain identifiers in various information systems, like telephone numbers or credit card numbers, which have no meaning outside the context of the telephone network or the bank network. URIs are used to identify resources, but there is not, and most likely never will be, any universal agreement on what exactly a resource is, neither in general nor for any particular identified resource - except through a very recursive definition: "A resource is something identified by a URI" and "This particular resource is what is identified by this particular URI" ... In short, just because you have agreed to use, say, a passport number, and/or Family Name + First Name + Birth Date + Birth Place to identify a person, it does not follow that you know what a person is in general, or what/who this particular person, identified in such a way, is. You only agree on some identification protocol when checking in at the airport. That's why I keep saying: there is no (absolute) identity, there are only identification protocols.
- Like it or not, people will use the same URI in different contexts to identify different things, however strong the recommendations saying: "This is bad practice, you should not do that". People will do it anyway, for various well-known reasons: because they are not aware that the URI they use is already in use, or they are aware of it but don't understand the semantics already declared, or they don't care, or they think this very URI should mean something else, or they deliberately want to screw up the system, etc.

- People will create a proliferation of new URIs when there are already plenty of them representing the concepts they need - see the 399 "foo#Person" URIs on Swoogle - because they want them in their own namespace, because they are lazy, because they have not discovered the existing URIs, or they are not sure the existing one(s) mean exactly what they need, or they don't trust the source, etc.

- In short, URI-based languages, so to speak, are bound to evolve like all natural languages: with a mess of homonymy, synonymy and ambiguity as the general rule, and with identification contexts, situations, protocols and conversations inside which ambiguity is resolved, so that the names used hopefully identify the same thing for all the interlocutors in the conversation (humans and machines). And, IMO, this reality is completely orthogonal to whether URIs represent very formal elements in ontologies (say, a class in a well-engineered OWL ontology) or loosely-defined plain RDF resources.

- Outside URI-based identification, there are already a lot of identification protocols at work on the Web, based either on non-URI but unambiguous identifiers such as ISBN numbers (see http://isbn.nu), airport codes, country codes, language codes, etc., or on composite identification schemes, or on full-text entity recognition performed by NL tools (see Google News).
Some of those protocols are pretty effective, some generate noise and silence, and so far URI-based identification is just another one of them; for the reasons above, it is no more 100% reliable than any of the others. Seeking dynamic and seamless integration of all the various identification protocols, existing, foreseen and unforeseen, is IMO the way to go, and yes, somehow it "augments" the URI-based identity system, if you like to see it that way. But in fact it is not as if URIs were the only identification tools, with others still to be invented and added: the others are already here, and what we need is integration.

If we don't look for integration, we will keep on having on one side the so-called semantic technologies, seen as the academic, AI, KR and logic camp, and on the other side the full-text, linguistic, heuristic, fuzzy-but-efficient algorithms of Google et al. We really need both, not on two sides of a no man's land, but working seamlessly together. Would that not be a "shift" from the current state of things?

For the record, at Mondeca we have been working for a while with linguistic tools connected to our semantic databases, both with Danish and Italian research groups in Computational Linguistics in the framework of the European project MOSES, and with our partner Temis [1], including in customer projects covering both indexing assistance and the extraction of entities and relationships. Matching the settings of NL processing components with formal ontologies is a challenging task, but the results we have obtained so far in domains like legal documentation or economic intelligence are really exciting.
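
To make the "identification protocols, not identity" argument above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Identifier` and `Context` classes, the helper functions, the scheme names and all sample values (the example.org URI, the passport number, the composite key) are hypothetical and do not come from any existing system. The idea is that an identifier of any scheme (URI, passport number, composite key) resolves to something only inside a given context, so homonymy and synonymy are handled per context rather than globally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Identifier:
    """An identifier in some scheme (hypothetical scheme names, e.g. "uri", "passport")."""
    scheme: str
    value: str


@dataclass
class Context:
    """One identification context: a 'conversation' in which names are unambiguous."""
    name: str
    bindings: Dict[Identifier, str] = field(default_factory=dict)

    def bind(self, identifier: Identifier, entity: str) -> None:
        self.bindings[identifier] = entity

    def resolve(self, identifier: Identifier) -> Optional[str]:
        # Outside its context, an identifier simply identifies nothing.
        return self.bindings.get(identifier)


def resolve_across(contexts: List[Context], identifier: Identifier) -> List[Tuple[str, str]]:
    """Homonymy: list what one identifier identifies in each context that knows it."""
    return [(c.name, c.bindings[identifier]) for c in contexts if identifier in c.bindings]


def synonyms(context: Context, entity: str) -> List[Identifier]:
    """Synonymy: list every identifier bound to the same entity within one context."""
    return [i for i, e in context.bindings.items() if e == entity]


# Hypothetical sample data: the same URI is used with two different meanings.
person_uri = Identifier("uri", "http://example.org/foo#Person")
hr = Context("hr")
genealogy = Context("genealogy")
hr.bind(person_uri, "employee")
genealogy.bind(person_uri, "any human being, living or dead")

# Two non-URI identification schemes pointing at the same person in one context.
genealogy.bind(Identifier("passport", "X0000000"), "jane-doe")
genealogy.bind(Identifier("name+birth", "Doe|Jane|1970-01-01|Paris"), "jane-doe")

for context_name, entity in resolve_across([hr, genealogy], person_uri):
    print(f"{person_uri.value} in {context_name!r}: {entity}")
```

In this sketch, integration of protocols amounts to nothing more than letting identifiers of different schemes share the same bindings table; a real system would of course also need rules for mapping entities between contexts, which is left out here.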

Cheers

Bernard

[1] http://www.temis-group.com/

**********************************************************************************
Bernard Vatant
Senior Consultant
Knowledge Engineering
bernard.vatant@mondeca.com

"Making Sense of Content" : http://www.mondeca.com
"Everything is a Subject" : http://universimmedia.blogspot.com
**********************************************************************************
Received on Wednesday, 26 January 2005 22:46:45 UTC