RE: Why Literals should be unique and why this is a serious issue from Hans Teijgeler on 2005-11-19 (semantic-web@w3.org from November 2005)

From: Hans Teijgeler <hans.teijgeler@quicknet.nl>
Date: Sat, 19 Nov 2005 12:05:43 +0100
To: "'Andreas Andreakis'" <andreas.andreakis@gmx.de>
Cc: "'semantic-web at W3C'" <semantic-web@w3c.org>, "'Richard Newman'" <r.newman@reading.ac.uk>, "Paap, Onno" <onno.paap@ezzysurf.com>, "Price, David" <david.price@eurostep.com>
Message-ID: <000301c5ecf9$339f7ff0$6c7ba8c0@hans>
Hi Andreas,

Here is a contribution from a field that cannot afford to be "lazy", as you
put it: lifecycle information integration for facilities. Our work entails
setting up "confederations" of MANY triple stores belonging to the systems,
groups, and companies involved in that lifecycle.

What we do is:
*   each resource gets a unique "SystemID" (the ID allocated to a resource
    within your system, like a primary key in an RDBMS)
*   that SystemID stays with the resource forever (a kind of "resource DNA")
*   since that SystemID is prefixed with the URI of that system, the
    combination is unique on the Internet
*   names like "Tiger Woods" are no good substitute for this DNA, because
    people can (and do) change their names during their lifetime (this also
    applies to the somewhat strange habit of identifying a person by his/her
    e-mail address)
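A minimal sketch of the identifier scheme described above, assuming each system has a base URI and hands out local SystemIDs the way an RDBMS hands out primary keys. The class name `System`, the method `new_resource`, the example base URIs and the `#` separator are all hypothetical, chosen only to illustrate why the prefixed ID is globally unique while names remain changeable attributes.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class System:
    """Hypothetical system that allocates SystemIDs like an RDBMS primary key."""
    base_uri: str  # assumed URI identifying the owning system
    _counter: count = field(default_factory=lambda: count(1), init=False, repr=False)

    def new_resource(self) -> str:
        # The local SystemID is only unique within this one system...
        system_id = next(self._counter)
        # ...prefixing it with the system's URI makes the combination globally unique.
        return f"{self.base_uri}#{system_id}"


# Names are ordinary, changeable attributes keyed by the identifier;
# the identifier itself never changes (the "resource DNA").
names: dict[str, str] = {}

plant_a = System("http://plant-a.example/sys")
plant_b = System("http://plant-b.example/sys")

p1 = plant_a.new_resource()  # local SystemID 1 in system A
p2 = plant_b.new_resource()  # local SystemID 1 in system B, yet a different URI
names[p1] = "Jane Smith"
names[p1] = "Jane Doe"  # a name change leaves the identifier untouched
```

Both systems issue the same local number, but because each ID is qualified by its system's URI the two resources never collide when the triple stores are federated.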

About Literals, the following:
*   From a modelling point of view, Literals are classes. Any Literal class
    has zillions of members (you are looking at some of them)
*   That's why we model them as the owl:Class "XmlSchemaLiteral", with
    subclasses for each datatype (e.g. "XmlSchemaString") and sub-subclasses
    for each particular string, integer, etc. These have a property
    "content", whose value is the actual value expressed in rdf:datatype
    terms
*   The advantage of this approach is that you can easily define
    translations between any two such classes, and you have to do it only
    once for each pair in a given context
*   This approach obviously creates some overhead, but when you take the
    global Semantic Web (not just a US/UK English one) seriously, such
    translations are important

An example of this in OWL Full (the prefix XSST is an acronym for the class
type, here XmlSchemaSTring):

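<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:dm="http://www.15926.org/dm#">
  <!-- assumed: the dm: prefix is bound to the namespace of dm#XmlSchemaString -->

  <owl:Class rdf:ID="XSST-487832">
    <rdfs:subClassOf rdf:resource="http://www.15926.org/dm#XmlSchemaString"/>
    <rdf:type rdf:resource="http://www.15926.org/rd#LANG-347001"/>
    <dm:content rdf:datatype="http://www.w3.org/2001/XMLSchema#string">pump</dm:content>
  </owl:Class>

  <owl:Class rdf:ID="XSST-548388">
    <rdfs:subClassOf rdf:resource="http://www.15926.org/dm#XmlSchemaString"/>
    <rdf:type rdf:resource="http://www.15926.org/rd#LANG-347012"/>
    <dm:content rdf:datatype="http://www.w3.org/2001/XMLSchema#string">bomba</dm:content>
  </owl:Class>
</rdf:RDF>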

where LANG-347001 is defined as "English" and LANG-347012 as "Italian". 
A Property "translatedTo" does the rest. 
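A rough sketch of how the two literal classes above, plus a "translatedTo" link between them, might be queried. It uses a plain Python set of (subject, predicate, object) tuples as a stand-in for a triple store; the abbreviated `dm:`/`rd:` names, the helper functions, and treating "translatedTo" as usable in both directions are assumptions for illustration, not part of the ISO 15926 reference data.

```python
# Triples as (subject, predicate, object); prefixes abbreviated for readability.
TRIPLES = {
    ("XSST-487832", "rdfs:subClassOf", "dm:XmlSchemaString"),
    ("XSST-487832", "rdf:type", "rd:LANG-347001"),   # "English"
    ("XSST-487832", "dm:content", "pump"),
    ("XSST-548388", "rdfs:subClassOf", "dm:XmlSchemaString"),
    ("XSST-548388", "rdf:type", "rd:LANG-347012"),   # "Italian", as stated above
    ("XSST-548388", "dm:content", "bomba"),
    # Hypothetical link: the translation is asserted once for the pair.
    ("XSST-487832", "translatedTo", "XSST-548388"),
}


def objects(s, p):
    """All o such that (s, p, o) is asserted."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    """All s such that (s, p, o) is asserted."""
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def translate(text, target_lang):
    """Return content strings in target_lang linked to `text` via translatedTo."""
    results = set()
    for literal_class in subjects("dm:content", text):
        # Follow translatedTo in both directions (assumed symmetric here),
        # so one assertion per pair serves both translation directions.
        partners = objects(literal_class, "translatedTo") | subjects("translatedTo", literal_class)
        for other in partners:
            if target_lang in objects(other, "rdf:type"):
                results |= objects(other, "dm:content")
    return results
```

For example, `translate("pump", "rd:LANG-347012")` yields `{"bomba"}`, and the reverse lookup `translate("bomba", "rd:LANG-347001")` yields `{"pump"}` from the same single link.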

When we want to define the context, we use our "templates", which are
standard n-ary relations.

Regards,
Hans

_______________________ 
Hans Teijgeler
ISO 15926 specialist
www.InfowebML.ws
hans.teijgeler@quicknet.nl
phone +31-72-509 2005      
________________________
-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
Behalf Of Andreas Andreakis
Sent: Saturday, November 19, 2005 11:20 AM
To: Richard Newman
Cc: semantic-web at W3C
Subject: Re: Why Literals should be unique and why this is a serious issue


This is a good example.

The scenario you describe is a matter of modelling and ultimately comes down
to resource identification. For instance, if you describe an ontology
with persons having the same name, you simply have to add
more inverse functional properties (in OWL) to identify persons. And
don't forget that, for resource identification, it is a prerequisite
to assume a specific class and not rely on literals alone, since
literals by themselves cannot say much about the resource they describe.
What does "Tiger" mean? An OS, an animal, or some other kind of product? Or
what does "David Green" alone mean? A company name or a person's name?

The FOAF ontology, for instance, uses a combination of 2-3 inverse
functional properties to identify persons, the e-mail address being one of them.
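Identification via inverse functional properties is commonly implemented as "smushing": any two nodes that share a value for an IFP are taken to denote the same individual. Below is a minimal union-find sketch of that idea. `foaf:mbox` is declared inverse functional in FOAF, while the sample people, the `smush` helper and the dict-of-sets data layout are hypothetical.

```python
def smush(nodes, ifps):
    """Merge node IDs that share a value for any property listed in `ifps`.

    nodes: dict mapping node ID -> dict of property -> set of values.
    Returns a dict mapping each node ID to its canonical representative.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # smaller ID becomes the representative (deterministic)
                ra, rb = rb, ra
            parent[rb] = ra

    seen = {}  # (property, value) -> first node carrying that value
    for node in sorted(nodes):
        for prop in ifps:
            for value in nodes[node].get(prop, ()):
                key = (prop, value)
                if key in seen:
                    union(seen[key], node)  # shared IFP value => same individual
                else:
                    seen[key] = node
    return {n: find(n) for n in nodes}


people = {
    "_:a": {"foaf:name": {"David Green"}, "foaf:mbox": {"mailto:dg@example.org"}},
    "_:b": {"foaf:name": {"D. Green"}, "foaf:mbox": {"mailto:dg@example.org"}},
    "_:c": {"foaf:name": {"David Green"}, "foaf:mbox": {"mailto:david@example.net"}},
}

by_mbox = smush(people, ["foaf:mbox"])  # _:a and _:b merge; _:c stays separate
by_name = smush(people, ["foaf:name"])  # wrongly merges _:a and _:c
```

The second call shows why the choice of properties matters: a name is not inverse functional, so treating it as one conflates two different people who happen to share it.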


But anyway, we still have not solved the duplication problem, and talking
around it will not move us forward. So I ask again, and I'm really looking
forward to suggestions from you guys:
How can we prevent this?
People are lazy and will not search to see whether others have already
created something similar. Higher levels of abstraction can prevent
duplication, but we need a unified specification for this! There are already
implementations that ignore the rdf:IDs of resources; the most common example
is FOAF. The FOAF specification says not to include rdf:IDs, so where will
this lead us if one party uses IDs and the other inverse functional properties?

cheers,
Andreas



Richard Newman wrote:

> Let's have a counter-example.
>
> I know two people named David Green. Almost no literal-valued  
> property can really be termed inverse-functional: even genetic code  
> sequences can be shared (between twins, for example). Certainly,  
> terming names ("Tiger Woods") as IFPs (your more "fundamental  
> problem") doesn't work.
>
>> So, in a relational database this problem would never have arisen.
>> So why can't we do the same in ontologies?
>
>
> Well, as has been pointed out, we can -- IFPs. We don't do so very  
> often because our assertions have global scope, and I *know* that the  
> two David Greens are separate individuals.
>
> Relational databases rarely choose to deal with the possibilities of  
> integrating data from a dozen sources.
>
> -R
>
>
Received on Saturday, 19 November 2005 11:06:25 UTC