Re: Why Literals should be unique and why this is a serious issue from Andreas Andreakis on 2005-11-19 (semantic-web@w3.org from November 2005)

From: Andreas Andreakis <andreas.andreakis@gmx.de>
Date: Sat, 19 Nov 2005 13:29:10 +0100
To: Hans Teijgeler <hans.teijgeler@quicknet.nl>
CC: 'semantic-web at W3C' <semantic-web@w3c.org>
Message-ID: <437F1A96.1020302@gmx.de>
hi Hans,


thanks a lot for your reply. By "lazy" I mean the people who create and
use ontologies, which are supposed to drive the Semantic Web.

As a first impression, I think your contribution covers a lot of the URI
problem. I agree that additional overhead (in terms of literals as
instances of classes) will be necessary. I also like the idea of
system-related resource DNAs, since we must differentiate resources
which are reused by another ontology (and since we must, in general,
provide a simple way for resources to reference each other).

Have you written a paper on this? I mean, do you have any more
detailed information?


Something I can't find in your contribution is how you finally prevent
duplication. See the RDF example in my first email. I mean, you assign
a DNA to every resource, but what prevents these resources from having
exactly the same set of literal values (or, in your approach, literal
instances)? Well, you can use InverseFunctionalProperties, sure. BUT
don't forget that InverseFunctionalProperty is only an additional and
voluntary feature on top of OWL Full. It is safe to say that we will
face significant acceptance problems if we don't enforce (sounds harsh,
but is not) a value-aware identification of resources.

Think a little bit about objects in an OOP language like Java. This
might sound odd, but give me a chance. Think of a Person bean in Java,
containing a set of appropriate attributes like name, email, etc.
If you create two objects in Java, each will get a place in memory which
is unique (like your DNA's ID, but ignore distributed sources for now).
What happens if you want to compare the two Person objects? Nothing
really helpful, unless you override equals() to compare *on top of
specific identifying values*. But this is voluntary: you don't have to
override equals()!
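
To make the comparison concrete, here is a minimal sketch of the two
cases. The class names and the choice of email as the identifying value
are assumptions made only for illustration; they do not come from any
existing ontology or library.

```java
import java.util.Objects;

// Hypothetical Person bean that keeps the inherited Object.equals():
// two instances are equal only if they are the very same object.
class PlainPerson {
    final String name;
    final String email;

    PlainPerson(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

// Hypothetical Person bean that overrides equals()/hashCode() so that
// comparison is based on an identifying value (assumed here: email).
class IdentifiedPerson {
    final String name;
    final String email;

    IdentifiedPerson(String name, String email) {
        this.name = name;
        this.email = email;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof IdentifiedPerson)) {
            return false; // only individuals of the same class are compared
        }
        return Objects.equals(email, ((IdentifiedPerson) other).email);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(email);
    }
}

class EqualsDemo {
    public static void main(String[] args) {
        PlainPerson p1 = new PlainPerson("Tiger Woods", "tiger@example.org");
        PlainPerson p2 = new PlainPerson("Tiger Woods", "tiger@example.org");
        System.out.println(p1.equals(p2)); // false: different places in memory

        IdentifiedPerson q1 = new IdentifiedPerson("Tiger Woods", "tiger@example.org");
        IdentifiedPerson q2 = new IdentifiedPerson("Tiger Woods", "tiger@example.org");
        System.out.println(q1.equals(q2)); // true: same identifying value
    }
}
```

Nothing in the language forces the second variant; the first one
compiles and runs just as well, which is exactly the point.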

Currently in the Semantic Web we have exactly the same situation: it is
voluntary to specify properties as inverse functional, so that instances
can be identified uniquely *on top of the values they describe*. And
don't forget that, as in OOP, you can only compare individuals of the
same type (class).
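
As a rough analogy only (this is not OWL reasoning), the effect of
declaring a property as identifying can be sketched as grouping
resources that share both their class and the value of that property.
TypedResource, groupByKey and the "email" key below are hypothetical
names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified resource: a URI, a class name, and
// literal-valued properties.
class TypedResource {
    final String uri;
    final String type;
    final Map<String, String> literals;

    TypedResource(String uri, String type, Map<String, String> literals) {
        this.uri = uri;
        this.type = type;
        this.literals = literals;
    }
}

class ValueIdentityDemo {
    // Groups the URIs of resources that share both their type and the
    // value of keyProperty. Resources without a value for keyProperty
    // are skipped, because nothing identifies them by value.
    static Collection<List<String>> groupByKey(List<TypedResource> resources,
                                               String keyProperty) {
        Map<List<String>, List<String>> groups = new LinkedHashMap<>();
        for (TypedResource r : resources) {
            String value = r.literals.get(keyProperty);
            if (value == null) {
                continue;
            }
            List<String> key = Arrays.asList(r.type, value);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(r.uri);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        List<TypedResource> resources = Arrays.asList(
            new TypedResource("http://example.org/a#p1", "Person",
                Collections.singletonMap("email", "x@example.org")),
            new TypedResource("http://example.org/b#p7", "Person",
                Collections.singletonMap("email", "x@example.org")));
        // Both URIs end up in one group: same class, same key value.
        System.out.println(groupByKey(resources, "email"));
    }
}
```

If no property is marked as the key, there is nothing to group on, and
the two URIs simply stay unrelated.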

Do you think the current situation is good? Do you think it is
flexible to leave open whether to use InverseFunctionalProperties or
not? Yes, maybe in the short term. But what about the long term?


Maybe we could combine both ideas? Then we would have useful
resource IDs, as Hans described in his contribution, and in addition we
could identify a resource by both its ID *and* its values.
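
A minimal sketch of what such a combination might look like, assuming a
resource carries a system-prefixed ID plus a set of identifying values.
IdentifiedResource and all its members are hypothetical names for this
illustration, not part of any existing system.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical resource with a permanent, system-prefixed ID ("DNA")
// and a set of identifying literal values.
final class IdentifiedResource {
    final String systemUri; // assumed form, e.g. "http://example.org/systemA#"
    final String systemId;  // ID allocated inside that system
    final Map<String, String> identifyingValues;

    IdentifiedResource(String systemUri, String systemId,
                       Map<String, String> identifyingValues) {
        this.systemUri = systemUri;
        this.systemId = systemId;
        // Defensive copy so later changes to the caller's map have no effect.
        this.identifyingValues =
            Collections.unmodifiableMap(new TreeMap<>(identifyingValues));
    }

    // System URI plus local ID: unique as long as each system keeps
    // its own IDs unique.
    String globalId() {
        return systemUri + systemId;
    }

    boolean sameId(IdentifiedResource other) {
        return globalId().equals(other.globalId());
    }

    boolean sameValues(IdentifiedResource other) {
        return identifyingValues.equals(other.identifyingValues);
    }

    // Different IDs but identical identifying values: a candidate duplicate.
    boolean possibleDuplicateOf(IdentifiedResource other) {
        return !sameId(other) && sameValues(other);
    }
}

class CombinedIdentityDemo {
    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("email", "tiger@example.org");

        IdentifiedResource a =
            new IdentifiedResource("http://example.org/systemA#", "42", values);
        IdentifiedResource b =
            new IdentifiedResource("http://example.org/systemB#", "7", values);

        System.out.println(a.sameId(b));              // false
        System.out.println(a.possibleDuplicateOf(b)); // true
    }
}
```

The ID answers "is this the same record?", while the values answer
"could these two records describe the same thing?".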


How do you guys in here rate this approach?
Personally, I will think about it in more detail and write back to you soon.

best regards,
Andreas




Hans Teijgeler wrote:

>Hi Andreas,
>
>Here is a contribution from a field that cannot be "lazy", as you mentioned.
>This is the field of lifecycle information integration for facilities. Our
>work entails setting up "confederations" of MANY triple stores of systems,
>groups, companies involved in that life cycle.
>
>What we do is:
>*   each resource gets a unique "SystemID" (the ID allocated to a resource
>    within your system, like a primary key in an RDBMS)
>*   that SystemID stays with the resource forever (a kind of "resource DNA")
>*   since that SystemID is prefixed with the URI of that system, the
>    combination is unique on the Internet
>*   names like "Tiger Woods" are no good substitute for this DNA, because
>    people can (and do) change names in their lifetime (this also applies
>    to the somewhat strange habit of identifying a person with his/her
>    e-mail address)
>
>About Literals the following:
>*   Literals are, from a modelling point of view, classes. Any Literal class
>    has zillions of members (you look at some of them)
>*   That's why we model them as the owl:Class "XmlSchemaLiteral" with
>    subClasses for each datatype (e.g. "XmlSchemaString"), and subsubClasses
>    for each particular string, integer, etc. They have a Property "content".
>    That content has the actual value expressed in rdf:datatype terms
>*   Advantage of this approach is that you can easily define translations
>    between any two of such classes, and you have to do it only once for
>    each pair in a certain context
>*   This approach obviously creates an overhead, but when you take the
>    global Semantic Web (not just a US/UK English one) seriously, then such
>    translations are important
>
>An example of this in OWL Full (the prefix XSST is an acronym for the class
>type (here: XmlSchemaSTring)):
>
><owl:Class rdf:ID="XSST-487832">
>    <rdfs:subClassOf rdf:resource="http://www.15926.org/dm#XmlSchemaString"/>
>    <rdf:type rdf:resource="http://www.15926.org/rd#LANG-347001"/>
>    <dm:content rdf:datatype="http://www.w3.org/2001/XMLSchema#string">pump</dm:content>
></owl:Class>
>
><owl:Class rdf:ID="XSST-548388">
>    <rdfs:subClassOf rdf:resource="http://www.15926.org/dm#XmlSchemaString"/>
>    <rdf:type rdf:resource="http://www.15926.org/rd#LANG-347012"/>
>    <dm:content rdf:datatype="http://www.w3.org/2001/XMLSchema#string">bomba</dm:content>
></owl:Class>
>
>where LANG-347001 is defined as "English" and LANG-347012 as "Italian". 
>A Property "translatedTo" does the rest. 
>
>In case we want to define the context we use our "templates", which are
>standard n-ary relations. 
>
>Regards,
>Hans
>  
>
Received on Saturday, 19 November 2005 12:30:16 UTC