Why Literals should be unique and why this is a serious issue

Hi all,


I've recently read some posts about the so-called "URI crisis"; maybe 
"resource identification problem" would be more precise.
I don't want to re-explain the problem here, since it has already 
been discussed many times.

Actually, I think the problem (I personally do see it as a 
problem) with resource identification has deeper roots than might be 
obvious. The question is not whether unique URIs are good or bad; at 
least from one perspective they are good, since they can identify a 
resource uniquely in the resource graph.

The question is: how can we prevent duplication of the same concept?

To give a small example consider the following Ontology:

<!-- Schema -->
<rdfs:Class rdf:ID="Person" />

<rdf:Property rdf:ID="personName">
       <rdfs:domain rdf:resource="#Person" />
       <rdfs:range rdf:resource="&xsd;string" />
</rdf:Property>

<!-- Instances -->
<Person rdf:ID="Person1">
    <personName>Tiger Woods</personName>
</Person>

<Person rdf:ID="Person2">
    <personName>Tiger Woods</personName>
</Person>


What is wrong with this ontology? From a theoretical perspective it 
is fully valid, but from a practical perspective it 
demonstrates the URI crisis.
But is it actually the URI's fault? I think not. Look a bit closer at 
the instances Person1 and Person2. What do these two have in common? 
Both have the same personName value, "Tiger Woods". Does this 
help?

What am I getting at?
This: in terms of data modelling (and ontologies and the Semantic Web 
are very much about this) we should make progress, and that means we 
should learn from previous and current technologies.

As an example: how would this problem be solved in a relational schema? 
Simply by declaring the column personName a unique key. Obvious, right?
So in a relational database this problem would never have arisen. So 
why can't we do the same in ontologies?
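
To make the relational comparison concrete, here is a minimal sketch 
using Python's built-in sqlite3 module. The table name "person", the 
column name "person_name" and the helper add_person are my own 
assumptions, chosen only to mirror the Person example above.

```python
import sqlite3

# Hypothetical table and column names, chosen to mirror the Person example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " person_name TEXT NOT NULL UNIQUE)"
)


def add_person(name):
    """Insert a person; return True on success, False if the name exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (person_name) VALUES (?)", (name,)
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate name.
        return False


first = add_person("Tiger Woods")
second = add_person("Tiger Woods")
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The second insert is rejected by the database itself, so the duplicate 
never enters the data.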


Here is a possible (but not yet fully evaluated) suggestion:

1) Let's make every literal a unique resource in the graph.
2) We need a new construct that lets us declare whether a datatype 
property is unique or not.
3) Resource identification is based on classes plus the unique 
literals they contain, and not on meaningless URIs.

To 1) I am sure that a lot of people will not like this suggestion. But 
think again: why not? In fact, everything we describe is based on 
literal values, even if we are working with high abstractions.
If we treat every literal as unique, there is no possibility of creating 
duplicates, and thus the problem in the previous Person ontology would 
never occur. Note that the use and reuse of unique literals should not 
be restricted by the mere appearance of a literal, but through 
specification in properties.


To 2) A refactored Person ontology could look like:

<rdfs:UniqueProperty rdf:ID="personName">
       <rdfs:domain rdf:resource="#Person" />
       <rdfs:range rdf:resource="&xsd;string" />
</rdfs:UniqueProperty>

Here every instance of type Person with the property personName must 
have a unique value. If two Person instances point to the same 
literal, they are not valid! As simple as that.
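
The validity rule for a unique property could be checked roughly as 
follows. This is only a sketch under my own assumptions: triples are 
plain Python tuples, the names find_violations and UNIQUE_PROPERTIES 
are made up for illustration, the third sample name is a placeholder, 
and the restriction to a specific class is left out for brevity.

```python
from collections import defaultdict

# Triples as (subject, property, literal); the data mirrors the example
# above, plus one placeholder instance with a different name.
TRIPLES = [
    ("Person1", "personName", "Tiger Woods"),
    ("Person2", "personName", "Tiger Woods"),
    ("Person3", "personName", "Some Other Name"),
]

# Properties declared unique (hypothetical stand-in for UniqueProperty).
UNIQUE_PROPERTIES = {"personName"}


def find_violations(triples, unique_properties):
    """Return {(property, literal): [subjects]} for shared unique literals."""
    subjects_by_value = defaultdict(set)
    for subject, prop, literal in triples:
        if prop in unique_properties:
            subjects_by_value[(prop, literal)].add(subject)
    return {
        key: sorted(subjects)
        for key, subjects in subjects_by_value.items()
        if len(subjects) > 1
    }


violations = find_violations(TRIPLES, UNIQUE_PROPERTIES)
```

With the data above, Person1 and Person2 are reported as sharing the 
literal "Tiger Woods", while Person3 passes the check.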


To 3) It is interesting to ask whether this could work in interconnected 
ontologies, where concepts are reused. I think yes, because if you merge 
two (or more) ontologies, you simply have to merge equal literals as well. 
And in any case, it is perfectly fine for literals to be used in 
different contexts.
Don't forget: a property makes a literal unique only with respect 
to a specific class (see point 2 if this is still not clear).
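
A rough sketch of such a merge, where a resource is identified by its 
class plus its unique literals rather than by its URI. Everything here 
is an assumption made for illustration: the dictionary layout, the 
names identity_key and merge_ontologies, the "a:" and "b:" prefixes, 
and the extra "sport" and "title" properties.

```python
# Unique properties per class (hypothetical declaration).
UNIQUE_BY_CLASS = {"Person": {"personName"}}

ontology_a = [
    {"id": "a:Person1", "class": "Person",
     "properties": {"personName": "Tiger Woods"}},
]
ontology_b = [
    {"id": "b:Person7", "class": "Person",
     "properties": {"personName": "Tiger Woods", "sport": "golf"}},
    # Same literal in a different context: must NOT be merged with a Person.
    {"id": "b:Book1", "class": "Book",
     "properties": {"title": "Tiger Woods"}},
]


def identity_key(resource, unique_by_class):
    """Identify a resource by class + unique literals, else by its id."""
    unique_props = unique_by_class.get(resource["class"], set())
    values = tuple(sorted(
        (prop, value)
        for prop, value in resource["properties"].items()
        if prop in unique_props
    ))
    if not values:
        # No unique literal available: fall back to the original id.
        return ("id", resource["id"])
    return (resource["class"], values)


def merge_ontologies(ontologies, unique_by_class):
    """Merge resources that share the same identity key."""
    merged = {}
    for ontology in ontologies:
        for resource in ontology:
            key = identity_key(resource, unique_by_class)
            entry = merged.setdefault(
                key,
                {"class": resource["class"], "ids": [], "properties": {}},
            )
            entry["ids"].append(resource["id"])
            entry["properties"].update(resource["properties"])
    return list(merged.values())


merged = merge_ontologies([ontology_a, ontology_b], UNIQUE_BY_CLASS)
```

The two Person resources collapse into one entry because they share a 
unique literal within the same class, while the Book keeps its own 
identity even though it uses the same string.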


As mentioned before, this idea is still very brief, and it is possible 
that I have missed important facts. So this is where you come into 
play.

How do you rate this idea? I'm looking forward to your opinions.


best regards,
Andreas

Received on Saturday, 19 November 2005 05:29:12 UTC