Re: Tidy/untidy: that's all about assumptions, folks from Sergey Melnik on 2002-09-26 (w3c-rdfcore-wg@w3.org from September 2002)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Thu, 26 Sep 2002 11:49:24 +0200
To: Patrick Stickler <patrick.stickler@nokia.com>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <3D92D824.2010409@db.stanford.edu>
Patrick Stickler wrote:

> [...]
>>Recall the motivating example from the RDF 1.0 Spec:
>>
>>foo dc:Creator "John Smith"
>>
>>Is "John Smith" supposed to represent a person or a string? 
>>
> 
>>From a data markup perspective, we can say that the form of
> the expression is an ambiguous name, since it is not a URIref.


There is nothing ambiguous about strings that populate databases.

 
>>From a knowledge representation perspective, I think it's pretty
> intuitive that we're talking about a person in the real world,
> and not a string.


That's the assumption you are making, and the one I'm questioning! Of 
course, there might be a person in a real world who is the creator of 
foo. But the object of the triple must not be the identifier of that 
person to make sense.


>>The key 
>>argument behind untidiness is that "John Smith" (or "10") cannot 
>>possibly be meant to be a string, so it has to be something else, whose 
>>meaning can be deduced using a right bit of logic and AI.
>>
> 
> Right, but the basis for that argument is not really that the literal
> cannot possibly mean a string, but that given the role and purpose
> of RDF as a language for making statements about the world, it is
> rather bizarre for it to mean a string.


 From your perspective is may look bizarre, so you might want to choose 
a different modeling style. However, the above reflects a common 
modeling practice. Modelers and developers conventionally use integers 
and strings for modelings all sorts of things, such as ages, weights, 
income, etc.

>>Ok, we have datatyping now, so let's do it right:
>>
>>:x age int"10"
>>:y shoeSize int"10"
>>
>>Now we got it! int"10" is not a string now; it's what we want it to 
>>mean: an integer. Damn. The entailment
>>
>>:x age :z
>>:y shoeSize :z
>>
>>still holds...
>>
> 
> Er, why is that a problem that it holds. It should hold. 


The above entailment has been *the* focal point of argument. Once it is 
abandoned, tidy interpretation does not pose any "semantic" problems for 
the existing apps.


> Also, what will the impact be to applications expecting tidy semantics
> when
> 
>   :x age "10"
>   :y shoeSize "010"
> 
> does *not* entail
> 
>  :x age :z
>  :y shoeSize :z
> 
> ???


IMO, zero. Are you aware of any existing apps whose behavior depends on 
whether the above entailment holds or not?


>>Just as well, shoeSize could be defined as a 
>>property that holds between shoes and strings/reals/etc.
>>
> 
> Well, one problem with saying that the interpretation is based on
> a property that holds between resources and lexical representations
> is that there is not, and IMO can never be, any restriction against
> non-canonical lexical forms. Therefore, even if you were to take a
> tidy approach where inline literals denote themselves, you would 
> *still* have to evaluate cases such as
> 
>    :x :p "10"
>    :x :p "10.0"
>    :x :p "010"
>    :x :p "010.0"
>    etc.
> 
> in terms of a lexical to value mapping to determine actual equality
> of the objects.


We don't need to prohibit non-canonical lexical forms. In the above 
example, the equality of "10", "10.0" etc. does not matter. What 
matters, however, is how these values impact the interpretation of :x. 
This knowledge is captured by the semantics of the property p, which is 
typically built-in into apps. Thus, it may well be the case that all of 
the four statements above imply an identical interpretation for :x. If a 
developer wants to make this knowledge explicit, and support 
interoperation with other apps, he/she can a posteriori provide a rule 
such as

(x1 :p s1) & (s1 :isLexicalTokenOf xsd:int) & (v1 xsd:int s1) &
(x2 :p s2) & (s2 :isLexicalTokenOf xsd:int) & (v2 xsd:int s1) &
(v1 == v2)
==> x1 = x2

in a schema describing p. If we choose untidy semantics, the developer 
would simply have to write two different rules, to achieve the same effect:

(1) p :range xsd:int
(2) (x1 :p v1) & (x2 :p v2) & (v1 == v2)
      ==> x1 = x2

Both approaches are practically equivalent (of course, you might say the 
first one is more "bizarre" that the second).

> I believe that the primary purpose of RDF is as a language for
> interchange of knowledge (not just structured markup), and as such, 
> the more explicitly that meaning can be expressed in that language 
> the better. 


Your are preaching to the converted. I think you and me have a quite 
similar view of the world in this respect. The way I'd model, is 
illustrated in

http://www-db.stanford.edu/~melnik/rdf/datatyping/fig/rich_types.gif

I'd model age using durations, and weight in terms of masses, not integers.

Recall that our tidy/untidy discussion refers to *existing* applications 
that already made their choices about how they model the world. We don't 
want those Adobe, CC/PP etc. folks and API developers, who bought into 
RDF, do a whole lot of work required to recode their data and reprogram 
their apps. We agreed on that.

Now we are working on making sure that their apps are forward-compatible 
to future Semantic Web standards, specifically, ontology and rule 
languages. I'm convinced that both tidy and untidy semantics work 
equally well. Staying with tidy requires certain developers to change 
their perception of how properties and values they used refer to the 
real world. Going for untidy requires APIs and apps to be adjusted. The 
choice is ours.

Sergey
Received on Thursday, 26 September 2002 05:51:11 UTC