- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Thu, 26 Sep 2002 11:49:24 +0200
- To: Patrick Stickler <patrick.stickler@nokia.com>
- CC: RDF Core <w3c-rdfcore-wg@w3.org>
Patrick Stickler wrote:
> [...]
>>Recall the motivating example from the RDF 1.0 Spec:
>>
>>foo dc:Creator "John Smith"
>>
>>Is "John Smith" supposed to represent a person or a string?
>>
>
> From a data markup perspective, we can say that the form of
> the expression is an ambiguous name, since it is not a URIref.
There is nothing ambiguous about strings that populate databases.
> From a knowledge representation perspective, I think it's pretty
> intuitive that we're talking about a person in the real world,
> and not a string.
That's the assumption you are making, and the one I'm questioning! Of
course, there might be a person in the real world who is the creator of
foo. But the object of the triple need not be the identifier of that
person for the triple to make sense.
>>The key
>>argument behind untidiness is that "John Smith" (or "10") cannot
>>possibly be meant to be a string, so it has to be something else, whose
>>meaning can be deduced using a right bit of logic and AI.
>>
>
> Right, but the basis for that argument is not really that the literal
> cannot possibly mean a string, but that given the role and purpose
> of RDF as a language for making statements about the world, it is
> rather bizarre for it to mean a string.
From your perspective it may look bizarre, so you might want to choose
a different modeling style. However, the above reflects a common
modeling practice. Modelers and developers conventionally use integers
and strings for modeling all sorts of things, such as ages, weights,
income, etc.
>>Ok, we have datatyping now, so let's do it right:
>>
>>:x age int"10"
>>:y shoeSize int"10"
>>
>>Now we got it! int"10" is not a string now; it's what we want it to
>>mean: an integer. Damn. The entailment
>>
>>:x age :z
>>:y shoeSize :z
>>
>>still holds...
>>
>
> Er, why is that a problem that it holds. It should hold.
The above entailment has been *the* focal point of argument. Once it is
abandoned, tidy interpretation does not pose any "semantic" problems for
the existing apps.
> Also, what will the impact be to applications expecting tidy semantics
> when
>
> :x age "10"
> :y shoeSize "010"
>
> does *not* entail
>
> :x age :z
> :y shoeSize :z
>
> ???
IMO, zero. Are you aware of any existing apps whose behavior depends on
whether the above entailment holds or not?
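To make the two cases concrete, here is a minimal sketch, assuming tidy semantics in which a plain literal denotes itself, so two literals are the same node exactly when their strings are identical. The tuple representation of triples and the helper name shares_object are assumptions made for illustration only, not part of any RDF API.

```python
# Triples as (subject, predicate, object) tuples; plain literals are Python
# strings, so under the tidy assumption string identity is node identity.
def shares_object(graph, p1, p2):
    """Return True if a single object z satisfies (?, p1, z) and (?, p2, z)."""
    objs1 = {o for (_, p, o) in graph if p == p1}
    objs2 = {o for (_, p, o) in graph if p == p2}
    return bool(objs1 & objs2)

# Same lexical form: the existential conclusion (:x age :z), (:y shoeSize :z) holds.
same_form = {(":x", "age", "10"), (":y", "shoeSize", "10")}

# Different lexical forms: "10" and "010" are distinct strings, so it does not.
different_form = {(":x", "age", "10"), (":y", "shoeSize", "010")}

print(shares_object(same_form, "age", "shoeSize"))
print(shares_object(different_form, "age", "shoeSize"))
```

An app that never asks this question behaves the same in both cases.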
>>Just as well, shoeSize could be defined as a
>>property that holds between shoes and strings/reals/etc.
>>
>
> Well, one problem with saying that the interpretation is based on
> a property that holds between resources and lexical representations
> is that there is not, and IMO can never be, any restriction against
> non-canonical lexical forms. Therefore, even if you were to take a
> tidy approach where inline literals denote themselves, you would
> *still* have to evaluate cases such as
>
> :x :p "10"
> :x :p "10.0"
> :x :p "010"
> :x :p "010.0"
> etc.
>
> in terms of a lexical to value mapping to determine actual equality
> of the objects.
We don't need to prohibit non-canonical lexical forms. In the above
example, the equality of "10", "10.0" etc. does not matter. What
matters, however, is how these values impact the interpretation of :x.
This knowledge is captured by the semantics of the property p, which is
typically built into apps. Thus, it may well be the case that all of
the four statements above imply an identical interpretation for :x. If a
developer wants to make this knowledge explicit, and support
interoperation with other apps, he/she can a posteriori provide a rule
such as
(x1 :p s1) & (s1 :isLexicalTokenOf xsd:int) & (v1 xsd:int s1) &
(x2 :p s2) & (s2 :isLexicalTokenOf xsd:int) & (v2 xsd:int s2) &
(v1 == v2)
==> x1 = x2
in a schema describing p. If we choose untidy semantics, the developer
would simply have to write two different rules, to achieve the same effect:
(1) p :range xsd:int
(2) (x1 :p v1) & (x2 :p v2) & (v1 == v2)
==> x1 = x2
Both approaches are practically equivalent (of course, you might say the
first one is more "bizarre" than the second).
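A rough sketch of that equivalence, assuming Python's int() as a stand-in for the xsd:int lexical-to-value mapping and tuples as triples. The names lexical_to_int, tidy_rule and untidy_rule are hypothetical, chosen only for this illustration. Both functions return the pairs of subjects that the rule would equate.

```python
from itertools import combinations

def lexical_to_int(token):
    """Hypothetical stand-in for the xsd:int lexical-to-value mapping."""
    return int(token)

def equal_pairs(subject_values):
    """Pairs of distinct subjects whose values are equal (v1 == v2 ==> x1 = x2)."""
    return {
        frozenset((x1, x2))
        for (x1, v1), (x2, v2) in combinations(subject_values, 2)
        if v1 == v2 and x1 != x2
    }

def tidy_rule(triples, prop):
    # Tidy: objects are strings, so the rule itself maps them to values.
    return equal_pairs([(s, lexical_to_int(o)) for (s, p, o) in triples if p == prop])

def untidy_rule(triples, prop):
    # Untidy: the range declaration (p :range xsd:int) is assumed to have
    # already turned the objects into values, so they are compared directly.
    return equal_pairs([(s, o) for (s, p, o) in triples if p == prop])

tidy_data = [(":a", ":p", "10"), (":b", ":p", "010"), (":c", ":p", "11")]
untidy_data = [(":a", ":p", 10), (":b", ":p", 10), (":c", ":p", 11)]

print(tidy_rule(tidy_data, ":p") == untidy_rule(untidy_data, ":p"))
```

In both cases :a and :b end up equated and :c does not; the only difference is where the lexical-to-value step lives.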
> I believe that the primary purpose of RDF is as a language for
> interchange of knowledge (not just structured markup), and as such,
> the more explicitly that meaning can be expressed in that language
> the better.
You are preaching to the converted. I think you and I have quite a
similar view of the world in this respect. The way I'd model it is
illustrated in
http://www-db.stanford.edu/~melnik/rdf/datatyping/fig/rich_types.gif
I'd model age using durations, and weight in terms of masses, not integers.
Recall that our tidy/untidy discussion refers to *existing* applications
that already made their choices about how they model the world. We don't
want those Adobe, CC/PP etc. folks and API developers, who bought into
RDF, to do a whole lot of work to recode their data and reprogram
their apps. We agreed on that.
Now we are working on making sure that their apps are forward-compatible
with future Semantic Web standards, specifically, ontology and rule
languages. I'm convinced that both tidy and untidy semantics work
equally well. Staying with tidy requires certain developers to change
their perception of how the properties and values they use refer to the
real world. Going for untidy requires APIs and apps to be adjusted. The
choice is ours.
Sergey
Received on Thursday, 26 September 2002 05:51:11 UTC