Re: Tidy/untidy: that's all about assumptions, folks from Patrick Stickler on 2002-09-26 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 26 Sep 2002 13:53:52 +0300
To: "ext Sergey Melnik" <melnik@db.stanford.edu>
Cc: "RDF Core" <w3c-rdfcore-wg@w3.org>
Message-ID: <001201c2654b$008d0670$d74416ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Sergey Melnik" <melnik@db.stanford.edu>
To: "Patrick Stickler" <patrick.stickler@nokia.com>
Cc: "RDF Core" <w3c-rdfcore-wg@w3.org>
Sent: 26 September, 2002 12:49
Subject: Re: Tidy/untidy: that's all about assumptions, folks


> Patrick Stickler wrote:
> 
> > [...]
> >>Recall the motivating example from the RDF 1.0 Spec:
> >>
> >>foo dc:Creator "John Smith"
> >>
> >>Is "John Smith" supposed to represent a person or a string? 
> >>
> > 
> >>From a data markup perspective, we can say that the form of
> > the expression is an ambiguous name, since it is not a URIref.
> 
> 
> There is nothing ambiguous about strings that populate databases.

Well, yes, but an apple is not an orange.

Is relational database content specifically intended to reflect
statements about the world which can be tested? I think not.

A relational database *could* be used to do that, but that is not
its primary purpose and most databases are not designed with
such a purpose in mind.

> >>From a knowledge representation perspective, I think it's pretty
> > intuitive that we're talking about a person in the real world,
> > and not a string.
> 
> 
> That's the assumption you are making, and the one I'm questioning! Of 
> course, there might be a person in a real world who is the creator of 
> foo. But the object of the triple must not be the identifier of that 
> person to make sense.

If you are intending to talk about a person, then I wouldn't think
that an object that does *not* denote a person makes any sense.

If you are talking about the name of a person, the I wouldn't
think that an object that *does* denote a person makes any sense.

It all depends on what the object is meant to denote.

In most cases when folks use a name, I think it is reasonable
to presume they are talking about the thing denoted by the name 
and not the name itself, thus the MT should take a similar stance.

> 
> >>The key 
> >>argument behind untidiness is that "John Smith" (or "10") cannot 
> >>possibly be meant to be a string, so it has to be something else, whose 
> >>meaning can be deduced using a right bit of logic and AI.
> >>
> > 
> > Right, but the basis for that argument is not really that the literal
> > cannot possibly mean a string, but that given the role and purpose
> > of RDF as a language for making statements about the world, it is
> > rather bizarre for it to mean a string.
> 
> 
>  From your perspective is may look bizarre, so you might want to choose 
> a different modeling style.  However, the above reflects a common 
> modeling practice. Modelers and developers conventionally use integers 
> and strings for modelings all sorts of things, such as ages, weights, 
> income, etc.

I don't think we are in disagreement about using integers and other
general datatypes to model things such as ages, weights, etc.. But I d
disagree that they use strings, directly, to model such things rather
than as lexical representations of the more general values. At some
point, applications are mapping those strings to values to get "real
work" done, thus it is the values that count, not the strings.

> >>Ok, we have datatyping now, so let's do it right:
> >>
> >>:x age int"10"
> >>:y shoeSize int"10"
> >>
> >>Now we got it! int"10" is not a string now; it's what we want it to 
> >>mean: an integer. Damn. The entailment
> >>
> >>:x age :z
> >>:y shoeSize :z
> >>
> >>still holds...
> >>
> > 
> > Er, why is that a problem that it holds. It should hold. 
> 
> 
> The above entailment has been *the* focal point of argument. 

Actually, it has been the one based on age and title where
intuitively, in the real world, the intended denotation of 
the lexical forms can be considered to be unequal, one
denoting an integer and the other a string.

Having two properties that intuitively imply the same datatype
does not reflect the focal point of the argument, IMO.

> Once it is 
> abandoned, tidy interpretation does not pose any "semantic" problems for 
> the existing apps.

Well, if we just abandon the entire RDF MT, then we don't have any more
semantic problems for existing or future apps, no?

The point of the RDF MT is that different applications have some
basis of mutual understanding of the knowledge expressed in RDF.
IMO, tidy semantics for inline literals simply excludes any real
interpretation for such nodes in the graph, leaving them as semantic
wildcards for applications to impose whatever meaning they so choose.

> 
> > Also, what will the impact be to applications expecting tidy semantics
> > when
> > 
> >   :x age "10"
> >   :y shoeSize "010"
> > 
> > does *not* entail
> > 
> >  :x age :z
> >  :y shoeSize :z
> > 
> > ???
> 
> 
> IMO, zero. Are you aware of any existing apps whose behavior depends on 
> whether the above entailment holds or not?

Any query engine worth its salt. Presuming that we are treating
string equality the same as value equality.

It may very well be that I am interested in resources with any
properties that have a value of ten. Which, I specify in my
input query as "10", but then that misses the one that has
the value expressed as "010", etc.

> >>Just as well, shoeSize could be defined as a 
> >>property that holds between shoes and strings/reals/etc.
> >>
> > 
> > Well, one problem with saying that the interpretation is based on
> > a property that holds between resources and lexical representations
> > is that there is not, and IMO can never be, any restriction against
> > non-canonical lexical forms. Therefore, even if you were to take a
> > tidy approach where inline literals denote themselves, you would 
> > *still* have to evaluate cases such as
> > 
> >    :x :p "10"
> >    :x :p "10.0"
> >    :x :p "010"
> >    :x :p "010.0"
> >    etc.
> > 
> > in terms of a lexical to value mapping to determine actual equality
> > of the objects.
> 
> 
> We don't need to prohibit non-canonical lexical forms. In the above 
> example, the equality of "10", "10.0" etc. does not matter. 

It certainly matters if those literal nodes are meant to communicate
the values, and one wishes to query an RDF graph based on values.

And if I have to be concerned with all of the possibly infinite
lexical variants in my query, that's a pretty gross overhead to
have to deal with -- and since a tidy RDF MT  says that they are
self denoting, and not datatyped values, then attempting to 
interpret those literals as lexical forms rather than strings
both would not be licensed by the RDF MT and likely result in
non-monotonic behavior.

> What 
> matters, however, is how these values impact the interpretation of :x. 
> This knowledge is captured by the semantics of the property p, which is 
> typically built-in into apps.

Again, we want to expose these implicit assumptions hidden in the applications
so that when applications exchange knowledge, it can be done in an unambiguous,
generic, and explicit fashion.

Personally, if I were building some system that didn't care about exchanging
knowledge with other arbitrary systems, I wouldn't waste my time with all
the extra effort to use RDF! If RDF's not generic, explicit, portable,
and unambiguous, so that it frees my applications from worrying about
the presumptions of other systems, then it's hardly worth the effort to use it!

>  Thus, it may well be the case that all of 
> the four statements above imply an identical interpretation for :x. If a 
> developer wants to make this knowledge explicit, and support 
> interoperation with other apps, he/she can a posteriori provide a rule 
> such as
> 
> (x1 :p s1) & (s1 :isLexicalTokenOf xsd:int) & (v1 xsd:int s1) &
> (x2 :p s2) & (s2 :isLexicalTokenOf xsd:int) & (v2 xsd:int s1) &
> (v1 == v2)
> ==> x1 = x2
> 
> in a schema describing p. 

Argh! Now applications have to support rules too, to make this clear?
and then the interpretation posited by the rules conflicts with the
interpretation of the tidy RDF MT. Blech. 

And what about potentially infinite sets of lexical variants. That's
going to be one hell of a large rules file!

We already have the machinery in RDFS to make that knowledge explicit.
Let's use it! We don't need to resort to yet another layer of machinery.

KISS and Occam's Razor should come into play big-time here.

> If we choose untidy semantics, the developer 
> would simply have to write two different rules, to achieve the same effect:
> 
> (1) p :range xsd:int
> (2) (x1 :p v1) & (x2 :p v2) & (v1 == v2)
>       ==> x1 = x2
> 
> Both approaches are practically equivalent (of course, you might say the 
> first one is more "bizarre" that the second).

I would call it non-monotonic.

> > I believe that the primary purpose of RDF is as a language for
> > interchange of knowledge (not just structured markup), and as such, 
> > the more explicitly that meaning can be expressed in that language 
> > the better. 
> 
> 
> Your are preaching to the converted. I think you and me have a quite 
> similar view of the world in this respect. The way I'd model, is 
> illustrated in
> 
> http://www-db.stanford.edu/~melnik/rdf/datatyping/fig/rich_types.gif
> 
> I'd model age using durations, and weight in terms of masses, not integers.

Fair enough, but that's yet another (proprietary) datatyping model...

> Recall that our tidy/untidy discussion refers to *existing* applications 
> that already made their choices about how they model the world. We don't 
> want those Adobe, CC/PP etc. folks and API developers, who bought into 
> RDF, do a whole lot of work required to recode their data and reprogram 
> their apps. We agreed on that.

It also involves a group of information model developers who strongly
feel that it is good design to generalize knowledge where possible,
as well as expose all significant presumptions of the system in the
RDF itself.

Tidy literals and typed literals is a combination of ambiguity and
brute force that lacks the elegance of most modern information models.

> Now we are working on making sure that their apps are forward-compatible 
> to future Semantic Web standards, specifically, ontology and rule 
> languages. I'm convinced that both tidy and untidy semantics work 
> equally well. Staying with tidy requires certain developers to change 
> their perception of how properties and values they used refer to the 
> real world. Going for untidy requires APIs and apps to be adjusted. The 
> choice is ours.

Agreed.

And I've already outlined my views on why that choice should be the one
made last Friday.

Yes, it means perhaps some non-trivial effort on the part of some
developers, but it will IMO serve the RDF and SW community far better in
the long term.

Cheers,

Patrick
Received on Thursday, 26 September 2002 06:54:47 UTC