Re: datatypes [was Re: Argh!] from Sergey Melnik on 2001-10-30 (w3c-rdfcore-wg@w3.org from October 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Mon, 29 Oct 2001 17:40:36 -0800
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
CC: phayes@ai.uwf.edu, w3c-rdfcore-wg@w3.org
Message-ID: <3BDE0514.2700A858@db.stanford.edu>
"Peter F. Patel-Schneider" wrote:
> 
> From: Sergey Melnik <melnik@db.stanford.edu>
> Subject: Re: Argh!
> Date: Mon, 29 Oct 2001 11:49:17 -0800
> 
> > "Peter F. Patel-Schneider" wrote:
> > >
> > > > You are right, according to the DAML/OIL schema, &quot;2&quot; is supposed to be a
> > > > nonNegativeInteger. Following Pat's proposal, &quot;2&quot; could denote many
> > > > different things like numbers, shoe sizes and weights in pounds
> > > > depending on the context. In my opinion, this ambiguity is
> > > > counterproductive and is a heavy burden for interoperability.
> > >
> > > Arghhhh!  Pat's proposal does not introduce any extra ambiguity.  Pat's
> > > proposal is quite compatible to the DAML+OIL way of doing things.  Pat's
> > > proposal produces a single denotation for literals that are the object of
> > > DAML+OIL cardinality properties.
> > >
> > > Please, please, please, if you are going to argue against Pat's proposal
> > > get the facts right!
> >
> > I'd be glad to and I'm sorry if I don't. Of course, you are right in
> > that the immediate interpretation (by means of 'I') of each literal
> > symbol is some uniquely determined entity in DD. Then, in your/Pat's
> > suggestion,  some other mapping (IDT, if I remember right) assigns each
> > of these uniquely determined entities a set of other entities. So what's
> > the difference? Effectively, a literal symbol is mapped to a *set* of
> > entities in DD by a composition of mappings.
> 
> Close (in my model theory a literal is mapped to a set of entities by a
> union of mappings, Pat's model theory may be slightly different), but that
> is no more ``ambiguous'' than a blank-node proposal.  The ambiguity is just
> in a different place.

I politely disagree. The other proposal (I would not call it mine - it's
completely consistent with the current MT draft) does not have this
ambiguity at all. Each resource symbol or literal symbol is mapped to
exactly one entity in DD, period.

> > Please prove me wrong - maybe that's just a big misunderstanding. I
> > brought up several examples in
> > http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0530.html and
> > illustrated graphically my understanding of how they should be
> > interpreted. I'm still waiting for Pat's output of MT DT section, in
> > which he will presumably provide an illustration of an alternative
> > approach. How would the examples I referred to look like?
> 
> Unfortunately, all the examples in your message use a different way of
> doing literals than Pat's/my way.  My way, in particular, does not allow
> types to be directly associated with literals, you have to use range
> restrictions instead. (Modifications of my model theory could have closer
> typing, such as via
> 
>         <foo xsi:type="xsd:int">570</foo>
> or
>         <foo xsi:type="xsd:float">0570</foo>
> 
> but this would NOT be handled by an xsi:type link in the RDF graph.)

I see several critical problems with Pat's/your way. I'll summarize them
it in a separate message.

> > > > If literals may denote everything you like (and many things at once), I
> > > > don't see why we need resources/URIs any more. We could do just fine
> > > > with literals. For example, literal &quot;Peter&quot; could denote a person,
> > > > sometimes Peter Pan, another time Peter The Great (even in the same
> > > > graph!). Literal &quot;2&quot; in the above example could well denote Peter The
> > > > Great, too.
> > >
> > > This is soo wrong that I don't know how any reasonable person could even
> > > think of it.  Literals are constrained in their interpretation.
> > > Non-literals are less constrained than literals.  The only differences
> > > between literals and non-literals is that a literal has a ``print-string''
> > > that is used to restrict what it can denote and a non-literal can have a
> > > label, which has no real import in a tidy RDF graph.
> >
> > So? Literals are constrained by what? By properties defined in some
> > schemas? So why can't "Peter" denote Peter Pan (or Patel-Schneider,
> > given the context ;) ?
> 
> [Warning:  In the following, I will be bit loose in terminology and
> syntax.  No ambiguities should result, however!]
> 
> My model theory states that a literal Peter, as in
> 
>         peter name "Peter".
> 
> can only denote something that has an XML Schema datatype lexical
> representation that is the literal "Peter".

One of the problems is, for example, that XML Schema datatypes are
"built-in". You can neither construct derived datatypes, nor use
datatypes defined by other languages. If you do, you would end up with a
self-morphing model theory. At least, the approach you/Pat described
would.

> If Peter Pan is not in the value space of any XML Schema datatype then it
> cannot be the denotation of a literal.  peter, of course, could denote
> Peter Pan.  Only if you were using a datatype scheme that had Peter Pan as
> one of its literal values, and a lexical representation of that literal
> value was "Peter", could you have "Peter" denote Peter Pan.

I might want to use a datatype scheme like that! After all, I'd like to
write "July" and mean month July, right? So if I'm able to introduce a
type for months, who prevents me from saying that "Peter" should be
interpreted as Peter Pan?

> Further, if you have
> 
>         name rdfs:range xsd:string .
>         peter name "Peter" .
> 
> then the name of peter has to be the string ``Peter''.
> 
> More interesting is the following situation:
> 
>         peter age "07" .
>         age rdfs:range xsd:integer .
> 
>         susan shoe-size "07" .
>         shoe-size rdfs:range xsd:string .
> 
> Then, indeed, the two occurences of "07" have different denotations, one
> being the integer 7 and the other being the string ``07''.

This is what I mean by ambiguity. Recall your "Arghhhh!" reply on the
top of this message.

> Similarly, if
> all you have is
> 
>         mary phone "5824471" .
> 
> then you don't know whether mary's phone is the decimal 5 824 471 or the
> string ``5824471'' or even the floating point number 5824471 (which is
> different, in XML schema, from the decimal 5 824 471).  There are even
> other interpretations for mary's phone (such as a URI, I think).
> 
> > > If it makes you feel better, you could use a different (but equivalent)
> > > graph where a literal maps into two nodes.  One node, corresponding to the
> > > node in Pat's RDF graphs, would be *like* a blank node.  The other node
> > > would be the ``print-string'' of the first node.  The model theory would
> > > then constrain the interpretation of the first kind of node by having a
> > > built-in interpretation of ``print-string''.
> >
> > The last sentence stops me from feeling better. If you change it to "The
> > property that connects the blank node and the ``print-string'' node
> > constrains the interpretation of the blank node", I would totally
> > subscribe under the above paragraph.
> 
> In my paragraph above ``print-string'' is the relationship between the two
> nodes.  I didn't want to call it a property.

That's not what bothers me. Making the blank node in the above example
to denote the string is what I don't agree with. It denotes something.
The interpretation of this something is restricted by the attached
``print-string''.

> > > Why use Pat's RDF graphs instead of these graphs?  Two reasons:
> > > 1/ Pat's RDF graphs are closer to RDF M&S.
> >
> > Graphs are graphs, and are as close to graphs, as any other graphs. I
> > don't buy that Pat's *interpretation* of those graphs is closer to M&S -
> > I believe the opposite.
> 
> How can you say this?
> 
> The first example in M&S is
> 
>         http://www.w3.org/Home/Lassila Creator "Ora Lassila" .

Do you think this example is ok? So, who is the "Creator" of the Web
page in the above example, after all? Peter Pan? According to your
proposal, the Creator would be an xsd:string as defined in the XML
Schema.

> Further, I see many RDF ``documents'' that have things like
> 
>         john name "John" .
>         john age "07" .

The first statement is fine, IMO. Property "name" could be thought of as
a relationship between Persons and 
Literals. A concept of a "name" could be stretched to that of a
character string. The second statement I'd consider bad modeling
practice. And in the long run we won't do ourselves a favor by endorsing
it.

> In both cases the literal is the object of the relationship and no other
> node is required.
> 
> Requiring a blank literal value node and a ``print-string'' between the
> literal value node and its lexical form would go against both M&S and
> common RDF practice.
> 
> > > 2/ The ``print-string'' edge would be subject to lots of misunderstandings.
> >
> > This is where I vehemently disagree. The edge determines the
> > interpretation of the blank node, it represents a mapping (possibly
> > partial) between a lexical space and a value space. XML Schema defines
> > such mappings in a quite precise fashion. IMO, the edge is what makes
> > datatyping work!
> 
> If you employ the ``print-string'' edge, you have to disallow or provide
> interpretations for several constructions that use it, including
> 1/ multiple print-string edges from the same literal value node
> 2/ print-string edges from non-literal nodes
> Both of these possibilities give rise to potential misunderstanding.

I'd rather say corrupt data. Misunderstanding is when you (or an
application) look at the data and imagine one thing, but in fact it
means something completely different. Moreover, the proper understanding
could still be quasi-enforced by cardinality and domain/range
constraints, right?

> The details can be worked out, and the misunderstandings can be reduced by
> suitable wording in the new RDF documents, but why bother when the
> ``print-string'' edge is not necessary?

If the meaning of a literal can change from statement to statement, I
would never use resources. Why bother? I might end up with a model
without a valid interpretation... Moreover, URIs are waaaay too long...
So, I'd just use a bunch of literals - I'm sure that a really
intelligent human can determine what I had in the back of my mind. No,
wait, why bother... In fact, I'd just stick with XML after all.

Having said all that, I'd like to add that I appreciate your input very
much. I'm still abstaining from writing an itemized critique of
Pat's/your proposal because Pat may be working on a version of
datatyping for MT that does not require polymorphic literals.

Sergey
Received on Monday, 29 October 2001 20:14:11 UTC