FW: Even more simplified datatyping proposal

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com


------ Forwarded Message
From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 21 Feb 2002 10:17:37 +0200
To: Pat Hayes <phayes@ai.uwf.edu>
Cc: Jos De_Roo <jos.deroo.jd@belgium.agfa.com>, Brian McBride
<brian_mcbride@hp.com>, ext Graham Klyne
<Graham.Klyne@mimesweeper.com>
Subject: Re: Even more simplified datatyping proposal

On 2002-02-21 2:55, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:

> (MOST of this is argumentation. There is one good idea in here,
> though, so to cut to the chase find *****  at the end.)
> 
>> On 2002-02-20 19:01, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:
>> 
>> 
>>>  The problem here is making sure that it's the literal that denotes one
>>>  thing and the bnode that denotes the other. What stops an
>>>  interpretation having the literal denote the value and the bnode
>>>  denote the lexical form? They are all just semantic objects in the
>>>  set at this point, not segregated by some kind of intrinsic type.
>> 
>> The answer is quite simple, RDF provides no *representation* of
>> values. Period. Ever. It may provide a bNode denotation of some
>> value, but never, ever a representation.
>> 
>> If it is a literal, it is *always* a lexical form.
> 
> I'm not following you. Sure it IS a lexical form. The question is,
> what (if anything) is it required to denote in an interpretation?
>
> Through yesterday I've been assuming that it always denotes itself, in
> effect. The version I sent today says that it can denote anything
> (unless there is datatyping information which fixes it somehow),
> which makes a literal exactly like a bnode with an attached (but
> meaningless) label.

I think this is where we may be having a disconnect. Datatyping
doesn't "fix" the meaning of a literal. It provides a context
for interpreting the literal as a lexical form.

If the meaning were "fixed", we'd need untidy literals, since
different range constraints could fix the same literal to
lexical forms of different datatypes.
 
It's very very important, I think, to keep clear that datatyping
*interpretation* does not happen in the graph -- is not expressed
in the graph. The graph provides the pieces of information that
allows for a consistent, unambiguous interpretation, but that
interpretation happens above/outside/beyond the graph.

Thus, the literal always denotes a literal. It may, in the context
of a datatyping interpretation, be taken to represent the lexical
form of a value, from which the mapping to the value is clear,
but it never ever *is* the datatype-specific lexical form.

That may seem like a paradox at first reading, but it's not, really.

All that the graph syntax denotes are literals, datatype URIrefs,
and value bNodes for the datatype idiom. Treating a literal as a
lexical form and resolving the actual value (denoted by a value
bNode, or left implicit if there is no bNode) is part of the
datatyping interpretation, and that interpretation is not reflected
explicitly in the graph.

>> Outside the context of datatyping, a literal is just a literal.
> 
> And it denotes.....what?

A literal. With no defined datatype interpretation. Any interpretation
is application specific and not specified by any official RDF/RDFS/RDFDT
rec.

> (Anything, like a bnode?

It's not quite the same as a bNode, since it does have lexical
distinction. One cannot differentiate bNodes by any quality of
the node itself, since it's blank.

Literal nodes are distinguishable from one another, even if they
do not have any fixed interpretation.

> Or nothing at all?

Insofar as RDF is concerned. Yes. Nothing at all.

> If the latter, how could any triple containing a literal ever be
> true?)

How could it be false?

It would only be true or false according to extra-RDF interpretation.


>> The RDFS spec does not include rdfs:Literal as a subclass of rdfs:Resource,
>> and I thought it was agreed that literals do not denote resources, only
>> bNodes and URIrefs do.
> 
> That's not my understanding. The MT leaves the issue open. It
> certainly seems reasonable to be able to say that literals denote
> *something*, and it seems to me that everything is a resource.

Well, if (and I'm not saying it should be) rdfs:Literal is a subclass
of rdfs:Resource, then that means that we only lose the ability
to restrict property values to the bNode idioms, but can still
restrict them to the inline idiom. I.e. the following still works
as before

   ppp rdfs:range ddd .
   ppp rdfs:range rdfs:Literal .

Perhaps there is some other way to achieve the same results as
the previously suggested

   ppp rdfs:range ddd .
   ppp rdfs:range rdfs:Resource .

Perhaps also giving bNodes and URIref nodes each a class of their
own, in the same vein as rdfs:Literal?

>> Those datatype schemes don't have literals as values. They have
>> strings as values. That is not the same.
> 
> Sure it's the same. What is a literal if not a string? (Just look at
> it, there's the string right there, between the quote marks.) But see
> below.

A literal may have a string representation, and a literal may
be string equal to some string value of a datatype, but that
does not mean that any member of a value space of a datatype
has a literal representation in the graph.

They are *not* the same.

>> Literals are constructs of the RDF graph, not of a given
>> datatype. Any intersection in nature between literals and
>> strings is insignificant.
> 
> Well, I think that all that Dan C really wants is a licence to use
> literal-string matching to test identity in simple in-line cases.
> That requires only that values and literal forms are 1:1,

Right. It requires that they be string-equal, but not the same 'thing'.
And this is an exceptional case, not the norm, so it shouldn't drive
the datatyping model (even if it is accommodated by datatyping).
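
A small sketch may make the string-equal versus same-'thing' distinction
concrete. It is illustrative only: the helper names `same_literal` and
`same_value` are made up here, and Python's `str` and `int` stand in for
the lexical-to-value mappings of xsd:string and xsd:integer. Literal-string
matching agrees with value identity only when the mapping is 1:1, as it is
for xsd:string:

```python
def same_literal(lex_a, lex_b):
    """Compare two literals purely as strings (what the graph can see)."""
    return lex_a == lex_b


def same_value(lex_a, lex_b, lexical_to_value):
    """Compare the values two lexical forms map to under one datatype."""
    return lexical_to_value(lex_a) == lexical_to_value(lex_b)


# Stand-in for xsd:string: every lexical form maps to an equal string,
# so string matching and value identity always agree.
string_agrees = same_literal("abc", "abc") == same_value("abc", "abc", str)

# Stand-in for xsd:integer: "05" and "5" are distinct literals that map
# to the same value, so string matching is not a test of value identity.
distinct_literals = not same_literal("05", "5")
equal_values = same_value("05", "5", int)
```

The comparison by value needs the datatype's mapping as an argument, which
is exactly the knowledge that lives with the application rather than in
the graph.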


>> Thus, as in the case of xsd:string, the fact that every
>> lexical form is also string-equal to the value it denotes
>> is a characteristic of the datatype, not a feature of
>> RDF or the expression of RDF datatyping in the graph.
> 
> I agree with that.
> 
>> A literal is either a literal or a lexical form. Never an
>> actual datatype value. RDF provides no representation whatsoever
>> for datatype values (even if for some datatypes, that would
>> be technically feasible).
> 
> And with that.

Cool. This is good, because the above two points are crucial.


>> The RDFS spec makes no such subclass
>> assertion that I can find.
> 
> I think that it follows from the MT. I need to check that.

Please do (and also, please don't make that assertion ;-)
 
>> You don't equate a literal node with an instance of rdfs:Literal?
> 
> No, never have. That interpretation doesn't make sense, because
> literal NODES are in the graph, not in the semantic domain. See
> http://www.w3.org/TR/rdf-mt/#literalnote

Then what the heck is the purpose/significance of rdfs:Literal???

>> I thought that we agreed that literals denote literals, always.
> 
> Well, I thought so too, and I wrote the datatype thing up on that
> basis; but now I have you, Graham, Brian and uncle Tom Cobbley and
> all screaming that they cannot live with that decision.

Not me. See above ;-)

> You want the 
> denotation of a literal to be influenced by datatyping information.

Nope. Not at all. A literal always denotes a literal.

> I wish y'all would make up your minds :-)

I did when we agreed a literal denotes a literal, and haven't changed
it since.

>> Just because a literal may be interpreted in a datatype context
>> as a lexical form does not mean it's not a literal in the graph.
> 
> Being a literal in the graph is a matter of syntax. Of course it IS a
> literal in the graph. But what does such a literal denote in an
> interpretation??

It depends on the context. A literal itself denotes a literal. A literal
in a datatyping context represents a lexical form -- but that representation
is part of the interpretation, not part of the literal.


>> Now, two datatypes may have the same lexical form, even if they
>> do not give it the same interpretation, so having tidy literals
>> is no big deal, because all the literal node denotes is the
>> lexical representation.
> 
> But look, if you say that, then you have already said that
> 
> jenny ex:age "35" .
> 
> asserts that Jenny's age IS the lexical representation. You are now
> STUCK with that, and you can't influence it by talking about ranges
> or datatypes. 

No. See above. An interpretation does not fix the meaning of
a literal. If there is no datatype range asserted for ex:age
(or it is ignored in the interpretation) then Jenny's age
is *interpreted* to be "35". If there is a range defined as
xsd:integer, then in that context, Jenny's age is *interpreted*
to be 35. In either case, the denotation of the literal node
"35" is the literal "35".

I don't see why this is difficult. It's consistent. It's unambiguous.
Applications always know how to interpret literals in a datatype
context.
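
As a rough illustration of the point above (a sketch only: the function
name `interpret` and the tiny datatype table are invented for this example
and come from no RDF spec, with Python's `int` and `str` standing in for
the real lexical-to-value mappings), the literal stays the same string in
both cases and only the application-level reading changes with the range
context:

```python
# Hypothetical table: datatype URIref -> lexical-to-value mapping.
DATATYPES = {
    "xsd:integer": int,
    "xsd:string": str,
}


def interpret(literal, range_datatype=None):
    """Return the application-level reading of a literal.

    With no datatype context the literal is taken as-is; with one,
    it is read as a lexical form of that datatype.
    """
    if range_datatype is None:
        return literal
    return DATATYPES[range_datatype](literal)


# The triple under discussion: jenny ex:age "35" .
age_literal = "35"

no_context = interpret(age_literal)                      # the string "35"
integer_context = interpret(age_literal, "xsd:integer")  # the number 35
```

Note that `age_literal` itself is never rewritten; the value is computed
outside the data that plays the role of the graph.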

No, you don't know *in the graph* which value a given literal
denotes and a given literal *in the graph* does not explicitly
denote a value -- and that's why a literal is not a resource,
it is just a syntactic construct that contributes to some
interpretation which is meaningful to the application, but
that interpretation is not explicit *in the graph*.

The convergence proposal (your summary3) is one way to have
such interpretations reasonably explicit in the graph, but
it's a camel (for the typical user). The datatype-as-union
proposal says don't bother trying to capture the interpretation
in the graph, leave it "up there" above the graph, and gives
us a lean and mean arabian (for the typical user).

*Both* of those proposals provide for consistent, unambiguous,
and functional interpretation of datatyping for applications.
They simply differ in how much of that is reflected in the
graph itself.

Eh?

> That's what the damn literal denotes, end of story. So
> (I presume) you DON'T want to say that, right?

Right.

> What you want to say
> is, that what the literal denotes varies from interpretation to
> interpretation, ie it has flexibility, so that you can then use other
> assertions (eg about drange) to rule out the cases you want to
> exclude and rule in the ones you want to have as a legal
> interpretation that satisfies the datatyping. If you fix the
> denotation rigidly up front, then you haven't got any flexibility left.

Right. Exactly. (though we don't need drange anymore)

>> It needs the context of the datatype
>> to provide the actual interpretation to a value.
> 
> Context is irrelevant if you have already fixed the denotation.

True, but don't assert that denotation of literals is fixed to
a given datatype interpretation.

I propose we maintain flexibility in the denotation and leave
denotation up to the datatype-context specific interpretation.

>> 
>> The literal "5" may be a lexical form of countless datatypes,
>> but in isolation, it's just a lexical representation, just a
>> literal, just a string. It does not in and of itself denote
>> the value 5.
> 
> OK, fine. So what DOES it denote? It's got to denote *something*, or
> else every triple including it is false (see the basic RDF MT).

Insofar as RDF MT truth is concerned, it denotes a literal, always.
But MT truth and datatyping truth are not the same thing, I think.

Like I said above, the graph denotes literals, datatype URIrefs,
and value bNodes. Not lexical forms. So any occurrence of a literal
is true, insofar as the MT is concerned -- i.e. it *is* a literal,
even if its significance in datatyping interpretation may differ from
context to context.

>> You need the datatype context to achieve that
>> mapping and that happens "above" the graph, not in it.
> 
> That sounds like what Peter P-S and I tried to do long ago, with an
> 'external' datatyping interpretation system that kind of glommed onto
> the literals and got added to a conventional RDF interpretation. But
> that was woefully complicated, and as things turned out it didn't
> work in any case when we got down to the details. I got so
> discouraged I gave up, you may recall.

It's going to happen above the graph anyway, because all real
validation of datatyping knowledge must be performed by some
application using datatype specific understanding and real
system-internal representations of values.

So, even if it's a bitch to make work from the viewpoint of KR,
it will work just fine for what most folks need RDF datatyping
for, namely: what the !@#&*(!*$ is this value.

The lighter, union-based, contextualized extra-graph interpretation
approach is easy for users to pick up, makes it 100% clear to application
developers what is meant, and provides for some degree of MT
support, insofar as what is actually denoted in the graph (literals,
datatype URIrefs, and value bNodes).

The value equality problem, as with all resource equality, is something that
has to be worked on, and likely will require functional layers above
the basic *declaration* of datatyping knowledge provided by the
RDF datatyping idioms. We're not going to solve that this go-round.

What we have to capture now is simply which value we are talking about,
and either of the proposals does that.

Since we're going to have those higher layers later anyway, I think
the lighter contextualized approach gives more leeway to those later
efforts to provide machinery for the context specific interpretations
and may actually make later solutions to the value equality problem
easier to achieve. The more the interpretation is explicit in the
graph now, the fewer options remain for later.

>> For the datatype triple idiom, we have a nice bNode to
>> uniquely denote that value. For the inline idiom, there
>> is no denotation of the value in the graph. But the
>> actual value is a product of extra-RDF interpretation, not
>> intra-RDF inference (if that makes any sense whatsoever).
> 
> But it's got to be connected to RDF inference in some ways. It has RDF
> consequences, for example. So it's not enough just to invoke a kind of
> external magic.

See above. We can't get away from that magic just now. The magic
is needed either way, whether to do value equality merging in the
graph or to query by value, etc.

We just won't at this time be able to capture the totality of
typed data values in the RDF graph -- nor do I think we should.

Datatyping and full interaction with datatype values will always
be tied to the application space. That is unavoidable.

>> So, taking the case of Dan's wish to use inline literals to
>> denote just the literal, just the string representation, is fine,
>> and such literals will have globally consistent meaning, but
>> only as long as some range constraint doesn't assert a datatyping
>> context that gives it some other interpretation (not denotation,
>> just interpretation) *and* an application heeds that datatyping
>> interpretation.
> 
> Well, the sticking point is going to be that 'other', because that's
> where the whole thing goes nonmonotonic.

Yes. Exactly. RDFS range and domain constraints *are* non-monotonic. Yup.

Because I can make long range assertions about your knowledge that
you did not make.

Once we merge our graphs, we get different interpretations than for
each graph in isolation.
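
The effect of merging can be sketched in a few lines. This is only an
illustration under stated assumptions: triples are plain Python tuples,
the `interpretations` helper is hypothetical, and `int` stands in for the
xsd:integer lexical-to-value mapping. The same ex:age triple is read
differently once a range assertion from another graph is merged in:

```python
# Hypothetical table: datatype URIref -> lexical-to-value mapping.
DATATYPES = {"xsd:integer": int}


def interpretations(graph):
    """Map each (subject, property) pair to the reading of its literal."""
    # Collect range assertions: property -> datatype URIref.
    ranges = {s: o for (s, p, o) in graph if p == "rdfs:range"}
    result = {}
    for (s, p, o) in graph:
        if p == "rdfs:range":
            continue
        datatype = ranges.get(p)
        # No range known: the literal is read as the literal itself.
        result[(s, p)] = DATATYPES[datatype](o) if datatype else o
    return result


your_graph = {("jenny", "ex:age", "35")}
my_graph = {("ex:age", "rdfs:range", "xsd:integer")}

alone = interpretations(your_graph)
merged = interpretations(your_graph | my_graph)
```

Here `alone` reads Jenny's age as the string "35", while `merged` reads it
as the integer 35, even though your graph did not change.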

C'est la vie.


> *****
> I think it is better to hold a gun to Dan's head (or maybe it's the
> Dublin Core's head) and insist that if he wants to say literals
> denote themselves (or strings, if you like), then that is a
> datatyping decision, and he should be explicit about it. All he has
> to do is to add
> 
> rdfs:dlex rdfs:subPropertyOf xsd:string .

Pat, you're still in the convergence proposal, not the "Even more
simplified" proposal. There is no rdfs:dlex.

But I agree. If Dan, or DC or anyone wants to say that the range of
their properties are strings, then they should say so explicitly,

   dc:title rdfs:range xsd:string .

Though some folks want to say that the range of their properties
are strings that are string-equal with lexical forms from a given
datatype, and to do that they can use a range intersection:

   dc:title rdfs:range xsd:date .
   dc:title rdfs:range rdfs:Literal .

which is, I think, really what Dan wants to do.
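
One way to picture the range intersection is a check that accepts a
literal only if it matches the datatype's lexical space, while the
property value stays the string itself. The sketch below is an
illustration resting on assumptions: the regular expression covers only
the plain YYYY-MM-DD shape of xsd:date (no timezone, no calendar checks),
and both function names are invented for the example.

```python
import re

# Simplified stand-in for the xsd:date lexical space (YYYY-MM-DD only).
DATE_LEXICAL = re.compile(r"\d{4}-\d{2}-\d{2}")


def satisfies_range_intersection(literal):
    """True if the literal is a string matching the date lexical shape."""
    return isinstance(literal, str) and DATE_LEXICAL.fullmatch(literal) is not None


def value_under_intersection(literal):
    """Return the literal unchanged; it is only checked, never mapped."""
    if not satisfies_range_intersection(literal):
        raise ValueError(f"{literal!r} is not string-equal to a date lexical form")
    return literal


checked = value_under_intersection("2002-02-21")
rejected = satisfies_range_intersection("yesterday")
```

The design point is that the result is still a string: the datatype
constrains which strings are acceptable but no mapping to a date value is
applied.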

> to his graph, and he's got it locked down tight: every time he uses a
> literal anywhere in that graph, it's got to be interpreted using
> xsd:string. It's not a default or anything else underhand or
> 'magical': if he tries to add any other datatyping information to his
> graph with this in it, he's going to get an explicit clash. Now
> everyone is wearing their datatyping assumptions on their sleeves.

Agreed.

Patrick



------ End of Forwarded Message

Received on Thursday, 21 February 2002 04:05:37 UTC