Re: Denotation of datatype values

On 2002-04-11 13:22, "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com> wrote:


> As for non-monotonicity the datatyping conceptual layer that we are
> discussing is non-monotonic in its own right.
> 
> e.g.
> 
> <Jane> <age> "25" .
> 
> delivers the unicode string "25", i.e. <xsd:string,"25">.
> 
> <film> <title> "25" .
> 
> similarly delivers  <xsd:string,"25">.

Well, I consider both, in the absence of an rdfd:range assertion,
to be underdefined. I.e. the pairings are actually

  <???, "25"> and <???, "25"> respectively

> 
> At this datatyping conceptual level
> 
> <Jane> <age> "25" .
> <film> <title> "25" .
> 
> allows us to conclude that Jane's age and the film's title are the same.

We would not be able to make this comparison, as incomplete
datatyped literal pairings have no meaning, and thus cannot
be compared. They are like variables that have no value assigned.

Either we say that the comparison operation cannot be performed,
or it fails if either or both pairings are incomplete (undefined).
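A minimal Python sketch of the "cannot be performed" option follows. It assumes a hypothetical `Pairing` record in which an unknown datatype is represented as `None`, so `<???, "25">` becomes `Pairing(None, "25")`. It is not an RDF API. It only models the comparison rule described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pairing:
    """Hypothetical datatyped literal pairing <datatype, lexical form>.

    datatype is None when nothing (e.g. no rdfd:range assertion) supplies it,
    i.e. the pairing is incomplete: <???, "25">.
    """
    datatype: Optional[str]
    lexical: str

    def is_complete(self) -> bool:
        return self.datatype is not None


def pairings_identical(a: Pairing, b: Pairing) -> Optional[bool]:
    """Compare two pairings; None means the comparison cannot be performed."""
    if not (a.is_complete() and b.is_complete()):
        # Like comparing two variables that have no value assigned.
        return None
    return a == b


age = Pairing(None, "25")     # <Jane> <age> "25" .  with no range known
title = Pairing(None, "25")   # <film> <title> "25" . with no range known
print(pairings_identical(age, title))  # None
```

The other option, failing outright, would simply map `None` to `False`. The sketch keeps the two outcomes distinct so that "unknown" is not confused with "different".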

> Then we add the range constraint on <title>
> 
> <Jane> <age> "25" .
> <film> <title> "25" .
> <title> <range> <xsd:string> .
> 
> I take it that the range constraint changes nothing, we are still having the
> value
> <xsd:string,"25"> delivered in both cases, and so we are still concluding
> that Jane's age and the film's title are the same.

No. The information does change. Now we know that the latter
pairing <???, "25"> is really <xsd:string, "25"> and any
comparison as above will still fail because the <age> pairing
is still incomplete.

> Now we add the range constraint on <age>
> 
> <Jane> <age> "25" .
> <film> <title> "25" .
> <title> <range> <xsd:string> .
> <age> <range> <xsd:integer> .
> 
> We now have the film's title delivered as <xsd:string,"25"> the woman's age
> delivered as <xsd:integer,"25"> and they are different.

Yes and no. 

Now we have the complete pairing for <age> which is <xsd:integer, "25">
and now we can make the comparison and know for certain that the pairings
are in fact different. BUT, we can't know for certain that the values
themselves are different, insofar as the conceptual level of datatyped
literal pairings is concerned. It may in fact be the case that xsd:string
is an identical datatype to xsd:integer and the name is just misleading,
and in fact, the two values actually are identical. We can't know that at
this level. We'd need to go up a level, to that highest extra-RDF level
where the datatypes are fully understood.

So this isn't non-monotonic. It is just a matter of going from incomplete
to complete knowledge about datatyped literal pairings.
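The "incomplete to complete" claim can be made concrete with a small Python sketch. It assumes (hypothetically) that a pairing's datatype comes only from a single range assertion per property and that there is no default datatype. It replays the three steps of Jeremy's example and checks that each step only fills in a previously unknown datatype and never replaces one that is already known.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Pairing:
    datatype: Optional[str]  # None = not yet known (incomplete pairing)
    lexical: str


Triple = Tuple[str, str, str]  # (subject, property, literal lexical form)


def pairings(triples: List[Triple], ranges: Dict[str, str]) -> Dict[Triple, Pairing]:
    """Derive a pairing for each triple's literal.

    The datatype comes only from a range assertion on the property;
    there is no default datatype.
    """
    return {t: Pairing(ranges.get(t[1]), t[2]) for t in triples}


def refines(old: Pairing, new: Pairing) -> bool:
    """True if new only fills in a datatype that old left unknown."""
    return old.lexical == new.lexical and (
        old.datatype is None or old.datatype == new.datatype
    )


triples = [("Jane", "age", "25"), ("film", "title", "25")]
step0 = pairings(triples, {})
step1 = pairings(triples, {"title": "xsd:string"})
step2 = pairings(triples, {"title": "xsd:string", "age": "xsd:integer"})

# Each step only adds information about the pairings.
for before, after in [(step0, step1), (step1, step2)]:
    assert all(refines(before[t], after[t]) for t in triples)

print(step2[("Jane", "age", "25")])  # Pairing(datatype='xsd:integer', lexical='25')
```

With one range per property, a later assertion cannot swap one known datatype for another, so no earlier conclusion about a complete pairing has to be withdrawn.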

> Hence we see defeasible reasoning: in the light of new information we revise
> our knowledge that Jane's age is <xsd:string,"25">, which in turn causes us
> to revise our conclusion that Jane's age and the film's title are the same.

Well, the comparison should IMO fail in all of the above cases, firstly
for incompleteness, and secondly for inequality -- but in the latter case
you have to understand what it is you are comparing, and just as two URIs
may not be string equal but resolve to the same representation, two
datatyped literal pairings may not be equal yet identify the same value.

This is a *very* important point to grasp. Even when you have complete
datatyped literal pairings, you still can't be sure that the values they
represent are unequal. At the level of the pairings themselves, you can
only be sure that two values are equal, namely when their pairings are
identical.
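The asymmetry can be sketched in Python. The sketch assumes a hypothetical three-way `Verdict` and, standing in for the highest extra-RDF level, a hypothetical table of lexical-to-value functions. The pair `"025"` and `"25"` is used because both are legal xsd:integer lexical forms for the same number, so the pairings differ while the values do not.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Pairing:
    datatype: Optional[str]  # None = datatype not known (incomplete pairing)
    lexical: str


class Verdict(Enum):
    INCOMPLETE = "cannot compare: a datatype is unknown"
    SAME_VALUE = "identical pairings, so the same value"
    UNDETERMINED = "different pairings; the values may or may not differ"


def compare_values(a: Pairing, b: Pairing) -> Verdict:
    """What can be said about the *values*, using only the pairings."""
    if a.datatype is None or b.datatype is None:
        return Verdict.INCOMPLETE
    if a == b:
        return Verdict.SAME_VALUE
    # Like two URIs that are not string-equal but resolve to the same
    # representation: deciding needs knowledge beyond the pairings.
    return Verdict.UNDETERMINED


# Hypothetical extra-RDF layer that actually understands two datatypes.
# int() is only an approximation of the xsd:integer lexical-to-value mapping.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
}


def values_equal(a: Pairing, b: Pairing) -> bool:
    """Decide value equality via the assumed mappings (complete pairings only)."""
    to_a = LEXICAL_TO_VALUE[a.datatype]
    to_b = LEXICAL_TO_VALUE[b.datatype]
    return to_a(a.lexical) == to_b(b.lexical)


x = Pairing("xsd:integer", "025")
y = Pairing("xsd:integer", "25")
print(compare_values(x, y).name)  # UNDETERMINED
print(values_equal(x, y))         # True
```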

> This is non-monotonic, and the WG cannot escape that by simply saying that
> it is not in the MT. The only escape route is to acknowledge that in the
> absence of type information, the datatype is unknown (e.g. xsd:anyType or
> maybe xsd:anySimpleType). If we stay aware that
> <xsd:anyType,"25"> != <xsd:anyType,"25">
> because anyType does define a mapping, then the problem begins to disappear.

I would not assign any datatype if it is not known. There is *no* default
datatype (I should state that in the WD somewhere...)

A literal node, all by itself, always denotes itself, i.e. a string. But
an incomplete datatype pairing, where the datatype is not known, does *not*
default to any string datatype. Its datatyping interpretation is simply
incomplete.

> Unfortunately this is bringing untidiness back into the datatyping layer. We
> now are reading
> 
> <Jane> <age> "25" .
> 
> not as 'Jane's age is "25"' but as 'Jane's age can be written as "25"'. That
> is rather than having a 'tidy' reading of the triple, we are having an
> 'untidy' reading.

Correct.

> If we are going to allow untidiness anywhere, it seems to me to be more
> consistent and less of an intellectual somersault to allow untidiness at the
> lower levels of the analysis (such as in the syntactic graph, or perhaps
> only in the first level of interpretation in the model theory) rather than
> sneak it in at the last moment in the final less formal layer.

I am sympathetic to that view, but I also think the present approach
is quite workable.

> I am sorry that my critique seems to jump all over the place. Fundamentally
> the decision by the WG for tidiness was critically wrong. I
> abstained since it felt that we were going to get a small fudge where we had
> syntactic tidiness but untidiness from there on.
> 
> As is, now any place where I attack any of the consequences of that decision
> I am told that the problem is fixed elsewhere - but IMO the complete
> end-to-end picture does not fix the problem.
> 
> Fixing the tidiness problem in the datatyping layer is firstly not in the
> current proposal, secondly would need to be explicit and examples showing
> how these critically conflict with the MT interpretation should be given,
> and thirdly it is unnecessarily complicated.
> 
> The simplest fix is to allow untidy graphs (in the graph syntax).

There is ugliness there too, as I pointed out elsewhere, such that
in the inline idiom the literal would denote the value but in the other
idioms it either does not or does so redundantly. Yuck.

The ideal scenario, IMO, which is not going to happen, is for
literals to be untidy, literals to be subjects, literals to
denote the value and bear the lexical form as their label, and
the local idiom be based on a property such as rdf:type that
associates the datatype with the literal. I.e.

   ex:age rdfs:range xsd:integer .
   Jane ex:age _1:"10" .

or

   MyBook ex:title _2:"10" .
   _2:"10" rdf:type xsd:string .

but, c'est la vie, we work within very many constraints...
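For illustration only, here is a Python sketch of that (hypothetical) untidy scheme. Each literal occurrence is its own node labelled with its lexical form, and a datatype reaches it either through rdfs:range on the property or through an explicit rdf:type triple. `LiteralNode` and `datatypes_of` are invented names, and the triples are plain tuples rather than any RDF library. The last lines show why tidiness bites: if both "10" occurrences are one shared node, that node picks up both datatypes.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set, Tuple

_next_id = count(1)


@dataclass(frozen=True)
class LiteralNode:
    """Untidy literal: one node per occurrence, labelled with its lexical form."""
    lexical: str
    node_id: int = field(default_factory=lambda: next(_next_id))


Triple = Tuple[object, str, object]


def datatypes_of(node: LiteralNode, triples: List[Triple],
                 ranges: Dict[str, str]) -> Set[str]:
    """Datatypes from explicit rdf:type triples plus ranges of properties using it."""
    types = {o for (s, p, o) in triples if s == node and p == "rdf:type"}
    types |= {ranges[p] for (s, p, o) in triples if o == node and p in ranges}
    return types


ranges = {"ex:age": "xsd:integer"}           # ex:age rdfs:range xsd:integer .

age_10 = LiteralNode("10")                   # _1:"10"
title_10 = LiteralNode("10")                 # _2:"10"
untidy = [
    ("Jane", "ex:age", age_10),
    ("MyBook", "ex:title", title_10),
    (title_10, "rdf:type", "xsd:string"),
]
print(sorted(datatypes_of(age_10, untidy, ranges)))    # ['xsd:integer']
print(sorted(datatypes_of(title_10, untidy, ranges)))  # ['xsd:string']

# Tidy alternative: a single shared "10" node collects both datatypes.
shared = LiteralNode("10")
tidy = [
    ("Jane", "ex:age", shared),
    ("MyBook", "ex:title", shared),
    (shared, "rdf:type", "xsd:string"),
]
print(sorted(datatypes_of(shared, tidy, ranges)))  # ['xsd:integer', 'xsd:string']
```

The sketch says nothing about what the literal nodes denote. It only shows that per-occurrence nodes keep the two datatype assignments apart.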

> The next simplest fix is Pat's simpleDatatypes2, which is a very elegant
> fudge.

There is very clear and strong demand for the inline idiom. I don't see
this as an option. We need to make the inline idiom work.

> The next simplest fix is to allow an untidy interpretation of tidy graphs as
> the first step in the model theory ('Jane's age can be written as "25"').

I would be very happy to see the conceptual model of datatyped literal
pairings get a more complete treatment in the MT.

But... I think the present proposal is reasonable and workable, all things
considered (and that last qualification is very important to embrace ;-)

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Thursday, 11 April 2002 07:49:54 UTC