RE: heading toward datatyping telecon from Patrick.Stickler@nokia.com on 2001-11-06 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 6 Nov 2001 12:27:16 +0200
To: phayes@ai.uwf.edu
Cc: w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C068@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
> Sent: 06 November, 2001 04:25
> To: Stickler Patrick (NRC/Tampere)
> Cc: w3c-rdfcore-wg@w3.org
> Subject: Re: heading toward datatyping telecon
> 
> 
> >
> >The lexical validity of sub-types with regards to their super-types
> >is important in the context of ontological transparency whereby
> >a given value may be defined in terms of a very specific data
> >type yet a given query (and the resultant knowledge) is defined
> >in terms of a more general data type, and the response must be
> >encoded in a *lexical* form that is valid.
> >
> >Thus, it is not acceptable to e.g. create a sub-type hexInteger
> >of type integer which has a lexical form that is invalid for
> >integer because a system may receive a query for the value of
> >a property that has a range of 'integer' yet the knowledge
> >is defined via a sub-property having a range of hexInteger, and
> >the resultant response would encode a hexInteger literal as
> >the value of an integer property, which is invalid.
> 
> Wait a minute. How could this happen, again? Suppose indeed I define 
> a datatype class called xxd:hexInteger which is a rdfs:subClassOf 
> Integer, say. And suppose that the property eg:hexish is defined to 
> have hexIntegers as its range:
> 
> eg:hexish rdfs:range xxd:hexInteger .
> 
> And suppose the graph also contains
> 
> aaa eg:hexish "37" .
> 
>  From which it follows that the value of hexish on aaa is 55, in the 
> MT extension, but we get a query like:
> 
> aaa eg:hexish _:x .
> _:x rdf:type Integer .
> 
> Then, if I follow your point, it would be erroneous to return the 
> binding _:x/"37", since neither the query nor the response would 
> indicate the datatyping information which tells one that this numeral 
> should be understood to mean 55 rather than 37. Is that right? But 
> this is indeed a binding that would satisfy the query, following the 
> usual rdfs closure rules for subClassOf.
> 
> Please confirm if I have this right, 

Yes. The result of the query in essence loses the information
needed to properly interpret the actual value. An application that
does not know that the value was defined according to a data type that
requires hexadecimal notation will likely interpret it (on inspection)
as a decimal value, and thus the integrity of the knowledge is
compromised.

Taking the scenarios in the AmSci SW article, just think what chaos
would have ensued in coordinating schedules if one agent was providing
values that really were hexadecimal but were interpreted as decimal
by another agent ;-)
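
To make the loss of information concrete, here is a minimal Python sketch
of the hexish example. The parser table, the subclass map, and the
answer_query helper are all hypothetical names invented for illustration;
they are not part of any RDF toolkit or of the model theory under
discussion.

```python
# Illustrative only: datatype names follow the thread; the tables and the
# helper below are assumptions made for this sketch.
LEXICAL_TO_VALUE = {
    "xxd:hexInteger": lambda lexical: int(lexical, 16),
    "Integer": lambda lexical: int(lexical, 10),
}
SUBCLASS_OF = {"xxd:hexInteger": "Integer"}

# aaa eg:hexish "37" .   with   eg:hexish rdfs:range xxd:hexInteger .
stored = {"lexical": "37", "datatype": "xxd:hexInteger"}


def answer_query(literal, queried_type):
    """Return the bare lexical form if the literal's type satisfies the query."""
    datatype = literal["datatype"]
    while datatype is not None:
        if datatype == queried_type:
            return literal["lexical"]  # the datatype is dropped at this point
        datatype = SUBCLASS_OF.get(datatype)
    return None


binding = answer_query(stored, "Integer")
intended = LEXICAL_TO_VALUE[stored["datatype"]](stored["lexical"])
misread = LEXICAL_TO_VALUE["Integer"](binding)
print(binding, intended, misread)  # 37 55 37
```

The binding satisfies the query logically, but a receiver that only sees
"37" against a range of Integer has no way to recover the intended 55.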

> because it does indeed seem like 
> a rather serious matter for the proposed modification to the model 
> theory, which would only work if the datatype scheme satisfies the 
> mandatory validity of instances of subclasses, which you tell us XML 
> Schema does:
> 
> >Thus, per the XML Schema specification, a nonNegativeInteger
> >lexical form is also a valid integer lexical form is also a valid
> >decimal lexical form, etc. These data types are very well defined,
> >and the hierarchical equivalence issues were obviously well
> >understood by the folks who wrote it -- and of course, the whole
> >concept of mandatory validity of instances of sub-classes in
> >super-classes is at the very heart of the XML Schema model. Sub-types
> >are defined only by restriction, not by deviation which does not
> >conform to all superclasses. This characteristic should *not*
> >be discarded by RDF in the interpretation of literals by defined
> >data type by assuming that rdf:type only applies to value space
> >and not also to lexical space.
> 
> Pat
> 
> PS. I now understand why you reacted as you did to the 
> octal/decimal/binary examples, and I see your point.

Where the "breakdown" seems to be occurring is that there is 
insufficient focus (that I can see, apologies if I'm wrong) on the
lexicalization/serialization issues and thus problems like this go
unnoticed. It's not enough to simply achieve the correct *logical*
binding to a query based on subClass relations. One must also take 
into account how those results are communicated back to the source 
of the query in terms of the vocabulary in which the query was
expressed, and lexical forms of literals, qualified names, etc. all 
come into play -- yet such issues seem to be lost in the graph.

Just as one does not expect to input "dc:title" and get back
"xyz:e" (which happens to map to the same URI as the input
qname); likewise, if one inputs [ rdf:value "12"; rdf:type dt:hex ]
and that knowledge is bound to a query that is expected to
return a decimal integer, one shouldn't get "12" but rather "18",
etc.
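
One way to picture the re-lexicalization step is the following Python
sketch, which converts a literal from the lexical space of the stored type
to that of the queried type before returning it. The function name and the
idea of describing each datatype by a numeric base are assumptions made
purely for illustration.

```python
def relexicalize(lexical, source_base, target_base):
    """Re-express an integer literal in the lexical space of another type.

    Hypothetical helper: each datatype is reduced to a numeric base here,
    which is far simpler than a real datatype scheme would be.
    """
    value = int(lexical, source_base)  # lexical space -> value space
    if target_base == 10:
        return str(value)              # value space -> decimal lexical form
    if target_base == 16:
        return format(value, "x")      # value space -> hex lexical form
    raise ValueError("unsupported target base: %r" % target_base)


# Pat's example: the hex literal "37" answered against a decimal integer query.
print(relexicalize("37", 16, 10))  # 55
```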

One of the utilities of using URV encodings such as <xsd:integer:10>
or <xyz:hex:12> is that these resources can be bound to query
results without loss of their "original" data typing. Thus, 
[ rdf:value <xyz:hex:12>; rdf:type xyz:hex ] may get bound to
a property which has a range xsd:integer, but since the value
will remain <xyz:hex:12>, the application is fully aware of the
actual data type (and lexical form) of the value. Whether or
not it can digest that particular value is another issue, but
no different than being able to deal with the structure
and semantics of any value -- the key is that there is no loss
of information.
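
As a rough sketch of why a URV keeps its typing, the Python below treats
everything before the last colon as the data type and the remainder as the
lexical form. That splitting rule, the parser table, and the function names
are assumptions for illustration only; no URV syntax has been specified
here beyond the two examples above.

```python
# Hypothetical interpretation table keyed by the datatype part of a URV.
URV_PARSERS = {
    "xsd:integer": lambda lexical: int(lexical, 10),
    "xyz:hex": lambda lexical: int(lexical, 16),
}


def split_urv(urv):
    """Split 'type:lexical' at the last colon (an assumed convention)."""
    datatype, _, lexical = urv.rpartition(":")
    return datatype, lexical


def urv_value(urv):
    """Interpret a URV using the datatype it carries with it."""
    datatype, lexical = split_urv(urv)
    if datatype not in URV_PARSERS:
        # The type is still known even if this application cannot digest it.
        raise LookupError("no parser for data type %r" % datatype)
    return URV_PARSERS[datatype](lexical)


# Binding the value to an xsd:integer-ranged property does not change the URV,
# so the original type travels with it.
print(split_urv("xyz:hex:12"))      # ('xyz:hex', '12')
print(urv_value("xsd:integer:10"))  # 10
```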

As you said earlier, and I believe I also echoed myself, the
data type of the literal provides the means for its interpretation.

That might suggest that we are not actually dealing with "literals"
here, if interpretation is necessary to perform such generic
operations.

Perhaps "true" literals should have no type, and things such
as [ rdf:value "12"; rdf:type dt:hex ] are not really "literals"...

These types of issues have been struggled with at length in
standards such as CORBA, and I think it would be useful to
take an "Object" (as in OOP) view of how such values are 
encoded, interchanged, and interpreted. A URV approach may
free us from a lot of these lexical issues.

Just a thought...

Cheers,

Patrick
Received on Tuesday, 6 November 2001 05:27:41 UTC