RE: Literals: lexical spaces and value spaces from Patrick.Stickler@nokia.com on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 7 Nov 2001 08:52:15 +0200
To: phayes@ai.uwf.edu, w3c-rdfcore-wg@w3c.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C06E@trebe003.NOE.Nokia.com>
> I'm not sure what a URV is, but how would one say (the equivalent of)
> 
> aaa eg:shoeSize "10" .

  aaa eg:shoeSize <xsd:integer:10> .

> using URVs? (BTW, I gave 'URV' to Google, and it found this: 
> http://www.scudworks.com/pablo/possum/urvtips.html )

 
This is outlined in my X-Values proposal. C.f.

http://www-nrc.nokia.com/sw/X_Values_URI.pdf

I'm working on a slightly more polished I-D for the extended
URI taxonomy itself, and separate I-D for some URI schemes
grounded in that taxonomy. Hopefully those will be done and
submitted this month.

> >Here's a radical (?) suggestion: data typing and type validation
> >only apply to resources, not literals! Literals are not interpreted.
> >They are only sequences of bytes.
> 
> If I follow you, that is close to the 'bnode' suggestion already on 
> the table, ie to translate the above into something like
> 
> aaa eg:shoeSize _:1 .
> _:1 rdf:type actas:ShoeSizes
> _:1 actas:shoesizeString "10" .

Yes and no. It seems that the problem of misinterpretation
of lexical forms which may differ between subtype and supertype
really only comes into play when literals are not locally
typed, but type is inferred from range constraints of the
properties. Perhaps this is where the "bug" is. Perhaps range
constraints should *only* be constraints, and that they only
apply when the value itself has an explicit, local type defined.

If no local type is defined for a value, then any range 
definitions are meaningless, and applications should not rely
on range constraints to determine the type of non-locally typed
literals. The idea that range defintions could be descriptive,
per a strongly typed model, seems to miss the fact that lexical
forms remain uninterpreted until needed, and may be invalid
for a given binding as defined by the range of a property. Values
are not in any canonical internal representation, where it is
enough if there is proper intersection of the value spaces of
original type and property (variable) type. The persistent 
lexical form precludes such a treatment.

In my example to Brian recently, the misinterpretation of "0x12"
as an integer in decimal notation occurred because the literal
itself was not specified for type (which also defines lexical
form); hence the problem.

ASSERTION:
  Range definitions are *only* prescriptive, and *only* when
  local type is defined for values. Any assignment of type
  for a value must be made locally for each occurrence of the
  value.

Hmmmm... that might do the trick...

Thus, with the bNode approach, any locally defined type for a
literal is inseparable from the lexical form, and logical binding
of values based in inference and subClassOf and subPropertyOf
relations must bind the bNode, not the literal, thus keeping
together the original type and lexical form.

The key here is that original type and lexical form *always*
remain together in RDF/S space. Just as a Banana object
in an OOP model remains a Banana even when it is held in a Fruit
variable, so must a typed literal value retain its original data
type, even when bound to a property with a range and interpretation
of a superordinate type.

And just as the interaction with a Banana as a Fruit is dependent
on the API, so is the interpretation of e.g. a hexInteger as
an integer dependent on "some" API. But the underlying representation
in the graph remains explicit and type is inseparable from value.

bNodes may very well provide that "persistent typing", but I'd like 
to see the above constraints on inseperability and binding, as well
as the assertion that range is only prescriptive, stated explicitly 
and officially somewhere.

The URV encoding could be seen as a compression of a bNode typing,
where the type and lexical form are encapsulated in a URI. The
additional benefit to this encoding (apart from ensuring that
type and form are inseparable) is that each value in a given value
space (presuming semantically-vacuous lexical variation is disallowed)
shares a common global identity which can be useful in various
contexts (see my X-Values proposal).

Thus 

   <eg:shoeSize rdf:resouce="xsd:integer:10"/>

defines a typed resource, encoded in a URV, for which we can
perform type validation and all logical bindings inferred from
subClassOf and subPropertyOf relations do not result in potential
misinterpretation of the lexical form, as the primary data type
is inseperable from the lexical form. 

> But it seems to me that one *must* interpret literals, since they 
> provide essential semantic content. Clearly, it matters whether the 
> shoe size is 10 or 12.

But such interpretation of literals belongs at or above the API
layer (and its a great pity that RDF does not yet have a standardized
API...)

The logical operations that may be applied by inference engines on
the graph itself should not IMO be mucking about with interpreting
literals, but only the defined types of those values.
 
> >If one wants to express a value
> >of a given data type via a particular lexical form, one can encode
> >it as a URI (URV).
> 
> So there is a URI for every value of every literal in every 
> datatyping scheme?? 

Yes, just as there can be a URI for every other 'thing' we may
wish to reify in our systems.

> Instead of writing 10, I have to know that 
> http://www.w3.org/xmldt/integers/ten is the web address of the number 
> I need?? That seems like an intolerable burden on content writers.

No. It is not so burdensome. You are perhaps confusing qnames
based on URL namespaces with URVs, which are much more concise.
In fact, which of the following do you consider to be more verbose
and burdensome:

  aaa eg:someProperty _:1 .
  _:1 rdf:value "10" .
  _:1 rdf:type  <http://www.w3.org/2001/XMLSchema#int> .

or

  aaa eg:someProperty <xsd:integer:10> .

Note that 'xsd:integer:10' is *not* any kind of qname form. It
is a URI (URV).

Type can then be assigned via aboutEachPrefix (yes, a pity it
is deprecated...)

   <rdf:Description aboutEachPrefix="xsd:integer:">
      <rdf:type rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
   </rdf:Description>

or can be implicitly understood by the API.

And note that the binding of type to lexical form as encapsulated
in a URV is portable across any URI aware application, and thereby
offers greater utility than an RDF-only representation.
 
> >Applications are free to parse and interpret
> >URVs according to their data type, and bindings to queries based
> >on logical inference of subClass relations do not effect the
> >data type identity of the particular value.
> >
> >Thus, one avoids the "basket of fruit" problem which plagued
> >early OOP languages (including C++) where the identity of
> >a value was dependent on the type of the variable, and if one
> >had a collection for a type 'Fruit' and one put all kinds of
> >fruit into the container (basket of fruit), an iterator over
> >that container, when assigned each member of the basket, only
> >provided a 'Fruit', not a Banana, or an Apple, etc.  Ada95
> >was the first OOP language to properly solve this problem.
> >Sather hinted at it, but failed. Java fortunately manages it
> >quite well. And most C++ compilers now seem to handle it.
> >
> >But RDF suffers now from the same problem, since type identity is
> >not inseparable from the value. It may be that a given value
> >corresponding to a nonNegativeInteger is also an integer, even
> >a number, but binding that value to a query property of type
> >number should not cause the value to loose its type identity
> >as a nonNegativeInteger. That's "OOP 101" folks.
> 
> No doubt we missed this because, in part, to consider querying 
> seriously would be beyond our charter.

Perhaps technically, but without considering operations such
as inference and queries at least generally, one cannot ensure
that the foundational representation of knowledge in the graph
will effectively support such essential applications. I.e., it
may not be in the charter to define inference or querying
functionality, but surely we would want to ensure solid support
of them. Right?

(sorry for making so much noise right off the bat, being the
new guy and all...  ;-)

Cheers,

Patrick
Received on Wednesday, 7 November 2001 01:52:27 UTC