Re: RDF datatyping from Patrick Stickler on 2002-01-10 (w3c-rdfcore-wg@w3.org from January 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 10 Jan 2002 17:56:21 +0200
To: RDF Core <w3c-rdfcore-wg@w3.org>
CC: Sergey Melnik <melnik@db.stanford.edu>, ext Graham Klyne <GK@NineByNine.org>
Message-ID: <B8638645.B59E%patrick.stickler@nokia.com>
On 2002-01-10 13:48, "ext Graham Klyne" <GK@NineByNine.org> wrote:

> At 10:27 AM 1/10/02 +0200, Patrick Stickler wrote:
>> On 2002-01-07 20:54, "ext Graham Klyne" <GK@NineByNine.org> wrote:
> Pat did sketch a model theory for the
> P/P+ 
> schemes, and it (a) certainly seems to add a significant degree of
> complexity to the model theory, and (b) it seems possible that it has
> some 
> fatal logical flaws (i.e. logical consequences that are different from
> what 
> is desired or expected).

From my recollection of those earlier discussions, there were two
issues that were brought up as problemmatic: (1) that rdfs:subClassOf
relations, if based on lexical space rather than value space could
be problemmatic in the context of certain inference operations, but
that is an issue for *any* data typing solution, not just for P/P++;
and (2) there was originally the expectation that values themselves
(not just lexical forms) would have some representation in the graph
which presumed native data types.

I don't recall any other flaw discussed -- though I may certainly be
wrong about that, and Pat or others are free to refresh our memories.

>> If the present idioms/practices did not do the job, then of course
>> there is justification for a new approach such as S, but since the
>> current idioms work fine there is no need or reason to take on yet
>> another approach.
> 
> I'd say that it's pretty well impossible to say that the present
> idioms/practices "do the job" unless there's an accompanying theory that
> says what entailments are licensed; i.e. what conclusions can validly be
> drawn from some given collection of RDF.  (Back to the model theory.)
>
>> The D (rdf:value/rdf:type, local) and P (rdfs:range, global) idioms do
>> the job just fine.
> 
> Point me to the model theory, and I may be convinced.

Though your point is taken, I'd say its stretching things a bit. Just
because the MT doesn't exist for P and D, doesn't mean that they don't
do the job. Since both idioms are in use, they apparently do "work". I
might not be able myself to provide you with the math to *prove* that
they work as well (or better than) S, but that doesn't mean they don't.

Still, it's a fair request that the options be considered at the same
level of precision, and I agree that we need P and D defined in mathematical
terms.

In fact, it appears that most of the math for P and D is defined, insofar
as they are based on the fundamental concept of a pairing of lexical form
to data type identity, and that is also the foundation of S, and therefore
much/most of Sergey's document appears to significantly intersect the needed
definition of P and D.

I don't think that talking about P and D is just so much hand waving simply
due to the absence of a complete MT specification for them.

>> The S idioms, while also doing the job, do so with more machinery and
>> most significantly are contrary to the intuitions of current RDF users
>> (data typing by predicate rather than by rdf:type).
> 
> I don't recognize that description of S:
> - I don't see "more machinery" here, whatever that means,

That means having to use multiple URIrefs and global schema
definitions to pair a lexical form with its data type. I've
given more than enough  examples to bear that out. No need
to repeat them here.

S imposes separate URIrefs for lexical and value spaces and
the mapping relation, and requires one to use data typing
properties relating to those data type URIs rather than the
generic rdf:type. It also requires either parsing the URIref
to determine the type of component, or further definitions
for type for each URIref for each component. I.e.

   xsd:integer.lex  rdf:type s:LexicalSpace .
   xsd:integer.val  rdf:type s:ValueSpace .
   xsd:integer.map  rdf:type s:Mapping .
   xsd:integer.cmap rdf:type s:CanonicalMapping .

and then likely further statements to relate the data type
to its component predicates

   xsd:integer s:lexicalSpace     xsd:integer.lex .
   xsd:integer s:valueSpace       xsd:integer.val .
   xsd:integer s:mapping          xsd:integer.map .
   xsd:integer s:canonicalMapping xsd:integer.cmap .

Now, while all that information is likely to be useful,
it is not actually *necessary* to know what value the
pairing ("30",xsd:integer) actually maps to and therefore
to make it required simply to know what value we're talking
about is wasteful.

If you don't consider that more machinery (and just more headache)
all just to pair a lexical form to a data type for interpretation
by some ex-RDF application, then I'm not surprised if our
conclusions and opinions differ so greatly.

> - "contrary to the intuitions of current RDF users" is precisely one
> area 
> where I think S scores very strongly, based on my intuitions from work
> with 
> CC/PP (modulo small issues raised in my comments to Sergey's paper).
> - I don't know what "data typing by predicate" means

Er, from Sergey's document, section 4:

  A datatype mapping can be "named" using RDF properties. In this
  document, such properties are referred to as datatype properties.

Forgive me for using the term predicate rather than property.

> (though I observe
> that 
> rdf:type *is* commonly used as the name of a predicate).

That in order to associate a literal (lexical form) with a data
type, I must use a data type specific property rather than the
generic property rdf:type which is intended for that purpose and
which is used for associating URIref labeled resources with data
types.

I must then relate that data type specific property to the
actual data type, and from that relation deduce the pairing
I need. It's extra work that is unnecessary and forces an obese
sub-vocabulary for every data type which otherwise serves no
useful purpose. See above.

All we need is a single URIref for each data type and a standardized,
generic means to pair literals with that URIref to achieve all the
knowledge needed to interpret those pairings to actual values.

Thus, the P and D idioms are far more economical than S in that
regard, and thus have less machinery -- and also don't require folks
to learn new machinery and expend extra effort in making their
data typing schemes RDF-usable.

No, I don't have the math to throw at you, but that doesn't mean
I'm not right ;-)


>> There is also the drawback that the S idioms represent a method of
> typing
>> literal labeled resources that differs from the typing of URIref labled
>> resources, introducing an inconsistency into RDF with regards to
> resource
>> typing in general.
> 
> Er, how is it different?

By not using rdf:type and rdfs:range in the same manner as
for URIref labeled resources.

And if P+ were adopted, then we could have P/P+ for global/local
data typing of literals where the treatment would be even more
the same as URIref labeled resources, including analogous graph
constructs with literals serving as subjects.

But, since we are (presumably) bound by the charter against the
changes needed for P+, we can still use P/D now and then
later migrate more easily to P/P+ with much less trouble than
migrating from S to P/P+.


> (The *syntax* of RDF doesn't allow us to apply rdf:type directly to a
> literal, so more indirect means are required to indicate the intended
> type 
> of literal values;

Right, such as the D idiom already in use, and seeming to work just
fine.  

> I (personally) don't think that constraint needs to
> apply in the RDF graph, and I'd favour relaxing the syntax in a future
> version of RDF.  

If you mean allowing literal nodes to be subjects, per P+, then I
fully agree. But certainly the D idiom is far easier to contract
to an eventual P+ representation when that time comes. I.e. going
to P+

   Bob ex:age _:1:"30" .
   _:1 rdf:type xsd:integer .

from D

   Bob ex:age _:1 .
   _:1 rdf:value "30" .    (simnply move to anon-node label)
   _:1 rdf:type xsd:integer .

is far more straightforward than from S

   Bob ex:age _:1 .
   _:1 xsd:integer.map "30" .
   xsd:integer.map rdfs:range xsd:integer.lex .
   xsd:integer.map rdfs:domain xsd:integer.val .

and presumably

   xsd:integer.lex rdf:type s:LexicalSpace .
   xsd:integer.val rdf:type s:ValueSpace .
   xsd:integer.map rdf:type s:Mapping .
   xsd:integer.cmap rdf:type s:CanonicalMapping .
   xsd:integer s:lexicalSpace xsd:integer.lex .
   xsd:integer s:valueSpace xsd:integer.val .
   xsd:integer s:mapping xsd:integer.map .
   xsd:integer s:canonicalMapping xsd:integer.cmap .

and we still haven't actually said anything about
the data type xsd:integer itself, but have to "know"
about how to parse the special names and remove the
'.lex', '.val', and '.map' suffixes.

And again, with S we get into the fun stuff with the
ability to say

   Bob ex:age "30" .
   ex:age rdfs:subPropertyOf xsd:integer.map .

which given the above range and domain statements
thereby declares Bob to be an instance of
xsd:integer.val, etc. etc.

Yes, alot more machinery... and all unnecessary, and
with unknown conflict/interaction with usage/semantics
of rdfs:subPropertyOf, etc. etc.

> I don't see any other proposal addressing this point,
> other than by adding whole new layers of structure to RDF, or making
> literals a truly different kind of thing in the RDF graph and associated
> semantics.)

From what I have understood, none of P, P+, D, or U make literals
anything other than they are as defined in the Dec14 draft of the MT.

The only change to the current MT is suggested by P+ which allows
literal nodes to act as subjects.

>> Finally, if we later (e.g. in RDF 2.0) want to take a P+ approach,
>> (which I think is the most ideal approach once the necessary extensions
>> to the graph model can be made) the P and D idioms are far more
>> compatible with P+ than are the S idioms and therefore with P/D we
>> have a much easier evolution path to P+ than with S.
> 
> P+ allows literals-as-subjects, right?
> 
> Again, I don't agree that P and D are more compatible with this
> approach.  

See examples above, where D is easily contracted into P+ by
simply being able to attach the literal as a label to the
object node rather than having to attach it via rdf:value.

I'd say that's much more compatible than the S idiom.

It probably should be stressed that I am focusing on the efficacy of the
representation of data typing knowledge in the RDF graph more so than the
details of the semantic interpretation of that representation -- which may
in the eyes of some disqualify me from this discussion entirely-- but I feel
that my intuitions about those alternate representations will be borne out
in the math (once someone more capable than I provides it ;-)

> I think S would also benefit from such a relaxation.

I would think that literals as subjects would break S, as
it depends on an anonymous node to represent the .val(ue)
and the literal to represent the .lex(ical form).

How would you use a .map predicate to define the mapping?!
You'd have to resort to rdf:type, just as with P+.

> I
> suspect 
> that much of your resistance to S is not to the basic proposal per se,
> but 
> to the indirections that are used to apply S in the face of the
> syntactic 
> restriction noted.

My objection certainly has to do with those indirections, yes, as
I see them as entirely unnecessary and just that much more work
for folks to have to deal with in order to work with typed data
literals in RDF.

Indirection is a good thing in general, but only if it is providing
something needed or useful. I don't see the S indirection do that.

All of the key distinctions between lexical and value spaces and
the mapping between them is clear simply from the pairing of lexical
form (literal) to data type identity (URIref). Explicit identities
for those spaces and mappings is unecessary in the RDF graph.

All that information about spaces and mappings, and in particular
the relationships between data types with regards to subset spaces
and mappings, is very important and useful, but that is knowledge
about the data types, not mechanisms for defining the data type
of literals. 


> This is an important debate that we need to have, which I hoped to spark
> by 
> my comments to which you responded.  I believe the group needs to come
> to 
> consensus on this issue, with some urgency.

I agree.


> Well, I certainly disagree that description of S should be removed from
> a 
> discussion document.

Point taken. I was taking your recommendation as meaning a promotion
or change of the role of that document to something other than for
discussion. Apologies if I misunderstood.

> I would certainly accept the addition of corresponding descriptions of P
> and D, if such descriptions are forthcoming -- that way we have an
> even-handed basis from which to make our choice.

I agree, and wish I could provide them, but it is beyond my present
skill.

Regards,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Thursday, 10 January 2002 10:55:42 UTC