Re: RDF datatyping

At 10:27 AM 1/10/02 +0200, Patrick Stickler wrote:
>On 2002-01-07 20:54, "ext Graham Klyne" <GK@NineByNine.org> wrote:
> > For the rest of the group, my question is this:  do we just tidy up the
> > "S"
> > proposal and go with it, or do we need to discuss the alternatives?
>
>I am strongly opposed to this (no surprise, I'm sure).

Sure.  I should probably stand back and let someone else offer a viewpoint, 
but I'll make one pass at responding to your points (constructively, I hope).

Firstly, let me say that I am quite open to using a different scheme if it 
meets the various criteria raised.  Currently, S seems to be closest we 
have to a fully worked-out scheme.

Secondly, I must emphasize that the point around which the choice currently 
revolves (for me, at least) is the tractability of the associated model 
theory.  The other issues seem to be more or less addressed by all the 
contending schemes.  This in turn means that any alternative must come with 
an account of how the model theory must be applied and/or changed to 
support the proposal offered.  Pat did sketch a model theory for the P/P+ 
schemes, and it (a) certainly seems to add a significant degree of 
complexity to the model theory, and (b) it seems possible that it has some 
fatal logical flaws (i.e. logical consequences that are different from what 
is desired or expected).  I don't think I've ever seen a proposal of formal 
semantics for the D proposal.

That said...

>I don't see how the S proposal serves to provide a solution which the
>present practices/idioms P & D (DAML) do not provide -- therefore there
>is no justification for adopting what is a more complex solution for a
>problem that it is already fully solved by present practices.

The advantage of the S scheme is that is sits comfortably within the 
current model theory.  The only variations from the model theory in its 
purest form, as far as I can tell, are:
- literal nodes denote the literal string value that is used to labels 
them, and
- it is understood by the communicating applications, ex-RDF, that certain 
datatype URIs have fixed interpretations defined in part by a suitable 
datatype specification.  (I see this corresponding to the XL mapping of the 
model theory - it's common across all interpretations but isn't specified 
by the RDF core itself.)

>If the present idioms/practices did not do the job, then of course
>there is justification for a new approach such as S, but since the
>current idioms work fine there is no need or reason to take on yet
>another approach.

I'd say that it's pretty well impossible to say that the present 
idioms/practices "do the job" unless there's an accompanying theory that 
says what entailments are licensed; i.e. what conclusions can validly be 
drawn from some given collection of RDF.  (Back to the model theory.)

>The D (rdf:value/rdf:type, local) and P (rdfs:range, global) idioms do
>the job just fine.

Point me to the model theory, and I may be convinced.

>The S idioms, while also doing the job, do so with more machinery and
>most significantly are contrary to the intuitions of current RDF users
>(data typing by predicate rather than by rdf:type).

I don't recognize that description of S:
- I don't see "more machinery" here, whatever that means,
- "contrary to the intuitions of current RDF users" is precisely one area 
where I think S scores very strongly, based on my intuitions from work with 
CC/PP (modulo small issues raised in my comments to Sergey's paper).
- I don't know what "data typing by predicate" means (though I observe that 
rdf:type *is* commonly used as the name of a predicate).

In any given interpretation, a denoted value *has* a data type separately 
from any predicate in which it appears.  The RDF semantics of a graph 
constrain the interpretations that satisfy the graph, including (possibly) 
constraints on the types of things denoted by the graph.  rdf:type is (used 
as) a predicate that defines a relatively direct constraint on the type of 
its subject, though it still depends on the interpretation of its 
object.  Other predicates (e.g. rdfs:range) may impose type constraints by 
more indirect means.


>There is also the drawback that the S idioms represent a method of typing
>literal labeled resources that differs from the typing of URIref labled
>resources, introducing an inconsistency into RDF with regards to resource
>typing in general.

Er, how is it different?  I'd say that a strength of S is precisely that it 
treats all RDF graph nodes pretty much the same, regardless of how they are 
labelled.

(The *syntax* of RDF doesn't allow us to apply rdf:type directly to a 
literal, so more indirect means are required to indicate the intended type 
of literal values;  I (personally) don't think that constraint needs to 
apply in the RDF graph, and I'd favour relaxing the syntax in a future 
version of RDF.  I don't see any other proposal addressing this point, 
other than by adding whole new layers of structure to RDF, or making 
literals a truly different kind of thing in the RDF graph and associated 
semantics.)

>Finally, if we later (e.g. in RDF 2.0) want to take a P+ approach,
>(which I think is the most ideal approach once the necessary extensions
>to the graph model can be made) the P and D idioms are far more
>compatible with P+ than are the S idioms and therefore with P/D we
>have a much easier evolution path to P+ than with S.

P+ allows literals-as-subjects, right?

Again, I don't agree that P and D are more compatible with this 
approach.  I think S would also benefit from such a relaxation.  I suspect 
that much of your resistance to S is not to the basic proposal per se, but 
to the indirections that are used to apply S in the face of the syntactic 
restriction noted.

>CONCLUSION:
>
>     S is a radical and unnecessary departure from the way
>     data typing of literals is now done in the RDF community,
>     and therefore is the least optimal choice presently on the
>     table, and should not be promoted as a replacement for the
>     P and D idioms.

We seem to be talking about different 'S's here.  (Maybe the generally 
accepted meaning of the text in Sergey's proposal allows two radically 
different interpretations to both satisfy that text?  Which, if true, would 
be precisely an illustration of why a formal mathematical semantics is so 
important to RDF.)

My support for S (modulo comments elsewhere...) is precisely on the basis 
that it matches with the way that literal data typing is used in CC/PP, 
which observation I offer as a counter-example to the assertion in your 
conclusion.  And since I don't accept your premiss, I am not bound to 
accept what follows from it.

>I know that alot of work has gone into the definition and refinement of S,
>and that work has certainly been invaluable in clarifying the general
>understanding of what data typing means within the scope of RDF, but ...

Yes, the amount of work is important and not something to be lightly put aside.
Unless, that is:
(a) it is demonstrated to be fatally flawed in some way, OR
(b) a better [**], fully worked-out proposal (including semantics) is on 
the table

I remain open to arguments along the lines of (a) or (b)...

[**] Of course, this begs a view of "better" - in the final analysis it may 
be an arbitrary choice based on personal preference.  But we're not there yet.

>I sincerly think Sergey's Datatyping document is great, and
>I am not objecting to the majority of its content, but only to the
>S idioms as becoming the official/recommended idioms for defining
>pairings of lexical form to data type (literal and data type URI).

This is an important debate that we need to have, which I hoped to spark by 
my comments to which you responded.  I believe the group needs to come to 
consensus on this issue, with some urgency.

>My view is that discussion of S should be removed and descriptions
>of the P and D idioms added as the currently used and now "official"
>idioms for global and local literal data typing (respectively).
>
>Am I alone in this thinking?

Well, I certainly disagree that description of S should be removed from a 
discussion document.

I would certainly accept the addition of corresponding descriptions of P 
and D, if such descriptions are forthcoming -- that way we have an 
even-handed basis from which to make our choice.  (But if no such 
description is forthcoming, we have little choice but to proceed with what 
we have.)

(I've indicated above what I think it would need to change my mind.)

#g


--------------------------
        __
       /\ \    Graham Klyne
      /  \ \   (GK@ACM.ORG)
     / /\ \ \
    / / /\ \ \
   / / /__\_\ \
  / / /________\
  \/___________/

Received on Thursday, 10 January 2002 07:05:45 UTC