RE: new datatyping proposal from Patrick.Stickler@nokia.com on 2002-08-08 (w3c-rdfcore-wg@w3.org from August 2002)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 8 Aug 2002 18:55:06 +0300
To: <Jan.Grant@bristol.ac.uk>
Cc: <melnik@db.stanford.edu>, <w3c-rdfcore-wg@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBA7D@trebe006.NOE.Nokia.com>
> -----Original Message-----
> From: ext Jan Grant [mailto:Jan.Grant@bristol.ac.uk]
> Sent: 08 August, 2002 16:21
> To: Stickler Patrick (NRC/Tampere)
> Cc: melnik; w3c-rdfcore-wg
> Subject: RE: new datatyping proposal
> 
> 
> On Thu, 8 Aug 2002 Patrick.Stickler@nokia.com wrote:
> 
> >
> >
> >
> > > Jenny --ageYears--> int_5
> > >
> > > I(xsd:integer) = {I(int_0), I(int_1), ... }
> >
> > To have to define a lexical grammar for the constants of
> > typed literals which further have a lexical grammar for
> > the lexical form portion is not practical at all.
> >
> > The graph syntax representation of a typed literal should
> > be the fusion of the URI denoting the datatype and the
> > lexical form, the lexical grammar of which is defined
> > by that datatype.
> >
> > While I consider this to be a tangient from what the WG
> > should be focusing on regarding datatyping, I'm not opposed
> > to the idea of having typed literals as an additional
> > atomic element of the graph syntax -- *but* its representation
> > should not require any further specification beyond the URI
> > identity of the datatype and the lexical grammar of the
> > datatype itself.
> 
> No; lexical grammar of datatypes belongs in parsers and 
> serialisers. 

Really. Even if the semantics of those datatypes are external
to RDF? Should we require RDF parsers to be able to parse
and validate the most common URI syntaxes? 

I understand that this argument hinges on whether there
are or are not native RDF datatypes. I consider built
in datatypes to be inherently non-extensible and contrary
to the model-neutral stance that RDF has taken to date
(and datatype frameworks are clearly models).

Furthermore, if a set of native RDF datatypes are defined,
we *still* need ways to support datatyping of literals for
custom, user-defined datatypes! This is *not* a solution,
it is a detour from the real problem at hand, which is
devising a way to express and communicate knowledge using
*arbitrary* datatyped values.

To limit RDF to a core set of datatypes is no different from
limiting RDF to a core set of instances of rdfs:Class.

> The
> tokens - magically created by mathematical whim - belong in the graph
> syntax, which shouldn't be too closely tied to lexical representation.
> (IMHO)

Created by mathematical whim? Magically? And here I thought the
graph was to be the basis of global interchange and syndication
of knowledge...

Just for the record, I do not oppose a new node type that
constitutes a datatyped literal (a pairing of datatype and
lexical form) which denotes a particular datatype value.
I've wanted such a thing for some time -- though I now do
it using a URI (the val: URV). Which, by the way, works just
fine.

But I am very much opposed to the graph label of such a
datatyped literal node being anything other than the
lexical combination of the datatype URI and the lexical
form. Creating yet another set of identifiers for these
nodes is completely unnecessary and weakens the clarity
of RDF datatyping by introducing a level of naming 
indirection between datatype URIs and these "magical,
whimsical constants".

> > > It is the intent that at least some of these data types (like
> > > integers,
> > > floats and strings) correspond to data types provided by 
> programming
> > > languages and storage systems, so as to allow for efficient
> > > storage and
> > > retrieval of RDF.
> >
> > Nothing prevents that with the previous proposals. This is a
> > trivial implementational issue that has no relevance to the
> > abstract graph syntax.
> >
> > It is to be expected that APIs and other RDF applications will
> > provide abstractions of datatyped values based on their idomatic
> > expression in the graph. So what. That is true for all of the
> > proposals that have been on the table thus far. The benefit
> > asserted here is an illusion.
> 
> I disagree. There is (in my opinion, again) a distinct advantage with
> this proposal, because the type of a literal is atomically tied to the
> literal itself. This requires no idiomatic interpretation between a
> triple store and the view presented to the user of an api. 

Eh? Writing f(DDD, LLL) is harder than f((ddd)LLL) ???

I have yet to hear any argument that would demonstrate any
such benefit. I think it is simply a matter of taste.

> While having
> types expressed as property arcs mitigates this somewhat, I've had a
> stab at implemeting the datatyping idiom from the previous 
> proposal and
> it's a hell of a fag (this is despite having support for hiding all
> sorts of extra information on bnodes).

No comment.

> The best route I could see for the older proposal was to add a slew of
> extra calls to an API. 

A "slew". More like just two. One to get the "raw" object of
a statement and one to get a (if possible) interned datatyped
value of a property.

And most likely, one would create the API to use a single
virtual function for getting property values and just set
a flag to differentiate between raw or datatyped.

> This isn't something I'm averse to, however,
> because "grovelling around at the triple level" is not particularly
> productive (again, in my opinion).

I certainly agree with you there.

> > Whether the application interns
> > based on typed literal constants, multi-node idioms, or any
> > other representation, it's all the same, and no alternative
> > is not significantly better or worse than any other.
> 
> > > 2. Concrete syntaxes
> > > --------------------
> > >
> > > In the RDF/XML syntax, non-string literals are encoded in 
> accordance
> > > with the XML Schema spec as
> > >
> > > <propName xsi:type="URI">XML content</propName>
> >
> > Again, I understand xsi:type to be tied to XML Schema datatypes,
> > not to arbitrary datatypes, therefore adoption of this term
> > constitutes treading on other folks front lawn.
> >
> > Likewise, if it the case that 'int_5' is of rdf:type xsd:integer,
> > i.e. that it denotes the value five, then we can just as well
> > use rdf:type instead of xsi:type.
> 
> Certainly if literals ever get to sit on the blunt end of arcs, there
> will be something like
> 
> 	int(5) rdf:type xsd:integer .
> 	xsd:integer rdfs:subClassOf rdfs:Literal .

Fair enough, so in this case, presuming we adopt datatyped
literal nodes, why not just use rdf:type to begin with?
I've yet to see any demonstrated requirement for using
xsi:type given the existing rdf:type which more acurately
reflects the semantics of the datatyped node.

> > > RDF/XML parsers provide callbacks that allow generating a compact
> > > internal representation of literals that correspond to data types
> > > provided by programming languages and storage systems 
> (e.g., integers,
> > > floats and strings). Similarly, the serializers provide 
> callbacks for
> > > encoding such literals in RDF/XML.
> >
> > These are all simply implementational issues that can also be
> > done for any of the proposed datatyping idioms. We are not tasked
> > to say how RDF parsers and APIs are to be implemented. There
> > are many ways to optimize the internal storage of RDF expressed
> > knowledge.
> 
> I'd like to hear or see a proposal for the datatyping idiom that does
> this, but this isn't really necessary WG material. Maybe on
> rdf-interest?

This was discussed to death months ago. I don't intend
to repeat it this late in the game. See the archives.

> > This proposal is not offered as a solution to any fatal flaw
> > in the local datatyping mechanisms outlined in the latest
> > datatyping WD (the stake-in-the-ground proposal), and IMO
> > is a less optimal solution that the present stake-in-the-ground
> > proposal.
> 
> I find it conceptually simpler, and easier to sell to people.
> 
> > It also fails to address global typing and the relationship of
> > datatyping to RDF types in general.
> 
> Agreed (I think); there's omre to fill out in this regard.

To be quite direct about it, having RDF datatyping be
datatype framework neutral has been in the documented
desiderada for nearly a year -- and it's pretty darn late
to be tossing such a proposal for native RDF datatypes
on the table. You've had more than enough time to contest
the datatype neutral view inherent in the proposals to
date, and as this proposal (however interesting) does not address
any substantial flaw in the current stake-in-the-ground
proposal, which you also had more than ample time to 
contest, I find the discussion of this proposal to be
very much contrary to the process for concensus that was
set up and agreed to in Cannes and re-asserted in Bristol.

It's too late. Put it on the wish list for RDF 2.0.

I believe there is a point of order to be addressed here,
and if neither chair is to be present at tomorrows telecon,
that this issue be omitted from the aggenda until it
is decided that it warrants the WG's further time.

Patrick
Received on Friday, 9 August 2002 07:47:57 UTC