RE: High-level comments on datatyping draft from Sergey Melnik on 2002-08-28 (w3c-rdfcore-wg@w3.org from August 2002)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Wed, 28 Aug 2002 10:34:02 -0700 (PDT)
To: Patrick.Stickler@nokia.com
cc: melnik@DB.Stanford.EDU, Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org
Message-ID: <Pine.GSO.3.94.1020827083959.7697A-100000@Hake.Stanford.EDU>
On Tue, 27 Aug 2002 Patrick.Stickler@nokia.com wrote:

> 
> 
> > -----Original Message-----
> > From: ext Sergey Melnik [mailto:melnik@DB.Stanford.EDU]
> > Sent: 26 August, 2002 17:33
> > To: w3c-rdfcore-wg@w3.org; Stickler Patrick (NRC/Tampere)
> > Subject: High-level comments on datatyping draft
> > 
> > 
> > 
> > I think in the current draft a clearer separation between the abstract
> > syntax and the RDF/XML syntax is needed (the original RDF spec suffered
> > from exactly this problem). The first figure in Sec. 3.1 illustrates
> > nicely what datatypes are about in the abstract syntax - they are
> > first-class citizens in graphs. This point can be made earlier.
> 
> OK. Perhaps it would be good to have an initial "Datatyping in a Nutshell"
> section that provides this up-front, and then explains it in detail in
> the remainder of the spec. A kind of "Quick Glimpse for Gurus" so to
> speak.
> 
> Or perhaps this would be something the Primer could provide?

In fact, I'd rather first introduce datatyping in the abstract syntax
(backed by examples) and then explain the impact on the concrete syntaxes,
separately. In terms of the abstract syntax, datatyping is quite a general
and extensible mechanism, and this needs to be made clear.
 
> > I'm also uneasy about nailing down the internal structure of datatype
> > values as a 4-tuple or the like. I think literals should be opaque wrt RDF
> > abstract syntax. This makes the data model simple and appealing.
> 
> Well, the string-xmlbit-lang portions can be opaque, but the datatype
> portion (either URIref or systemID) needs to be visible to the MT.

Why is this? What the MT cares about is the equality of literals, nothing
more, right? That is, the only place where it is critical to know whether
two literals are identical is when parsing a concrete syntax and
generating an RDF graph for it. I think the equivalence rules of
language-tagged literals, which Jeremy summarized, could kick in here,
right? In case I'm missing something essential, what exactly are the
places in the MT where we care about the internal structure of literals?

> I originally defined literals as 3-tuples (per the Bristol f2f) and
> then typed literals as structured objects consisting of datatype 
> name (explicit or implicit) and literal, where the literal structure
> would be opaque to the MT but the typed literal structure would not.
> 
> The reason I went with a 4-tuple is that the datatype denotation is
> required, even if implicit as a systemID.
> 
> I'm quite open as to how the typed literals themselves are modelled,
> so long as the MT works correctly.

So am I. I'm just wondering whether we can simplify things even further.

> What do you recommend?

Checking with Pat and Guha whether the internal structure is needed for
the MT...

> > In addition, in implementations literals might be mapped directly to
> > native types and not have any structure whatsoever. Insisting that they do
> > would make implementations more complex and less efficient.
> 
> Well, we've already agreed that literals have structure, the
> string, xmlbit, and lang -- even if the latter two are irrelevant
> to the DT MT and that structure is opaque to RDF in general -- and
> applications will have to preserve that structure whatever their
> internal representation. I see no reason why an internal struct
> couldn't have fields for xmlbit and lang as well as native value
> representation.

Language-tagged literals do have structure in the RDF/XML syntax, no
doubt. The question is whether we have to prescribe how this structure is
mapped to the abstract syntax. For example, an integer literal in a
statement (foo, prop, (int)5) may be represented internally as

class IntegerStatement {
 Resource s;  // subject
 Resource p;  // predicate
 int o;       // object: the literal held directly as a native int
}

It might not matter for implementations what we say about the abstract
syntax, but I think we should require only what is necessary.
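
To make "only what is necessary" a bit more concrete, here is a rough
sketch, not a proposal for the spec. It assumes that resources can be
simplified to plain strings and that the object of a statement is any
native value. The name OpaqueStatement and its fields are made up for
illustration. The only thing asked of the object value is equality.

```java
import java.util.Objects;

// Hypothetical sketch: the object of a statement is an opaque native value.
final class OpaqueStatement {
    final String subject;    // resource identifier, simplified to a String (assumption)
    final String predicate;  // resource identifier, simplified to a String (assumption)
    final Object object;     // native value (Integer, Double, String, ...), no prescribed structure

    OpaqueStatement(String subject, String predicate, Object object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // Equality is the only operation required of the object value.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof OpaqueStatement)) return false;
        OpaqueStatement that = (OpaqueStatement) other;
        return subject.equals(that.subject)
            && predicate.equals(that.predicate)
            && Objects.equals(object, that.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }

    public static void main(String[] args) {
        OpaqueStatement a = new OpaqueStatement("foo", "prop", 5);
        OpaqueStatement b = new OpaqueStatement("foo", "prop", Integer.valueOf(5));
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

In this sketch the integer 5 and the string "5" are simply different
values, so two statements that differ only in that respect are not equal.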
 
> > It seems that all that business of rdfs:Datatype, value spaces and
> > lexical spaces could be eliminated. What we care about are the value
> > spaces, c'est tout.
> 
> I'm not sure I follow you here. We are stuck with lexical representations,
> and thus mappings from lexical forms to values -- and thus we are stuck
> with lexical spaces, mappings, etc. and rdfs:Datatype is the vehicle
> by which we define those entities.

Oh, in the latest proposal we are not stuck with lexical forms like the ones
specified in the XSD spec. Integers, floats, binary data, etc. are
directly available in the abstract syntax as constants. Of course, we need
the lexical forms in the concrete syntaxes, but in this case the mapping
from the lexical forms to typed values is simply a part of the parser
specification; the graph does not need to know about such mappings or
lexical forms.
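
As a rough illustration of this division of labor, here is a minimal
sketch under stated assumptions. The class LexicalMapper, the method
toValue, and the short datatype names "int", "float" and "string" are all
hypothetical and do not come from the draft or from XSD. The parser maps
a lexical form to a native value once, and whatever stores the graph only
ever sees the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser-side helper; nothing here is defined by the draft.
class LexicalMapper {
    // Maps (datatype name, lexical form) to a native value while parsing.
    static Object toValue(String datatype, String lexical) {
        switch (datatype) {
            case "int":
                return Integer.valueOf(lexical.trim());
            case "float":
                return Double.valueOf(lexical.trim());
            case "string":
                return lexical;
            default:
                throw new IllegalArgumentException("unknown datatype: " + datatype);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the graph: a list of object values, with no lexical forms kept.
        List<Object> objects = new ArrayList<>();
        objects.add(toValue("int", "5"));
        objects.add(toValue("int", "005"));
        // Two different lexical forms end up as the same value in the graph.
        System.out.println(objects.get(0).equals(objects.get(1))); // prints "true"
    }
}
```

The point of the sketch is only that the lexical-to-value mapping lives in
the parser; a different concrete syntax could use a different mapping and
still produce the same graph.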
 
> How would you capture datatype semantics otherwise? It already seems
> to be defined as minimally as possible.
> 
> > Finally, I think that global datatyping (Sec. 3.2) is out of scope of the
> > current document.
> 
> Eh? Global datatyping has been in scope from the start and has never
> become out of scope insofar as the desiderata and interests of the
> WG as a whole (even if one or more proposals have omitted it). It
> certainly is in scope, and is a key requirement. This also was recently
> reinforced by the CC/PP community and was an expectation of the
> original RDF WG.

Jeremy commented on the above in another email, I'll follow up there...

Sergey
Received on Wednesday, 28 August 2002 13:36:33 UTC