RE: High-level comments on datatyping draft

> -----Original Message-----
> From: ext Sergey Melnik [mailto:melnik@DB.Stanford.EDU]
> Sent: 28 August, 2002 20:34
> To: Stickler Patrick (NRC/Tampere)
> Cc: melnik@DB.Stanford.EDU; Jeremy Carroll; w3c-rdfcore-wg@w3.org
> Subject: RE: High-level comments on datatyping draft
> 
> 
> On Tue, 27 Aug 2002 Patrick.Stickler@nokia.com wrote:
> 
> > 
> > 
> > > -----Original Message-----
> > > From: ext Sergey Melnik [mailto:melnik@DB.Stanford.EDU]
> > > Sent: 26 August, 2002 17:33
> > > To: w3c-rdfcore-wg@w3.org; Stickler Patrick (NRC/Tampere)
> > > Subject: High-level comments on datatyping draft
> > > 
> > > 
> > > 
> > > I think in the current draft a more clear separation between 
> > > the abstract
> > > syntax and the RDF/XML syntax is needed (the original RDF 
> > > spec suffered
> > > from exactly this problem). The first figure in Sec. 3.1 
> illustrates
> > > nicely what datatypes are about in the abstract syntax - they are
> > > first-class citizens in graphs. This point can be made earlier.
> > 
> > OK. Perhaps it would be good to have an initial "Datatyping 
> in a Nutshell"
> > section that provides this up-front, and then explains it 
> in detail in
> > the remainder of the spec. A kind of "Quick Glimpse for Gurus" so to
> > speak.
> > 
> > Or perhaps this would be something the Primer could provide?
> 
> In fact, I'd rather first introduce datatyping in the abstract syntax
> (backed by examples) and then explain the impact on the 
> concrete syntaxes,
> separately. In terms of the abstract syntax, datatyping is a 
> quite general
> and extensible mechanism and this need to be made clear.

Well, most users approach RDF from the XML syntax, so I think
it is good to introduce both concrete and abstract syntax together,
which also makes their relationship explicit and hopefully clearer.

I agree, though, that the abstract model is quite general and
extensible.

> > > I'm also uneasy about nailing down the internal structure 
> of datatype
> > > values as a 4-tuple or the like. I think literals should be 
> > > opaque wrt RDF
> > > abstract syntax. This makes the data model simple and appealing.
> > 
> > Well, the string-xmlbit-lang portions can be opaque, but 
> the datatype
> > portion (either URIref or systemID) needs to be visible to the MT.
> 
> Why is this? What the MT cares about is the equality of 
> literals, nothing
> more, right? That is, the only place where it is critical to 
> know whether
> two literals are identical is when parsing a concrete syntax and
> generating an RDF graph for it. I think the equivalence rules of
> language-tagged literals, which Jeremy summarized, could kick it here,
> right? In case I'm missing something essential, what exactly are the
> places in the MT where we care about the internal structure 
> of literals?

Ummmm. It depends on how we define literals. If literals are 3-tuples
of string, XML bit, and lang, and typed literals are pairs of datatypes
and literals, then the literals can remain fully opaque to the datatyping
model theory. Sure.

> > I originally defined literals as 3-tuples (per the Bristol f2f) and
> > then typed literals as structured objects consisting of datatype 
> > name (explicit or implicit) and literal, where the literal structure
> > would be opaque to the MT but the typed literal structure would not.
> > 
> > The reason I went with a 4-tuple is that the datatype denotation is
> > required, even if implicit as a systemID.
> > 
> > I'm quite open as to how the typed literals themselves are modelled,
> > so long as the MT works correctly.
> 
> So am I. I'm just wondering whether we can simplify things 
> even further.

I'm all for simplification. Stay tuned for a simplified proposal...

> > What do you recommend?
> 
> Checking with Pat and Guha whether the internal structure is 
> needed for
> the MT...

The internal structure insofar as string-XMLbit-lang is not
needed to be visible to the MT. But the datatype is. 

Again, it all boils down to what we call a "literal" and 
whether literals include the (possibly implicit) datatype
or whether a typed literal is something different from
and contains a literal, along with a datatype. In the latter
case, then literals can remain opaque and typed literals must
be transparent to the datatyping MT.

> > > In addition, in implementations literals might be mapped 
> directly to
> > > native types and not have any structure whatsoever. Insisting 
> > > that they do
> > > would make implementations more complex and less efficient.
> > 
> > Well, we've already agreed that literals have structure, the
> > string, xmlbit, and lang -- even if the latter two are irrelevant
> > to the DT MT and that structure is opaque to RDF in general -- and
> > applications will have to preserve that structure whatever their
> > internal representation. I see no reason why an internal struct
> > couldn't have fields for xmlbit and lang as well as native value
> > representation.
> 
> Language-tagged literals do have structure in the RDF/XML syntax, no
> doubt. The question is whether we have to prescribe how this 
> structure is
> mapped to the abstract syntax. For example, an integer literal in a
> statement (foo, prop, (int)5) may be represented internally as
> 
> class IntegerStatement {
>  Resource s;
>  Resource p;
>  int o;
> }
> 
> It might not matter for implementations what we say about the abstract
> syntax, but I think we should require only what is necessary.
>  
> > > It seems that all that business of rdfs:Datatype, value spaces and
> > > lexical spaces could be eliminated. What we care about 
> are the value
> > > spaces, c'est tout.
> > 
> > I'm not sure I follow you here. We are stuck with lexical 
> representations,
> > and thus mappings from lexical forms to values -- and thus 
> we are stuck
> > with lexical spaces, mappings, etc. and rdfs:Datatype is the vehicle
> > by which we define those entities.
> 
> Oh, in the latest proposal 

Firstly, could we avoid references such as "latest proposal". There
have been several "latest proposals" and some have been rejected
by members of the WG. Just because someone suggested X does not mean
that X has been agreed upon by the WG. Please include the relevant
qualifications and/or links so that we know which "latest proposal"
you are referring to. Thanks.

> we are not stuck to lexical forms 
> like the ones
> specified in the XSD spec. Integers, floats, binary data, etc. are
> directly available in the abstract syntax as constants. 

I do not believe that having native datatypes in the abstract syntax
has been accepted by the WG and therefore this is not reflected in
the current working draft.

> Of 
> course, we need
> the lexical forms in the concrete syntaxes, but in this case 
> the mapping
> from the lexical forms to typed values is simply a part of the parser
> specification; the graph does not need to know about the such 
> mappings or
> lexical forms.

Having the datatype values explicitly represented in the abstract
syntax forces all RDF applications to either support all specified
datatypes natively or use yet another alternate abstract syntax
for datatyped values for which they cannot provide native representations.

RDF cannot have native datatype representations in the abstract graph
and remain portable, extensible, and model neutral, all of which are
characteristics that RDF should embody if it is to satisfy its key
purpose as a vehicle for knowledge interchange.

> > How would you capture datatype semantics otherwise? It already seems
> > to be defined as minimally as possible.
> > 
> > > Finally, I think that global datatyping (Sec. 3.2) is out of 
> > > scope of the
> > > current document.
> > 
> > Eh? Global datatyping has been in scope from the start and has never
> > become out of scope insofar as the desiderada and interests of the
> > WG as a whole (even if one or more proposals have omitted it). It
> > certainly is in scope, and is a key requirement. This also 
> was recently
> > re-inforced by the CC/PP community and was an expectation of the
> > original RDF WG.
> 
> Jeremy commented on the above in another email, I'll follow 
> up there...

I appreciate that there have been recommendations that we adopt 
a core set of machinery for datatyping that for the time being (or
forever) excludes global implicit datatyping, and I'm all for that
(which should become evident shortly) but that still does not mean
that global implicit datatyping is out of scope and no longer of
concern to the WG, and I've not seen any formal decision that such
would be so. A proposal made by a WG member, even with alot of head
nodding from other members in agreement does not constitute a 
decision by the WG. And until such a decision is made, as an editor,
I consider it my responsibility to reflect issues that remain on
the table, until they are officially removed from the table.

Regards,

Patrick

Received on Thursday, 29 August 2002 04:17:24 UTC