Re: abstract syntax representation of inline literals from Patrick Stickler on 2002-09-13 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 13 Sep 2002 13:37:16 +0300
To: "ext Sergey Melnik" <melnik@db.stanford.edu>, "Jeremy Carroll" <jjc@hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>, <melnik@db.stanford.edu>
Message-ID: <002d01c25b11$87b9d630$864416ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Sergey Melnik" <melnik@db.stanford.edu>
To: "Jeremy Carroll" <jjc@hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>; <melnik@db.stanford.edu>
Sent: 13 September, 2002 11:33
Subject: Re: abstract syntax representation of inline literals


> 
> Jeremy,
> 
> let me offer you another opinion.
> 
> I think the distinction between the RDF abstract syntax and the
> concrete syntaxes is critical for datatyping. The job of an RDF parser is
> to transform an expression in a concrete syntax to an expression in the
> abstract syntax, i.e., a graph.
> 
> In the abstract syntax, typed literals are opaque constants. The semantics
> of these constants (e.g., the numerical order on integers etc.) does not
> surface in the RDF MT. The set of these opaque constants is open-ended and
> by design includes the XSD primitive types.
> 
> Now let's take RDF/XML as one of the concrete syntaxes for RDF. We want
> the apps to interoperate on datatypes, so we define a standard way of
> encoding the XSD primitive types in RDF/XML. The compliant RDF/XML parsers
> have to implement the lexical-to-value mapping, i.e., make sure that
> the proper constants are produced in the graph during parsing. 

Sigh.

Is it so unclear that the datatype+lexicalform name *is* a globally
unambiguous "constant" denoting the value?

There is no need whatsoever for a parser to have any clue about the
lexical to value mapping of any datatype.

Just as URIrefs are opaque names that are carried from the RDF/XML
to the abstract graph unchanged, so too are datatype+lexicalform names
opaque and should be carried from the RDF/XML to the abstract graph
unchanged.

> In
> particular, it is the job of the *parser* to make sure that an xsd:int
> "010", xsd:int "10", and xsd:decimal "10" are all mapped to the same
> constant in the graph. 

Why? What does the parser care about synonymous names? The parser
doesn't care about synonymous URIrefs. Why should it care about
synonymous typed literals?
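To make the disagreement concrete, here is a minimal Python sketch of the value-mapping parser Sergey describes. The parser applies a built-in lexical-to-value mapping, so xsd:int "010", xsd:int "10" and xsd:decimal "10" all yield the same graph constant. The function name `value_constant` and the use of Python's `Decimal` as the shared value space are illustrative assumptions, not part of any RDF specification, and only xsd:int and xsd:decimal are handled.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

def value_constant(datatype: str, lexical: str) -> Decimal:
    """Hypothetical value-mapping parser step: turn a typed literal into a
    value constant, so synonymous lexical forms collapse to one graph node.
    Note: Python's int()/Decimal() accept some strings (e.g. '1_0', '1e3')
    outside the XSD lexical spaces, so this is not a validator."""
    if datatype == XSD + "int":
        # int() with the default base 10 accepts leading zeros, e.g. "010" -> 10
        return Decimal(int(lexical.strip()))
    if datatype == XSD + "decimal":
        return Decimal(lexical.strip())
    # Unknown datatypes would need one of the proposed parser "hooks".
    raise ValueError("no built-in mapping for " + datatype)

a = value_constant(XSD + "int", "010")
b = value_constant(XSD + "int", "10")
c = value_constant(XSD + "decimal", "10")
print(a == b == c)  # True: one constant in the graph
```

In this design the parser, not the application, decides which names are synonyms.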

> That is, whatever built-in datatypes we choose now,
> this decision only affects the concrete syntaxes.

We are not (or at least should not be) choosing *any* built-in
datatypes for RDF!

If there is a proposal to choose specific built-in datatypes
for RDF, then let's have it put officially on the table for a
vote.

> So what about the datatypes we know nothing about yet? Observe
> that since we do not provide a mechanism for defining new datatypes,
> there is no standard way for RDF applications to interoperate on
> datatypes that they have never heard of before. However, within a single
> application (which may be as big as enterprise-wide), it is convenient
> to be able to use user-defined types, which are not known to other apps.
> This is the same situation as with using the different variants of SQL
> in Oracle, DB2 etc. For this purpose, we can define "hooks" in the
> concrete syntaxes, which support round-tripping of non-standard datatypes.

Sigh. More machinery that is not needed if the original terms of
expression are simply used.

> That is, when the parser encounters such a datatype, it invokes a callback
> function that creates a graph node for this unknown lexical encoding. The
> serializer can use the same kind of hook.

Why not just have the parser map the datatype and lexical form pair
to a datatype+lexicalform node label, which unambiguously denotes
the value in question and which can easily be reserialized using
the exact terms originally used? That provides reliable and consistent
round tripping, and it is done with generic functions, with no need
for hooks and callbacks to datatype-specific functions.

Now, I admit that this would be less attractive than all those callback
functions to software developers that get paid by the total number of 
lines of code...
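In contrast to the value-mapping parser Sergey describes, a minimal sketch of the datatype-neutral alternative argued for here: the parser copies the (datatype, lexical form) pair into the graph as an opaque node label, exactly as it copies URIrefs, and the serializer writes the same pair back out. The `TypedLiteral` class and the `parse_literal`/`serialize_literal` functions are hypothetical names. No datatype-specific code or callbacks are involved.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"

class TypedLiteral(NamedTuple):
    """Opaque node label: the datatype URIref plus the lexical form, kept verbatim."""
    datatype: str
    lexical: str

def parse_literal(datatype: str, lexical: str) -> TypedLiteral:
    # Generic: no knowledge of the datatype's lexical-to-value mapping is needed.
    return TypedLiteral(datatype, lexical)

def serialize_literal(node: TypedLiteral) -> tuple[str, str]:
    # Round-tripping reproduces exactly the terms that were parsed.
    return (node.datatype, node.lexical)

n1 = parse_literal(XSD + "int", "010")
n2 = parse_literal(XSD + "int", "10")
print(n1 == n2)               # False: distinct (though synonymous) names
print(serialize_literal(n1))  # ('http://www.w3.org/2001/XMLSchema#int', '010')
```

In this design, deciding whether two such labels denote the same value is left to applications that understand the datatype, just as with synonymous URIrefs.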

> An interoperable way of defining new datatypes can only be provided if  
> the lexical-to-value mappings of datatypes are represented explicitly in
> RDF (and are not built-in in parser specs), e.g., by means of datatype
> properties. This is the subject of future work.

There is no need for any future work whatsoever if the abstract syntax
remains neutral to datatyping schemes and simply preserves the
unambiguous and fully sufficient typed literal names used in the RDF/XML.

> Some comments to your options are below, following the above perspective.
> 
> Sergey
> 
> 
> On Fri, 13 Sep 2002, Jeremy Carroll wrote:
> 
> > 
> > 
> > 
> > Well, it's half past three in the morning, I can't sleep, and Patrick's
> > wrong!
> > I blame the macchiato that I drank yesterday afternoon.
> > 
> > Contents:
> > A: the choice of lexical form of datatype value is a *comment*
> > B: We really have a closed set of datatypes
> > C: infinite variability in representation should be syntactic
> > D: URI case analogy is specious
> > E: round-tripping argument is specious
> > 
> > A: the choice of lexical form of datatype value is a *comment*
> > 
> > We all agree that 
> > 
> > <rdf:Description>
> >   <eg:prop>2</eg:prop>
> > </rdf:Description>
> > 
> > and
> > 
> > <rdf:Description>
> > <!-- I like leading zeros. -->
> >   <eg:prop>2</eg:prop>
> > </rdf:Description>
> > 
> > are the same.
> > 
> > There is no a priori reason why we should not see the choice of lexical
> > representation of a data value as a similarly superficial irrelevance,
> > and see graph equality test cases as holding (with, say,
> > <xsd:int>"2" == <xsd:int>"02").
> 
> This is right. The parser maps the above two pieces of XML to graph
> according to a built-in lexical-to-value mapping.

I strongly disagree. The parser should not have to concern itself
with the semantics of datatypes any more than it should concern
itself with the semantics of URI schemes.

> > B: We really have a closed set of datatypes
> > 
> > XSD specifies a closed set of 19 base types; all others, including
> > user-defined types, derive from these. The only values a user-defined
> > type can have are values of one of the base types (or lists thereof: at
> > most another 19 new types). Derived types are subsets of these (at most
> > 38) types. If I choose to represent an xsd:decimal "2" as an xsd:int "2",
> > I have only made a comment that can be automatically verified. There is
> > nothing that is worth preserving.
> 
> The set of datatypes is open-ended in the abstract syntax. For interoperability,
> we have to require apps to support a chosen subset of datatypes (XSD
> types), i.e., the interoperable set of datatypes is closed in the concrete
> syntaxes (still, we have the parser hooks, and an upcoming datatype
> definition language for RDF).

All unnecessary constraints and unnecessary work if RDF remains datatype neutral.
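Jeremy's claim that re-typing an xsd:decimal "2" as an xsd:int "2" is a comment that can be automatically verified amounts to checking that the decimal value lies in the xsd:int value space. A minimal sketch, assuming values are held as Python `Decimal`s: in XML Schema, xsd:int is derived from xsd:decimal (via xsd:integer and xsd:long) and restricted to whole numbers from -2147483648 to 2147483647. The function name `fits_xsd_int` is hypothetical.

```python
from decimal import Decimal

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # xsd:int minInclusive / maxInclusive

def fits_xsd_int(value: Decimal) -> bool:
    """True if an xsd:decimal value also belongs to the xsd:int value space:
    it must be a whole number within the 32-bit bounds."""
    return value == value.to_integral_value() and INT_MIN <= value <= INT_MAX

print(fits_xsd_int(Decimal("2")))           # True
print(fits_xsd_int(Decimal("02.0")))        # True
print(fits_xsd_int(Decimal("2.5")))         # False
print(fits_xsd_int(Decimal("3000000000")))  # False
```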

> > C: Infinite variability
> > 
> > In XSD there are an infinite number of ways of writing the number 2.
> > Some of these differ only in leading or trailing zeros. Others consist of
> > defining a new type that directly or indirectly derives from xsd:decimal.
> > In RDF/XML we already have infinite variability in the choice of how to 
> > serialize a graph (e.g. whitespace and XML comments).
> > However the model theory is finite in style, and is most easily understood by 
> > adding triples using closure rules.
> > 
> > Patrick's position is that the infinite set of representations of
> > <xsd:decimal>"2" all are interpreted as the number 2.
> 
> I agree with that. All these lexical representations get mapped to the
> same constant in the graph.

Rather, all synonymous lexical representations denote the same value,
but that does not mean that values should be represented explicitly
in the abstract graph.

> > This means that any 
> > graph involving one of these will entail an infinite number of other graphs.
> 
> I disagree. The lexical forms are not part of the graph, and are not
> visible to the MT. No logical rules are necessary (they are all syntactic
> and are part of the parser spec).
> 
> 
> > We would also need a closure rule in the MT of the form:
> > 
> > If
> > aaa ppp <ddd>"lll" .
> > is in the graph,
> > and
> > <ddd>"lll" maps to the same value as <DDD>"LLL" under XSD rules, then add
> > aaa ppp <DDD>"LLL" .
> > to the graph.
> > RDF closure would be transformed from a fairly easy computation to a merely 
> > theoretical device. Fine for OWL (where we do have to worry about infinity), 
> > unnecessary and a mistake for RDF.
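Jeremy's closure rule can be sketched as follows, under two stated assumptions: triples are Python tuples whose object is a (datatype, lexical form) pair, and, because the real set of synonymous literals is infinite, the rule is applied only against a finite, hand-picked pool of candidate literals. The names `xsd_value` and `apply_value_closure` are hypothetical. The point is that the output grows with whatever pool is supplied, which is why closure over every lexical variant could only be a theoretical device.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

def xsd_value(literal):
    """Toy lexical-to-value mapping for xsd:int and xsd:decimal only
    (assumes well-formed lexical forms)."""
    datatype, lexical = literal
    if datatype in (XSD + "int", XSD + "decimal"):
        return Decimal(lexical)
    return None  # unknown datatype: the value-based rule does not apply

def apply_value_closure(graph, candidates):
    """If (aaa, ppp, <ddd>"lll") is in the graph and <DDD>"LLL" maps to the
    same value, add (aaa, ppp, <DDD>"LLL"). Only `candidates` are tried."""
    closed = set(graph)
    for subj, pred, obj in graph:
        value = xsd_value(obj)
        if value is None:
            continue
        for other in candidates:
            if xsd_value(other) == value:
                closed.add((subj, pred, other))
    return closed

graph = {("ex:a", "eg:prop", (XSD + "int", "2"))}
pool = [(XSD + "int", "02"), (XSD + "decimal", "2.0"),
        (XSD + "decimal", "002.000"), (XSD + "int", "3")]
print(len(apply_value_closure(graph, pool)))  # 4: the original triple plus three synonyms
```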
> > 
> > D: URI case analogy is specious
> > Patrick said:
> > > If we are going to do this, then let's be sure that
> > > http://foo.com/blarg and http://FOO.COM/blarg are
> > > both mapped to the same URIref node too, eh?
> > Nowhere in RDF do we suggest any relationship at all between these two URIs. 
> > (Other than that they will retrieve the same document, which is implicit in
> > our specs.)
> > However, any account of datatyping does say that the datatype value nodes in
> > the graph are interpreted as the values from the value space in the model 
> > theory. Moreover, we know that we use the XSD rules to work out which values.
> > Thus, two-way entailment between the graphs in the test case is at least
> > implicit in Part I.
> > Thus <xsd:int>"2" and <xsd:decimal>"02.0" are much more closely related in our 
> > specs than the two URIs above.
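The URI comparison in Patrick's example can be illustrated with Python's standard `urllib.parse`. RDF compares URIrefs character by character, so the two URIs are different nodes; a generic URI normalizer that lower-cases the scheme and host (which URI syntax treats as case-insensitive) would treat them as equivalent. The `normalize_host` helper is a hypothetical sketch of such a normalization, not something any RDF specification requires.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_host(uri: str) -> str:
    """Lower-case the scheme and host. Simplification: netloc.lower() would
    also lower-case any userinfo, which is case-sensitive in general."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

u1, u2 = "http://foo.com/blarg", "http://FOO.COM/blarg"
print(u1 == u2)                                  # False: distinct URIref nodes in RDF
print(normalize_host(u1) == normalize_host(u2))  # True: equivalent after normalization
```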
> > 
> > 
> > E: round-tripping argument is specious
> > We already treat a lot of things as irrelevant to round-tripping (e.g.
> > whitespace, order, XML comments, and which syntactic rules are used).
> > We are free to define another thing that is not included in round tripping.
> 
> Absolutely. We can even round-trip between different concrete syntaxes...

Give me enough time and money, and I'll program you damn near anything...

Patrick
Received on Friday, 13 September 2002 06:37:20 UTC