Re: abstract syntax representation of inline literals from Sergey Melnik on 2002-09-13 (w3c-rdfcore-wg@w3.org from September 2002)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Fri, 13 Sep 2002 01:33:16 -0700 (PDT)
To: Jeremy Carroll <jjc@hpl.hp.com>
cc: w3c-rdfcore-wg@w3.org, melnik@DB.Stanford.EDU
Message-ID: <Pine.GSO.3.94.1020913003748.18126A-100000@Hake.Stanford.EDU>
Jeremy,

let me offer you another opinion.

I think the distinction between the RDF abstract syntax and the
concrete syntaxes is critical for datatyping. The job of an RDF parser is
to transform an expression in a concrete syntax to an expression in the
abstract syntax, i.e., a graph.

In the abstract syntax, typed literals are opaque constants. The semantics
of these constants (e.g., the numerical order on integers etc.) does not
surface in the RDF MT. The set of these opaque constants is open-ended and
by design includes the XSD primitive types.

Now let's take RDF/XML as one of the concrete syntaxes for RDF. We want
the apps to interoperate on datatypes, so we define a standard way of
encoding the XSD primitive types in RDF/XML. Compliant RDF/XML parsers
have to implement the lexical-to-value mapping, i.e., make sure that
the proper constants are produced in the graph during parsing. In
particular, it is the job of the *parser* to make sure that an xsd:int
"010", xsd:int "10", and xsd:decimal "10" are all mapped to the same
constant in the graph. That is, whatever built-in datatypes we choose now,
this decision only affects the concrete syntaxes.
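
A minimal sketch of such a parser-side lexical-to-value mapping, in Python.
The function name `lexical_to_value`, the use of `Decimal` as the shared
constant, and the simplified lexical patterns are assumptions made for
illustration; they are not taken from any RDF or XSD specification text.

```python
import re
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical patterns (assumption: the real XSD rules have more cases).
_INT = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def lexical_to_value(datatype: str, lexical: str) -> Decimal:
    """Map a (datatype, lexical form) pair to one constant for the graph."""
    if datatype == XSD + "int":
        if not _INT.fullmatch(lexical):
            raise ValueError(f"bad xsd:int lexical form: {lexical!r}")
        value = int(lexical)
        if not -2**31 <= value <= 2**31 - 1:
            raise ValueError(f"out of range for xsd:int: {lexical!r}")
        return Decimal(value)
    if datatype == XSD + "decimal":
        if not _DECIMAL.fullmatch(lexical):
            raise ValueError(f"bad xsd:decimal lexical form: {lexical!r}")
        return Decimal(lexical)
    raise KeyError(f"no built-in mapping for {datatype}")


# The three lexical encodings named above yield one and the same constant.
a = lexical_to_value(XSD + "int", "010")
b = lexical_to_value(XSD + "int", "10")
c = lexical_to_value(XSD + "decimal", "10")
print(a == b == c)
```

The lexical form is consumed during parsing and never stored, so nothing
downstream of the parser can distinguish the three encodings.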

So what about the datatypes we know nothing about yet? Observe
that since we do not provide a mechanism for defining new datatypes,
there is no standard way for RDF applications to interoperate on
datatypes that they have never heard of before. However, within a single
application (which may be as big as enterprise-wide), it is convenient
to be able to use user-defined types, which are not known to other apps.
This is the same situation as with using the different variants of SQL
in Oracle, DB2 etc. For this purpose, we can define "hooks" in the
concrete syntaxes, which support round-tripping of non-standard datatypes.
That is, when the parser encounters such a datatype, it invokes a callback
function that creates a graph node for this unknown lexical encoding. The
serializer can use the same kind of hook.
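
The hook idea can be sketched roughly as follows. All names here
(`OpaqueLiteral`, `make_node`, `serialize_node`, the `unknown_hook`
parameter, and the example.org datatype URI) are hypothetical; this is one
possible shape for such a callback, not a proposed API.

```python
from dataclasses import dataclass
from typing import Callable

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


@dataclass(frozen=True)
class OpaqueLiteral:
    """Graph node for a datatype the parser has no built-in mapping for."""
    datatype: str
    lexical: str


def make_node(datatype: str, lexical: str,
              unknown_hook: Callable[[str, str], object] = OpaqueLiteral):
    """Parser side: built-in mapping if known, otherwise call the hook."""
    if datatype == XSD_INT:
        return int(lexical)  # stand-in for a real lexical-to-value mapping
    return unknown_hook(datatype, lexical)


def serialize_node(node):
    """Serializer side: return a (datatype, lexical form) pair for a node."""
    if isinstance(node, OpaqueLiteral):
        return (node.datatype, node.lexical)  # written back unchanged
    if isinstance(node, int):
        return (XSD_INT, str(node))  # canonical form; leading zeros are gone
    raise TypeError(f"cannot serialize {node!r}")


# A non-standard datatype survives the round trip unchanged.
colour = make_node("http://example.org/types#colour", "red")
print(serialize_node(colour))
```

An application that does understand the non-standard datatype would pass its
own `unknown_hook` and produce a proper value instead of an opaque node.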

An interoperable way of defining new datatypes can only be provided if
the lexical-to-value mappings of datatypes are represented explicitly in
RDF (and are not built into parser specs), e.g., by means of datatype
properties. This is a subject of future work.

Some comments on your options are below, following the above perspective.

Sergey


On Fri, 13 Sep 2002, Jeremy Carroll wrote:

> 
> Well it's half past three in the morning, and I can't sleep, and Patrick's 
> wrong !
> I blame the macchiato that I drank yesterday afternoon.
> 
> Contents:
> A: the choice of lexical form of datatype value is a *comment*
> B: We really have a closed set of datatypes
> C: infinite variability in representation should be syntactic
> D: URI case analogy is specious
> E: round-tripping argument is specious
> 
> A: the choice of lexical form of datatype value is a *comment*
> 
> We all agree that 
> 
> <rdf:Description>
>   <eg:prop>2</eg:prop>
> </rdf:Description>
> 
> and
> 
> <rdf:Description>
> <!-- I like leading zeros. -->
>   <eg:prop>2</eg:prop>
> </rdf:Description>
> 
> are the same.
> 
> There is no, a priori, reason why we should not see the choice of lexical 
> representation of a data value as similarly only a superficial irrelevance, 
> and see graph equality test cases as holding (with say
> <xsd:int>"2" == <xsd:int>"02")

This is right. The parser maps the above two pieces of XML to the same
graph, according to a built-in lexical-to-value mapping.


> B: We really have a closed set of datatypes
> 
> XSD specifies a closed set of 19 base types, all others, including user 
> defined types, derive from these. The only values you can have for a user 
> defined type is one of the base types (or a list thereof: at most another 19 
> new different types). Derived types are subsets of these (at most 38) types.
> If I choose to represent an xsd:decimal "2" as an xsd:int "2" I have only made 
> a comment that can be automatically verified. There is nothing that is worth 
> preserving.

The set of datatypes is open-ended in the abstract syntax. For interoperability,
we have to require apps to support a chosen subset of datatypes (XSD
types), i.e., the interoperable set of datatypes is closed in the concrete
syntaxes (still, we have the parser hooks, and an upcoming datatype
definition language for RDF).


> C: Infinite variability
> 
> In XSD there are an infinite number of ways of writing the number 2.
> Some of these consist of leading and trailing zeros. Others consist of 
> defining a new type that directly or indirectly derives from xsd:decimal.
> In RDF/XML we already have infinite variability in the choice of how to 
> serialize a graph (e.g. whitespace and XML comments).
> However the model theory is finite in style, and is most easily understood by 
> adding triples using closure rules.
> 
> Patrick's position is that the infinite set of representations of 
> <xsd:decimal>"2" all are interpreted as the number 2.

I agree with that. All these lexical representations get mapped to the
same constant in the graph.

> This means that any 
> graph involving one of these will entail an infinite number of other graphs.

I disagree. The lexical forms are not part of the graph, and are not
visible to the MT. No logical rules are necessary (they are all syntactic
and are part of the parser spec).
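
To illustrate why no closure rule is needed under this view, here is a small
sketch in which a graph is a set of triples whose objects are already
values. The `parse` helper and the `eg:` names are assumptions for the
example, and only xsd:decimal objects are handled.

```python
from decimal import Decimal


def parse(statements):
    """Build a graph (a set of triples) from (subject, predicate, lexical)
    tuples, treating every object as an xsd:decimal lexical form."""
    return frozenset((s, p, Decimal(lex)) for s, p, lex in statements)


g1 = parse([("eg:a", "eg:prop", "2")])
g2 = parse([("eg:a", "eg:prop", "02.0")])

# Both documents yield the same single triple, so the graphs are equal and
# their union adds nothing: there is no second graph left to "entail".
print(g1 == g2, len(g1 | g2))
```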


> We would also need a closure rule in the MT of the form:
> 
> If
> aaa ppp <ddd>"lll" .
> is in the graph,
> and
> <ddd>"lll" maps to the same value as <DDD>"LLL" under xsd rules then add
> aaa ppp <DDD>"LLL" .
> to the graph.
> RDF closure would be transformed from a fairly easy computation to a merely 
> theoretical device. Fine for OWL (where we do have to worry about infinity), 
> unnecessary and a mistake for RDF.
> 
> D: URI case analogy is specious
> Patrick said:
> > If we are going to do this, then let's be sure that
> > http://foo.com/blarg and http://FOO.COM/blarg are
> > both mapped to the same URIref node too, eh?
> Nowhere in RDF do we suggest any relationship at all between these two URIs. 
> (Other than they will retrieve the same document - which is implicit in our 
> specs)
> However, any account of datatyping does say that the datatypevalue nodes in 
> the graph are interpreted as the values from the value space in the model 
> theory. Moreover, we know that we use the XSD rules to work out which values.
> Thus, two way entailment between the graphs in the test case is at least 
> implicit in Part I.
> Thus <xsd:int>"2" and <xsd:decimal>"02.0" are much more closely related in our 
> specs than the two URIs above.
> 
> 
> E: round-tripping argument is specious
> We already choose that a lot of things are irrelevant to round tripping (e.g. 
> whitespace, order, xml comments, use of which syntactic rules).
> We are free to define another thing that is not included in round tripping.

Absolutely. We can even round-trip between different concrete syntaxes...

Sergey

> Jeremy
Received on Friday, 13 September 2002 04:35:52 UTC