Re: abstract syntax representation of inline literals from Patrick Stickler on 2002-09-13 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 13 Sep 2002 08:00:32 +0300
To: "ext Jeremy Carroll" <jjc@hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <001001c25ae2$7cc96f80$9782720a@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]


----- Original Message -----
From: "ext Jeremy Carroll" <jjc@hpl.hp.com>
To: <w3c-rdfcore-wg@w3.org>
Sent: 13 September, 2002 05:08
Subject: abstract syntax representation of inline literals


>
>
>
> Well it's half past three in the morning, and I can't sleep, and Patrick's
> wrong !
> I blame the macchiato that I drank yesterday afternoon.

So sorry if I drove you to drink (coffee at least ;-)

Maybe my comments below will help you wake up the morning after ;-)


> Contents:
> A: the choice of lexical form of datatype value is a *comment*
> B: We really have a closed set of datatypes
> C: infinite variability in representation should be syntactic
> D: URI case analogy is specious
> E: round-tripping argument is specious
>
> A: the choice of lexical form of datatype value is a *comment*
>
> We all agree that
>
> <rdf:Description>
>   <eg:prop>2</eg:prop>
> </rdf:Description>
>
> and
>
> <rdf:Description>
> <!-- I like leading zeros. -->
>   <eg:prop>2</eg:prop>
> </rdf:Description>
>
> are the same.

Yes. But they are the same because the comments are
not represented in the abstract graph, but live only in the
RDF/XML serialization, so one would not expect to see
any difference reflected in the graph due to the presence
of the comment in the second case.

> There is no, a priori, reason why we should not see the choice of lexical
> representation of a data value as similarly only a superficial irrelevance,

But we have no way to know whether some variation in lexical representation
is superficial or not. RDF would have to grok the datatype in its
entirety to know that.

> and see graph equality test cases as holding (with say
> <xsd:int>"2" == <xsd:int>"02")

We can say that these two representations are synonymous only by
knowing the L2V mapping of xsd:int, and that knowledge is not
internal to RDF.

Let's try another example that may help separate what is clearly
obvious to us, knowing about integers, and what is clear to RDF.

Is <foo:blarg>"xxxyyy" == <foo:blarg>"xxxxyyy"???

After all, they only differ by having one extra 'x' in the
second case. How could that really matter? Well, we really
don't know, because we don't know what the L2V mapping for
foo:blarg is, and whether that extra 'x' simply produces
a lexical synonym of the shorter lexical form, or results
in a different mapping.

So, because the semantics of particular datatypes are not
known to RDF (and I assert strongly should *never* be known,
just as URI schemes are not known by RDF) all the abstract
graph can capture is the language in which typed literals
(datatype values) are expressed, and that language employs
specific datatype URIrefs and specific lexical forms, and
RDF has no knowledge whatsoever to interpret these, only
to present them to applications in a consistent and clear
manner.

If an application that groks the datatype wishes to employ
some mechanism of node equality or node merging based on
the identical value interpretation of the two different
lexical form representations, fine, but that's the domain
of the specific application and not the abstract graph.
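The split between what the graph sees and what a datatype-aware application sees can be sketched as follows. This is only an illustration: the `TypedLiteral` class, the `L2V` registry, and the full URI chosen for foo:blarg are assumptions made for the example, and Python's `int` and `Decimal` parsers are loose stand-ins for the real xsd:int and xsd:decimal mappings.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteral:
    """A graph node label: datatype URIref plus lexical form, both opaque."""
    datatype: str
    lexical: str


# Application-side knowledge (hypothetical registry): L2V mappings for the
# datatypes this particular application happens to grok.
L2V: Dict[str, Callable[[str], object]] = {
    XSD + "int": int,          # stand-in, not the exact xsd:int lexical space
    XSD + "decimal": Decimal,  # stand-in, not the exact xsd:decimal lexical space
}


def same_value(a: TypedLiteral, b: TypedLiteral) -> Optional[bool]:
    """Compare by value; return None when either datatype is unknown."""
    if a.datatype not in L2V or b.datatype not in L2V:
        return None
    return L2V[a.datatype](a.lexical) == L2V[b.datatype](b.lexical)


two = TypedLiteral(XSD + "int", "2")
zero_two = TypedLiteral(XSD + "int", "02")
blarg_a = TypedLiteral("http://example.org/foo#blarg", "xxxyyy")
blarg_b = TypedLiteral("http://example.org/foo#blarg", "xxxxyyy")

print(two == zero_two)               # graph-level comparison of the labels
print(same_value(two, zero_two))     # application-level comparison by value
print(same_value(blarg_a, blarg_b))  # no L2V mapping registered for foo:blarg
```

The `==` on `TypedLiteral` compares only the datatype URIref and the lexical form, which is all the abstract graph has to go on. Anything stronger lives in `same_value`, outside the graph.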

>
> B: We really have a closed set of datatypes
>
> XSD specifies a closed set of 19 base types, all others, including user
> defined types, derive from these. The only values you can have for a user
> defined type is one of the base types (or a list thereof: at most another 19
> new different types). Derived types are subsets of these (at most 38) types.
> If I choose to represent an xsd:decimal "2" as an xsd:int "2" I have only made
> a comment that can be automatically verified. There is nothing that is worth
> preserving.

Hmmmm. OK, re-reading the XSD spec (section 2) we find

[Definition:]   There exists a conceptual datatype, whose name is
anySimpleType, that is the simple version of the ur-type definition
from [XML Schema Part 1: Structures]. anySimpleType can be considered
as the ·base type· of all ·primitive· types. The ·value space· of
anySimpleType can be considered to be the ·union· of the ·value space·s
of all ·primitive· datatypes.

It is this restriction that every user-defined type must also have
a value space that is subsumed by this union of the value spaces
of all primitive types that motivates RDF datatyping to be open, and
able to work with any conformant datatype, XSD or otherwise. Otherwise,
RDF would cease to be extensible and able to address future needs
as yet unknown to us, the XML Schema WG, etc.

RDF datatyping is not bound to XML Schema datatypes, but supports
any datatypes which exhibit the required characteristics (lexical
space, value space, and N:1 mapping from lexical to value space,
where N >= 1). So even if the set of XML Schema datatypes was
closed, the set of possible RDF compatible datatypes is infinite.
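
As a rough sketch of what "any conformant datatype" could look like outside XML Schema, the snippet below models a datatype as a URI, a lexical-space membership test, and a lexical-to-value mapping. The `Datatype` class and the `yesno` datatype (with its example.org URI) are invented for illustration and are not taken from any spec.

```python
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model of a datatype: URI, lexical space, L2V mapping."""
    uri: str
    in_lexical_space: Callable[[str], bool]
    l2v: Callable[[str], Hashable]


_YES = frozenset({"yes", "y", "true"})
_NO = frozenset({"no", "n", "false"})

# A made-up, non-XSD datatype: three lexical forms map to each value (N:1).
yesno = Datatype(
    uri="http://example.org/types#yesno",
    in_lexical_space=lambda s: s in _YES or s in _NO,
    l2v=lambda s: s in _YES,
)


def value_of(dt: Datatype, lexical: str) -> Hashable:
    """Apply the datatype's own L2V mapping; reject forms outside its lexical space."""
    if not dt.in_lexical_space(lexical):
        raise ValueError(f"{lexical!r} is not in the lexical space of {dt.uri}")
    return dt.l2v(lexical)


print(value_of(yesno, "y") == value_of(yesno, "true"))
```

Nothing in `value_of` depends on the datatype being derived from an XSD primitive. Only the three required characteristics are used.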

Secondly, the WG has not specified that the RDF MT will include
the full semantics of all primitive XML Schema datatypes. Thus,
even if the set of datatypes is closed, RDF itself knows nothing
whatsoever about what xsd:int means and that it is a subtype of
xsd:integer, etc. And thus, the RDF MT cannot equate synonymous
datatype+lexicalform names by value. The actual values are unknown,
and unknowable, to RDF without that full knowledge of the datatypes.

> C: Infinite variability
>
> In XSD there are an infinite number of ways of writing the number 2.
> Some of these consist of leading and trailing zeros. Others consist of
> defining a new type that directly or indirectly derives from xsd:decimal.
> In RDF/XML we already have infinite variability in the choice of how to
> serialize a graph (e.g. whitespace and XML comments).
> However the model theory is finite in style, and is most easily understood by
> adding triples using closure rules.
>
> Patrick's position is that the infinite set of representations of
> <xsd:decimal>"2" all are interpreted as the number 2. This means that any
> graph involving one of these will entail an infinite number of other graphs.
> We would also need a closure rule in the MT of the form:
>
> If
> aaa ppp <ddd>"lll" .
> is in the graph,
> and
> <ddd>"lll" maps to the same value as <DDD>"LLL" under xsd rules then add
> aaa ppp <DDD>"LLL" .
> to the graph.

No. You wouldn't necessarily need to expand the graph to include
the potentially infinite set of possible synonymous expressions.

And furthermore, you couldn't even write such a closure rule as part
of the RDF MT, because the RDF MT -- having no knowledge of which
values the datatype+lexicalform names denote -- would not be able
to determine such mapping equality.

Thus, any such closure rule would depend on some other explicit
assertion about the equality mapping for two typed literals, and
indirectly at that since literals can't be subjects:

   IF
   aaa ppp <ddd>"lll" .
   aaa ppp <DDD>"LLL" .
   ppp rdf:type daml:uniqueProperty .
   bbb qqq <ddd>"lll" .
   THEN
   bbb qqq <DDD>"LLL" .

But I still don't see the utility of such a closure rule. I certainly
don't see that it is necessary for the core RDF MT.
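
For concreteness, the rule above could be sketched like this. It is a single pass over a set of triples, not a full closure computation, and the tuple encoding (strings for URIrefs, `(datatype, lexical)` pairs for typed literals) and the eg: names are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
UNIQUE = "daml:uniqueProperty"


def apply_unique_property_rule(graph):
    """One pass of the rule: return the graph plus the triples it licenses.

    URIrefs are plain strings; typed literals are (datatype, lexical) tuples.
    """
    unique_props = {s for (s, p, o) in graph if p == RDF_TYPE and o == UNIQUE}

    # Pairs of distinct literals given as values of the same unique property
    # on the same subject. The rule treats each pair as interchangeable.
    pairs = set()
    for (s1, p1, o1) in graph:
        if p1 not in unique_props or not isinstance(o1, tuple):
            continue
        for (s2, p2, o2) in graph:
            if s2 == s1 and p2 == p1 and isinstance(o2, tuple) and o2 != o1:
                pairs.add((o1, o2))

    inferred = {(s, p, o2) for (s, p, o) in graph for (o1, o2) in pairs if o == o1}
    return set(graph) | inferred


g = {
    ("eg:a", "eg:p", ("xsd:int", "2")),
    ("eg:a", "eg:p", ("xsd:decimal", "02.0")),
    ("eg:p", RDF_TYPE, UNIQUE),
    ("eg:b", "eg:q", ("xsd:int", "2")),
}
closed = apply_unique_property_rule(g)
print(("eg:b", "eg:q", ("xsd:decimal", "02.0")) in closed)
```

Note that the function never inspects the lexical forms or applies any L2V mapping. The equality of the two literals comes entirely from the explicit assertions in the graph.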

In practice, users will tend to use fairly canonical lexical forms
and fairly common datatypes to denote particular values, and full
interpretation of those denotations (which actual value is denoted)
will happen outside the RDF MT, not within it.

> RDF closure would be transformed from a fairly easy computation to a merely
> theoretical device. Fine for OWL (where we do have to worry about infinity),
> unnecessary and a mistake for RDF.

I'm not going to say anything more about this issue, being woefully
unqualified. Perhaps Pat or others may wish to offer comments about
this.

> D: URI case analogy is specious
> Patrick said:
> > If we are going to do this, then let's be sure that
> > http://foo.com/blarg and http://FOO.COM/blarg are
> > both mapped to the same URIref node too, eh?
> Nowhere in RDF do we suggest any relationship at all between these two URIs.
> (Other than they will retrieve the same document - which is implicit in our
> specs)

And this is my point. RDF *doesn't* state any such relationship.

> However, any account of datatyping does say that the datatypevalue nodes in
> the graph are interpreted as the values from the value space in the model
> theory.

I agree that the MT must state that any datatype+lexicalform labeled
node denotes *some* value, according to the L2V mapping of the datatype,
BUT the MT does not and cannot state *which* value that node denotes.

I.e. the MT will say that <xsd:integer>"10" denotes the member of the
value space of <xsd:integer> to which the lexical form "10" maps
according to the L2V mapping defined for <xsd:integer> but it does NOT
say that <xsd:integer>"10" denotes the value ten. It cannot say that,
because it does not include the semantics of <xsd:integer>.

> Moreover, we know that we use the XSD rules to work out which values.

In the case of the particular XSD datatypes, yes, but it is not RDF
that knows that, it is the application that groks the XSD datatypes.

RDF itself doesn't know its *ss from its elbow insofar as
particular datatypes are concerned. It simply knows that a
typed literal node denotes *some* value of that datatype.
Which value it denotes remains a mystery to the RDF MT.

I suspect this perhaps is a key point in our disagreement. You seem to
be operating on an incorrect assumption about what a member of rdfs:Datatype
is and the degree of knowledge the RDF MT has about it. You seem
to be arguing that the RDF MT has full knowledge about the machinery
and semantics of XML Schema simple datatypes and that all members of
rdfs:Datatype will also be descendants of xsd:anySimpleType. Both of
these assumptions are incorrect, insofar as the new stake-in-the-ground
defined by Part 1 is concerned. In fact, Part 1 is rather pedantic about this point.

> Thus, two way entailment between the graphs in the test case is at least
> implicit in Part I.
> Thus <xsd:int>"2" and <xsd:decimal>"02.0" are much more closely related in our
> specs than the two URIs above.

But the RDF MT cannot know that any more than it can know about
the relationship between the two URIs differing in case. Both URI
schemes and datatypes are fully opaque to RDF, and the latter is
very clearly specified in Part 1, and that is what the WG has agreed.
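
The URI side of the analogy can be sketched the same way. The helper names below are hypothetical, and the normalization shown (lowercasing scheme and authority) is just one application-level choice that assumes no case-sensitive userinfo in the authority.

```python
from urllib.parse import urlsplit, urlunsplit


def same_node(uri_a: str, uri_b: str) -> bool:
    """Graph-level test: URIrefs are opaque, so compare character by character."""
    return uri_a == uri_b


def normalize_host_case(uri: str) -> str:
    """Application-level choice: lowercase the scheme and authority only."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


a = "http://foo.com/blarg"
b = "http://FOO.COM/blarg"
print(same_node(a, b))
print(same_node(normalize_host_case(a), normalize_host_case(b)))
```

As with typed literals, whatever equivalence an application wants has to be layered on top. The graph-level test stays a plain string comparison.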

>
> E: round-tripping argument is specious
> We already choose that a lot of things are irrelevant to round tripping (e.g.
> whitespace, order, xml comments, use of which syntactic rules).
> We are free to define another thing that is not included in round tripping.

Well, yes, we are. But...

My argument about round-tripping was mostly concerned with
having the abstract graph capture the original statements made in
the RDF/XML in the original language in which they were expressed.
If the abstract graph discarded the original datatyping terms,
I would consider that a far more significant loss of information than
the exclusion of whitespace, comments, etc. (which, by the way, are
excluded from having semantic significance by the XML specs, not
by RDF).

Regards,

Patrick

PS: I hope this doesn't add to your insomnia... ;-)
Received on Friday, 13 September 2002 01:01:33 UTC