Re: Proposal for abstract syntax representation of inline literals (was Re: weekly call for agenda items) from Patrick Stickler on 2002-09-12 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 12 Sep 2002 16:56:58 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "RDF Core" <w3c-rdfcore-wg@w3.org>, "ext Brian McBride" <bwm@hplb.hpl.hp.com>
Message-ID: <001001c25a64$4365a700$864416ac@NOE.Nokia.com>
Firstly, let me say that I am very sympathetic to Jeremy's
interests in making the abstract syntax as easy as possible
to realize in various RDF applications, and in principle,
am very much in favor of making it as close as possible to
practical concrete representations that might be used.

That said... ;-)

----- Original Message -----
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>; "Patrick Stickler" <patrick.stickler@nokia.com>; "RDF Core" <w3c-rdfcore-wg@w3.org>;
"ext Brian McBride" <bwm@hplb.hpl.hp.com>
Sent: 12 September, 2002 15:50
Subject: RE: Proposal for abstract syntax representation of inline literals (was Re: weekly call for agenda items)


> > I suggest an abstract syntax along the lines of:
> >
> > An RDF Literal Node can be labelled with one of:
> >
> > - an RDF String Literal (as now)
> > - an RDF XML Literal  (as now)
> > - a value from the value space of a datatype.
> >
> > We simply note that an implementation that is unaware of a
> > specific datatype
> > used in an RDF/XML document will need to store the datatype URI + lexical
> > form pair as a fall-back.
> >
> > An RDF Graph contains precisely one triple for each Literal node in the
> > graph.
> >
> > [As I have previously indicated this extreme syntactic untidiness is
> > practically indistinguishable from extreme tidiness, but leaves Pat more
> > room to wriggle.]
>
> Benefits:
>
> 1. Implementations don't need to store the original string, fits with almost
> all plausible implementations of values. (e.g. in a database)

What about round-tripping? Does the application then just choose
any suitable datatype and lexical representation as it likes,
rather than the original pair it recieved? What if the RDF/XML
is being returned to the system it originated from, and a
datatype that is not recognized or supported by the originating
system is used?

An application is free to derive and store a native value for
each typed literal pairing as part of the internal representation
of the typed literal node while still respecting and preserving
the datatype/lexicalform label that exists in the abstract graph.

The abstract graph should exemplify genericity, system neutrality,
and portability. It should not presume anything about system or
platform specific characteristics or innate knowledge about
datatypes. It is a tool for interchange between *any* arbitrary
RDF systems, irregardless of what extra-RDF knowledge those systems
might have about particular datatypes, or particular URI schemes, etc.

True, many applications will grok the XML Schema predefined types,
and probably a number of other commonly used types, and those
applications are certainly free to (and would be expected to)
interpret typed literal nodes according to the actual values
denoted when making comparisons, etc. But that does not mean
that the abstract graph should do anything but reflect the
actual assertions made, and those assertions are made in terms
of a *specific* datatype URIref and a *specific* lexical form,
and just as the XML flag and xml:lang information, that should
not be discarded in the abstract syntax, but preserved and
respected.

> 2. Makes the Closed World Assumption on datatyping. This is more accurate
> than an Open World Assumption.
> When the document author writes
>   <my:datatype>"10"
> they know already whether they mean the string "10", the number 2 (binary),
> or 10 or 16 (hex).

Yes, as will any application that knows about the datatype <my:datatype>.
But knowing that the value is 2 is not necessary to denote the value
unambiguously in the abstract syntax.

> They are not saying that they are going to make up their mind later about
> what the "10" means. <my:datatype> is already defined and they are using
> that definition. Someone else shouldn't come and redefine <my:datatype> so
> that "10" means something else.

How could they? The "owner" of the datatype says what value
the lexical form denotes. Someone else can't come along and change
the L2V mapping.

Using the pairing of datatype URIref and lexical form is just as
"closed world" as using the value as the label of the node.

The "owner" of a URIref says what it denotes, and if they say
that it denotes a particular datatype with particular characteristics,
someone else can't come along and change that (at least not in
any socially acceptable fashion).

>
> I agree with Patrick that it has a weakness to do with implementations that
> do not know a particular datatype have to fallback to something like his
> proposal; which licenses two slightly different behaviours.

But even more, how to applications that interact with e.g. a triples
store that has opted to map as many typed literal pairing labels to
values tell that triples store which datatypes the client application
also groks so that the triples store knows which to re-map back into
pairing labels versus leaving as values?! This introduces a *huge*
interoperability issue, and completely unnecessarily.

Each application should itself decide which typed literal nodes it
is able to interpret as values, and if it likes, store that value
statically internally, but the label of that node is not the value,
but the bi-part name of the value consisting of the datatype URIref
and a lexical form, and it is that bi-part name that it should use
by default when interacting with other RDF applications by generic
means (e.g. RDF/XML).

> However, note
> that XML Schema Datatypes is closed. There are only the predefined basic
> types and (possibly user defined) types derived from them. Hence, at least
> in principle, an RDF implementation could know all possible predefined
> datatypes and the rules for understanding a user definition of a new type.

Actually no. XML Schema does not provide the machinery for actually
defining the L2V mapping for user-defined datatypes which are direct
subtypes of xsd:anySimpleType, therefore it is not possible for an RDF
application to know what some arbitrary datatype might mean simply
based on the XML Schema machinery. Also, RDF datatyping is not restricted
to XML Schema datatypes, and should not be.

--

I believe that having values as labels in the abstact graph will
introduce portability and interoperability problems between
applications as well as misunderstandings and conflict between
application developers insofar as different applications/platforms
are able to natively interpret and represent different sets of
datatyped literals. I also believe it further complicates round-tripping
of RDF/XML which already is problemmatic and has been clearly
identified as an important functionality of RDF systems.

Furthermore, I don't see that having values as labels in the abstract
graph actually provides any substantial benefit to developers,
as they are free to use actual values in their internal representation
anyway, and will anyway have to use bi-part labels for unsupported
datatypes. It introduces a variable abstract graph from application
to application and two variant representations for typed literals
whether the datatype is understood or not.

It seems to me to be far clearer, consistent, portable, and
reliable to label typed literal nodes with generic pairings
than actual values and leave application developers to decide
themselves if/how they wish to intern the actual denoted values
into their system-specific representations.

Cheers,

Patrick
Received on Thursday, 12 September 2002 09:57:02 UTC