Typed literals: current status from Jeremy Carroll on 2002-10-18 (w3c-rdfcore-wg@w3.org from October 2002)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Fri, 18 Oct 2002 22:02:23 +0200
To: w3c-rdfcore-wg@w3.org
Message-Id: <200210182202.23978.jjc@hpl.hp.com>
Summary: do nothing; worry what Tim will say next.


I am trying to reflect back to the WG the "advice" I heard at today's telecon.

The values for typed literals become apparent in a layered model theory.

[[ aside - a sketch of the model theory we outlined ...

The abstract graph has typed literals that are tidied on the basis of 
syntactic (string*3) identity.

An RDF(S) interpretation maps each typed literal to some value.

An RDF(S) interpretation conforms to some datatype d if every typed literal 
with datatype d is mapped to its value under that datatype.

Three natural levels of datatyping are:
+ XSD - the only interpretations considered are those that conform to all XSD 
built-in types.
+ none - interpretations are considered without taking datatype conformance 
into account
+ all - interpretations are only considered if they conform with all the 
datatypes that occur in the graph

However, it was noted that these datatyped interpretations are monotonic with 
respect to the set of datatypes conformed with.

i.e.

if 

a entails b with respect to a set D of datatypes

and D' is a superset of D

then

a entails b with respect to D'

]]
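
A toy sketch of the monotonicity point in the aside, in Python. Everything 
here is an assumption made for illustration: the three sample literals, the 
tiny universe of values, the stand-in datatypes "int" and "boolean", and the 
treatment of a and b as predicates over interpretations rather than as 
graphs. It only shows that adding datatypes to D shrinks the set of 
interpretations considered, so an entailment that holds for D still holds 
for a superset D'.

```python
from itertools import product

# Hypothetical stand-ins for datatype mappings (lexical form -> value).
DATATYPES = {
    "int": lambda lexical: int(lexical),
    "boolean": lambda lexical: {"true": "TRUE", "false": "FALSE"}[lexical],
}

# Typed literals occurring in the toy graph: (lexical form, datatype name).
LITERALS = [("1", "int"), ("01", "int"), ("true", "boolean")]

# A deliberately tiny universe of values.
UNIVERSE = [1, "TRUE", "other"]


def interpretations():
    """Yield every mapping from the typed literals to the universe."""
    for values in product(UNIVERSE, repeat=len(LITERALS)):
        yield dict(zip(LITERALS, values))


def conforms(interp, datatype):
    """True if every literal of this datatype is mapped to its datatype value."""
    return all(
        value == DATATYPES[datatype](lexical)
        for (lexical, dt), value in interp.items()
        if dt == datatype
    )


def entails(a, b, D):
    """a entails b w.r.t. D: every D-conforming interpretation
    that satisfies a also satisfies b."""
    return all(
        b(i)
        for i in interpretations()
        if all(conforms(i, d) for d in D) and a(i)
    )


def a(interp):
    return True  # stands for the empty graph: every interpretation satisfies it


def b(interp):
    # "1"^^int and "01"^^int denote the same value
    return interp[("1", "int")] == interp[("01", "int")]


for D in (set(), {"int"}, {"int", "boolean"}):
    print(sorted(D), entails(a, b, D))
```

With no datatypes (the "none" level) the entailment fails, because some 
interpretation maps the two literals to different values. Once "int" is in D 
it holds, and it keeps holding when "boolean" is added.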

Given that sort of approach to the semantics, it is helpful for the abstract 
syntax to identify how a datatype is applied to a typed literal, but it 
should be clear that such an application is not a syntactic requirement.

Surprisingly, that is precisely the text I had in my back pocket!

See:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0218.html

This took the "majority" position - a typed literal is a triple. It defines 
how a datatype URI might be related to a datatype, and how a typed literal 
might be related to a value, but then defines equality (explicitly) ignoring 
those definitions.

[[[
Within an RDF graph, a typed literal is a triple: 

+ An RDF URI reference (the datatype URI).
+ A Unicode [UNICODE] string (the lexical form).
+ A language identifier.

The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs 
such as <http://www.w3.org/2001/XMLSchema#int> are used. There may be other, 
implementation dependent, mechanisms by which URIs refer to datatypes. 

The typed value associated with the typed literal is found by applying the 
datatype mapping associated with the datatype URI to the lexical form. This 
mapping fails if the lexical form is not in the lexical space of the datatype 
associated with the datatype URI. 

However, the abstract syntax does not presuppose such datatype specific 
processing. 

Two typed literals are equal if and only if all of the following hold: 

+ The two datatype URIs compare equal, character by character.
+ The two lexical forms compare equal, character by character.
+ The two language identifiers compare equal (case insensitive comparison).
]]]
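
To make the split between equality and value concrete, here is a minimal 
Python sketch of the text above. The names (TypedLiteral, typed_value, 
DATATYPE_MAPPINGS, parse_xsd_int) are made up for illustration, and the 
xsd:int check is a simplified stand-in for a real XML Schema processor. 
Equality looks only at the three components; the value is computed 
separately and may fail.

```python
import re
from dataclasses import dataclass

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


@dataclass(frozen=True, eq=False)
class TypedLiteral:
    datatype_uri: str
    lexical_form: str
    language: str

    def __eq__(self, other):
        # Purely syntactic: no datatype-specific processing is involved.
        if not isinstance(other, TypedLiteral):
            return NotImplemented
        return (
            self.datatype_uri == other.datatype_uri
            and self.lexical_form == other.lexical_form
            and self.language.lower() == other.language.lower()
        )

    def __hash__(self):
        return hash((self.datatype_uri, self.lexical_form, self.language.lower()))


def parse_xsd_int(lexical_form):
    """Simplified xsd:int mapping: optional sign, digits, 32-bit range."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical_form):
        raise ValueError(f"{lexical_form!r} is not in the lexical space")
    value = int(lexical_form)
    if not -2147483648 <= value <= 2147483647:
        raise ValueError(f"{lexical_form!r} is out of range for xsd:int")
    return value


# Assumption: one possible mechanism by which URIs refer to datatypes.
DATATYPE_MAPPINGS = {XSD_INT: parse_xsd_int}


def typed_value(literal):
    """Apply the datatype mapping for the datatype URI to the lexical form."""
    mapping = DATATYPE_MAPPINGS.get(literal.datatype_uri)
    if mapping is None:
        raise LookupError(f"no datatype known for {literal.datatype_uri}")
    return mapping(literal.lexical_form)


one = TypedLiteral(XSD_INT, "1", "en")
one_upper = TypedLiteral(XSD_INT, "1", "EN")
zero_one = TypedLiteral(XSD_INT, "01", "en")

print(one == one_upper)   # language identifiers differ only in case
print(one == zero_one)    # lexical forms differ
print(typed_value(one) == typed_value(zero_one))
```

Here "1" and "01" are different typed literals under the equality rule even 
though a datatype-aware layer gives them the same value.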

I am inclined to leave it alone.

Two issues are:
+ the dangling langtag is somewhat ugly (but as Pat pointed out that isn't 
much of an argument against Patrick's practical use cases)
+ should we try to have a more unified approach to literals?

I note in particular TBL's comments on XML Literal ...

http://www.w3.org/2002/07/29-rdfcadm-tbl.html

[[[
I have to say I have a problem with RDF being tied to always have to have an 
XML literal as a base type. This breaks layering - and level breaking 
features should I believe be left for another layer. You should not require 
any RDF machine to have to include an XML infoset system. The choice of XML 
syntax was supposed to be an enginering but arbitrary choice. 
]]]

Given the deployed code using parseType="Literal" and the I18N use cases such 
as ruby and bidi, it's a non-starter to try and remove this functionality. But 
if we had two new types, rdf:ClassicLiteral and rdf:ClassicXMLLiteral, then we 
could move all the complexity of XML Literals into a datatype definition.

This would address TBL's issue here, in that RDF, as an abstraction, would be 
free from the XML base.

Disadvantages are:
+ defining a datatype outside XSD, not a team play
+ both these datatypes map from the (lexical form, langtag) pair to a value, 
rather than the XSD convention of mapping from the lexical form (alone) to a 
value; this would involve knock-on effects in our docs.
+ having to add new terms to the namespace, agree the terms, agree where to 
put the definition etc. 
+ a peculiar equivalence where 
  rdf:parseType="Literal" and
  rdf:datatype="&rdf;ClassicXMLLiteral" 
  are sort of synonymous

Advantages are:
+ a unified framework for literals
+ possibly keeping TBL onside
+ the treatment of the langtag is more coherent with the decision to keep it 
in the abstract syntax
+ might allow further enhanced (non-XSD) datatypes that do good things with 
the lang tag.

I guess if there was some pull from the WG in this direction, I would be 
inclined to add a note to the doc:

[[[
Note: the WG is still considering whether to unify the treatment of literals. 
This would involve regarding all literals as typed literals, and would use 
two new datatypes (rdf:ClassicLiteral, rdf:ClassicXMLLiteral) to correspond 
to the old String Literal and XML Literal respectively.
]]]

Jeremy
Received on Friday, 18 October 2002 16:04:05 UTC