
History: why are lang tags and datatypes disjoint.

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Wed, 18 May 2011 14:19:01 -0700
Message-ID: <4DD437C5.107@topquadrant.com>
To: RDF Working Group WG <public-rdf-wg@w3.org>
I have been tracking the string literals discussion, and not felt a need 
to join in. As far as I can see everyone is doing a great job. However, 
Gavin suggests my input on the historical question may be useful, so 
here goes.

Short version:

We had made mistakes with XMLLiteral design, and when fixing them some 
WG participants changed the literal design as well.

Five different designs were considered in
and OPTION 4 (the current design) was the clear winner (although not my 
favorite - I don't remember what was - and I have not found the archived 
message yet!)

Full version:
In early versions of RDF Concepts, lang tags and datatypes were not 
disjoint. The first 'last call'* text was:

6.5 RDF Literals

A literal in an RDF graph contains three components called:

     * The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
     * The language identifier as defined by [RFC-3066], normalized to lowercase.
     * The datatype URI being an RDF URI reference.

The lexical form is present in all RDF literals; the language identifier 
and the datatype URI may be absent from an RDF literal.

A plain literal is one in which the datatype URI is absent.

A typed literal is one in which the datatype URI is present.
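
As a rough illustration only (not part of the last call text), the three-component model quoted above could be sketched in Python as below. The class name EarlyLiteral, its field names, and the XSD_STRING constant are hypothetical choices made for this sketch. The point it shows is that nothing in this early design stops a literal from carrying both a language identifier and a datatype URI.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant used only to have a datatype URI to hand.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class EarlyLiteral:
    """Sketch of the first 'last call' literal: three components, two optional."""
    lexical_form: str
    language: Optional[str] = None   # language identifier, may be absent
    datatype: Optional[str] = None   # datatype URI, may be absent

    @property
    def is_plain(self) -> bool:
        # A plain literal is one in which the datatype URI is absent.
        return self.datatype is None

    @property
    def is_typed(self) -> bool:
        # A typed literal is one in which the datatype URI is present.
        return self.datatype is not None


# In this early model a literal may have a language AND a datatype.
both = EarlyLiteral("chat", language="fr", datatype=XSD_STRING)
plain = EarlyLiteral("chat", language="fr")
```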

This generated negative feedback, particularly:

specification of literals is goofy... "A literal in an RDF graph
   contains three components called: ...
   The datatype URI being an RDF URI reference. ...
   A plain literal is one in which the datatype URI is absent."
Hello? you just told me every literal has one.

Specify that the datatype URI and language identifier
are optional.

and the resulting new text to address this comment was:

6.5 RDF Literals

A literal in an RDF graph contains one or two named components.

All literals have a lexical form being a Unicode [UNICODE] string in 
Normal Form C [NFC].

Plain literals have a lexical form and optionally a language tag as 
defined by [RFC-3066], normalized to lowercase.

Typed literals have a lexical form and a datatype URI being an RDF URI reference.
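
For comparison, here is a minimal Python sketch of the revised model, in which a literal has either a language tag or a datatype URI but never both. The class name Literal, the field names, and the XSD_STRING constant are hypothetical, and rejecting the combination with a ValueError is just one way to express the constraint, not something the spec text prescribes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant used only to have a datatype URI to hand.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Sketch of the revised literal: one or two named components."""
    lexical_form: str
    language: Optional[str] = None   # only allowed on plain literals
    datatype: Optional[str] = None   # presence makes the literal typed

    def __post_init__(self) -> None:
        # Language tag and datatype URI are disjoint in this design.
        if self.language is not None and self.datatype is not None:
            raise ValueError("literal may have a language tag or a datatype, not both")
        # Language tags are normalized to lowercase.
        if self.language is not None:
            object.__setattr__(self, "language", self.language.lower())

    @property
    def is_typed(self) -> bool:
        return self.datatype is not None


plain = Literal("chat", language="FR")
typed = Literal("42", datatype=XSD_STRING)
```

Constructing Literal("chat", language="fr", datatype=XSD_STRING) raises ValueError in this sketch, which is the disjointness the thread is about.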

Hmmmm. That is the formal trail, and it does not reveal why the design 
changed. Dan Connolly's comment was largely editorial, yet it resulted 
in a substantive change.

The formal response to DanC, from RDF Core (signed by me) was

Drilling back in the archive, this issue was discussed along with rather 
more significant stuff to do with problems with XMLLiteral,
there seem to have been four different designs from me for these various 

Option 4 was chosen:


>  Option 4:
>  Language tag is simply dropped from all typed literals including
>  rdf:XMLLiteral

   Concepts is changed to say that a literal can have either a datatype or a
language tag and not both.
   rdf:XMLLiteral datatype is changed to have the identity as its lexical
value mapping (no wrapping), with consequential change to the value space of

   Other editors to make consequential changes


The minute on this, written by myself - again! - was - I note that I 
abstained on the issue.
PatrickS is Patrick Stickler from Nokia; ILRT was Dave Beckett and Jan Grant 
and maybe Dan Brickley. Somewhat misminuted in that I used member (ILRT) in 
one instance, and participant (PatrickS and Jeremy) in another.


12: Language tags in typed literals

The discussion meandered somewhat.

We noticed that the issue list is out of date.
ACTION: bwm fix issue list resolution for rdfms-literal-is-xml-structure

This msg contains the four options considered.

Of which option 4, which is in:
was the favourite.

There was discussion of whether option 4 would require review
from I18N-WG and XML Core.

The following text was quoted from the exclusive XML
  Canonicalization recommendation:

"attributes in the XML namespace, such as xml:lang and xml:space
are not imported into orphan nodes"

PatrickS proposes option 4 from msg 0086.
ILRT seconds. Jeremy abstaining.

RESOLVED: Typed literals option 4 from msg 0086

ACTION: jjc Make typed literal changes in concepts.
ACTION: jjc Review concepts to make consequential changes concerning typed literals
ACTION: path Review semantics to make changes concerning typed literals
ACTION: jjc Provide anchor for rdf:XMLLiteral to Pat Hayes
ACTION: daveb Change Ntriples to remove language from typed literals
ACTION: daveb Review syntax to make changes concerning typed literals
ACTION: jang Review all tests to make changes concerning typed literals
ACTION: em Review primer to make changes concerning typed literals
ACTION: bwm Review issue list and update those affected.
ACTION: jjc Inform reagle-0[12] raisers of typed literals decision
ACTION: path Tell pfps of change to literals decision
ACTION: jjc Inform I18N-WG of literals decision.
ACTION: bwm Update status of pfps-08 if necessary

The IRC log for this agendum is found here:

I think the most important point is:

15:11:18 [jjcscribe]
    many people propose/second option 4 ...

Summarized by me in e-mail as:

So far I am the only one to have spoken against 4 - if there are no others
who join me in that position in the telecon I am currently expecting option
4 to win.

Option 4 makes XMLLiteral ignore language.

Ahhh - here is a key link: "the ugly parade"


* For those new to W3C, the beginning of the end of the Recommendation 
track is a publication called 'last call', but RDF, like many recs, had 
more than one, hence the somewhat odd construct of "first 'last call'".

PS Gavin is unavailable for the next few telecons, and I will 
participate in his place.
Received on Wednesday, 18 May 2011 21:19:36 GMT
