Re: Proposal for ISSUE-12, string literals

On Fri, May 13, 2011 at 11:00 AM, Richard Cyganiak wrote:

> On 13 May 2011, at 15:33, Alex Hall wrote:
> > It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the
> > core RDF specs and reserve it for exchanging language-tagged literals with
> > systems that don't support that notion.  Having to deal with the extraneous
> > '@' for literals without language tags seems like needless complexity for
> > what should be a simple string manipulation.
> Strong +1. Earlier I tried to work out the changes to the spec that would
> be required to make rdf:PlainLiteral the unified representation of strings,
> and it's a bloody mess and I really don't want to go there. I kept my notes
> on the wiki anyways:
> > If we're going to say that everything has a datatype, I'd prefer to see
> > "foo" get normalized to "foo"^^xsd:string.  But my reasons there are more
> > aesthetic; it just seems wrong to single out that one particular primitive
> > datatype and say that it should not be used.
> >
> > FWIW, my preferred approach would be to:
> > 1. Say that every literal has *either* a datatype *or* a language tag.
> > 2. Say that the datatype of the surface form "foo" is xsd:string.
> This feels weird. Ok, "foo" is of type string, even though the type is
> implicit, I can understand that. But why is it no longer a string if I tag
> it as English? Shouldn't it still have an implicit type of string? So you
> have replaced one weird thing (multiple ways of representing a string) with
> another weird thing (a notion of string datatypes that doesn't make sense).

But this notion of string datatypes follows from the existing semantics.
The value space of both plain literals with no language tag and xsd:string
literals is the set of Unicode character strings.  The value space of
language-tagged literals is the set of <lexical, lang> pairs such that
'lexical' is a Unicode string and 'lang' is a language tag.  The type of a
language-tagged literal cannot be string without changing those semantics.
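
To make the two value spaces concrete, here is a rough Python sketch. The
Literal class and the value_of function are hypothetical names made up for
illustration, not part of any spec or library, and datatypes other than
xsd:string are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal: lexical form plus optional datatype or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


# A value is either a plain Unicode string or a <lexical, lang> pair.
Value = Union[str, Tuple[str, str]]


def value_of(lit: Literal) -> Value:
    """Return the value denoted by a literal, for the string-like cases only."""
    if lit.lang is not None:
        # Language-tagged literal: the value is the pair, not a bare string.
        return (lit.lexical, lit.lang)
    if lit.datatype is None or lit.datatype == XSD_STRING:
        # Untagged plain literal and xsd:string literal share one value space.
        return lit.lexical
    raise NotImplementedError("other datatypes are outside this sketch")


simple = Literal("chat")
typed = Literal("chat", datatype=XSD_STRING)
tagged = Literal("chat", lang="en")
```

Here value_of(simple) and value_of(typed) are the same string, while
value_of(tagged) is a pair and so can never equal either of them.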

Users might expect simple literals to behave more like language-tagged
literals, but they have more in common with xsd:string.

> I think the sensible way would be:
> 1) every literal has *both* a datatype and a (possibly empty) language tag;
> 2) of the built-in datatypes, only xsd:string can have non-empty language
> tags;
> 3) plain literals and rdf:PlainLiterals don't exist;
> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
> 5) "foo"@en in concrete syntaxes is syntactic sugar for
> "foo"^^xsd:string@en.
> This *might* work better than the rdf:PlainLiteral mess when translated
> into spec changes, but raises BC issues, and requires changes to syntax
> specs to add the syntactic sugar, so I prefer the proposal that says
> implementations MAY unify to plain literals, as it doesn't require changes
> to the abstract syntax.

I've played around with that notion myself, but it just seems too difficult
to enforce the "only xsd:string can have a non-empty language tag" part.
I'm interested in hearing other people's response, though.  My instinct is
that it's a drastic enough change to the spec that implementors will cry

The main roadblock that I can see is that a datatype maps a single lexical
string to a value; you'd have to define a special notion of datatyping for
xsd:string which is essentially an identity mapping of <lexical, lang>
pairs.  Otherwise you'd have "chat"^^xsd:string@en and
"chat"^^xsd:string@fr with the same value, which won't fly.

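A small Python sketch of that roadblock, under the assumption that a datatype
is modelled as a function from a lexical string to a value. The names
LEXICAL_TO_VALUE, ordinary_value and special_string_value are hypothetical
and exist only for this illustration.

```python
from typing import Callable, Dict

# Assumption for this sketch: a datatype is a mapping from a single
# lexical string to a value. The language tag is not an input.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "xsd:string": lambda lexical: lexical,        # identity on strings
    "xsd:integer": lambda lexical: int(lexical),
}


def ordinary_value(lexical: str, datatype: str, lang: str = "") -> object:
    """Apply the datatype mapping as usual; the language tag cannot influence it."""
    return LEXICAL_TO_VALUE[datatype](lexical)


def special_string_value(lexical: str, lang: str = "") -> object:
    """Special-cased xsd:string: identity on <lexical, lang> pairs when a tag is present."""
    return (lexical, lang) if lang else lexical


en = ordinary_value("chat", "xsd:string", "en")
fr = ordinary_value("chat", "xsd:string", "fr")
```

With the ordinary mapping, en and fr are the same value. Only the
special-cased function keeps special_string_value("chat", "en") and
special_string_value("chat", "fr") apart, which is exactly the extra
machinery the proposal would need.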

> > As long as the surface forms "foo" and "foo"^^xsd:string get normalized
> > to the same thing (or systems have permission to do such normalization) then
> > I'm happy.
> Good to hear that.
> Best,
> Richard

Received on Friday, 13 May 2011 15:53:21 UTC