Re: Proposal for ISSUE-12, string literals

On Fri, May 13, 2011 at 11:00 AM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 13 May 2011, at 15:33, Alex Hall wrote:
> > It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the
> > core RDF specs and reserve it for exchanging language-tagged literals with
> > systems that don't support that notion.  Having to deal with the extraneous
> > '@' for literals without language tags seems like needless complexity for
> > what should be a simple string manipulation.
>
> Strong +1. Earlier I tried to work out the changes to the spec that would
> be required to make rdf:PlainLiteral the unified representation of strings,
> and it's a bloody mess and I really don't want to go there. I kept my notes
> on the wiki anyways:
> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/SyntacticSugarProposal
>
> > If we're going to say that everything has a datatype, I'd prefer to see
> > "foo" get normalized to "foo"^^xsd:string.  But my reasons there are more
> > aesthetic; it just seems wrong to single out that one particular primitive
> > datatype and say that it should not be used.
> >
>
> > FWIW, my preferred approach would be to:
> > 1. Say that every literal has *either* a datatype *or* a language tag.
> > 2. Say that the datatype of the surface form "foo" is xsd:string.
>
> This feels weird. Ok, "foo" is of type string, even though the type is
> implicit, I can understand that. But why is it no longer a string if I tag
> it as English? Shouldn't it still have an implicit type of string? So you
> have replaced one weird thing (multiple ways of representing a string) with
> another weird thing (a notion of string datatypes that doesn't make sense).
>

But this notion of string datatypes follows from the existing semantics.
The value space of both plain literals with no language tag and xsd:string
literals is the set of Unicode character strings.  The value space of
language-tagged literals is the set of <lexical, lang> pairs such that
'lexical' is a Unicode string and 'lang' is a language tag.  The type of a
language-tagged literal cannot be string without changing those semantics.

Users might expect simple literals to behave more like language-tagged
literals, but they have more in common with xsd:string.
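
A minimal Python sketch of the model I'm proposing, assuming a toy `Literal` class and a `value_of` function (both hypothetical, not any library's API): a literal carries either a datatype or a language tag, the bare surface form "foo" is read as "foo"^^xsd:string, and the value mapping follows the value spaces described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Assumed IRI for xsd:string (the standard XML Schema namespace).
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: a datatype IRI *or* a language tag, never both."""

    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def __post_init__(self) -> None:
        if self.datatype is not None and self.lang is not None:
            raise ValueError("a literal has either a datatype or a language tag")
        if self.datatype is None and self.lang is None:
            # The surface form "foo" is read as "foo"^^xsd:string.
            object.__setattr__(self, "datatype", XSD_STRING)


# A value is either a Unicode string or a <lexical, lang> pair.
Value = Union[str, Tuple[str, str]]


def value_of(lit: Literal) -> Value:
    """Value mapping, covering only the string-like cases discussed here."""
    if lit.lang is not None:
        # Language-tagged literal: the value is the <lexical, lang> pair.
        return (lit.lexical, lit.lang)
    if lit.datatype == XSD_STRING:
        # xsd:string maps a lexical form to the identical Unicode string.
        return lit.lexical
    raise NotImplementedError("only string-like literals are sketched here")


print(Literal("foo") == Literal("foo", datatype=XSD_STRING))  # True
print(value_of(Literal("chat", lang="en")))                    # ('chat', 'en')
```

Because "foo" and "foo"^^xsd:string collapse to the same object at construction time, the normalization question goes away, while language-tagged literals keep their distinct pair-valued semantics.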


>
> I think the sensible way would be:
> 1) every literal has *both* a datatype and a (possibly empty) language tag;
> 2) of the built-in datatypes, only xsd:string can have non-empty language
> tags;
> 3) plain literals and rdf:PlainLiterals don't exist;
> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string;
> 5) "foo"@en in concrete syntaxes is syntactic sugar for
> "foo"^^xsd:string@en.
>
> This *might* work better than the rdf:PlainLiteral mess when translated
> into spec changes, but raises BC issues, and requires changes to syntax
> specs to add the syntactic sugar, so I prefer the proposal that says
> implementations MAY unify to plain literals, as it doesn't require changes
> to the abstract syntax.
>
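
For comparison, a rough Python sketch of Richard's rules 1 through 5, again using a hypothetical `Literal` class and `desugar` helper. It assumes one strict reading of rule 2 (a non-empty tag is rejected on every datatype other than xsd:string, including user-defined ones); rule 3 shows up only as the absence of any separate plain-literal type.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed IRIs from the standard XML Schema namespace.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Rule 1: every literal has both a datatype and a (possibly empty) tag."""

    lexical: str
    datatype: str
    lang: str = ""  # "" stands for "no language tag"

    def __post_init__(self) -> None:
        # Rule 2, read strictly: only xsd:string may carry a non-empty tag.
        if self.lang and self.datatype != XSD_STRING:
            raise ValueError(f"{self.datatype} cannot carry a language tag")


def desugar(lexical: str, datatype: Optional[str] = None, lang: str = "") -> Literal:
    """Rules 4 and 5: "foo" and "foo"@en become xsd:string literals."""
    return Literal(lexical, datatype if datatype is not None else XSD_STRING, lang)


print(desugar("foo") == Literal("foo", XSD_STRING))                   # True
print(desugar("foo", lang="en") == Literal("foo", XSD_STRING, "en"))  # True
```

Checking the constraint inside one constructor is easy; the open question is what the spec and existing implementations would have to say about it.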

I've played around with that notion myself, but it just seems too difficult
to enforce the "only xsd:string can have a non-empty language tag" part.
I'm interested in hearing other people's responses, though.  My instinct is
that it's a drastic enough change to the spec that implementors will cry
foul.

The main roadblock that I can see is that a datatype maps a single lexical
string to a value; you'd have to define a special notion of datatyping for
xsd:string which is essentially an identity mapping of <lexical, lang>
pairs.  Otherwise you'd have "chat"^^xsd:string@en and
"chat"^^xsd:string@fr with the same value, which won't fly.
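
To make the collision concrete, here is a small Python sketch with two hypothetical mapping functions: `lexical_only_value` stands for an ordinary datatype mapping that only sees the lexical form, and `pair_value` stands for the special identity mapping over <lexical, lang> pairs that xsd:string would need.

```python
from typing import Tuple


def lexical_only_value(lexical: str, lang: str) -> str:
    """An ordinary datatype's mapping: the language tag plays no part."""
    return lexical


def pair_value(lexical: str, lang: str) -> Tuple[str, str]:
    """Special-cased xsd:string: identity mapping over <lexical, lang> pairs."""
    return (lexical, lang)


# "chat"^^xsd:string@en vs "chat"^^xsd:string@fr
print(lexical_only_value("chat", "en") == lexical_only_value("chat", "fr"))  # True (collision)
print(pair_value("chat", "en") == pair_value("chat", "fr"))                  # False
```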

-Alex



>
> > As long as the surface forms "foo" and "foo"^^xsd:string get normalized
> > to the same thing (or systems have permission to do such normalization) then
> > I'm happy.
>
> Good to hear that.
>
> Best,
> Richard

Received on Friday, 13 May 2011 15:53:21 UTC