Re: language-tagged literal datatypes from Ivan Herman on 2011-09-06 (public-rdf-wg@w3.org from September 2011)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 6 Sep 2011 16:00:06 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <5EFFD8D5-F55B-45C0-8A03-60A80AEE14C1@w3.org>
I had an action last week to produce a WBS form clearly spelling out the various options, so that the group could vote on them.

Seeing this discussion I wonder whether we *do* have the options clear for such a vote. If so, I would appreciate if one of you gave them to me...

Ivan

On Sep 6, 2011, at 15:37 , Richard Cyganiak wrote:

> Andy,
> 
> On 5 Sep 2011, at 22:51, Andy Seaborne wrote:
>> On 19/08/11 14:28, Richard Cyganiak wrote:
>>> On 19 Aug 2011, at 00:11, Pat Hayes wrote:
>>>> Option 2. All literals have a type. rdf:LangString is a special
>>>> datatype whose L2V mapping takes a pair of strings as input and
>>>> returns a language-tagged pair as output. This mapping is the
>>>> identity mapping on pairs <string, tag>, just as xsd:string is the
>>>> identity mapping on single strings. DATATYPE("foo"@en) returns
>>>> rdf:LangString, following the normal rules for datatyping.
>>> 
>>> There's also 2b:
>>> 
>>> All literals have a type. rdf:LangString is a special type, where the
>>> lexical form is <string,langtag> rather than just a string, and it
>>> doesn't have an L2V mapping. The value of an rdf:LangString literal
>>> is the same as the lexical form. DATATYPE("foo"@en) returns
>>> rdf:LangString, following the normal rules.
>>> 
>>> (The advantage of 2b versus 2 is that the L2V mechanism can remain
>>> unchanged. It can remain defined as functions from string to value,
>>> rather than functions from anything to value as required by 2. In 2,
>>> the L2V of rdf:LangString is just the trivial identity mapping
>>> anyways, and resorting to the L2V mapping device just to explain a
>>> no-op mapping is overkill.)
>>> 
>>> (2b also makes it easy to re-write the rdf:PlainLiteral spec into a
>>> spec titled “An L2V mapping for rdf:LangString” that just defines an
>>> L2V mapping that takes "foo@en" to <"foo","en">, while keeping the
>>> current restrictions on use of such lexical forms. So I'd hope it
>>> would be an easier sell to the OWL/RIF WGs.)
>> 
>> Slight problem:
>> 
>> STR(?x) returns the lexical form of a literal. The language string is the conventional extension to SPARQL in current deployments.
>> 
>> If the lexical form is <string,langtag>, then that would be returned. There is also the question of whether you can write
>> 
>> ???^^rdf:LangString
>> 
>> cf. rdf:PlainLiteral.
> 
> Later in the thread I came around to see that it's better to define it differently: "foo"@en has a lexical form "foo" and a language tag "en". This is how the terminology was used in RDF 2004 and there isn't really any reason to change it.
> 
>> A solution is to just say in the syntaxes '''the value of "foo"@en is <foo, en>'''
>> 
>> This leaves L2V alone (it's not used) and answers what happens if you write ???^^rdf:LangString -- it's an ill-defined literal.
> 
> Yes, this is basically what I'm advocating now. rdf:langString would still *have* an L2V, but it wouldn't be *used* to define its value, just like you say above. The L2V is the empty mapping and the lexical space is empty and the value space is <lex,lang> pairs. Since the lexical space is empty, "anything"^^rdf:langString is going to be ill-typed.
> 
> This “vestigial” datatype definition for rdf:langString is just to meet the formal definition of datatypes in RDF. If we don't do this, then all the machinery around datatypes-as-classes in RDF Semantics breaks (or so I'm told).
> 
>> It's also possible to define STR() specifically for language-tagged literals to mean the string part.
> 
> If you say, “STR() returns the lexical form of a literal” then it should be fine.
> 
> Summary of proposal:
> 
> rdf:langString typed literals are completely normal typed literals, except:
> 1. they have a non-empty language tag besides the lexical form
> 2. their lexical space is empty
> 3. their value is not L2V(datatypeIRI)(lexicalForm) but instead a pair <lexicalForm, languageTag>
> 
> Best,
> Richard
> 
> 
>> that still leaves open the question of writing ^^rdf:LangString.
>> 
>> 	Andy
>> 
>> 
>>> 
>>>> option 2: + simplifies literal syntax + removes SPARQL errors +
>>>> theoretically clean -- requires change to the datatyping model
>>> 
>>> option 2b: + simplifies literal syntax + removes SPARQL errors + no
>>> changes to datatyping model -- introduces one exceptional datatype
>>> that works differently from all others
>>> 
>>>> If we say that the L2V mapping takes as input all the syntactic
>>>> 'components' of a literal, rather than forcing these to be all
>>>> inside one string, then we allow such things as literals with
>>>> latitude and longitude denoting positions, complex numbers with
>>>> real and imaginary parts, etc.., without forcing people to invent
>>>> coding tricks (like the trailing '@' in rdf:PlainLiteral) to
>>>> artificially map these into a single string. This might be a
>>>> genuinely useful extension, in other words.
>>> 
>>> Being able to express lat/long pairs and complex numbers in the
>>> abstract syntax isn't really useful if you have no way of writing them down
>>> in a concrete syntax. So you either still need to squish them into a
>>> single string, or extend your RDF syntax of choice with additional
>>> syntactic sugar for expressing that kind of literal.
>>> 
>>>> We can also quietly deprecate rdf:PlainLiteral along with 8-track
>>>> tape players.
>>> 
>>> A major motivation for rdf:PlainLiteral is the desire to
>>> stick <string,langtag> pairs into a single string, so I'm afraid it
>>> won't be quite as easy.
>>> 
>>> Best, Richard
>> 
> 
> 
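
For anyone trying to pin the options down, Richard's three-point summary quoted above could be sketched roughly as follows. This is only an illustration under assumptions, not proposed spec text: the Literal class, the L2V table, the IllTyped exception and the value() helper are all hypothetical names made up for this sketch, and only xsd:string and xsd:integer are given toy L2V mappings. STR() and DATATYPE() are modelled as plain functions on that hypothetical class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class IllTyped(ValueError):
    """Raised when a literal has no value (hypothetical name)."""


def _empty_l2v(lexical_form: str) -> object:
    # The lexical space of rdf:langString is empty, so its L2V mapping
    # is the empty mapping: no string at all is accepted.
    raise IllTyped("the lexical space of rdf:langString is empty")


# Hypothetical L2V table: datatype IRI -> function from string to value.
# Every entry keeps the plain string -> value shape, as in option 2b.
L2V: Dict[str, Callable[[str], object]] = {
    XSD_STRING: lambda s: s,   # identity on strings
    XSD_INTEGER: int,          # toy mapping, ignores XSD lexical details
    RDF_LANGSTRING: _empty_l2v,
}


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str
    language: Optional[str] = None  # non-empty only for rdf:langString


def value(lit: Literal) -> object:
    """Value of a literal under the summarized proposal."""
    if lit.language:
        # Point 1: only rdf:langString literals carry a language tag.
        if lit.datatype != RDF_LANGSTRING:
            raise IllTyped("language tag on a non-langString literal")
        # Point 3: the value is the pair, L2V is not consulted.
        return (lit.lexical_form, lit.language)
    # Everything else goes through the ordinary L2V machinery.
    # Point 2: "anything"^^rdf:langString ends up here and is ill-typed.
    return L2V[lit.datatype](lit.lexical_form)


def STR(lit: Literal) -> str:
    # "STR() returns the lexical form of a literal"
    return lit.lexical_form


def DATATYPE(lit: Literal) -> str:
    return lit.datatype
```

With this sketch, Literal("foo", RDF_LANGSTRING, "en") has the value ("foo", "en"), STR() gives "foo", DATATYPE() gives the rdf:langString IRI, and Literal("anything", RDF_LANGSTRING) raises IllTyped because no lexical form is in the datatype's lexical space.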


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 6 September 2011 14:00:26 UTC