Re: language-tagged literal datatypes

Andy,

On 5 Sep 2011, at 22:51, Andy Seaborne wrote:
> On 19/08/11 14:28, Richard Cyganiak wrote:
>> On 19 Aug 2011, at 00:11, Pat Hayes wrote:
>>> Option 2. All literals have a type. rdf:LangString is a special
>>> datatype whose L2V mapping takes a pair of strings as input and
>>> returns a language-tagged pair as output. This mapping is the
>>> identity mapping on pairs <string, tag>, just as xsd:string is the
>>> identity mapping on single strings. DATATYPE("foo"@en) returns
>>> rdf:LangString, following the normal rules for datatyping.
>> 
>> There's also 2b:
>> 
>> All literals have a type. rdf:LangString is a special type, where the
>> lexical form is <string,langtag> rather than just a string, and it
>> doesn't have an L2V mapping. The value of an rdf:LangString literal
>> is the same as the lexical form. DATATYPE("foo"@en) returns
>> rdf:LangString, following the normal rules.
>> 
>> (The advantage of 2b versus 2 is that the L2V mechanism can remain
>> unchanged. It can remain defined as functions from string to value,
>> rather than functions from anything to value as required by 2. In 2,
>> the L2V of rdf:LangString is just the trivial identity mapping
>> anyways, and resorting to the L2V mapping device just to explain a
>> no-op mapping is overkill.)
>> 
>> (2b also makes it easy to re-write the rdf:PlainLiteral spec into a
>> spec titled “An L2V mapping for rdf:LangString” that just defines an
>> L2V mapping that takes "foo@en" to <"foo","en">, while keeping the
>> current restrictions on use of such lexical forms. So I'd hope it
>> would be an easier sell to the OWL/RIF WGs.)
> 
> Slight problem:
> 
> STR(?x) returns the lexical form of a literal.  The language string is the conventional extension to SPARQL in current deployments.
> 
> If the lexical form is <string,langtag>, then that would be returned. There is also the question of whether you can write
> 
> ???^^rdf:LangString
> 
> c.f. rdf:PlainLiteral.

Later in the thread I came around to the view that it's better to define it differently: "foo"@en has a lexical form "foo" and a language tag "en". This is how the terminology was used in RDF 2004 and there isn't really any reason to change it.

> A solution is to just say in the syntaxes '''the value of "foo"@en is <foo, en>'''
> 
> This leaves L2V alone (it's not used) and answers what happens if you write ???^^rdf:LangString -- it's an ill-defined literal.

Yes, this is basically what I'm advocating now. rdf:langString would still *have* an L2V, but it wouldn't be *used* to define its value, just like you say above. The L2V is the empty mapping and the lexical space is empty and the value space is <lex,lang> pairs. Since the lexical space is empty, "anything"^^rdf:langString is going to be ill-typed.

This “vestigial” datatype definition for rdf:langString is just to meet the formal definition of datatypes in RDF. If we don't do this, then all the machinery around datatypes-as-classes in RDF Semantics breaks (or so I'm told).

> It's also possible to define STR() specifically for language-tagged literals to mean the string part.

If you say, “STR() returns the lexical form of a literal” then it should be fine.

Summary of proposal:

rdf:langString typed literals are completely normal typed literals, except:
1. they have a non-empty language tag besides the lexical form
2. their lexical space is empty
3. their value is not L2V(datatypeIRI)(lexicalForm) but instead a pair <lexicalForm, languageTag>
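
The three points above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: the names Literal, L2V, value, STR, DATATYPE and IllTyped are hypothetical, the rdf:langString IRI is assumed to live in the usual rdf: namespace, and only xsd:string is registered as an ordinary datatype for comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, NoReturn, Optional

# Assumed IRIs for illustration.
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class IllTyped(ValueError):
    """Raised when a literal has no value under its datatype."""


def _empty_l2v(lexical_form: str) -> NoReturn:
    # Point 2: the lexical space is empty, so the L2V mapping is the
    # empty mapping and rejects every lexical form.
    raise IllTyped(f"{lexical_form!r} is not in the lexical space of rdf:langString")


# Hypothetical registry of L2V mappings, keyed by datatype IRI.
L2V: Dict[str, Callable[[str], object]] = {
    XSD_STRING: lambda s: s,      # identity mapping on strings
    RDF_LANGSTRING: _empty_l2v,   # present, but never used for tagged literals
}


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str
    language: Optional[str] = None  # point 1: non-empty for rdf:langString


def value(lit: Literal) -> object:
    if lit.datatype == RDF_LANGSTRING and lit.language:
        # Point 3: the value is the pair, not L2V(datatype)(lexicalForm).
        return (lit.lexical_form, lit.language)
    # All other literals go through the ordinary L2V mechanism.
    return L2V[lit.datatype](lit.lexical_form)


def STR(lit: Literal) -> str:
    # "STR() returns the lexical form of a literal", tag or no tag.
    return lit.lexical_form


def DATATYPE(lit: Literal) -> str:
    return lit.datatype
```

Under these assumptions, "foo"@en has value <"foo", "en">, STR gives "foo", DATATYPE gives rdf:langString, and "anything"^^rdf:langString written without a language tag falls through to the empty L2V and comes out ill-typed.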

Best,
Richard


> That still leaves open the question of writing ^^rdf:LangString.
> 
> 	Andy
> 
> 
>> 
>>> option 2: + simplifies literal syntax + removes SPARQL errors +
>>> theoretically clean -- requires change to the datatyping model
>> 
>> option 2b: + simplifies literal syntax + removes SPARQL errors + no
>> changes to datatyping model -- introduces one exceptional datatype
>> that works differently from all others
>> 
>>> If we say that the L2V mapping takes as input all the syntactic
>>> 'components' of a literal, rather than forcing these to be all
>>> inside one string, then we allow such things as literals with
>>> latitude and longitude denoting positions, complex numbers with
>>> real and imaginary parts, etc., without forcing people to invent
>>> coding tricks (like the trailing '@' in rdf:PlainLiteral) to
>>> artificially map these into a single string. This might be a
>>> genuinely useful extension, in other words.
>> 
>> Being able to express lat/long pairs and complex numbers in the
>> abstract syntax isn't really useful if you have no way of writing them down
>> in a concrete syntax. So you either still need to squish them into a
>> single string, or extend your RDF syntax of choice with additional
>> syntactic sugar for expressing that kind of literal.
>> 
>>> We can also quietly deprecate rdf:PlainLiteral along with 8-track
>>> tape players.
>> 
>> A major motivation for rdf:PlainLiteral is the desire to
>> stick <string,langtag> pairs into a single string, so I'm afraid it
>> won't be quite as easy.
>> 
>> Best, Richard
> 

Received on Tuesday, 6 September 2011 13:38:10 UTC