Re: language-tagged literal datatypes

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Mon, 22 Aug 2011 09:59:24 +0200
Message-ID: <4E520C5C.1080202@liris.cnrs.fr>
To: Pat Hayes <phayes@ihmc.us>
CC: Richard Cyganiak <richard@cyganiak.de>, "public-rdf-wg@w3.org Group WG" <public-rdf-wg@w3.org>

My 2 cents concerning the 2/2b debate:

I sympathize with Pat's desire for elegance, and I agree that
generalizing the notion of datatype to include language-tagged strings
seems feasible;

however, I share Richard's concerns that, with such a generalization, we
might drift too far away from XML-Schema, and that it may be overkill
for solving only one use-case (as Richard pointed out, there is no plan
to extend RDF for supporting other "compound" lexical values).

Finally, while I see Richard's arguments, his proposal seems a bit
cumbersome to me as well.

Here's an attempt to make both of them happy...

Option 2c: All literals have a type. rdf:LangString is a special type,
with an empty lexical space, and a value space containing pairs of the
form <string,langtag>; obviously, its L2V mapping is empty. Literals
with datatype rdf:LangString are special in that they are represented
(in the abstract syntax) directly by their value, rather than by a
lexical form. DATATYPE("foo"@en) returns rdf:LangString, following the
normal rules.
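
To make Option 2c a bit more concrete, here is a rough Python sketch of how I
picture it. It is only an illustration under my own assumptions: the names
Literal, L2V, lang_literal and datatype are hypothetical, datatypes are
identified by plain prefixed-name strings, and xsd:integer merely stands in for
an ordinary datatype. Nothing here is proposed spec text.

```python
from dataclasses import dataclass
from typing import Tuple, Union

LANG_STRING = "rdf:LangString"  # name as used in this thread
XSD_INTEGER = "xsd:integer"     # stand-in for an ordinary datatype


def _empty_l2v(lexical_form: str):
    # The lexical space of rdf:LangString is empty, so its L2V mapping
    # accepts no lexical form at all.
    raise ValueError("rdf:LangString has an empty lexical space")


# Hypothetical registry: datatype name -> lexical-to-value mapping.
L2V = {XSD_INTEGER: int, LANG_STRING: _empty_l2v}


@dataclass(frozen=True)
class Literal:
    datatype: str
    # Ordinary literals carry a lexical form (a string). Literals typed
    # rdf:LangString carry their value, a <string, langtag> pair, directly.
    content: Union[str, Tuple[str, str]]

    def value(self):
        if self.datatype == LANG_STRING:
            return self.content  # already a value, no L2V step
        return L2V[self.datatype](self.content)


def lang_literal(string: str, langtag: str) -> Literal:
    # Abstract-syntax counterpart of "foo"@en.
    return Literal(LANG_STRING, (string, langtag))


def datatype(literal: Literal) -> str:
    # Every literal has a type, so no special case is needed here.
    return literal.datatype
```

With this, datatype(lang_literal("foo", "en")) gives "rdf:LangString" by the
normal rule, and the only special case sits in Literal.value(), i.e. in the
literal rather than in the datatype.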

In a nutshell, it keeps language-tagged strings a special case (rather
than engaging in a grand generalization), but shifts the specialness to
the literals themselves rather than the datatype.

  pa

On 08/19/2011 11:02 PM, Pat Hayes wrote:
> 
> On Aug 19, 2011, at 1:12 PM, Richard Cyganiak wrote:
> 
>> On 19 Aug 2011, at 17:00, Pat Hayes wrote:
>>> So let me get this clear. This exceptional datatype associates the lexical form <string, tag> to the identical value <string, tag>,
>>
>> Yes. The <string,tag> pair should perhaps not be called a “lexical form” but something else.
> 
> I think this is completely absurd, but I won't argue the point any further. 
> 
>>
>> Strawman text:
>>
>> “A language-tagged string consists of a lexical form that is a Unicode string, and a language tag. Its datatype IRI is rdf:LangString. Unlike in regular typed literals, no lexical-to-value mapping is associated with this datatype IRI. The value of a language-tagged string is a tuple consisting of the lexical form and the language tag: <lexicalForm, languageTag>.”
> 
> Suggested alternative:
> 
> "A language-tagged literal consists of a lexical form with two parts, a Unicode string and a language tag. Its datatype IRI is rdf:LangString. The value of the literal is a language-tagged string, i.e. a pair consisting of the string and the tag, in that order. While the mapping between the lexical form and the value is not a lexical-to-value mapping as defined in <<link to relevant section>> as it applies to two arguments rather than a single string,  rdf:LangString is considered to be an RDF datatype and satisfies all other conditions on a datatype."
> 
>>
>>> but its L2V mapping is not the identity map, because it doesn't have an L2V mapping?
>>
>> Yes.
>>
>> RDF Concepts has this to say about datatypes:
>>
>> * The lexical space of a datatype is a set of Unicode [UNICODE] strings.
>> * The lexical-to-value mapping of a datatype is a set of pairs whose
>>  first element belongs to the lexical space of the datatype […].
>> * RDF may be used with any datatype definition that conforms to this
>>  abstraction […].
>>
>> This definition would have to be changed to accommodate approach 2. The lexical space would have to be broadened to contain … what? Everything?
> 
> All the syntactic content of the literal other than the datatype IRI itself. If this syntax defines, say, three strings, then the L2V map has three arguments. We could make this change by the following wording change:
> 
> * The lexical space of a datatype is a set of Unicode strings or tuples of Unicode strings. 
> 
> although I think I would prefer this (to avoid debates about whether a tag is a Unicode string):
> 
> * The lexical item of a datatype is a sequence of items encoded as character strings, which must be provided as part of the literal syntactic form. In most cases, this lexical item will be a single Unicode string. 
> 
> * The lexical space of a datatype is the set of all lexical items of that datatype.
> 
>> Unicode strings and <Unicode string, language tag> pairs? No change is required for 2b.
>>
>> The spec also says:
>>
>> “The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XMLSCHEMA-2].”
>>
>> XML Schema 1.1 Datatypes has this to say:
>>
>> “In this specification, a datatype has […] a ·lexical space·, which is a set of character strings used to denote the values. […] The lexical space of a datatype is the prescribed domain of ·the lexical mapping· for that datatype.”
>>
>> I wouldn't want to deviate from the XSD spec if possible.
> 
> We already deviate from XSD, eg we do not use the 'facets' notion and its associated machinery. 
> 
>>
>>> This seems completely insane to me, and I don't quite know how I would justify it to a cynical reader, but whatever.
>>
>> We can call it “rhubarb mapping” or anything else instead of “L2V mapping” if that seems less insane to you.
>>
>>> There really is no practical difference for users between 2 and 2b,
>>
>> Correct.
>>
>>> so at this point we are arguing about theoretical elegance rather than about anything that actually matters. 
>>
>> Well, my concern is to come up with words that get us the desired effect and that don't get us in trouble with other WGs.
> 
> Me too, but I think we can do that either way. Either way, we are asking people to be able to handle literals with two strings in their syntax, right? So why not just admit that this is the case? 
> 
> But OK, in the interests of moving forward, I guess I will stop arguing at this point. 
> 
> Pat
> 
>>
>> Looks like you don't really mind whether language-tagged strings use the lexical-to-value mapping device or not. So let's see what Andy says.
>>
>> Best,
>> Richard
>>
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 22 August 2011 08:06:34 GMT
