Re: language-tagged literal datatypes from Pat Hayes on 2011-08-19 (public-rdf-wg@w3.org from August 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 19 Aug 2011 16:02:02 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "public-rdf-wg@w3.org Group WG" <public-rdf-wg@w3.org>
Message-Id: <4DA071D1-A8D5-48CD-B796-72D543F4D9B9@ihmc.us>

On Aug 19, 2011, at 1:12 PM, Richard Cyganiak wrote:

> On 19 Aug 2011, at 17:00, Pat Hayes wrote:
>> So let me get this clear. This exceptional datatype associates the lexical form <string, tag> to the identical value <string, tag>,
> 
> Yes. The <string,tag> pair should perhaps not be called a “lexical form” but something else.

I think this is completely absurd, but I won't argue the point any further. 

> 
> Strawman text:
> 
> “A language-tagged string consists of a lexical form that is a Unicode string, and a language tag. Its datatype IRI is rdf:LangString. Unlike in regular typed literals, no lexical-to-value mapping is associated with this datatype IRI. The value of a language-tagged string is a tuple consisting of the lexical form and the language tag: <lexicalForm, languageTag>.”

Suggested alternative:

"A language-tagged literal consists of a lexical form with two parts, a Unicode string and a language tag. Its datatype IRI is rdf:LangString. The value of the literal is a language-tagged string, i.e. a pair consisting of the string and the tag, in that order. The mapping between the lexical form and the value is not a lexical-to-value mapping as defined in <<link to relevant section>>, because it applies to two arguments rather than a single string. Nevertheless, rdf:LangString is considered to be an RDF datatype and satisfies all other conditions on a datatype."
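
As a rough illustration of the suggested wording, here is a minimal Python sketch of a literal whose lexical form has two parts and whose value is the pair of string and tag, in that order. The class name, field names, and the `LANG_STRING_IRI` constant are assumptions made for this example only; they do not come from any RDF specification or library.

```python
from dataclasses import dataclass
from typing import Tuple

# Prefixed name as written in this thread; kept as a plain label for illustration.
LANG_STRING_IRI = "rdf:LangString"


@dataclass(frozen=True)
class LanguageTaggedLiteral:
    """Hypothetical model of a language-tagged literal (all names are assumptions)."""

    string: str  # the Unicode string part of the lexical form
    tag: str     # the language tag part of the lexical form

    @property
    def datatype_iri(self) -> str:
        # Every language-tagged literal carries the same datatype IRI.
        return LANG_STRING_IRI

    def value(self) -> Tuple[str, str]:
        # The value is the pair <string, tag>, in that order.
        return (self.string, self.tag)


lit = LanguageTaggedLiteral("chat", "fr")
print(lit.datatype_iri, lit.value())
```

The mapping from the two-part lexical form to the value is the identity on the pair, which is the point under discussion: it takes two arguments rather than a single string.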

> 
>> but its L2V mapping is not the identity map, because it doesn't have an L2V mapping?
> 
> Yes.
> 
> RDF Concepts has this to say about datatypes:
> 
> * The lexical space of a datatype is a set of Unicode [UNICODE] strings.
> * The lexical-to-value mapping of a datatype is a set of pairs whose
>  first element belongs to the lexical space of the datatype […].
> * RDF may be used with any datatype definition that conforms to this
>  abstraction […].
> 
> This definition would have to be changed to accommodate approach 2. The lexical space would have to be broadened to contain … what? Everything?

All the syntactic content of the literal other than the datatype IRI itself. If this syntax defines, say, three strings, then the L2V map has three arguments. We could do this with the following change of wording:

* The lexical space of a datatype is a set of Unicode strings or tuples of Unicode strings. 

although I think I would prefer this (to avoid debates about whether a tag is a Unicode string):

* The lexical item of a datatype is a sequence of items encoded as character strings, which must be provided as part of the literal syntactic form. In most cases, this lexical item will be a single Unicode string. 

* The lexical space of a datatype is the set of all lexical items of that datatype.
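
To make the proposed wording concrete, here is a minimal Python sketch in which a lexical item is a tuple of strings and the L2V map takes one argument per string. The `Datatype` class, its `arity` field, and the two example datatypes are hypothetical names introduced for this illustration; `int` stands in for a real xsd:integer mapping only as a simplification.

```python
from typing import Any, Callable, Tuple

# A lexical item is a sequence of strings; most datatypes use a single string.
LexicalItem = Tuple[str, ...]


class Datatype:
    """Hypothetical datatype whose L2V map has as many arguments as the item has strings."""

    def __init__(self, iri: str, arity: int, l2v: Callable[..., Any]) -> None:
        self.iri = iri
        self.arity = arity  # number of strings the literal syntax must provide
        self.l2v = l2v      # lexical-to-value map, applied to the unpacked item

    def value(self, item: LexicalItem) -> Any:
        if len(item) != self.arity:
            raise ValueError(f"{self.iri} expects {self.arity} string(s), got {len(item)}")
        return self.l2v(*item)


# The usual case: one string in, one value out (int() is a stand-in mapping).
xsd_integer = Datatype("xsd:integer", 1, int)

# The language-tagged case: two strings in, the pair <string, tag> out.
lang_string = Datatype("rdf:LangString", 2, lambda s, tag: (s, tag))

print(xsd_integer.value(("42",)))
print(lang_string.value(("chat", "fr")))
```

Under this sketch the single-string datatypes are just the arity-1 case, so nothing changes for them.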

> Unicode strings and <Unicode string, language tag> pairs? No change is required for 2b.
> 
> The spec also says:
> 
> “The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XMLSCHEMA-2].”
> 
> XML Schema 1.1 Datatypes has this to say:
> 
> “In this specification, a datatype has […] a ·lexical space·, which is a set of character strings used to denote the values. […] The lexical space of a datatype is the prescribed domain of ·the lexical mapping· for that datatype.”
> 
> I wouldn't want to deviate from the XSD spec if possible.

We already deviate from XSD, e.g. we do not use the 'facets' notion and its associated machinery.

> 
>> This seems completely insane to me, and I don't quite know how I would justify it to a cynical reader, but whatever.
> 
> We can call it “rhubarb mapping” or anything else instead of “L2V mapping” if that seems less insane to you.
> 
>> There really is no practical difference for users between 2 and 2b,
> 
> Correct.
> 
>> so at this point we are arguing about theoretical elegance rather than about anything that actually matters. 
> 
> Well, my concern is to come up with words that get us the desired effect and that don't get us in trouble with other WGs.

Me too, but I think we can do that either way. Either way, we are asking people to be able to handle literals with two strings in their syntax, right? So why not just admit that this is the case? 

But OK, in the interests of moving forward, I guess I will stop arguing at this point. 

Pat

> 
> Looks like you don't really mind whether language-tagged strings use the lexical-to-value mapping device or not. So let's see what Andy says.
> 
> Best,
> Richard
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 19 August 2011 21:02:33 UTC