Re: varieties of datatyped tagged literals

On 07/09/11 17:42, Pierre-Antoine Champin wrote:
> Following today's discussion, let me rephrase the rationale of each
> "family" of solutions:
>
> 1. Don't change anything: literals will have *either* a datatype or a
> language tag.
>
> In the following options, we unify literals by ensuring that every
> literal has a datatype.
>
> 2. The language tag is still "outside" the (lexical/value) mechanism of
> the datatype; the various sub-options differ in how this
> extra-information is introduced in the system.
>
> In the following options, we unify literals even more by making
> language-tagged literals a special case of datatyped literal.
>
> 3. The language tag is attached to the datatype.
>
> 4. The language tag is attached to the lexical form.
>
>
> I agree with Pat: the longer I think about it, the better 4 looks after all.
>
> I know that the pain is in the ugly lexical form "chat@fr", but I would
> expect the following arrangements to make that bearable:
>
> * SPARQL would have to define a special case for the str() function, so
> that it does not return the *full* lexical form (e.g. "chat@fr") but the
> *stripped* one (e.g. "chat").
>
> * APIs could arrange similarly. Jena, for example, could return "chat"
> for Literal.toString(), but "chat@fr" for Literal.getLexicalForm(),
> though I suspect this may cause some backward compatibility problems.

Yes - this would be a significant compatibility problem.

> Another option would be to let Literal.getLexicalForm() return "chat" as
> before (documenting the fact that, in that case, this is not the "real"
> lexical form) and introduce a new method Literal.getFullLexicalForm()
> that returns "chat@fr", for the sake of completeness.

Aside from Jena, I also think it would be hard to explain why

<prop xml:lang="en">foo</prop>

or

"foo"@en

has a lexical form of "foo@en" when all other lexical forms are what's
between the "-quotes or the >...<.
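
To make the mismatch concrete, here is a minimal sketch of what option 4
implies. It is Python purely for illustration; the names TaggedValue,
l2v_option4 and stripped_form are hypothetical and not part of Jena, SPARQL
or any spec. It assumes, as in Pat's description, that the tag is nonempty
and contains no '@', so the lexical form is split at the last '@'.

```python
from typing import NamedTuple


class TaggedValue(NamedTuple):
    """The value pair <text, tag> that every option agrees on."""
    text: str
    tag: str


def l2v_option4(lexical_form: str) -> TaggedValue:
    """Hypothetical L2V mapping for option 4: 'foo@en' -> <'foo', 'en'>."""
    # Split at the LAST '@': the tag may not contain '@', but the text may.
    text, sep, tag = lexical_form.rpartition("@")
    if not sep or not tag:
        raise ValueError(f"not an option-4 lexical form: {lexical_form!r}")
    return TaggedValue(text, tag)


def stripped_form(lexical_form: str) -> str:
    """What a special-cased str() would have to return: the text only."""
    return l2v_option4(lexical_form).text


full = "chat@fr"            # the "real" lexical form under option 4
print(l2v_option4(full))    # the value pair
print(stripped_form(full))  # what users see between the quotes today
```

So the string users write between the quotes is stripped_form(...), not the
lexical form itself, which is exactly the part that needs explaining.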

> But those are minor pains compared to the implications of any other
> solution, I think.

Minor pain for you maybe ... major for me :-(

>    pa
>
> PS: of course, the WG will not tell the API implementors what to do, but
> it should probably provide guidelines about how to handle the changes in
> RDF 1.1.

+1

I think guidance will yield consistency, hopefully speed the transition
and provide the implementers with something to point users to rather than
developing their own explanations.

 Andy

>
>
> On 09/07/2011 06:10 AM, Pat Hayes wrote:
>> OK, sorry this is late, but here is my best attempt to summarize the various options for how to handle datatyping of tagged literals. I have tried to be objective and up to date, but feel free to correct any mistakes y'all might still find here. Thanks to Pierre-Antoine and Richard for recent corrections.
>>
>> Throughout, I will illustrate with the literal "foo"@tag. In some cases it is necessary to distinguish this surface syntax from the abstract "real" syntax form. As SPARQL refers to the 'lexical form' of a literal, which has to be a string, to be returned by STR(), I will list what this is in each case.
>>
>> In all cases, the value is the pair<"foo", tag>.
>>
>> 1. Current state: tagged literals have no type.
>>
>> 2. Lexical form is "foo", datatype is rdf:TaggedLiteral. There are various ways to "fix" the spec to make this possible:
>>
>> 2a. Abstract syntax is a pair<"foo", str>, and we modify the RDF datatype definitions to allow an L2V mapping from pairs to pairs. (Pain: major change to specs, possible clash with OWL and XSD specs.)
>> 2b. There is no L2V mapping, and this datatype is anomalous but specified by the RDF semantics directly, and is a datatype by fiat. (Pain: this datatype is anomalous and must not be used with the ^^ syntax.)
>> 2c. The abstract syntax has no lexical form, the datatype is empty and the L2V is the empty mapping. Nevertheless, the value is linked to the present syntax by the RDF semantics directly and this is a datatype by fiat. (Pain: overly elaborate; the idea of an empty datatype is confusing, and having an L2V map which does not specify the actual value is even more confusing :-).) (Positive: the illegality of literals of the form "string"^^rdf:TaggedLiteral falls out automatically.)
>>
>> 3. Lexical form is "foo", datatype is unique to the tag, i.e. there is one datatype per tag. These are conventional datatypes with a well-defined L2V mapping. Again there are several (well, two) options based on this idea.
>>
>> 3a. We invent an IRI naming convention for these datatypes, e.g. rdf:taggedLiteral/tag. Then this is the type of the literal. (Pain: inventing this open-ended naming convention.)
>> 3b. These per-tag datatypes are all anonymous and have no IRI, but are sub-datatypes of rdf:TaggedLiteral, which is returned as the type for them all. (Pain: overly elaborate; potentially confusing; need to define a new notion of sub-datatype.)
>>
>> 4. Lexical form is "foo@tag", where tag is required to be nonempty and not contain '@' (just as in the rdf:PlainLiteral spec). This is a conventional datatype (it is rdf:PlainLiteral restricted to nonempty tags) with a conventional L2V mapping. (Pain: might be considered to be the wrong lexical form (??)) (Positive: conforms closely to existing specs; simple; extra tag information might be useful?)
>>
>> ------
>>
>> On balance, my own vote is for either 2b or 4, and the longer I think about it, the better 4 looks after all. If we choose one of the 2 family, I would plead editorial discretion to be allowed to choose among them depending on which one fits best with the semantics, when we get down to details. They differ only in theoretical issues. Well, OK, I give up on 2a.
>>
>> Pat
>>
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>
>
>

Received on Wednesday, 7 September 2011 18:15:23 UTC