Re: Rethinking ISSUE-12 with lang datatypes from Antoine Zimmermann on 2011-05-26 (public-rdf-wg@w3.org from May 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Thu, 26 May 2011 15:43:01 +0000
Cc: public-rdf-wg <public-rdf-wg@w3.org>
Message-Id: <4DDE74E6.4070707@insa-lyon.fr>
On 26/05/2011 16:27, Richard Cyganiak wrote:
> On 25 May 2011, at 17:50, Antoine Zimmermann wrote:
>> Adding datatypes for each language tag may work as follows:
> 
> Thanks for writing this up Antoine. Would be great if you could
> maintain this in a wiki page as well!

Yes, I'm going to do it.

>> For a language tag {langTag}, "xxx"@{langTag} would be interpreted
>> as a typed literal of type rdf:lang{langTag}.
> 
> Make that rdf:string-{langTag}, so we'd end up with rdf:string-fr,
> rdf:string-en-gb and so on.

You're right, it looks better.

> So, would it be accurate to say that "xxx"@en is syntactic sugar for
> "xxx"^^rdf:string-en ?

Yes. In any case, the interpretations of "xxx"@en and of
"xxx"^^rdf:string-en would be the same, so the syntax "xxx"@en would be
preferred for backward compatibility reasons.
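To make the "syntactic sugar" reading concrete, here is a minimal Python sketch. It assumes the rdf:string-{langTag} naming Richard proposed and the lexical-to-value mapping quoted below (L2V(rdf:lang{langTag})(xxx) = <xxx, {langTag}>). The helper names `lang_datatype`, `desugar` and `l2v` are hypothetical, and no RDF library or standard datatype is implied.

```python
# Hypothetical sketch of the proposal: "xxx"@tag as sugar for "xxx"^^rdf:string-tag.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LANG_DT_PREFIX = RDF_NS + "string-"  # assumed naming: rdf:string-{langTag}


def lang_datatype(lang_tag: str) -> str:
    """IRI of the (proposed, not standard) datatype for a language tag."""
    return LANG_DT_PREFIX + lang_tag


def desugar(lexical: str, lang_tag: str) -> tuple[str, str]:
    """Rewrite "lexical"@lang_tag as the typed literal (lexical, datatype IRI)."""
    return (lexical, lang_datatype(lang_tag))


def l2v(lexical: str, datatype: str) -> tuple[str, str]:
    """Lexical-to-value mapping: L2V(rdf:string-{tag})(xxx) = <xxx, tag>."""
    if not datatype.startswith(LANG_DT_PREFIX):
        raise ValueError("not a language datatype: " + datatype)
    return (lexical, datatype[len(LANG_DT_PREFIX):])


# Both surface forms denote the same value, the pair <"xxx", "en">.
from_sugar = l2v(*desugar("xxx", "en"))
from_typed = l2v("xxx", lang_datatype("en"))
print(from_sugar == from_typed, from_sugar)  # True ('xxx', 'en')
```

The lexical form passes through unchanged and all language information lives in the datatype IRI, which is why the two syntaxes can share one interpretation.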

> Would serializers be allowed to emit the "xxx"^^rdf:string-en form?

Preferably not.

>> For any language tag {langTag}, there is a datatype
>> rdf:lang{langTag} such that:
>> 
>> - the lexical space is all Unicode strings;
>> - the value space is all pairs <string, {langTag}>;
>> - the lexical-to-value mapping is
>>   L2V(rdf:lang{langTag})(xxx) = <xxx, {langTag}>.
> 
> I see
> 
>> There is an infinite number of lang datatypes and {langTag} SHOULD
>> be restricted to what RFC 5646 defines, but implementations MAY
>> accept any string for lang tags (e.g., "foo"@mylangtag-bar42 MAY be
>> considered as a valid literal by parsers),
> 
> RDF Concepts currently says that the language tag must be valid
> according to RFC 5646, and lowercase. So I'd say that anything of the
> form rdf:lang{langTag} where {langTag} is not lowercase or not
> syntactically valid according to RFC 5646 is an ill-typed literal.

The MAY is there to allow implementers to avoid digging into the horrible lang tag specifications when reading the tag as a simple string is enough to do a good job. I'm not sure it would cause any failure if a string such as "mylang" were incorrectly accepted as a language tag.
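The difference between the strict reading Richard describes (lowercase and syntactically valid per RFC 5646, otherwise ill-typed) and the lenient MAY can be sketched as follows. The regular expression is a deliberately simplified approximation of the RFC 5646 tag shape (a primary subtag of 2 to 8 letters followed by hyphen-separated alphanumeric subtags of 1 to 8 characters); it is an assumption for illustration and does not implement the full ABNF (no extlang, private-use or grandfathered tags, no registry lookup). The function name `is_well_typed` is hypothetical.

```python
import re

# Simplified approximation of RFC 5646 tag syntax (an assumption, not the
# full ABNF): a 2-8 letter primary subtag, then hyphen-separated
# alphanumeric subtags of 1-8 characters. Only lowercase is accepted,
# matching the RDF Concepts requirement cited in the thread.
STRICT_TAG = re.compile(r"[a-z]{2,8}(?:-[a-z0-9]{1,8})*")


def is_well_typed(lang_tag: str, lenient: bool = False) -> bool:
    """Decide whether a literal of type rdf:lang{lang_tag} is well-typed.

    lenient=True models the MAY: any non-empty string is accepted as a tag
    (treating the empty string as invalid is an assumption of this sketch).
    lenient=False models the strict reading: non-lowercase or syntactically
    invalid tags yield an ill-typed literal.
    """
    if lenient:
        return lang_tag != ""
    return STRICT_TAG.fullmatch(lang_tag) is not None


for tag in ["en", "en-gb", "en-GB", "mylangtag-bar42"]:
    print(tag, is_well_typed(tag), is_well_typed(tag, lenient=True))
```

Under this approximation the strict check rejects "en-GB" (not lowercase) and "mylangtag-bar42" (a 9-letter primary subtag), while the lenient mode accepts both, which is the trade-off the MAY is meant to allow.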

>> in which case, a corresponding datatype rdf:lang{langTag} MUST
>> exist.
> 
> I don't know what that is supposed to mean.

I wanted to say that if {langTag} is a valid language tag (according to RFC 5646), then there is a datatype rdf:lang{langTag} (or rdf:lang-{langTag} if you prefer).

>> Additionally, we can add a datatype which is a
>> superclass of all the lang datatypes (e.g.,
>> rdf:LangTaggedLiteral).
> 
> Make that rdf:LangTaggedString for increased clarity.

Temporarily; we can make it shorter once a better name is found.

>> This additional datatype has an empty lexical space but its value
>> space is the set of all pairs <string, tag>.
> 
> This doesn't have to be a datatype. Making it a class would be easier
> and sufficient for using it in rdfs:range declarations.

Yes, and in fact it is better, since by the definition of a datatype, its lexical space must be a non-empty set.

>> It follows that the following triples are valid under the
>> appropriate entailment regime:
>> 
>>   rdf:lang{langTag} rdf:type rdfs:Datatype ;
>>       rdfs:subClassOf rdf:LangTaggedLiteral .
> 
> I see
> 
>> rdf:LangTaggedLiteral rdf:type rdfs:Datatype;
> 
> I'd make this:
> 
> rdf:LangTaggedString a rdfs:Class;

Agreed.

>> rdfs:subClassOf rdf:PlainLiteral .
>> 
>> In OWL, we have, for all pairs of distinct {langTag1} and
>> {langTag2}:
>> 
>>   rdf:lang{langTag1} owl:disjointWith rdf:lang{langTag2} .
>>
>>   rdf:LangTaggedLiteral owl:equivalentClass [
>>       rdf:type rdfs:Datatype ;
>>       owl:onDatatype rdf:PlainLiteral ;
>>       owl:withRestrictions ( [ rdf:langRange "*" ] ) ] .
>>
>>   rdf:lang{langTag} owl:equivalentClass [
>>       rdf:type rdfs:Datatype ;
>>       owl:onDatatype rdf:PlainLiteral ;
>>       owl:withRestrictions ( [ rdf:langRange "{langTag}" ] ) ] .
>> 
>> DRAWBACKS:
>> - an infinite number of datatypes (but we already have an infinite
>>   number of RDF properties anyway);
>> - OWL 2 does not talk about these new types, so the OWL 2 RDF-based
>>   semantics is incomplete with respect to RDF 1.1 semantics;
>> - there is no relationship between "sublanguages" like "en" vs.
>>   "en-GB".
> 
> This point is no different than in current RDF, nor is it any
> different from any other proposal considered so far, so it's not a
> drawback.

Yes, I just wanted to emphasise that this proposal does not add this feature, because some people in the WG said they would like to see a relationship between, e.g., "foo"@en and "foo"@en-GB (see the answers to your quiz).
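As a small illustration of this point, the Python sketch below uses hypothetical helpers `value_of` and `disjointness_axioms`, assumes the rdf:lang{langTag} naming from the proposal, and lowercases tags following the RDF Concepts constraint Richard cites. It shows that "foo"@en and "foo"@en-GB map to different pairs, and that the pairwise owl:disjointWith axioms only separate tags; nothing in them relates "en" to "en-gb".

```python
from itertools import combinations


def value_of(lexical: str, lang_tag: str) -> tuple[str, str]:
    """Value of "lexical"@lang_tag: the pair <lexical, tag>, with the tag
    lowercased (assumed normalization, per RDF Concepts)."""
    return (lexical, lang_tag.lower())


def disjointness_axioms(tags: list[str]) -> list[str]:
    """One owl:disjointWith triple (Turtle) per unordered pair of distinct
    tags, using the rdf:lang{langTag} naming from the proposal."""
    return [
        f"rdf:lang{a} owl:disjointWith rdf:lang{b} ."
        for a, b in combinations(sorted(set(tags)), 2)
    ]


# Different pairs, hence different values: no sublanguage relationship.
print(value_of("foo", "en") == value_of("foo", "en-GB"))  # False
for axiom in disjointness_axioms(["en", "en-gb", "fr"]):
    print(axiom)
```

For n tags this yields n(n-1)/2 axioms, which is why the proposal states them schematically "for all pairs" rather than listing them.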

>> - others?
>> 
>> ADVANTAGES:
>> - compared to rdf:PlainLiteral, we distinguish lang-tagged and
>>   non-lang-tagged literals, and the lexical form is more natural;
>> - one can define language-specific range restrictions (e.g.,
>>   ex:englishLabel rdfs:range rdf:langen .) in RDF without the need
>>   for OWL 2 datatype machinery;
>> - compared to RDF alone, we have everything typed, which can be
>>   seen as a simplification;
>> - others?
>> 
>> 
>> Regards, -- Antoine Zimmermann Researcher at: Laboratoire
>> d'InfoRmatique en Image et Systèmes d'information Database Group 7
>> Avenue Jean Capelle 69621 Villeurbanne Cedex France Tel: +33(0)4 72
>> 43 61 74 - Fax: +33(0)4 72 43 87 13 Lecturer at: Institut National
>> des Sciences Appliquées de Lyon 20 Avenue Albert Einstein 69621
>> Villeurbanne Cedex France antoine.zimmermann@insa-lyon.fr
>> http://zimmer.aprilfoolsreview.com/
>> 
>> 
> 
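The second advantage quoted above (language-specific ranges without OWL 2 facets) can be illustrated with a small Python sketch. It assumes the rdf:lang{langTag} naming and the value space of pairs <string, {langTag}> from the proposal; `LangLiteral`, `in_value_space` and `range_violations` are hypothetical names. RDFS does not "reject" triples: assuming the lang datatypes are recognized, a literal outside the range's value space would make the graph unsatisfiable under datatype-aware entailment. The sketch simply reports such triples, which is a simplification.

```python
# Hypothetical sketch: checking "ex:englishLabel rdfs:range rdf:langen ."
# using only the value spaces of the proposed lang datatypes.
from typing import NamedTuple

LANG_PREFIX = "rdf:lang"  # assumed naming from the proposal: rdf:lang{langTag}


class LangLiteral(NamedTuple):
    lexical: str
    lang_tag: str


def in_value_space(value: LangLiteral, datatype: str) -> bool:
    """True if the pair <lexical, tag> lies in the value space of datatype,
    i.e. datatype is rdf:lang{tag} for the literal's own tag."""
    return datatype == LANG_PREFIX + value.lang_tag


def range_violations(triples, ranges):
    """Return triples whose object falls outside the declared lang range."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p in ranges and not in_value_space(o, ranges[p])
    ]


ranges = {"ex:englishLabel": "rdf:langen"}
triples = [
    ("ex:paris", "ex:englishLabel", LangLiteral("Paris", "en")),
    ("ex:paris", "ex:englishLabel", LangLiteral("Paris", "fr")),
]
for violation in range_violations(triples, ranges):
    print(violation)
```

Because each lang datatype's value space is fixed by its tag, the check reduces to comparing tags; no facet such as rdf:langRange is needed.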


-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Saturday, 28 May 2011 10:14:08 UTC