Rethinking ISSUE-12 with lang datatypes


[disclaimer: I am not vehemently in favour of that proposal, just expressing my thoughts aloud.]

Adding a datatype for each language tag may work as follows:

For a language tag {langTag}, "xxx"@{langTag} would be interpreted as a typed literal of type rdf:lang{langTag}. For any language tag {langTag}, there is a datatype rdf:lang{langTag} such that:

- the lexical space is the set of all Unicode strings;
- the value space is the set of all pairs <string,{langTag}>;
- the lexical-to-value mapping is L2V(rdf:lang{langTag})(xxx)=<xxx,{langTag}>.
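
To make the definition concrete, here is a minimal Python sketch of the mapping. The helper names lang_datatype_iri and l2v are made up for this illustration, and gluing the tag directly onto the rdf: namespace IRI is just my reading of "rdf:lang{langTag}", not something any spec defines.

```python
from typing import Tuple

# Standard RDF namespace; the "lang" + tag local name is the proposal's convention.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def lang_datatype_iri(lang_tag: str) -> str:
    """Hypothetical helper: IRI of the proposed datatype rdf:lang{langTag}."""
    return RDF_NS + "lang" + lang_tag


def l2v(lang_tag: str, lexical_form: str) -> Tuple[str, str]:
    """L2V(rdf:lang{langTag})(xxx) = <xxx, {langTag}>.

    Every Unicode string is in the lexical space, so the mapping is total.
    """
    return (lexical_form, lang_tag)


# "chat"@fr read as the typed literal "chat"^^rdf:langfr
datatype = lang_datatype_iri("fr")
value = l2v("fr", "chat")
```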

There are infinitely many lang datatypes. {langTag} SHOULD be restricted to what RFC 5646 defines, but implementations MAY accept any string as a language tag (e.g., "foo"@mylangtag-bar42 MAY be considered a valid literal by parsers), in which case a corresponding datatype rdf:lang{langTag} MUST exist.

In addition, we can define one more datatype that is a superclass of all the lang datatypes (e.g., rdf:LangTaggedLiteral). This datatype has an empty lexical space, but its value space is the set of all pairs <string,tag>.

It follows that the following triples are valid under the appropriate entailment regime:

rdf:lang{langTag} rdf:type rdfs:Datatype;
  rdfs:subClassOf rdf:LangTaggedLiteral .
rdf:LangTaggedLiteral rdf:type rdfs:Datatype;
  rdfs:subClassOf rdf:PlainLiteral .

In OWL, we have, for all pairs of distinct {langTag1} and {langTag2}:

rdf:lang{langTag1} owl:disjointWith rdf:lang{langTag2}.
rdf:LangTaggedLiteral owl:equivalentClass [
 rdf:type rdfs:Datatype;
 owl:onDatatype rdf:PlainLiteral;
 owl:withRestrictions ( [rdf:langRange "*"] )
] .
rdf:lang{langTag} owl:equivalentClass [
 rdf:type rdfs:Datatype;
 owl:onDatatype rdf:PlainLiteral;
 owl:withRestrictions ( [rdf:langRange "{langTag}"] )
] .
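
The subclass and disjointness statements can be sanity-checked on value spaces with a small Python sketch. The three membership functions are illustrative names of my own, and the value space of rdf:PlainLiteral is taken here simply as plain strings plus pairs <string,tag>. This only tests a handful of sample values; it proves nothing in general.

```python
def in_lang_datatype(value, lang_tag):
    """Value space of rdf:lang{langTag}: all pairs <string, {langTag}>."""
    return (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[0], str) and value[1] == lang_tag)


def in_lang_tagged_literal(value):
    """Value space of rdf:LangTaggedLiteral: all pairs <string, tag>."""
    return (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(part, str) for part in value))


def in_plain_literal(value):
    """Assumed value space of rdf:PlainLiteral: strings plus pairs <string, tag>."""
    return isinstance(value, str) or in_lang_tagged_literal(value)


sample = [("chat", "fr"), ("cat", "en"), ("colour", "en-GB"), "no tag"]

# Subclass chain on the sample: rdf:langen within LangTaggedLiteral within PlainLiteral.
subclass_holds = all(
    (not in_lang_datatype(v, "en") or in_lang_tagged_literal(v))
    and (not in_lang_tagged_literal(v) or in_plain_literal(v))
    for v in sample
)

# Disjointness on the sample: nothing is in both rdf:langen and rdf:langfr.
disjoint_holds = not any(
    in_lang_datatype(v, "en") and in_lang_datatype(v, "fr") for v in sample
)
```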

Drawbacks:

- an infinite number of datatypes (but we already have an infinite number of RDF properties anyway);
- OWL 2 does not talk about these new types, so the OWL 2 RDF-based semantics is incomplete wrt RDF 1.1 semantics;
- there is no relationship between "sublanguages" like "en" VS "en-GB".
- others?
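
To illustrate the "sublanguages" point: a language range "en" in the style of RFC 4647 basic filtering matches the tag "en-GB", while under the definition given earlier a value tagged "en-GB" belongs to rdf:langen-GB only, not to rdf:langen. In the Python sketch below, basic_filter_match is a simplified take on basic filtering (not a full implementation of the RFC) and same_lang_datatype is a made-up name for the comparison the proposal implies.

```python
def basic_filter_match(lang_range: str, lang_tag: str) -> bool:
    """Simplified RFC 4647 basic filtering.

    "*" matches any tag; otherwise the range must equal the tag, or be a
    prefix of it immediately followed by "-", compared case-insensitively.
    """
    if lang_range == "*":
        return True
    r, t = lang_range.lower(), lang_tag.lower()
    return t == r or t.startswith(r + "-")


def same_lang_datatype(tag1: str, tag2: str) -> bool:
    """Under the proposal, two tags name the same datatype only if identical."""
    return tag1 == tag2


matches_range = basic_filter_match("en", "en-GB")   # range-based view
same_datatype = same_lang_datatype("en", "en-GB")   # datatype-based view
```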

Advantages:

- compared to rdf:PlainLiteral, we distinguish langTagged and non-langTagged literals; and the lexical form is more natural;
- one can define language-specific range restrictions (e.g., ex:englishLabel rdfs:range rdf:langen.) in RDF without the need for OWL 2 datatype machinery;
- compared to RDF alone, we have everything typed, which can be seen as a simplification.
- others?
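
As a sketch of the range-restriction point, the following Python snippet reads "hello"@en as a literal typed rdf:langen and compares it against a declared range. Treating rdfs:range as a validation check is an assumption of this sketch (RDFS itself uses ranges for inference, not constraint checking), and RANGES, typed_from_lang_literal and satisfies_range are hypothetical names.

```python
from typing import Dict, Tuple

# A literal as (lexical form, datatype name), e.g. ("hello", "rdf:langen").
TypedLiteral = Tuple[str, str]

# Declared ranges, mirroring: ex:englishLabel rdfs:range rdf:langen .
RANGES: Dict[str, str] = {"ex:englishLabel": "rdf:langen"}


def typed_from_lang_literal(lexical_form: str, lang_tag: str) -> TypedLiteral:
    """Read "xxx"@{langTag} as a typed literal of type rdf:lang{langTag}."""
    return (lexical_form, "rdf:lang" + lang_tag)


def satisfies_range(prop: str, literal: TypedLiteral) -> bool:
    """True if the property has no declared range or the datatype matches it."""
    expected = RANGES.get(prop)
    return expected is None or literal[1] == expected


ok = satisfies_range("ex:englishLabel", typed_from_lang_literal("hello", "en"))
bad = satisfies_range("ex:englishLabel", typed_from_lang_literal("bonjour", "fr"))
```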

Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex

Received on Thursday, 26 May 2011 14:04:38 UTC