Lang tags are not datatypes

A paper I wrote in 2004 with Addison Phillips is available here:

http://www.inter-locale.com/whitepaper/iswc2004.pdf

or here:

http://www.springerlink.com/content/v0wecux93d2vjejt/

The basic idea is to map each lang tag to the class of literals tagged
with that lang tag.
This gives some, but not all, of what datatyping would provide. In
particular, it allows lang tag classes to be used in rdfs:range
expressions.
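
As a rough illustration of the range idea (a minimal sketch in Python, not the construction from the paper): LangLiteral, in_lang_class and satisfies_range are hypothetical names, and class membership is assumed to be a plain case-insensitive comparison of tags. Whether the class for en should also contain literals tagged en-gb is left open here.

```python
from typing import Iterable, NamedTuple


class LangLiteral(NamedTuple):
    """A language-tagged plain literal such as "cat"@en."""
    text: str
    lang: str


def in_lang_class(literal: LangLiteral, tag: str) -> bool:
    """True if the literal belongs to the class associated with `tag`.

    Assumption: membership is exact tag equality, ignoring case.
    """
    return literal.lang.lower() == tag.lower()


def satisfies_range(values: Iterable[LangLiteral], tag: str) -> bool:
    """Check the values of a property whose rdfs:range is the class for `tag`."""
    return all(in_lang_class(v, tag) for v in values)


labels = [LangLiteral("cat", "en"), LangLiteral("dog", "EN")]
print(satisfies_range(labels, "en"))  # True
print(satisfies_range(labels + [LangLiteral("chat", "fr")], "en"))  # False
```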

Here are some of the issues discussed:

1) case insensitivity of lang tags
2) the relationship between newer tags such as tlh and their
grandfathered equivalents such as i-klingon
3) the relationship between a tag such as en and a tag using the
implicit script for that language, en-latn
4) interactions in Asian languages (Chinese script and geography),
Japanese scripts

(1) and (2) are definitely meant as identities. We should definitely
respect (1), and respecting (2) is rather like respecting the built-in
relationships between the XSD datatypes, i.e. there is some identity
between

"Dah mojaqmeyvam divusnisbe' 'e' vihar"@tlh
and
"Dah mojaqmeyvam divusnisbe' 'e' vihar"@i-klingon

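To make identities (1) and (2) concrete, here is a minimal sketch in Python. The names GRANDFATHERED, canonical_tag and same_literal are hypothetical, and the alias table holds a single entry: the IANA language subtag registry gives tlh as the preferred value for i-klingon. Issues (3) and (4) are deliberately not handled.

```python
# Hypothetical alias table: grandfathered tag -> preferred tag.
# A real implementation would load this from the IANA registry.
GRANDFATHERED = {"i-klingon": "tlh"}


def canonical_tag(tag: str) -> str:
    """Fold case (issue 1), then replace a grandfathered tag (issue 2)."""
    folded = tag.lower()
    return GRANDFATHERED.get(folded, folded)


def same_literal(a: tuple, b: tuple) -> bool:
    """Compare two (text, lang tag) pairs up to tag canonicalisation."""
    return a[0] == b[0] and canonical_tag(a[1]) == canonical_tag(b[1])


text = "Dah mojaqmeyvam divusnisbe' 'e' vihar"
print(same_literal((text, "TLH"), (text, "i-klingon")))  # True
print(same_literal(("cat", "en"), ("cat", "en-latn")))   # False: issue 3 not handled
```
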
====

Clearly the proposal to map a datatype URI to a map from a string to a
pair consisting of the string and the datatype URI is mathematically
possible, if not exactly enlightening. I suspect that the values of a
datatype are meant to be /interesting/ in terms of that datatype, so
that the natural relationships between
"cat"@en and "chat"@fr
appear
and the unnatural ones between
"pavement"@en-gb and "pavement"@en-us do not.

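A minimal sketch in Python of that trivial mapping shows how little it says. The function name trivial_l2v and the example.org URIs are hypothetical, invented only for illustration.

```python
from typing import Callable, Tuple


def trivial_l2v(datatype_uri: str) -> Callable[[str], Tuple[str, str]]:
    """Return the lexical-to-value map s -> (s, datatype_uri)."""
    def l2v(lexical: str) -> Tuple[str, str]:
        return (lexical, datatype_uri)
    return l2v


# Hypothetical datatype URIs standing in for language tags.
EN = "http://example.org/lang/en"
FR = "http://example.org/lang/fr"
EN_GB = "http://example.org/lang/en-gb"
EN_US = "http://example.org/lang/en-us"

# Each value is just the syntactic pair again, so the value space
# records nothing about how the literals relate to one another.
print(trivial_l2v(EN)("cat"))    # ('cat', 'http://example.org/lang/en')
print(trivial_l2v(FR)("chat"))   # ('chat', 'http://example.org/lang/fr')
print(trivial_l2v(EN_GB)("pavement") == trivial_l2v(EN_US)("pavement"))  # False
```
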
By keeping language tagged literals separate from datatypes, we make it 
clear that very different processing is needed, and very different 
considerations should be applied.

Jeremy

Received on Saturday, 28 May 2011 01:05:28 UTC