syntax for "langed" literals

RDF already has a syntax for plain literals with language tags, namely

Boris has added a built-in datatype for these plain literals,
owl:internationalizedString, to go along with xsd:string, the existing
datatype for plain literals without language tags.

It remains to add syntax to select on the language tag. 

I suggest using the already-existing facilities for datatype
restrictions to select on language tags.

This would add a new datatype facet, langPattern - owl:langPattern in
RDF - that would be applicable only to owl:internationalizedString.  The
meaning of this facet would be to match the *value* of the language tag
against the pattern, using the same algorithm as in XML Schema

As well, owl:internationalizedString would admit the length, minLength,
maxLength, and pattern facets, which would be applied to the string part
of the literal.
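
To make the string-part facets concrete, here is a minimal sketch in
Python.  The names LangString and satisfies_string_facets are
hypothetical, invented only for illustration, and Python's re module
stands in for XML Schema regular expressions, which it only
approximates.  The point is just that the facets look at the string
part and ignore the language tag.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LangString:
    # Hypothetical stand-in for an owl:internationalizedString value.
    text: str  # the string part
    lang: str  # the language tag


def satisfies_string_facets(
    value: LangString,
    length: Optional[int] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    pattern: Optional[str] = None,
) -> bool:
    n = len(value.text)  # only the string part is measured
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    # Assumption: re.fullmatch mirrors the whole-string matching of the
    # XML Schema pattern facet; the two regex dialects are not identical.
    if pattern is not None and re.fullmatch(pattern, value.text) is None:
        return False
    return True
```

With this sketch, LangString("colour", "en-gb") satisfies minLength 3,
maxLength 10, and pattern "[a-z]+", whatever its language tag is.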

So, strings in English or dialects of English would be 

  DatatypeRestriction(owl:internationalizedString langPattern "en*")
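
A rough sketch of how such a match might be evaluated, assuming the
pattern is read as a regular expression that must match the whole
lower-cased tag.  The function name matches_lang_pattern is
hypothetical, and Python's re module is used as an approximation of
XML Schema regular expressions.  One thing the sketch brings out:
under a strict regular-expression reading, "en*" means "e followed by
zero or more n", so a pattern such as "en(-.*)?" may be closer to the
intent of "English or dialects of English".

```python
import re


def matches_lang_pattern(lang_tag: str, lang_pattern: str) -> bool:
    # Assumption: the *value* of a language tag is its lower-cased form.
    value = lang_tag.lower()
    # Assumption: like the XML Schema pattern facet, the pattern has to
    # match the entire value, not just a prefix of it.
    return re.fullmatch(lang_pattern, value) is not None


# Read as a regular expression, "en*" matches "en" but not "en-us".
plain_english = matches_lang_pattern("en", "en*")
dialect_with_star = matches_lang_pattern("en-US", "en*")

# An explicit optional subtag suffix covers the dialects as well.
dialect_with_suffix = matches_lang_pattern("en-US", "en(-.*)?")
german_with_suffix = matches_lang_pattern("de", "en(-.*)?")
```

If a glob-style reading of "en*" is what is wanted instead, the facet
would need to say so explicitly, since that is not how XML Schema
patterns behave.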

Note: The divergence from the pattern matching in XML Schema datatypes
is intentional, as RDF language tags are normalized into lower case, so
pattern matching against any lexical form ends up being much more

Question:  Should the langPattern value be normalized to lower case as
well?  It would be nice to have langPattern "en-US" match a particular
dialect of English.  Language tags in RFC3066 are supposed to be case
insensitive, so the facet could be specified as a case-insensitive
pattern match, with a suggestion that implementations can achieve this
by normalizing both the tag and the pattern to lower case.
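
The two options can be compared in a small Python sketch.  Both
function names are hypothetical, and Python's re module again stands
in for XML Schema regular expressions.  The sketch suggests one
caveat: lower-casing the pattern text is not always the same as a
case-insensitive match, because some regular-expression escapes are
themselves case sensitive (\D means "not a digit", while \d means
"a digit").

```python
import re


def match_by_lowercasing(lang_tag: str, lang_pattern: str) -> bool:
    # Option 1: normalize both the tag and the pattern to lower case.
    return re.fullmatch(lang_pattern.lower(), lang_tag.lower()) is not None


def match_case_insensitive(lang_tag: str, lang_pattern: str) -> bool:
    # Option 2: leave the pattern alone and match case-insensitively.
    return re.fullmatch(lang_pattern, lang_tag, flags=re.IGNORECASE) is not None


# For a simple pattern such as "en-US" the two options agree.
simple_lowercased = match_by_lowercasing("en-us", "en-US")
simple_insensitive = match_case_insensitive("en-us", "en-US")

# For a pattern containing an escape such as \D they disagree:
# lower-casing turns "\D+" (non-digits) into "\d+" (digits).
escape_lowercased = match_by_lowercasing("en", r"\D+")
escape_insensitive = match_case_insensitive("en", r"\D+")
```

So if lower-casing is the suggested implementation, it may be safer
to restrict it to the literal characters of the pattern, or to state
the facet directly in terms of case-insensitive matching.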

Aside: Is there a new version of RFC3066 out?  I seem to remember
something along these lines.

Note: Matching against the string part of an owl:internationalizedString
is in keeping with XML Schema datatypes, as the XML lexical form of an
owl:internationalizedString is the string part (i.e., not including the
language tag).

Peter F. Patel-Schneider
Bell Labs Research

PS:  This appears to satisfy Bijan's ACTION-142.

Received on Tuesday, 6 May 2008 10:47:19 UTC