xml:lang on literals. from Jan Grant on 2001-07-12 (w3c-rdfcore-wg@w3.org from July 2001)

From: Jan Grant <Jan.Grant@bristol.ac.uk>
Date: Thu, 12 Jul 2001 14:10:12 +0100 (BST)
To: RDFCore Working Group <w3c-rdfcore-wg@w3.org>
Message-ID: <Pine.GSO.4.31.0107121407550.7393-100000@mail.ilrt.bris.ac.uk>

Per the action of the telecon, I'll describe the scenario/use case
that I have for xml:lang on literals, and the resulting requirements.

Much like Eric, I'm interested in meta-metadata repositories; in
particular, the application I'm involved in (describing the schemas
of remote repositories and supporting cross-search) involves a registry
of schema elements. Being an EC project, it's important that

REQUIREMENT: the schema element registry should be language-neutral.

This is pretty much an absolute requirement in Europe. Where I have
keywords, descriptions, etc., those values should carry a language with
them through some mechanism.

What goes into the schema element registry? Much like Eric's work, classes,
relationships, values, taxons (we use a lightweight event-based modelling
approach to do what DC does with qualification). RDF was a win here,
since we can give the concepts we're modelling abstract identifiers
(in this case, URIs) and hang associated per-language preferred terms,
descriptions, keywords, and so on off them.

That allows us to give users a reasonably language-neutral view of the
schema elements (example: the UDC classification has been translated into
- if memory serves - something like 30 languages).

Looking at M+S (the RDF Model and Syntax specification), it was quite
clear that a literal was a compound term (having a language attribute);
that's the approach I took.

Note, the notion of literal having a language tag does _not_ change the
RDF model from "triples" into "quadruples". They're still triples;
it's just that a literal itself is (at least) a pair; it has its own
internal structure.
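
A minimal sketch of that point in Python, under assumptions: the names
Literal and Triple and the identifier concept:cat are hypothetical and not
taken from any particular RDF toolkit. The statement keeps exactly three
terms; the language tag lives inside the object term.

```python
from typing import NamedTuple, Optional, Union


class Literal(NamedTuple):
    """A literal as a pair: the string plus an optional language tag."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    """A statement is still exactly three terms."""
    subject: str
    predicate: str
    obj: Union[str, Literal]  # a URI string or a structured literal


# Hypothetical subject identifier, for illustration only.
t = Triple("concept:cat", "tax:hasKeyword", Literal("chat", "fr"))

print(len(t))      # 3: still a triple, not a quadruple
print(t.obj.lang)  # fr
```

Two literals with the same string but different tags compare unequal here,
which is what lets a registry hold one preferred term per language.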

(In fact, in our application, _instances_ of literals in returned results
actually have more structure than that; they can be assigned a "type"
derived from the property arc they appear "at the sharp end of"; I used
RDF Schema to describe the range of a number of properties as being
a subclass of Literal - by reference to these schema elements, we
get an effective type. This is only used for presentation, constraint
selection, etc.)
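
One way the range lookup described above might be sketched, assuming a
tiny in-memory schema. The class names ex:Keyword and ex:Description, the
property ex:hasDescription, and both tables are hypothetical; only the idea
of declaring a property's range as a subclass of Literal comes from the
text.

```python
RDFS_LITERAL = "rdfs:Literal"

# Hypothetical schema: subclass links and declared property ranges.
SUBCLASS_OF = {
    "ex:Keyword": RDFS_LITERAL,
    "ex:Description": RDFS_LITERAL,
}
RANGE = {
    "tax:hasKeyword": "ex:Keyword",
    "ex:hasDescription": "ex:Description",
}


def is_literal_class(cls):
    """Walk the subclass chain upwards until Literal is reached or the chain ends."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == RDFS_LITERAL:
            return True
        seen.add(cls)  # guard against cyclic subclass declarations
        cls = SUBCLASS_OF.get(cls)
    return False


def effective_type(predicate):
    """Type assigned to a literal found at the sharp end of `predicate`."""
    declared = RANGE.get(predicate)
    # Fall back to plain Literal when no literal subclass is declared.
    return declared if is_literal_class(declared) else RDFS_LITERAL
```

The result would only drive presentation and constraint selection; it does
not change the literal stored in the triple.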

The type issue was an interesting one. In the end, I cheated; however,
it's quite possible to model types using a construct like this:

	<something> <random:property> _:a .
	_:a <rdf:type> <some:type> .
	_:a <rdf:value> "foo" .

and I did consider using a similar approach to language tagging. In the
end I rejected it for something more in keeping with the "feel" of RDF
that I'd got from M+S. This was a question of taste; I much preferred

	FIND	a
	WHERE	a <tax:hasBroaderTerm> <udc:101.010> .
	AND	a <tax:hasKeyword> x .
	AND	x = "chat"
	AND	x.lang = "fr"

to the alternative

	FIND	a
	WHERE	a <tax:hasBroaderTerm> <udc:101.010> .
	AND	a <tax:hasKeyword> x .
	AND	x <rdf:value> "chat" .
	AND	x <xml:lang> "fr" .
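
To make the comparison concrete, here is a rough Python sketch of both
query shapes over small in-memory triple lists. Everything in it is an
assumption made for illustration: the Literal class, the subject
identifiers (concept:cat, concept:talk), the blank node labels, and the
two helper functions. It is not the query engine used in the application.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    value: str
    lang: Optional[str] = None


BROADER = "tax:hasBroaderTerm"
KEYWORD = "tax:hasKeyword"

# Style 1: the language tag is carried inside the literal.
structured = [
    ("concept:cat", BROADER, "udc:101.010"),
    ("concept:cat", KEYWORD, Literal("chat", "fr")),
    ("concept:talk", BROADER, "udc:101.010"),
    ("concept:talk", KEYWORD, Literal("chat", "en")),
]

# Style 2: an intermediate node carries rdf:value and xml:lang arcs.
reified = [
    ("concept:cat", BROADER, "udc:101.010"),
    ("concept:cat", KEYWORD, "_:k1"),
    ("_:k1", "rdf:value", "chat"),
    ("_:k1", "xml:lang", "fr"),
    ("concept:talk", BROADER, "udc:101.010"),
    ("concept:talk", KEYWORD, "_:k2"),
    ("_:k2", "rdf:value", "chat"),
    ("_:k2", "xml:lang", "en"),
]


def find_structured(triples, broader, text, lang):
    """x = text AND x.lang = lang, tested on the literal itself."""
    narrower = {s for s, p, o in triples if p == BROADER and o == broader}
    return sorted(
        s for s, p, o in triples
        if s in narrower and p == KEYWORD and o == Literal(text, lang)
    )


def find_reified(triples, broader, text, lang):
    """Same question, but following two extra arcs from the keyword node."""
    facts = set(triples)
    narrower = {s for s, p, o in triples if p == BROADER and o == broader}
    return sorted(
        s for s, p, x in triples
        if s in narrower and p == KEYWORD
        and (x, "rdf:value", text) in facts
        and (x, "xml:lang", lang) in facts
    )


print(find_structured(structured, "udc:101.010", "chat", "fr"))
print(find_reified(reified, "udc:101.010", "chat", "fr"))
```

Both calls select the same subject. The difference is where the language
lives: in the first it is part of the term being compared, in the second
it costs an extra node and two extra arcs per keyword.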

Conclusion: we're in an international environment. Removing language from
RDF is a terrible step backwards. I find the notion of literals having
structure very appealing. If we ditch this, we absolutely
must have a solid working replacement. There may be a more general
notion of a structured literal, but we should not give up what we have
while we search for it.

jan
-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287163 Fax +44 (0)117 9287112 RFC822 jan.grant@bris.ac.uk
Prolog in JavaScript: http://tribble.ilrt.bris.ac.uk/~cmjg/logic/prolog-latest
Received on Thursday, 12 July 2001 09:10:28 UTC