RE: xml:lang [was Re: Outstanding Issues ]

I have risked blood with I18N, I have risked my sanity with C14N, ...

... to ensure one principle that I think is key to this discussion.

Literals have well-defined equality, and that equality is an equivalence
relation.

As long as that is true, the model theory works, tidiness works, and our
test cases work.


What is a Literal?
==================

Given two literals it is possible to tell whether they are the same or not.
A literal is used to label the nodes in the graph that are not blank and are
not labelled with URIrefs.



From the non-syntactic point of view that is it.

Of course with the datatyping based on XML Schema Datatypes, we also need:

There is a partial function from the set of literals to strings.



....

OK, let's try again.

What is a Literal
=================

Each literal is one of
  - a Unicode string
  - a Unicode string and an RFC 3066 lang tag
  - a well-balanced XML fragment
  - a well-balanced XML fragment and an RFC 3066 lang tag

Two literals are equal if they are of the same kind (i.e. the same one of
the above list) and their corresponding components are equal.

Two Unicode strings are equal if they are equal as binary UTF-8 sequences.
Two RFC 3066 lang tags are equal if they are ASCII equal ignoring case.
Two well-balanced XML fragments are equal if they canonicalise to equal
strings.

(note: the last statement is not yet quite precise enough; I am working on
it).

(note: greater mathematical precision would define a literal as an
equivalence class of the above set of literals).
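
The equality rules above can be sketched in Python. This is only an
illustration, not a definitive implementation: the names Literal,
canonical_key and literals_equal are made up for the example, and an XML
fragment is assumed to have been canonicalised already (the exact
canonicalisation is the part that is not yet pinned down). Comparing
literals through a key means the result is an equivalence relation by
construction.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical representation of the four kinds of literal."""
    kind: str                   # "string" or "xml"
    value: str                  # Unicode string, or an XML fragment that is
                                # ASSUMED to be canonicalised already
    lang: Optional[str] = None  # RFC 3066 lang tag, if any


def canonical_key(lit):
    """Map a literal to a key; equal keys mean equal literals."""
    # Lang tags compare as ASCII ignoring case. bytes.lower() only touches
    # ASCII letters; encode("ascii") rejects a tag that is not ASCII.
    lang = None if lit.lang is None else lit.lang.encode("ascii").lower()
    # Strings compare as binary UTF-8 sequences.
    return (lit.kind, lit.value.encode("utf-8"), lang)


def literals_equal(a, b):
    # Same kind (string/xml, with/without lang) and equal components.
    return canonical_key(a) == canonical_key(b)
```

For instance, literals_equal(Literal("string", "chat", "en-GB"),
Literal("string", "chat", "EN-gb")) holds, while a string literal and an
XML literal with the same text are never equal.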

Terminology
===========
A literal of either of the first two kinds is referred to as a string literal.
A literal of either of the last two kinds is referred to as an XML literal.

What is Tidiness
================

A tidy graph has no two nodes with equal labels.
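
As a rough sketch of that check in Python, assuming a graph is given
simply as the list of its node labels, with None standing for a blank
node. The names is_tidy and label_key, and the tuple encoding of a
literal as (text, lang), are hypothetical choices for the example only.

```python
def label_key(label):
    """Normalise a label so that equal labels produce equal keys."""
    if isinstance(label, tuple):
        # Assumed encoding of a string literal: (text, lang or None).
        text, lang = label
        return ("literal", text, None if lang is None else lang.lower())
    # Anything else is treated as a URIref, compared as a plain string.
    return ("uriref", label)


def is_tidy(node_labels, key=label_key):
    """Return True if no two labelled nodes have equal labels."""
    seen = set()
    for label in node_labels:
        if label is None:
            # Blank nodes carry no label, so they cannot clash.
            continue
        k = key(label)
        if k in seen:
            return False
        seen.add(k)
    return True
```

With this encoding, a graph holding both ("chat", "en") and
("chat", "EN") is not tidy, because the two labels are equal literals.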

How does datatyping work
========================
An RDF graph using an XML literal in a datatyping context is ill-formed.
In a datatyping context where a Unicode string is needed as the argument in
a lexical-to-value mapping, the string component of a string literal is used.
The language component (if any) of the string literal is ignored.
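
A minimal Python sketch of that rule, with hypothetical names throughout
(Literal, lexical_form, typed_value); the lexical-to-value mapping is
passed in as an ordinary function, and Python's int stands in for a
datatype mapping purely for illustration.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical literal: kind is "string" or "xml"."""
    kind: str
    value: str
    lang: Optional[str] = None


def lexical_form(lit):
    """Return the string handed to a lexical-to-value mapping."""
    if lit.kind == "xml":
        # An XML literal in a datatyping context makes the graph ill-formed.
        raise ValueError("XML literal used in a datatyping context")
    # Only the string component is used; the lang tag (if any) is ignored.
    return lit.value


def typed_value(lit, lexical_to_value):
    """Apply a lexical-to-value mapping to a string literal."""
    return lexical_to_value(lexical_form(lit))
```

So typed_value(Literal("string", "42", "en"), int) and
typed_value(Literal("string", "42"), int) give the same value, and an XML
literal is rejected.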




Hope this clarifies things.

Jeremy

Received on Friday, 1 March 2002 13:42:03 UTC