
RE: Charmod-Literal

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 4 Apr 2002 13:29:50 +0100
To: "Brian McBride" <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDKEJHCDAA.jjc@hplb.hpl.hp.com>
Again, I am using HTML/UTF-8 to try and prevent the funny chars getting munged.


Brian:
> I like the test cases approach Jeremy.  It greatly helps to
> clarify the issue.
>
> The thing that struck me was whether these cases were expressing rulings
> about RDF/XML or about the graph syntax, e.g. is a literal beginning with a
> combining character black, white or grey in the graph syntax.
>

As with the xml:base test cases, I am reusing our standard test case format to say, verbosely, something rather simpler:

White test case:

"Dürst" is a legal literal label in an RDF graph, (in N-triple escape notation "D\u00FCrst" ).

Black test case 1 & 2:

"Dürst" is not a legal literal label in an RDF graph (in N-triple escape notation "Du\u0308rst")

(In my UTF-umlaut aware mailer the two statements above appear to directly contradict one another! They don't. If I click at the beginning of one of the lines and then use the right arrow to move through the string I see they have different numbers of characters).
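For anyone whose mailer hides the difference, here is a minimal sketch in Python (my choice of language for illustration, not part of the test cases) that compares the two escaped forms. The names `white`, `black` and `is_nfc` are hypothetical; the only library used is the standard `unicodedata` module.

```python
import unicodedata

# The two labels from the white and black test cases, written with escapes
# so that no mailer or editor can silently normalize them.
white = "D\u00FCrst"   # precomposed U+00FC (u with diaeresis)
black = "Du\u0308rst"  # plain "u" followed by U+0308 (combining diaeresis)


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by normalization to NFC."""
    return unicodedata.normalize("NFC", s) == s


print(len(white), len(black))   # different numbers of code points
print(white == black)           # compared code point by code point
print(is_nfc(white), is_nfc(black))
print(unicodedata.normalize("NFC", black) == white)
```

The two strings render identically but differ in length, and normalizing the second to NFC yields the first.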

Grey test case 1 & 2

"̈foo" is a legal but not fully interoperable literal label in an RDF graph ("\u0308foo")

Black test case 3, grey test case 3

"̸foobar" is a legal but not fully interoperable literal label in an RDF graph ("\u0338foobar")
It presents particular difficulties with the XML serialization.

The phrase "legal but not fully interoperable" is a different approach to trying to get the wiggle room that your erratum approach was trying to achieve. In particular, it would allow a later WG to prohibit or fully bless these literals in the light of the consensus forming around charmod.

An argument about the grey cases that is not charmod/xml dependent is as follows:

Unicode strings in NFC are safe under certain operations (e.g. substring) but not under concatenation. Unicode strings in NFC that do not start with a combining character can be safely concatenated, but substring is unsafe. Since concatenation is the more useful operation, it is better to give privileged status to the class of Unicode strings in NFC that do not start with a combining character. Personally, I think that's where the I18N guys are heading at the moment; but there is no consensus yet, and charmod (not RDF) is the vehicle to achieve that consensus.
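The concatenation and substring points can be sketched in Python as follows. This is only an illustration under stated assumptions: `is_nfc` and `starts_with_combining` are hypothetical helpers, and testing for a non-zero canonical combining class is an approximation of what charmod means by a combining (composing) character, not its exact definition.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by normalization to NFC."""
    return unicodedata.normalize("NFC", s) == s


def starts_with_combining(s: str) -> bool:
    """Approximation: first code point has a non-zero combining class."""
    return bool(s) and unicodedata.combining(s[0]) != 0


# Concatenation: both halves are in NFC, but the result is not, because
# "u" + U+0308 composes to U+00FC under NFC.
left, right = "Du", "\u0308rst"
print(is_nfc(left), is_nfc(right), is_nfc(left + right))

# The restricted class: NFC and not starting with a combining character.
# "q" + U+0308 has no precomposed form, so the string is already in NFC.
s = "q\u0308x"
print(is_nfc(s), starts_with_combining(s))

# Substring: the tail stays in NFC but now starts with a combining
# character, so it has left the restricted class.
tail = s[1:]
print(is_nfc(tail), starts_with_combining(tail))
```

The first case shows why plain NFC is not enough for concatenation; the last shows what is given up on the substring side by privileging the restricted class.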

Jeremy
Received on Thursday, 4 April 2002 07:31:25 EST
