Re: Unicode NFC - status, and RDF Concepts from David Wood on 2011-10-12 (www-international@w3.org from October to December 2011)

From: David Wood <david@3roundstones.com>
Date: Tue, 11 Oct 2011 22:49:14 -0400
To: Eric Prud'hommeaux <eric@w3.org>
Cc: "Phillips, Addison" <addison@lab126.com>, Jeremy Carroll <jeremy@topquadrant.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, John Cowan <cowan@mercury.ccil.org>, "www-international@w3.org" <www-international@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <E5C720A6-5167-4F3C-B34E-05C8CB6ADE3E@3roundstones.com>

On Oct 11, 2011, at 18:58, Eric Prud'hommeaux <eric@w3.org> wrote:

> * David Wood <david@3roundstones.com> [2011-10-11 17:00-0400]
>> 
>> On Oct 11, 2011, at 16:49, "Phillips, Addison" <addison@lab126.com> wrote:
>> 
>>>>> B)
>>>>> 2) drop the "SHOULD use NFC" requirement on literals
>>>> 
>>>> I'm good with this one, unless we decide to do something around our ISSUE-63:
>>>> http://www.w3.org/2011/rdf-wg/track/issues/63
>>>> 
>>> 
>>> For reasons I just outlined, I think this would be a mistake. By avoiding denormalized text, RDF users can help ensure interoperability. In practice, this is a no-op for implementers.
>> 
>> Why do you see it as a no-op?
> 
> I guess it depends on which implementors we're talking about, but most of the current stack (OWL, SPARQL, RIF implementers) are invoked after the implied pre-normalization step. They don't have to do any normalization. Exceptions would be those creating RDF from user input or mapping non-RDF data (e.g. RDBs) to RDF. For those folks, the advice to pre-normalize could help them to converge on one of many possible representations of e.g. product names.

Well, right, but it seems like normalizing RDF upon ingest into a triple store of any form would hurt, maybe a lot. I don't think we should just dismiss that without some analysis.

Regards,
Dave

> 
> I'm pretty confident that we don't want to rule out having non-normalized forms in the domain of discourse (especially since applying the same codepoint comparison works regardless of normalization), but that we'd like to *advise* folks to converge where it's in their interest to do so, and advising NFKC is a good path to that end. Thus, if we say "It is recommended to use Unicode Normalization Form KC [NFKC] for both literals and IRIs when there is no explicit reason to preserve the non-normalized form.", we probably hit the sweet spot (and most present implementors don't have to do anything).
> 
> 
>> Regards,
>> Dave
>> 
>>> 
>>> Addison
>> 
> 
> -- 
> -ericP
>
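
A minimal Python sketch (not part of the thread) of what the options discussed above mean in practice, using only the standard-library `unicodedata` module. The helper names `to_nfc` and `to_nfkc` are hypothetical. It shows that plain codepoint comparison treats canonically equivalent spellings as distinct, that pre-normalizing to NFC makes them converge, that normalizing on ingest rewrites the stored form (the concern about triple stores), and that NFKC additionally folds compatibility characters such as ligatures, which NFC leaves alone.

```python
import unicodedata


def to_nfc(s: str) -> str:
    """Hypothetical helper: canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)


def to_nfkc(s: str) -> str:
    """Hypothetical helper: compatibility composition (NFKC)."""
    return unicodedata.normalize("NFKC", s)


# "café" spelled two ways: precomposed U+00E9 vs. "e" + combining acute U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Codepoint comparison alone sees two different strings.
print(precomposed == decomposed)                  # False

# Pre-normalizing both to NFC makes them compare equal.
print(to_nfc(precomposed) == to_nfc(decomposed))  # True

# Normalizing on ingest changes the stored form of a decomposed literal.
print(to_nfc(decomposed) == decomposed)           # False

# NFKC goes further: the "fi" ligature U+FB01 is folded to plain "fi",
# while NFC leaves it untouched.
ligature = "\ufb01le"
print(to_nfc(ligature) == ligature)               # True
print(to_nfkc(ligature))                          # file
```

Note that NFKC is lossy with respect to compatibility distinctions (ligatures, full-width forms, superscripts), whereas NFC only unifies canonically equivalent sequences.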

Received on Wednesday, 12 October 2011 02:49:53 UTC