Re: Unicode NFC - status, and RDF Concepts from Andy Seaborne on 2011-10-14 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 14 Oct 2011 13:34:19 +0100
To: public-rdf-wg@w3.org
Message-ID: <4E982C4B.8010706@epimorphics.com>

On 12/10/11 03:49, David Wood wrote:
> On Oct 11, 2011, at 18:58, Eric Prud'hommeaux<eric@w3.org>  wrote:
>
>> * David Wood<david@3roundstones.com>  [2011-10-11 17:00-0400]
>>>
>>> On Oct 11, 2011, at 16:49, "Phillips, Addison"<addison@lab126.com>  wrote:
>>>
>>>>>> B)
>>>>>> 2) drop the "SHOULD use NFC" requirement on literals
>>>>>
>>>>> I'm good with this one, unless we decide to do something around our ISSUE-63:
>>>>> http://www.w3.org/2011/rdf-wg/track/issues/63
>>>>>
>>>>
>>>> For reasons I just outlined, I think this would be a mistake. By avoiding denormalized text, RDF users can help ensure interoperability. In practice, this is a no-op for implementers.
>>>
>>> Why do you see it as a no-op?
>>
>> I guess it depends on which implementors we're talking about, but most of the current stack (OWL, SPARQL, RIF implementers) are invoked after the implied pre-normalization step. They don't have to do any normalization. Exceptions would be those creating RDF from user input or mapping non-RDF data (e.g. RDBs) to RDF. For those folks, the advice to pre-normalize could help them to converge on one of many possible representations of e.g. product names.
>
> Well, right, but it seems like normalizing RDF upon ingest to a triple store of any form would hurt, maybe a lot.  I don't think we should just dismiss that without some analysis.
>
> Regards,
> Dave
>
>>
>> I'm pretty confident that we don't want to rule out having non-normalized forms in the domain of discourse (especially since applying the same codepoint comparison works regardless of normalization), but that we'd like to *advise* folks to converge where it's in their interest to do so and advising NFKC is a good path to that end. Thus, if we say "It is recommended to use Unicode Normal Form KC [NFKC] for both literals and IRIs when there is no explicit reason to preserve the non-normalized form.", we probably hit the sweet point (and most present implementors don't have to do anything).

Normalization could be an L2V-ism, viewed like the values 001 and +1:
i.e. it's nice to treat them as value-same, but don't expect it always
to be done for you.
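
A minimal sketch of that analogy in Python, assuming the standard-library
unicodedata module stands in for whatever a store would use. The helper
names same_term and same_value are made up for illustration and are not
from any RDF toolkit or spec. It shows two spellings of the same text
that differ by codepoint but agree once normalized, next to the integer
lexical forms "001" and "+1", which differ as strings but map to the
same value.

```python
import unicodedata

# Two spellings of the same text: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"


def same_term(a: str, b: str) -> bool:
    """Plain codepoint comparison; no normalization is applied."""
    return a == b


def same_value(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical value-level comparison: normalize both sides first."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_term(composed, decomposed))   # False: the codepoints differ
print(same_value(composed, decomposed))  # True: equal after NFC

# The integer analogy: different lexical forms, same value.
print("001" == "+1")                     # False
print(int("001") == int("+1"))           # True
```

Passing form="NFKC" to same_value gives the same answer for this pair;
whether a given system applies either step is exactly the part not to
count on.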

	Andy

Received on Friday, 14 October 2011 12:34:51 UTC