Re: Unicode NFC - status, and RDF Concepts from Martin J. Dürst on 2011-10-14 (www-international@w3.org from October to December 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Fri, 14 Oct 2011 15:45:41 +0900
To: "Phillips, Addison" <addison@lab126.com>
CC: Jeremy Carroll <jeremy@topquadrant.com>, John Cowan <cowan@mercury.ccil.org>, "www-international@w3.org" <www-international@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E97DA95.90006@it.aoyama.ac.jp>

On 2011/10/12 5:46, Phillips, Addison wrote:
>>
>> Addison, Martin,
>>
>> my attempt to summarize this discussion for the RDF people is as follows:
>>
>> 9 years ago, at the Cannes tech plenary, I18N-WG advised RDF Core WG
>> (i.e. Martin advised me!) that early uniform normalization was the way
>> to go, but that there was still debate about other approaches.
>
> I remember that conversation.
>
>>
>> As a result RDF Core WG deferred to the IRI draft for IRI normalization,
>> and used SHOULD language around literal normalization.
>>
>> The situation now is that there is still debate in the I18N community,
>> and there is less consensus around early uniform normalization than before.
>
> Actually, that's not quite true. There is genuine consensus that:
>
> 1. Early Uniform Normalization would have been a Good Thing, but that it is impossible to apply it at this late date (we actually came to that conclusion at the aforementioned technical plenary nine years ago).

As I said in another mail, *strict* Early Uniform Normalization doesn't 
work, but *best-effort* Early Uniform Normalization is still very 
worthwhile.


> 2. There exist cases in which automatic normalization of content that is not already normalized is a Bad Thing. If the "should" has been ignored, the content should not later be transparently normalized. Hence...
>
> 3. Since users cannot always see or control the code point sequences used to represent their particular content, there may be difficulty in making their content "work correctly" in normalization-affected cases. This means that "SHOULD" is still a good recommendation for content authors and is the reason you definitely should not remove it.

Definitely.
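
To make point 3 concrete, here is a minimal sketch in Python. It is only an illustration, not anything RDF Concepts or CharMod specifies, and the helper name nfc_equal is hypothetical. It shows two strings that render identically but use different code point sequences, so a plain code point comparison fails unless both sides are normalized to NFC first:

```python
import unicodedata

# Two visually identical spellings of the same name.
composed = "D\u00fcrst"     # precomposed U+00FC (already in NFC)
decomposed = "Du\u0308rst"  # "u" followed by combining diaeresis U+0308

def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Plain code point comparison treats them as different strings.
print(composed == decomposed)

# Comparison after normalizing both sides treats them as equal.
print(nfc_equal(composed, decomposed))
```

If content is already in NFC when it is created, the plain comparison is enough, which is what the "SHOULD" is meant to encourage.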

Regards,   Martin.

> Lack of Early Uniform Normalization means that content that is not Unicode normalized may not meet user expectations when being matched, selected, processed, etc. Note that this is precisely what Appendix J in XML addresses. The I18N community, as a last glimmer of activity, is hoping to get a change in string comparison behavior in specifications (which is *not* a change from the existing CharMod, please note, but *is* a change from the de facto state), but this doesn't affect what RDF does in this case.
>
>>
>> Hence, reasonable options for RDF WG are
>>
>> 1) drop the informative reference to the normalization section of
>> charmod, but otherwise make no change
>> 2) drop the "SHOULD use NFC" requirement on literals
>> 3) update the informative reference to the normalization section of
>> charmod, but it is unclear quite to what since the situation has become
>> more confused since the 2004 publication
>
> There will be closure inside the next few months. Either way, we will be in a position to finalize CharMod in a way that is consistent with what specs actually do. I would actually do (3).
>
>>
>> RDF systems do compare literals fairly frequently, but not usually as
>> 'identifiers'
>
> No, I know it. And RDF systems will have to deal with non-normalized content as a result. The "should" recommendation makes a lot of sense (it makes literal comparison "just work"), but I believe it would be wrong to go beyond that normatively.
>
>>
>> I guess I am saying that RDF Concepts did "normatively address
>> normalization issues" but it seems that that is a moving target, so
>> maybe it was an error to try.
>>
>
> Actually, the target is pretty stable. The only remaining question (which doesn't appear to apply to RDF at present) is whether we do or do not recommend normalization on the comparison of identifiers.
>
> Addison

Received on Friday, 14 October 2011 06:46:14 UTC