Re: Unicode NFC - status, and RDF Concepts from Eric Prud'hommeaux on 2011-10-11 (www-international@w3.org from October to December 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 11 Oct 2011 18:58:50 -0400
To: David Wood <david@3roundstones.com>
Cc: "Phillips, Addison" <addison@lab126.com>, Jeremy Carroll <jeremy@topquadrant.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, John Cowan <cowan@mercury.ccil.org>, "www-international@w3.org" <www-international@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <20111011225848.GE10078@w3.org>

* David Wood <david@3roundstones.com> [2011-10-11 17:00-0400]
> 
> On Oct 11, 2011, at 16:49, "Phillips, Addison" <addison@lab126.com> wrote:
> 
> >>> B)
> >>> 2) drop the "SHOULD use NFC" requirement on literals
> >> 
> >> I'm good with this one, unless we decide to do something around our ISSUE-63:
> >>  http://www.w3.org/2011/rdf-wg/track/issues/63
> >> 
> > 
> > For reasons I just outlined, I think this would be a mistake. By avoiding denormalized text, RDF users can help ensure interoperability. In practice, this is a no-op for implementers.
> 
> Why do you see it as a noop?

I guess it depends on which implementers we're talking about, but most of the current stack (OWL, SPARQL and RIF implementations) runs after the implied pre-normalization step, so those implementers don't have to do any normalization. The exceptions would be those creating RDF from user input or mapping non-RDF data (e.g. RDBs) to RDF. For those folks, the advice to pre-normalize could help them converge on one of the many possible representations of, e.g., product names.

I'm pretty confident that we don't want to rule out having non-normalized forms in the domain of discourse (especially since applying the same codepoint comparison works regardless of normalization), but that we'd like to *advise* folks to converge where it's in their interest to do so, and advising NFKC is a good path to that end. Thus, if we say "It is recommended to use Unicode Normal Form KC [NFKC] for both literals and IRIs when there is no explicit reason to preserve the non-normalized form.", we probably hit the sweet spot (and most present implementers don't have to do anything).
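
To make the codepoint-comparison point concrete, here is a minimal Python sketch of what a producer (someone turning user input or RDB rows into literals) might do. It only assumes the standard library's unicodedata module; the helper names normalize_literal and same_literal, and the sample strings, are hypothetical illustrations, not anything defined by RDF Concepts.

```python
import unicodedata


def normalize_literal(text: str, form: str = "NFKC") -> str:
    """Return text in the given Unicode normalization form.

    Hypothetical helper for a producer that has no explicit reason
    to preserve the non-normalized form.
    """
    return unicodedata.normalize(form, text)


def same_literal(a: str, b: str) -> bool:
    """Plain codepoint-by-codepoint comparison, as a consumer would do.

    No normalization happens here; it works on any input, normalized or not.
    """
    return a == b


# Two spellings of the same product name:
composed = "Caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "Cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Without pre-normalization the two literals are simply different terms.
raw_equal = same_literal(composed, decomposed)

# If both producers pre-normalize, the literals converge on one form.
normalized_equal = same_literal(
    normalize_literal(composed), normalize_literal(decomposed)
)

# NFKC also folds compatibility characters, which NFC leaves alone.
ligature = "\ufb01le"  # starts with U+FB01 LATIN SMALL LIGATURE FI
nfc_ligature = normalize_literal(ligature, "NFC")
nfkc_ligature = normalize_literal(ligature, "NFKC")

if __name__ == "__main__":
    print(raw_equal, normalized_equal)
    print(nfc_ligature == ligature, nfkc_ligature == "file")
```

The comparison function never changes; only the producers that pre-normalize end up emitting the same codepoints, which is why the advice costs the downstream stack nothing.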

> Regards,
> Dave
> 
> > 
> > Addison
> 

-- 
-ericP

Received on Tuesday, 11 October 2011 22:59:35 UTC