Re: Unicode NFC - status, and RDF Concepts from Martin J. Dürst on 2011-10-13 (www-international@w3.org from October to December 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Thu, 13 Oct 2011 10:48:49 +0900
To: John Cowan <cowan@mercury.ccil.org>
CC: "Phillips, Addison" <addison@lab126.com>, Jeremy Carroll <jeremy@topquadrant.com>, "www-international@w3.org" <www-international@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E964381.7080500@it.aoyama.ac.jp>

On 2011/10/11 23:57, John Cowan wrote:
> Phillips, Addison scripsit:
>
>> XML is an interesting case because it makes the opposite decision
>> consciously: two canonically-equivalent but unequal identifiers are
>> not equal.
>
> And this applies to both XML names and to namespace URIs.

And it therefore also applies to RDF URIs, because otherwise you would 
get very weird inconsistencies between RDF/XML and the rest of RDF.

In RDF, it's the RDF URIs that play the role of identifiers and hold the 
RDF graph together. RDF Literals are just data hung off the graph, like 
hydrogen atoms in chemistry, and can be anything from typed data to 
short strings to very long text (think dc:description or something 
similar). Therefore, it seems rather counterintuitive (to say the least) 
to require codepoint-by-codepoint comparison for the identifiers but 
normalization before comparison for literals.
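
To make the two kinds of comparison concrete, here is a minimal Python 
sketch. It is only an illustration, not taken from any RDF spec: the 
function names and the example.org URI are made up, and only the standard 
unicodedata module is assumed.

```python
import unicodedata

def codepoint_equal(a: str, b: str) -> bool:
    # Plain string equality compares code point by code point.
    return a == b

def nfc_equal(a: str, b: str) -> bool:
    # Normalize both sides to NFC first, then compare.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Hypothetical URI, for illustration only.
uri_nfc = "http://example.org/people/D\u00fcrst"    # precomposed u-diaeresis
uri_nfd = "http://example.org/people/Du\u0308rst"   # u + combining diaeresis

print(codepoint_equal(uri_nfc, uri_nfd))  # False
print(nfc_equal(uri_nfc, uri_nfd))        # True
```

Under the XML-style rule, the two spellings are different identifiers. 
Applying nfc_equal to literals only would treat the same pair of strings 
as equal in one position and unequal in the other.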

My understanding is that comparing literals in RDF involves some 
guessing in many cases. As an example, suppose there are different 
literals that read:

a) Martin Dürst
b) Martin J. Dürst
c) the same as a), but in NFD
d) the same as b), but in NFD
e) Martin Duerst
f) martin duerst
g) Martin Dürst
and so on.

Then it may easily be that a) and g) are not the same person even though 
the literals are codepoint-by-codepoint identical, and on the other 
hand, a), b), e), and f) are the same person despite not being equal 
even if you take normalization into account.
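
As a rough sketch of this point, the following Python groups the literals 
a) to g) under three mechanical comparison keys. The three key functions 
and the groups helper are hypothetical names chosen for illustration, and 
the keys are just examples of choices an application might make.

```python
import unicodedata

name = "Martin D\u00fcrst"        # precomposed (NFC) form
name_j = "Martin J. D\u00fcrst"

literals = {
    "a": name,
    "b": name_j,
    "c": unicodedata.normalize("NFD", name),
    "d": unicodedata.normalize("NFD", name_j),
    "e": "Martin Duerst",
    "f": "martin duerst",
    "g": name,                     # same code points as a)
}

def key_codepoint(s: str) -> str:
    # No processing: code point by code point.
    return s

def key_nfc(s: str) -> str:
    # Canonical normalization before comparison.
    return unicodedata.normalize("NFC", s)

def key_nfc_casefold(s: str) -> str:
    # Normalization plus case folding.
    return unicodedata.normalize("NFC", s).casefold()

def groups(key):
    # Collect the labels of all literals that share the same key.
    buckets = {}
    for label, text in literals.items():
        buckets.setdefault(key(text), []).append(label)
    return sorted(buckets.values())

print(groups(key_codepoint))     # [['a', 'g'], ['b'], ['c'], ['d'], ['e'], ['f']]
print(groups(key_nfc))           # [['a', 'c', 'g'], ['b', 'd'], ['e'], ['f']]
print(groups(key_nfc_casefold))  # [['a', 'c', 'g'], ['b', 'd'], ['e', 'f']]
```

None of the three keys puts a), b), e), and f) in one group, and all of 
them put a) and g) together, so none of them answers the "same person" 
question by itself.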

So in my understanding of RDF and the Semantic Web, comparing RDF 
Literals is an issue where each application has to make its own choices, 
and the relevant specs (e.g. SPARQL) should offer these various choices 
where feasible.

Regards,   Martin.

Received on Thursday, 13 October 2011 01:49:20 UTC