Re: Unicode NFC - status, and RDF Concepts

On 2011/10/11 23:57, John Cowan wrote:
> Phillips, Addison scripsit:
> 
>> XML is an interesting case because it makes the opposite decision
>> consciously: two canonically-equivalent but unequal identifiers are
>> not equal.
> 
> And this applies to both XML names and to namespace URIs.

And it therefore also applies to RDF URIs, because otherwise you would get very weird inconsistencies between RDF/XML and the rest of RDF.

In RDF, it's the RDF URIs that play the role of identifiers and hold the RDF graph together. RDF Literals are just data hung off the graph, like hydrogen atoms in chemistry, and can be anything from typed data to short strings to very long text (think dc:description or something similar). Therefore, it seems rather counterintuitive (to say the least) to require codepoint-by-codepoint comparison for the identifiers but normalization before comparison for the literals.
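To make the two comparison rules concrete, here is a minimal sketch using Python's standard unicodedata module. The helper names codepoint_equal and nfc_equal are hypothetical and only illustrate the difference; they are not taken from any RDF specification or library.

```python
import unicodedata

# "Martin Dürst" with precomposed U+00FC (NFC) vs. "u" + combining U+0308 (NFD)
nfc = "Martin D\u00fcrst"
nfd = "Martin Du\u0308rst"

def codepoint_equal(a: str, b: str) -> bool:
    # Codepoint-by-codepoint comparison, as used for XML names,
    # namespace URIs and (by extension) RDF URIs
    return a == b

def nfc_equal(a: str, b: str) -> bool:
    # Normalize both sides to NFC before comparing
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(codepoint_equal(nfc, nfd))  # False
print(nfc_equal(nfc, nfd))        # True
```

Requiring the first rule for URIs but the second for literals means the same two strings would compare differently depending on where they appear in the graph.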

My understanding is that comparing literals in RDF often involves some guessing. As an example, suppose there are different literals that read:

a) Martin Dürst
b) Martin J. Dürst
c) a) in NFD
d) b) in NFD
e) Martin Duerst
f) martin duerst
g) Martin Dürst
and so on.

Then it may easily be that a) and g) are not the same person even though the literals are codepoint-by-codepoint identical, and, on the other hand, that a), b), e), and f) are the same person even though they are not equal, even if you take normalization into account.

So in my understanding of RDF and the Semantic Web, comparing RDF Literals is an issue where each application has to make its own choices, and the relevant specs (e.g. SPARQL) should offer these various choices where feasible.
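As an illustration of "each application makes its own choices", here is a minimal Python sketch in which the comparison strategy is a pluggable key function applied to literals a) to f) above. The strategy names and the "loose" folding (NFC, then replacing ü/Ü with ue/Ue, then case folding) are hypothetical assumptions for illustration only; they are not SPARQL functions or part of any RDF specification.

```python
import unicodedata
from typing import Callable

# Literals a) to f) from the list above; c) and d) are a) and b) in NFD.
a = "Martin D\u00fcrst"
b = "Martin J. D\u00fcrst"
c = unicodedata.normalize("NFD", a)
d = unicodedata.normalize("NFD", b)
e = "Martin Duerst"
f = "martin duerst"

def codepoint_key(s: str) -> str:
    # No transformation: plain codepoint-by-codepoint comparison
    return s

def nfc_key(s: str) -> str:
    # Unicode normalization only
    return unicodedata.normalize("NFC", s)

def loose_key(s: str) -> str:
    # Hypothetical application-specific folding: NFC, German-style
    # umlaut transliteration (only ü/Ü handled here), then case folding
    s = unicodedata.normalize("NFC", s)
    s = s.replace("\u00fc", "ue").replace("\u00dc", "Ue")
    return s.casefold()

STRATEGIES: dict[str, Callable[[str], str]] = {
    "codepoint": codepoint_key,
    "nfc": nfc_key,
    "loose": loose_key,
}

def matches(x: str, y: str, strategy: str) -> bool:
    # Two literals match if their keys under the chosen strategy are equal
    key = STRATEGIES[strategy]
    return key(x) == key(y)

literals = {"a": a, "b": b, "c": c, "d": d, "e": e, "f": f}
for name in STRATEGIES:
    same_as_a = [k for k, v in literals.items() if matches(a, v, name)]
    print(name, same_as_a)
# codepoint ['a']
# nfc ['a', 'c']
# loose ['a', 'c', 'e', 'f']
```

Even the loosest strategy here does not match b), which needs a guess about the middle initial, and no string comparison at all can tell whether a) and g) denote the same person.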

Regards,   Martin.

Received on Thursday, 13 October 2011 05:47:49 UTC