IRI normalization

Some Unicode characters are called "combining characters". NCCHAR
permits some of these, e.g. [#x0300-#x036F]:

NCCHAR ::= NCCHAR1 | '_' | '-' | "." | [0-9] | #x00B7
           | [#x0300-#x036F] | [#x203F-#x2040]

At issue is whether RDF data may have both of these predicates:
  HR:resumé (precomposed EACUTE, U+00E9)
  HR:resumé (Latin 'e' followed by U+0301 COMBINING ACUTE ACCENT)
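
To make the difference concrete, here is a minimal sketch in Python
(standard library only). The variable names are mine, and the strings
are just the local name from the example above, not anything taken
from the test suite:

```python
import unicodedata

# Two spellings of the local name from the example above.
precomposed = "resum\u00e9"    # ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "resume\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Compared code point by code point, the two names are different strings.
same_raw = precomposed == decomposed

# After normalizing both to NFC, they are the same string.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

# U+0301 falls inside the NCCHAR range [#x0300-#x036F] quoted above.
in_ncchar_range = 0x0300 <= ord("\u0301") <= 0x036F

print(same_raw, same_nfc, in_ncchar_range)
```

So a processor that compares names code point by code point would
treat these as two distinct predicates unless something normalizes
them first.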

Some time between Jan [200301] and Sep [200309], the RDF Core WG
dropped the following text from the RDF URI References definition:
[[
  [A URI reference] is in Normal Form C [NFC] and
..
  Note: RDF URI references are compatible with the  anyURI datatype as
  defined by XML schema datatypes [XML-SCHEMA2], constrained to be an
  absolute rather than a relative URI reference, and constrained to be
  in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
]]

The reason [why] given is:
[[
  Given changes in advice from I18N, we deleted the normalization form
  C constraint from RDF URI references definition.
]]

The spec references RFC2396 but not RFC3987 (published later, in Jan
2005). [RFC3987] has this to say about normalization:
[[
  a. If the IRI is written on paper, read aloud, or otherwise
  represented as a sequence of characters independent of any character
  encoding, represent the IRI as a sequence of characters from the UCS
  normalized according to Normalization Form C (NFC, [UTR15]).
]]
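
As a rough illustration of the character-level step only, the sketch
below normalizes an IRI string to NFC. The helper names and the
example.org IRI are hypothetical, and this does not attempt any of the
other processing RFC3987 describes (percent-encoding, IDN handling,
and so on):

```python
import unicodedata

def iri_to_nfc(iri: str) -> str:
    """Return the IRI with its characters normalized to NFC."""
    return unicodedata.normalize("NFC", iri)

def iri_is_nfc(iri: str) -> bool:
    """True if normalizing to NFC would leave the IRI unchanged."""
    return iri == iri_to_nfc(iri)

# Hypothetical IRI whose last character is 'e' + COMBINING ACUTE ACCENT.
example = "http://example.org/hr#resume\u0301"

print(iri_is_nfc(example))               # not yet in NFC
print(iri_is_nfc(iri_to_nfc(example)))   # in NFC after normalizing
```

Whether an RDF or SPARQL processor should apply such a step, or only
warn about IRIs that are not in NFC, is exactly the open question here.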

If RDF data is supposed to be normalized, then we should do the same
with SPARQL Query. Still researching. Relevant test case:
  http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization-01.rq
on
  http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl


[200301] http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#dfn-URI-reference
[200309] http://www.w3.org/TR/2003/WD-rdf-concepts-20030905/#dfn-URI-reference
[why] http://www.w3.org/TR/2003/WD-rdf-concepts-20030905/#section-substantive-Revisions
[RFC3987] http://www.ietf.org/rfc/rfc3987.txt
-- 
-eric

(eric@w3.org)

Received on Tuesday, 28 June 2005 10:37:23 UTC