
Charmod-Literal

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Wed, 3 Apr 2002 19:34:40 +0100
To: <w3c-rdfcore-wg@w3.org>
Cc: <bwm@hplb.hpl.hp.com>
Message-ID: <JAEBJCLMIFLKLOJGMELDCEJBCDAA.jjc@hplb.hpl.hp.com>
At the last telecon, discussion of the charmod-literal issue (number 13) collapsed because DanC did not like the erratum process that was being proposed.

I want to have another go, aiming at essentially the same resolution but using grey test cases.

I am sending this as HTML in UTF-8 to try to prevent the non-ASCII characters from being munged.


--------------------------------------------------------------------------------

White test case:

RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
   <!-- Dürst registers himself as a creator of the Charmod WD. -->
   <rdf:Description rdf:about="http://www.w3.org/TR/2002/WD-charmod-20020220">
   <!-- The ü below is a single character #xFC in NFC -->
      <eg:Creator eg:named="Dürst"/>
   </rdf:Description>
</rdf:RDF>

N-Triples

_:j21411 <http://example.org/named> "D\u00FCrst" .
<http://www.w3.org/TR/2002/WD-charmod-20020220> <http://example.org/Creator> _:j21411 .
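The N-Triples line above writes the ü as `\u00FC`. The following is a minimal sketch of how such an escape can be produced. It assumes the N-Triples convention of the RDF test cases, where literals are written in ASCII and any other character is escaped as `\u` plus 4 hex digits (or `\U` plus 8 beyond the BMP). `ntriples_escape` is a hypothetical helper and not part of any RDF toolkit.

```python
def ntriples_escape(s: str) -> str:
    """Hypothetical helper: escape a string for an ASCII-only N-Triples literal."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])           # backslash escapes for special characters
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                   # printable ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)       # BMP character: \u plus 4 hex digits
        else:
            out.append("\\U%08X" % cp)       # beyond the BMP: \U plus 8 hex digits
    return "".join(out)


print('"%s"' % ntriples_escape("D\u00fcrst"))  # prints "D\u00FCrst"
```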


--------------------------------------------------------------------------------


DaveB: have I encoded the ü correctly?


--------------------------------------------------------------------------------

Black test case 1:

Not RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
<!-- Someone else registers himself, under an otherwise unused spelling of Dürst,
        as the creator of some other work. -->
   <rdf:Description rdf:about="http://example.org/adult-content.html">
   <!-- The ü below is two characters: a u followed by
          #x308. It should be displayed identically to ü. -->
      <eg:Creator eg:named="Dürst"/>
   </rdf:Description>
</rdf:RDF>

Rationale: the literal value is not in NFC.
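The NFC test in this rationale can be illustrated with Python's standard `unicodedata` module. In this sketch a string counts as being in NFC exactly when NFC normalization leaves it unchanged. `is_nfc` is a helper name introduced for illustration only.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    # A string is in NFC exactly when NFC normalization leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

white = "D\u00fcrst"   # precomposed ü (U+00FC), as in the white test case
black = "Du\u0308rst"  # u followed by COMBINING DIAERESIS (U+0308), as in black test case 1

print(is_nfc(white))                                 # True
print(is_nfc(black))                                 # False
print(unicodedata.normalize("NFC", black) == white)  # True: NFC composes u + U+0308 into U+00FC
```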

--------------------------------------------------------------------------------

Black test case 2:

Not RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
<!-- Someone else registers himself, under an otherwise unused spelling of Dürst,
        as the creator of some other work. -->
   <rdf:Description rdf:about="http://example.org/adult-content.html">
   <!-- The u&#x308; below denotes two characters: a u followed by
          #x308. It should be displayed identically to ü. -->
      <eg:Creator eg:named="Du&#x308;rst"/>
   </rdf:Description>
</rdf:RDF>

Rationale: the literal value is not in NFC.
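In this case the document text itself is pure ASCII, so it is trivially in NFC. The NFC violation only appears after the XML parser expands `&#x308;`. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for an RDF/XML parser's XML layer. `named_values` is a hypothetical helper that pulls out the `eg:named` attribute values.

```python
import unicodedata
import xml.etree.ElementTree as ET

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
   <rdf:Description rdf:about="http://example.org/adult-content.html">
      <eg:Creator eg:named="Du&#x308;rst"/>
   </rdf:Description>
</rdf:RDF>"""

def named_values(doc: str) -> list[str]:
    # The XML parser expands &#x308; before the RDF layer sees the value,
    # so the NFC check has to be applied to the parsed attribute value.
    root = ET.fromstring(doc)
    return [el.get("{http://example.org/}named")
            for el in root.iter("{http://example.org/}Creator")]

doc_is_nfc = unicodedata.normalize("NFC", DOC) == DOC
values = named_values(DOC)
print(doc_is_nfc)     # True: the raw text is pure ASCII
print(ascii(values))  # ['Du\u0308rst']
print(unicodedata.normalize("NFC", values[0]) == values[0])  # False: the literal is not in NFC
```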


--------------------------------------------------------------------------------

Grey test case 1:

 
Maybe RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/#">
<!-- A string beginning with a non-spacing umlaut. -->
   <rdf:Description>
   <!-- The ̈ below is a non-spacing umlaut, i.e. #x308. -->
      <eg:strange>̈foo</eg:strange>
   </rdf:Description>
</rdf:RDF>

Possibly corresponding N-Triples:

_:j21475 <http://example.org/#strange> "\u0308foo" .
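The literal here is already in NFC. What makes it grey is that it begins with a combining character. In the sketch below, a non-zero canonical combining class (from `unicodedata.combining`) stands in for "combining character". That is an assumption: charmod's own definition of a composing character may not match it exactly. `starts_with_combining` is a hypothetical helper.

```python
import unicodedata

def starts_with_combining(s: str) -> bool:
    # unicodedata.combining() returns the canonical combining class,
    # which is non-zero for combining marks such as U+0308.
    return bool(s) and unicodedata.combining(s[0]) != 0

literal = "\u0308foo"
print(unicodedata.normalize("NFC", literal) == literal)  # True: no preceding base to compose with
print(starts_with_combining(literal))                    # True
```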

--------------------------------------------------------------------------------

Grey test case 2:

Maybe RDF/XML
 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/#">
<!-- A string beginning with a non-spacing umlaut. -->
   <rdf:Description>
   <!-- The &#x308; below is a non-spacing umlaut. -->
      <eg:strange>&#x308;bar</eg:strange>
   </rdf:Description>
</rdf:RDF>

Possibly corresponding N-Triples:

_:j21475 <http://example.org/#strange> "\u0308bar" .
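Grey test cases 1 and 2 differ only in how U+0308 is written: typed directly, or as the character reference `&#x308;`. After XML parsing, both yield the same literal. The sketch below checks this with `xml.etree.ElementTree` and uses the same suffix "bar" for both forms so the results can be compared directly. `strange_literal` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def strange_literal(content: str) -> str:
    # Hypothetical helper: embed the element content in a minimal document
    # shaped like the grey test cases and return the parsed eg:strange text.
    doc = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
           'xmlns:eg="http://example.org/#"><rdf:Description>'
           "<eg:strange>" + content + "</eg:strange>"
           "</rdf:Description></rdf:RDF>")
    root = ET.fromstring(doc)
    return next(root.iter("{http://example.org/#}strange")).text

raw = strange_literal("\u0308bar")       # U+0308 typed directly, as in grey test case 1
escaped = strange_literal("&#x308;bar")  # character reference, as in grey test case 2
print(raw == escaped == "\u0308bar")     # True
```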

--------------------------------------------------------------------------------


Black test case 3:

Not RDF/XML
 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/#">
<!-- A string beginning with a combining long solidus overlay. -->
   <rdf:Description>
   <!-- The first character of the string below is a COMBINING LONG SOLIDUS OVERLAY
        U+0338. 
       It combines with the > of the XML tag to make U+226F NOT GREATER-THAN.
      Thus this document is not in NFC.
     -->
      <eg:strange≯foobar</eg:strange>
   </rdf:Description>
</rdf:RDF>
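The composition described in the comment can be checked directly. U+226F NOT GREATER-THAN canonically decomposes to `>` followed by U+0338, so NFC normalization of the raw text composes the tag's `>` with the combining character. This sketch normalizes just the affected fragment.

```python
import unicodedata

fragment = "<eg:strange>\u0338foobar</eg:strange>"  # raw U+0338 directly after ">"
normalized = unicodedata.normalize("NFC", fragment)

print(normalized == fragment)                      # False: the raw text is not in NFC
print(normalized.startswith("<eg:strange\u226f"))  # True: the tag's ">" has been absorbed into U+226F
```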

--------------------------------------------------------------------------------


Grey test case 3:

Maybe RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/#">
<!-- A string beginning with a combining long solidus overlay. -->
   <rdf:Description>
   <!-- The first character of the string below is a character reference to
        COMBINING LONG SOLIDUS OVERLAY U+0338.
        Because it is escaped, it does not combine with the > of the XML tag,
        so this document is in NFC, but the literal begins with a combining character.
     -->
      <eg:strange>&#x338;foobar</eg:strange>
   </rdf:Description>
</rdf:RDF>


Possibly corresponding N-Triples:

_:j21573 <http://example.org/#strange> "\u0338foobar" .
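Black test case 3 and grey test case 3 produce the same literal once parsed, so a parser that only sees parsed literals cannot tell them apart. Only a check on the raw document text distinguishes them. The sketch below uses `xml.etree.ElementTree` as a stand-in for the XML layer. It is not how ARP works internally, and `literal` and `text_is_nfc` are hypothetical helpers.

```python
import unicodedata
import xml.etree.ElementTree as ET

PREFIX = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:eg="http://example.org/#"><rdf:Description><eg:strange>')
SUFFIX = "</eg:strange></rdf:Description></rdf:RDF>"

black3 = PREFIX + "\u0338foobar" + SUFFIX  # raw U+0338 straight after the tag's ">"
grey3 = PREFIX + "&#x338;foobar" + SUFFIX  # the same character as a character reference

def literal(doc: str) -> str:
    # Return the parsed text content of eg:strange.
    root = ET.fromstring(doc)
    return next(root.iter("{http://example.org/#}strange")).text

def text_is_nfc(doc: str) -> bool:
    # Check the raw document text, before any XML parsing.
    return unicodedata.normalize("NFC", doc) == doc

print(literal(black3) == literal(grey3) == "\u0338foobar")  # True: identical literals
print(text_is_nfc(black3), text_is_nfc(grey3))              # False True
```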

--------------------------------------------------------------------------------

I propose that:
a) the white test case is legal RDF/XML
b) black test cases 1 and 2 are not RDF/XML
c) conformant RDF processors may treat grey test cases 1 and 2 in either of the following ways:
   - reject the input with an error message
   - process the input to produce the given N-Triples, preferably with a warning message
d) black test case 3 is not RDF/XML
e) grey test case 3 is treated as under (c)
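The proposal above can be read as a small decision procedure. The sketch below encodes it under stated assumptions. The caller is assumed to have already parsed the document and collected the literal values with character references expanded. "Combining character" is approximated by a non-zero canonical combining class. `Verdict` and `classify` are hypothetical names, not part of ARP or any RDF API.

```python
import unicodedata
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"                      # white: legal RDF/XML
    REJECT = "reject"                      # black: not RDF/XML
    ACCEPT_OR_REJECT = "accept-or-reject"  # grey: the processor may choose

def _is_nfc(s: str) -> bool:
    return unicodedata.normalize("NFC", s) == s

def classify(document_text: str, literals: list[str]) -> Verdict:
    # (d) the raw document text must itself be in NFC.
    if not _is_nfc(document_text):
        return Verdict.REJECT
    # (b) every parsed literal must be in NFC.
    if not all(_is_nfc(lit) for lit in literals):
        return Verdict.REJECT
    # (c)/(e) NFC literals that begin with a combining character are grey:
    # reject, or accept (preferably with a warning).
    if any(lit and unicodedata.combining(lit[0]) for lit in literals):
        return Verdict.ACCEPT_OR_REJECT
    # (a) everything else is legal.
    return Verdict.ACCEPT

print(classify("<a>&#x338;foobar</a>", ["\u0338foobar"]).value)  # accept-or-reject
```

The document-level check comes first because, as in black test case 3, a non-NFC file is rejected whatever its literals turn out to be.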


Justifications:
a) the literal is in NFC.
b) the literals are not in NFC.
c) the literals start with a combining character.
   Implementations conforming to the current version of charmod should reject this; implementations conforming to earlier versions of charmod would accept it. There does not yet appear to be any consensus on strings that start with a combining character.
   I see it as desirable to:
   - permit implementations that conform to our best guess at what the charmod requirement will say (i.e. the current WD)
   - avoid over-committing to an I18N requirement that lacks stability and consensus
d) the file as a whole is not in NFC, so it is not legal under any recent version of charmod.
e) ARP cannot distinguish this case from black test case 3, but it is analogous to (c): the file is in NFC and only the literal starts with a combining character.


(Note: I believe I have used the N-Triples Unicode escaping mechanism correctly, but I would prefer that we not get hung up on that aspect.)


Brian,

Can I suggest we try discussing the test cases this week? If we can agree on the behaviour, I can then produce text to resolve the issue at a later telecon.

Jeremy
Received on Wednesday, 3 April 2002 13:36:54 EST
