[RESOLVED] Re: surrogates in literals

On 3/24/13 7:51 AM, Eric Prud'hommeaux wrote:
> * Dave Beckett <dave@dajobe.org> [2013-03-23 15:38-0700]
>> … [eliding license issues addressed in a separate sub-thread]
>> I've got some tests I made for raptor after the original Turtle submission
>> that the WG might want to use.  I give permission for them to be used
>> under the W3C software license
>> http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
>>
>> This is what they test:
>> … [eliding other tests addressed in a separate sub-thread]
>>    test-38.ttl - unicode surrogates ok or not
> 
> <http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-literal>'s
> reference to a "Unicode string" means that "\ud801\udc69" is not a
> valid RDF literal:
> 
> [[
> D80 Unicode string:
> A code unit sequence containing code units of a particular Unicode
> encoding form
> …
> D92 UTF-8 encoding form:
> The Unicode encoding form that assigns each Unicode scalar value to an
> unsigned byte sequence of one to four bytes in length, as specified in
> Table 3-6 and Table 3-7.
> …
> 
> • Because surrogate code points are not Unicode scalar values,
>   any UTF-8 byte sequence that would otherwise map to code points
>   D800..DFFF is ill-formed.
> ]]
> — <http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf> D80-D92
> 
> I propose to add a note in the non-normative description of quoted
> literals <http://www.w3.org/TR/turtle/#turtle-literals>:
> [[
> Note that, because RDF literals are Unicode strings, they must be composed of
> valid Unicode characters. The code points in the Unicode surrogate
> code range, U+D800-U+DFFF, are not Unicode characters.
> ]]
> 
> Per Andy Seaborne's request to test good practice
> <http://www.w3.org/mid/514AE55F.5080103@epimorphics.com>, but in order
> to not burden implementations, I have not included test 38 as a
> negative test.
> 
> If you are satisfied with the resolution of test 38, please reply with
> [RESOLVED] in the subject.
> 
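
For illustration only, here is a minimal Python sketch of the distinction drawn in the quoted text: surrogate code points (U+D800-U+DFFF) are not Unicode scalar values, so a string containing them has no well-formed UTF-8 encoding. The function names are hypothetical and are not taken from the Turtle spec, raptor, or the test suite. The sketch also assumes that each \uXXXX escape is decoded to its own code point, which is how Python treats its own string literals; a Turtle parser may behave differently.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value, i.e. a code point outside D800..DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def has_surrogates(s: str) -> bool:
    """True if any code point in s falls in the surrogate range."""
    return any(not is_scalar_value(ord(ch)) for ch in s)


def encodes_as_utf8(s: str) -> bool:
    """True if s can be encoded as well-formed UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python's strict UTF-8 encoder refuses surrogate code points.
        return False


# Two separate surrogate code points, as in the "\ud801\udc69" example.
escaped_pair = "\ud801\udc69"

# The single scalar value that this pair would denote in UTF-16:
# 0x10000 + ((0xD801 - 0xD800) << 10) + (0xDC69 - 0xDC00) == 0x10469
single_char = "\U00010469"

for label, text in (("escaped pair", escaped_pair), ("single char", single_char)):
    print(label, "surrogates:", has_surrogates(text), "utf-8 ok:", encodes_as_utf8(text))
```

A parser that wants to follow the proposed note could run a check like has_surrogates on each decoded literal and warn or reject, depending on how strict it chooses to be.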

Received on Monday, 8 April 2013 13:58:32 UTC