- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sun, 24 Mar 2013 10:51:54 -0400
- To: Dave Beckett <dave@dajobe.org>
- Cc: public-rdf-comments@w3.org
* Dave Beckett <dave@dajobe.org> [2013-03-23 15:38-0700]
> … [eliding license issues addressed in a separate sub-thread]
> I've got some tests I made for raptor after the original Turtle submission
> that the WG might want to use. I give permission for them to be used
> under the W3C software license
> http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
>
> This is what they test:
> … [eliding other tests addressed in a separate sub-thread]
> test-38.ttl - unicode surrogates ok or not

<http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-literal>'s
reference to a "Unicode string" means that "\ud801\udc69" is not a valid
RDF literal:
[[
D80 Unicode string: A code unit sequence containing code units of a
particular Unicode encoding form
…
D92 UTF-8 encoding form: The Unicode encoding form that assigns each
Unicode scalar value to an unsigned byte sequence of one to four bytes in
length, as specified in Table3-6 and Table3-7.
…
• Because surrogate code points are not Unicode scalar values, any UTF-8
byte sequence that would otherwise map to code points D800..DFFF is
ill-formed.
]] -- <http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf> D80-D92

I propose to add a note in the non-normative description of quoted
literals <http://www.w3.org/TR/turtle/#turtle-literals>:
[[
Note that RDF literals are Unicode strings, they must be composed of valid
Unicode characters. The code points in the Unicode surrogate code range,
U+D800-U+DFFF, are not Unicode characters.
]]

Per Andy Seaborne's request to test good practice
<http://www.w3.org/mid/514AE55F.5080103@epimorphics.com>, but in order to
not burden implementations, I have not included test 38 as a negative test.

If you are satisfied with the resolution of test 38, please reply with
[RESOLVED] in the subject.
-- 
-ericP
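
For illustration only, here is a minimal Python sketch of the check discussed above. It is not taken from raptor or from the WG test suite; the function names contains_surrogates and is_utf8_encodable are hypothetical. It assumes the Turtle escapes in "\ud801\udc69" have already been decoded into two separate code points, which is also how Python reads the same escapes in a string literal.

```python
def contains_surrogates(s: str) -> bool:
    """Return True if any code point of s lies in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def is_utf8_encodable(s: str) -> bool:
    """Return True if s can be encoded as well-formed UTF-8."""
    try:
        s.encode("utf-8")  # Python's strict encoder rejects surrogates
    except UnicodeEncodeError:
        return False
    return True


# Two surrogate code points, as written in the string discussed above.
escaped_pair = "\ud801\udc69"

# The single scalar value that this surrogate pair would denote in UTF-16.
scalar = "\U00010469"
```

Under these assumptions, contains_surrogates(escaped_pair) is true and the string cannot be encoded as UTF-8, while the single scalar value U+10469 passes both checks.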
Received on Sunday, 24 March 2013 14:52:23 UTC