
surrogates in literals

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 24 Mar 2013 10:51:54 -0400
To: Dave Beckett <dave@dajobe.org>
Cc: public-rdf-comments@w3.org
Message-ID: <20130324145153.GN14139@w3.org>

* Dave Beckett <dave@dajobe.org> [2013-03-23 15:38-0700]
> … [eliding license issues addressed in a separate sub-thread]
> I've got some tests I made for raptor after the original Turtle submission
> that the WG might want to use.  I give permission for them to be used
> under the W3C software license
> http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
> This is what they test:
> … [eliding other tests addressed in a separate sub-thread]
>    test-38.ttl - unicode surrogates ok or not

The reference to a "Unicode string" in the definition of RDF literals
means that "\ud801\udc69" is not a valid RDF literal:

D80 Unicode string:
A code unit sequence containing code units of a particular Unicode
encoding form
D92 UTF-8 encoding form:
The Unicode encoding form that assigns each Unicode scalar value to an
unsigned byte sequence of one to four bytes in length, as specified in
Table 3-6 and Table 3-7.

• Because surrogate code points are not Unicode scalar values,
  any UTF-8 byte sequence that would otherwise map to code points
  D800..DFFF is ill-formed.
— <http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf> D80-D92
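
To make the definitions above concrete, here is a minimal Python sketch
(not part of the test suite; the helper names is_scalar_value and
utf8_or_none are made up for illustration). It assumes Python 3, where a
str may hold lone surrogate code points but the UTF-8 codec refuses to
encode them.

```python
def is_scalar_value(code_point: int) -> bool:
    """True for code points in 0..10FFFF outside the surrogate range D800..DFFF."""
    return 0 <= code_point <= 0x10FFFF and not (0xD800 <= code_point <= 0xDFFF)


def utf8_or_none(text: str):
    """Return the UTF-8 encoding of text, or None if text has no well-formed encoding."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        # Python's strict UTF-8 codec rejects surrogate code points.
        return None


# Two surrogate code points, as produced by reading \ud801 and \udc69 literally.
surrogate_pair = "\ud801\udc69"
# The single supplementary character that the same pair denotes in UTF-16.
supplementary = "\U00010469"

print(is_scalar_value(0xD801))       # False: a surrogate code point
print(is_scalar_value(0x10469))      # True: a Unicode scalar value
print(utf8_or_none(surrogate_pair))  # None: no well-formed UTF-8 form
print(utf8_or_none(supplementary))   # b'\xf0\x90\x91\xa9'
```

The pair D801 DC69 is meaningful only as UTF-16 code units; as two
code points it has no well-formed UTF-8 representation, whereas
U+10469 encodes in four bytes.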

I propose to add a note in the non-normative description of quoted
literals <http://www.w3.org/TR/turtle/#turtle-literals>:
Note that because RDF literals are Unicode strings, they must be
composed of valid Unicode characters. The code points in the Unicode
surrogate range, U+D800-U+DFFF, are not Unicode characters.
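
One way an implementation might enforce such a note is sketched below
in Python. This is an assumption-laden illustration, not the behavior
of raptor or of any other parser: decode_uchars is a hypothetical
helper that handles only the \uXXXX and \UXXXXXXXX escapes and ignores
the other string escapes (for example an escaped backslash before "u").

```python
import re

# Numeric escapes: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_uchars(lexical: str) -> str:
    """Replace numeric escapes with their characters, rejecting surrogates."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1) or match.group(2), 16)
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"surrogate code point U+{code_point:04X} in literal")
        if code_point > 0x10FFFF:
            raise ValueError(f"escape {match.group(0)} is beyond U+10FFFF")
        return chr(code_point)

    return _UCHAR.sub(replace, lexical)


# The supplementary character is written with a single 8-digit escape.
print(decode_uchars(r"\U00010469") == "\U00010469")  # True

try:
    decode_uchars(r"\ud801\udc69")
except ValueError as error:
    print("rejected:", error)
```

Under this sketch the character intended by "\ud801\udc69" would be
written as "\U00010469" instead.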

Following Andy Seaborne's request to test good practice
<http://www.w3.org/mid/514AE55F.5080103@epimorphics.com>, and so as
not to burden implementations, I have not included test 38 as a
negative test.

If you are satisfied with the resolution of test 38, please reply with
[RESOLVED] in the subject.
Received on Sunday, 24 March 2013 14:52:23 UTC
