- From: Peter Patel-Schneider <pfpschneider@gmail.com>
- Date: Wed, 17 Apr 2013 10:41:07 -0700
- To: RDF WG <public-rdf-wg@w3.org>
There has been some discussion of the status of xsd:string literals and related literals in RDF documents and graphs, particularly with respect to ASCII control characters.

The situation in RDF 2004 was that plain literals could include all Unicode code points. It was assumed by some people that this meant that plain literals without language tags were the same as xsd:string. However, not all Unicode code points are allowed in XSD strings. In particular, #x0 is not allowed.

The current Concepts says, in Section 3.3, that simple literals are sugar for typed literals with type xsd:string. In the changes section it says:

    The xsd:string datatype does not permit the #x0 character, and implementations may not permit control codes in the #x1-#x1F range. Earlier versions of RDF allowed these characters in simple literals, although they could never be serialized in a W3C-recommended concrete syntax.

This last claim is not correct. As well, xsd:string has undergone a change recently, allowing more control characters.

As I see it, the situation is thus as follows, using Turtle syntax. All examples are syntactically correct and produce valid RDF literals.

Syntax: "\u0000"
2004: plain literal
    Value: the Unicode string containing a single NULL
Current: ill-typed xsd:string literal

Syntax: "\u0001"
2004: plain literal
    Value: the Unicode string containing a single SOH
Current: well-typed xsd:string literal
    Value: the Unicode string containing a single SOH

Syntax: "\u0001"^^xsd:string
2004: ill-typed xsd:string literal
Current: well-typed xsd:string literal
    Value: the Unicode string containing a single SOH

I think that the following changes are required in the core documents.

Concepts:
Changes section:

    The xsd:string datatype does not permit the #x0 character, and implementations may not permit control codes in the #x1-#x1F range. Earlier versions of RDF allowed these characters as values in simple literals, although they could never be serialized in a W3C-recommended concrete syntax.

    Currently a literal with type xsd:string containing the #x0 character is an ill-typed literal, but is syntactically permissible.

Semantics:
Section 4

    However, IL is total on language-tagged strings (but not on literals of type xsd:string).
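The three Turtle cases above can be checked mechanically. The following is a rough sketch only, not part of any spec text. It assumes that the 2004 notion of xsd:string follows the Char production of XML 1.0 and that the current one follows the Char production of XML 1.1; the names XML10_CHAR, XML11_CHAR and is_well_typed_string are made up for illustration.

```python
# Sketch under stated assumptions: the code point ranges are the Char
# productions of XML 1.0 and XML 1.1. Using them as the "2004" and
# "current" character sets of xsd:string is an assumption of this example.

# XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR = [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
              (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def is_well_typed_string(lexical_form, char_ranges):
    """Return True if every code point of lexical_form lies in char_ranges.

    Hypothetical helper: a literal typed xsd:string is treated as well-typed
    exactly when its lexical form uses only permitted characters.
    """
    return all(
        any(low <= ord(ch) <= high for low, high in char_ranges)
        for ch in lexical_form
    )


if __name__ == "__main__":
    for lexical_form in ("\u0000", "\u0001"):
        print(
            "U+%04X" % ord(lexical_form),
            "typed, 2004 rules:", is_well_typed_string(lexical_form, XML10_CHAR),
            "typed, current rules:", is_well_typed_string(lexical_form, XML11_CHAR),
        )
```

Note that this only covers literals explicitly typed as xsd:string. A 2004 plain literal was not subject to either check, which is why "\u0000" was a legal plain literal then but is ill-typed now.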
Received on Wednesday, 17 April 2013 17:41:34 UTC