- From: RDF Working Group Issue Tracker <sysbot+tracker@w3.org>
- Date: Fri, 19 Aug 2011 18:44:12 +0000
- To: public-rdf-wg@w3.org
RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1

http://www.w3.org/2011/rdf-wg/track/issues/75

Raised by: Richard Cyganiak
On product:

The lexical space of xsd:string doesn't cover all Unicode strings.

I assume we will end up referring to XSD 1.1 for the definition of xsd:string [1]. That document leaves it up to implementations whether they support XML 1.0 or XML 1.1; accordingly, the definition of allowed characters in an xsd:string is [2] or [3]. The more permissive one is from XML 1.1:

    Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This excludes #x0, Unicode codepoint U+0000. XML 1.0 also excludes a number of other control codes in the #x0-#x1F range.

The definition of “lexical form” in RDF 2004 [4] says “Unicode string”, which according to [5] includes *all* codepoints including the control codes.

So, any string that includes #x0 was a valid untagged plain literal in RDF 2004. In RDF 1.1, it will be typed as an xsd:string, and thus will be an ill-typed literal.

(On the other hand, such strings could never be serialized in RDF/XML or XHTML+RDFa; they were serializable only in N-Triples and Turtle.)

Is this a problem? Can we go ahead with the new literal design despite this restriction? Should we acknowledge it in the RDF Concepts spec?

[1] http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/datatypes.html#string
[2] http://www.w3.org/TR/REC-xml/#dt-character
[3] http://www.w3.org/TR/xml11/#NT-Char
[4] http://www.w3.org/TR/rdf-concepts/#dfn-lexical-form
[5] http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
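
To make the restriction concrete, here is a minimal Python sketch that tests whether a string consists only of characters allowed by the Char production. The XML 1.1 ranges are the ones quoted in this message; the XML 1.0 ranges are taken from the Char production in the XML 1.0 Recommendation. The function names (`is_xml11_char`, `is_xml10_char`, `in_xsd_string_lexical_space`) are made up for illustration and are not part of any RDF or XSD library. It covers only the allowed-character check, not any other aspect of literal validation.

```python
def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)  # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def in_xsd_string_lexical_space(s: str, xml_version: str = "1.1") -> bool:
    """Check every code point of s against the chosen Char production.

    Hypothetical helper: xml_version selects which XML edition the
    implementation is assumed to follow ("1.0" or "1.1").
    """
    if xml_version == "1.1":
        allowed = is_xml11_char
    elif xml_version == "1.0":
        allowed = is_xml10_char
    else:
        raise ValueError("xml_version must be '1.0' or '1.1'")
    return all(allowed(ord(ch)) for ch in s)


if __name__ == "__main__":
    # A string containing U+0000 is rejected under either edition.
    print(in_xsd_string_lexical_space("a\x00b", "1.1"))
    print(in_xsd_string_lexical_space("a\x00b", "1.0"))
    # U+0001 is allowed by XML 1.1 but not by XML 1.0.
    print(in_xsd_string_lexical_space("a\x01b", "1.1"))
    print(in_xsd_string_lexical_space("a\x01b", "1.0"))
```

Under this sketch, a literal such as "a\x00b" falls outside the lexical space regardless of the XML edition chosen, while a literal containing U+0001 is accepted only if the implementation follows XML 1.1.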
Received on Friday, 19 August 2011 18:44:17 UTC