- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 10 Apr 2013 15:38:55 -0400
- To: 'W3C RDF WG' <public-rdf-wg@w3.org>
Tests like LITERAL1_all_controls include control codes not allowed in
xsd:string. XSD says that xsd:strings are XML character data:
[[
The ·value space· of string is the set of finite-length sequences of
characters (as defined in [XML 1.0 (Second Edition)]) that ·match· the
Char production from [XML 1.0 (Second Edition)].
]] -- http://www.w3.org/TR/xmlschema-2/#string

XML character data excludes non-whitespace control characters:
[[
A parsed entity contains text…Legal characters are tab, carriage
return, line feed, and the legal characters of Unicode and ISO/IEC 10646.
]] -- http://www.w3.org/TR/REC-xml/#dt-character

Point 4 below explains why this calls into question whether any string
can contain (so-called "C0") control codes and be typed as an
xsd:string.

I have to say, I've always appreciated that RDF doesn't make me
uu-encode or invent escaping mechanisms all the time like XML does;
this control code issue is tied to a behavior which makes RDF
(e.g. Turtle) considerably more flexible and easy to deal with.

* Eric Prud'hommeaux <eric@w3.org> [2013-04-07 17:55-0400]
> I've had these niggling doubts for a while, and finally succumbed to
> that morbid desire to explore some problems that I'd rather not know
> about. We've all known for a while that we can create graphs with APIs
> (now even serializable in Turtle) which can't be written in RDF/XML.
> Here's a list of issues I think we need to clarify:
>
>
> 1 Namespaces are OK syntactically[nssyn], though our notion of namespace
>   IRIs is of course outside the Namespaces definition as URIs [nsURI].
>   [nssyn] http://www.w3.org/TR/REC-xml-names/#NT-Attribute
>   [nsURI] http://www.w3.org/TR/REC-xml-names/#dt-namespace
>
> ------------------------------------------------------------
>
> 2 QNames forbid a raft of [first] and [nth] characters which are
>   permissible in [IRIs].
>
>   first: [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
>          [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
>          [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
>          [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
>          [#x10000-#xEFFFF]
>
>   nth:   first | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
>          [#x203F-#x2040]
>   http://www.w3.org/TR/REC-xml-names/#NT-NCName
>
>   IRIs: ipchar = [A-Z] | "_" | [a-z] | [0-9] | "-" | "." | "~" |
>          "%" HEX HEX | "!" | "$" | "&" | "'" | "(" | ")" |
>          "*" | "+" | "," | ";" | "=" | ":" | "@" |
>          [#xA0-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFEF] |
>          [#x10000-#x1FFFD] | [#x20000-#x2FFFD] |
>          [#x30000-#x3FFFD] | [#x40000-#x4FFFD] |
>          [#x50000-#x5FFFD] | [#x60000-#x6FFFD] |
>          [#x70000-#x7FFFD] | [#x80000-#x8FFFD] |
>          [#x90000-#x9FFFD] | [#xA0000-#xAFFFD] |
>          [#xB0000-#xBFFFD] | [#xC0000-#xCFFFD] |
>          [#xD0000-#xDFFFD] | [#xE1000-#xEFFFD]
>   http://tools.ietf.org/html/rfc3987#section-2.2
>
> ------------------------------------------------------------
>
> 3 XML content excludes [#x00-#x08] [#x0B-#x0C] [#x0E-#x1F], all of
>   which are permitted in "Unicode strings" and thus RDF literals
>   [Rlit]. This applies regardless of CDATA enclosure or entity
>   substitution.
>   [Rlit] https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-lexical-form
>   [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
>                [#x10000-#x10FFFF]
>   http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Char
>
> ------------------------------------------------------------
>
> 4 XML Schema also prohibits the above control characters from
>   appearing in something typed as xsd:string [string].
>   [string] http://www.w3.org/TR/xmlschema-2/#dt-string
>
> ------------------------------------------------------------
>
> For 4, I propose notes in RDF Concepts and the serialization syntaxes
> (e.g. Turtle). For the others, I wonder if we're forced into some
> miserable escaping mechanism applied on top of XML.
>
> --
> -ericP

--
-ericP
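
To make points 2 and 3 concrete, here is a minimal Python sketch, offered only as an illustration and not taken from any spec or test suite. It transcribes the Char production and the NCName first/nth ranges exactly as quoted above into regular expressions. The function names xml_illegal_chars and is_ncname are hypothetical, chosen for this example.

```python
import re

# Complement of the XML 1.0 Char production quoted in point 3:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# NCName "first" and "nth" character ranges as quoted in point 2.
_FIRST = (
    r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NTH = _FIRST + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NCNAME = re.compile("[" + _FIRST + "][" + _NTH + "]*")


def xml_illegal_chars(text):
    """Return the code points in text that fall outside the Char production."""
    return [ord(ch) for ch in _NOT_XML_CHAR.findall(text)]


def is_ncname(local):
    """Return True if local could serve as the local part of a QName."""
    return _NCNAME.fullmatch(local) is not None


if __name__ == "__main__":
    # Tab and line feed are legal; NUL and U+001F are not.
    print(xml_illegal_chars("a\tb\n\x00c\x1f"))
    # A leading digit or a "~" is fine in an IRI path segment
    # but cannot appear in an NCName.
    for candidate in ("item-1", "1item", "a~b"):
        print(candidate, is_ncname(candidate))
```

A literal whose lexical form makes xml_illegal_chars return a non-empty list is one that, per points 3 and 4, cannot be written directly as XML content. An IRI whose trailing segment fails is_ncname is one that cannot be abbreviated to a QName without choosing a different split point.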
Received on Wednesday, 10 April 2013 19:39:25 UTC