- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Mon, 28 Jul 2003 13:55:28 +0100
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: pat hayes <phayes@ihmc.us>, rdf core <w3c-rdfcore-wg@w3.org>
On 28 Jul 2003 12:15:25 +0100
Brian McBride <bwm@hplb.hpl.hp.com> wrote:

> On Sun, 2003-07-27 at 22:39, pat hayes wrote:
> > Dave,
>
> Quick reply - Dave to confirm/correct
>
> > can you answer me a quick question about RDF/XML? Sorry I am
> > still so behind the curve on this, but I need to get this exactly
> > right given our decision about plain literals and xsd:string.
> >
> > Consider a plain literal in an RDF graph which uses some characters
> > which require escaping in XML, eg say "<br/>".
> >
> > 1. Is it the case that in RDF/XML, this would be rendered using XML
> > character escaping? Ie it would look like this
> > "&gr;br/<"
> > ?
>
> That would be "&lt;br/&gt;", but you have the right idea.

That's one of the encodings; there are several. How plain literals are
written into RDF/XML does not involve XML canonicalization. In the
graph, you get a Unicode string, what Charmod calls a character string:
  http://www.w3.org/TR/charmod/#def-character-string

> >
> > 2. If so, would it be correct to say that in spite of this, that the
> > literal character string itself was the original 5-character Unicode
> > sequence? (Or is the character string of the literal an 11-character
> > sequence in RDF/XML but a 5-character sequence in the graph? I hope
> > not....)
>
> The literal in the graph is "<br/>"
>
> >
> > 3. If so, are there any literal character sequences which *cannot* be
> > sent through RDF/XML? Or does XML provide an escape for every Unicode
> > code point?
>
> We discovered last week that there are some UNICODE characters (ascii
> control codes e.g. bel) which are not legal in an XML document. We have
> to decide whether they are legal in the graph, and thus not expressible
> in RDF/XML, or just not legal in the graph.

Yes, these are listed

[[
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
         | [#x10000-#x10FFFF]
  /* any Unicode character, excluding the surrogate blocks, FFFE,
     and FFFF. */
]]
  -- http://www.w3.org/TR/REC-xml#NT-Char

However, that is for XML 1.0 (2nd edition). The draft XML 1.1 proposes
replacing the above comment with:

[[
  /* any Unicode character, excluding most ISO controls, the surrogate
     blocks, FFFE, and FFFF */
]]
  -- http://www.w3.org/TR/xml11/#NT-Char

(by "ISO controls" I assume it is referring to the excluded parts
#0-#8, #B, #C, #E-#1F)

RDF/XML is an XML 1.0 (2nd edition) format, so the former definition
applies.

> I guess you would like us to make this decision quickly.
>
> My instincts are to not allow XML special cases to pollute (sorry value
> laden term) the graph syntax, so I'm for saying that any UNICODE
> character sequence is legal and noting there might be problems
> serializing in RDF/XML.

The former would be for concepts. RDF/XML or any XML format would have
problems serializing such things.

> That said, you (Pat) commented this would make expressing the semantics
> more difficult, in that not all plain literals without lang tags would
> denote xsd:string's, requiring you to have a more complex rule in the
> semantics doc.
>
> I wonder whether we really need that rule. Would it suffice to *note*
> that most plain literals without lang tags denote xsd:string's, but that
> due to the fact that some UNICODE sequences are not legal xsd:string's,
> not all plain literals without lang tags are xsd:string's. This is
> something that should be straightforward to implement in an xsd
> reasoner. We could do a couple of simple test cases.

I'm wondering here what's broken: xsd:string allowing illegal Unicode,
or RDF's plain literals?

> So I'm suggesting no rule and a warning note. As always, the WG
> decides.
>
> Brian
>
> ps: test case:
>
> _:a <rdf:label> "\0007" .

  _:a rdf:label "\u0007" .

> entails?
>
> _:a <rdf:label> _:v .
> _:v <rdf:type> <xsd:string> .

  _:a rdf:label _:v .
  _:v rdf:type xsd:string .

Dave
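
PS: as a rough illustration of the two points above (escaping does not
change the literal in the graph, and some code points have no XML 1.0
form at all), here is a minimal Python sketch. The function names
is_xml10_char and can_serialize_in_xml10 are made up for this example;
only escape/unescape come from the Python standard library. It checks
the XML 1.0 Char production only and is not an RDF/XML serializer.

```python
from xml.sax.saxutils import escape, unescape


def is_xml10_char(code_point: int) -> bool:
    """True if code_point matches the XML 1.0 (2nd edition) Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def can_serialize_in_xml10(literal: str) -> bool:
    """Hypothetical helper: True if every character of literal is a legal Char."""
    return all(is_xml10_char(ord(ch)) for ch in literal)


# Pat's example: 5 characters in the graph, 11 once escaped for XML.
literal = "<br/>"
escaped = escape(literal)
print(len(literal), len(escaped), escaped)

# Unescaping (what an XML parser hands back) recovers the graph literal.
print(unescape(escaped) == literal)

# Brian's test case: BEL (U+0007) is outside Char, so it cannot be written.
print(can_serialize_in_xml10("\u0007"))
```

Note that a numeric character reference does not help for U+0007:
XML 1.0 requires the character referred to by a reference to match
Char as well.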
Received on Monday, 28 July 2003 08:57:02 UTC