Re: quick syntax question.

On 28 Jul 2003 12:15:25 +0100
Brian McBride <bwm@hplb.hpl.hp.com> wrote:

> On Sun, 2003-07-27 at 22:39, pat hayes wrote:
> > Dave,
> 
> Quick reply - Dave to confirm/correct
> 
> >  can you answer me a quick question about RDF/XML? Sorry I am 
> > still so behind the curve on this, but I need to get this exactly 
> > right given our decision about plain literals and xsd:string.
> > 
> > Consider a plain literal in an RDF graph which uses some characters 
> > which require escaping in XML, eg say "<br/>".
> > 
> > 1. Is it the case that in RDF/XML, this would be rendered using XML 
> > character escaping? Ie it would look like this
> > "&gr;br/&lt;"
> > ?
> 
> That would be "&lt;br /&gt;", but you have the right idea.

That's one of the encodings; there are several.  How plain
literals are written into RDF/XML does not involve XML canonicalization.
In the graph, you get a Unicode string, what Charmod calls a
Character string: http://www.w3.org/TR/charmod/#def-character-string
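
To illustrate "there are several" encodings, here is a minimal sketch in
Python using the standard library's xml.etree.ElementTree.  The element
name "v" is made up for the example and this is plain XML parsing, not
an RDF/XML parser; the point is only that every spelling decodes to the
same 5-character string.

```python
import xml.etree.ElementTree as ET

# Different XML spellings of the same element content.
# The element name "v" is arbitrary (an assumption for this sketch).
documents = [
    "<v>&lt;br/&gt;</v>",          # predefined entities
    "<v>&#60;br/&#62;</v>",        # decimal character references
    "<v>&#x3C;br/&#x3E;</v>",      # hexadecimal character references
    "<v><![CDATA[<br/>]]></v>",    # CDATA section, no escaping needed
    "<v>&lt;br/></v>",             # '>' may appear unescaped in content
]

# The parser undoes the escaping and hands back the character string.
decoded = [ET.fromstring(doc).text for doc in documents]

for doc, text in zip(documents, decoded):
    print(f"{doc!r:32} -> {text!r} ({len(text)} characters)")
```

The 11-character form "&lt;br/&gt;" exists only in the serialization;
what the parser reports is the original string.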

> > 
> > 2. If so, would it be correct to say that in spite of this, that the 
> > literal character string itself was the original 5-character Unicode 
> > sequence? (Or is the character string of the literal an 11-character 
> > sequence in RDF/XML but a 5-character sequence in the graph? I hope 
> > not....)
> 
> The literal in the graph is "<br />"
> 
> > 
> > 3. If so, are there any literal character sequences which *cannot* be 
> > sent through RDF/XML? Or does XML provide an escape for every Unicode 
> > code point?
> 
> We discovered last week that there are some UNICODE characters (ascii
> control codes e.g. bel) which are not legal in an XML document.  We have
> to decide whether they are legal in the graph, and thus not expressible
> in RDF/XML, or just not legal in the graph.

Yes, the legal characters are listed (anything outside these ranges is excluded):
[[
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
]] -- http://www.w3.org/TR/REC-xml#NT-Char

However, that is for XML 1.0 (2nd edition).
The draft XML 1.1 proposes replacing the above comment with:
  [[
  /* any Unicode character, excluding most ISO controls, the surrogate blocks, FFFE, and FFFF */
  ]] -- http://www.w3.org/TR/xml11/#NT-Char
(By "ISO controls" I assume it is referring to the excluded parts #x0-#x8, #xB, #xC, #xE-#x1F.)

RDF/XML is an XML 1.0 (2nd edition) format so the former definition applies.
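
The XML 1.0 Char production quoted above can be turned into a simple
check.  This is only a sketch: the function names are my own invention,
and it tests individual code points against the ranges in the
production, nothing more.

```python
# Code point ranges copied from the XML 1.0 (2nd edition) Char production.
XML10_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml10_char(ch):
    """Return True if the single character ch matches the Char production."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in XML10_CHAR_RANGES)


def illegal_xml10_chars(text):
    """Return the characters of text that XML 1.0 cannot carry."""
    return [ch for ch in text if not is_xml10_char(ch)]


print(illegal_xml10_chars("<br/>"))    # nothing illegal, only escaping needed
print(illegal_xml10_chars("\u0007"))   # BEL is outside every range
```

A string for which illegal_xml10_chars() returns an empty list can be
written in an XML 1.0 document given suitable escaping; any other string
cannot.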

> I guess you would like us to make this decision quickly.
> 
> My instincts are to not allow XML special cases to pollute (sorry value
> laden term) the graph syntax, so I'm for saying that any UNICODE
> character sequence is legal and noting there might be problems
> serializing in RDF/XML.

The former (any Unicode character sequence being legal in the graph)
would be for the Concepts document.  RDF/XML, or any XML format, would
have problems serializing such things.
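
On the question of whether XML provides an escape for every code point:
a numeric character reference does not help, because the reference must
itself name a legal Char.  A small sketch with Python's
xml.etree.ElementTree (the helper name and the element name "v" are
hypothetical, chosen for the example):

```python
import xml.etree.ElementTree as ET


def parse_char_ref(code_point):
    """Parse <v>&#x...;</v> and return the decoded text, or None if the
    parser rejects the document as not well-formed."""
    document = f"<v>&#x{code_point:X};</v>"
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


print(repr(parse_char_ref(0x3C)))  # '<' needs escaping but is a legal Char
print(repr(parse_char_ref(0x07)))  # BEL has no legal spelling in XML 1.0
```

So a literal containing U+0007 has no XML 1.0 serialization at all,
escaped or not.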

> That said, you (Pat) commented this would make expressing the semantics
> more difficult, in that not all plain literals without lang tags would
> denote xsd:string's, requiring you to have a more complex rule in the
> semantics doc.
> 
> I wonder whether we really need that rule.  Would it suffice to *note*
> that most plain literals without lang tags denote xsd:string's, but that
> due to the fact that some UNICODE sequences are not legal xsd:string's,
> not all plain literals without lang tags are xsd:string's.  This is
> something that should be straightforward to implement in an xsd
> reasoner.  We could do a couple of simple test cases.

I'm wondering here what's broken - xsd:string allowing illegal Unicode
or RDF's plain literals?

> So I'm suggesting no rule and a warning note.  As always, the WG
> decides.
> 
> Brian
> 
> ps: test case:
> 
> _:a <rdf:label> "\0007" .

  _:a rdf:label "\u0007" .

> 
> entails?
> 
> _:a <rdf:label> _:v .
> _:v <rdf:type> <xsd:string> .

 _:a rdf:label _:v .
 _:v rdf:type xsd:string .

Dave

Received on Monday, 28 July 2003 08:57:02 UTC