W3C home > Mailing lists > Public > w3c-rdfcore-wg@w3.org > July 2003

RE: first pass parseType="Literal" text for primer

From: <Patrick.Stickler@nokia.com>
Date: Mon, 28 Jul 2003 15:52:04 +0300
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B026301A1@trebe006.europe.nokia.com>
To: <gk@ninebynine.org>, <w3c-rdfcore-wg@w3.org>
Cc: <dave.beckett@bristol.ac.uk>


OK.

If I'm understanding this correctly, the &lt; and &gt;
are getting resolved to '<' and '>' by the RDF/XML
parser, insofar as the XML processing of the RDF/XML
instance is concerned, but the canonicalization is
re-escaping them back to &lt; and &gt;?

If the entities are not ever being resolved at any
stage of the parsing process, then that worries me
alot. It suggests that an RDF/XML parser is not
playing by the rules of XML fully.

Patrick

> -----Original Message-----
> From: ext Graham Klyne [mailto:gk@ninebynine.org]
> Sent: 28 July, 2003 15:13
> To: Stickler Patrick (NMP/Tampere); w3c-rdfcore-wg@w3.org
> Cc: Dave Beckett
> Subject: RE: first pass parseType="Literal" text for primer
> 
> 
> At 14:31 28/07/03 +0300, Patrick.Stickler@nokia.com wrote:
> 
> 
> 
> > > 2. <title rdf:parseType='Literal'>Why the &lt;FONT&gt; Tag is
> > > Bad</title>
> > >
> > > I take the value of this 'title' property to be:
> > >
> > >    "Why the &lt;FONT&gt; Tag is Bad"^^rdf:XMLLiteral
> >
> >Eh? Really?
> >
> >Don't you mean
> >
> >    "Why the <FONT> Tag is Bad"^^rdf:XMLLiteral
> >
> >Surely the entities are resolved the same as for any
> >literal.
> 
> Not by my reading of Concepts:
> 
> [[
> The lexical space
>      is the set of all strings which:
> 
>          * are well-balanced, self-contained XML data [XML];
>          * correspond to exclusive Canonical XML (with 
> comments, with empty 
> InclusiveNamespaces PrefixList ) [XML-XC14N];
>          * when embedded between an arbitrary XML start tag 
> and an end tag 
> form a document conforming to XML Namespaces [XML-NS]
> ]]
> -- 
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/
> #section-XMLLiteral
> 
> which would require the '<' and '>' here to be &-escaped.  
> When the XML 
> literal is eventually interpreted, you'd get the bare '<' and '>' 
> characters back.
> 
> [...removes sleeping cat from copy of syntax spec...]
> 
> Looking at the syntax spec, struggling a bit...
> 
> [Dave:  Should section "6.1.2 Element Event" be "Start 
> Element Event", and 
> should there be a description of what the "string-value" accessor 
> returns?  Maybe not, but I note section 6.1 says that all 
> events have a 
> string-value accessor.]
> 
> Ah, got it:
> 
> In the syntax spec, we have sections 7.2.17 and 7.2.33 which 
> together claim 
> the literal string value is the exclusive XML canonicalization of the 
> content, which I think means that the escaping of '&', '<' 
> and '>' has to 
> be re-inserted:
> 
> [[
> The string used as the lexical form of the XML Literal is the 
> Exclusive XML 
> Canonicalization [XML-XC14N]) with comments and with empty 
> InclusiveNamespaces PrefixList of the literal text l, i.e. the entire 
> element content of this property element.
> ]]
> -- 
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-syntax-grammar-200
30117/#parseTypeLiteralPropertyElt

[Dave:  is it worth adding a note to clarify this point?]



>If you wanted/needed
>
>     "Why the &lt;FONT&gt; Tag is Bad"^^rdf:XMLLiteral
>
>then you'd have to say
>
>   <title rdf:parseType='Literal'>Why the &amp;lt;FONT&amp;gt; Tag is 
> Bad</title>
>
>No?
>
>If this is not the case, then I've really missing something
>major here and am very alarmed!

I think that may be workable, but it's not how I read the documents we're 
working on.

(Note that this formulation of the abstract syntax is for definitional 
purposes, and does not of itself require that an application do this.  You 
may have some other way of storing an XML literal which is fine as long as 
you get the same final answers.)

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Monday, 28 July 2003 08:52:46 EDT

This archive was generated by hypermail pre-2.1.9 : Wednesday, 3 September 2003 09:58:50 EDT