Re: Please review RDF Last Call from Jeremy Carroll on 2003-03-10 (www-rdf-comments@w3.org from January to March 2003)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Mon, 10 Mar 2003 21:35:19 +0100
To: www-rdf-comments@w3.org, eric@w3.org
Message-Id: <200303102134.56574.jjc@hpl.hp.com>
Hi Eric,

I dropped the ball with your message 
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0240.html

My co-editors have pointed out my mistake ...

I reply inline - but highlight that there is a potential editorial issue of 
clarifying that a DOCTYPE cannot be included with XMLLiterals.
Please confirm that you do want that treated as a last call issue.

I will copy you on further messages to Joe Reagle concerning reagle-01 and 
reagle-02; I take you as having expressed interest in these issues.

Reagle:
>> > I'm confused by this because most of the specifications are citing
>> > Canonical XML (c14n), not Exclusive Canonicalization (exc-c14n).

Carroll:
>> The process is intended to be two-phase:
>> 
>> The first phase takes an RDF/XML document and constructs an RDF
>> graph.  In this phase it is not required to actually canonicalize,
>> but it is required to retain all the information needed for
>> exc-c14n.
>
Eric:
>Since identical strings are considered the same object in the RDF
>model, it may be worth applying exc-c14n as parseType="Literal"s are
>imported into the graph. This would apply if one were using an API to
>create XML-encoded nodes.
>  graph->createLiteral("<html>...</html>", XMLLiteral)
>If it is being parsed (as opposed to provided by an API or translated
>from another triples language), the parseType="Literal" data should
>already be canonicalized. (This eases the burden on such parsers as they
>need not perform any canonicalization, though they may choose to for
>backward compatibility, as I did for Annotea.)


I am not sure of the status intended with the above comment.
It is not dissimilar to some text I am asking the WG to consider,
viz:
[[
Note: For systems which reason about RDF graphs
it is suggested that the canonicalization be
performed on XML input. The internal representation
and non-XML external representations should be
in canonical form.
]]
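
To illustrate canonicalizing on input, here is a minimal sketch rather than
proposed spec text. It assumes Python's xml.etree.ElementTree.canonicalize
(which implements C14N 2.0, not exc-c14n) as a stand-in, and
create_xml_literal is a hypothetical API entry point, not part of any RDF
toolkit.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, C14N 2.0


def create_xml_literal(xml_text):
    """Hypothetical API call: canonicalize before the literal enters the graph."""
    return ("XMLLiteral", canonicalize(xml_text))


# Two spellings of the same XML content: attribute order and
# empty-element syntax differ.
a = create_xml_literal('<p b="2" a="1"/>')
b = create_xml_literal('<p a="1" b="2"></p>')

# After canonicalization both carry the same string, so a store that
# treats identical strings as the same object sees a single literal.
same_node = a == b
```

Without the canonicalization step the two inputs would be stored as
different strings and so as different literals.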

>
>> The second phase, which many RDF applications don't actually ever do,
>> is from the graph to its formal meaning; for these it concerns the
>> meaning of the string delivered by the parser. This second stage is
>> determined by the mapping defined in RDF Concepts. This second stage
>> uses c14n on the grounds that whatever the parser delivered (which
>> is intended as implementation dependent) is then preserved.
>
>I think this assumption limits the responsibility of the RDF engine to
>those semantics which are expressed in the c14n subset of XML, as opposed
>to the string that looks like XML. If one uses an API to create a node
>  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>  <html>...</html>
>and wishes to preserve the doctype, the node must be entity-encoded
>and stored as CharData. 

The intent is to limit the responsibility as you indicate.
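
A minimal sketch of the entity-encoding route, assuming Python's standard
xml.sax.saxutils helpers. The sample document and variable names are
illustrative only.

```python
from xml.sax.saxutils import escape, unescape

# A document whose DOCTYPE we want to keep; it cannot be an XMLLiteral.
doc = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html>...</html>'
)

# Entity-encode the markup so it is stored as character data
# (a plain literal) instead of XML content.
stored = escape(doc)

# A consumer that knows the convention can recover the original text.
restored = unescape(stored)
```

The stored string contains no markup, so the RDF engine carries it as an
opaque string, and the DOCTYPE survives the round trip.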


>               Perhaps some of this text would serve as a
>warning in the specification somewhere in the XML Content section [4].

Do you want this comment treated as an issue? Otherwise it will
get lost (promise!)

>>
>> The fundamental problem we are addressing is *how* to represent XML
>> content within an RDF graph. This XML content originates from an
>> RDF/XML document, but that original context gets lost. Thus we face
>> a number of problems familiar in exc-c14n: what to do about
>> entities? what to do about visibly used namespaces? what to do with
>> namespaces that are present but not visibly used? These issues are
>> the pressing ones that are addressed by the Last Call docs. A
>> further issue of making sure that two different implementations get
>> exactly the same answer was not one that we felt it necessary to
>> address.  I will ask the WG to reconsider whether this was correct
>> as part of the LC process.
>
>I suspect that the easiest path is to use exc-c14n in the concepts
>document per issue reagle-02 [1]. This eliminates reagle-01 [2].

This proposal is now before the WG.

>
>The third issue [3] raised simply requires a clarification.

This has been done.

>
>> > > This behaviour is conformant but not required.
>> To the RDF Last Call documents.
>
>> Thanks for your comments, Brian should assign an issue number
>> concerning the implementation variability, Pat should follow up on
>> the misleading wording about the xsd namespace in semantics.
>
>Implementation experience:
>
>Annotea has to parse and reproduce plain and XML literals. These are
>stored in the triple store along with their encoding (PLAIN or
>XML). When serializing the product of a graph query (like properties
>of things annotating "http://www.w3.org/": ((annotates ?a
>http://www.w3.org/)(?p ?a ?o))), it entity-encodes PLAIN literals and
>wraps XML encoded ones in a parseType="Literal".
>  <r:Description r:about="foo"><p1>some data</p1></r:Description>
>and
>  <r:Description r:about="foo"><p1 parseType="Literal">some data</p1></r:Description>
>do not refer to the same object as the encoding is a key in the
>Literals table.


I am not reading any issue that needs addressing in the
above implementation experience.
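
For readers following along, the behaviour described can be sketched as
below. This is my reading of the description, not Annotea's actual code;
Literal and serialize_property are hypothetical names.

```python
from typing import NamedTuple
from xml.sax.saxutils import escape


class Literal(NamedTuple):
    encoding: str  # "PLAIN" or "XML"; part of the literal's identity
    text: str


def serialize_property(tag, lit):
    """Serialize one property value according to its stored encoding."""
    if lit.encoding == "XML":
        # XML literals are written verbatim inside parseType="Literal".
        return f'<{tag} parseType="Literal">{lit.text}</{tag}>'
    # Plain literals are entity-encoded.
    return f"<{tag}>{escape(lit.text)}</{tag}>"


plain = Literal("PLAIN", "some data")
xml = Literal("XML", "some data")

# Same text, different encoding: two distinct keys in the literals table.
distinct = plain != xml

plain_out = serialize_property("p1", Literal("PLAIN", "a < b"))
xml_out = serialize_property("p1", Literal("XML", "<b>some data</b>"))
```

Because the encoding is part of the key, the two "some data" literals stay
distinct even though their text is identical.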

Sorry again for the delay in replying.

Jeremy
Received on Monday, 10 March 2003 15:34:45 UTC