- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Mon, 10 Mar 2003 21:35:19 +0100
- To: www-rdf-comments@w3.org, eric@w3.org
Hi Eric,

I dropped the ball with your message
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0240.html
My co-editors have pointed out my mistake ...

I reply inline, but highlight that there is a potential editorial issue
of clarifying that a DOCTYPE cannot be included in XMLLiterals. Please
confirm whether you want that treated as a Last Call issue.

I will copy you on further messages to Joe Reagle concerning reagle-01
and reagle-02; I take you as having expressed interest in these issues.

Reagle:
>> > I'm confused by this because most of the specifications are citing
>> > Canonical XML (c14n), not Exclusive Canonicalization (exc-c14n).

Carroll:
>> The process is intended to be two-phase:
>>
>> The first phase takes an RDF/XML document and constructs an RDF
>> graph. In this phase it is not required to actually canonicalize,
>> but it is required to retain all the information needed for
>> exc-c14n.
>

Eric:
>Since identical strings are considered the same object in the RDF
>model, it may be worth applying exc-c14n as parseType="Literal"s are
>imported into the graph. This would apply if one were using an API to
>create XML-encoded nodes.
>  graph->createLiteral("<html>...</html>", XMLLiteral)
>If it is being parsed (as opposed to provided by an API or translated
>from another triples language), the parseType="Literal" data should
>already be canonicalized. (This eases the burden on such parsers as they
>need not perform any canonicalization, though they may choose to for
>backward compatibility, as I did for Annotea.)

I am not sure what status you intend for the above comment. It is not
dissimilar to some text I am asking the WG to consider, viz:

[[
Note: For systems which reason about RDF graphs it is suggested that
the canonicalization be performed on XML input. The internal
representation and non-XML external representations should be in
canonical form.
]]

>
>> The second phase, which many RDF applications don't actually ever do,
>> is from the graph to its formal meaning; for these it concerns the
>> meaning of the string delivered by the parser. This second stage is
>> determined by the mapping defined in RDF Concepts. This second stage
>> uses c14n on the grounds that whatever the parser delivered (which
>> is intended as implementation dependent) is then preserved.
>
>I think this assumption limits the responsibility of the RDF engine to
>those semantics which are expressed in the c14n subset of XML, as opposed
>to the string that looks like XML. If one uses an API to create a node
>  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>  <html>...</html>
>and wishes to preserve the doctype, the node must be entity-encoded
>and stored as CharData.

The intent is to limit the responsibility as you indicate.

> Perhaps some of this text would serve as a
>warning in the specification somewhere in the XML Content section [4].

Do you want this comment treated as an issue? Otherwise it will get
lost (promise!)

>>
>> The fundamental problem we are addressing is *how* to represent XML
>> content within an RDF graph. This XML content originates from an
>> RDF/XML document, but that original context gets lost. Thus we face
>> a number of problems familiar from exc-c14n: what to do about
>> entities? what to do about visibly used namespaces? what to do with
>> namespaces that are present but not visibly used? These issues are
>> the pressing ones that are addressed by the Last Call docs. A
>> further issue, making sure that two different implementations get
>> exactly the same answer, was not one that we felt it necessary to
>> address. I will ask the WG to reconsider whether this was correct
>> as part of the LC process.
>
>I suspect that the easiest path is to use exc-c14n in the concepts
>document per issue reagle-02 [1]. This eliminates reagle-01 [2].
This proposal is now before the WG.

>
>The third issue [3] raised simply requires a clarification.

This has been done.

>
>> > > This behaviour is conformant but not required.
>> To the RDF Last Call documents.
>
>> Thanks for your comments; Brian should assign an issue number
>> concerning the implementation variability, and Pat should follow up
>> on the misleading wording about the xsd namespace in semantics.
>
>Implementation experience:
>
>Annotea has to parse and reproduce plain and XML literals. These are
>stored in the triple store along with their encoding (PLAIN or
>XML). When serializing the product of a graph query (like properties
>of things annotating "http://www.w3.org/": ((annotates ?a
>http://www.w3.org/)(?p ?a ?o))), it entity-encodes PLAIN literals and
>wraps XML-encoded ones in a parseType="Literal".
>  <r:Description r:about="foo"><p1>some data</p1></r:Description>
>and
>  <r:Description r:about="foo"><p1 parseType="Literal">some data</p1></r:Description>
>do not refer to the same object, as the encoding is a key in the
>Literals table.

I do not see any issue that needs addressing in the above
implementation experience.

Sorry again for the delay in replying,

Jeremy
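A minimal Python sketch of the canonicalize-on-import idea discussed in this thread, combined with the Annotea practice of making the encoding part of a literal's key. Everything here is hypothetical: canonical_xml_fragment, LiteralTable and create_literal are invented names, not part of Annotea or any RDF API. The standard library's xml.etree.ElementTree.canonicalize (Python 3.8+) implements Canonical XML 2.0, not exc-c14n, so it is only a stand-in showing how normalizing on import lets equivalent serializations share one literal. It also assumes the literal content, once wrapped in a single element, is well-formed and uses no unbound namespace prefixes.

```python
import xml.etree.ElementTree as ET


def canonical_xml_fragment(fragment: str) -> str:
    """Canonicalize XML literal content.

    Stand-in only: ElementTree.canonicalize implements C14N 2.0, not exc-c14n.
    """
    # Wrap the fragment so text-only or multi-element content parses as one document.
    wrapped = ET.canonicalize(f"<wrapper>{fragment}</wrapper>")
    return wrapped[len("<wrapper>"):-len("</wrapper>")]


class LiteralTable:
    """Hypothetical literal store keyed by (lexical form, encoding)."""

    def __init__(self) -> None:
        self._ids: dict[tuple[str, str], int] = {}

    def create_literal(self, text: str, encoding: str = "PLAIN") -> int:
        if encoding == "XML":
            # Normalize on import so equivalent serializations share one key.
            text = canonical_xml_fragment(text)
        elif encoding != "PLAIN":
            raise ValueError(f"unknown encoding: {encoding}")
        key = (text, encoding)
        # Return the existing id, or assign the next free one.
        return self._ids.setdefault(key, len(self._ids))


table = LiteralTable()
same_a = table.create_literal('<b c="2" a="1"/>', "XML")
same_b = table.create_literal("<b a='1' c='2'></b>", "XML")
plain_lit = table.create_literal("some data", "PLAIN")
xml_lit = table.create_literal("some data", "XML")
print(same_a == same_b, plain_lit == xml_lit)  # True False
```

Attribute order, quote style and empty-element syntax are normalized before lookup, so the two XML inputs collapse to one entry, while the same text under PLAIN and XML stays distinct because the encoding is part of the key.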
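The Annotea serialization Eric describes, and his point that a DOCTYPE must be entity-encoded and kept as character data rather than as an XML literal, can be sketched as follows. serialize_property is a hypothetical helper, not Annotea's actual code; it uses only xml.sax.saxutils.escape and unescape from the Python standard library, and assumes the r: prefix is bound to the RDF namespace as in Eric's examples.

```python
from xml.sax.saxutils import escape, unescape


def serialize_property(name: str, text: str, encoding: str) -> str:
    """Emit one RDF/XML property element for a stored literal (hypothetical helper)."""
    if encoding == "PLAIN":
        # Entity-encode so any markup in the text is carried as character data.
        return f"<{name}>{escape(text)}</{name}>"
    if encoding == "XML":
        # Assumes the content is already canonical XML; emitted verbatim.
        # The r: prefix is assumed to be bound to the RDF namespace.
        return f'<{name} r:parseType="Literal">{text}</{name}>'
    raise ValueError(f"unknown encoding: {encoding}")


# A DOCTYPE cannot live inside an XML literal, so to keep it the whole
# string is stored as a PLAIN literal and entity-encoded on output.
doc = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html></html>')
stored = serialize_property("p1", doc, "PLAIN")
print(stored)

# Unescaping the element content recovers the original string, DOCTYPE included.
content = stored[len("<p1>"):-len("</p1>")]
print(unescape(content) == doc)  # True
```

Because the PLAIN and XML branches emit different markup for the same text, the output preserves the distinction that Annotea keeps in its Literals table.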
Received on Monday, 10 March 2003 15:34:45 UTC