- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Wed, 18 Feb 2004 11:12:39 -0500
- To: www-tag@w3.org
- Message-id: <877jykmknc.fsf@nwalsh.com>
I originally sent this message on 12 Feb, moments before I sent my reply to this message[1]. However, this message is not in the archives, and I never received a copy from the list, so I think perhaps it did not get sent. Here it is again.

[1] http://lists.w3.org/Archives/Public/www-tag/2004Feb/0036.html

-- begin original message --

On 2 Feb, I accepted an action item to summarize issue xmlChunk-44 and solicit input. Herewith is my draft summary.

XML documents are self-contained. By that, I mean that all of the questions that can be asked about a single document, or about a particular point in a single document, can be answered definitively if the entire document is available.

Some examples of questions that one might ask about a document are:

- What is the base URI of the root of the document?
- What version of XML does the document use?
- How many namespaces does the document use?
- How many character information items does the document contain?

Some examples of questions that one might ask about any particular point in a document are:

- What is the current base URI?
- What namespaces are in scope?
- What is the current value of xml:lang?
- More generally, what is the most recently seen value for any particular attribute?
- How many ancestors are there?
- How many preceding siblings are there?
- How many following siblings are there?

Given another XML document, we can ask the additional question, "Are these two documents the same?" The answer to that question clearly depends on how you define "equal", and experience suggests that there is no single answer that will garner universal acceptance.

At the heart of xmlChunk-44 is the observation that we sometimes want to extract portions of an XML document and use those fragments, or "chunks", in other contexts.
For example, we might want to:

- Use a chunk as the value of a property in an RDF graph
- Perform some operation on a portion of a document extracted with an XPath expression
- Transform a small portion of a large document
- Transmit a signed chunk inside the body of a larger document
- Compare two chunks to see if they're the same

The question then becomes: how can we communicate context information about the chunk so that its recipient can get the expected answers?

For example, consider this document:

  <?xml version="1.0" encoding="utf-8"?>
  <article xmlns="http://docbook.org/docbook-ng" version="bourbon"
           xml:lang="en" xml:base="http://example.org/not/really/here">
    <info>
      <title>Unit Test: article.001.xml</title>
      <authorgroup>
        <author>
          <personname>
            <firstname>Norman</firstname>
            <surname>Walsh</surname>
          </personname>
        </author>
      </authorgroup>
    </info>
    <para>There's no content here.</para>
  </article>

Now let's consider the "author" chunk. As I described above, we can answer questions about the author:

- It has the base URI "http://example.org/not/really/here"
- It has the xml:lang "en"
- It has the DocBook version "bourbon"

Suppose I take that chunk and place it in some new context:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/stuff/1.0/">
    <rdf:Description rdf:about="http://example.org/not/really/here#author">
      <ex:prop rdf:parseType="Literal"
               xmlns="http://docbook.org/docbook-ng">
        <author>
          <personname>
            <firstname>Norman</firstname>
            <surname>Walsh</surname>
          </personname>
        </author>
      </ex:prop>
    </rdf:Description>
  </rdf:RDF>

I've lost important information about that chunk. I can't tell, for example, what language it's in, what base URI it should have, or what version of DocBook it uses. (It might not be appropriate in all applications to preserve all of the context, but it should be possible to preserve the context when it's important to the application.)
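The loss described above can be reproduced, and one possible way of keeping the context can be sketched, in a few lines of Python using only the standard library's xml.etree.ElementTree. This is an illustration under stated assumptions, not a proposal from this message. The helper names `inherited` and `chunk_with_context` are hypothetical. Copying the in-scope xml:lang and xml:base values onto the chunk's root element is loosely modeled on the base URI and language fixup that XInclude performs. The sketch also simplifies xml:base: it copies the nearest value verbatim instead of resolving nested relative bases against each other.

```python
import copy
import xml.etree.ElementTree as ET

# Clark-notation prefixes: ElementTree stores names as "{namespace-uri}local".
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
DB_NS = "{http://docbook.org/docbook-ng}"

# The example article from this message (XML declaration omitted).
DOC = """<article xmlns="http://docbook.org/docbook-ng" version="bourbon"
         xml:lang="en" xml:base="http://example.org/not/really/here">
  <info>
    <title>Unit Test: article.001.xml</title>
    <authorgroup>
      <author>
        <personname>
          <firstname>Norman</firstname>
          <surname>Walsh</surname>
        </personname>
      </author>
    </authorgroup>
  </info>
  <para>There's no content here.</para>
</article>"""


def inherited(root, element, names):
    """Return the value of each attribute in `names` in effect at `element`.

    ElementTree elements have no parent pointers, so build a child -> parent
    map and walk upward from the element itself; the nearest value wins.
    Simplification (assumption): xml:base is taken verbatim, not resolved
    against enclosing relative xml:base values.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = {}
    node = element
    while node is not None:
        for name in names:
            if name in node.attrib:
                found.setdefault(name, node.attrib[name])
        node = parent_of.get(node)
    return found


def chunk_with_context(root, element, names=(XML_NS + "lang", XML_NS + "base")):
    """Deep-copy `element` and write the inherited values of `names` onto the copy."""
    chunk = copy.deepcopy(element)
    chunk.tail = None  # trailing text belongs to the old parent, not the chunk
    for name, value in inherited(root, element, names).items():
        chunk.set(name, value)
    return chunk


root = ET.fromstring(DOC)
author = root.find(f".//{DB_NS}author")

# Extracted naively: ElementTree re-declares the DocBook namespace (under a
# generated prefix), but xml:lang and xml:base are gone.
print(ET.tostring(author, encoding="unicode"))

# Extracted with the inherited xml:lang and xml:base written onto the chunk root.
print(ET.tostring(chunk_with_context(root, author), encoding="unicode"))
```

Because ElementTree keeps expanded names, the namespace survives extraction even though the prefix may change; attribute-based context does not. Which attributes count as context is left to the caller, in line with the point that not every application needs all of it: adding "version" to `names` would also carry the DocBook version.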
There is also the deeper question of establishing a canonical form for the logical XML chunk. We might, for example, wish it to be the case that the following RDF statement

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/stuff/1.0/">
    <rdf:Description rdf:about="http://example.org/not/really/here#author">
      <ex:prop rdf:parseType="Literal"
               xmlns:db="http://docbook.org/docbook-ng">
        <db:author>
          <db:personname>
            <db:firstname>Norman</db:firstname>
            <db:surname>Walsh</db:surname>
          </db:personname>
        </db:author>
      </ex:prop>
    </rdf:Description>
  </rdf:RDF>

be considered "the same" as the former statement.

I think the issue xmlChunk-44 asks, essentially:

1. Should there be a standard way to communicate context information for a portion of an XML document?
2. If so, what should it be?
3. And to what extent should it provide a "canonical" form?

Be seeing you,
  norm

--
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.
Received on Wednesday, 18 February 2004 11:13:57 UTC