- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Fri, 27 Mar 2009 19:32:39 -0700
- To: Thomas Roessler <tlr@w3.org>
- CC: XMLSec WG Public List <public-xmlsec@w3.org>
- Message-ID: <49CD8C47.9050009@oracle.com>
I guess I hijacked your original email thread to discuss the overall transform issue. The event stream part of the proposal is for the streaming requirement, which is completely separate from the determine-what-is-signed requirement.

In Java, StAX is a popular streaming parser; it is embedded in JDK 1.6 (http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html). In C#, the XmlTextReader class plays the same role (http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.read.aspx). From these we define an "event stream" model as follows:

* Unlike a nodeset, the entire event stream is not available all at once. Instead there is an "engine", and it returns the "events" one by one.
* Here is an example of how an XML document is split up into events:
  <foo a="23">|<bar>|Some|Text|</bar>|</foo>
* Possible events: StartDocument, EndDocument, StartElement, EndElement, Text, ProcessingInstruction, Comment.
* All the attributes and namespace declarations are read as part of the StartElement event.
* Large text nodes may be split up into multiple Text events.
* The engine only knows about the event that it is currently pointing to - it doesn't have any idea of the events before or after. However, it maintains a namespace context, i.e. at element nodes it can be queried to find out about all the namespace declarations in scope.
* It goes in a forward-only direction: calling "engine.next()" makes the engine go to the next event.
* At every position, the engine can be queried to get the current event and its details.

The Canonicalization algorithm needs to be defined in terms of this event stream. The canonicalization engine should get events one by one, and emit octet stream chunks for each event. This way it can work with very large documents, without having to keep them all in memory.

This event stream can be used to represent a complete document or a document subset.
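To make the model concrete, here is a minimal sketch of that event loop using Java's StAX XMLStreamReader: events are pulled one at a time, and an octet-stream chunk is emitted per event, with only the current event in memory. The class name EventStreamDemo is illustrative, and the serialization it emits is a naive stand-in, not real C14N output:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class EventStreamDemo {

    // Pull events one at a time and emit a chunk per event.
    // Only the current event is ever in memory; the engine moves forward only.
    // NOTE: naive serialization for illustration, not real C14N output.
    static String canonicalize(String xml) throws XMLStreamException {
        XMLStreamReader engine = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (engine.hasNext()) {
            switch (engine.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(engine.getLocalName());
                    // Attributes (and namespace declarations) are delivered
                    // as part of the StartElement event itself.
                    for (int i = 0; i < engine.getAttributeCount(); i++) {
                        out.append(' ')
                           .append(engine.getAttributeLocalName(i))
                           .append("=\"")
                           .append(engine.getAttributeValue(i))
                           .append('"');
                    }
                    out.append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.append(engine.getText()); // large text may arrive in chunks
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("</").append(engine.getLocalName()).append('>');
                    break;
                default:
                    break; // comments, PIs, document events skipped here
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(canonicalize("<foo a=\"23\"><bar>SomeText</bar></foo>"));
    }
}
```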
But there are some extra considerations for document subsets:

* A document subset can have multiple subtrees, which translates to multiple root elements. That is not well-formed XML, but it is possible in this model.
* Attributes are only valid in the context of their element, so this model does not allow attributes in the document subset whose parent element is missing from the subset.
* A nodeset that represents a document subset always has a reference to the whole document. This is not the case with an event stream representing a document subset - in this case only the events of the document subset are present. So we need a solution to find namespaces and xml: attributes of missing ancestors.
  - The namespaces can be obtained from the namespace context.

All the other transforms also need to be defined on top of this model. E.g. XPath selection needs to work on this event stream too.

Pratik

Thomas Roessler wrote:
> Hi Pratik,
>
> I agree with most of your high-level points, therefore I don't repeat
> them here. ;-)
>
> On 25 Mar 2009, at 18:23, Pratik Datta wrote:
>
>> Thomas, regarding your nodeset question, I have also been trying to
>> think of a different model to represent a document subset - the
>> event stream is a popular model in streaming parsers, but maybe we
>> need to define our own model.
>
> I'd like to understand whether we can use an event stream (as
> specified where?) or whether we'd need to define a separate model. My
> sense is that having that framework will go a long way toward
> understanding what your proposal means in terms of analysis and
> implementation complexity.
>
> Therefore, if you could shed some more light on that point, that would
> be most welcome.
>
> Thanks,
> --
> Thomas Roessler, W3C <tlr@w3.org <mailto:tlr@w3.org>>
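[The namespace-context point for document subsets, discussed earlier in this message, can also be sketched with StAX: even when the subset starts below the element that declared a prefix, the engine's namespace context still resolves it. The class name SubsetNamespaceDemo and the example namespace URI are purely illustrative:]

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class SubsetNamespaceDemo {

    // Advance to the first element with the given local name and report the
    // URI bound to 'prefix' there. The binding may come from an ancestor
    // that would be absent from the subset, yet the namespace context
    // maintained by the engine still resolves it.
    static String inScopeUri(String xml, String localName, String prefix)
            throws XMLStreamException {
        XMLStreamReader engine = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (engine.hasNext()) {
            if (engine.next() == XMLStreamConstants.START_ELEMENT
                    && engine.getLocalName().equals(localName)) {
                return engine.getNamespaceContext().getNamespaceURI(prefix);
            }
        }
        return null; // element not found
    }

    public static void main(String[] args) throws XMLStreamException {
        // "ex" is declared on <root>; a subset rooted at <ex:leaf> would not
        // contain that declaration, but the engine can still be queried for it.
        String xml = "<root xmlns:ex=\"http://example.org/ns\"><ex:leaf/></root>";
        System.out.println(inScopeUri(xml, "leaf", "ex"));
    }
}
```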
Received on Saturday, 28 March 2009 02:33:26 UTC