- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Fri, 27 Mar 2009 19:32:39 -0700
- To: Thomas Roessler <tlr@w3.org>
- CC: XMLSec WG Public List <public-xmlsec@w3.org>
- Message-ID: <49CD8C47.9050009@oracle.com>
I guess I hijacked your original email thread to discuss the overall
transform issue.
The event stream part of the proposal is for the streaming requirement
which is completely separate from the determine-what-is-signed requirement.
In Java, StAX is a popular streaming parser - it is embedded in JDK 1.6
(http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html)
- and in C#, the XmlTextReader class fills the same role
(http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.read.aspx).
From these we define an "event stream" model as follows:
* Unlike a nodeset, the entire event stream is not available all at
  once. Instead there is an "engine", which returns the "events"
  one by one.
* Here is an example of how an XML is split up into events.
* <foo a="23">|<bar>|Some|Text|</bar>|</foo>
* Possible events: StartDocument, EndDocument, StartElement,
  EndElement, Text, ProcessingInstruction, Comment.
* All the attributes and namespace declarations are read as part of
  the StartElement event.
* Large text nodes may be split up into multiple Text events.
* The engine only knows about the event that it is currently
  pointing at - it doesn't have any idea of the events before or
  after. However, it maintains a namespace context, i.e. at element
  events it can be queried for all the namespace declarations in
  scope.
* It goes in a forward-only direction; calling "engine.next()" makes
  the engine advance to the next event.
* At every position, the engine can be queried to get the current
event and its details.
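To make this concrete, here is a rough sketch in Java of a consumer walking such an event stream, using the JDK 1.6 StAX API mentioned above (the class and method names other than the StAX ones are just illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class EventStreamDemo {
    // Walk the stream forward-only and describe each event as a string.
    static List<String> events(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    // attributes and namespace declarations are read here
                    out.add("StartElement " + r.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.add("Text " + r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.add("EndElement " + r.getLocalName());
                    break;
                default:
                    break; // comments, PIs, StartDocument, EndDocument, ...
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // the example from above: <foo a="23">|<bar>|SomeText|</bar>|</foo>
        System.out.println(events("<foo a=\"23\"><bar>SomeText</bar></foo>"));
    }
}
```

Note that the reader never exposes the whole tree; only the current event is visible at any point.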
The canonicalization algorithm needs to be defined in terms of this
event stream. The canonicalization engine should get events one by one,
and emit octet stream chunks for each event. This way it can work with
very large documents, without having to keep the whole document in
memory.
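A rough sketch of that per-event emission, in Java on top of StAX; this deliberately ignores attribute sorting, namespace propagation, and text escaping, all of which real canonicalization must handle, and the class name is made up:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingC14N {
    // Emit one serialized chunk per event; memory use is bounded by the
    // largest single event, not by document size.
    static String canonicalize(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder(); // stands in for the octet stream
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++)
                        out.append(' ').append(r.getAttributeLocalName(i))
                           .append("=\"").append(r.getAttributeValue(i)).append('"');
                    out.append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    // empty elements naturally come out as <x></x>
                    out.append("</").append(r.getLocalName()).append('>');
                    break;
                default:
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(canonicalize("<foo  a=\"23\"><bar/>Text</foo>"));
    }
}
```

Each chunk could be flushed to the digest computation as soon as it is produced, which is what keeps memory use constant.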
This event stream can be used to represent a complete document or
document subset. But there are some extra considerations for document
subsets:
* A document subset can have multiple subtrees, which translates to
  multiple root elements. That is not well-formed XML, but it is
  possible in this model.
* Attributes are only valid in the context of their element, so this
  model does not allow attributes in the document subset whose
  parent element is missing from the subset.
* A nodeset that represents a document subset always has a reference
  to the whole document. This is not the case with an event stream
  representing a document subset - in that case only the events of
  the document subset are present, so we need a way to find the
  namespaces and xml: attributes of missing ancestors. The
  namespaces can be obtained from the namespace context.
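For example, with StAX the in-scope declarations of ancestors are available at any element through the namespace context, roughly like this (the class and helper method are illustrative, only the StAX calls are real):

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class NsContextDemo {
    // Advance to the first element with the given local name, then ask the
    // namespace context for the URI bound to a prefix - even if the
    // declaring ancestor's events are not themselves of interest.
    static String lookup(String xml, String localName, String prefix)
            throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals(localName)) {
                // ancestor declarations are visible through the context
                return r.getNamespaceContext().getNamespaceURI(prefix);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a xmlns:p=\"urn:example\"><p:b/></a>";
        System.out.println(lookup(xml, "b", "p"));
    }
}
```

The xml: attributes of missing ancestors have no such built-in lookup, which is why they still need a separate solution.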
All the other transforms also need to be defined on top of this model.
E.g. XPath selection needs to work on this event stream too.
Pratik
Thomas Roessler wrote:
> Hi Pratik,
>
> I agree with most of your high-level points, therefore I don't repeat
> them here. ;-)
>
> On 25 Mar 2009, at 18:23, Pratik Datta wrote:
>
>> Thomas, regarding your nodeset question, I have been also trying to
>> think of a different model to represent a document subset - the
>> event stream is a popular model in streaming parsers, but maybe we
>> need to define our own model.
>
> I'd like to understand whether we can use an event stream (as
> specified where?) or whether we'd need to define a separate model. My
> sense is that having that framework will go a long way toward
> understanding what your proposal means in terms of analysis and
> implementation complexity.
>
> Therefore, if you could shed some more light on that point, that would
> be most welcome.
>
> Thanks,
> --
> Thomas Roessler, W3C <tlr@w3.org <mailto:tlr@w3.org>>
>
Received on Saturday, 28 March 2009 02:33:26 UTC