Re: Transform Note Design Decisions from Frederick Hirsch on 2009-03-30 (public-xmlsec@w3.org from March 2009)

From: Frederick Hirsch <frederick.hirsch@nokia.com>
Date: Mon, 30 Mar 2009 15:00:17 -0400
To: ext Pratik Datta <pratik.datta@oracle.com>
Cc: Frederick Hirsch <frederick.hirsch@nokia.com>, Thomas Roessler <tlr@w3.org>, XMLSec WG Public List <public-xmlsec@w3.org>
Message-Id: <CA768F0D-D5E2-4F41-9D15-5A3F1F6BCD42@nokia.com>
> * A nodeset that represents a document subset always has a reference
> to the whole document. This is not the case with an event stream
> representing a document subset; in this case only the events of the
> document subset are present. So we need a solution to find
> namespaces and xml: attributes of missing ancestors. The
> namespaces can be obtained from the namespace context.

In other words, we are still left with the namespace inheritance  
complexity and issues unless we eliminate the ability to support  
QNames in content?
regards, Frederick

Frederick Hirsch
Nokia



On Mar 27, 2009, at 10:32 PM, ext Pratik Datta wrote:

> I guess I hijacked your original email thread to discuss the overall  
> transform issue.
>
> The event stream part of the proposal addresses the streaming
> requirement, which is completely separate from the
> determine-what-is-signed requirement.
>
> In Java, StAX is a popular streaming parser; it is embedded in JDK
> 1.6 (http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html).
> In C#, the equivalent is the XmlTextReader class
> (http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.read.aspx).
>
> From these we define an "event stream" model as follows:
> 	• unlike a nodeset, the entire event stream is not available all at  
> once. Instead there is an "engine", and this returns the "events"  
> one by one.
> 	• Here is an example of how an XML document is split up into events:
> 	• <eventStream.gif>
> 	• Possible events: StartDocument, EndDocument, StartElement,
> EndElement, Text, ProcessingInstruction, Comment.
> 	• All the attributes and namespace declarations are read as part of
> the StartElement event.
> 	• Large text nodes may be split up into multiple Text events.
> 	• The engine only knows about the event it is currently
> pointing to; it has no idea of the events before or after.
> However, it maintains a namespace context, i.e. at element nodes it
> can be queried to find all the namespace declarations in
> scope.
> 	• It moves in a forward-only direction: calling "engine.next()"
> advances the engine to the next event.
> 	• At every position, the engine can be queried to get the current  
> event and its details.
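A minimal Java sketch of the event model above, assuming the JDK's built-in StAX API (javax.xml.stream.XMLStreamReader). The class and method names (EventStreamDemo, events, name) are illustrative only. The sketch pulls events one at a time in forward-only order, maps the StAX event codes onto the event names listed above, and shows that attributes and namespace declarations arrive with the StartElement event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EventStreamDemo {

    // Maps a StAX event code to the event names used in the model above.
    static String name(int event) {
        switch (event) {
            case XMLStreamConstants.START_DOCUMENT: return "StartDocument";
            case XMLStreamConstants.END_DOCUMENT: return "EndDocument";
            case XMLStreamConstants.START_ELEMENT: return "StartElement";
            case XMLStreamConstants.END_ELEMENT: return "EndElement";
            case XMLStreamConstants.CHARACTERS: return "Text";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "ProcessingInstruction";
            case XMLStreamConstants.COMMENT: return "Comment";
            default: return "Other(" + event + ")";
        }
    }

    // Pulls events one at a time; the reader only exposes the current event.
    static List<String> events(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        out.add(name(r.getEventType()));   // a new reader sits on START_DOCUMENT
        while (r.hasNext()) {
            int ev = r.next();             // forward-only: there is no previous()
            String label = name(ev);
            if (ev == XMLStreamConstants.START_ELEMENT) {
                // Attributes and namespace declarations come with StartElement.
                label += " " + r.getLocalName()
                        + " attrs=" + r.getAttributeCount()
                        + " nsDecls=" + r.getNamespaceCount();
            }
            out.add(label);
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a xmlns:p='urn:p' id='1'><!--c--><p:b>hi</p:b></a>";
        events(xml).forEach(System.out::println);
    }
}
```

For the sample in main, the StartElement entry for a should report one attribute and one namespace declaration, because a namespace-aware StAX reader does not count xmlns declarations as attributes.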
>
>
> The canonicalization algorithm needs to be defined in terms of this
> event stream. The canonicalization engine should get events one by
> one and emit octet stream chunks for each event. This way it can
> work with very large documents without having to keep them all in
> memory.
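A rough sketch of the "one chunk per event" idea, again assuming StAX. It is deliberately not a conformant Canonical XML implementation: it drops comments and processing instructions, ignores document-level whitespace rules and superfluous namespace declarations, and, as a simplification, sorts attributes by qualified name rather than by namespace URI and local name. StreamingEmitSketch and its methods are hypothetical names. What it does show is that each event is converted to bytes and written out before the next event is read, so the sketch itself buffers nothing beyond the current event.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingEmitSketch {

    static String qname(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    // Text-node escaping in the style of Canonical XML.
    static String escText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\r", "&#xD;");
    }

    // Attribute-value escaping in the style of Canonical XML.
    static String escAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;")
                .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;");
    }

    // Reads one event at a time and writes its bytes immediately.
    static void emit(XMLStreamReader r, OutputStream out)
            throws XMLStreamException, IOException {
        while (r.hasNext()) {
            int ev = r.next();
            StringBuilder chunk = new StringBuilder();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                chunk.append('<').append(qname(r.getPrefix(), r.getLocalName()));
                // Namespace declarations on this element, sorted ("xmlns" first).
                Map<String, String> ns = new TreeMap<>();
                for (int i = 0; i < r.getNamespaceCount(); i++) {
                    String p = r.getNamespacePrefix(i);
                    String uri = r.getNamespaceURI(i);
                    ns.put(p == null || p.isEmpty() ? "xmlns" : "xmlns:" + p,
                           uri == null ? "" : uri);
                }
                // Simplification: attributes sorted by qualified name.
                Map<String, String> attrs = new TreeMap<>();
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    attrs.put(qname(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                              r.getAttributeValue(i));
                }
                for (Map.Entry<String, String> e : ns.entrySet()) {
                    chunk.append(' ').append(e.getKey()).append("=\"")
                         .append(escAttr(e.getValue())).append('"');
                }
                for (Map.Entry<String, String> e : attrs.entrySet()) {
                    chunk.append(' ').append(e.getKey()).append("=\"")
                         .append(escAttr(e.getValue())).append('"');
                }
                chunk.append('>');
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                chunk.append("</").append(qname(r.getPrefix(), r.getLocalName())).append('>');
            } else if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA) {
                chunk.append(escText(r.getText()));
            }
            // Comments, PIs and other events are skipped in this sketch.
            if (chunk.length() > 0) {
                out.write(chunk.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    static String canonicalize(String xml) throws XMLStreamException, IOException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        emit(r, out);
        r.close();
        return out.toString("UTF-8");
    }
}
```

For example, canonicalize("<b y='2' x='1'>a&lt;b<c/></b>") should produce <b x="1" y="2">a&lt;b<c></c></b>: the empty element is written as a start/end pair and the attributes come out in sorted order.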
>
> This event stream can be used to represent a complete document or a
> document subset, but there are some extra considerations for
> document subsets:
> 	• A document subset can have multiple subtrees, which translates to
> multiple root elements. That is not well-formed XML, but it is
> possible in this model.
> 	• Attributes are only valid in the context of their element, so
> this model does not allow attributes in the document subset whose
> parent element is missing from the subset.
> 	• A nodeset that represents a document subset always has a
> reference to the whole document. This is not the case with an event
> stream representing a document subset; in this case only the events
> of the document subset are present. So we need a solution to find
> namespaces and xml: attributes of missing ancestors. The
> namespaces can be obtained from the namespace context.
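The last point can be sketched with StAX, whose reader exposes a NamespaceContext that still covers declarations made on ancestors outside the subset. xml: attributes such as xml:lang and xml:space are not part of that context, so the sketch keeps its own stack of them. SubsetContextSketch, inheritedContext, and the convention of picking the subset root by local name are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SubsetContextSketch {

    // Streams xml until the first element whose local name is subsetRoot, then
    // returns what that subset root inherits from ancestors outside the subset:
    // the URI bound to 'prefix' and the xml:* attributes (nearest ancestor wins).
    static Map<String, String> inheritedContext(String xml, String subsetRoot, String prefix)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // One map of xml:* attributes per open ancestor; innermost at the head.
        Deque<Map<String, String>> xmlAttrs = new ArrayDeque<>();
        try {
            while (r.hasNext()) {
                int ev = r.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    if (r.getLocalName().equals(subsetRoot)) {
                        Map<String, String> result = new TreeMap<>();
                        // Namespaces: the reader's context already includes ancestors.
                        result.put("xmlns:" + prefix,
                                r.getNamespaceContext().getNamespaceURI(prefix));
                        // xml:* attributes: merge outermost first so inner values override.
                        Iterator<Map<String, String>> it = xmlAttrs.descendingIterator();
                        while (it.hasNext()) {
                            result.putAll(it.next());
                        }
                        return result;
                    }
                    Map<String, String> own = new HashMap<>();
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        if ("xml".equals(r.getAttributePrefix(i))) {
                            own.put("xml:" + r.getAttributeLocalName(i), r.getAttributeValue(i));
                        }
                    }
                    xmlAttrs.push(own);
                } else if (ev == XMLStreamConstants.END_ELEMENT) {
                    xmlAttrs.pop();
                }
            }
            return Collections.emptyMap();
        } finally {
            r.close();
        }
    }
}
```

If Env declares xmlns:s='urn:sec' and xml:lang='en', and a nested Hdr sets xml:lang='fr', then asking for the context of Body below Hdr should give xmlns:s bound to urn:sec and xml:lang as fr.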
>
> All the other transforms also need to be defined on top of this  
> model. E.g. XPath selection needs to work on this event stream too.
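XPath selection is where a forward-only engine is most constrained, since XPath in general includes reverse axes such as preceding-sibling and predicates that may depend on content not yet seen. The sketch below therefore supports only a hypothetical, restricted form: absolute child-axis paths such as /a/b, with no predicates or wildcards, matched against a stack of open element names. StreamingPathSelect and select are illustrative names, and returning each match's text content is just one possible output.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingPathSelect {

    // Returns the text content of every element matched by an absolute,
    // child-axis-only path such as "/a/b" (no predicates, no wildcards).
    static List<String> select(String xml, String path) throws XMLStreamException {
        String[] steps = path.substring(1).split("/");
        Deque<String> open = new ArrayDeque<>();  // open element names, root first
        List<String> results = new ArrayList<>();
        StringBuilder current = null;             // non-null inside a matched subtree
        int matchDepth = -1;                      // depth of the matched element
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                open.addLast(r.getLocalName());
                if (current == null && matches(open, steps)) {
                    current = new StringBuilder();
                    matchDepth = open.size();
                }
            } else if (ev == XMLStreamConstants.CHARACTERS && current != null) {
                current.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                if (current != null && open.size() == matchDepth) {
                    results.add(current.toString());
                    current = null;
                }
                open.removeLast();
            }
        }
        r.close();
        return results;
    }

    // True if the open-element stack equals the path steps exactly.
    static boolean matches(Deque<String> open, String[] steps) {
        if (open.size() != steps.length) {
            return false;
        }
        int i = 0;
        for (String name : open) {
            if (!name.equals(steps[i++])) {
                return false;
            }
        }
        return true;
    }
}
```

Matching needs only the current path from the root, so it works one event at a time; a predicate that inspects later siblings or descendants would instead force the engine to buffer events.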
>
> Pratik
>
> Thomas Roessler wrote:
>>
>> Hi Pratik,
>>
>> I agree with most of your high-level points, therefore I don't  
>> repeat them here. ;-)
>>
>> On 25 Mar 2009, at 18:23, Pratik Datta wrote:
>>
>>> Thomas, regarding your nodeset question, I have also been trying
>>> to think of a different model to represent a document subset.
>>> The event stream is a popular model in streaming parsers, but
>>> maybe we need to define our own model.
>>
>>
>> I'd like to understand whether we can use an event stream (as  
>> specified where?) or whether we'd need to define a separate model.   
>> My sense is that having that framework will go a long way toward  
>> understanding what your proposal means in terms of analysis and  
>> implementation complexity.
>>
>> Therefore, if you could shed some more light on that point, that  
>> would be most welcome.
>>
>> Thanks,
>> --
>> Thomas Roessler, W3C  <tlr@w3.org>
>>
>
Received on Monday, 30 March 2009 19:01:38 UTC