Re: Transform Note Design Decisions from Pratik Datta on 2009-03-30 (public-xmlsec@w3.org from March 2009)

From: Pratik Datta <pratik.datta@oracle.com>
Date: Mon, 30 Mar 2009 12:53:54 -0700
To: Frederick Hirsch <frederick.hirsch@nokia.com>
CC: Thomas Roessler <tlr@w3.org>, XMLSec WG Public List <public-xmlsec@w3.org>
Message-ID: <49D12352.4080508@oracle.com>
The namespace inheritance complexity is not just for QNames in content.

We also need to support the use case where a document subset is signed, 
where some of the namespace declarations in use are defined by an 
ancestor (not in the subset), and where there are also unnecessary 
namespace declarations. Exclusive Canonicalization irons out all these 
differences, and we need to support that.

E.g. consider a signed SAML assertion. The declaration for the saml 
namespace may be in the <saml:Assertion> itself, or in the 
<wsse:Security> ancestor element. The wsse:Security element may also 
include other namespace declarations that are not used inside the SAML 
assertion. The SAML assertion should be movable from one message to 
another without breaking the signature.

So we need to support all the namespace complexity of Exclusive C14N, 
Exclusive C14N with InclusivePrefixList, and Inclusive C14N.
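
To make the SAML case concrete, here is a minimal Python sketch (not a full Exclusive C14N implementation). It only computes which namespace declarations the apex element of the signed subset visibly uses, i.e. the prefixes of its own tag and attributes. The function name visibly_used_namespaces, the "unused" prefix, and the trimmed-down sample messages are assumptions made for illustration.

```python
import xml.parsers.expat

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")


def visibly_used_namespaces(xml_text, apex_qname):
    """Return {prefix: uri} for the prefixes the apex element itself uses.

    Simplification: only the first element named apex_qname is inspected,
    and descendants are ignored.
    """
    scopes = [{}]      # stack of in-scope prefix -> URI maps
    result = None

    def start(name, attrs):
        nonlocal result
        scope = dict(scopes[-1])          # inherit from the parent element
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[6:]] = value
        scopes.append(scope)
        if name == apex_qname and result is None:
            prefix, sep, _ = name.partition(":")
            used = {prefix if sep else ""}
            for key in attrs:
                if ":" in key and not key.startswith("xmlns:"):
                    used.add(key.partition(":")[0])
            result = {p: scope[p] for p in sorted(used) if p in scope}

    def end(name):
        scopes.pop()

    parser = xml.parsers.expat.ParserCreate()   # no namespace processing
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return result


# Same assertion, two different messages (hypothetical, heavily trimmed).
declared_on_assertion = (
    f'<wsse:Security xmlns:wsse="{WSSE_NS}" xmlns:unused="urn:example:unused">'
    f'<saml:Assertion xmlns:saml="{SAML_NS}" ID="a1"/>'
    '</wsse:Security>'
)
declared_on_ancestor = (
    f'<wsse:Security xmlns:wsse="{WSSE_NS}" xmlns:saml="{SAML_NS}">'
    '<saml:Assertion ID="a1"/>'
    '</wsse:Security>'
)

first = visibly_used_namespaces(declared_on_assertion, "saml:Assertion")
second = visibly_used_namespaces(declared_on_ancestor, "saml:Assertion")
```

Both calls yield the same single declaration for the saml prefix, which is why the assertion can move between messages: the wsse and unused declarations of the ancestor never reach the canonical form of the subset.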

Pratik

Frederick Hirsch wrote:
>> * A nodeset that represents a document subset always has a reference 
>> to the whole document. This is not the case with an event stream 
>> representing a document subset - in this case only the events of the 
>> document subset are present. So we need a solution to find namespaces 
>> and xml: attributes of missing ancestors. - The namespaces can be 
>> obtained from the namespace context.
>
> In other words, we are still left with the namespace inheritance 
> complexity and issues unless we eliminate the ability to support 
> QNames in content?
> regards, Frederick
>
> Frederick Hirsch
> Nokia
>
>
>
> On Mar 27, 2009, at 10:32 PM, ext Pratik Datta wrote:
>
>> I guess I hijacked your original email thread to discuss the overall 
>> transform issue.
>>
>> The event stream part of the proposal is for the streaming 
>> requirement which is completely separate from the 
>> determine-what-is-signed requirement.
>>
>> In Java, StAX is a popular streaming parser; it is included in JDK 1.6 
>> (http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html), 
>> and in C# the XmlTextReader class plays the same role 
>> (http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.read.aspx).
>>
>> From these we define an "event stream" model as follows.
>>     • Unlike a nodeset, the entire event stream is not available all 
>> at once. Instead there is an "engine", and this returns the "events" 
>> one by one.
>>     • Here is an example of how an XML document is split up into events.
>>     • <eventStream.gif>
>>     • Possible events: StartDocument, EndDocument, StartElement, 
>> EndElement, Text, ProcessingInstruction, Comment.
>>     • All the attributes and namespace declarations are read as part 
>> of the StartElement event.
>>     • Large text nodes may be split up into multiple Text events.
>>     • The engine only knows about the event that it is currently 
>> pointing to; it doesn't have any idea of the events before or after. 
>> However, it maintains a namespace context, i.e. at element nodes it 
>> can be queried to find out about all the namespace declarations in 
>> context.
>>     • It goes in a forward-only direction. Calling "engine.next()" 
>> will make the engine go to the next event.
>>     • At every position, the engine can be queried to get the current 
>> event and its details.
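
The model in the list above can be sketched roughly as follows in Python, on top of the expat parser fed in small chunks. This is only an illustration under assumptions: the class name EventEngine, the tuple layout of the events, the current attribute, the namespaces() query, and the chunk_size parameter are all made up here and are not StAX or XmlTextReader API.

```python
import xml.parsers.expat
from collections import deque


class EventEngine:
    """Hypothetical forward-only pull engine over an XML text."""

    def __init__(self, xml_text, chunk_size=16):
        self._text = xml_text
        self._pos = 0
        self._chunk_size = chunk_size
        self._done = False
        self._scopes = [{}]                     # namespace context stack
        self._queue = deque([("StartDocument",)])
        self.current = None                     # the event we point to
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self._start
        parser.EndElementHandler = self._end
        parser.CharacterDataHandler = self._text_data
        parser.CommentHandler = self._comment
        parser.ProcessingInstructionHandler = self._pi
        self._parser = parser

    def _start(self, name, attrs):
        # Attributes and namespace declarations arrive with StartElement.
        scope = dict(self._scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[6:]] = value
        self._scopes.append(scope)
        self._queue.append(("StartElement", name, attrs, scope))

    def _end(self, name):
        self._scopes.pop()
        self._queue.append(("EndElement", name))

    def _text_data(self, data):
        self._queue.append(("Text", data))

    def _comment(self, data):
        self._queue.append(("Comment", data))

    def _pi(self, target, data):
        self._queue.append(("ProcessingInstruction", target, data))

    def next(self):
        """Advance to the next event; returns None after EndDocument."""
        while not self._queue and not self._done:
            chunk = self._text[self._pos:self._pos + self._chunk_size]
            self._pos += self._chunk_size
            self._done = self._pos >= len(self._text)
            self._parser.Parse(chunk, self._done)
            if self._done:
                self._queue.append(("EndDocument",))
        self.current = self._queue.popleft() if self._queue else None
        return self.current

    def namespaces(self):
        """Namespace context of the current StartElement event, else None."""
        if self.current and self.current[0] == "StartElement":
            return self.current[3]
        return None


engine = EventEngine('<a xmlns:p="urn:p"><p:b k="v">hello world</p:b></a>')
events = []
contexts = []
while engine.next() is not None:
    events.append(engine.current)
    if engine.current[0] == "StartElement":
        contexts.append(engine.namespaces())
```

Because the input is fed in small chunks, the text "hello world" may arrive as more than one Text event, which matches the point about large text nodes. The namespace context is snapshotted per StartElement, so the inner p:b element still sees the p prefix declared on its parent.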
>>
>>
>> The Canonicalization algorithm needs to be defined in terms of this 
>> event stream. The canonicalization engine should get events one by 
>> one, and emit octet stream chunks for each event. This way it can 
>> work with very large documents, without having to keep them all in memory.
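
As a rough sketch of "one octet chunk per event" in Python: the generator below consumes events lazily and yields UTF-8 bytes for each one, so nothing but the current event is held. It is not a complete canonicalizer. The event tuples and the names canonical_chunks, escape_text and escape_attr are assumptions for illustration, namespace handling is left out, and attributes are ordered by qualified name, whereas Canonical XML orders them by namespace URI and then local name.

```python
def escape_text(value):
    # Character escaping for text nodes as in Canonical XML 1.0.
    return (value.replace("&", "&amp;").replace("<", "&lt;")
                 .replace(">", "&gt;").replace("\r", "&#xD;"))


def escape_attr(value):
    # Character escaping for attribute values as in Canonical XML 1.0.
    return (value.replace("&", "&amp;").replace("<", "&lt;")
                 .replace('"', "&quot;").replace("\t", "&#x9;")
                 .replace("\n", "&#xA;").replace("\r", "&#xD;"))


def canonical_chunks(events):
    """Yield one bytes chunk per event, without buffering the document."""
    for event in events:
        kind = event[0]
        if kind == "StartElement":
            _, name, attrs = event
            parts = ["<" + name]
            # Simplified ordering: sorted by qualified name (see lead-in).
            for key in sorted(attrs):
                parts.append(' %s="%s"' % (key, escape_attr(attrs[key])))
            parts.append(">")
            yield "".join(parts).encode("utf-8")
        elif kind == "Text":
            yield escape_text(event[1]).encode("utf-8")
        elif kind == "EndElement":
            yield ("</" + event[1] + ">").encode("utf-8")
        # Other event kinds are ignored in this sketch.


# Hand-written events; a text node split into two Text events.
sample_events = [
    ("StartElement", "doc", {"b": "2", "a": 'x"<y'}),
    ("Text", "1 < 2 "),
    ("Text", "& 3 > 2"),
    ("EndElement", "doc"),
]
chunks = list(canonical_chunks(sample_events))
output = b"".join(chunks)
```

Each chunk can be handed straight to the digest as it is produced, and a text node split across several Text events gives the same octets as a single one.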
>>
>> This event stream can be used to represent a complete document or 
>> document subset. But there are some extra considerations for document 
>> subsets:
>>     • A document subset can have multiple subtrees, which translates 
>> to multiple root elements. This is not well-formed XML, but it is 
>> possible in this model.
>>     • Attributes are only valid in the context of their element, so 
>> this model does not allow attributes in the document subset whose 
>> parent element is missing from the subset.
>>     • A nodeset that represents a document subset always has a 
>> reference to the whole document. This is not the case with an event 
>> stream representing a document subset - in this case only the events 
>> of the document subset are present. So we need a solution to find 
>> namespaces and xml: attributes of missing ancestors.  - The 
>> namespaces can be obtained from the namespace context.
>>
>> All the other transforms also need to be defined on top of this 
>> model. E.g. XPath selection needs to work on this event stream too.
>>
>> Pratik
>>
>> Thomas Roessler wrote:
>>>
>>> Hi Pratik,
>>>
>>> I agree with most of your high-level points, therefore I don't 
>>> repeat them here. ;-)
>>>
>>> On 25 Mar 2009, at 18:23, Pratik Datta wrote:
>>>
>>>> Thomas, regarding your nodeset question, I have also been trying to 
>>>> think of a different model to represent a document subset - the 
>>>> event stream is a popular model in streaming parsers, but maybe we 
>>>> need to define our own model.
>>>
>>>
>>> I'd like to understand whether we can use an event stream (as 
>>> specified where?) or whether we'd need to define a separate model.  
>>> My sense is that having that framework will go a long way toward 
>>> understanding what your proposal means in terms of analysis and 
>>> implementation complexity.
>>>
>>> Therefore, if you could shed some more light on that point, that 
>>> would be most welcome.
>>>
>>> Thanks,
>>> -- 
>>> Thomas Roessler, W3C  <tlr@w3.org>
>>>
>>
>
>
Received on Monday, 30 March 2009 19:54:55 UTC