- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Fri, 7 May 2010 10:59:00 -0700 (PDT)
- To: Meiko Jensen <Meiko.Jensen@ruhr-uni-bochum.de>, XMLSec WG Public List <public-xmlsec@w3.org>
I have a high-level comment on streaming. To implement XML Signature 2.0 in streaming mode, you first need a streaming XPath parser. That is pretty complex by itself and will need a dedicated team of developers. If that is the case, then this team can also implement trimming of text nodes, using the "whitespace caching" technique that you outlined.

My opinion is that there is nothing in the XML Signature 2.0 spec that cannot be streamed, assuming you have a solid development team. So do we really need a profile for streaming? If you want simplicity you should use a DOM. Or are you suggesting that we should have a "simple-streaming" profile?

Pratik

-----Original Message-----
From: Meiko Jensen [mailto:Meiko.Jensen@ruhr-uni-bochum.de]
Sent: Wednesday, May 05, 2010 4:14 AM
To: Pratik Datta; XMLSec WG Public List
Subject: TrimTextNodes in Streaming parameter set

Hi Pratik,

regarding the trimTextNodes parameter in my streaming proposal, here is an example:

<A> <B> stupid example... </B> </A>

In SAX, this might end up with the contents of B being split into, say, three separate characters() events.

The first contains "stupid", so removing the leading whitespace is no issue. The trailing whitespace already poses a problem: one cannot be sure that no non-whitespace text follows. Hence, this requires caching the trailing whitespace up to the point where one can decide whether it is trailing or embedded whitespace.

The second characters() event contains only whitespace. Still, we don't know whether we may safely discard it. However, I know at least one programmer who will implement the trimTextNodes method so that characters() events consisting only of whitespace are discarded. We may add a hint about this issue to the spec, but it still remains somewhat tricky.

Third characters() event: "example...". Now it turns out that the cached whitespace was in fact embedded, not trailing. So we have to flush the cache to the c14n. Again, the trailing whitespace triggers caching.
Then comes the endElement() event of the B element. Here, it turns out that the cache can be discarded, as the whitespace it contains was indeed trailing. As a result, however, every event method must be implemented to take care not only of the event itself, but also of the whitespace cache.

I know this is a rather constructed example, but the issue exists and may cause the "WTF happened here?" kind of bugs in real-world scenarios.

Additionally, the issue is complicated a little by the ignorableWhitespace() event, which is used only by validating parsers and would get called, e.g., for the whitespace between <A> and <B> in the example above. In fact, that's why I suggested considering a third option (besides trim and noTrim) that would only erase the whitespace reported through ignorableWhitespace(). However, that one does not work in non-validating parser environments (but could be emulated).

That's why I proposed to set trimTextNodes=false.

What do you think?

best regards

Meiko

--
Dipl.-Inf. Meiko Jensen
Chair for Network and Data Security
Horst Görtz Institute for IT-Security
Ruhr University Bochum, Germany
_____________________________
Universitätsstr. 150, Geb. IC 4/150
D-44780 Bochum, Germany
Phone: +49 (0) 234 / 32-26796
Telefax: +49 (0) 234 / 32-14347
http://www.nds.rub.de
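
For illustration, the "whitespace caching" technique discussed in this thread might be sketched roughly as follows, using Python's standard xml.sax ContentHandler interface. This is only a minimal sketch under assumptions and is not taken from either message or from the XML Signature 2.0 draft: the class name TrimmingHandler, its output list (a stand-in for the canonicalizer input), and the choice to treat every start or end tag as a text node boundary are all hypothetical. Attributes, comments and processing instructions are ignored.

```python
import xml.sax.handler

XML_WS = " \t\r\n"  # whitespace characters as defined by XML


class TrimmingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical sketch: trims each text node while streaming."""

    def __init__(self):
        super().__init__()
        self.output = []         # stand-in for what would be fed to the c14n
        self._seen_text = False  # non-whitespace seen in the current text node?
        self._pending_ws = ""    # cached whitespace: trailing or embedded?

    def _end_text_node(self):
        # Whitespace still cached at a tag boundary was trailing: discard it.
        self._pending_ws = ""
        self._seen_text = False

    def startElement(self, name, attrs):
        self._end_text_node()
        self.output.append("<%s>" % name)  # attributes omitted in this sketch

    def endElement(self, name):
        self._end_text_node()
        self.output.append("</%s>" % name)

    def characters(self, content):
        core = content.rstrip(XML_WS)
        trailing = content[len(core):]
        if not core:
            # Whitespace-only chunk: drop it if it is leading, else cache it.
            if self._seen_text:
                self._pending_ws += content
            return
        if self._seen_text:
            # The cached whitespace turned out to be embedded: flush it.
            self.output.append(self._pending_ws + core)
        else:
            self.output.append(core.lstrip(XML_WS))
            self._seen_text = True
        # Undecided until the next chunk or the next tag arrives.
        self._pending_ws = trailing

    def ignorableWhitespace(self, whitespace):
        pass  # only reported by validating parsers; never part of the output


# Replay the three characters() events from the example by hand.
handler = TrimmingHandler()
handler.startElement("B", {})
for chunk in (" stupid ", "   ", "example... "):
    handler.characters(chunk)
handler.endElement("B")
result = "".join(handler.output)
```

Note that the cache has to be consulted in every event method, not only in characters(), which is the maintenance burden described above. A handler that simply discarded whitespace-only characters() events would lose the embedded whitespace of the second chunk.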
Received on Friday, 7 May 2010 18:00:52 UTC