- From: Meiko Jensen <Meiko.Jensen@ruhr-uni-bochum.de>
- Date: 5 May 2010 13:14:24 +0200
- To: "Pratik Datta" <PRATIK.DATTA@oracle.com>, "XMLSec WG Public List" <public-xmlsec@w3.org>
Hi Pratik,

regarding the trimTextNodes parameter in my streaming proposal, here is an example:

<A> <B> stupid example... </B> </A>

In SAX, the contents of B might end up split into, say, 3 separate characters() events.

The first contains "stupid", so removing the leading whitespace is no issue. Trailing whitespace already poses a problem: one cannot be sure that no non-whitespace text follows. Hence, the trailing whitespace must be cached until one can decide whether it is trailing or embedded.

The second characters() event contains only whitespace. We still don't know whether we may safely discard it. However, I know at least one programmer who will implement the trimTextNodes method so that characters() events consisting only of whitespace are discarded. We may add a hint about this issue to the spec, but it still remains somewhat tricky.

The third characters() event contains "example...". Now it turns out that the cached whitespace was in fact embedded, not trailing, so we have to flush the cache to the c14n. Again, the trailing whitespace triggers caching.

Then comes an endElement() event for the B element. Here it turns out that the cache can be discarded, as the whitespace it contains was indeed trailing.

As a result, every event method must be implemented to take care not only of the event itself, but also of the whitespace cache. I know this is a rather constructed example, but the issue exists and may cause the "WTF happened here?" kind of bugs in real-world scenarios.

Additionally, the issue is complicated a little by the ignorableWhitespace() event, which is used only by validating parsers and would get called, e.g., for the whitespace between <A> and <B> in the example above. In fact, that's why I suggested considering a third option (besides trim and noTrim) that would only erase the whitespace reported via ignorableWhitespace().
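The whitespace cache described in the walkthrough above can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: it uses Python's xml.sax ContentHandler, whose event names match the SAX methods mentioned here, and the names TrimmingHandler, sink, pending_ws and run_example are hypothetical. The sink list stands in for the c14n stage, and the events are fired by hand to reproduce the three-way split of B's text.

```python
from xml.sax.handler import ContentHandler


class TrimmingHandler(ContentHandler):
    """Trims each text node while streaming (hypothetical sketch)."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink        # stand-in for the c14n stage (assumption)
        self.pending_ws = ""    # cached whitespace: trailing or embedded?
        self.seen_text = False  # True once this text node emitted real text

    def _drop_cache(self):
        # An element boundary ends the text node, so the cached
        # whitespace was trailing and can be discarded.
        self.pending_ws = ""
        self.seen_text = False

    def startElement(self, name, attrs):
        self._drop_cache()
        self.sink.append(f"<{name}>")

    def endElement(self, name):
        self._drop_cache()
        self.sink.append(f"</{name}>")

    def characters(self, content):
        rest = content.lstrip()
        if not rest:
            # Whitespace-only event: it is leading (drop it) unless real
            # text came before, in which case it must be cached.
            if self.seen_text:
                self.pending_ws += content
            return
        leading = content[: len(content) - len(rest)]
        core = rest.rstrip()
        trailing = rest[len(core):]
        if self.seen_text:
            # Cached whitespace turned out to be embedded: flush it.
            core = self.pending_ws + leading + core
        self.sink.append(core)
        self.pending_ws = trailing
        self.seen_text = True


def run_example():
    """Replays the events from the example, with B's text split in three."""
    sink = []
    h = TrimmingHandler(sink)
    h.startElement("A", {})
    h.characters(" ")
    h.startElement("B", {})
    h.characters(" stupid ")
    h.characters(" ")
    h.characters("example... ")
    h.endElement("B")
    h.characters(" ")
    h.endElement("A")
    return "".join(sink)


print(run_example())  # <A><B>stupid  example...</B></A>
```

Note that even in this small sketch every callback has to touch the cache, and events not handled here (comments, processing instructions) would need the same treatment.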
However, that third option does not work in non-validating parser environments (though it could be emulated there). That's why I proposed to set trimTextNodes=false.

What do you think?

best regards

Meiko

--
Dipl.-Inf. Meiko Jensen
Chair for Network and Data Security
Horst Görtz Institute for IT-Security
Ruhr University Bochum, Germany
_____________________________
Universitätsstr. 150, Geb. IC 4/150
D-44780 Bochum, Germany
Phone: +49 (0) 234 / 32-26796
Telefax: +49 (0) 234 / 32-14347
http://www.nds.rub.de
Received on Wednesday, 5 May 2010 11:14:54 UTC