One-pass vs. Two-Pass in Streaming XML from Meiko Jensen on 2010-08-18 (public-xmlsec@w3.org from August 2010)

From: Meiko Jensen <Meiko.Jensen@rub.de>
Date: 18 Aug 2010 13:42:50 +0200
To: "'XMLSec WG Public List'" <public-xmlsec@w3.org>
Message-ID: <4C6BC73A.1000007@rub.de>
Completing my ACTION-628, here is the promised example of a case where
one-pass streaming is superior to two-pass streaming. Note that the
example is not based on a real scenario, but my colleagues and I found
it rather realistic.

Imagine a hospital that uses electronic patient records, which are
stored in XML, and must be digitally signed on every change by the
responsible doctor (for non-repudiation reasons). To keep the overhead
small, the system developer decided to sign the whole document with the
key of the editing doctor, using an enveloping XML Signature (or a
detached signature that is located before the medical data in document
order). Patient records contain everything of relevance to the health of
the patient, including diagnosis, x-ray images, medication details,
insurance data and so on.

Now imagine a nurse who has to decide which medication a patient should
get. To do so, she has to 1) get the patient's record from some dedicated
document server, 2) extract the (rather small) medication data, and 3)
verify that it was signed properly, to ensure that it was authorized by
the doctor. Since she uses a small mobile device with little RAM and a
slow CPU, resource efficiency matters. In particular, the device is not
capable of keeping the full patient record in memory (due to a set of
huge x-ray images).

In such a scenario, one-pass streaming is superior to two-pass
streaming. The application on the nurse's device can read the full
patient record in chunks from the network, calculate the digests,
verify the signature, and extract the (small amount of) medication data
for display purposes. Note that the application is written so that it
caches the extracted lines on medication (a less critical operation, see
below), but displays that information (a critical operation) only after
signature verification succeeds. At no point is it necessary to store
the full medical data on the nurse's device, since it reads data chunks
from the network, processes them, and drops them immediately.
Nevertheless, the intended operation can be performed successfully,
using O(medication data) instead of O(patient record) in terms of
memory.
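
The processing loop described above could be sketched roughly as follows
in Python. This is only an illustration under simplifying assumptions, not
an XML Signature implementation: the element name "medication" and the
function read_medication_one_pass are hypothetical, and a plain SHA-256
digest over the raw bytes stands in for canonicalization, reference digests
and signature validation. The expected digest is assumed to have reached
the device (in signed form) before the data, as with the enveloping
signature in the scenario.

```python
import hashlib
import hmac
import xml.sax


class MedicationCollector(xml.sax.ContentHandler):
    """Caches only text inside <medication> (hypothetical element name)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a <medication> element
        self.cached = []    # extracted text, held back until verification

    def startElement(self, name, attrs):
        if name == "medication" or self.depth:
            self.depth += 1

    def endElement(self, name):
        if self.depth:
            self.depth -= 1

    def characters(self, content):
        if self.depth:
            self.cached.append(content)


def read_medication_one_pass(chunks, expected_digest):
    """Digest and parse every chunk exactly once, then let it go.

    Assumption: comparing a SHA-256 digest of the raw bytes replaces
    real signature verification in this sketch.
    """
    digest = hashlib.sha256()
    collector = MedicationCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(collector)
    for chunk in chunks:
        digest.update(chunk)   # less critical: hashing
        parser.feed(chunk)     # less critical: caching medication text
    parser.close()
    # Critical operation (releasing data for display) only after the check.
    if not hmac.compare_digest(digest.digest(), expected_digest):
        raise ValueError("digest mismatch: do not display medication data")
    return "".join(collector.cached).strip()
```

Only the digest state, the parser state and the cached medication text
are held between chunks, which is where the O(medication data) bound
comes from.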

To discuss some obvious weaknesses of this example, here are some
additional arguments:

Two-pass would, of course, also be an option to implement this scenario,
but would require transferring the patient record data to the nurse's
device twice, doubling the network traffic.

One could argue that having the signature cover the whole patient record
is bad application design, and that the document server should provide
an operation that just returns the medication data, signed stand-alone,
to avoid the problems stated above. However, such application design is
likely to be more complex, more expensive to build, and less secure
(since a signature over just the medication data may be misused in the
context of other patients). Hence, we'd consider the above example to be
not an ideal, but a realistic scenario.

An obvious threat in this scenario is that an attacker might put some
huge or malformed data into the medication part of the patient record
document, trying to mount a Denial of Service attack on the device.
Since the hash value calculation for the whole document has not yet
finished, there is no way for the device to detect this attack before it
takes effect (memory exhaustion). However, these classes of attacks are
known as "Oversized Payload", "Coercive Parsing", or the like, depending
on their technique, and would have at least the same impact in every
DOM-based or two-pass streaming parsing environment. Hence, there is no
way for an attacker to exploit the streaming parsing other than using
techniques that are at least equally effective in other parsing
scenarios. Moreover, since streaming parsing allows on-the-fly XML
Schema validation, this attack vector can be fended off even better
using the streaming approach (cf. Nils Gruschka's Ph.D. thesis, 2008).
However, it is critical for the application designer to understand this
paradigm and to delay all critical operations (here, the display) until
the signature has been verified successfully.
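
As a rough illustration of rejecting an oversized medication part while
streaming, the following Python sketch caps the amount of cached text and
aborts parsing as soon as the cap is exceeded. It is a crude stand-in for
the on-the-fly XML Schema validation mentioned above, not an implementation
of it; the element name "medication", the max_chars budget, and all class
and function names are assumptions made for this example.

```python
import xml.sax


class OversizedPayloadError(Exception):
    """Raised when cached medication text exceeds the configured budget."""


class BoundedMedicationCollector(xml.sax.ContentHandler):
    """Caches <medication> text (hypothetical name) up to max_chars."""

    def __init__(self, max_chars):
        super().__init__()
        self.max_chars = max_chars
        self.inside = 0          # > 0 while inside a <medication> element
        self.cached = []
        self.cached_chars = 0

    def startElement(self, name, attrs):
        if name == "medication" or self.inside:
            self.inside += 1

    def endElement(self, name):
        if self.inside:
            self.inside -= 1

    def characters(self, content):
        if not self.inside:
            return
        self.cached_chars += len(content)
        if self.cached_chars > self.max_chars:
            # Abort before the cache can exhaust the device's memory.
            raise OversizedPayloadError("medication part exceeds budget")
        self.cached.append(content)


def collect_bounded(chunks, max_chars):
    """Stream chunks through the parser; stop early on oversized data."""
    collector = BoundedMedicationCollector(max_chars)
    parser = xml.sax.make_parser()
    parser.setContentHandler(collector)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return "".join(collector.cached)
```

The text returned here is still unverified, so it should be treated like
the cache in the scenario: kept, but not displayed until signature
verification has succeeded.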


Meiko
Received on Wednesday, 18 August 2010 11:43:19 UTC