
Re: Re: schema validity in encryption

From: Joseph Ashwood <jashwood@arcot.com>
Date: Thu, 21 Dec 2000 15:04:00 -0800
Message-ID: <028c01c06ba6$1312b470$2a0210ac@livermore>
To: <xml-encryption@w3.org>
[comments are post quote]
----- Original Message -----
From: "Martin J. Duerst" <duerst@w3.org>
To: <xml-encryption@w3.org>
Cc: <w3c-i18n-ig@w3.org>
Sent: Wednesday, December 20, 2000 1:39 AM
Subject: Fwd: Re: schema validity in encryption


> >>This would change
> >>     <element>Clear text here.</element>
> >>to
> >>     <element>ScRaMbLeD TeXt HeRe</element>
> >>yes? While this may work technically (it will validate), I have
> >>serious problems with such an approach. The markup is now actually
> >>completely wrong. What was an <element> is still called an <element>,
> >>but it's not an <element> anymore, it's an <encodedElement>. The
> >>original Markup has been misused. This can be seen as a problem
> >>of markup philosophy (or whatever you call it) but can also lead
> >>to very serious practical problems. If the document is received
> >>as is, and by accident or whatever the separate information in
> >>an external document is lost (very easy to happen), the encoded
> >>information will be taken as the real information, with very
> >>bad consequences.

I'd like to continue this a bit. I personally see no reason to rename an
element simply because it is encrypted. In every context I can think of,
specifically all those that are well formed for security (signed before
encryption, or authenticated before encryption with authenticated keys),
the element name need not change based on whether or not its content is
encrypted. It is obvious from context: if it is dictated that the tag
<whatever> must be decrypted before it is accessed, there is no ambiguity.
Additionally, I am particularly fond of tags that add as little space as
possible to the document, so I would prefer <CCNum> to <EncryptedCCNum>.
Since I generally work with files that have an absurd number of same-tagged
entries (similar to a credit card billing list of several million people),
the savings in size can be significant.

This is also my reason for wanting a semi-enforced tag->encryption mapping
that separates the two. I've noticed that the proposals seem to keep coming
back to putting the decryption information as close to the encrypted data as
possible. For small files, or files with diverse tags each used a small
number of times, this is the most reasonable mode of operation. However,
looking at the amount of processing power that must go into the public key
operations needed to exchange those keys securely, it gets extremely expensive.
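As a rough illustration of the separation I mean (a sketch only; the table layout, key identifiers, and function names here are hypothetical, not proposed syntax): the keying information lives once in a separate table, and each encrypted element merely names its key instead of carrying decryption information inline.

```python
# Hypothetical sketch: keying data is held once, in a separate table,
# rather than being repeated next to every encrypted element.
key_table = {
    # tag name -> identifier of the (separately distributed) key
    "CCNum":    "key-ccnum-2000-12",
    "Name":     "key-name-2000-12",
    "BillAddr": "key-addr-2000-12",
}

def key_for(tag: str) -> str:
    """Resolve which key a tag's contents are encrypted under."""
    return key_table[tag]

# A document with millions of <CCNum> entries pays for this mapping
# once, not once per occurrence of the tag.
print(key_for("CCNum"))
```

The point of the design is that the cost of the mapping is proportional to the number of distinct tags, not the number of encrypted elements.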

For concrete timing figures, please take a look at
http://www.eskimo.com/~weidai/benchmarks.html.

Just as an example, let's assume a credit card company with 500 million
customers who need to be billed each month, with spending tracked, etc.
(assume Visa for the sake of argument). It seems reasonable for them to want
to separate the cardholder name, card number, and billing address, with each
of the three encrypted to a separate key. It is also reasonable that they will
have a large number of employees with access to varying subsets of those
fields, say 6000 employees.

Assume the decryption data is housed directly with the encrypted data, that
3000 employees have access to each of the fields (there is significant
overlap), and that 2048-bit RSA keys are used. Adding one credit card entry
will require 9000 public key operations (3 fields x 3000 readers), which will
completely dominate any other factors involved. That means 0.89 x 9000
milliseconds, or about 8 seconds. For billing it gets much worse: billing
each month requires 500 million decryptions x 3 fields, i.e. 64.13 x
1,500,000,000 milliseconds, about 1000 days; even if only the credit card
number is encrypted, that's 300+ days. The storage factor is just as
dominant: assuming there is no other information, each record will occupy
2048 x 3 x 3000 bits of disk, or 500,000,000 x 2048 x 3 x 3000 bits total,
which is ~1 million gigabytes. I personally consider these demands to be far
beyond what should be required of any company, and I don't want to force
companies into a custom in-house design which they will have to create,
test, and support in house or hire out.
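Working the numbers through explicitly (the 0.89 ms and 64.13 ms figures are the RSA-2048 encryption and decryption timings from the benchmark page above; everything else follows from the stated assumptions):

```python
# Back-of-the-envelope cost of storing RSA-encrypted fields directly
# with the data. Timings are the cited RSA-2048 benchmark figures;
# the other numbers come from the scenario's assumptions.
CARDS = 500_000_000          # customers
FIELDS = 3                   # name, card number, billing address
READERS_PER_FIELD = 3_000    # employees with access to each field
RSA_ENCRYPT_MS = 0.89        # ms per RSA-2048 encryption
RSA_DECRYPT_MS = 64.13       # ms per RSA-2048 decryption
RSA_BITS = 2048

# Adding one card: each field is encrypted once per authorized reader.
ops_per_card = FIELDS * READERS_PER_FIELD                    # 9,000 ops
add_seconds = ops_per_card * RSA_ENCRYPT_MS / 1000           # ~8 s

# Monthly billing: one decryption per field per card.
bill_days = CARDS * FIELDS * RSA_DECRYPT_MS / 1000 / 86_400  # ~1,113 days

# Storage: one 2048-bit ciphertext per field, per reader, per card.
total_gb = CARDS * FIELDS * READERS_PER_FIELD * RSA_BITS / 8 / 1e9

print(f"{ops_per_card} ops, {add_seconds:.1f} s to add one card")
print(f"billing: {bill_days:.0f} days; storage: {total_gb:,.0f} GB")
```

The billing figure alone makes the scheme unusable: the month's decryption work takes roughly three years of CPU time.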

If instead the mapping is done such that each of those 6000 employees is
given selective knowledge of Rijndael encryption keys, with otherwise the
same assumptions: adding one credit card takes 3 public key operations (to
recover the field keys), 3 keyings of Rijndael, and 3-9 block encryptions
with Rijndael. That's 192 milliseconds plus very small amounts, or about a
fifth of a second. For billing there are 3 public key operations, 3 Rijndael
keyings, and 1.5-4.5 billion blocks of Rijndael decryption (roughly 24-72
gigabytes of data); at bulk-cipher speeds that is minutes rather than the
roughly 1000 days above. The space is much more reasonable: assuming there
is no other information, it is approximately 3000 x 3 x 2048 bits for the
encrypted decryption keys (at most, and there are likely ways to optimize
that), plus 3 blocks of Rijndael per field, or 384 bits per field, times 3
fields per card, times 500 million cards, a mere 68 gigabytes. This way the
computation is likely to be fairly dominated by the disk/database access
involved in such operations.
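The same arithmetic for the hybrid scheme (again a sketch from the stated assumptions; the 64.13 ms figure is the benchmarked RSA-2048 decryption time, and Rijndael's 128-bit block size is used for the ciphertext sizes):

```python
# Hybrid scheme: fields are encrypted under Rijndael (AES) keys, and
# only those symmetric keys are RSA-encrypted per authorized reader.
CARDS = 500_000_000
FIELDS = 3
READERS_PER_FIELD = 3_000
RSA_DECRYPT_MS = 64.13       # ms per RSA-2048 operation (benchmark above)
RSA_BITS = 2048
BLOCK_BITS = 128             # Rijndael block size

# Adding one card: recover the 3 field keys with 3 RSA operations;
# the handful of block encryptions that follow are negligible.
add_ms = FIELDS * RSA_DECRYPT_MS                               # ~192 ms

# Encrypted-key storage is a one-time cost, independent of card count.
key_store_mbit = READERS_PER_FIELD * FIELDS * RSA_BITS / 1e6   # ~18.4 Mbit

# Per-card ciphertext, assuming 3 blocks per field (the worst case above).
data_gib = CARDS * FIELDS * 3 * BLOCK_BITS / 8 / 2**30         # ~67 GiB

print(f"add one card: {add_ms:.0f} ms")
print(f"keys: {key_store_mbit:.1f} Mbit; data: {data_gib:.0f} GiB")
```

Note that the per-card cost no longer scales with the number of readers: access control is handled entirely through the separately stored key table.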

It is for this reason that I am personally strongly for separating the
keying data from the encrypted data. I am sure I am not the only person who
will be dealing with this kind of situation, and these are not small
differences in compute or storage.
                        Joe
Received on Thursday, 21 December 2000 18:30:13 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 27 October 2009 08:42:18 GMT