Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document)

Leonard, OK so maybe this is not just a terminology thing if you are
insisting that somehow physical packaging is a pre-requisite for reliable
digital signatures. I just don't see any real justification for such a
limitation.

To me "a well defined set of bytes" has nothing to do with whether these
bytes happen to be "contained in a package" so I take your statement as
conflating separate issues. Of course the latter certainly is a good way
to circumscribe the former, but it is not the only way (and an important
part of the charter of this group is to help ensure that packaged content
is not the only way). And, there are exceptions and corner cases to the
"packaged" scenario: as you pointed out earlier, PDF files don't
necessarily physically contain the necessary fonts, so any "whole package"
signing doesn't fully capture the whole Infoset. That there have been a
number of malware exploits based on bugs in font handling code makes this
not just a theoretical concern.
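
As a rough illustration of that first point, the Python sketch below hashes
the same resources twice: once as loose, separately addressable resources and
once after a round trip through a ZIP container. The resource names and
contents are invented for the example, and a plain SHA-256 digest stands in
for the bytes a real signature would cover; nothing here reflects how EPUB or
PDF tooling actually signs content.

```python
import hashlib
import io
import zipfile

# Hypothetical publication resources (names and bytes are invented).
RESOURCES = {
    "index.html": b"<html><body>Chapter 1</body></html>",
    "style.css": b"body { margin: 2em; }",
}


def digest_map(resources):
    """Return {name: hex SHA-256} for a mapping of name -> bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(resources.items())}


def pack(resources):
    """Pack the resources into an in-memory ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in resources.items():
            zf.writestr(name, data)
    return buf.getvalue()


def unpack(package_bytes):
    """Read every resource back out of a ZIP container."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


distributed = digest_map(RESOURCES)             # resources served one by one
packaged = digest_map(unpack(pack(RESOURCES)))  # same resources via a package
same_bytes = distributed == packaged
```

If the two digest maps match, the set of bytes to be signed is the same in
both cases; the container only changes how those bytes are delivered.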

So as I see it, whether a signature is applied to the whole publication (all
its constituent resources) or incrementally to different resources is
really an implementation detail orthogonal to whether a publication's
resources are packaged or distributed. In today's packaged EPUB, content is
encrypted resource-by-resource, not at the whole ZIP package level, so your
comment that the latter is "...what the publisher is going to do" doesn't
track the reality of today's EPUB workflows.
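
A small Python sketch of the two granularities, under loud assumptions:
HMAC-SHA256 with a made-up shared key stands in for a real public-key
signature, the resource names and the manifest layout are hypothetical, and
this is not EPUB's actual encryption or signature mechanism. It only shows
that per-resource and whole-publication signing are both defined over
resources, with no container involved.

```python
import hashlib
import hmac
import json

# Hypothetical key; a real publisher would sign with an asymmetric private key.
DEMO_KEY = b"demo-key-not-for-real-use"


def sign(data: bytes) -> str:
    """Stand-in for a digital signature: HMAC-SHA256 over the bytes."""
    return hmac.new(DEMO_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, tag: str) -> bool:
    """Check that the tag matches the bytes (constant-time comparison)."""
    return hmac.compare_digest(sign(data), tag)


def sign_each(resources):
    """Incremental approach: one signature per resource."""
    return {name: sign(data) for name, data in resources.items()}


def manifest_bytes(resources) -> bytes:
    """Canonical list of (name, SHA-256) pairs covering all resources."""
    entries = sorted((name, hashlib.sha256(data).hexdigest())
                     for name, data in resources.items())
    return json.dumps(entries, separators=(",", ":")).encode("utf-8")


def sign_whole(resources) -> str:
    """Whole-publication approach: one signature over the manifest."""
    return sign(manifest_bytes(resources))


resources = {
    "chapter1.xhtml": b"<p>First chapter</p>",
    "chapter2.xhtml": b"<p>Second chapter</p>",
}
per_resource = sign_each(resources)
whole = sign_whole(resources)

# Alter one resource; no container is involved at any point.
tampered = {**resources, "chapter2.xhtml": b"<p>Altered</p>"}
ch1_ok = verify(tampered["chapter1.xhtml"], per_resource["chapter1.xhtml"])
ch2_ok = verify(tampered["chapter2.xhtml"], per_resource["chapter2.xhtml"])
whole_ok = verify(manifest_bytes(tampered), whole)
```

With one resource altered, the per-resource check still accepts the untouched
chapter and rejects the altered one, while the whole-publication check
rejects the set as a whole.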

--Bill


On Thu, Aug 20, 2015 at 11:27 AM, Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> Sorry, Bill, but this has nothing to do with terminology.
>
> I believe, instead, that it has to do with a core component of signature
> technology which is “What is the user signing?” which has a friend
> commonly referred to as “The WYSIWYS problem”[1].  Both of which happen to
> fall out of the way that Digital Signature technology happens to work.
>
> In Ivan’s example, he is signing the content of the email.  Not the
> headers, not the packaging, not the transport, etc.   However, when one
> applies a DigSig to a document today (be it PDF [2], Word [3], etc.), that
> signature covers the entire package.   There are also systems where one can
> sign portions of a package (eg. The ASiC standard from ETSI [4]) or even
> just specific pieces of content/data (eg. Data signatures in PDF [5]).   In
> all these cases, it doesn’t matter what the content does – whether it is
> static/dynamic, interactive or not – it is a well defined set of bytes
> contained in a package.
>
> These all make sense in the context of a “stand-alone” or offline scenario
> where I have the package and I want to verify that it is intact, unmodified
> and is the one that was sent by the person I expected it to be from.   But
> what does it mean in the context of a piece of content served via HTTP (be
> it  REST or simple web serving)?  It could well be static content (in the
> classic static web server case) that may or may not be interactive.  But
> how would the client know what is signed and how to verify it – unless you
> are now serving up something more than just the original content?  (FWIW:
> XML DigSig [6], JSON Web Signatures [7] and JOSE [8] all try to address this).
> And that only works in the individual content signing (where someone signed
> each static resource/asset individually) and not in the case where the
> entire package was signed (more likely, since that’s what the publisher is
> going to do).
>
> And I am all in favor of discussions around document lifecycles :).
>
> Leonard
>
>
> [1] https://en.wikipedia.org/wiki/WYSIWYS
> [2]
> https://www.adobe.com/devnet-docs/acrobatetk/tools/DigSig/Acrobat_DigitalSignatures_in_PDF.pdf
> [3]
> https://support.office.com/en-sg/article/Add-or-remove-a-digital-signature-in-Office-documents-49af4304-bfe7-41bf-99c3-a5023bdab44a
> [4]
> http://www.etsi.org/deliver/etsi_ts/102900_102999/102918/01.01.01_60/ts_102918v010101p.pdf
> [5]
> http://help.adobe.com/en_US/acrobat/X/standard/using/WS396794562021d52e36ef196612b34b1b97f-8000.html
> [6] http://www.w3.org/TR/xmldsig-core/
> [7] https://tools.ietf.org/html/rfc7515
> [8] http://www.iana.org/assignments/jose/jose.xhtml
>
>
>
> From: Bill McCoy
> Date: Thursday, August 20, 2015 at 12:28 PM
> To: Ivan Herman
> Cc: Leonard Rosenthol, Tzviya Siegman, W3C Digital Publishing IG
>
> Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging
> requirements document)
>
> IMO this particular debate is just a confusion in terminology.
>
> Of course digital signing (both of the verification nature as per Ivan's
> example and in the legal sense as I think Leonard was talking about) makes
> sense for content whether it happens to be online or offline. But it
> doesn't make much sense in the full generality of the REST architecture of
> the Web, where the online resources at URL endpoints are active programs
> that may be dynamically returning different representations.
>
> So I took Leonard to have been using the term "online" to mean "full
> dynamic Web" aka the opposite of my suggested "idempotency" constraint for
> portable documents (that resource=representation). But I don't think that
> has anything to do per se with "online" vs. "offline" states of content so
> I think we should avoid mis-using those terms when what we mean is
> "dynamic" vs. "static" (I have avoided the term "static" because it tends
> to be taken as implying non-interactive but in fact static content can
> include JavaScript that delivers interactivity on the client side (aka the
> "code-on-demand" option of REST architecture)... but perhaps "static" is
> nevertheless a better term than "idempotent", as in the Web world it is
> pretty well understood what static hosting means).
>
> BTW I think it would be a mistake if this group didn't (eventually)
> wrestle with issues that arise throughout the life cycle of content...
> "publication" (V.)  is no longer a singular action thus "publication" (N.)
> cannot be considered to exist only as an endpoint. So again in the above
> sense I think "static" doesn't imply a definitive, non-changing work but is
> a statement about a "promise" made about invariability of a particular
> manifestation of a work. Backing up that promise with things like digitally
> signed checksums seems sensible to me especially as we move beyond the
> single-file packaged instantiation which gives a very basic testament of
> integrity.
>
> --Bill
>
>
> On Thu, Aug 20, 2015 at 9:01 AM, Ivan Herman <ivan@w3.org> wrote:
>
>>
>> > On 20 Aug 2015, at 17:33 , Leonard Rosenthol <lrosenth@adobe.com>
>> wrote:
>> >
>> > Access Control and/or DRM is not the same as encryption.  Many
>> implementations of ACL/DRM use encryption but it’s not a requirement - and
>> you can have encryption w/o ACL/DRM (such as with https).  Should we redo
>> that section as Access Control and Rights Management instead of
>> encryption?  That would indeed make things clearer and allow the inclusion
>> of the various use cases that both of you have put forth.
>> >
>>
>> That is fine, it is technology neutral indeed. I am just a bit scared to
>> put words like Digital Rights Management into such a place because it
>> always sparks discussion… let us keep to access control:-)
>>
>> > Signatures, on the other hand, I still don’t see as something that
>> works online because I don’t know what is being signed.  And is this really
>> about an authentication (or certification) signature, a user signature
>> (such as a contract), all, none?
>> >
>>
>> Hm. I do it all the time (although this is not publication): I
>> systematically sign my mails when I send them from my laptop, using GPG. I
>> do receive such mails from my colleagues. This is used for authentication
>> (because my email address has been forged in the past). I could do the same
>> if I send over a contract indeed. I am not sure I see the problem...
>>
>> Ivan
>>
>>
>> > Leonard
>> >
>> >
>> >
>> >
>> > On 8/20/15, 10:56 AM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
>> wrote:
>> >
>> >> Ivan beat me to this.
>> >>
>> >> We have not begun to talk about the full ecosystem of publishing, just
>> the finished product. If I want to use Web Publications + Web Annotations
>> for a blind peer review process, there are very strict guidelines about who
>> can/cannot see different pieces (the article, the metadata, the associated
>> data, the annotations).  This too is needed in all states.
>> >>
>> >>
>> >>
>> >> Tzviya Siegman
>> >> Digital Book Standards & Capabilities Lead
>> >> Wiley
>> >> 201-748-6884
>> >> tsiegman@wiley.com
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Ivan Herman [mailto:ivan@w3.org]
>> >> Sent: Thursday, August 20, 2015 10:40 AM
>> >> To: Leonard Rosenthol
>> >> Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
>> >> Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging
>> requirements document)
>> >>
>> >>
>> >>> On 20 Aug 2015, at 16:31 , Leonard Rosenthol <lrosenth@adobe.com>
>> wrote:
>> >>>
>> >>> I am familiar with the WebCrypto work.  However, that’s simply an
>> implementation possibility and doesn’t directly reflect the actual desire
>> (or lack of same) as to what it means to have an “encrypted online
>> publication” or a “signed online publication”.
>> >>>
>> >>> And that is what I am asking about - use case(s).
>> >>>
>> >>> Leonard
>> >>>
>> >>>
>> >>
>> >> Well… alas!, almost all journal publishers do some sort of an access
>> control over scholarly papers. Today, they hide it behind password
>> firewalls, sometimes they provide you password protected PDF files, etc. I
>> would think that for that community the portable document approach that we
>> have would be viable only if there is an encryption of the content.
>> >>
>> >> A similar, but maybe a better example are legal documents (that need
>> some authentication, ie, signatures) or, say, market reports from
>> consulting companies that sell those reports for good money. I would expect
>> that for all of these any Web Document would have to provide some
>> encryption/signature facility online, just like the portable version.
>> >>
>> >> Ivan
>> >>
>> >>
>> >>>
>> >>>
>> >>> On 8/20/15, 10:06 AM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
>> wrote:
>> >>>
>> >>>> Hi Leonard,
>> >>>>
>> >>>> I am not well-enough versed in these areas to know if this is one of
>> the solutions digital publishing will pursue, but there is a Web
>> Cryptography group focusing on these areas. They have published a draft [1].
>> >>>>
>> >>>> [1] http://www.w3.org/TR/WebCryptoAPI/
>> >>>>
>> >>>>
>> >>>> Tzviya Siegman
>> >>>> Digital Book Standards & Capabilities Lead Wiley
>> >>>> 201-748-6884
>> >>>> tsiegman@wiley.com
>> >>>>
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Leonard Rosenthol [mailto:lrosenth@adobe.com]
>> >>>> Sent: Thursday, August 20, 2015 9:33 AM
>> >>>> To: Ivan Herman
>> >>>> Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
>> >>>> Subject: Encryption and Signatures (was Re: [DPUB] packaging
>> >>>> requirements document)
>> >>>>
>> >>>> I really like the improvements - looks great!
>> >>>>
>> >>>> In re-reading the document, it struck me that the last two
>> requirements - Encryption and Digital Signatures - are being applied to all
>> states, but I am not sure if that’s actually what we are expecting.
>> >>>>
>> >>>> Certainly, these make sense in the offline state - and they may make
>> sense in the cached state (depending on how caching is implemented by a
>> given RS/UA).  However, I am having trouble understanding how they would be
>> used/applied in the online case.  Can someone give me some use cases for
>> either/both in those cases?
>> >>>>
>> >>>> Thanks,
>> >>>> Leonard
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 8/20/15, 5:10 AM, "Ivan Herman" <ivan@w3.org> wrote:
>> >>>>
>> >>>>>
>> >>>>>> On 19 Aug 2015, at 17:08 , Leonard Rosenthol <lrosenth@adobe.com>
>> wrote:
>> >>>>>>
>> >>>>>> On 8/19/15, 10:42 AM, "Ivan Herman" <ivan@w3.org> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> I was fairly busy with other things today, so I could not spend
>> >>>>>>> too much time on this. I have some responses (and possible actions
>> >>>>>>> on the documents) below, but I cannot promise to take care of all
>> >>>>>>> of them now. To be continued tomorrow, if needed…
>> >>>>>>
>> >>>>>> No problem - just wanted to make sure we delivered our document in
>> >>>>>> a timely manner…
>> >>>>>>
>> >>>>>>
>> >>>>>>>> On 18 Aug 2015, at 18:10 , Leonard Rosenthol <lrosenth@adobe.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> – Regardless of the fact that someone at the IETF thinks
>> “archive” is the right term, in the document/publication space it is NOT.
>> I would strongly recommend that we NOT refer to that document or that
>> terminology.
>> >>>>>>>
>> >>>>>>> During the discussion on the mailing list we were asked to put a
>> concise definition for a package into the document. (I believe what IETF
>> considered as archive in their exploration for providing a top level media
>> type for packages is actually of a similar goal.) Do you have a better
>> replacement?
>> >>>>>>
>> >>>>>> I think “package” is the correct term, not archive.  I have
>> reached out to the IETF to get them to change as well.
>> >>>>>
>> >>>>> I have added a note that this is not the terminology we use, but
>> the definition itself may still be helpful.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>> - I have problems with this phrase “ This is, however, different
>> from the cached state of a networked publication, which does not have a
>> separate existence (though can also be used offline).”.  There are many
>> ways to cache, some of which are related to browser-based technology and
>> some of which are not.  But all of which constitute the concept of a
>> “cached and offline” document.   How about just removing this.  I don’t
>> think it adds anything, certainly not at this point in the document.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> The text (tries to) refer to browser based caches here.
>> >>>>>>
>> >>>>>> And my point is that it should not do so, because there is no
>> requirement that ONLY browser-based caches be used as part of the process
>> of caching and/or taking a publication offline.  There is also no
>> requirement that the cached state and the portable state be different.  I
>> believe that it is important that this document be agnostic to the specific
>> technology choices and focus on the goals and requirements.
>> >>>>>>
>> >>>>>>
>> >>>>>>> Do you have a better way of formulating this?
>> >>>>>>
>> >>>>>> I would just remove the sentence entirely as it adds nothing.
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> Having re-read the whole paragraph I think I agree that the
>> sentence may be superfluous there, so let us remove it. But (in your
>> original remark) you also questioned the three bullet items; I claim that
>> the reference to the package is important there, and I think the
>> definitions should stay as they are.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>> Right, this is a bit more complicated. What I think was meant is
>> that the rendering and possibly interactive part of the reading system is
>> independent of the state, ie, the change on that is indeed transparent.
>> >>>>>>
>> >>>>>> Yes, I agree that the content should look/act the same independent
>> of state.  Just say something like that :).
>> >>>>>>
>> >>>>>>
>> >>>>>>>> - The phrase “ It should maintain its integrity over time” isn’t
>> actually something that we, as the file format specification, have any
>> control over. It is more about the media, systems, etc. in which the
>> content is stored.  As such, it should be removed.
>> >>>>>>>
>> >>>>>>> Hm. If I reboot my machine, the cache will disappear, but a
>> portable document on my disc will remain. I am not sure what the problem is
>> with this.
>> >>>>>>
>> >>>>>> What you talk about is persistence, not integrity.  Integrity has
>> to do with reliability and robustness, which are more tied to things such
>> as media stability, data validation/checksumming, etc.
>> >>>>>>
>> >>>>>
>> >>>>> I have changed this to persistence.
>> >>>>>
>> >>>>>> And actually, there is nothing in the requirements that state that
>> the cache goes away on a reboot.  That would be a specific implementation
>> decision.
>> >>>>>>
>> >>>>>
>> >>>>> That is true, I just used that as an example in my response...
>> >>>>>
>> >>>>>>
>> >>>>>>>> - Are there no other requirements for the portable state?  I
>> believe we had some in our existing use case/requirements specs.   If not,
>> I can think of a few that I would add here.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> I would very much welcome that.
>> >>>>>>
>> >>>>>> Here are a few…
>> >>>>>>
>> >>>>>
>> >>>>> These are of course all valid use cases. As for the document:
>> >>>>>
>> >>>>>> - Ability to distribute the publication via non-real-time methods
>> >>>>>> ranging from email to sneaker-net
>> >>>>>
>> >>>>> There is reference to this in the opening paragraph ("Nevertheless,
>> >>>>> packages that exist (possibly) apart from the network still have a
>> >>>>> role to play as units that can be stored or transferred. This
>> >>>>> concept is essential with the current business models that dominate
>> >>>>> the publishing industry for, e.g., digital books.") I wonder whether
>> >>>>> this warrants a separate section
>> >>>>>
>> >>>>>> - Ability to read the publication behind a firewall or secured
>> >>>>>> network
>> >>>>>> - Ability to perform preflight & validation on a stable set of
>> >>>>>> content
>> >>>>>
>> >>>>> Again, these are valid use cases, but are they peculiar to the
>> portable state? After all, online content should be reachable behind the
>> firewall if one has the right access credentials; we would be misunderstood
>> as if we said that a publication can only be consumed as a portable state
>> if within the enterprise...
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>> This is something I will have to think more about. The issue is
>> that the streamability may make something different depending on the state.
>> >>>>>>
>> >>>>>> Again, I think we have a terminology problem here. What you
>> describe is the ability to get live (or updated) data/content which is a
>> completely different requirement than streamability.  The ability to stream
>> is well defined in the first sentence - “ It must be possible for a client
>> to fetch components of a package in any order, or to fetch multiple
>> components at the same time, without having to read the entire document”.
>> That has NOTHING to do with being connected or offline - it has to do with
>> how the RS is able to access the content.
>> >>>>>>
>> >>>>>> If you want to also add a separate requirement around the ability
>> for the package to be able to specify that specific pieces of content
>> (assets) within the package are not necessarily embedded but instead are
>> retrieved live (with optional caching) - I think that would be a welcome
>> requirement.  Or you could just merge this with "Updates new components
>> only”?
>> >>>>>>
>> >>>>>
>> >>>>> Reading it through again, I believe there is indeed a conflation of
>> concepts. I have actually divided it up into three different sections:
>> >>>>>
>> >>>>> - Streaming (following the definition of streaming on wikipedia[1])
>> >>>>> - Random access to content
>> >>>>> - External (non embedded) references, which is your last example
>> >>>>>
>> >>>>> I moved some of the use cases around accordingly.
>> >>>>>
>> >>>>>
>> >>>>> [1] https://en.wikipedia.org/wiki/Streaming_media
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>>> - In the Package in a Package section, you have “This is
>> trivially available in online and cached states, but puts an extra
>> requirement on portable states.”  This appears to be a copy/paste from
>> elsewhere, as it doesn’t belong here because it’s simply not true in this
>> case.  Please remove.
>> >>>>>>>
>> >>>>>>> It is a copy paste indeed, but is it incorrect? (It may be
>> superfluous, though).
>> >>>>>>
>> >>>>>> Given that we don’t actually know what an “online state”, a
>> “cached state” or even a “portable state” look like from a technical
>> perspective - it is impossible to make any comment on the ability to
>> implement such.
>> >>>>>>
>> >>>>>
>> >>>>> You are right. I removed these types of comments. At some point, a
>> >>>>> more careful technical analysis will be needed for each of those
>> >>>>> requirements, but this is indeed not the place…
>> >>>>>
>> >>>>> Thanks!
>> >>>>>
>> >>>>> Ivan
>> >>>>>
>> >>>>>>
>> >>>>>>>> - The Access to package section also has a similar note about
>> “trivially available” which is also not true, and I would recommend removal
>> as well.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> I have re-written that sentence in a way that, I believe, is
>> >>>>>>> correct…
>> >>>>>>
>> >>>>>> I don’t see the change yet, but as with the previous statement - I
>> don’t see how we can make any comments on implementation complexity until
>> we know what is being implemented.
>> >>>>>>
>> >>>>>>
>> >>>>>>> Thanks
>> >>>>>>
>> >>>>>> Thank you for taking the time to review my comments.
>> >>>>>>
>> >>>>>>
>> >>>>>> Leonard
>> >>>>>
>> >>>>>
>> >>>>> ----
>> >>>>> Ivan Herman, W3C
>> >>>>> Digital Publishing Activity Lead
>> >>>>> Home: http://www.w3.org/People/Ivan/
>> >>>>> mobile: +31-641044153
>> >>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>
>>
>>
>>
>

Received on Thursday, 20 August 2015 18:51:22 UTC