Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document) from Leonard Rosenthol on 2015-08-20 (public-digipub-ig@w3.org from August 2015)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Thu, 20 Aug 2015 18:27:59 +0000
To: Bill McCoy <bmccoy@idpf.org>, Ivan Herman <ivan@w3.org>
CC: Tzviya Siegman <tsiegman@wiley.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <7A94A318-160C-4A43-A375-03530FE218B0@adobe.com>
Sorry, Bill, but this has nothing to do with terminology.

I believe, instead, that it has to do with a core question of signature technology, “What is the user signing?”, which has a friend commonly referred to as “The WYSIWYS problem” [1].  Both fall out of the way that Digital Signature technology works.

In Ivan’s example, he is signing the content of the email: not the headers, not the packaging, not the transport, etc.  However, when one applies a DigSig to a document today (be it PDF [2], Word [3], etc.), that signature covers the entire package.  There are also systems where one can sign portions of a package (e.g., the ASiC standard from ETSI [4]) or even just specific pieces of content/data (e.g., data signatures in PDF [5]).  In all these cases, it doesn’t matter what the content does – whether it is static/dynamic, interactive or not – it is a well-defined set of bytes contained in a package.

These all make sense in the context of a “stand-alone” or offline scenario where I have the package and I want to verify that it is intact, unmodified, and the one that was sent by the person I expected it to be from.  But what does it mean in the context of a piece of content served via HTTP (be it REST or simple web serving)?  It could well be static content (in the classic static web server case) that may or may not be interactive.  But how would the client know what is signed and how to verify it – unless you are now serving up something more than just the original content?  (FWIW: XML DigSig [6], JSON Web Signatures [7] and JOSE [8] all try to address this.)  And that only works for individual content signing (where someone signed each static resource/asset individually) and not in the case where the entire package was signed (more likely, since that’s what the publisher is going to do).
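
A minimal sketch of that last distinction, under stated assumptions: the resource names, the placeholder bytes and both helper functions are hypothetical, and plain SHA-256 digests stand in for "the bytes a signature would cover" (no actual signing is done here, and this is not how PDF, ASiC, XML DigSig or JWS define their coverage). It only illustrates why a client holding a single HTTP response can check a per-resource digest but cannot recompute a whole-package digest.

```python
import hashlib


def resource_digests(resources: dict[str, bytes]) -> dict[str, str]:
    """One SHA-256 digest per resource (the 'individual content signing' case)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in resources.items()}


def package_digest(resources: dict[str, bytes]) -> str:
    """One SHA-256 digest over every resource, in sorted name order
    (the 'entire package was signed' case)."""
    h = hashlib.sha256()
    for name in sorted(resources):
        encoded_name = name.encode("utf-8")
        data = resources[name]
        # Length-prefix names and bodies so the boundaries are unambiguous.
        h.update(len(encoded_name).to_bytes(8, "big"))
        h.update(encoded_name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Hypothetical publication with three resources.
publication = {
    "index.html": b"<html><body>Chapter 1</body></html>",
    "style.css": b"body { margin: 2em; }",
    "cover.png": b"placeholder image bytes",
}

# What the publisher would have signed in each case.
signed_per_resource = resource_digests(publication)
signed_whole_package = package_digest(publication)

# A client that has fetched only index.html over HTTP.
fetched = {"index.html": publication["index.html"]}

# Per-resource case: the single response can be checked on its own.
can_verify_single = (
    hashlib.sha256(fetched["index.html"]).hexdigest()
    == signed_per_resource["index.html"]
)

# Whole-package case: one resource is not enough to recompute the digest.
can_verify_package = package_digest(fetched) == signed_whole_package
```

In the whole-package case the client would need every byte of the package (or some additional signed structure served alongside the content) before it could verify anything, which is the "something more than just the original content" mentioned above.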

And I am all in favor of discussions around document lifecycles :).

Leonard


[1] https://en.wikipedia.org/wiki/WYSIWYS

[2] https://www.adobe.com/devnet-docs/acrobatetk/tools/DigSig/Acrobat_DigitalSignatures_in_PDF.pdf

[3] https://support.office.com/en-sg/article/Add-or-remove-a-digital-signature-in-Office-documents-49af4304-bfe7-41bf-99c3-a5023bdab44a

[4] http://www.etsi.org/deliver/etsi_ts/102900_102999/102918/01.01.01_60/ts_102918v010101p.pdf

[5] http://help.adobe.com/en_US/acrobat/X/standard/using/WS396794562021d52e36ef196612b34b1b97f-8000.html

[6] http://www.w3.org/TR/xmldsig-core/

[7] https://tools.ietf.org/html/rfc7515

[8] http://www.iana.org/assignments/jose/jose.xhtml




From: Bill McCoy
Date: Thursday, August 20, 2015 at 12:28 PM
To: Ivan Herman
Cc: Leonard Rosenthol, Tzviya Siegman, W3C Digital Publishing IG
Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document)

IMO this particular debate is just a confusion in terminology.

Of course digital signing (both of the verification nature as per Ivan's example and in the legal sense as I think Leonard was talking about) makes sense for content whether it happens to be online or offline. But it doesn't make much sense in the full generality of the REST architecture of the Web, where the online resources at URL endpoints are active programs that may be dynamically returning different representations.

So I took Leonard to have been using the term "online" to mean "full dynamic Web", aka the opposite of my suggested "idempotency" constraint for portable documents (that resource=representation). But I don't think that has anything to do per se with "online" vs. "offline" states of content, so I think we should avoid mis-using those terms when what we mean is "dynamic" vs. "static". (I have avoided the term "static" because it tends to be taken as implying non-interactive, but in fact static content can include JavaScript that delivers interactivity on the client side, aka the "code-on-demand" option of REST architecture.) Perhaps "static" is nevertheless a better term than "idempotent", as in the Web world it is pretty well understood what static hosting means.

BTW I think it would be a mistake if this group didn't (eventually) wrestle with issues that arise throughout the life cycle of content... "publication" (V.)  is no longer a singular action thus "publication" (N.) cannot be considered to exist only as an endpoint. So again in the above sense I think "static" doesn't imply a definitive, non-changing work but is a statement about a "promise" made about invariability of a particular manifestation of a work. Backing up that promise with things like digitally signed checksums seems sensible to me especially as we move beyond the single-file packaged instantiation which gives a very basic testament of integrity.
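
One way to picture "digitally signed checksums" for a multi-file manifestation is sketched below. Everything in it is an assumption made for illustration: the manifest layout, the function names and the key are hypothetical, and HMAC-SHA256 with a shared secret is used only as a stand-in because the Python standard library has no public-key signing (a real deployment would use an actual digital signature scheme).

```python
import hashlib
import hmac
import json


def build_manifest(resources: dict[str, bytes]) -> bytes:
    """Serialize a name -> SHA-256 checksum map deterministically."""
    checksums = {
        name: hashlib.sha256(data).hexdigest() for name, data in resources.items()
    }
    return json.dumps(checksums, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: bytes, key: bytes) -> str:
    """Stand-in for a digital signature: an HMAC-SHA256 tag over the manifest."""
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def verify(resources: dict[str, bytes], manifest: bytes, tag: str, key: bytes) -> bool:
    """Check the tag on the manifest, then check the resources against it."""
    if not hmac.compare_digest(sign_manifest(manifest, key), tag):
        return False
    return build_manifest(resources) == manifest


# Hypothetical key and publication contents.
key = b"hypothetical-shared-secret"
original = {
    "index.html": b"<p>Chapter 1</p>",
    "style.css": b"p { margin: 0; }",
}

# The "promise" made about this particular manifestation.
manifest = build_manifest(original)
tag = sign_manifest(manifest, key)

# The untouched manifestation verifies; a silently edited one does not.
intact_ok = verify(original, manifest, tag, key)
tampered = {**original, "index.html": b"<p>Chapter 1, edited</p>"}
tampered_ok = verify(tampered, manifest, tag, key)
```

A new manifestation of the same work would simply get its own manifest and tag, so the promise is about one manifestation being invariant, not about the work never changing.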

--Bill


On Thu, Aug 20, 2015 at 9:01 AM, Ivan Herman <ivan@w3.org> wrote:

> On 20 Aug 2015, at 17:33 , Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> Access Control and/or DRM is not the same as encryption.  Many implementations of ACL/DRM use encryption but it’s not a requirement - and you can have encryption w/o ACL/DRM (such as with https).  Should we redo that section as Access Control and Rights Management instead of encryption?  That would indeed make things clearer and allow the inclusion of the various use cases that both of you have put forth.
>

That is fine, it is technology neutral indeed. I am just a bit scared to put words like Digital Rights Management into such a place because it always sparks discussion… let us keep to access control:-)

> Signatures, on the other hand, I still don’t see as something that works online because I don’t know what is being signed.  And is this really about an authentication (or certification) signature, a user signature (such as a contract), all, none?
>

Hm. I do it all the time (although this is not publication): I systematically sign my mails when I send them from my laptop, using GPG. I do receive such mails from my colleagues. This is used for authentication (because my email address has been forged in the past). I could do the same if I send over a contract indeed. I am not sure I see the problem...

Ivan


> Leonard
>
>
>
>
> On 8/20/15, 10:56 AM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com> wrote:
>
>> Ivan beat me to this.
>>
>> We have not begun to talk about the full ecosystem of publishing, just the finished product. If I want to use Web Publications + Web Annotations for a blind peer review process, there are very strict guidelines about who can/cannot see different pieces (the article, the metadata, the associated data, the annotations).  This too is needed in all states.
>>
>>
>>
>> Tzviya Siegman
>> Digital Book Standards & Capabilities Lead
>> Wiley
>> 201-748-6884
>> tsiegman@wiley.com
>>
>>
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Thursday, August 20, 2015 10:40 AM
>> To: Leonard Rosenthol
>> Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
>> Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document)
>>
>>
>>> On 20 Aug 2015, at 16:31 , Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>
>>> I am familiar with the WebCrypto work.  However, that’s simply an implementation possibility and doesn’t directly reflect the actual desire (or lack of same) as to what it means to have an “encrypted online publication” or a “signed online publication”.
>>>
>>> And that is what I am asking about - use case(s).
>>>
>>> Leonard
>>>
>>>
>>
>> Well… alas!, almost all journal publishers do some sort of an access control over scholarly papers. Today, they hide it behind password firewalls, sometimes they provide you password protected PDF files, etc. I would think that for that community the portable document approach that we have would be viable only if there is an encryption of the content.
>>
>> A similar, but maybe better, example is legal documents (that need some authentication, i.e., signatures) or, say, market reports from consulting companies that sell those reports for good money. I would expect that for all of these any Web Document would have to provide some encryption/signature facility online, just like the portable version.
>>
>> Ivan
>>
>>
>>>
>>>
>>> On 8/20/15, 10:06 AM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com> wrote:
>>>
>>>> Hi Leonard,
>>>>
>>>> I am not well-enough versed in these areas to know if this is one of the solutions digital publishing will pursue, but there is a Web Cryptography group focusing on these areas. They have published a draft [1].
>>>>
>>>> [1] http://www.w3.org/TR/WebCryptoAPI/
>>>>
>>>>
>>>> Tzviya Siegman
>>>> Digital Book Standards & Capabilities Lead Wiley
>>>> 201-748-6884
>>>> tsiegman@wiley.com
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Leonard Rosenthol [mailto:lrosenth@adobe.com]
>>>> Sent: Thursday, August 20, 2015 9:33 AM
>>>> To: Ivan Herman
>>>> Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
>>>> Subject: Encryption and Signatures (was Re: [DPUB] packaging
>>>> requirements document)
>>>>
>>>> I really like the improvements - looks great!
>>>>
>>>> In re-reading the document, it struck me that the last two requirements - Encryption and Digital Signatures - are being applied to all states, but I am not sure if that’s actually what we are expecting.
>>>>
>>>> Certainly, these make sense in the offline state - and they may make sense in the cached state (depending on how caching is implemented by a given RS/UA).  However, I am having trouble understanding how they would be used/applied in the online case.  Can someone give me some use cases for either/both in those cases?
>>>>
>>>> Thanks,
>>>> Leonard
>>>>
>>>>
>>>>
>>>>
>>>> On 8/20/15, 5:10 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>
>>>>>
>>>>>> On 19 Aug 2015, at 17:08 , Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>>>
>>>>>> On 8/19/15, 10:42 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I was fairly busy with other things today, so I could not spend
>>>>>>> too much time on this. I have some responses (and possible actions
>>>>>>> on the documents) below, but I cannot promise to take care of all
>>>>>>> of them now. To be continued tomorrow, if needed…
>>>>>>
>>>>>> No problem - just wanted to make sure we delivered our document in
>>>>>> a timely manner…
>>>>>>
>>>>>>
>>>>>>>> On 18 Aug 2015, at 18:10 , Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>>>>>
>>>>>>>> – Regardless of the fact that someone at the IETF thinks “archive” is the right term, in the document/publication space it is NOT.  I would strongly recommend that we NOT refer to that document or that terminology.
>>>>>>>
>>>>>>> During the discussion on the mailing list we were asked to put a concise definition for a package into the document. (I believe what IETF considered as archive in their exploration for providing a top level media type for packages actually has a similar goal.) Do you have a better replacement?
>>>>>>
>>>>>> I think “package” is the correct term, not archive.  I have reached out to the IETF to get them to change as well.
>>>>>
>>>>> I have added a note that this is not the terminology we use, but the definition itself may still be helpful.
>>>>>
>>>>>>
>>>>>>
>>>>>>>> - I have problems with this phrase “ This is, however, different from the cached state of a networked publication, which does not have a separate existence (though can also be used offline).”.  There are many ways to cache, some of which are related to browser-based technology and some of which are not.  But all of which constitute the concept of a “cached and offline” document.   How about just removing this.  I don’t think it adds anything, certainly not at this point in the document.
>>>>>>>>
>>>>>>>
>>>>>>> The text (tries to) refer to browser based caches here.
>>>>>>
>>>>>> And my point is that it should not do so, because there is no requirement that ONLY browser-based caches be used as part of the process of caching and/or taking a publication offline.  There is also no requirement that the cached state and the portable state be different.  I believe that it is important that this document be agnostic to the specific technology choices and focus on the goals and requirements.
>>>>>>
>>>>>>
>>>>>>> Do you have a better way of formulating this?
>>>>>>
>>>>>> I would just remove the sentence entirely as it adds nothing.
>>>>>>
>>>>>>
>>>>>
>>>>> Having re-read the whole paragraph I think I agree that the sentence may be superfluous there, so let us remove it. But (in your original remark) you also questioned the three bullet items; I claim that the reference to the package is important there, and I think the definitions should stay as they are.
>>>>>
>>>>>>
>>>>>>
>>>>>>>> Right, this is a bit more complicated. What I think was meant is that the rendering and possibly interactive part of the reading system is independent of the state, i.e., a change of state is indeed transparent.
>>>>>>
>>>>>> Yes, I agree that the content should look/act the same independent of state.  Just say something like that :).
>>>>>>
>>>>>>
>>>>>>>> - The phrase “ It should maintain its integrity over time” isn’t actually something that we, as the file format specification, have any control over. It is more about the media, systems, etc. in which the content is stored.  As such, it should be removed.
>>>>>>>
>>>>>>> Hm. If I reboot my machine, the cache will disappear, but a portable document on my disc will remain. I am not sure what the problem is with this.
>>>>>>
>>>>>> What you talk about is persistence, not integrity.  Integrity has to do with reliability and robustness, which are more tied to things such as media stability, data validation/checksumming, etc.
>>>>>>
>>>>>
>>>>> I have changed this to persistence.
>>>>>
>>>>>> And actually, there is nothing in the requirements that state that the cache goes away on a reboot.  That would be a specific implementation decision.
>>>>>>
>>>>>
>>>>> That is true, I just used that as an example in my response...
>>>>>
>>>>>>
>>>>>>>> - Are there no other requirements for the portable state?  I believe we had some in our existing use case/requirements specs.   If not, I can think of a few that I would add here.
>>>>>>>>
>>>>>>>
>>>>>>> I would very much welcome that.
>>>>>>
>>>>>> Here are a few…
>>>>>>
>>>>>
>>>>> These are of course all valid use cases. As for the document:
>>>>>
>>>>>> - Ability to distribute the publication via non-real-time methods
>>>>>> ranging from email to sneaker-net
>>>>>
>>>>> There is reference to this in the opening paragraph ("Nevertheless,
>>>>> packages that exist (possibly) apart from the network still have a
>>>>> role to play as units that can be stored or transferred. This
>>>>> concept is essential with the current business models that dominate
>>>>> the publishing industry for, e.g., digital books.") I wonder whether
>>>>> this warrants a separate section
>>>>>
>>>>>> - Ability to read the publication behind a firewall or secured
>>>>>> network
>>>>>> - Ability to perform preflight & validation on a stable set of
>>>>>> content
>>>>>
>>>>> Again, these are valid use cases, but are they peculiar to the portable state? After all, online content should be reachable behind the firewall if one has the right access credentials; we would be misunderstood as if we said that a publication can only be consumed as a portable state if within the enterprise...
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>> This is something I will have to think more about. The issue is that the streamability may make something different depending on the state.
>>>>>>
>>>>>> Again, I think we have a terminology problem here. What you describe is the ability to get live (or updated) data/content, which is a completely different requirement than streamability.  The ability to stream is well defined in the first sentence - “It must be possible for a client to fetch components of a package in any order, or to fetch multiple components at the same time, without having to read the entire document”.  That has NOTHING to do with being connected or offline - it has to do with how the RS is able to access the content.
>>>>>>
>>>>>> If you want to also add a separate requirement around the ability for the package to be able to specify that specific pieces of content (assets) within the package are not necessarily embedded but instead are retrieved live (with optional caching) - I think that would be a welcome requirement.  Or you could just merge this with "Updates new components only”?
>>>>>>
>>>>>
>>>>> Reading it through again, I believe there is indeed a conflation of concepts. I have actually divided up into three different sections:
>>>>>
>>>>> - Streaming (following the definition of streaming on wikipedia[1])
>>>>> - Random access to content
>>>>> - External (non embedded) references, which is your last example
>>>>>
>>>>> I moved some of the use cases around accordingly.
>>>>>
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Streaming_media
>>>>>
>>>>>
>>>>>>
>>>>>>>> - In the Package in a Package section, you have “This is trivially available in online and cached states, but puts an extra requirement on portable states.”  This appears to be a copy/paste from elsewhere, as it doesn’t belong here because it’s simply not true in this case.  Please remove.
>>>>>>>
>>>>>>> It is a copy paste indeed, but is it incorrect? (It may be superfluous, though).
>>>>>>
>>>>>> Given that we don’t actually know what an “online state”, a “cached state” or even a “portable state” look like from a technical perspective - it is impossible to make any comment on the ability to implement such.
>>>>>>
>>>>>
>>>>> You are right. I removed these types of comments. At some point, a
>>>>> more careful technical analysis will be needed for each of those
>>>>> requirements, but this is indeed not the place…
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Ivan
>>>>>
>>>>>>
>>>>>>>> - The Access to package section also has a similar note about “trivially available” which is also not true, and I would recommend removal as well.
>>>>>>>>
>>>>>>>
>>>>>>> I have re-written that sentence in a way that, I believe, is
>>>>>>> correct…
>>>>>>
>>>>>> I don’t see the change yet, but as with the previous statement - I don’t see how we can make any comments on implementation complexity until we know what is being implemented.
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>
>>>>>> Thank you for taking the time to review my comments.
>>>>>>
>>>>>>
>>>>>> Leonard
>>>>>
>>>>>
>>>>> ----
>>>>> Ivan Herman, W3C
>>>>> Digital Publishing Activity Lead
>>>>> Home: http://www.w3.org/People/Ivan/
>>>>> mobile: +31-641044153
>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>
>>
>>
>>


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 20 August 2015 18:28:34 UTC