RE: Encryption and Signatures (was Re: [DPUB] packaging requirements document)

Thanks, Shane. This is like PROV on steroids.

Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: ahby@aptest.com [mailto:ahby@aptest.com] On Behalf Of Shane McCarron
Sent: Thursday, August 20, 2015 2:08 PM
To: Siegman, Tzviya - Hoboken
Cc: Ivan Herman; Leonard Rosenthol; W3C Digital Publishing IG
Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document)

There is also a related issue of credentials. The Credentials Community Group has done a lot of work on the underlying concept of providing provably valid 'credentials' that could be associated with things like people and publications: a "This document is in its original form and its provenance is X" sort of thing. I am not sure that there is a specific need to incorporate this into the document, but maybe. It is also possible to handle this with JSON-LD and digital signatures. Manu Sporny provided the following example:

{
  "@context": "https://w3.org/ns/digipub-v1",
  "id": "https://example.org/books/the-work",
  "contentHash": "23hf92slkjh32987ynf32987dsh32",
  "label": "The Work",
  "published": "2015-04-25",
  "author": [{...}, {...}],
  "signature": {
    "type": "GraphSignature2015",
    "creator": "https://example.org/keys/14",
    "created": "2015-04-26T20:21:34Z",
    "signatureValue": "OGQzNGVkMzVm4NmExMgoYzI43Q3ODIyOWM32NjI="
  }
}
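
For illustration only, here is a minimal Python sketch of how a record shaped like the one above might be produced and checked. Several things in it are assumptions and not part of Manu's example: an HMAC with a shared secret stands in for a real public-key signature, sorted-key JSON stands in for whatever graph normalization an actual linked-data signature would use, and the "HmacSha256Sketch" type and all function names are made up.

```python
import base64
import hashlib
import hmac
import json


def content_hash(content: bytes) -> str:
    """Hex SHA-256 digest of the publication bytes (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def _payload(doc: dict, options: dict) -> bytes:
    """Deterministic bytes to sign: the record without its signature, plus options.

    Sorted-key JSON is a stand-in for real graph normalization.
    """
    unsigned = {k: v for k, v in doc.items() if k != "signature"}
    body = {"document": unsigned, "options": options}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(doc: dict, key: bytes, creator: str, created: str) -> dict:
    """Return a copy of doc with a 'signature' block attached."""
    options = {"type": "HmacSha256Sketch", "creator": creator, "created": created}
    mac = hmac.new(key, _payload(doc, options), hashlib.sha256).digest()
    signed = dict(doc)
    signed["signature"] = dict(options, signatureValue=base64.b64encode(mac).decode("ascii"))
    return signed


def verify(doc: dict, key: bytes, content: bytes) -> bool:
    """Check both the content hash and the signature over the metadata."""
    sig = doc.get("signature")
    if not isinstance(sig, dict):
        return False
    if doc.get("contentHash") != content_hash(content):
        return False
    options = {k: v for k, v in sig.items() if k != "signatureValue"}
    try:
        given = base64.b64decode(sig["signatureValue"])
    except (KeyError, TypeError, ValueError):
        return False
    expected = hmac.new(key, _payload(doc, options), hashlib.sha256).digest()
    return hmac.compare_digest(expected, given)


# Hypothetical usage: sign a record for some publication bytes, then check it.
content = b"full text of The Work"
record = {
    "id": "https://example.org/books/the-work",
    "label": "The Work",
    "published": "2015-04-25",
    "contentHash": content_hash(content),
}
signed = sign(record, b"demo-secret", "https://example.org/keys/14", "2015-04-26T20:21:34Z")
is_original = verify(signed, b"demo-secret", content)
```

The point of the sketch is only that the hash ties the record to the content, and the signature ties the record (including the hash) to a key, so a change to either the content or the metadata makes verification fail.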



On Thu, Aug 20, 2015 at 9:56 AM, Siegman, Tzviya - Hoboken <tsiegman@wiley.com> wrote:
Ivan beat me to this.

We have not begun to talk about the full ecosystem of publishing, just the finished product. If I want to use Web Publications + Web Annotations for a blind peer review process, there are very strict guidelines about who can/cannot see different pieces (the article, the metadata, the associated data, the annotations).  This too is needed in all states.



Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com


-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org]
Sent: Thursday, August 20, 2015 10:40 AM
To: Leonard Rosenthol
Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
Subject: Re: Encryption and Signatures (was Re: [DPUB] packaging requirements document)


> On 20 Aug 2015, at 16:31, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> I am familiar with the WebCrypto work.  However, that’s simply an implementation possibility and doesn’t directly reflect the actual desire (or lack of same) as to what it means to have an “encrypted online publication” or a “signed online publication”.
>
> And that is what I am asking about - use case(s).
>
> Leonard
>
>

Well… alas! Almost all journal publishers apply some sort of access control to scholarly papers. Today, they hide the papers behind password-protected firewalls, sometimes they provide password-protected PDF files, etc. I would think that for that community the portable document approach that we have would be viable only if the content can be encrypted.

A similar, but maybe better, example is legal documents (which need some authentication, i.e., signatures) or, say, market reports from consulting companies that sell those reports for good money. I would expect that for all of these any Web Document would have to provide some encryption/signature facility online, just like the portable version.

Ivan


>
>
> On 8/20/15, 10:06 AM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com> wrote:
>
>> Hi Leonard,
>>
>> I am not well enough versed in these areas to know if this is one of the solutions digital publishing will pursue, but there is a Web Cryptography group focusing on these areas. They have published a draft [1].
>>
>> [1] http://www.w3.org/TR/WebCryptoAPI/

>>
>>
>> Tzviya Siegman
>> Digital Book Standards & Capabilities Lead
>> Wiley
>> 201-748-6884
>> tsiegman@wiley.com
>>
>>
>> -----Original Message-----
>> From: Leonard Rosenthol [mailto:lrosenth@adobe.com]
>> Sent: Thursday, August 20, 2015 9:33 AM
>> To: Ivan Herman
>> Cc: Siegman, Tzviya - Hoboken; W3C Digital Publishing IG
>> Subject: Encryption and Signatures (was Re: [DPUB] packaging
>> requirements document)
>>
>> I really like the improvements - looks great!
>>
>> In re-reading the document, it struck me that the last two requirements - Encryption and Digital Signatures - are being applied to all states, but I am not sure if that’s actually what we are expecting.
>>
>> Certainly, these make sense in the offline state - and they may make sense in the cached state (depending on how caching is implemented by a given RS/UA).  However, I am having trouble understanding how they would be used/applied in the online case.  Can someone give me some use cases for either/both in those cases?
>>
>> Thanks,
>> Leonard
>>
>>
>>
>>
>> On 8/20/15, 5:10 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>
>>>
>>>> On 19 Aug 2015, at 17:08, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>
>>>> On 8/19/15, 10:42 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>
>>>>
>>>>
>>>>> I was fairly busy with other things today, so I could not spend
>>>>> too much time on this. I have some responses (and possible actions
>>>>> on the documents) below, but I cannot promise to take care of all
>>>>> of them now. To be continued tomorrow, if needed…
>>>>
>>>> No problem - just wanted to make sure we delivered our document in
>>>> a timely manner…
>>>>
>>>>
>>>>>> On 18 Aug 2015, at 18:10, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>>>
>>>>>> – Regardless of the fact that someone at the IETF thinks “archive” is the right term, in the document/publication space it is NOT.  I would strongly recommend that we NOT refer to that document or that terminology.
>>>>>
>>>>> During the discussion on the mailing list we were asked to put a concise definition for a package into the document. (I believe what IETF considered as archive in their exploration for providing a top level media type for packages actually has a similar goal.) Do you have a better replacement?
>>>>
>>>> I think “package” is the correct term, not archive.  I have reached out to the IETF to get them to change as well.
>>>
>>> I have added a note that this is not the terminology we use, but the definition itself may still be helpful.
>>>
>>>>
>>>>
>>>>>> - I have problems with this phrase “ This is, however, different from the cached state of a networked publication, which does not have a separate existence (though can also be used offline).”.  There are many ways to cache, some of which are related to browser-based technology and some of which are not.  But all of which constitute the concept of a “cached and offline” document.   How about just removing this.  I don’t think it adds anything, certainly not at this point in the document.
>>>>>>
>>>>>
>>>>> The text (tries to) refer to browser based caches here.
>>>>
>>>> And my point is that it should not do so, because there is no requirement that ONLY browser-based caches be used as part of the process of caching and/or taking a publication offline.  There is also no requirement that the cached state and the portable state be different.  I believe that it is important that this document be agnostic to the specific technology choices and focus on the goals and requirements.
>>>>
>>>>
>>>>> Do you have a better way of formulating this?
>>>>
>>>> I would just remove the sentence entirely as it adds nothing.
>>>>
>>>>
>>>
>>> Having re-read the whole paragraph I think I agree that the sentence may be superfluous there, so let us remove it. But (in your original remark) you also questioned the three bullet items; I claim that the reference to the package is important there, and I think the definitions should stay as they are.
>>>
>>>>
>>>>
>>>>>> Right, this is a bit more complicated. What I think was meant is that the rendering and possibly interactive part of the reading system is independent of the state, i.e., a change of state is indeed transparent.
>>>>
>>>> Yes, I agree that the content should look/act the same independent of state.  Just say something like that :).
>>>>
>>>>
>>>>>> - The phrase “ It should maintain its integrity over time” isn’t actually something that we, as the file format specification, have any control over. It is more about the media, systems, etc. in which the content is stored.  As such, it should be removed.
>>>>>
>>>>> Hm. If I reboot my machine, the cache will disappear, but a portable document on my disc will remain. I am not sure what the problem is with this.
>>>>
>>>> What you talk about is persistence, not integrity.  Integrity has to do with reliability and robustness, which are more tied to things such as media stability, data validation/checksumming, etc.
>>>>
>>>
>>> I have changed this to persistence.
>>>
>>>> And actually, there is nothing in the requirements that states that the cache goes away on a reboot.  That would be a specific implementation decision.
>>>>
>>>
>>> That is true, I just used that as an example in my response...
>>>
>>>>
>>>>>> - Are there no other requirements for the portable state?  I believe we had some in our existing use case/requirements specs.   If not, I can think of a few that I would add here.
>>>>>>
>>>>>
>>>>> I would very much welcome that.
>>>>
>>>> Here are a few…
>>>>
>>>
>>> These are of course all valid use cases. As for the document:
>>>
>>>> - Ability to distribute the publication via non-real-time methods
>>>> ranging from email to sneaker-net
>>>
>>> There is reference to this in the opening paragraph ("Nevertheless,
>>> packages that exist (possibly) apart from the network still have a
>>> role to play as units that can be stored or transferred. This
>>> concept is essential with the current business models that dominate
>>> the publishing industry for, e.g., digital books.") I wonder whether
>>> this warrants a separate section
>>>
>>>> - Ability to read the publication behind a firewall or secured
>>>> network
>>>> - Ability to perform preflight & validation on a stable set of
>>>> content
>>>
>>> Again, these are valid use cases, but are they peculiar to the portable state? After all, online content should be reachable behind the firewall if one has the right access credentials; we would be misunderstood as if we said that a publication can only be consumed as a portable state if within the enterprise...
>>>
>>>
>>>>
>>>>
>>>>>> This is something I will have to think more about. The issue is that streamability may mean something different depending on the state.
>>>>
>>>> Again, I think we have a terminology problem here. What you describe is the ability to get live (or updated) data/content, which is a completely different requirement than streamability.  The ability to stream is well defined in the first sentence - “ It must be possible for a client to fetch components of a package in any order, or to fetch multiple components at the same time, without having to read the entire document”.  That has NOTHING to do with being connected or offline - it has to do with how the RS is able to access the content.
>>>>
>>>> If you want to also add a separate requirement around the ability for the package to be able to specify that specific pieces of content (assets) within the package are not necessarily embedded but instead are retrieved live (with optional caching) - I think that would be a welcome requirement.  Or you could just merge this with "Updates new components only”?
>>>>
>>>
>>> Reading it through again, I believe there is indeed a conflation of concepts. I have actually divided it up into three different sections:
>>>
>>> - Streaming (following the definition of streaming on Wikipedia[1])
>>> - Random access to content
>>> - External (non embedded) references, which is your last example
>>>
>>> I moved some of the use cases around accordingly.
>>>
>>>
>>> [1] https://en.wikipedia.org/wiki/Streaming_media

>>>
>>>
>>>>
>>>>>> - In the Package in a Package section, you have “This is trivially available in online and cached states, but puts an extra requirement on portable states.”  This appears to be a copy/paste from elsewhere, as it doesn’t belong here because it’s simply not true in this case.  Please remove.
>>>>>
>>>>> It is a copy paste indeed, but is it incorrect? (It may be superfluous, though).
>>>>
>>>> Given that we don’t actually know what an “online state”, a “cached state” or even a “portable state” look like from a technical perspective - it is impossible to make any comment on the ability to implement such.
>>>>
>>>
>>> You are right. I removed these types of comments. At some point, a
>>> more careful technical analysis will be needed for each of those
>>> requirements, but this is indeed not the place…
>>>
>>> Thanks!
>>>
>>> Ivan
>>>
>>>>
>>>>>> - The Access to package section also has a similar note about “trivially available” which is also not true, and I would recommend removal as well.
>>>>>>
>>>>>
>>>>> I have re-written that sentence in a way that, I believe, is
>>>>> correct…
>>>>
>>>> I don’t see the change yet, but as with the previous statement - I don’t see how we can make any comments on implementation complexity until we know what is being implemented.
>>>>
>>>>
>>>>> Thanks
>>>>
>>>> Thank you for taking the time to review my comments.
>>>>
>>>>
>>>> Leonard
>>>
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/

>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704

>>>
>>>
>>>
>>>
>>

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704







--
Shane McCarron
Managing Director, Applied Testing and Technology, Inc.


Received on Thursday, 20 August 2015 18:10:21 UTC