Re: [Glossary] Portable Digital Document's states

[Sorry for coming in late]

I agree with Brady: there is a huge difference between packaged/unpackaged and online/offline.  The original terms made more sense to me, as they focused on whether or not the content was usable without standard web protocols (aka offline).

Leonard

From: Ivan Herman
Date: Wednesday, September 16, 2015 at 10:23 AM
To: Brady Duga <duga@google.com>
Cc: W3C Digital Publishing IG, Ralph Swick
Subject: Re: [Glossary] Portable Digital Document's states
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Wed, 16 Sep 2015 14:23:18 +0000


On 16 Sep 2015, at 16:10, Brady Duga <duga@google.com> wrote:

I really disagree with both the new names and the old definitions (as you posted them). Originally I thought these terms referred to the states of being on my local machine vs out on the network somewhere, hence the names and the inclusion of the cached state. These have now morphed into packaged states, which seem entirely orthogonal to the old terms. As a user, I care more about knowing that everything I need to read a PDD is available when I get on an airplane than I do about whether all those resources are stored in 1 or more than 1 file, or what protocols are used to access the constituent resources. Maybe I am just confused by casting this as redefining the old terms. Perhaps, in light of our current PDD definition, we are simply discarding this online/offline definition and are also adding definitions for packaging states.

I think I understand what you say:-)

And indeed, without intending to, I have moved the emphasis of those terms. I am trying to capture the difference between packaged and not packaged, and online/offline seems to be fairly orthogonal to it. So we have two sets of definitions. However, for our work later, I am not sure whether the offline/online difference is something that would affect any of our technical investigations, so I am not sure it is worth dealing with. I may be wrong but, indeed, we have two dimensions here.

Let us see what others think…

Ivan



On Tue, Sep 15, 2015 at 10:11 PM, Ivan Herman <ivan@w3.org> wrote:

On 15 Sep 2015, at 18:56, Brady Duga <duga@google.com> wrote:

TextEdit on Yosemite still produces "documents" that are a directory (how it handles images). My point is, if we want to specify a Portable Digital FILE we should call it that (though, I think someone has taken that acronym). And yes, there are reasons why a PDD might want to be in a single file, so much so that it is a requirement to allow it. That is, a PDD that can't be packaged and used in packaged form is not sufficient for our needs. However, there are also uses for unpackaged versions of PDDs, and I am not sure we want to exclude them. It seems odd that if I have a zip file it is a PDD, but the second I unzip it it is no longer a PDD. Is it not portable? Is it not a document? Maybe, maybe not. A Web Document may be multiple files. Or one file may be multiple documents. Or no files. An XML document may be split across multiple files. So there is ample precedent for Document ≠ File.

Brady,

I have the impression that we are violently agreeing, just have not synchronized our terminology…

Per the glossary terms that I use[1], coming out of our long discussion thread, the term "Portable Digital Document" is (or at least should be:-) silent on the physical realization of the set of resources, ie, whether it is split across multiple files or not. If anything, it suggests (maybe mistakenly so) that it *is* spread over files if I look at the specification of a Web Resource (the reference to HTTP and FTP). In other words, a PDD can be spread over files or can be combined into one file and, as you say, both manifestations have precedent and importance. This is what I tried to express with the 'unpacked' and 'packed' states, respectively.

Do I still misunderstand something?

Cheers

Ivan

[1] https://www.w3.org/dpub/IG/wiki/Glossary





On Tue, Sep 15, 2015 at 7:12 AM, Ivan Herman <ivan@w3.org> wrote:

On 15 Sep 2015, at 14:53, Brady Duga <duga@google.com> wrote:



On Mon, Sep 14, 2015 at 9:56 PM, Ivan Herman <ivan@w3.org> wrote:

On 14 Sep 2015, at 15:43, Brady Duga <duga@google.com> wrote:

Local and remote instead of packed and unpacked?

Doesn't the same issue arise as with online vs. offline? I can have, on my laptop, the same document side-by-side in a package (say, an EPUB file) as well as part of my directory structure. They would both be 'local', wouldn't they?

And they might both be EPUBs. The requirement to zip the archive didn't always exist, and may not in the future. Traditionally, the concept of a "document" in markup languages did not imply one file. If we are changing that we should *definitely* stop using document in our terms! In fact, knowing if something is in one file or not is difficult. On OS X, there are documents that are shown with a single icon and act to the user as a single file, but are actually stored as a collection of files in a directory. A typical user will think it is one document. On Windows you used to be able to open zip files like folders. To the typical user that one file was multiple documents.

And, to take that example, Apple actually burned their fingers a bit with that one: for Pages and friends they abandoned the one-file approach for a Pages document in Mavericks, but it created too many problems for exactly the reason you quote (it was not clear to many whether it was one file or a file system, and many tools fell on their face), so in Yosemite they packaged the files into one file again (Zip-based, I believe).

I am not really sure where you want to go, I must admit. Do you say that there is no reason to make a difference between a Document being packed into one file (ie, a package) and a Document being spread over files on the file system? We know there is a difference and user agents have to behave differently when they hit one or the other… We did hear at the F2F in NY that there *is* an importance in the publishing world to be able to handle a Document as one single file/entity/unit. The 'packed'/'unpacked' was meant to grasp this difference. (As I said, I am not sure that it is worth keeping the 'Cached' as a separate term.)
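To make the packed/unpacked difference concrete, here is a minimal sketch of how a user agent might branch on the two states. It assumes, purely for illustration, that a packed document is a ZIP archive and an unpacked one is a directory tree; the names `PackagingState`, `packaging_state` and `list_resources` are hypothetical and do not come from the glossary or any specification.

```python
import os
import zipfile
from enum import Enum


class PackagingState(Enum):
    PACKED = "packed"      # all resources combined into one file
    UNPACKED = "unpacked"  # resources spread over a directory tree


def packaging_state(path):
    """Classify a document location (hypothetical helper, ZIP assumed)."""
    if os.path.isdir(path):
        return PackagingState.UNPACKED
    if zipfile.is_zipfile(path):
        return PackagingState.PACKED
    raise ValueError(f"not a recognizable document: {path}")


def list_resources(path):
    """Return the document's resource names, whatever its packaging state."""
    if packaging_state(path) is PackagingState.PACKED:
        with zipfile.ZipFile(path) as archive:
            # Skip explicit directory entries, which end with "/".
            return sorted(n for n in archive.namelist() if not n.endswith("/"))
    resources = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), path)
            resources.append(rel.replace(os.sep, "/"))
    return sorted(resources)
```

In this sketch the same set of resources is reported for both states; only the access path differs, which is the sense in which the document stays the same while the user agent has to behave differently.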

Can you explain?

Thanks

Ivan




The use of "one unit" is odd, since presumably you could have a PDD that is in several files, spread across multiple folders. What is the unit it was packed into?

Hm. I must admit that, for me, a packed PDD is in one file. Just like EPUB, in this sense. Do you have an example of a packed PDD spreading across multiple files?

See above, but EPUB might be one such example. It depends if we reinstate the virtual file system stuff in OCF and allow for a file system representation in addition to a ZIP archive.



As for the term User Agent, EPUB intentionally uses Reading Systems to avoid confusion with a browser UA. A Reading System uses composition (has-a) instead of inheritance (is-a) as the relationship to a UA. It may not be quite technically correct, but it makes clear that a RS may be more than a local browser (it includes any polyfills, server components, etc).
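The has-a versus is-a distinction can be sketched in a few lines. This is only an illustration of the relationship described above: the class names, the `render` and `open` methods, and the `extras` list are hypothetical and are not taken from the EPUB specifications.

```python
class UserAgent:
    """Stand-in for a browser engine (hypothetical)."""

    def render(self, resource):
        return f"rendered {resource}"


class ReadingSystem:
    """Has a UserAgent (composition) rather than being one (inheritance)."""

    def __init__(self, user_agent, extras=()):
        self.user_agent = user_agent
        # Anything beyond the browser: polyfills, server components, etc.
        self.extras = list(extras)

    def open(self, resource):
        # Rendering is delegated to the contained user agent.
        return self.user_agent.render(resource)


rs = ReadingSystem(UserAgent(), extras=["polyfill", "server component"])
```

Here `rs` is not an instance of `UserAgent`; it delegates to one and carries additional parts alongside it.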

It was my mistake to bring another term into the discussion, my apologies. Brady, is it o.k. if we postpone this discussion (the term is on the list of pending items on the Glossary page) or push it into a separate thread? I am a little bit afraid of mixing up discussions, which would make things very messy.

I am fine not adding a new term, and for our work User Agent makes sense. I was just trying to explain why Reading System exists. I don't think this group should use that term.


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704









Received on Friday, 18 September 2015 12:36:21 UTC