Re: [Glossary] Definition of a portable document (and other things...)

I would be perfectly fine with the requirement (somewhere) about the “checksum on the data” - or basically finding a way to say that the document is referencing (by URI) a VERY SPECIFIC external resource.  So the video, the font, etc. would be great examples of that.

Leonard

From: Bill McCoy
Date: Tuesday, September 8, 2015 at 7:30 PM
To: Deborah Kaplan
Cc: Liam Quin, W3C Digital Publishing IG
Subject: Re: [Glossary] Definition of a portable document (and other things...)
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Tuesday, September 8, 2015 at 7:31 PM

Deborah

I'm sorry I didn't note your point earlier in the thread (that's a mea culpa not a you-a culpa). I don't necessarily think all of us would agree on that definition of "curated" but since you also are OK with a different term I think it's a moot point.

Re: my git example, you are absolutely right, under some circumstances a git repo could be considered a document as well, such as where the repo consists of a dataset of some kind, but in some ways that would be like a document consisting of all variations and revisions of every edition of Huckleberry Finn, including all errata etc., all combined into a unified whole... it can be imagined, but AFAIK has never existed. So it's a very special case, whereas the far more common case I was getting at is a single instantiation of a particular edition. To me that is all a git snapshot is - a particular instantiation, concretely and uniquely defined by a single set of SHA checksums. And if that snapshot is of software, and is considered a "release" it will be tested and verified to work in the whole. Or if the snapshot is a publication, it will similarly be verified as a unit. Whereas the git repo consists of all possible versions of all resources, many of which won't be intended or even able to work together. I.e. to me it's only a document if all its parts, in specific instantiations, work together, thus the document itself is a specific instantiation.

If we said that a portable document had to be verifiable as a set of SHA checksums I would be happy (packaging into a single PDF or ZIP archive being a cheat to avoid needing the checksums). In Leonard's case of an external video file, it still has a unique checksum (unless it's generated on the fly by a webcam, in which case the referencing document has a ding against its "portability" attribute).  In Leonard's case of a font referenced only by name, if there is no checksum for the specific instance of the font, then that as well is a ding on portability (one which, especially for non-Latin scripts, may have dire consequences for intelligibility of the contents).
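
A minimal sketch of what "verifiable as a set of SHA checksums" could look like, assuming a manifest that maps each resource URI to a SHA-256 digest. Nothing in this thread specifies a manifest format or a hash variant; the function names, the example URIs, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def build_manifest(resources):
    """Map each resource URI to the SHA-256 hex digest of its bytes.

    `resources` is a dict of URI -> bytes (hypothetical in-memory stand-in
    for the video, font, etc. that a document references).
    """
    return {uri: hashlib.sha256(data).hexdigest() for uri, data in resources.items()}


def failed_resources(resources, manifest):
    """Return the sorted URIs that are missing or whose bytes do not match."""
    failed = []
    for uri, expected in manifest.items():
        data = resources.get(uri)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            failed.append(uri)
    return sorted(failed)


# Hypothetical document: one HTML file, one font, one video.
published = {
    "index.html": b"<p>Huckleberry Finn</p>",
    "fonts/body.woff": b"font-bytes-v1",
    "media/intro.mp4": b"video-bytes",
}
manifest = build_manifest(published)

# A reader who receives a different build of the font fails verification.
received = dict(published)
received["fonts/body.woff"] = b"font-bytes-v2"
print(failed_resources(received, manifest))
```

Under this sketch, a resource generated on the fly (the webcam case) or referenced only by name (the font case) simply has no stable entry to put in the manifest, which is the "ding" on portability described above.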

--Bill


On Tue, Sep 8, 2015 at 3:12 PM, Deborah Kaplan <dkaplan@safaribooksonline.com> wrote:
Olaf Drümmer wrote:

> Nonetheless I would keep curation out of the text for the definitions, and condense it into 'intended'. Joseph Beuys (German artist) once put a pile of grease somewhere and intended it to be a
> work of art (not sure how much curation went on while he was doing it, at least it didn't turn into cheese). Some cleaning person did not get the message and… Anyway: that pile of grease would
> have to be considered a document, its portability only limited by climate/temperature ;-). If Beuys had incidentally dropped a same shaped and same sized pile of grease, it would not have been
> a document.

I am comfortable changing the term; "curated" has a jargon meaning in museums, libraries, and archives, and outside of that environment may have different connotations.

>Bill McCoy <bmccoy@idpf.org> said:

>    A computer program to me can validly produce anything we consider a "Portable Web Document". For example a realization of my monthly bank statement will be a document, but it is not curated by a human.

Far up this now lengthy thread (mea culpa!) I discussed how curation by computer is very much a form of curation. Humans with intent created the tool which generated the monthly bank statement. The bank statement itself is simply a serialized view of some cells in your bank's data tables, but the choice to create that *specific* view of those cells -- and your choice to have your bank generate the PDF or paper, instead of quietly trusting Quicken to make some background transactions while it updates its own local database -- is what creates a document.

(As Olaf has also said, much more succinctly.)

Bill McCoy <bmccoy@idpf.org> said:

>    If an online calendar is simply a UX over a database then I don't consider it a "document" (whether or not the calendar entries have been curated). But if the calendar system can produce a PDF representation of the calendar, that would be a portable document (but not a "portable web document").
>
>    Similarly if you search on Google for "influenza" the results on the left (the search results) are in no way a "web document" (IMO), the sidebar on the right (with navigation via tabs) could be considered a "web document" but is not a "portable web document" - and whether it's truly a web document could be debated. The PDF that is generated is certainly a portable document (but not a portable "web" document, as I understand that term). But whether the content of the sidebar was in the first place human-curated or machine generated via semantic processing to me is not decisive as to whether it should be considered a "web document", and certainly not as to whether the PDF should be considered a "portable document". In fact I don't know the answer. So thus "document-ness", at least to me, has nothing directly to do with human curation.

[and then in a second email]

> Could an entire git repository be a document (in the sense we mean for this activity)? I don't think so. Could a particular snapshot (e.g. current mainline or a named release) of a git repository

From an information science POV, an entire git repository -- or a calendar, or a collection of search results, or a search algorithm -- can absolutely be documents.  The dependency is not whether they can be turned into a PDF or an HTML representation: digital paper, as it were -- just as a text with embedded video can be a document, or tablet-based interactive picturebooks. The dependency is whether the object as it stands is being treated as a document.  Places where this has real digital publication ramifications in the academy include:

- In digital theses and dissertations, when a student is required to deposit the documents of his doctoral work in an electronic thesis and dissertation database as a graduation requirement -- and the documents are composed of software products, chemical formulae, or datasets.

- In an archives, when a scholar deposits her life's research, including her academic papers, her patented algorithm, several boxes of papers and ephemera, petabytes of data, the export of her Microsoft Outlook mailbox, and her award-winning website with interactive visualizations of her findings. The author writing about that scholar's life work interacts with each of these items in the archives, described and catalogued as a document, and analyzes each one critically as a complete document.

- In a records management department, in an era where paper or even PDF rules and regulations have given way to micro-updates of websites, so the recordkeepers must record snapshots of entire web hierarchies as the documents recording the institution's history, later to be published in an online index for the board of directors.

What makes each of these a "document" is that humans need to understand each as a concrete whole. It's not the technology of curation that matters -- indeed, in the third example, an automated spider run by the Internet Archive does the trick.  It's the choice to view the parts as a "document" -- to view a dynamic website as a procedures manual, to view a running computer program as a dissertation.

Suzanne Briet was the French scholar who came up with the lovely, evocative antelope example:

"An antelope running wild on the plains of Africa should not be considered a document... But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document."

Deborah




--

Bill McCoy
Executive Director
International Digital Publishing Forum (IDPF)
email: bmccoy@idpf.org
mobile: +1 206 353 0233

Received on Wednesday, 9 September 2015 00:23:18 UTC