Re: [Glossary] Definition of a portable document (and other things...)

From: Deborah Kaplan <dkaplan@safaribooksonline.com>
Date: Tue, 8 Sep 2015 18:12:35 -0400
Message-ID: <CANSiVPaZ0z9LtkhiLdnvooqDNa5+1kDLm9Fg+hQ41tB6dQFb9w@mail.gmail.com>
To: Bill McCoy <bmccoy@idpf.org>
Cc: Liam Quin <liam@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Olaf Drümmer wrote:

> Nonetheless I would keep curation out of the text for the definitions,
> and condense it into 'intended'. Joseph Beuys (German artist) once put a
> pile of grease somewhere and intended it to be a work of art (not sure
> how much curation went on while he was doing it, at least it didn't turn
> into cheese). Some cleaning person did not get the message and… Anyway:
> that pile of grease would have to be considered a document, its
> portability only limited by climate/temperature ;-). If Beuys had
> incidentally dropped a same shaped and same sized pile of grease, it
> would not have been a document.

I am comfortable changing the term; "curated" has a jargon meaning in
museums, libraries, and archives, and outside of that environment may have
different connotations.

Bill McCoy <bmccoy@idpf.org> said:

> A computer program to me can validly produce anything we consider a
> "Portable Web Document". For example a realization of my monthly bank
> statement will be a document, but it is not curated by a human.

Far up this now lengthy thread (mea culpa!) I discussed how curation by
computer is very much a form of curation. Humans with intent created the
tool which generated the monthly bank statement. The bank statement itself
is simply a serialized view of some cells in your bank's data tables, but
the choice to create that *specific* view of those cells -- and your choice
to have your bank generate the PDF or paper, instead of quietly trusting
Quicken to make some background transactions while it updates its own local
database -- is what creates a document.

(As Olaf has also said, much more succinctly.)

Bill McCoy <bmccoy@idpf.org> said:

> If an online calendar is simply a UX over a database then I don't
> consider it a "document" (whether or not the calendar entries have been
> curated). But if the calendar system can produce a PDF representation of
> the calendar, that would be a portable document (but not a "portable web
> document").
>
> Similarly if you search on Google for "influenza" the results on the
> left (the search results) are in no way a "web document" (IMO), the sidebar
> on the right (with navigation via tabs) could be considered a "web
> document" but is not a "portable web document" - and whether it's truly a
> web document could be debated. The PDF that is generated is certainly a
> portable document (but not a portable "web" document, as I understand that
> term). But whether the content of the sidebar was in the first place
> human-curated or machine generated via semantic processing to me is not
> decisive as to whether it should be considered a "web document", and
> certainly not as to whether the PDF should be considered a "portable
> document". In fact I don't know the answer. So thus "document-ness", at
> least to me, has nothing directly to do with human curation.

[and then in a second email]

> Could an entire git repository be a document (in the sense we mean for this
> activity)? I don't think so. Could a particular snapshot (e.g. current
> mainline or a named release) of a git repository

From an information science POV, an entire git repository -- or a calendar,
or a collection of search results, or a search algorithm -- can absolutely
be documents. Whether something is a document does not depend on whether it
can be turned into a PDF or an HTML representation (digital paper, as it
were); a text with embedded video can be a document, and so can a
tablet-based interactive picture book. What matters is whether the object
as it stands is being treated as a document. Places where this has real
digital publication ramifications in the academy include:

- In digital theses and dissertations, when a student is required to
deposit the documents of his doctoral work in an electronic thesis and
dissertation database as a graduation requirement -- and the documents are
composed of software products, chemical formulae, or datasets.

- In an archives, when a scholar deposits her life's research, including
her academic papers, her patented algorithm, several boxes of papers and
ephemera, petabytes of data, the export of her Microsoft Outlook mailbox,
and her award-winning website with interactive visualizations of her
findings. The author writing about that scholar's life work interacts with
each of these items in the archives, described and catalogued as a
document, and analyzes each one critically as a complete document.

- In a records management department, in an era where paper or even PDF
rules and regulations have given way to micro-updates of websites, so the
recordkeepers must record snapshots of entire web hierarchies as the
documents recording the institution's history, later to be published in an
online index for the board of directors.

What makes each of these a "document" is that humans need to understand
each as a concrete whole. It's not the technology of curation that matters
-- indeed, in the third example, an automated spider run by the internet
archive does the trick.  It's the choice to view the parts as a "document"
-- to view a dynamic website as a procedures manual, to view a running
computer program as a dissertation.

Suzanne Briet was the French scholar who came up with the lovely, evocative
antelope example:

"An antelope running wild on the plains of Africa should not be considered
a document... But if it were to be captured, taken to a zoo and made an
object of study, it has been made into a document. It has become physical
evidence being used by those who study it. Indeed, scholarly articles
written about the antelope are secondary documents, since the antelope
itself is the primary document."

