- From: Bill McCoy <bmccoy@idpf.org>
- Date: Tue, 8 Sep 2015 16:30:52 -0700
- To: Deborah Kaplan <dkaplan@safaribooksonline.com>
- Cc: Liam Quin <liam@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
- Message-ID: <CADMjS0Ze+CVeh49Pc=CkWOFfrRKQ+b2F+92=kNfdXBtCzsmDFQ@mail.gmail.com>
Deborah, I'm sorry I didn't note your point earlier in the thread (that's a mea culpa, not a you-a culpa). I don't necessarily think all of us would agree on that definition of "curated", but since you are also OK with a different term I think it's a moot point.

Re: my git example, you are absolutely right: under some circumstances a git repo could be considered a document as well, such as where the repo consists of a dataset of some kind. But in some ways that would be like a document consisting of all variations and revisions of every edition of Huckleberry Finn, including all errata etc., all combined into a unified whole... it can be imagined, but AFAIK has never existed. So it's a very special case, whereas the far more common case I was getting at is a single instantiation of a particular edition.

To me that is all a git snapshot is: a particular instantiation, concretely and uniquely defined by a single set of SHA checksums. And if that snapshot is of software, and is considered a "release", it will be tested and verified to work as a whole. Or if the snapshot is a publication, it will similarly be verified as a unit. Whereas the git repo consists of all possible versions of all resources, many of which won't be intended or even able to work together. I.e., to me it's only a document if all its parts, in specific instantiations, work together; thus the document itself is a specific instantiation.

If we said that a portable document had to be verifiable as a set of SHA checksums I would be happy (packaging into a single PDF or ZIP archive being a cheat to avoid needing the checksums).

In Leonard's case of an external video file, it still has a unique checksum (unless it's generated on the fly by a webcam, in which case the referencing document has a ding against its "portability" attribute).
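To make the checksum idea concrete, here is a minimal sketch, an illustration only and not anything proposed in this thread. It assumes SHA-256 as the checksum, and the resource names, contents, and the helper names `build_manifest` and `portability_dings` are all hypothetical. A resource with no recorded checksum (such as video generated on the fly by a webcam) shows up as a ding on portability.

```python
import hashlib


def build_manifest(resources):
    """Map each resource name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in resources.items()}


def portability_dings(manifest, resources):
    """Return the names of resources that cannot be verified.

    A resource gets a "ding" if the manifest records no checksum for it,
    if its bytes are unavailable, or if its bytes no longer match the
    recorded digest.
    """
    dings = []
    for name, expected in manifest.items():
        data = resources.get(name)
        if expected is None or data is None:
            dings.append(name)
        elif hashlib.sha256(data).hexdigest() != expected:
            dings.append(name)
    return sorted(dings)


# Hypothetical publication; names and contents are made up for illustration.
resources = {
    "chapter1.html": b"<p>An example chapter.</p>",
    "video.mp4": b"\x00\x01\x02",
}
manifest = build_manifest(resources)

# Assumption: a live webcam stream has no stable bytes, so no checksum exists.
manifest["webcam-stream"] = None

print(portability_dings(manifest, resources))
```

In this sketch a document counts as fully verifiable only when the returned list is empty; changing any byte of a listed resource makes that resource appear in the list.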
In Leonard's case of a font referenced only by name, if there is no checksum for the specific instance of the font, then that too is a ding on portability (one which, especially for non-Latin scripts, may have dire consequences for intelligibility of the contents).

--Bill

On Tue, Sep 8, 2015 at 3:12 PM, Deborah Kaplan <dkaplan@safaribooksonline.com> wrote:

> Olaf Drümmer wrote:
>
> > Nonetheless I would keep curation out of the text for the definitions, and condense it into 'intended'. Joseph Beuys (German artist) once put a pile of grease somewhere and intended it to be a work of art (not sure how much curation went on while he was doing it, at least it didn't turn into cheese). Some cleaning person did not get the message and… Anyway: that pile of grease would have to be considered a document, its portability only limited by climate/temperature ;-). If Beuys had incidentally dropped a same shaped and same sized pile of grease, it would not have been a document.
>
> I am comfortable changing the term; "curated" has a jargon meaning in museums, libraries, and archives, and outside of that environment may have different connotations.
>
> Bill McCoy <bmccoy@idpf.org> said:
>
> > A computer program to me can validly produce anything we consider a "Portable Web Document". For example a realization of my monthly bank statement will be a document, but it is not curated by a human.
>
> Far up this now lengthy thread (mea culpa!) I discussed how curation by computer is very much a form of curation. Humans with intent created the tool which generated the monthly bank statement.
> The bank statement itself is simply a serialized view of some cells in your bank's data tables, but the choice to create that *specific* view of those cells -- and your choice to have your bank generate the PDF or paper, instead of quietly trusting Quicken to make some background transactions while it updates its own local database -- is what creates a document.
>
> (As Olaf has also said, much more succinctly.)
>
> Bill McCoy <bmccoy@idpf.org> said:
>
> > If an online calendar is simply a UX over a database then I don't consider it a "document" (whether or not the calendar entries have been curated). But if the calendar system can produce a PDF representation of the calendar, that would be a portable document (but not a "portable web document").
> >
> > Similarly if you search on Google for "influenza" the results on the left (the search results) are in no way a "web document" (IMO), the sidebar on the right (with navigation via tabs) could be considered a "web document" but is not a "portable web document" - and whether it's truly a web document could be debated. The PDF that is generated is certainly a portable document (but not a portable "web" document, as I understand that term). But whether the content of the sidebar was in the first place human-curated or machine generated via semantic processing to me is not decisive as to whether it should be considered a "web document", and certainly not as to whether the PDF should be considered a "portable document". In fact I don't know the answer. So thus "document-ness", at least to me, has nothing directly to do with human curation.
>
> [and then in a second email]
>
> > Could an entire git repository be a document (in the sense we mean for this activity)? I don't think so. Could a particular snapshot (e.g.
> > current mainline or a named release) of a git repository
>
> From an information science POV, an entire git repository -- or a calendar, or a collection of search results, or a search algorithm -- can absolutely be documents. The dependency is not whether they can be turned into a PDF or an HTML representation: digital paper, as it were -- just as a text with embedded video can be a document, or tablet-based interactive picturebooks. The dependency is whether the object as it stands is being treated as a document. Places where this has real digital publication ramifications in the academy include:
>
> - In digital theses and dissertations, when a student is required to deposit the documents of his doctoral work in an electronic thesis and dissertation database as a graduation requirement -- and the documents are composed of software products, chemical formulae, or datasets.
>
> - In an archives, when a scholar deposits her life's research, including her academic papers, her patented algorithm, several boxes of papers and ephemera, petabytes of data, the export of her Microsoft Outlook mailbox, and her award-winning website with interactive visualizations of her findings. The author writing about that scholar's life work interacts with each of these items in the archives, described and catalogued as a document, and analyzes each one critically as a complete document.
>
> - In a records management department, in an era where paper or even PDF rules and regulations have given way to micro-updates of websites, so the recordkeepers must record snapshots of entire web hierarchies as the documents recording the institution's history, later to be published in an online index for the board of directors.
>
> What makes each of these a "document" is that humans need to understand each as a concrete whole.
> It's not the technology of curation that matters -- indeed, in the third example, an automated spider run by the Internet Archive does the trick. It's the choice to view the parts as a "document" -- to view a dynamic website as a procedures manual, to view a running computer program as a dissertation.
>
> Suzanne Briet was the French scholar who came up with the lovely, evocative antelope example:
>
> "An antelope running wild on the plains of Africa should not be considered a document... But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document."
>
> Deborah

--
Bill McCoy
Executive Director
International Digital Publishing Forum (IDPF)
email: bmccoy@idpf.org
mobile: +1 206 353 0233
Received on Tuesday, 8 September 2015 23:31:21 UTC