[dpub] use case(s) for scholarly publishing from Siegman, Tzviya - Hoboken on 2016-03-18 (public-digipub-ig@w3.org from March 2016)

From: Siegman, Tzviya - Hoboken <tsiegman@wiley.com>
Date: Fri, 18 Mar 2016 17:36:51 +0000
To: "DPUB mailing list (public-digipub-ig@w3.org)" <public-digipub-ig@w3.org>
Message-ID: <2a955047d9f34c5ca6c9ddda3e4f6293@CAR-WNMBP-006.wiley.com>

Hello DPUB,

Ivan has put together a set of use cases from the perspective of scholarly publishing. The two of us had some back and forth about the issues and thought we'd bring the discussion to the list.

Some overarching questions for Romain and Heather:

1. Would you prefer that we leave this as one long and complex use case or break it into the dozen or so simple use cases that are reflected in the requirements section at the end?

2. Would you prefer that we keep the focus on scholarly publishing, even when the examples extend far beyond scholarly publishing, so that we demonstrate the real-world need?

[[[

# Scientific publication use case/requirements

By scientific publication we mean scholarly communications, or collections thereof (i.e., proceedings, journal volumes, etc.), on the Web. Although related, scientific/STEM books are a different category and are not a particular focus of what follows.

## Important aspects

### Identity

Having a stable, _unique_ identity is a non-negotiable necessity for a scientific publication. This identity _MUST_ be independent of the state (offline, online, etc.), but also of the format (printed, online in HTML, PDF, Word, whatever). Although the existing identifier schemes (e.g., DOI) typically have a mapping onto a locator (typically a Web page), their primary role is to serve as unique identification, and their role as locators on the Web is mostly incidental and indirect.
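
As an illustration only, the separation of identity from locators might be sketched as follows. The `Publication` class, its method names, the DOI-like value, and all locations are hypothetical and not part of any existing scheme; the point is merely that the identifier stays fixed while locators for different states and formats come and go.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Publication:
    """A publication keyed by a stable identifier, not by where a copy lives."""
    identifier: str  # e.g., a DOI; the value used below is made up
    # Maps a (state, format) pair to wherever that rendition currently lives.
    locators: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_locator(self, state: str, fmt: str, location: str) -> None:
        """Register (or replace) the location of one rendition."""
        self.locators[(state, fmt)] = location

    def locate(self, state: str, fmt: str) -> Optional[str]:
        """Return a location for the requested rendition, or None."""
        return self.locators.get((state, fmt))


pub = Publication(identifier="doi:10.9999/example.2016.001")
pub.add_locator("online", "html", "https://publisher.example/articles/001")
pub.add_locator("offline", "pdf", "/home/reader/papers/001.pdf")

# Moving a copy changes a locator, never the identifier.
pub.add_locator("online", "html", "https://archive.example/mirror/001")
```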

### Metadata

As with identity, a number of metadata items are essential for scientific publications. It is essential that the metadata abide by the requirements of specific vocabularies, e.g., for the identity of authors (possibility of using ORCID), of publications, etc. The metadata _structure_ must be as open as possible, making it easy to connect to various external databases and vocabularies.

Metadata should also be _searchable_. Whether search engines can find the metadata, and include the information therein in their specialized databases, determines whether a scientific publication is known or not. In practice, this means that the metadata should not be hidden in an archive (which search engines rarely consume) but should be available on the open Web.

### Dynamic content
TS: note effect on archive format and definition of scientific record

IH: And it indeed touches on archival issues of, say, programs, but I am not sure that is relevant for our use cases. WDYT?

The time when a scientific publication was essentially equal to a printed paper article (or its reproduction in PDF) is coming to an end. A scientific publication may include a collection of resources in different formats, of which textual content may be only one. A scientific publication today may include, or indeed _be_, a data set, scientific software, video, audio, etc. The future is an interrelated collection of resources that, _together_, form a logical unit, i.e., _are_ the scientific publication itself.

Note that a scientific publication may also include very dynamic content, e.g., a program/script that produces an interactive visualization of data, or that the reader can run to show some algorithms working, etc.

### Consuming scientific publications

In practice, readers, users, etc., may want to, say, read a scientific publication under very different circumstances. It is a very widespread usage pattern for a scientist to read such a publication while commuting, i.e., being offline, using different devices (different aspect ratios, screen sizes, etc.). Some sort of bookmarking to ensure smooth movement among devices is important.

This also means that the scientific publication should be, as much as possible, adaptable to the reading/consuming environment in terms of text size and some fundamental rendering aspects (one column or two, color usage, etc.). Whether specific software or hardware can perform certain dynamic features (e.g., execute a complex program) should also be taken into account; it should be possible to state in the communication's metadata/manifest which content is essential for the faithful rendering of the publication and which content can have fallbacks.

TS: This is really about offlinification and personalization

Yes. But it also shows that 'offlinification' may be more than simply having all the resources (videos, audio, etc.) around; it is a more complex issue. The same goes for personalization.
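
To make the idea of essential content and fallbacks concrete, here is a minimal sketch of how a reading system might consult a manifest. Everything in it is an assumption made for illustration: the `Resource` fields, the `plan_rendering` function, the file names, and the media types do not come from any existing manifest format.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Resource:
    """One entry of a hypothetical publication manifest."""
    name: str
    media_type: str
    essential: bool = False                 # needed for a faithful rendering?
    fallback: Optional["Resource"] = None   # simpler substitute, if any


def plan_rendering(manifest: List[Resource],
                   supported: Set[str]) -> Tuple[List[str], List[str]]:
    """Pick, for each resource, the first rendition the environment supports.

    Returns the names that can be shown and the names of essential
    resources for which neither the original nor any fallback is usable.
    """
    usable, unrenderable_essential = [], []
    for res in manifest:
        candidate = res
        # Walk the fallback chain until a supported media type is found.
        while candidate is not None and candidate.media_type not in supported:
            candidate = candidate.fallback
        if candidate is not None:
            usable.append(candidate.name)
        elif res.essential:
            unrenderable_essential.append(res.name)
    return usable, unrenderable_essential


manifest = [
    Resource("article.html", "text/html", essential=True),
    Resource("simulation.py", "text/x-python", essential=True,
             fallback=Resource("simulation.png", "image/png")),
    Resource("talk.mp4", "video/mp4"),  # optional, no fallback
]

# A device that renders HTML and images but cannot run code or play video.
shown, missing = plan_rendering(manifest, {"text/html", "image/png"})
```

A reading system could use the second return value to warn the reader that the publication cannot be rendered faithfully in the current environment.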

### Annotation

Annotation is another essential feature for scientific publications. Annotations come in many forms, from simple highlighting of a sentence in the text, to complex (and possibly highly formatted) attached content, containing mathematics, drawings, etc.

Annotations play a role at various points in the use of a publication; the most typical are, on the one hand, the peer-review system, which is an essential part of the publication process, and, on the other hand, annotations on the final, published communication. The latter kind of annotations (and, increasingly, the former, too) are often made public. This means that annotations created by a reader offline should not only migrate as part of the publication itself when it goes online but, increasingly, should also be stored automatically on public annotation servers.
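
The offline-to-online flow for annotations could look roughly like the sketch below. The `AnnotationStore` class, its field names, and the `publish` callback are hypothetical; a real system would presumably use an agreed annotation data model and server protocol rather than plain dictionaries.

```python
from typing import Callable, Dict, List


class AnnotationStore:
    """Keeps annotations with the publication and queues public ones."""

    def __init__(self) -> None:
        self.local: List[Dict] = []    # travels with the publication
        self.pending: List[Dict] = []  # public, not yet sent to a server

    def annotate(self, target_id: str, selector: str, body: str,
                 public: bool = False) -> Dict:
        """Record an annotation; works the same online or offline."""
        note = {
            "target": target_id,   # the publication's identifier, not a URL
            "selector": selector,  # which part of the content is annotated
            "body": body,
            "public": public,
        }
        self.local.append(note)
        if public:
            self.pending.append(note)
        return note

    def sync(self, publish: Callable[[Dict], None]) -> int:
        """Call once online; `publish` uploads one annotation to a server."""
        sent = 0
        while self.pending:
            publish(self.pending[0])  # if this raises, the note stays queued
            self.pending.pop(0)
            sent += 1
        return sent


store = AnnotationStore()
store.annotate("doi:10.9999/example.2016.001", "section 2, sentence 3",
               "This step needs a citation.", public=True)
store.annotate("doi:10.9999/example.2016.001", "figure 1",
               "Private reminder to re-check the axis units.")

server: List[Dict] = []           # stand-in for a public annotation server
uploaded = store.sync(server.append)
```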

### Hierarchy of communications

Proceedings, article collections, and journals have many similarities to "simple" communications (need for identity, metadata, etc.), but have the additional feature of being collections of communications that each have their own identity, too.

## Derived requirements (draft)

* Separation of identity from locators
* Possibility of using metadata that may be external
* Metadata/manifest structure should be easily extensible and adaptable
* Smooth transfer between different states (offline, online, etc.)
* Identification of a 'bookmark' in a state-independent way
* Identification of what is 'essential' content and what is not
* Inclusion of many different media
* Publications that may contain, practically, no textual information (e.g., scientific software as scientific publication)
* Possibility for flexible annotations, not necessarily _included_ in the communication itself
* Easy self-adaptation to the reading environment, user styling
* Hierarchical view of portable publications?

]]]

Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com

Received on Friday, 18 March 2016 17:37:20 UTC