Re: "Show me the metadata!" :), was Re: Rough sketch for WP from Robin Berjon on 2016-09-26 (public-digipub-ig@w3.org from September 2016)

From: Robin Berjon <robin@berjon.com>
Date: Mon, 26 Sep 2016 12:13:42 -0400
To: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>, Marcos Caceres <marcos@marcosc.com>, Baldur Bjarnason <baldur@rebus.foundation>, Ivan Herman <ivan@w3.org>
Cc: "Cramer, Dave" <dave.cramer@hbgusa.com>, Michael Smith <mike@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Peter Krautzberger <peter.krautzberger@mathjax.org>
Message-ID: <0ca44afe-7c51-2e83-9c70-a6ec360096a4@berjon.com>

On 26/09/2016 11:44 , Siegman, Tzviya - Hoboken wrote:
> 3. In the scholarly publishing world, the line between content and
> metadata is further blurred. It might be obvious to those of us in
> the world of HTML that the title of an article should be tagged as
> <h1>, but what about the subtitle? How do I tag author names? All of
> this information must be displayed, not just tagged. How do I tag
> this information in a way that makes it searchable in the NIH
> database? This might not sound unique to Digital Publishing and look
> a lot like issues that have plagued bloggers and those who have
> pondered the outline algorithm for years. We welcome those solutions
> and hope to build on them. But, I'd like to outline the kind of
> complexities that we face and would be happy to show sample files in
> a smaller setting. For now, most publishers work with a model that is
> compliant with the JATS tag suite [3]. You'll notice that this is
> XML, which is fine, but for it to work on a website, there has to be
> a transform to something else (HTML, PDF, etc). That something else
> has no standardization. You'll also notice that the <article-meta>
> and the article are separate. This means that some basic information,
> like the title gets repeated. That is kind of annoying. Metadata in
> this world also includes rather detailed information such as author
> affiliations. Does this mean the affiliation of the author at the
> time of publication? What happens if the author transfers from one
> university to another during the peer review process? Should the
> affiliation change in the article at the time of publication? This
> requires more than just an element or property in HTML. I don't think
> we should attempt to make decisions about this level of granularity,
> but we should make it possible for publishers and authors to do so. I
> would be happy to talk to you about how we deal with this at my
> company (Wiley). Another issue that I suspect is near and dear to the
> hearts of many is how to convey whether an article is open access and
> what type of access is allowed.  Wouldn't you prefer to know about
> that if asked to review an article for one of the evil publishers?

One thing that I have found helpful (for people like Marcos and myself)
when trying to make sense of requirements in the scholarly world is to
think about the manner in which it is handled for W3C standards.

The way the W3C does it nowadays would likely send many scholarly
publishers screaming, but that is where its ancestry lies. We have
titles and subtitles (the latter often marked up the wrong way), authors
and affiliations with some loose conventions to handle changes of
affiliations over geologic^Wstandards time, levels of review, alternate
formats and translations, and an abstract (the content of which is
rarely an abstract as the Director will typically tell you during
transition).

If you start from that and imagine that it is a radical modernisation of
scholarly publishing with a lot more flexibility, you're pretty close to
the mark.

To address metadata encoding more specifically, it shouldn't come as a
surprise to some here that I would advocate for schema.org as a
sensible, widely deployed and developer-adopted option. Maybe some of
the work that we've done with Scholarly HTML ought to be applied more
generally (with some scholarly specifics such as using
`hasDigitalDocumentPermission` to mark open access)? Some of the
modelling is a bit indirect (for instance affiliations are indirected
precisely because they are ephemeral) but a lot of it is generic enough.
It can be used in a manifest as JSON-LD, which could be sweet.
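
To make that a little more concrete, here is a rough sketch (not the
Scholarly HTML model itself) of what such schema.org metadata might look
like when built as JSON-LD for a manifest. The helper name
`build_metadata` and all sample values are made up for illustration. The
affiliation is attached directly to the author for brevity, so the
indirection mentioned above is not reproduced, and using
`hasDigitalDocumentPermission` on a `ScholarlyArticle` (schema.org
defines it on `DigitalDocument`) with a public read grant to signal open
access is an assumption on my part.

```python
import json


def build_metadata(title, subtitle, author, affiliation, open_access=False):
    """Return a schema.org description of an article as a JSON-LD-shaped dict.

    Hypothetical helper: the name, parameters and overall shape are
    illustrative assumptions, not taken from any specification.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": title,
        "alternativeHeadline": subtitle,  # one way to carry a subtitle
        "author": {
            "@type": "Person",
            "name": author,
            # Simplification: affiliation hangs directly off the person.
            "affiliation": {"@type": "Organization", "name": affiliation},
        },
    }
    if open_access:
        # Assumption: open access expressed as a read grant to the public.
        doc["hasDigitalDocumentPermission"] = {
            "@type": "DigitalDocumentPermission",
            "permissionType": "https://schema.org/ReadPermission",
            "grantee": {"@type": "Audience", "audienceType": "public"},
        }
    return doc


if __name__ == "__main__":
    # Placeholder values, not a real article.
    metadata = build_metadata(
        "An Example Article",
        "With an Example Subtitle",
        "A. Author",
        "Example University",
        open_access=True,
    )
    print(json.dumps(metadata, indent=2))
```

The same dictionary could just as well be embedded in an HTML page as a
JSON-LD script block; the manifest is only one possible home for it.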

-- 
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ — intelligent science publishing

Received on Monday, 26 September 2016 16:14:16 UTC