RE: Metadata Task Force Report Comments and Feedback

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Mon, 27 Oct 2014 18:08:04 +0000
To: "Stein, Ayla" <astein@illinois.edu>, "ivan@w3.org" <ivan@w3.org>, "W3C Digital Publishing IG (public-digipub-ig@w3.org)" <public-digipub-ig@w3.org>
Message-ID: <3825975d95964499ad4b2e30f1eddf25@CO2PR06MB572.namprd06.prod.outlook.com>
Hi, Ayla-

A belated response as I'm getting some ducks in order before TPAC. Copying the DPIG in general as well. Thanks so much for your thoughtful comments!

--Bill Kasdorf

From: Stein, Ayla [mailto:astein@illinois.edu]
Sent: Saturday, October 25, 2014 11:28 PM
To: ivan@w3.org; Bill Kasdorf
Subject: Metadata Task Force Report Comments and Feedback

Hi Bill and Ivan,

I've made some comments on the draft at https://w3c.github.io/dpub-metadata/ in the email text below. I apologize if this isn't the most recent draft and some of my comments are out of date! I started at the top of the document and quoted the passages I wanted to comment on. My feedback is in the <AS> tags (à la Bill).

"Granularity: The need to associate metadata with arbitrarily granular units of content, rather than simply at the publication<BK>document</BK> level."
<AS> Having article-level metadata, instead of metadata only at the publication level, is becoming increasingly important for discovery in libraries, especially with the advent of web-scale discovery systems. Our users want a one-click solution and don't want to navigate an entire publication to get to their article. Whether we get article-level metadata varies from publisher to publisher: some provide it and some do not.</AS>
<BK> Your comment made me realize that I should have said "document," not "publication." I was referring to an arbitrarily granular level _within a document_. BTW, I'm surprised to hear that you don't always get article-level metadata. I bet it virtually always exists: it is fundamental to NLM/JATS/BITS, on which most journal publishing is based, and CrossRef metadata, whose use is virtually universal, is closely aligned with it. So somewhere along the line, article metadata isn't getting from the publisher to the library, even when the publisher has it for CrossRef and for its hosting platform. Hmm! </BK>

"Complexity: The profusion of identifiers and metadata vocabularies is confusing and difficult to master."
<AS>To add to the complexity, if we're going to recommend or explore the use of ISBNs as URIs, we may also want to recommend the use of ISNIs as URIs. According to the ISNI website (http://www.isni.org), ISNI comes from the same "family" as ISBN and ISSN.</AS>
<BK> Definitely. ORCID is also essential. And of course the DOI, which is already recommended to always be expressed as a URI. One useful outcome would be a clear, simple explanation of the URI and how it is used with identifiers like ISBN, ISSN, ISNI, ORCID, and DOI. That belongs to the "educate publishers" recommendation below. But on that score, we need to be careful not to reinvent things: BISG has just updated its guide to identifiers (https://www.bisg.org/guide-identifiers-0), for example. </BK>
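As a rough illustration of what such a "clear, simple explanation" might show, the following Python sketch maps a few identifier schemes to URI forms. The names `URI_PATTERNS` and `identifier_to_uri` are hypothetical, and the patterns are assumptions based on commonly used forms (the `urn:isbn:` and `urn:issn:` URN namespaces from RFC 3187 and RFC 3044, plus the DOI, ORCID, and ISNI resolver addresses); each registry's current guidance should be checked before relying on them.

```python
# Hypothetical sketch: express common publishing identifiers as URIs.
# The patterns are assumed, commonly used forms; verify against each registry.
URI_PATTERNS = {
    "isbn": "urn:isbn:{}",              # URN namespace (RFC 3187)
    "issn": "urn:issn:{}",              # URN namespace (RFC 3044)
    "doi": "https://doi.org/{}",        # DOI resolver
    "orcid": "https://orcid.org/{}",    # ORCID iD as an HTTPS URI
    "isni": "http://isni.org/isni/{}",  # ISNI URI, written without spaces
}


def normalize(scheme: str, value: str) -> str:
    """Tidy an identifier string before building its URI."""
    value = value.strip()
    if scheme in ("isbn", "isni"):
        # Design choice for this sketch: drop hyphens and spaces so that
        # the same ISBN or ISNI always yields the same URI string.
        return value.replace("-", "").replace(" ", "")
    return value


def identifier_to_uri(scheme: str, value: str) -> str:
    """Return a URI for `value` under the named identifier scheme."""
    scheme = scheme.lower()
    if scheme not in URI_PATTERNS:
        raise ValueError(f"no URI pattern known for scheme {scheme!r}")
    return URI_PATTERNS[scheme].format(normalize(scheme, value))


if __name__ == "__main__":
    # Example values for illustration only.
    print(identifier_to_uri("isbn", "978-0-306-40615-7"))
    print(identifier_to_uri("doi", "10.1000/182"))
    print(identifier_to_uri("isni", "0000 0001 2103 2683"))
```

Run as a script, this prints `urn:isbn:9780306406157`, `https://doi.org/10.1000/182`, and `http://isni.org/isni/0000000121032683`. Keeping the patterns in one table makes it easy to see that the identifier itself does not change; only the prefix that turns it into a resolvable or namespaced URI does.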

"Specific recommendation: The W3C should collaborate with BISG, the Book Industry Study Group; EDItEUR, the international organization responsible for ONIX, the standard messaging format for dissemination of book supply chain metadata; and schema.org, which provides the most commonly used means of embedding metadata in web content; and other appropriate parties to develop an optimal way for book publishers to embed appropriate and useful metadata in Web content based on the well-known and widely implemented ONIX model".
<AS>A representative from the library world would be good to include here, or someone involved with mapping LCSH<->BISAC to help provide continuity between industries. Both LCSH and BISAC Subject Headings have been made available as linked data vocabularies on id.loc.gov. These are additional resources of which we should definitely take full advantage. Someone from NISO might also be a good choice.</AS>
<BK> This recommendation was specifically directed at how to use the ONIX model (very likely a subset of it) in schema.org. That's supply chain metadata. One aspect of that is subject classification, and subject classifications are probably most publishers' #1 priority. Your comment highlights an important issue: the need to identify the scheme associated with any given classification (BISAC, BIC, Thema, LCSH, etc.). One question for you: to what extent are librarians receiving and using ONIX? We should discuss the scope of this in the meeting on Thursday at TPAC, and knowing more about the use of ONIX by librarians would definitely be relevant.</BK>

"Educate publishers and their partners on how best to use existing features of the OWP. While technical specifications, sometimes supplemented by primers, are already provided on such features by the W3C, these are often targeted at technical users. There is a need for much simpler, more user-friendly documentation aimed at non-technical people within the publishing ecosystem. There is also a need for much more aggressive dissemination of this information throughout the publishing ecosystem to demystify these features of the OWP and encourage their broad and proper use both by creators and recipients of metadata. Specific recommendation: Focus initially on encouraging proper understanding and use of URIs and of RDF/RDFa."
<AS>Yes, and push Schema.org along with it!</AS><BK> Good point! </BK>

"ISBNs (the primary product identifier for books) can also be expressed as a URI. Better understanding of the URI and how to use it would be an important step in the improvement of metadata implementation within Web content and systems."
<AS> I mentioned it earlier, but I'll mention it again: we should look into how ISNIs can be used in this context as well, since they can also be expressed as URIs: http://www.isni.org/how-isni-works </AS> <BK> Definitely. This recommendation was not about any particular identifier as a URI but about how to express identifiers in general (ISBN, ISNI, DOI, etc.) as URIs. But your point is a good one: we should provide examples of the benefits of doing that, and ISNI, DOI, and ORCID are particularly useful in that regard; the latter two have especially robust systems to DO something with the link! </BK>
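One practical step before publishing an identifier as a link is to verify its check digit, so that a typo does not turn into a broken or misdirected URI. A minimal Python sketch, assuming the published check-digit rules (the ISBN-13 weighted sum modulo 10, and ISO 7064 MOD 11-2, which both ISNI and ORCID use), might look like this; the function names are hypothetical.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Even positions (0, 2, ...) get weight 1, odd positions get weight 3.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def mod11_2_is_valid(identifier: str) -> bool:
    """Check a 16-character ISNI or ORCID iD with ISO 7064 MOD 11-2."""
    chars = identifier.replace("-", "").replace(" ", "").upper()
    body = chars[:15]
    if len(chars) != 16 or not (body.isascii() and body.isdigit()):
        return False
    total = 0
    for d in body:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the check character "X".
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected
```

For example, `mod11_2_is_valid("0000-0002-1825-0097")` returns `True` for ORCID's widely used sample iD, and changing the final digit makes it return `False`. The same function accepts the spaced ISNI display form, since separators are stripped first.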

"[More recommendations to follow from the TF. E.g., I think we should definitely make some sort of recommendation with regards to rights metadata. To be discussed.]"
<AS> I'll take this opportunity to suggest encouraging the use of Schema.org, since it does have terms for intellectual property information. Talking with Tim Cole about this (giving credit where credit is due!), I learned that there is some reluctance to use Schema.org as a replacement(ish) for ONIX, since ONIX is far more detailed and includes elements and information that Schema.org doesn't. However, if publishers are mainly concerned with marketing and making their products discoverable, Schema.org is a more logical choice than a new version of ONIX: no one outside the publishing community uses ONIX, just as no one outside the library community uses MARC. At the very least, Schema.org could carry descriptive and rights metadata, and anything that doesn't pertain to consumers could stay in ONIX or otherwise be "hidden". I've been involved in a project with Tim C. and several others at the University of Illinois at Urbana-Champaign that involves transforming our catalog records from MARC to MODS and also marking them up with Schema.org semantics. Although it isn't implemented yet, we've done enough mapping to know that Schema.org is fully capable of providing the essential information (including item-level holdings and their locations). </AS> <BK> Yes, exactly, this is what that first recommendation is all about. </BK>
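To make the idea concrete, here is a minimal Python sketch that builds a schema.org description of a book as JSON-LD, including some of the rights-related properties schema.org defines for creative works (`copyrightHolder`, `copyrightYear`, `license`). The helper name `book_jsonld` and all values are placeholders, not taken from the Illinois project; this is only one possible mapping, and it does not attempt to model item-level holdings.

```python
import json


def book_jsonld(title: str, isbn: str, author: str, author_uri: str,
                rights_holder: str, year: int, license_uri: str) -> dict:
    """Build a schema.org Book description as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "isbn": isbn,
        # Point to an authority record (e.g. an ISNI URI), not just a name string.
        "author": {"@type": "Person", "name": author, "sameAs": author_uri},
        # Rights-related properties available on schema.org CreativeWork.
        "copyrightHolder": {"@type": "Organization", "name": rights_holder},
        "copyrightYear": year,
        "license": license_uri,
    }


if __name__ == "__main__":
    record = book_jsonld(
        title="Example Title",
        isbn="9780306406157",
        author="Example Author",
        author_uri="http://isni.org/isni/0000000121032683",
        rights_holder="Example Press",
        year=2014,
        license_uri="https://creativecommons.org/licenses/by/4.0/",
    )
    # The JSON text could be embedded in a web page inside a
    # <script type="application/ld+json"> element.
    print(json.dumps(record, indent=2))
```

Consumer-facing fields like these could be exposed this way, while the more detailed trade data would remain in ONIX.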

"Publishing Terms and Concepts": <AS> In the combined document (final recommendations, etc.), I think this section should be much closer to the top. </AS>

"Mr. Kasdorf took a "horizontal" approach, interviewing experts from diverse types of publishing (book, journal, magazine, and news) and representing diverse roles within the digital publishing ecosystem (publishers, metadata service providers, consultants, and representatives from other organizations that are addressing the issue of metadata in publishing)."

<AS> An interesting follow-up report (possibly out of scope, but I thought I'd mention it anyway) would interview university presses and libraries that have started publishing services. That would probably dovetail closely with the scholarly publishers (if they aren't already included). Public libraries have also recently started publishing services for their patrons, mostly supporting and facilitating self-publishing, but in some cases acting as publishers themselves: http://lj.libraryjournal.com/2014/03/publishing/the-public-library-as-publisher/#_ </AS>

"It has also led to the development of other standards, such as ORCID, the Open Researcher and Contributor ID, and FundRef,"

<AS> This may be a nitpicky comment, but to make clear that ORCID is the acronym for the Open Researcher and Contributor ID, I would flip the two around and write something like: "...other standards such as the Open Researcher and Contributor ID (ORCID) and FundRef..."</AS> <BK> Good copyediting! ;-) You're exactly right; otherwise folks might think we're talking about three standards, not two. </BK>

I hope these comments help/make sense! Please let me know if you need me to clarify or re-address anything.

<BK> Thanks much for your comments. Will you be attending or calling in to our meeting at TPAC on Thursday? We will mainly be focusing on refining these recommendations. </BK>


Ayla Stein
Metadata Librarian
Assistant Professor, University Library
220 Main Library
University of Illinois at Urbana-Champaign
1408 W. Gregory Drive (MC-522)
Urbana, Illinois 61801
(217) 300-2958
Received on Monday, 27 October 2014 18:08:42 UTC
