RE: use case clarification - cross format annotations from Bill Kasdorf on 2014-12-11 (public-annotation@w3.org from December 2014)

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Thu, 11 Dec 2014 20:33:17 +0000
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: Robert Bolick <robert.bolick@gmail.com>, Carol Anne Meyer <cmeyer@crossref.org>, Web Annotation <public-annotation@w3.org>, "Dawson, Laura (Laura.Dawson@bowker.com)" <Laura.Dawson@bowker.com>
Message-ID: <CO2PR06MB57279EDA6EBC459C02494D7DF630@CO2PR06MB572.namprd06.prod.outlook.com>
Thanks, this is a really nice clear explanation and example.

My question, though, is _who_ does this? Where do these expressions "live" in the ecosystem and the workflow?


-----Original Message-----
From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of Stian Soiland-Reyes
Sent: Thursday, December 11, 2014 11:58 AM
To: Bill Kasdorf
Cc: Robert Bolick; Carol Anne Meyer; Web Annotation; Dawson, Laura (Laura.Dawson@bowker.com)
Subject: Re: use case clarification - cross format annotations

Sometimes you don't have a common DOI - but you still know there is a common "abstract thing" the two formats share.

You can use existing PROV and DC mechanisms to relate different formats of "the same thing".


http://www.w3.org/TR/prov-o/#alternateOf is exactly for this purpose (e.g. relate PDF to HTML sibling) - where http://www.w3.org/TR/prov-o/#specializationOf can be used to relate a DOI and HTML.

    <http://example.com/paper54.html> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:alternateOf <http://example.com/paper54.html> .
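
As a rough illustration of how a consumer might use such statements, here is a minimal Python sketch that groups URIs connected by these relations, so that an annotation made on one format can be looked up for its siblings. The tuple representation, the function names and the choice to treat prov:specializationOf as a plain "same thing" link are all assumptions of this sketch (the relation is really directional), not something defined by PROV or by the annotation model.

```python
from collections import defaultdict

# The three statements from the example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.com/paper54.html", "prov:specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", "prov:specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", "prov:alternateOf", "http://example.com/paper54.html"),
]

# Predicates this sketch treats as "same abstract thing" links.
# Assumption: direction is ignored, which is a simplification for specializationOf.
LINKING = {"prov:specializationOf", "prov:alternateOf", "dcterms:isFormatOf"}


def same_thing_groups(triples):
    """Group URIs that are connected by any linking predicate (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p in LINKING:
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


def equivalents(uri, triples):
    """Return the other URIs in the same group as `uri` (empty set if unknown)."""
    for group in same_thing_groups(triples):
        if uri in group:
            return group - {uri}
    return set()
```

Calling equivalents("http://example.com/paper54.html", TRIPLES) returns the PDF URI and the DOI, which is the set of targets a client could also check for annotations.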

http://purl.org/dc/terms/isFormatOf can be used similarly:

"A related resource that is substantially the same as the described resource, but in another format."

    <http://example.com/paper54.pdf> dcterms:isFormatOf <http://example.com/paper54.html> .


The not-quite-inverse http://purl.org/dc/terms/hasFormat on the other hand implies that the subject pre-existed, e.g. that one format was converted to the other.
Here the PDF format was made from the HTML format:

    <http://example.com/paper54.html> dcterms:hasFormat <http://example.com/paper54.pdf> .


But if you really want to also state such a conversion provenance I believe it's better to use prov:alternateOf in addition to a derivation relation like pav:importedFrom http://purl.org/pav/html#http://purl.org/pav/importedFrom :

    <http://example.com/paper54.pdf> pav:importedFrom <http://example.com/paper54.html> .

(This is not usually true as a publisher will commonly generate both HTML and PDF from a common, internal XML format - meaning you are left with just alternateOf and possibly specializationOf the DOI)
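
To make the choice between these relations concrete, here is a small Python sketch that emits the statements for a set of formats. The function name and its parameters are hypothetical, and the prefixes (prov:, pav:) are assumed to be declared elsewhere. It states prov:specializationOf only when a DOI is known, prov:alternateOf between siblings, and pav:importedFrom only when one format is known to have been made from another.

```python
def format_statements(formats, doi=None, derived=None):
    """Return Turtle-style lines relating different formats of the same content.

    formats: list of URIs for the same content in different formats.
    doi:     optional URI of the shared abstract work.
    derived: optional (product, source) pair, given only when one format
             is known to have been converted from the other.
    """
    lines = []
    if doi:
        # Each format is a more specific form of the work named by the DOI.
        for uri in formats:
            lines.append(f"<{uri}> prov:specializationOf <{doi}> .")
    # Siblings are alternates of each other; state each pair once.
    for i, a in enumerate(formats):
        for b in formats[i + 1:]:
            lines.append(f"<{b}> prov:alternateOf <{a}> .")
    if derived:
        product, source = derived
        lines.append(f"<{product}> pav:importedFrom <{source}> .")
    return lines
```

With the HTML and PDF URIs and the DOI from the example, and no derivation given, this yields the same three statements as the first block above.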



On 9 December 2014 at 18:29, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> Excellent. The STM and standards communities are obvious beneficiaries 
> of the Annotations work and would likely be early adopters compared to 
> others within the broader publishing space.
>
>
>
> From: Robert Bolick [mailto:robert.bolick@gmail.com]
> Sent: Tuesday, December 09, 2014 1:25 PM
> To: Bill Kasdorf
> Cc: Carol Anne Meyer; Web Annotation; Dawson, Laura
> (Laura.Dawson@bowker.com)
> Subject: Re: use case clarification - cross format annotations
>
>
>
> And standards developing and publishing bodies are collaborating with 
> each other and CrossRef as I write, Bill. I'm hopeful that this will 
> facilitate earlier adoption of this group's deliveries!
>
> Sent from my iPhone
>
>
> On 9 Dec 2014, at 15:40, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>
> Thanks, Carol!
>
>
>
> Subsequent discussion on the list—much of which has gone beyond my 
> technical knowledge—has focused on the ability of server-side 
> processing via APIs to resolve some of these ambiguities. Which I was 
> delighted to find: I was worried that the fact that a DOI could not be 
> relied upon to resolve directly to one and only one version and 
> instance of a document, and that what it resolves to can change over 
> time, would make it impossible to use the way the Open Annotations WG 
> needs annotations to work. Not only is that not necessarily the case, 
> Paolo and others are actually using DOIs now for this purpose, with mechanisms to deal with those potential ambiguities.
>
>
>
> Your clarifications are all helpful and probably quite relevant in 
> that context, so I'm copying the WG.
>
>
>
> One other point: although I realized that there are CrossRef DOIs for 
> datasets (thanks for the details, and reminding me how much that is in 
> fact done), my main point was that _different metadata_ is associated 
> with a CrossRef DOI and a DataCite DOI. Is that correct? Or when there 
> is a CrossRef DOI associated with a dataset, is the metadata the same 
> as if it had a DataCite DOI? (BTW I also knew CrossRef and DataCite 
> are collaborating: kudos for that, of course! Ditto for ORCID and ISNI, 
> though "talking" rather than "collaborating" may more accurately 
> reflect the status of that. I think ISNI is going to be essential for 
> organizational identification, as a complement to ORCID for 
> contributors.)
>
>
>
> --Bill
>
>
>
> From: Carol Anne Meyer [mailto:cmeyer@crossref.org]
> Sent: Monday, December 08, 2014 11:32 PM
> To: Bill Kasdorf
> Cc: Dawson, Laura (Laura.Dawson@bowker.com)
> Subject: Re: FW: use case clarification - cross format annotations
>
>
>
> Hi Bill,
>
>
>
> Thanks very much for sharing this.
>
>
>
> Yep, you got it right--just a few notes to elaborate below:
>
>
>
> On Thu, Dec 4, 2014 at 3:45 AM, Bill Kasdorf 
> <bkasdorf@apexcovantage.com>
> wrote:
>
> Hi, Laura and Carol—
>
>
>
> I don't think you two get the W3C OAWG e-mails, and I wanted you to 
> see what I just sent. You both may have comments or corrections to 
> what I wrote. Hope I didn't misrepresent anything from your point of view!
>
>
>
> --Bill
>
>
>
> From: Bill Kasdorf
> Sent: Thursday, December 04, 2014 3:31 AM
> To: Paolo Ciccarese; Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: RE: use case clarification - cross format annotations
>
>
>
> I just want to reinforce the importance of this issue. In fact from a 
> use case POV I think there are two issues:
>
>
>
> --The same document referenced by multiple URIs.
>
> This CAN be, but is not always, handled by CrossRef with Multiple 
> Resolution, but only when the documents with the different URIs are 
> the same version--typically the version of record. In this case, one 
> DOI has more than one URI associated with it. The service provides a 
> user-choice popup. A specific URI can be accessed with a CrossRef DOI 
> and parameter to by-pass the multiple resolution interface.
>
>
>
> --Synchronizing annotations to multiple formats of the same document 
> (that is the same _version_ of the same document . . . which implies 
> what I would consider a third use case, versions, which we are already addressing).
>
>
>
> And I also want to highlight this comment from Paolo:
>
>
>
>> When tools like Domeo and Annotopia see a document, the first thing 
>> they do is capture available IDs. Domeo looks up DOIs, PMIDs, 
>> PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the 
>> bibliographic data are sent as description of the target document. 
>> This is done by reusing existing vocabularies/ontologies.
>
>
>
> This is really essential for people to understand. I know many in this 
> community are skeptical of IDs like the DOI that require 
> implementation of support systems around them. But in the real world 
> ;-) this is how this works.
>
>
>
> The way to think of it is this: identifiers are proxies for metadata.
>
>
>
> The systems associated with these IDs provide documented 
> specifications for what _their_ metadata includes. And they usually 
> also provide APIs for the retrieval of their stored metadata based on 
> the identifier. So btw when Paolo refers to DOIs for a scientific or 
> scholarly paper, he really means a "CrossRef DOI." A data set 
> associated with that paper would have a different DOI (a "DataCite 
> DOI") which would have entirely different metadata associated with it.
>
>
>
> So this is very close to being true; just to be precise, data sets 
> associated with papers can have "CrossRef DOIs"; The difference for 
> CrossRef is really the community. If the publisher is hosting or 
> maintaining the data, it may be easier for them to add dataset DOIs at 
> CrossRef. And several significant databases have been assigning data 
> set DOIs through CrossRef for years. An example is the Protein Data 
> Bank. Another is the Organisation for Economic Co-operation and 
> Development (OECD). In fact there are almost a million data sets from 
> 1100 databases with CrossRef DOIs. There are about 5 million DOIs assigned to data sets at DataCite.
>
>
>
> CrossRef and DataCite have made a commitment to collaborate--for 
> example, CrossRef's content negotiation APIs were extended to help 
> with interoperability between the two registration agencies, and we 
> have plans to work closely together going forward.
>
>
>
> And the entertainment industry, which now also uses DOIs, obviously 
> has entirely different metadata associated with those DOIs.
>
>
>
> A sidenote: because the CrossRef DOI is so ubiquitous in STM, people 
> tend to think it has _all possible metadata_. Nope! ;-) They think 
> they can get an e-mail of a contributor from CrossRef, but that's not 
> in the CrossRef metadata. But guess what? It's probably available via 
> the ORCID ID that should be available in the CrossRef metadata, which 
> would send a system to a different server to retrieve information 
> about that specific contributor (and a scientific paper can have scores of contributors).
>
>
> Yes this is right. Though right now there are not a ton of ORCIDs in 
> the CrossRef metadata, they are growing and expected to do so faster 
> as publishers figure out how to get the right data from the right 
> systems to CrossRef.
>
>
>
> Where I'm going with this is that it is WAY better to have these 
> centralized, authoritative, ideally continually maintained 
> repositories of _particular kinds of metadata with IDs associated with 
> the metadata records_ than to try to ship boatloads of metadata all 
> over the place with individual documents. Thus: Why We Need 
> Identifiers, and Why Identifiers Need Support Systems.
>
>
>
> Another example we've been looking at is institutional 
> identifiers--candidates include Ringgold and the ISNI's new 
> organizational ID.  We have a taxonomy of some funding institutions 
> (and they have a Funder ID) as part of our FundRef funding data service.
>
>
>
> For a given community of users (scientists, librarians, scholars, data 
> curators), getting a known ID like a CrossRef DOI or a DataCite DOI or 
> an ORCID is just amazingly efficient. The metadata thus available may 
> not be useful or relevant to users outside that sector, but for the 
> users for whom that identifier and its support system were created, it saves the day.
>
>
>
> I realize that you may be thinking "well this is all very interesting 
> but what does this mean for OA?" I guess my point is that these 
> purpose-built identifiers and the systems associated with them will 
> not go away. Lacking a canonical and ubiquitous "work identifier," 
> this is the ecosystem that we are working with now.
>
>
>
> The demo is very interesting. Tangentially, it may be of interest that 
> we have worked with Ubiquity on a few projects and they have become a 
> sponsoring entity that agrees to fulfill CrossRef membership 
> obligations (depositing DOIs and creating outbound reference links and 
> paying the bills) on behalf of small publishers who may not have the 
> resources to do so themselves.
>
>
>
> --Bill Kasdorf
>
>
>
> From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com]
> Sent: Wednesday, December 03, 2014 8:27 AM
> To: Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: Re: use case clarification - cross format annotations
>
>
>
> Hi Frederick,
>
> comments in line
>
>
>
> On Wed, Dec 3, 2014 at 6:52 AM, Frederick HIrsch <hirsch@fjhirsch.com>
> wrote:
>
> Paolo
>
> Thanks for providing a use case on the wiki - 
> https://www.w3.org/annotation/wiki/Cross-formats_Annotations
>
> I think what you are saying is that the same document can be provided 
> in different formats (e.g. HTML or PDF) at different portals (e.g. 
> PubMed Central vs author's personal web site etc) - I guess different 
> portals could also offer the same format with different URLs as well.
>
>
>
> Correct. This is a very common scenario for scientific papers, one of 
> the main resources I annotate.
>
>
>
>
> The use case also says that sometimes these various targets should be 
> treated as the same despite having different URLs and sometimes should 
> be treated as different, depending on user choice.
>
>
>
> Correct. For instance if I annotate with Domeo an HTML version, I want 
> to see the same annotations on my PDF version through the Utopia 
> client. This is in fact already implemented through the Annotopia server:
> https://www.youtube.com/watch?v=OrNX6Sfg_RQ
>
>
>
>
> Thus I have questions
>
> - how can a system know that two documents are different 
> representations of the same document when they have different URLs?
>
>
>
> When tools like Domeo and Annotopia see a document, the first thing 
> they do is capture available IDs. Domeo looks up DOIs, PMIDs, 
> PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the 
> bibliographic data are sent as description of the target document. 
> This is done by reusing existing vocabularies/ontologies.
>
>
>
>
> - why would an end-user want only to provide annotations for a specific 
> representation of the same target and not have it apply to all versions?
>
>
>
> It depends on the task. If the task is to compare output formats 
> you might want to do that. Also different formats might be different 
> in layout and the annotation might be related to that.
> In general, it is important to know exactly which variant motivated 
> the annotation so that the process can be fully understood.
>
>
>
>
> - should we simplify the use case to how to share annotations for a 
> target that has multiple instances with different URLs.
>
>
>
> I guess so. Keeping in mind that one URL can refer to HTML and one to PDF?
>
>
>
>
> It seems the big issue here is that different URLs might refer to the 
> same target, and how to handle that.
>
>
>
> Yup. In my case I incorporate bibliographic data in the annotation. 
> Alternatively, something else needs to do the job of finding that out.
>
>
>
>
> I know I’m jumping ahead, but thought I’d ask now.
>
>
>
> Good you asked :)
>
>
>
>
> regards, Frederick
>
> Frederick Hirsch
> @fjhirsch
>
>
>
>
>
> --
>
> Dr. Paolo Ciccarese
> Assistant Professor of Neurology, Harvard Medical School
> Assistant in Neuroscience, Massachusetts General Hospital
> Senior Information Scientist, MGH Biomedical Informatics Core
>
> ORCID: http://orcid.org/0000-0002-5156-2703
>
>
> CONFIDENTIALITY NOTICE: This message is intended only for the 
> addressee(s), may contain information that is considered to be 
> sensitive or confidential and may not be forwarded or disclosed to any 
> other party without the permission of the sender.
> If you have received this message in error, please notify the sender 
> immediately.
>
>
>
>
>
> --
>
> Carol Anne Meyer
>
> Business Development and Marketing
>
> CrossRef
>
> 50 Salem Street
>
> Lynnfield, MA 01940
>
> + 1 781 629 9782
>
> International +1 781 295 0072 x23
>
> @meyercarol
>
>
>
> www.crossref.org
>
> @CrossRefNews



--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Received on Thursday, 11 December 2014 20:33:48 UTC