Re: use case clarification - cross format annotations from Robert Bolick on 2014-12-04 (public-annotation@w3.org from December 2014)

From: Robert Bolick <robert.bolick@gmail.com>
Date: Thu, 4 Dec 2014 17:05:14 +0000
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, Frederick HIrsch <hirsch@fjhirsch.com>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <56DFA582-2AEB-4F5E-B5BE-D2C5A166E7C2@gmail.com>
+++


> On 4 Dec 2014, at 08:30, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> 
> I just want to reinforce the importance of this issue. In fact from a use case POV I think there are two issues:
>  
> --The same document referenced by multiple URIs.
>  
> --Synchronizing annotations to multiple formats of the same document (that is the same _version_ of the same document . . . which implies what I would consider a third use case, versions, which we are already addressing).
>  
> And I also want to highlight this comment from Paolo:
>  
> > When tools like Domeo and Annotopia see a document, the first thing they do is capture available IDs. Domeo looks up DOIs, PMIDs, PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the bibliographic data are sent as a description of the target document. This is done by reusing existing vocabularies/ontologies.
>  
> This is really essential for people to understand. I know many in this community are skeptical of IDs like the DOI that require implementation of support systems around them. But in the real world ;-) this is how this works.
>  
> The way to think of it is this: identifiers are proxies for metadata.
>  
> The systems associated with these IDs provide documented specifications for what _their_ metadata includes. And they usually also provide APIs for the retrieval of their stored metadata based on the identifier. So btw when Paolo refers to DOIs for a scientific or scholarly paper, he really means a "CrossRef DOI." A data set associated with that paper would have a different DOI (a "DataCite DOI") which would have entirely different metadata associated with it. And the entertainment industry, which now also uses DOIs, obviously has entirely different metadata associated with those DOIs.
>  
> A sidenote: because the CrossRef DOI is so ubiquitous in STM, people tend to think it has _all possible metadata_. Nope! ;-) They think they can get an e-mail of a contributor from CrossRef, but that's not in the CrossRef metadata. But guess what? It's probably available via the ORCID ID that should be available in the CrossRef metadata, which would send a system to a different server to retrieve information about that specific contributor (and a scientific paper can have scores of contributors). Where I'm going with this is that it is WAY better to have these centralized, authoritative, ideally continually maintained repositories of _particular kinds of metadata with IDs associated with the metadata records_ than to try to ship boatloads of metadata all over the place with individual documents. Thus: Why We Need Identifiers, and Why Identifiers Need Support Systems.
>  
> For a given community of users (scientists, librarians, scholars, data curators), getting a known ID like a CrossRef DOI or a DataCite DOI or an ORCID is just amazingly efficient. The metadata thus available may not be useful or relevant to users outside that sector, but for the users for whom that identifier and its support system were created, it saves the day.
>  
> I realize that you may be thinking "well this is all very interesting but what does this mean for OA?" I guess my point is that these purpose-built identifiers and the systems associated with them will not go away. Lacking a canonical and ubiquitous "work identifier," this is the ecosystem that we are working with now.
>  
> --Bill Kasdorf
>  
> From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com] 
> Sent: Wednesday, December 03, 2014 8:27 AM
> To: Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: Re: use case clarification - cross format annotations
>  
> Hi Frederick,
> comments in line
>  
> On Wed, Dec 3, 2014 at 6:52 AM, Frederick HIrsch <hirsch@fjhirsch.com> wrote:
> Paolo
> 
> Thanks for providing a use case on the wiki - https://www.w3.org/annotation/wiki/Cross-formats_Annotations
> 
> I think what you are saying is that the same document can be provided in different formats (e.g. HTML or PDF) at different portals (e.g. PubMed Central vs. the author's personal web site, etc.) - I guess different portals could also offer the same format with different URLs as well.
>  
> Correct. This is a very common scenario for scientific papers, one of the main resources I annotate.
>  
> 
> The use case also says that sometimes these various targets should be treated as the same despite having different URLs and sometimes should be treated as different, depending on user choice.
>  
> Correct. For instance if I annotate with Domeo an HTML version, I want to see the same annotations on my PDF version through the Utopia client. This is in fact already implemented through the Annotopia server: https://www.youtube.com/watch?v=OrNX6Sfg_RQ
>  
> 
> Thus I have questions:
> 
> - how can a system know that two documents are different representations of the same document when they have different URLs?
>  
> When tools like Domeo and Annotopia see a document, the first thing they do is capture available IDs. Domeo looks up DOIs, PMIDs, PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the bibliographic data are sent as a description of the target document. This is done by reusing existing vocabularies/ontologies.
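
As a rough illustration of the ID-capture approach described above, here is a minimal Python sketch of how a system might decide that two targets with different URLs are the same work. It is not how Domeo or Annotopia are actually implemented; the names `Target`, `make_target`, `same_work` and `normalize_doi`, the example URLs and the identifier values are all hypothetical. The only assumption about the identifiers themselves is that DOIs compare case-insensitively and may arrive with a resolver prefix.

```python
from dataclasses import dataclass

# Prefixes a captured DOI may carry (assumed list, not exhaustive).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver or scheme prefix if present."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


@dataclass(frozen=True)
class Target:
    """One concrete representation of a document (hypothetical model)."""
    url: str
    media_type: str
    identifiers: frozenset  # set of (scheme, value) pairs, e.g. ("doi", "10.1234/x")


def make_target(url: str, media_type: str, **ids: str) -> Target:
    """Build a Target from captured IDs such as doi=..., pmid=..., pmcid=..."""
    pairs = set()
    for scheme, value in ids.items():
        scheme = scheme.lower()
        value = normalize_doi(value) if scheme == "doi" else value.strip()
        pairs.add((scheme, value))
    return Target(url, media_type, frozenset(pairs))


def same_work(a: Target, b: Target) -> bool:
    """Treat two targets as the same work if they share any identifier."""
    return bool(a.identifiers & b.identifiers)


# Hypothetical example: an HTML page and a PDF served from different sites.
html = make_target("https://example.org/paper.html", "text/html",
                   doi="10.1234/Example.5678", pmid="12345")
pdf = make_target("https://example.net/paper.pdf", "application/pdf",
                  doi="https://doi.org/10.1234/example.5678")
other = make_target("https://example.com/other.pdf", "application/pdf",
                    doi="10.1234/unrelated.0001")
```

Because each `Target` keeps its own URL and media type, an annotation can still record exactly which variant motivated it while being shared across every representation that carries a matching identifier.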
>  
> 
> - why would an end-user want only to provide annotations for a specific representation of the same target and not have it apply to all versions?
>  
> It depends on what the task is. If the task is to compare output formats, you might want to do that. Also, different formats might differ in layout, and the annotation might be related to that.
> In general, it is important to know exactly which variant motivated the annotation so that the process can be fully understood.
>  
> 
> - should we simplify the use case to how to share annotations for a target that has multiple instances with different URLs?
>  
> I guess so. Keeping in mind that one URL can refer to HTML and one to PDF?
>  
> 
> It seems the big issue here is that different URLs might refer to the same target, and how to handle that.
>  
> Yup. In my case I incorporate bibliographic data in the annotation. Alternatively, something else needs to do the job of finding that out.
>  
> 
> I know I’m jumping ahead, but thought I’d ask now.
>  
> Good you asked :)
>  
> 
> regards, Frederick
> 
> Frederick Hirsch
> @fjhirsch
> 
> --
> Dr. Paolo Ciccarese                       
> Assistant Professor of Neurology, Harvard Medical School
> Assistant in Neuroscience, Massachusetts General Hospital
> Senior Information Scientist, MGH Biomedical Informatics Core
> ORCID: http://orcid.org/0000-0002-5156-2703
> 
Received on Thursday, 4 December 2014 17:05:45 UTC