Re: use case clarification - cross format annotations from Stian Soiland-Reyes on 2014-12-11 (public-annotation@w3.org from December 2014)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 11 Dec 2014 16:58:26 +0000
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: Robert Bolick <robert.bolick@gmail.com>, Carol Anne Meyer <cmeyer@crossref.org>, Web Annotation <public-annotation@w3.org>, "Dawson, Laura (Laura.Dawson@bowker.com)" <Laura.Dawson@bowker.com>
Message-ID: <CAPRnXt=KOExciSWw1cXa-YVy+R8CFnQNQdps3MSTp55=xAuWFA@mail.gmail.com>
Sometimes you don't have a common DOI - but you still know there is a
common "abstract thing" the two formats share.

You can use existing PROV and DC mechanisms to relate different
formats of "the same thing".


http://www.w3.org/TR/prov-o/#alternateOf is exactly for this purpose
(e.g. relating a PDF to its HTML sibling), while
http://www.w3.org/TR/prov-o/#specializationOf can be used to relate the
HTML (or PDF) to the abstract thing identified by the DOI.

    @prefix prov: <http://www.w3.org/ns/prov#> .

    <http://example.com/paper54.html> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:alternateOf <http://example.com/paper54.html> .

http://purl.org/dc/terms/isFormatOf can be used similarly:

"A related resource that is substantially the same as the described
resource, but in another format."

    @prefix dcterms: <http://purl.org/dc/terms/> .

    <http://example.com/paper54.pdf> dcterms:isFormatOf <http://example.com/paper54.html> .


The not-quite-inverse http://purl.org/dc/terms/hasFormat, on the other hand,
implies that the subject pre-existed and the object was derived from it,
e.g. one was converted into the other. Here the PDF format was made from the
HTML format:

    @prefix dcterms: <http://purl.org/dc/terms/> .

    <http://example.com/paper54.html> dcterms:hasFormat <http://example.com/paper54.pdf> .


But if you really want to also state such conversion provenance, I
believe it's better to use prov:alternateOf in addition to a
derivation relation like pav:importedFrom
(http://purl.org/pav/html#http://purl.org/pav/importedFrom):

    @prefix pav: <http://purl.org/pav/> .

    <http://example.com/paper54.pdf> pav:importedFrom <http://example.com/paper54.html> .

(This is often not the case, as a publisher will commonly generate both
HTML and PDF from a common, internal XML format, meaning you are left
with just alternateOf and possibly specializationOf the DOI.)
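A minimal sketch, in plain Python rather than an RDF library, of how an annotation server might use such statements: resources linked by prov:alternateOf, prov:specializationOf, dcterms:isFormatOf or dcterms:hasFormat are merged into one group (union-find), so an annotation made on the HTML can also be offered on the PDF. Treating all four relations as symmetric "same work" links is an assumption made for lookup purposes only; PROV and DC do not define them that way. The function names are hypothetical.

```python
from collections import defaultdict

PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"

# Assumption: these predicates are treated as "same abstract work" links.
# pav:importedFrom is deliberately left out; it records derivation only.
LINKING = {
    PROV + "alternateOf",
    PROV + "specializationOf",
    DCT + "isFormatOf",
    DCT + "hasFormat",
}


def equivalence_groups(triples):
    """Union-find over subjects/objects connected by a linking predicate."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for s, p, o in triples:
        if p in LINKING:
            union(s, o)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())


def equivalent_targets(url, triples):
    """All resources grouped with `url`, including `url` itself."""
    for group in equivalence_groups(triples):
        if url in group:
            return group
    return {url}


# The statements from the Turtle example above, as (subject, predicate, object).
triples = [
    ("http://example.com/paper54.html", PROV + "specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "alternateOf", "http://example.com/paper54.html"),
]
print(sorted(equivalent_targets("http://example.com/paper54.pdf", triples)))
```

Even without the alternateOf statement, the shared specializationOf target alone would put the HTML and PDF in the same group.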



On 9 December 2014 at 18:29, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> Excellent. The STM and standards communities are obvious beneficiaries of
> the Annotations work and would likely be early adopters compared to others
> within the broader publishing space.
>
>
>
> From: Robert Bolick [mailto:robert.bolick@gmail.com]
> Sent: Tuesday, December 09, 2014 1:25 PM
> To: Bill Kasdorf
> Cc: Carol Anne Meyer; Web Annotation; Dawson, Laura
> (Laura.Dawson@bowker.com)
> Subject: Re: use case clarification - cross format annotations
>
>
>
> And standards developing and publishing bodies are collaborating with each
> other and CrossRef as I write, Bill. I'm hopeful that this will facilitate
> earlier adoption of this group's deliverables!
>
> Sent from my iPhone
>
>
> On 9 Dec 2014, at 15:40, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>
> Thanks, Carol!
>
>
>
> Subsequent discussion on the list, much of which has gone beyond my technical
> knowledge, has focused on the ability of server-side processing via APIs to
> resolve some of these ambiguities, which I was delighted to find. I was
> worried that because a DOI cannot be relied upon to resolve directly to one
> and only one version and instance of a document, and what it resolves to can
> change over time, it would be impossible to use DOIs the way the Open
> Annotations WG needs annotations to work. Not only is that not necessarily
> the case; Paolo and others are actually using DOIs for this purpose now,
> with mechanisms to deal with those potential ambiguities.
>
>
>
> Your clarifications are all helpful and probably quite relevant in that
> context, so I'm copying the WG.
>
>
>
> One other point: although I realized that there are CrossRef DOIs for
> datasets (thanks for the details, and reminding me how much that is in fact
> done), my main point was that _different metadata_ is associated with a
> CrossRef DOI and a DataCite DOI. Is that correct? Or when there is a
> CrossRef DOI associated with a dataset, is the metadata the same as if it
> had a DataCite DOI? (BTW I also knew CrossRef and DataCite are
> collaborating: kudos for that, of course! Ditto for ORCID and ISNI, though
> "talking" rather than "collaborating" may more accurately reflect the status
> of that. I think ISNI is going to be essential for organizational
> identification, as a complement to ORCID for contributors.)
>
>
>
> --Bill
>
>
>
> From: Carol Anne Meyer [mailto:cmeyer@crossref.org]
> Sent: Monday, December 08, 2014 11:32 PM
> To: Bill Kasdorf
> Cc: Dawson, Laura (Laura.Dawson@bowker.com)
> Subject: Re: FW: use case clarification - cross format annotations
>
>
>
> Hi Bill,
>
>
>
> Thanks very much for sharing this.
>
>
>
> Yep, you got it right--just a few notes to elaborate below:
>
>
>
> On Thu, Dec 4, 2014 at 3:45 AM, Bill Kasdorf <bkasdorf@apexcovantage.com>
> wrote:
>
> Hi, Laura and Carol—
>
>
>
> I don't think you two get the W3C OAWG e-mails, and I wanted you to see what
> I just sent. You both may have comments or corrections to what I wrote. Hope
> I didn't misrepresent anything from your point of view!
>
>
>
> --Bill
>
>
>
> From: Bill Kasdorf
> Sent: Thursday, December 04, 2014 3:31 AM
> To: Paolo Ciccarese; Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: RE: use case clarification - cross format annotations
>
>
>
> I just want to reinforce the importance of this issue. In fact from a use
> case POV I think there are two issues:
>
>
>
> --The same document referenced by multiple URIs.
>
> This CAN be, but is not always, handled by CrossRef with Multiple
> Resolution, and only when the documents with the different URIs are the
> same version, typically the version of record. In this case, one DOI has
> more than one URI associated with it. The service provides a user-choice
> popup. A specific URI can be accessed with a CrossRef DOI and a parameter
> to bypass the multiple resolution interface.
>
>
>
> --Synchronizing annotations to multiple formats of the same document (that
> is the same _version_ of the same document . . . which implies what I would
> consider a third use case, versions, which we are already addressing).
>
>
>
> And I also want to highlight this comment from Paolo:
>
>
>
>> When tools like Domeo and Annotopia see a document, the first thing they
>> do is capture available IDs. Domeo looks for DOIs, PMIDs, PMCIDs, PIIs
>> and so on. When sending the annotation to Annotopia, the bibliographic data
>> are sent as a description of the target document. This is done by reusing
>> existing vocabularies/ontologies.
>
>
>
> This is really essential for people to understand. I know many in this
> community are skeptical of IDs like the DOI that require implementation of
> support systems around them. But in the real world ;-) this is how this
> works.
>
>
>
> The way to think of it is this: identifiers are proxies for metadata.
>
>
>
> The systems associated with these IDs provide documented specifications for
> what _their_ metadata includes. And they usually also provide APIs for the
> retrieval of their stored metadata based on the identifier. So btw when
> Paolo refers to DOIs for a scientific or scholarly paper, he really means a
> "CrossRef DOI." A data set associated with that paper would have a different
> DOI (a "DataCite DOI") which would have entirely different metadata
> associated with it.
>
>
>
> So this is very close to being true; just to be precise, data sets
> associated with papers can have "CrossRef DOIs". The difference for CrossRef
> is really the community. If the publisher is hosting or maintaining the
> data, it may be easier for them to add dataset DOIs at CrossRef. And several
> significant databases have been assigning data set DOIs through CrossRef for
> years. An example is the Protein Data Bank. Another is the Organisation for
> Economic Co-operation and Development (OECD). In fact there are almost a
> million data sets from 1,100 databases with CrossRef DOIs. There are about 5
> million DOIs assigned to data sets at DataCite.
>
>
>
> CrossRef and DataCite have made a commitment to collaborate--for example,
> CrossRef's content negotiation APIs were extended to help with
> interoperability between the two registration agencies, and we have plans to
> work closely together going forward.
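To illustrate "identifiers as proxies for metadata": DOI content negotiation lets a client ask the DOI resolver for machine-readable metadata instead of the landing page, by sending an Accept header such as CSL JSON (application/vnd.citationstyles.csl+json). The sketch below only builds the request with Python's standard urllib and does not send it. The DOI 10.1234/p54 is the made-up example used earlier in this thread, so actually fetching it with urllib.request.urlopen would need a real, registered DOI and network access; the helper name is hypothetical.

```python
import urllib.request

# Citation Style Language JSON, one of the formats offered via DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiated metadata request for a DOI."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": CSL_JSON},
    )


req = doi_metadata_request("10.1234/p54")  # hypothetical DOI from the example above
print(req.full_url, req.get_header("Accept"))
# To fetch for a real DOI: urllib.request.urlopen(req).read() returns JSON bytes.
```

The same request works regardless of which registration agency issued the DOI, which is the kind of interoperability the CrossRef/DataCite collaboration aims at; the metadata fields returned will still differ by agency.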
>
>
>
> And the entertainment industry, which now also uses DOIs, obviously has
> entirely different metadata associated with those DOIs.
>
>
>
> A sidenote: because the CrossRef DOI is so ubiquitous in STM, people tend to
> think it has _all possible metadata_. Nope! ;-) They think they can get a
> contributor's e-mail address from CrossRef, but that's not in the CrossRef
> metadata. But guess what? It's probably available via the ORCID ID that
> should be available in the CrossRef metadata, which would send a system to a
> different server to retrieve information about that specific contributor
> (and a scientific paper can have scores of contributors).
>
>
> Yes, this is right. Though right now there are not a ton of ORCIDs in the
> CrossRef metadata, their number is growing and is expected to grow faster
> as publishers figure out how to get the right data from the right systems
> to CrossRef.
>
>
>
> Where I'm going with this is that it is WAY better to have these
> centralized, authoritative, ideally continually maintained repositories of
> _particular kinds of metadata with IDs associated with the metadata records_
> than to try to ship boatloads of metadata all over the place with individual
> documents. Thus: Why We Need Identifiers, and Why Identifiers Need Support
> Systems.
>
>
>
> Another example we've been looking at is institutional
> identifiers: candidates include Ringgold and ISNI's new organizational
> ID. We have a taxonomy of some funding institutions (and they have a Funder
> ID) as part of our FundRef funding data service.
>
>
>
> For a given community of users (scientists, librarians, scholars, data
> curators), getting a known ID like a CrossRef DOI or a DataCite DOI or an
> ORCID is just amazingly efficient. The metadata thus available may not be
> useful or relevant to users outside that sector, but for the users for whom
> that identifier and its support system were created, it saves the day.
>
>
>
> I realize that you may be thinking "well this is all very interesting but
> what does this mean for OA?" I guess my point is that these purpose-built
> identifiers and the systems associated with them will not go away. Lacking a
> canonical and ubiquitous "work identifier," this is the ecosystem that we
> are working with now.
>
>
>
> The demo is very interesting. Tangentially, it may be of interest that we
> have worked with Ubiquity on a few projects and they have become a
> sponsoring entity that agrees to fulfill CrossRef membership obligations
> (depositing DOIs and creating outbound reference links and paying the bills)
> on behalf of small publishers who may not have the resources to do so
> themselves.
>
>
>
> --Bill Kasdorf
>
>
>
> From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com]
> Sent: Wednesday, December 03, 2014 8:27 AM
> To: Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: Re: use case clarification - cross format annotations
>
>
>
> Hi Frederick,
>
> comments in line
>
>
>
> On Wed, Dec 3, 2014 at 6:52 AM, Frederick HIrsch <hirsch@fjhirsch.com>
> wrote:
>
> Paolo
>
> Thanks for providing a use case on the wiki -
> https://www.w3.org/annotation/wiki/Cross-formats_Annotations
>
> I think what you are saying is that the same document can be provided in
> different formats (e.g. HTML or PDF) at different portals (e.g. PubMed
> Central vs. an author's personal web site, etc.) - I guess different portals
> could also offer the same format with different URLs.
>
>
>
> Correct. This is a very common scenario for scientific papers, one of the
> main resources I annotate.
>
>
>
>
> The use case also says that sometimes these various targets should be
> treated as the same despite having different URLs and sometimes should be
> treated as different, depending on user choice.
>
>
>
> Correct. For instance if I annotate with Domeo an HTML version, I want to
> see the same annotations on my PDF version through the Utopia client. This
> is in fact already implemented through the Annotopia server:
> https://www.youtube.com/watch?v=OrNX6Sfg_RQ
>
>
>
>
> Thus I have some questions:
>
> - how can a system know that two documents are different representations of
> the same document when they have different URLs?
>
>
>
> When tools like Domeo and Annotopia see a document, the first thing they do
> is capture available IDs. Domeo looks for DOIs, PMIDs, PMCIDs, PIIs and
> so on. When sending the annotation to Annotopia, the bibliographic data are
> sent as a description of the target document. This is done by reusing
> existing vocabularies/ontologies.
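One way a client could capture IDs from an HTML page is sketched below with Python's built-in html.parser. It assumes the page exposes identifiers in Highwire Press style meta tags (citation_doi, citation_pmid), which many publisher sites use; this is an illustration, not how Domeo actually does it, and the example page, identifier values and function names are made up.

```python
from html.parser import HTMLParser

# Assumed mapping from <meta name="..."> values to identifier types.
META_IDS = {
    "citation_doi": "doi",
    "citation_pmid": "pmid",
}


class IdCollector(HTMLParser):
    """Collects the first value seen for each known identifier meta tag."""

    def __init__(self):
        super().__init__()
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = META_IDS.get((a.get("name") or "").lower())
        if key and a.get("content"):
            self.ids.setdefault(key, a["content"].strip())


def capture_ids(html: str) -> dict:
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


page = """<html><head>
<meta name="citation_doi" content="10.1234/p54">
<meta name="citation_pmid" content="12345678">
</head><body>...</body></html>"""
print(capture_ids(page))
```

The captured IDs are what would then be sent along as the bibliographic description of the annotation target.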
>
>
>
>
> - why would an end user want only to provide annotations for a specific
> representation of the same target and not have it apply to all versions?
>
>
>
> It depends on the task. If the task is to compare output formats, you
> might want to do that. Also, different formats might differ in layout,
> and the annotation might be related to that.
> In general, it is important to know exactly which variant motivated the
> annotation so that the process can be fully understood.
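A sketch of how a server could honour both behaviours: each annotation keeps the exact variant it was made on plus the shared work identifier captured from the page, and the query decides whether to match on one or the other. The Annotation record and annotations_for function are hypothetical, not the Web Annotation data model or Annotopia's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    body: str
    target_url: str  # the exact variant annotated, e.g. the HTML or the PDF
    work_id: str     # shared identifier captured from the page, e.g. a DOI


def annotations_for(url, work_id, store, exact=False):
    """Return annotations for the document being viewed.

    exact=True  -> only annotations made on this very URL/format
    exact=False -> annotations on any variant sharing the same work_id
    """
    if exact:
        return [a for a in store if a.target_url == url]
    return [a for a in store if a.work_id == work_id]


store = [
    Annotation("Nice figure", "http://example.com/paper54.html", "10.1234/p54"),
    Annotation("Broken layout here", "http://example.com/paper54.pdf", "10.1234/p54"),
]
pdf = "http://example.com/paper54.pdf"
print(len(annotations_for(pdf, "10.1234/p54", store)))              # 2
print(len(annotations_for(pdf, "10.1234/p54", store, exact=True)))  # 1
```

Because the exact target is always stored, switching between the two views loses nothing about which variant motivated each annotation.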
>
>
>
>
> - should we simplify the use case to how to share annotations for a target
> that has multiple instances with different URLs?
>
>
>
> I guess so. Keeping in mind that one URL can refer to HTML and one to PDF?
>
>
>
>
> It seems the big issue here is that different URLs might refer to the same
> target, and how to handle that.
>
>
>
> Yup. In my case I incorporate bibliographic data in the annotation.
> Alternatively, something else needs to do the job of finding that out.
>
>
>
>
> I know I’m jumping ahead, but thought I’d ask now.
>
>
>
> Good you asked :)
>
>
>
>
> regards, Frederick
>
> Frederick Hirsch
> @fjhirsch
>
>
>
>
>
> --
>
> Dr. Paolo Ciccarese
> Assistant Professor of Neurology, Harvard Medical School
> Assistant in Neuroscience, Massachusetts General Hospital
> Senior Information Scientist, MGH Biomedical Informatics Core
>
> ORCID: http://orcid.org/0000-0002-5156-2703
>
>
>
>
>
>
>
> --
>
> Carol Anne Meyer
>
> Business Development and Marketing
>
> CrossRef
>
> 50 Salem Street
>
> Lynnfield, MA 01940
>
> + 1 781 629 9782
>
> International +1 781 295 0072 x23
>
> @meyercarol
>
>
>
> www.crossref.org
>
> @CrossRefNews



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Received on Thursday, 11 December 2014 16:59:14 UTC