- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Thu, 11 Dec 2014 16:58:26 +0000
- To: Bill Kasdorf <bkasdorf@apexcovantage.com>
- Cc: Robert Bolick <robert.bolick@gmail.com>, Carol Anne Meyer <cmeyer@crossref.org>, Web Annotation <public-annotation@w3.org>, "Dawson, Laura (Laura.Dawson@bowker.com)" <Laura.Dawson@bowker.com>
Sometimes you don't have a common DOI - but you still know there is a common "abstract thing" the two formats share.

You can use existing PROV and DC mechanisms to relate different formats of "the same thing".

http://www.w3.org/TR/prov-o/#alternateOf is exactly for this purpose (e.g. relate PDF to HTML sibling) - while http://www.w3.org/TR/prov-o/#specializationOf can be used to relate a DOI and HTML.

    <http://example.com/paper54.html> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:alternateOf <http://example.com/paper54.html> .

http://purl.org/dc/terms/isFormatOf can be used similarly: "A related resource that is substantially the same as the described resource, but in another format."

    <http://example.com/paper54.pdf> dcterms:isFormatOf <http://example.com/paper54.html> .

The not-quite-inverse http://purl.org/dc/terms/hasFormat on the other hand implies that the subject pre-existed, e.g. one converted to another. Here the PDF format was made from the HTML format:

    <http://example.com/paper54.html> dcterms:hasFormat <http://example.com/paper54.pdf> .

But if you really want to also state such a conversion provenance I believe it's better to use prov:alternateOf in addition to a derivation relation like pav:importedFrom http://purl.org/pav/html#http://purl.org/pav/importedFrom :

    <http://example.com/paper54.pdf> pav:importedFrom <http://example.com/paper54.html> .

(This is not usually true as a publisher will commonly generate both HTML and PDF from a common, internal XML format - meaning you are left with just alternateOf and possibly specializationOf the DOI)

On 9 December 2014 at 18:29, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> Excellent. The STM and standards communities are obvious beneficiaries of
> the Annotations work and would likely be early adopters compared to others
> within the broader publishing space.
> > > > From: Robert Bolick [mailto:robert.bolick@gmail.com] > Sent: Tuesday, December 09, 2014 1:25 PM > To: Bill Kasdorf > Cc: Carol Anne Meyer; Web Annotation; Dawson, Laura > (Laura.Dawson@bowker.com) > Subject: Re: use case clarification - cross format annotations > > > > And standards developing and publishing bodies are collaborating with each > other and CrossRef as I write, Bill. I'm hopeful that this will facilitate > earlier adoption of this group's deliveries! > > Sent from my iPhone > > > On 9 Dec 2014, at 15:40, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote: > > Thanks, Carol! > > > > Subsequent discussion on the list—much of which has gone beyond my technical > knowledge—has focused on the ability of server-side processing via APIs to > resolve some of these ambiguities. Which I was delighted to find: I was > worried that the fact that a DOI could not be relied upon to resolve > directly to one and only one version and instance of a document, and that > what it resolves to can change over time, would make it impossible to use > the way the Open Annotations WG needs annotations to work. Not only is that > not necessarily the case, Paolo and others are actually using DOIs now for > this purpose, with mechanisms to deal with those potential ambiguities. > > > > Your clarifications are all helpful and probably quite relevant in that > context, so I'm copying the WG. > > > > One other point: although I realized that there are CrossRef DOIs for > datasets (thanks for the details, and reminding me how much that is in fact > done), my main point was that _different metadata_ is associated with a > CrossRef DOI and a DataCite DOI. Is that correct? Or when there is a > CrossRef DOI associated with a dataset, is the metadata the same as if it > had a DataCite DOI? (BTW I also knew CrossRef and DataCite are > collaborating: kudos for that, of course! 
Ditto for ORCID and ISNI, though > "talking" rather than "collaborating" may more accurately reflect the status > of that. I think ISNI is going to be essential for organizational > identification, as a complement to ORCID for contributors.) > > > > --Bill > > > > From: Carol Anne Meyer [mailto:cmeyer@crossref.org] > Sent: Monday, December 08, 2014 11:32 PM > To: Bill Kasdorf > Cc: Dawson, Laura (Laura.Dawson@bowker.com) > Subject: Re: FW: use case clarification - cross format annotations > > > > Hi Bill, > > > > Thanks very much for sharing this. > > > > Yep, you got it right--just a few notes to elaborate below: > > > > On Thu, Dec 4, 2014 at 3:45 AM, Bill Kasdorf <bkasdorf@apexcovantage.com> > wrote: > > Hi, Laura and Carol-- > > > > I don't think you two get the W3C OAWG e-mails, and I wanted you to see what > I just sent. You both may have comments or corrections to what I wrote. Hope > I didn't misrepresent anything from your point of view! > > > > --Bill > > > > From: Bill Kasdorf > Sent: Thursday, December 04, 2014 3:31 AM > To: Paolo Ciccarese; Frederick HIrsch > Cc: W3C Public Annotation List > Subject: RE: use case clarification - cross format annotations > > > > I just want to reinforce the importance of this issue. In fact from a use > case POV I think there are two issues: > > > > --The same document referenced by multiple URIs. > > This CAN be, but is not always, handled by CrossRef with Multiple Resolution, but > only when the documents with the different URIs are the same > version--typically the version of record. In this case, one DOI has more > than one URI associated with it. The service provides a user-choice popup. A > specific URI can be accessed with a CrossRef DOI and parameter to by-pass > the multiple resolution interface. > > > > --Synchronizing annotations to multiple formats of the same document (that > is the same _version_ of the same document . . . 
which implies what I would > consider a third use case, versions, which we are already addressing). > > > > And I also want to highlight this comment from Paolo: > > > >> When tools like Domeo and Annotopia see a document, the first thing they >> do is capture available IDs. Domeo looks up for DOIs, PMIDs, PMCIDs, PIIs >> and so on. When sending the annotation to Annotopia, the bibliographic data >> are sent as description of the target document. This is done by reusing >> existing vocabularies/ontologies. > > > > This is really essential for people to understand. I know many in this > community are skeptical of IDs like the DOI that require implementation of > support systems around them. But in the real world ;-) this is how this > works. > > > > The way to think of it is this: identifiers are proxies for metadata. > > > > The systems associated with these IDs provide documented specifications for > what _their_ metadata includes. And they usually also provide APIs for the > retrieval of their stored metadata based on the identifier. So btw when > Paolo refers to DOIs for a scientific or scholarly paper, he really means a > "CrossRef DOI." A data set associated with that paper would have a different > DOI (a "DataCite DOI") which would have entirely different metadata > associated with it. > > > > So this is very close to being true; just to be precise, data sets > associated with papers can have "CrossRef DOIs". The difference for CrossRef > is really the community. If the publisher is hosting or maintaining the > data, it may be easier for them to add dataset DOIs at CrossRef. And several > significant databases have been assigning data set DOIs through CrossRef for > years. An example is the Protein Data Bank. Another is the Organisation for > Economic Co-operation and Development (OECD). In fact there are almost a > million data sets from 1100 databases with CrossRef DOIs. There are about 5 > million DOIs assigned to data sets at DataCite. 
> > > > CrossRef and DataCite have made a commitment to collaborate--for example, > CrossRef's content negotiation APIs were extended to help with > interoperability between the two registration agencies, and we have plans to > work closely together going forward. > > > > And the entertainment industry, which now also uses DOIs, obviously has > entirely different metadata associated with those DOIs. > > > > A sidenote: because the CrossRef DOI is so ubiquitous in STM, people tend to > think it has _all possible metadata_. Nope! ;-) They think they can get an > e-mail of a contributor from CrossRef, but that's not in the CrossRef > metadata. But guess what? It's probably available via the ORCID ID that > should be available in the CrossRef metadata, which would send a system to a > different server to retrieve information about that specific contributor > (and a scientific paper can have scores of contributors). > > > Yes this is right. Though right now there are not a ton of ORCIDs in the > CrossRef metadata, they are growing and expected to do so faster as > publishers figure out how to get the right data from the right systems to > CrossRef. > > > > Where I'm going with this is that it is WAY better to have these > centralized, authoritative, ideally continually maintained repositories of > _particular kinds of metadata with IDs associated with the metadata records_ > than to try to ship boatloads of metadata all over the place with individual > documents. Thus: Why We Need Identifiers, and Why Identifiers Need Support > Systems. > > > > Another example we've been looking at is institutional > identifiers--candidates include Ringgold and the ISBN's new organizational > ID. We have a taxonomy of some funding institutions (and they have a Funder > ID) as part of our FundRef funding data service. 
> > > > For a given community of users (scientists, librarians, scholars, data > curators), getting a known ID like a CrossRef DOI or a DataCite DOI or an > ORCID is just amazingly efficient. The metadata thus available may not be > useful or relevant to users outside that sector, but for the users for whom > that identifier and its support system were created, it saves the day. > > > > I realize that you may be thinking "well this is all very interesting but > what does this mean for OA?" I guess my point is that these purpose-built > identifiers and the systems associated with them will not go away. Lacking a > canonical and ubiquitous "work identifier," this is the ecosystem that we > are working with now. > > > > The demo is very interesting. Tangentially, It may be of interest that we > have worked with Ubiquity on a few projects and they have become a > sponsoring entity that agrees to fulfill CrossRef membership obligations > (depositing DOIs and creating outbound reference links and paying the bills) > on behalf of small publishers who may not have the resources to do so > themselves. > > > > --Bill Kasdorf > > > > From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com] > Sent: Wednesday, December 03, 2014 8:27 AM > To: Frederick HIrsch > Cc: W3C Public Annotation List > Subject: Re: use case clarification - cross format annotations > > > > Hi Frederick, > > comments in line > > > > On Wed, Dec 3, 2014 at 6:52 AM, Frederick HIrsch <hirsch@fjhirsch.com> > wrote: > > Paolo > > Thanks for providing a use case on the wiki - > https://www.w3.org/annotation/wiki/Cross-formats_Annotations > > I think what you are saying is that the same document can be provided in > different formats (e.g. HTML or PDF) at different portals (e.g. PubMed > Central vs authors personal web site etc) - I guess different portals could > also offer the same format with different URLs as well. > > > > Correct. 
This is a very common scenario for scientific papers, one of the > main resources I annotate. > > > > > The use case also says that sometimes these various targets should be > treated as the same despite having different URLs and sometimes should be > treated as different, depending on user choice. > > > > Correct. For instance if I annotate with Domeo an HTML version, I want to > see the same annotations on my PDF version through the Utopia client. This > is in fact already implemented through the Annotopia server: > https://www.youtube.com/watch?v=OrNX6Sfg_RQ > > > > > Thus I have questions > > - how can a system know that two documents are different representations of > the same document when they have different URLs? > > > > When tools like Domeo and Annotopia see a document, the first thing they do > is capture available IDs. Domeo looks up for DOIs, PMIDs, PMCIDs, PIIs and > so on. When sending the annotation to Annotopia, the bibliographic data are > sent as description of the target document. This is done by reusing existing > vocabularies/ontologies. > > > > > - why would an end-user want only to provide annotations for a specific > representation of the same target and not have it apply to all versions? > > > > It depends on what the task is. If the task is to compare output formats you > might want to do that. Also different formats might be different in layout > and the annotation might be related to that. > In general, it is important to know exactly which variant motivated the > annotation so that the process can be fully understood. > > > > > - should we simplify the use case to how to share annotations for a target > that has multiple instances with different URLs. > > > > I guess so. Keeping in mind that one URL can refer to HTML and one to PDF? > > > > > It seems the big issue here is that different URLs might refer to the same > target, and how to handle that. > > > > Yup. In my case I incorporate bibliographic data in the annotation. 
Alternatively, > something else needs to do that job of finding that out. > > > > > I know I’m jumping ahead, but thought I’d ask now. > > > > Good you asked :) > > > > > regards, Frederick > > Frederick Hirsch > @fjhirsch > > > > > > -- > > Dr. Paolo Ciccarese > Assistant Professor of Neurology, Harvard Medical School > Assistant in Neuroscience, Massachusetts General Hospital > Senior Information Scientist, MGH Biomedical Informatics Core > > ORCID: http://orcid.org/0000-0002-5156-2703 > > > CONFIDENTIALITY NOTICE: This message is intended only for the addressee(s), > may contain information that is considered > to be sensitive or confidential and may not be forwarded or disclosed to any > other party without the permission of the sender. > If you have received this message in error, please notify the sender > immediately. > > > > > > -- > > Carol Anne Meyer > > Business Development and Marketing > > CrossRef > > 50 Salem Street > > Lynnfield, MA 01940 > > + 1 781 629 9782 > > International +1 781 295 0072 x23 > > @meyercarol > > > > www.crossref.org > > @CrossRefNews

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718
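
A minimal sketch of how a client might use the PROV and DC statements given at the top of this message to decide which URLs should share annotations. It is only an illustration under stated assumptions: the statements are held as plain Python tuples rather than in an RDF library, the helper name `alternates` is hypothetical, and dcterms:isFormatOf is followed in both directions in the same way as prov:alternateOf. Following prov:specializationOf in both directions relies on the PROV constraints treating a specialization as an alternate as well, with alternateOf being symmetric and transitive.

```python
from collections import defaultdict, deque

PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"

# Predicates treated as "same abstract thing" links in this sketch.
# Including dcterms:isFormatOf here is an assumption, not a PROV rule.
LINKING = {
    PROV + "specializationOf",
    PROV + "alternateOf",
    DCTERMS + "isFormatOf",
}

# The example statements from the message as (subject, predicate, object).
TRIPLES = [
    ("http://example.com/paper54.html", PROV + "specializationOf",
     "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "specializationOf",
     "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "alternateOf",
     "http://example.com/paper54.html"),
    ("http://example.com/paper54.pdf", DCTERMS + "isFormatOf",
     "http://example.com/paper54.html"),
]


def alternates(url, triples):
    """Return every other resource reachable from `url` via linking predicates.

    Hypothetical helper: builds an undirected graph from the linking
    statements and walks it breadth-first.
    """
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in LINKING:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    seen = {url}
    queue = deque([url])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen - {url}


if __name__ == "__main__":
    for other in sorted(alternates("http://example.com/paper54.html", TRIPLES)):
        print(other)
```

With only the two prov:specializationOf statements the HTML and PDF are still found to be related, through the shared DOI, which matches the case where no direct alternateOf link has been stated.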
Received on Thursday, 11 December 2014 16:59:14 UTC