- From: Bill Kasdorf <bkasdorf@apexcovantage.com>
- Date: Thu, 11 Dec 2014 20:33:17 +0000
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- CC: Robert Bolick <robert.bolick@gmail.com>, Carol Anne Meyer <cmeyer@crossref.org>, Web Annotation <public-annotation@w3.org>, "Dawson, Laura (Laura.Dawson@bowker.com)" <Laura.Dawson@bowker.com>
Thanks, this is a really nice clear explanation and example. My question, though, is _who_ does this? Where do these expressions "live" in the ecosystem and the workflow?

-----Original Message-----
From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of Stian Soiland-Reyes
Sent: Thursday, December 11, 2014 11:58 AM
To: Bill Kasdorf
Cc: Robert Bolick; Carol Anne Meyer; Web Annotation; Dawson, Laura (Laura.Dawson@bowker.com)
Subject: Re: use case clarification - cross format annotations

Sometimes you don't have a common DOI - but you still know there is a common "abstract thing" the two formats share.

You can use existing PROV and DC mechanisms to relate different formats of "the same thing".

http://www.w3.org/TR/prov-o/#alternateOf is exactly for this purpose (e.g. relate PDF to HTML sibling) - where http://www.w3.org/TR/prov-o/#specializationOf can be used to relate a DOI and HTML.

    <http://example.com/paper54.html> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:specializationOf <http://dx.doi.org/10.1234/p54> .
    <http://example.com/paper54.pdf> prov:alternateOf <http://example.com/paper54.html> .

http://purl.org/dc/terms/isFormatOf can be used similarly: "A related resource that is substantially the same as the described resource, but in another format."

    <http://example.com/paper54.pdf> dcterms:isFormatOf <http://example.com/paper54.html> .

The not-quite-inverse http://purl.org/dc/terms/hasFormat on the other hand implies that the subject pre-existed, e.g. one converted to another. Here the PDF format was made from the HTML format:

    <http://example.com/paper54.html> dcterms:hasFormat <http://example.com/paper54.pdf> .
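To illustrate how a consumer might use statements like these, here is a minimal Python sketch. It assumes the statements are already available as plain (subject, predicate, object) tuples rather than parsed from RDF, and the function name `same_work_groups` is hypothetical. It also treats every linking predicate as a symmetric "belongs together" hint, which is a simplification: prov:specializationOf and dcterms:hasFormat are directional in their own definitions.

```python
from collections import defaultdict

PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"

# The example statements from this message as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.com/paper54.html", PROV + "specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "specializationOf", "http://dx.doi.org/10.1234/p54"),
    ("http://example.com/paper54.pdf", PROV + "alternateOf", "http://example.com/paper54.html"),
    ("http://example.com/paper54.pdf", DCTERMS + "isFormatOf", "http://example.com/paper54.html"),
]

# Assumption: these predicates are all read as "the two resources present the same thing".
LINKING = {
    PROV + "specializationOf",
    PROV + "alternateOf",
    DCTERMS + "isFormatOf",
    DCTERMS + "hasFormat",
}


def same_work_groups(triples):
    """Group URIs connected by any linking predicate (simple union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for subject, predicate, obj in triples:
        if predicate in LINKING:
            parent[find(subject)] = find(obj)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [sorted(group) for group in groups.values()]


if __name__ == "__main__":
    for group in same_work_groups(TRIPLES):
        print(group)
```

With the example statements, the DOI, the HTML and the PDF end up in one group, so an annotation store could offer annotations made on any of the three when a user opens one of the others.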
But if you really want to also state such a conversion provenance I believe it's better to use prov:alternateOf in addition to a derivation relation like pav:importedFrom http://purl.org/pav/html#http://purl.org/pav/importedFrom :

    <http://example.com/paper54.pdf> pav:importedFrom <http://example.com/paper54.html>

(This is not usually true as a publisher will commonly generate both HTML and PDF from a common, internal XML format - meaning you are left with just alternateOf and possibly specializationOf the DOI)

On 9 December 2014 at 18:29, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> Excellent. The STM and standards communities are obvious beneficiaries
> of the Annotations work and would likely be early adopters compared to
> others within the broader publishing space.
>
> From: Robert Bolick [mailto:robert.bolick@gmail.com]
> Sent: Tuesday, December 09, 2014 1:25 PM
> To: Bill Kasdorf
> Cc: Carol Anne Meyer; Web Annotation; Dawson, Laura (Laura.Dawson@bowker.com)
> Subject: Re: use case clarification - cross format annotations
>
> And standards developing and publishing bodies are collaborating with
> each other and CrossRef as I write, Bill. I'm hopeful that this will
> facilitate earlier adoption of this group's deliveries!
>
> Sent from my iPhone
>
> On 9 Dec 2014, at 15:40, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>
> Thanks, Carol!
>
> Subsequent discussion on the list (much of which has gone beyond my
> technical knowledge) has focused on the ability of server-side
> processing via APIs to resolve some of these ambiguities. Which I was
> delighted to find: I was worried that the fact that a DOI could not be
> relied upon to resolve directly to one and only one version and
> instance of a document, and that what it resolves to can change over
> time, would make it impossible to use the way the Open Annotations WG
> needs annotations to work.
> Not only is that not necessarily the case, Paolo and others are
> actually using DOIs now for this purpose, with mechanisms to deal with
> those potential ambiguities.
>
> Your clarifications are all helpful and probably quite relevant in
> that context, so I'm copying the WG.
>
> One other point: although I realized that there are CrossRef DOIs for
> datasets (thanks for the details, and reminding me how much that is in
> fact done), my main point was that _different metadata_ is associated
> with a CrossRef DOI and a DataCite DOI. Is that correct? Or when there
> is a CrossRef DOI associated with a dataset, is the metadata the same
> as if it had a DataCite DOI? (BTW I also knew CrossRef and DataCite
> are collaborating: kudos for that, of course! Ditto for ORCID and ISNI,
> though "talking" rather than "collaborating" may more accurately
> reflect the status of that. I think ISNI is going to be essential for
> organizational identification, as a complement to ORCID for
> contributors.)
>
> --Bill
>
> From: Carol Anne Meyer [mailto:cmeyer@crossref.org]
> Sent: Monday, December 08, 2014 11:32 PM
> To: Bill Kasdorf
> Cc: Dawson, Laura (Laura.Dawson@bowker.com)
> Subject: Re: FW: use case clarification - cross format annotations
>
> Hi Bill,
>
> Thanks very much for sharing this.
>
> Yep, you got it right--just a few notes to elaborate below:
>
> On Thu, Dec 4, 2014 at 3:45 AM, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>
> Hi, Laura and Carol,
>
> I don't think you two get the W3C OAWG e-mails, and I wanted you to
> see what I just sent. You both may have comments or corrections to
> what I wrote. Hope I didn't misrepresent anything from your point of view!
>
> --Bill
>
> From: Bill Kasdorf
> Sent: Thursday, December 04, 2014 3:31 AM
> To: Paolo Ciccarese; Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: RE: use case clarification - cross format annotations
>
> I just want to reinforce the importance of this issue. In fact from a
> use case POV I think there are two issues:
>
> --The same document referenced by multiple URIs.
>
> This CAN be, but is not always, handled by CrossRef with Multiple
> Resolution, but only when the documents with the different URIs are
> the same version--typically the version of record. In this case, one
> DOI has more than one URI associated with it. The service provides a
> user-choice popup. A specific URI can be accessed with a CrossRef DOI
> and a parameter to bypass the multiple resolution interface.
>
> --Synchronizing annotations to multiple formats of the same document
> (that is the same _version_ of the same document . . . which implies
> what I would consider a third use case, versions, which we are already
> addressing).
>
> And I also want to highlight this comment from Paolo:
>
>> When tools like Domeo and Annotopia see a document, the first thing
>> they do is capture available IDs. Domeo looks up DOIs, PMIDs,
>> PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the
>> bibliographic data are sent as description of the target document.
>> This is done by reusing existing vocabularies/ontologies.
>
> This is really essential for people to understand. I know many in this
> community are skeptical of IDs like the DOI that require
> implementation of support systems around them. But in the real world
> ;-) this is how this works.
>
> The way to think of it is this: identifiers are proxies for metadata.
>
> The systems associated with these IDs provide documented
> specifications for what _their_ metadata includes.
> And they usually also provide APIs for the retrieval of their stored
> metadata based on the identifier. So btw when Paolo refers to DOIs for
> a scientific or scholarly paper, he really means a "CrossRef DOI." A
> data set associated with that paper would have a different DOI (a
> "DataCite DOI") which would have entirely different metadata
> associated with it.
>
> So this is very close to being true; just to be precise, data sets
> associated with papers can have "CrossRef DOIs"; the difference for
> CrossRef is really the community. If the publisher is hosting or
> maintaining the data, it may be easier for them to add dataset DOIs at
> CrossRef. And several significant databases have been assigning data
> set DOIs through CrossRef for years. An example is the Protein Data
> Bank. Another is the Organisation for Economic Co-operation and
> Development (OECD). In fact there are almost a million data sets from
> 1100 databases with CrossRef DOIs. There are about 5 million DOIs
> assigned to data sets at DataCite.
>
> CrossRef and DataCite have made a commitment to collaborate--for
> example, CrossRef's content negotiation APIs were extended to help
> with interoperability between the two registration agencies, and we
> have plans to work closely together going forward.
>
> And the entertainment industry, which now also uses DOIs, obviously
> has entirely different metadata associated with those DOIs.
>
> A sidenote: because the CrossRef DOI is so ubiquitous in STM, people
> tend to think it has _all possible metadata_. Nope! ;-) They think
> they can get an e-mail of a contributor from CrossRef, but that's not
> in the CrossRef metadata. But guess what? It's probably available via
> the ORCID ID that should be available in the CrossRef metadata, which
> would send a system to a different server to retrieve information
> about that specific contributor (and a scientific paper can have
> scores of contributors).
>
> Yes this is right.
> Though right now there are not a ton of ORCIDs in the CrossRef
> metadata, they are growing and expected to do so faster as publishers
> figure out how to get the right data from the right systems to
> CrossRef.
>
> Where I'm going with this is that it is WAY better to have these
> centralized, authoritative, ideally continually maintained
> repositories of _particular kinds of metadata with IDs associated with
> the metadata records_ than to try to ship boatloads of metadata all
> over the place with individual documents. Thus: Why We Need
> Identifiers, and Why Identifiers Need Support Systems.
>
> Another example we've been looking at is institutional
> identifiers--candidates include Ringgold and the ISBN's new
> organizational ID. We have a taxonomy of some funding institutions
> (and they have a Funder ID) as part of our FundRef funding data
> service.
>
> For a given community of users (scientists, librarians, scholars, data
> curators), getting a known ID like a CrossRef DOI or a DataCite DOI or
> an ORCID is just amazingly efficient. The metadata thus available may
> not be useful or relevant to users outside that sector, but for the
> users for whom that identifier and its support system were created, it
> saves the day.
>
> I realize that you may be thinking "well this is all very interesting
> but what does this mean for OA?" I guess my point is that these
> purpose-built identifiers and the systems associated with them will
> not go away. Lacking a canonical and ubiquitous "work identifier,"
> this is the ecosystem that we are working with now.
>
> The demo is very interesting. Tangentially, it may be of interest that
> we have worked with Ubiquity on a few projects and they have become a
> sponsoring entity that agrees to fulfill CrossRef membership
> obligations (depositing DOIs and creating outbound reference links and
> paying the bills) on behalf of small publishers who may not have the
> resources to do so themselves.
>
> --Bill Kasdorf
>
> From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com]
> Sent: Wednesday, December 03, 2014 8:27 AM
> To: Frederick HIrsch
> Cc: W3C Public Annotation List
> Subject: Re: use case clarification - cross format annotations
>
> Hi Frederick,
>
> comments in line
>
> On Wed, Dec 3, 2014 at 6:52 AM, Frederick HIrsch <hirsch@fjhirsch.com> wrote:
>
> Paolo
>
> Thanks for providing a use case on the wiki -
> https://www.w3.org/annotation/wiki/Cross-formats_Annotations
>
> I think what you are saying is that the same document can be provided
> in different formats (e.g. HTML or PDF) at different portals (e.g.
> PubMed Central vs author's personal web site etc) - I guess different
> portals could also offer the same format with different URLs as well.
>
> Correct. This is a very common scenario for scientific papers, one of
> the main resources I annotate.
>
> The use case also says that sometimes these various targets should be
> treated as the same despite having different URLs and sometimes should
> be treated as different, depending on user choice.
>
> Correct. For instance if I annotate with Domeo an HTML version, I want
> to see the same annotations on my PDF version through the Utopia
> client. This is in fact already implemented through the Annotopia server:
> https://www.youtube.com/watch?v=OrNX6Sfg_RQ
>
> Thus I have questions
>
> - how can a system know that two documents are different
> representations of the same document when they have different URLs?
>
> When tools like Domeo and Annotopia see a document, the first thing
> they do is capture available IDs. Domeo looks up DOIs, PMIDs,
> PMCIDs, PIIs and so on. When sending the annotation to Annotopia, the
> bibliographic data are sent as description of the target document.
> This is done by reusing existing vocabularies/ontologies.
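The identifier-capture step described in that answer can be sketched roughly as follows. This is only an illustration in Python, not how Domeo or Annotopia actually work: the regular expressions are deliberately simplified, and the function names `capture_ids` and `same_document` are hypothetical.

```python
import re

# Simplified patterns (assumptions for illustration); real identifier
# detection has to cope with many more layouts and edge cases.
PATTERNS = {
    "doi": re.compile(r'\b10\.\d{4,9}/[^\s"<>]+'),
    "pmcid": re.compile(r"\bPMC\d+\b"),
    "pmid": re.compile(r"\bPMID:\s*(\d+)\b"),
}


def capture_ids(text):
    """Return the set of (kind, value) identifiers found in a document's text."""
    ids = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern has one, else the whole match.
            value = match.group(match.lastindex or 0)
            # DOIs are case-insensitive, so normalise case; drop trailing punctuation.
            ids.add((kind, value.rstrip(".,;)").lower()))
    return ids


def same_document(ids_a, ids_b):
    """Treat two targets as the same document if they share any identifier."""
    return bool(ids_a & ids_b)


if __name__ == "__main__":
    html_text = "Paper 54. doi:10.1234/p54 PMCID: PMC1234567"
    pdf_text = "Available at http://dx.doi.org/10.1234/P54."
    print(same_document(capture_ids(html_text), capture_ids(pdf_text)))
```

Matching on any shared identifier is the loosest possible rule; a real service would presumably also record which variant was annotated, since the thread notes that the specific format can matter.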
>
> - why would an end-user want only to provide annotations for a specific
> representation of the same target and not have it apply to all versions?
>
> It depends on what the task is. If the task is to compare output
> formats you might want to do that. Also different formats might be
> different in layout and the annotation might be related to that.
> In general, it is important to know exactly which variant motivated
> the annotation so that the process can be fully understood.
>
> - should we simplify the use case to how to share annotations for a
> target that has multiple instances with different URLs.
>
> I guess so. Keeping in mind that one URL can refer to HTML and one to PDF?
>
> It seems the big issue here is that different URLs might refer to the
> same target, and how to handle that.
>
> Yup. In my case I incorporate bibliographic data in the annotation.
> Alternatively, something else needs to do that job of finding that out.
>
> I know I’m jumping ahead, but thought I’d ask now.
>
> Good you asked :)
>
> regards, Frederick
>
> Frederick Hirsch
> @fjhirsch
>
> --
>
> Dr. Paolo Ciccarese
> Assistant Professor of Neurology, Harvard Medical School
> Assistant in Neuroscience, Massachusetts General Hospital
> Senior Information Scientist, MGH Biomedical Informatics Core
>
> ORCID: http://orcid.org/0000-0002-5156-2703
>
> CONFIDENTIALITY NOTICE: This message is intended only for the
> addressee(s), may contain information that is considered to be
> sensitive or confidential and may not be forwarded or disclosed to any
> other party without the permission of the sender.
> If you have received this message in error, please notify the sender
> immediately.
>
> --
>
> Carol Anne Meyer
> Business Development and Marketing
> CrossRef
> 50 Salem Street
> Lynnfield, MA 01940
> +1 781 629 9782
> International +1 781 295 0072 x23
> @meyercarol
>
> www.crossref.org
> @CrossRefNews

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718
Received on Thursday, 11 December 2014 20:33:48 UTC