RE: use case clarification - cross format annotations from Bill Kasdorf on 2014-12-05 (public-annotation@w3.org from December 2014)

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Fri, 5 Dec 2014 09:32:16 +0000
To: Robert Bolick <robert.bolick@gmail.com>
CC: Nick Stenning <nick@whiteink.com>, "public-annotation@w3.org" <public-annotation@w3.org>
Message-ID: <8f50257e4dcc45a78fb8943b3246bde7@CO2PR06MB572.namprd06.prod.outlook.com>
Thanks for this great perspective from the standards community, Bob!

To underscore one of your points: in that realm, pointing to a date-specific version is critical. Many standards are adopted by governmental bodies and in effect written into law--legislation, regulation, ordinance, etc. at various levels from local to national. When disputes arise--e.g., somebody is suing a builder for faulty construction leading to a bridge or building collapse (quite a big deal)--the liability is based on whether that builder conformed to the standard _in force at the time of construction_. So they can't just look up the current standard, they have to be able to roll back the standard to virtually any previous version.
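
A minimal sketch of that rollback, assuming a hypothetical edition history for a single standard (the dates and labels below are invented for illustration and do not describe any real standard). It picks the edition that was in force on a given date:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical edition history: (date the edition took effect, label).
# Must be sorted by effective date.
EDITIONS = [
    (date(1998, 3, 1), "1998 edition"),
    (date(2005, 7, 15), "2005 edition"),
    (date(2012, 1, 1), "2012 edition"),
]

def edition_in_force(editions, on):
    """Return the latest edition effective on or before `on`, or None."""
    effective_dates = [effective for effective, _ in editions]
    index = bisect_right(effective_dates, on)
    if index == 0:
        return None  # nothing was in force yet on that date
    return editions[index - 1][1]

# Construction finished in mid-2009, so the 2005 edition applies.
print(edition_in_force(EDITIONS, date(2009, 6, 30)))  # prints: 2005 edition
```

The lookup only works if every dated version stays individually identifiable, which is the identifier-side requirement.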

I should point out that the problem is not on the annotation side here, but on the identifier side. From an annotation perspective, it is actually also critical to point to a particular version, especially if you need to navigate to a certain point within the text. This problem is most critical in the standards realm but is in fact quite general throughout publishing, and is one of the biggest obstacles to this annotation initiative.

For what it's worth, the scholarly journal community has developed a vocabulary for "version of record" (VOR). This recent article, "A JSON-Based Identity Protocol Suite," may be of interest to this WG:
http://www.niso.org/apps/group_public/download.php/14003/SP_Jones_JSON_isqv26no3.pdf

--Bill Kasdorf


-----Original Message-----
From: Robert Bolick [mailto:robert.bolick@gmail.com] 
Sent: Friday, December 05, 2014 2:40 AM
To: Bill Kasdorf
Cc: Nick Stenning; public-annotation@w3.org
Subject: Re: use case clarification - cross format annotations

Yep, bit of a stretch, that one, Bill.

There's a similar nest of problems over in the national standards community.

We have a sometimes extant (sometimes not) version of a standard for which there are extant adopted versions available (ISOs, ENs and BS ISOs and BS ENs).

At BSI, we're assigning DOIs to our BS ISOs and BS ENs. ISO (in Geneva) does not assign DOIs to the extant "central" ISOs, and there is nobody to assign DOIs to the non-extant ENs.

Just to complicate matters further, standards are date-specific, yet global practice is to cite a standard without referring to the date. So when someone submits an undated reference to CrossRef, it shrugs its shoulders. In response, the standards community is agreeing on a DOI-assigned entity with a landing page, to which an undated reference will be pointed. From there, according to the policy of the relevant standards body, the user is redirected either to the most recent dated version or to a timeline of dated versions, each linked, from which the user can choose. For machine-to-machine users, CrossRef can, at the publisher's request, configure whether the automated response allows a multiple-versions response. If it does, either an exception is thrown or, if the recipient is set to pass through or cope with multiple resolutions, the details of the multiple versions are sent.
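
One possible reading of that machine-to-machine behaviour, sketched in Python. This is not CrossRef's API: the function, its parameters, the exception name and the example.org URLs are all hypothetical, and the fallback to the most recent version when multiple responses are disabled is an assumption.

```python
from dataclasses import dataclass

class MultipleVersionsError(Exception):
    """Several dated versions match and the recipient cannot handle a list."""

@dataclass(frozen=True)
class DatedVersion:
    year: int
    url: str

def resolve_undated(versions, allow_multiple, recipient_handles_multiple):
    """Resolve an undated reference to one or more dated versions."""
    if not versions:
        raise LookupError("no dated versions registered")
    latest_first = sorted(versions, key=lambda v: v.year, reverse=True)
    # Assumption: with multiple responses disabled, fall back to the latest.
    if not allow_multiple or len(latest_first) == 1:
        return [latest_first[0]]
    if not recipient_handles_multiple:
        raise MultipleVersionsError(f"{len(latest_first)} dated versions found")
    return latest_first  # the "timeline" of dated versions

# Hypothetical landing-page data for one undated reference.
versions = [
    DatedVersion(2004, "https://example.org/std/2004"),
    DatedVersion(2011, "https://example.org/std/2011"),
]
print(resolve_undated(versions, allow_multiple=False,
                      recipient_handles_multiple=False)[0].url)
```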

Then there's the interesting case of a standard jointly developed by two standards bodies who want separate DOIs pointing to the same thing!

Similar but not quite the same problem as in the annotation use case you've highlighted. The use cases document has been reviewed by BSI publishing product developers and warmly welcomed. Standards are heavily annotated within companies and across companies in multi-vendor projects. The effort on this key area of collaborative work is eagerly watched.

Cheers,
BobB

Sent from my iPhone

> On 4 Dec 2014, at 18:39, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> 
> Isn't it a problem, though, that the DOI _identifies_ the document but 
> it doesn't necessarily _locate_, or link _to_ the document? Said as a 
> fan of the DOI . . . I would be delighted to learn that what you're 
> advocating would work, at least within a domain like scholarly 
> articles that pretty universally uses DOIs. But I've been advocating 
> the use of the DOI as a work identifier for years to STM publishers 
> and even CrossRef itself and nobody wants to go there. . . . (To be 
> clear, using _a_ DOI as a work identifier, because of course you can 
> assign DOIs to various representations of a work or components of a 
> work as well.)
> 
> -----Original Message-----
> From: Nick Stenning [mailto:nick@whiteink.com]
> Sent: Thursday, December 04, 2014 7:29 AM
> To: public-annotation@w3.org
> Subject: Re: use case clarification - cross format annotations
> 
>> On Wed, Dec 3, 2014, at 12:52, Frederick Hirsch wrote:
>> 
>> - how can a system know that two documents are different 
>> representations of the same document when they have different URLs?
> 
> I think the assumption that's being made here is that the target would be a URL identifying the representation, whereas it could (and even should, usually) be a URL identifying the resource.
> 
> For example, if I'm currently looking at an HTML version of a paper, there might be a meta tag in the page that identifies the resource by DOI, such as:
> 
>    <meta name="dc.identifier" content="doi:10.1038/171737a0">
> 
> If I, as an implementer, know how DOIs work, that allows me to say that the target of this annotation is actually the resource:
> 
>    http://dx.doi.org/10.1038/171737a0
> 
> Which in turn allows me to do a number of things:
> 
> - navigate the world of linked data associated with that resource:
> 
>       curl -L -H 'Accept: text/turtle' \
>           http://dx.doi.org/10.1038/171737a0
> 
> - get metadata about the original published resource:
> 
>       curl -L -H 'Accept: application/json' \
>           http://dx.doi.org/10.1038/171737a0
> 
> - identify a PDF with appropriate metadata as being another 
> representation of the same resource
> - provide links in the user interface to other representations of the 
> same resource
> 
> As such, as I understand JSON-LD (not well), I would expect to 
> generate an annotation of the form
> 
> {
>  "@type": "oa:Annotation",
>  "target": {"@id": "http://dx.doi.org/10.1038/171737a0"}
> }
> 
> in this scenario.
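
The flow described above (read the DOI from the page's meta tag, then target the DOI URI rather than the page URL) might look roughly like this in Python. The class and function names are hypothetical, and the sketch assumes the page carries a "dc.identifier" meta tag with a "doi:" prefix exactly as in the example; real pages vary.

```python
import json
from html.parser import HTMLParser

class DoiMetaParser(HTMLParser):
    """Records the first DOI found in a <meta name="dc.identifier"> tag."""

    def __init__(self):
        super().__init__()
        self.doi = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.doi is not None:
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name == "dc.identifier" and content.lower().startswith("doi:"):
            self.doi = content[len("doi:"):]

def annotation_for(html):
    """Build an annotation targeting the DOI resource, or None if no DOI."""
    parser = DoiMetaParser()
    parser.feed(html)
    if parser.doi is None:
        return None
    return {
        "@type": "oa:Annotation",
        "target": {"@id": "http://dx.doi.org/" + parser.doi},
    }

page = ('<html><head><meta name="dc.identifier" '
        'content="doi:10.1038/171737a0"></head></html>')
print(json.dumps(annotation_for(page), indent=1))
```

Returning None when no DOI is present leaves the caller free to fall back to the page URL as the target.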
> 
>> - why would an end-user want only to provide annotations for a 
>> specific representation of the same target and not have it apply to all versions?
> 
> I think Paolo's given a great answer to this already. But it's worth noting that in this case your target really is a web page (or a PDF) and not an abstract resource identifying a paper, so you'd set your target to be "http://jimwatson.com/papers/dna.pdf", and not "http://dx.doi.org/10.1038/171737a0".
> 
>> - should we simplify the use case to how to share annotations for a 
>> target that has multiple instances with different URLs.
> 
> I hope I've given some idea of how I think we should manage this above.
> Namely, we shouldn't. Targets are resources. If there is a specific domain within which different representations can be canonicalised (e.g. academic papers -> DOIs) then great, you can use the canonicalised URIs as targets. But your receiving client will also need to know how to interpret this data in the annotation.
> 
> And it's worth noting that this is a feature, not a bug, in my opinion.
> 
> If I write a naive annotation store that doesn't know that "http://jimwatson.com/papers/dna.pdf" might be a representation of the resource "http://dx.doi.org/10.1038/171737a0", then I can fail gracefully, by simply not returning annotations of the latter when someone queries me with the former URL.
> 
> But if my client wants to solve cross-format annotation problems in academia, chances are I need to know what to do with DOIs, so I can solve that problem.
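
The contrast between a naive store and a DOI-aware one can be sketched as follows. The store class and the representation-to-DOI mapping are hypothetical; how a real client learns that a given PDF is a representation of a DOI resource is exactly the domain knowledge it has to bring.

```python
from collections import defaultdict

class AnnotationStore:
    """Keeps annotation bodies keyed by (optionally canonicalised) target URI."""

    def __init__(self, canonicalise=None):
        self._by_target = defaultdict(list)
        # A naive store leaves every URI untouched.
        self._canonicalise = canonicalise or (lambda uri: uri)

    def add(self, target, body):
        self._by_target[self._canonicalise(target)].append(body)

    def query(self, uri):
        return list(self._by_target.get(self._canonicalise(uri), []))

# Hypothetical domain knowledge: which URLs represent which DOI resource.
KNOWN_REPRESENTATIONS = {
    "http://jimwatson.com/papers/dna.pdf": "http://dx.doi.org/10.1038/171737a0",
}

def doi_aware(uri):
    return KNOWN_REPRESENTATIONS.get(uri, uri)

naive = AnnotationStore()
naive.add("http://dx.doi.org/10.1038/171737a0", "a note on the paper")
print(naive.query("http://jimwatson.com/papers/dna.pdf"))  # prints: []

aware = AnnotationStore(canonicalise=doi_aware)
aware.add("http://dx.doi.org/10.1038/171737a0", "a note on the paper")
print(aware.query("http://jimwatson.com/papers/dna.pdf"))
```

The naive store fails gracefully by returning nothing, while the DOI-aware store finds the annotation through the shared resource URI.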
> 
>> It seems the big issue here is that different URLs might refer to the 
>> same target, and how to handle that.
> 
> Right, but just to really hammer my point home: the world of linked data already has an answer to this problem. If you want to refer to a canonicalised version of a document, then you need to know how to canonicalise the document. This can be domain specific (although there are of course more general implementations such as rel=canonical).
> 
> But if you want to target the resource, then get the resource's URI and target that, rather than the representation's URI.
> 
> In summary, I think I'm saying that we don't need more machinery in Web Annotations to address this issue.
> 
> -N
> 
> 
>
Received on Friday, 5 December 2014 09:32:45 UTC