RE: use case clarification - cross format annotations from Bill Kasdorf on 2014-12-04 (public-annotation@w3.org from December 2014)

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Thu, 4 Dec 2014 18:39:09 +0000
To: Nick Stenning <nick@whiteink.com>, "public-annotation@w3.org" <public-annotation@w3.org>
Message-ID: <ddd851ee56df4fcaa0671ba5b105fd1c@CO2PR06MB572.namprd06.prod.outlook.com>
Isn't it a problem, though, that the DOI _identifies_ the document but it doesn't necessarily _locate_, or link _to_ the document? Said as a fan of the DOI . . . I would be delighted to learn that what you're advocating would work, at least within a domain like scholarly articles that pretty universally uses DOIs. But I've been advocating the use of the DOI as a work identifier for years to STM publishers and even CrossRef itself and nobody wants to go there. . . . (To be clear, using _a_ DOI as a work identifier, because of course you can assign DOIs to various representations of a work or components of a work as well.)

-----Original Message-----
From: Nick Stenning [mailto:nick@whiteink.com] 
Sent: Thursday, December 04, 2014 7:29 AM
To: public-annotation@w3.org
Subject: Re: use case clarification - cross format annotations

On Wed, Dec 3, 2014, at 12:52, Frederick Hirsch wrote:
>
> - how can a system know that two documents are different 
> representations of the same document when they have different URLs?

I think the assumption being made here is that the target would be a URL identifying the representation, whereas it could (and usually should) be a URL identifying the resource.

For example, if I'm currently looking at an HTML version of a paper, there might be a meta tag in the page that identifies the resource by DOI, such as:

    <meta name="dc.identifier" content="doi:10.1038/171737a0">

If I, as an implementer, know how DOIs work, that allows me to say that the target of this annotation is actually the resource:

    http://dx.doi.org/10.1038/171737a0

Which in turn allows me to do a number of things:

- navigate the world of linked data associated with that resource:

       curl -L -H 'Accept: text/turtle' \
            http://dx.doi.org/10.1038/171737a0

- get metadata about the original published resource:

       curl -L -H 'Accept: application/json' \
            http://dx.doi.org/10.1038/171737a0

- identify a PDF with appropriate metadata as being another representation of the same resource
- provide links in the user interface to other representations of the same resource
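The two curl calls above can be mirrored with Python's standard library. This is a minimal sketch: the helper names `doi_request` and `fetch` are hypothetical, and it only shows how the Accept header drives content negotiation. Which media types the DOI resolver actually honours is not checked here. `urlopen` follows redirects by default, which plays the role of curl's `-L`.

```python
import urllib.request


def doi_request(resource_uri: str, media_type: str) -> urllib.request.Request:
    """Build a content-negotiated GET for a resource URI (nothing is sent yet)."""
    # The Accept header is what selects Turtle, JSON, etc. from the same URI.
    return urllib.request.Request(resource_uri, headers={"Accept": media_type})


def fetch(resource_uri: str, media_type: str, timeout: float = 10.0) -> bytes:
    """Send the request; urlopen follows the resolver's redirects, like curl -L."""
    with urllib.request.urlopen(doi_request(resource_uri, media_type), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch("http://dx.doi.org/10.1038/171737a0", "text/turtle")` needs network access and sends roughly the same request as the first curl command.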

So, as far as I understand JSON-LD (not well), I would expect to generate an annotation of the form

    {
      "@type": "oa:Annotation",
      "target": {"@id": "http://dx.doi.org/10.1038/171737a0"}
    }

in this scenario.
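The scenario above (read the page's dc.identifier, turn the DOI into a resolver URL, target that) can be sketched end to end as follows. This is an illustration under assumptions, not a reference implementation: the class and function names are hypothetical, it assumes the DOI is declared exactly as in the meta tag shown earlier (a "doi:" prefix), and the page URL `http://jimwatson.com/papers/dna.html` is made up for the example. Falling back to the page's own URL when no DOI is declared is a choice made for this sketch, not something the thread specifies.

```python
import json
from html.parser import HTMLParser
from typing import Optional

DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix used in the example above


class DcIdentifierParser(HTMLParser):
    """Collects the content of <meta name="dc.identifier"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.identifiers: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Compare case-insensitively: the name may appear as dc.identifier or DC.identifier.
        if (a.get("name") or "").lower() == "dc.identifier" and a.get("content"):
            self.identifiers.append(a["content"])


def resource_uri_from_html(html: str) -> Optional[str]:
    """Resolver URL for the DOI declared in the page, or None if there is none."""
    parser = DcIdentifierParser()
    parser.feed(html)
    for ident in parser.identifiers:
        if ident.lower().startswith("doi:"):
            return DOI_RESOLVER + ident[len("doi:"):].strip()
    return None


def make_annotation(page_url: str, html: str) -> dict:
    """Target the resource when a DOI is declared, else the representation itself."""
    target = resource_uri_from_html(html) or page_url
    return {"@type": "oa:Annotation", "target": {"@id": target}}


page = '<head><meta name="dc.identifier" content="doi:10.1038/171737a0"></head>'
print(json.dumps(make_annotation("http://jimwatson.com/papers/dna.html", page), indent=2))
```

The printed annotation has the same shape as the JSON-LD above, with the DOI resolver URL as its target.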

> - why would an end-user want only to provide annotations for a specific 
> representation of the same target and not have them apply to all versions?

I think Paolo has already given a great answer to this. But it's worth noting that in this case your target really is a web page (or a PDF) and not the abstract resource that is the paper, so you'd set your target to be "http://jimwatson.com/papers/dna.pdf", not "http://dx.doi.org/10.1038/171737a0".

> - should we simplify the use case to how to share annotations for a 
> target that has multiple instances with different URLs?

I hope I've given some idea above of how I think we should manage this. Namely, we shouldn't. Targets are resources. If there is a specific domain within which different representations can be canonicalised (e.g. academic papers -> DOIs), then great: you can use the canonicalised URIs as targets. But your receiving client will also need to know how to interpret this data in the annotation.

And it's worth noting that this is a feature, not a bug, in my opinion.

If I write a naive annotation store that doesn't know that "http://jimwatson.com/papers/dna.pdf" might be a representation of the resource "http://dx.doi.org/10.1038/171737a0", then I can fail gracefully by simply not returning annotations on the latter when someone queries me with the former URL.

But if my client wants to solve cross-format annotation problems in academia, chances are I need to know what to do with DOIs anyway, and that knowledge is exactly what lets me solve the problem.
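The contrast between a naive store and a DOI-aware one can be sketched like this. It is a minimal illustration, not a proposed design: `NaiveStore`, `DoiAwareStore` and the representation-to-resource lookup table are hypothetical, and how that table gets populated (for example from metadata embedded in PDFs or HTML pages) is assumed rather than shown.

```python
from collections import defaultdict


class NaiveStore:
    """Matches annotation targets by exact URI only; knows nothing about DOIs."""

    def __init__(self) -> None:
        self._by_target: dict[str, list[dict]] = defaultdict(list)

    def add(self, annotation: dict) -> None:
        self._by_target[annotation["target"]["@id"]].append(annotation)

    def query(self, uri: str) -> list[dict]:
        # Graceful failure: a URI the store has never seen simply yields no results.
        return list(self._by_target.get(uri, []))


class DoiAwareStore(NaiveStore):
    """Also returns annotations on the resource that a representation belongs to."""

    def __init__(self, resource_of: dict[str, str]) -> None:
        super().__init__()
        # Hypothetical lookup table: representation URL -> resource (DOI) URI.
        self._resource_of = resource_of

    def query(self, uri: str) -> list[dict]:
        results = super().query(uri)
        resource = self._resource_of.get(uri)
        if resource is not None and resource != uri:
            results += super().query(resource)
        return results


pdf = "http://jimwatson.com/papers/dna.pdf"
doi = "http://dx.doi.org/10.1038/171737a0"
ann = {"@type": "oa:Annotation", "target": {"@id": doi}}

naive = NaiveStore()
naive.add(ann)
aware = DoiAwareStore({pdf: doi})
aware.add(ann)

print(len(naive.query(pdf)))  # 0: the naive store doesn't know the PDF is the paper
print(len(aware.query(pdf)))  # 1: the DOI-aware store follows the mapping
```

The naive store never returns wrong results; it just returns fewer. All the cross-format knowledge lives in the lookup table, outside the annotation model itself.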

> It seems the big issue here is that different URLs might refer to the 
> same target, and how to handle that.

Right, but just to really hammer my point home: the world of linked data already has an answer to this problem. If you want to refer to a canonicalised version of a document, then you need to know how to canonicalise it. This can be domain-specific (although there are of course more general mechanisms, such as rel=canonical).

But if you want to target the resource, then get the resource's URI and target that, rather than the representation's URI.
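rel=canonical (the canonical link relation, RFC 6596) is the general-purpose version of that lookup: a page declares its preferred URI in a `<link>` element. A minimal sketch of "get the resource's URI and target that" using it might look like this; `target_uri` is a hypothetical helper, the `example.org` URLs are made up, and falling back to the page URL when no canonical link is present is an assumption of the sketch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        a = dict(attrs)
        # rel is a space-separated list of link types, compared case-insensitively.
        rels = (a.get("rel") or "").lower().split()
        if "canonical" in rels and a.get("href"):
            self.href = a["href"]


def target_uri(page_url: str, html: str) -> str:
    """Prefer the page's declared canonical URI; fall back to the page URL itself."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    # A relative href is resolved against the URL the page was fetched from.
    return urljoin(page_url, parser.href) if parser.href else page_url


html = '<head><link rel="canonical" href="/papers/dna"></head>'
print(target_uri("http://example.org/papers/dna?format=html", html))
# http://example.org/papers/dna
```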

In summary, I think I'm saying that we don't need more machinery in Web Annotations to address this issue.

-N
Received on Thursday, 4 December 2014 18:39:39 UTC