Re: use case clarification - cross format annotations

On Wed, Dec 3, 2014, at 12:52, Frederick Hirsch wrote:
>
> - how can a system know that two documents are different representations
> of the same document when they have different URLs?

I think the assumption that's being made here is that the target would
be a URL identifying the representation, whereas it could (and even
should, usually) be a URL identifying the resource.

For example, if I'm currently looking at an HTML version of a paper,
there might be a meta tag in the page that identifies the resource by
DOI, such as:

    <meta name="dc.identifier" content="doi:10.1038/171737a0">

If I, as an implementer, know how DOIs work, that allows me to say that
the target of this annotation is actually the resource:

    http://dx.doi.org/10.1038/171737a0

Which in turn allows me to do a number of things:

- navigate the world of linked data associated with that resource:

       curl -L -H 'Accept: text/turtle' \
         http://dx.doi.org/10.1038/171737a0

- get metadata about the original published resource:

       curl -L -H 'Accept: application/json' \
         http://dx.doi.org/10.1038/171737a0

- identify a PDF with appropriate metadata as being another
representation of the same resource
- provide links in the user interface to other representations of the
same resource
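
The step from the meta tag to the resource URI could be sketched roughly
as follows. This is only an illustration using Python's standard library:
the names DOIMetaParser and resource_uri_from_html are made up for the
example, and it assumes the page carries the DOI in a "dc.identifier"
meta tag with a "doi:" prefix, as in the snippet above.

```python
from html.parser import HTMLParser


class DOIMetaParser(HTMLParser):
    """Collects DOIs found in <meta name="dc.identifier" content="doi:..."> tags."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        # Assumption: the DOI is written with a "doi:" prefix.
        if name == "dc.identifier" and content.lower().startswith("doi:"):
            self.dois.append(content[len("doi:"):])


def resource_uri_from_html(html):
    """Return the resolver URI for the first DOI in the page, or None."""
    parser = DOIMetaParser()
    parser.feed(html)
    if not parser.dois:
        return None
    return "http://dx.doi.org/" + parser.dois[0]
```

A page without such a tag simply yields None, in which case the page's
own URL remains the only target available.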

As such, as I understand JSON-LD (not well), I would expect to generate
an annotation of the form

    {
      "@type": "oa:Annotation",
      "target": {"@id": "http://dx.doi.org/10.1038/171737a0"}
    }

in this scenario.
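
Generating that structure is a one-liner in most languages. A minimal
Python sketch, where make_annotation is a hypothetical helper and the
output deliberately contains only the two keys shown above:

```python
import json


def make_annotation(target_uri):
    """Build the minimal annotation shown above for the given target URI."""
    return {
        "@type": "oa:Annotation",
        "target": {"@id": target_uri},
    }


# Target the resource (the DOI), not any one representation of it.
annotation = make_annotation("http://dx.doi.org/10.1038/171737a0")
serialized = json.dumps(annotation, indent=2)
```

The same helper works unchanged when the target really is a specific
representation; only the URI passed in differs.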

> - why would a end-user want only to provide annotations for a specific
> representation of the same target and not have it apply to all versions?

I think Paolo's given a great answer to this already. But it's worth
noting that in this case your target really is a web page (or a PDF) and
not an abstract resource identifying a paper, so you'd set your target
to be "http://jimwatson.com/papers/dna.pdf", and not
"http://dx.doi.org/10.1038/171737a0".

> - should we simplify the use case to how to share annotations for a
> target that has multiple instances with different URLs.

I hope I've given some idea of how I think we should manage this above.
Namely, we shouldn't. Targets are resources. If there is a specific
domain within which different representations can be canonicalised (e.g.
academic papers -> DOIs) then great, you can use the canonicalised URIs
as targets. But your receiving client will also need to know how to
interpret this data in the annotation.

And it's worth noting that this is a feature, not a bug, in my opinion.

If I write a naive annotation store that doesn't know that
"http://jimwatson.com/papers/dna.pdf" might be a representation of the
resource "http://dx.doi.org/10.1038/171737a0", then I can fail
gracefully, by simply not returning annotations of the latter when
someone queries me with the former URL.
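
To make "fail gracefully" concrete, here is a minimal sketch of such a
naive store. NaiveAnnotationStore is a hypothetical in-memory class, not
any existing implementation; it matches targets by exact URI string and
nothing else.

```python
from collections import defaultdict


class NaiveAnnotationStore:
    """In-memory store that indexes annotations by exact target URI."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def add(self, annotation):
        self._by_target[annotation["target"]["@id"]].append(annotation)

    def query(self, target_uri):
        # No canonicalisation: an unknown URI just returns an empty list.
        return list(self._by_target.get(target_uri, []))


store = NaiveAnnotationStore()
store.add({
    "@type": "oa:Annotation",
    "target": {"@id": "http://dx.doi.org/10.1038/171737a0"},
})

by_pdf = store.query("http://jimwatson.com/papers/dna.pdf")
by_doi = store.query("http://dx.doi.org/10.1038/171737a0")
```

Here by_pdf is empty and by_doi holds the one annotation: nothing
breaks, the store just doesn't know the two URIs are related.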

But if my client wants to solve cross-format annotation problems in
academia, chances are I need to know what to do with DOIs, so I can
solve that problem.
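
A DOI-aware client might look something like the sketch below. The
known_dois mapping (representation URL to DOI) is an assumption made for
the example; in practice it would have to come from page or PDF
metadata. The function names are likewise invented here.

```python
def equivalent_uris(uri, known_dois):
    """Return the URI itself plus the DOI resource URI, if one is known.

    known_dois is a hypothetical mapping from representation URL to DOI.
    """
    uris = [uri]
    doi = known_dois.get(uri)
    if doi is not None:
        uris.append("http://dx.doi.org/" + doi)
    return uris


def find_annotations(annotations, uri, known_dois):
    """Select annotations targeting the URI or its canonical DOI resource."""
    wanted = set(equivalent_uris(uri, known_dois))
    return [a for a in annotations if a["target"]["@id"] in wanted]


annotations = [
    {"@type": "oa:Annotation",
     "target": {"@id": "http://dx.doi.org/10.1038/171737a0"}},
]
known_dois = {"http://jimwatson.com/papers/dna.pdf": "10.1038/171737a0"}

matches = find_annotations(
    annotations, "http://jimwatson.com/papers/dna.pdf", known_dois
)
```

All the domain knowledge lives in the client; the annotation data model
is untouched.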

> It seems the big issue here is that different URLs might refer to the
> same target, and how to handle that.

Right, but just to really hammer my point home: the world of linked data
already has an answer to this problem. If you want to refer to a
canonicalised version of a document, then you need to know how to
canonicalise the document. This can be domain specific (although there
are of course more general implementations such as rel=canonical).

But if you want to target the resource, then get the resource's URI and
target that, rather than the representation's URI.

In summary, I think I'm saying that we don't need more machinery in Web
Annotations to address this issue.

-N

Received on Thursday, 4 December 2014 12:29:32 UTC