Re: Streamlining the OA Model from Robert Sanderson on 2012-07-31 (public-openannotation@w3.org from July 2012)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 31 Jul 2012 13:50:29 -0600
To: Bernhard Haslhofer <bernhard.haslhofer@cornell.edu>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CABevsUFLGLs4gLbsszockBvfRnMPzyDeJUSJLQGVNFZQf5BJmw@mail.gmail.com>

Thanks for your thoughts, Bernhard.

On Tue, Jul 31, 2012 at 10:02 AM, Bernhard Haslhofer
<bernhard.haslhofer@cornell.edu> wrote:

> 1.) Direct Relationship between Annotation and the Source
> "Give me all annotations for resource X", is probably one of the most important queries that needs to be answered. X could be an image URI, the URI of a video, whatever. Since the Target of an annotation may be a resource with its own dereferenceable URI OR a Specific Target with a UUID node, you need to consider this when formulating a query and end up with a SPARQL UNION query or some conditional node traversal code when using an RDF API.
> Technically, it is of course possible to do that, but given the importance of that query, I would argue that the solution is not very intuitive and maybe also not very efficient. I believe that this can easily be fixed by introducing a direct relationship property (e.g., oa:annotates, oa:hasTargetSource) between the Annotation and the Source resource.


One of the guiding principles is to keep the simple cases simple and
the complex cases possible.  So a proposal that makes the complex
cases slightly easier at the expense of making the simple cases less
simple would violate that principle.
As Dan Brickley has argued, the vast majority of annotations will be
(are!) a simple link between two resources, rather than to a Specific
Resource.  All of those Likes, +1s, Disqus comments, tags, and so
forth do not require a Specific Target, just the URI of the resource
being annotated.

Given that background, let's consider the proposal.

To avoid a Union query, a developer would need to rely on the presence
of a direct relationship between annotation and the full URI for the
target.  In the case of a Specific Target, it would be Annotation ->
Source and in the case of a regular target it would be Annotation ->
Target.  Thus there would need to be two relationships between
annotation and target in the simple case to have this query work,
hasTarget and hasTargetSource.  Unless that redundant relationship
were mandatory in all cases, including the simple one, you would
still need the Union query to be certain of finding everything.  And
even if it were mandatory, many developers would likely ignore it.

To spell it out, you would need to make the following mandatory for
all simple annotations:

    _:Anno a oa:Annotation ;
        oa:hasTarget _:Target ;
        oa:hasTargetSource _:Target ;
        oa:hasBody _:Body .

To me, the cost of that redundancy massively outweighs the benefit of
making a particular technology slightly simpler to use at the server
side when building a repository of annotations.  As already pointed
out, the extra relationship could be added at ingest time rather than
at creation time, for those systems that could profit from it.  Other
systems, such as those using document-centric rather than
graph-centric indexing, may not find the query difficult at all.
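
A minimal sketch of that ingest-time step, in Python, with a plain set
of (subject, predicate, object) tuples standing in for a real triple
store.  The property name oa:hasTargetSource is only the name proposed
in this thread, and the function name and example URIs are
hypothetical.

```python
HAS_TARGET = "oa:hasTarget"
HAS_SOURCE = "oa:hasSource"
HAS_TARGET_SOURCE = "oa:hasTargetSource"  # proposed in this thread, not in the spec


def add_target_sources(triples):
    """Return a copy of the graph with oa:hasTargetSource materialized."""
    # Map each Specific Resource node to the full resource it is a part of.
    source_of = {s: o for (s, p, o) in triples if p == HAS_SOURCE}
    enriched = set(triples)
    for (anno, pred, target) in triples:
        if pred == HAS_TARGET:
            # A plain target is its own source; a Specific Resource is
            # replaced by the resource named in its oa:hasSource triple.
            source = source_of.get(target, target)
            enriched.add((anno, HAS_TARGET_SOURCE, source))
    return enriched


# Hypothetical incoming annotations: one simple, one on a Specific Resource.
incoming = {
    ("ex:anno1", HAS_TARGET, "http://example.org/image.jpg"),
    ("ex:anno2", HAS_TARGET, "urn:uuid:specific-1"),
    ("urn:uuid:specific-1", HAS_SOURCE, "http://example.org/image.jpg"),
}
stored = add_target_sources(incoming)
```

A repository that does this on ingest can answer the "all annotations
for X" question with a single pattern, without asking annotation
creators to publish the redundant triple.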

The takeaway here is that the specification should definitely have a
SPARQL example of how to get all of the full URIs for the targets of
an annotation.  Its absence is a big oversight, as this was one of the
motivating examples for the SPARQL blocks in the document in the first
place!
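
Until such an example exists, here is a rough sketch of the same
two-branch lookup as node traversal code, in Python rather than
SPARQL.  It assumes a graph held as a plain set of
(subject, predicate, object) tuples; the function names and example
URIs are hypothetical, and only oa:hasTarget and oa:hasSource are
taken from the model.

```python
HAS_TARGET = "oa:hasTarget"
HAS_SOURCE = "oa:hasSource"


def target_sources(triples, annotation):
    """Full URIs of the resources an annotation is about."""
    # Collect the source(s) of every Specific Resource in the graph.
    source_of = {}
    for (s, p, o) in triples:
        if p == HAS_SOURCE:
            source_of.setdefault(s, set()).add(o)
    result = set()
    for (s, p, o) in triples:
        if s == annotation and p == HAS_TARGET:
            # Use the source when the target is a Specific Resource,
            # otherwise the target itself is the full URI.
            result |= source_of.get(o, {o})
    return result


def annotations_for(triples, resource):
    """Annotations whose target is the resource or a segment of it."""
    # Branch 1: the resource is the target itself.
    direct = {s for (s, p, o) in triples if p == HAS_TARGET and o == resource}
    # Branch 2: the target is a Specific Resource with this source.
    specifics = {s for (s, p, o) in triples if p == HAS_SOURCE and o == resource}
    indirect = {s for (s, p, o) in triples if p == HAS_TARGET and o in specifics}
    return direct | indirect


image = "http://example.org/image.jpg"
graph = {
    ("ex:anno1", HAS_TARGET, image),
    ("ex:anno2", HAS_TARGET, "urn:uuid:specific-1"),
    ("urn:uuid:specific-1", HAS_SOURCE, image),
}
print(sorted(annotations_for(graph, image)))
```

The two branches in annotations_for correspond to the two arms a
UNION query would have.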

So -1 from me.  The proposal doesn't alleviate the problem and breaks
a fundamental design principle.  This should probably go into the FAQ.
(Coming soon!)


> 2.) Fragment URIs as Targets
> In our API (the GeoReference part) we followed the OA recommendation and used a Specific Resource and a Fragment Selector to express that a URI annotates an XY point on a raster image. We could express the same information by using W3C Media Fragments and thereby reduce the verbosity and complexity of the resulting serialization. API consumers then don't even need to know about OA-specific "Specific Resources", "Fragment Selectors", etc.
> The Open Annotation model currently does NOT RECOMMEND the use of fragment URIs for identifying segments of Targets or Bodies for three reasons (see 5.2.1):
> - "cannot query the source directly": I think this could and should be solved by considering (1.)

As above, plus you would still need most of the components anyway.
In particular, to express the containment you would need some
relationship between the fragment and the resource, recalling that
URIs are to be treated as opaque.  Fragment isPartOf Resource
simply replaces SpecificResource hasSource Resource.

> - "they are not compatible with State and Style Specifiers; many annotations may have the same segment of interest, but have different States and Styles": from previous emails and discussions I understood that Styles should be directly attached to the Annotation, which also means that they are contextualized and not an argument against fragment URIs anymore. I think that something similar can be done with "State" and would also result in a more consistent model and allow for fragment URIs

Paolo has strongly argued that State and Style are very different
aspects of the model.  Ignoring State would significantly alter the
intent of the annotation.  Many previous annotation efforts have
failed (as per the list compiled by Dan Whaley) due to ignoring
the dynamic nature of web resources, amongst other reasons, and
I don't want to see Open Annotation fail to learn from their mistakes.


> - "Fragment URIs conflate the identity and the description of the segment of interest by including the description inline within the identity": I am not sure if I get the point of this argument right; however, I believe that for very practical reasons the OA model should reuse what other specifications (Web Architecture, Media Fragment RFCs) already define; this brings modularity and flexibility and avoids the risk of re-designing what others already did elsewhere.

The point of the argument is that there are two aspects to a
particular part of a resource: its identity (so it can be referenced)
and its description (so it can be interpreted by a client, rendered
and understood by a user).
Media fragments conflate these two by including a small set of
descriptions in the URI that identifies the segment.

The FragmentSelector was added to reuse the other specifications,
exactly to avoid re-inventing the wheel.  We do not specify the
meaning of xywh= for example, we just reference the Media Fragment
document.

Furthermore, it has already come up on the list that a fragment URI
alone does not carry enough information, quite apart from conflating
the two aspects!  Not only do you need the fragment URI, you also need
to understand how to interpret the fragment.  Without subclassing (or
other descriptive properties) of FragmentSelector, a client does not
know whether xywh= is a media fragment for spatial dimensions or an
HTML anchor.  Please see Jeni Tennison's recent W3C document about
this too:
    http://www.w3.org/TR/fragid-best-practices/
where she demonstrates in Appendix B that an SVG document
can legitimately have the same fragment identify several different
options!
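
To make the interpretation problem concrete, here is a small Python
sketch of what a client has to do with a fragment URI.  The function
name and example URIs are hypothetical, and the parsing is a
simplification that handles only a lone xywh= value with integer
coordinates.  The key point is in the docstring: the code has to
assume Media Fragment semantics, because nothing in the URI says so.

```python
from urllib.parse import urldefrag


def split_spatial_fragment(uri):
    """Split a fragment URI into (source, region).

    The region is (unit, x, y, w, h), or None when the fragment is not
    a lone xywh= value.  The caller must already know that Media
    Fragment semantics apply; the URI string itself cannot tell us.
    """
    source, fragment = urldefrag(uri)
    if not fragment.startswith("xywh="):
        return source, None
    value = fragment[len("xywh="):]
    unit = "pixel"  # the default unit when none is given
    if ":" in value:
        unit, value = value.split(":", 1)
    parts = value.split(",")
    if unit not in ("pixel", "percent") or len(parts) != 4:
        return source, None
    try:
        x, y, w, h = (int(part) for part in parts)
    except ValueError:
        return source, None
    return source, (unit, x, y, w, h)


print(split_spatial_fragment("http://example.org/image.jpg#xywh=160,120,320,240"))
print(split_spatial_fragment("http://example.org/page.html#section2"))
```

Note that the first value returned is exactly what hasSource states
explicitly, and the second is what a FragmentSelector carries, so the
client ends up reconstructing the same components either way.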

> I think the benefits of reusing (Media) Fragment URIs in OA outweigh the arguments for not using them and therefore I propose to RECOMMEND the use of Fragment URIs and only fall back on OA-specific Selectors if Fragment URIs are not expressive enough.

You gloss over quite a lot of issues here. To quote the spec:

* They are not expressive enough. The Media Fragment URIs for images,
for example, only allow rectangular regions to be defined, and not
arbitrary paths or even simple circles, and these use cases are
important to support.
* They are not comprehensive enough. Although some media types have
fragment semantics defined, many do not. A solution for these other
types is necessary.
* They are not extensible. Content-type specific fragments are fixed,
and the Media Fragments are not extensible for either other types of
segment description or other types of resource.
* In solving the more difficult cases, solutions for the easier cases
covered by fragment URIs will be defined as part of the process. There
will be multiple competing solutions when one recommendation might
have been sufficient.

So -1 here as well. The consistency of the Selector approach is
important, and the reuse of the Fragment specifications is
accomplished already with Fragment Selectors.


> 3.) Simple Literal Body Shortcut
> I understand that an OA annotation is a relationship between resources (the body and the target) and that inline bodies are represented using the Content in RDF specification (see 6.1.). However, our own demonstrator and also the majority of use cases demonstrated in the OAC meeting last week showed that many annotation bodies are simply strings, which could be represented as literals.
> Therefore I am proposing to introduce a "shortcut" property between the Annotation and the "content" Literal (e.g., hasLiteralBody). This allows people to express simple annotations in a, in my opinion, more straightforward way and doesn't contradict the current oa:hasBody approach.

This has been discussed many times and would be a significant step backwards.

There are several reasons for this design choice:

* Literals have no identity, and thus cannot be referenced. If the
body was a literal, it would be impossible to refer to it directly.
* It is inconsistent with the rest of the model which allows any
resource as a Body or Target, and would thus be a special case just
for text in the body.
* Literals can have language associated with them in RDF using the
@lang construction, and format using the explicit datatyping via ^^,
however there are other aspects of a resource that are important for
interpretation that cannot be associated with a literal. Examples
would include directionality of the text and encoding, plus of course
metadata such as authorship, date and so forth.
* If a server wished to extract the text and make it a resource with
an HTTP URI, as per publishing resources, it would not be possible to
assert equivalence or provenance.
* The cost of just following the specification is minimal, one
additional triple over the literal case, and it avoids either RDFS
punning or checking for multiple relationships with different handling
requirements.

Given your first suggestion, I'm surprised that you think it is
appropriate to require a UNION query here, to cover the case where
some people use the shortcut and others do the right thing.  Also, your
argument that existing standards should be reused argues in favor of
the current approach.
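
As a rough illustration of that double handling, here is a Python
sketch of what a consumer would need if both forms were allowed.  The
graph is a plain set of (subject, predicate, object) tuples, the
property name oa:hasLiteralBody is only the shortcut proposed in this
thread, and the function name and example data are hypothetical.

```python
HAS_BODY = "oa:hasBody"
CHARS = "cnt:chars"  # Content in RDF: the text of an inline body
HAS_LITERAL_BODY = "oa:hasLiteralBody"  # proposed shortcut, not in the spec


def body_texts(triples, annotation):
    """Collect the body text of an annotation in either form."""
    texts = set()
    for (s, p, o) in triples:
        if s != annotation:
            continue
        if p == HAS_LITERAL_BODY:
            # Shortcut form: the object is the text itself.
            texts.add(o)
        elif p == HAS_BODY:
            # Model form: the object is a resource that carries the text.
            texts |= {c for (b, q, c) in triples if b == o and q == CHARS}
    return texts


graph = {
    ("ex:anno1", HAS_BODY, "urn:uuid:body-1"),
    ("urn:uuid:body-1", CHARS, "Nice picture"),
    ("ex:anno2", HAS_LITERAL_BODY, "Nice picture too"),
}
print(body_texts(graph, "ex:anno1"))
print(body_texts(graph, "ex:anno2"))
```

With only oa:hasBody in the model, the first branch disappears and
every consumer follows one path.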

This will definitely be part of the FAQ :)

A very strong -1.  Unless there is strong feeling on the list in general
that this particular issue should continue to be discussed, I propose that
we put it aside permanently and work on more important aspects of the
model rather than discussing the same thing over and over again.


> 4.) Style Attached directly to the Annotation

So far there have been no complaints about the proposal.

> 5.) JSON (-LD) Serialization Recommendation
> At the moment the spec recommends that RDF/XML is used as default serialization language. We haven't implemented it yet, but I'd consider JSON (-LD) at least as alternate "default" serialization format to open the door for JS clients.

I don't see a problem with this once the JSON-LD spec is finalized.
We've pretty much just been waiting until that happens to recommend
its use.

Rob
Received on Tuesday, 31 July 2012 19:51:00 UTC