Re: Annotations and the Graph from Jacob Jett on 2015-10-28 (public-annotation@w3.org from October 2015)

From: Jacob Jett <jgjett@gmail.com>
Date: Wed, 28 Oct 2015 12:25:49 -0500
To: Randall Leeds <randall@bleeds.info>
Cc: Robert Sanderson <azaroth42@gmail.com>, Web Annotation <public-annotation@w3.org>
Message-ID: <CABzPtBKUBWcF6BK8QcGAsuJx54L3PW9GDnH0+TEbZdyv3MejXg@mail.gmail.com>

Hi Randall,

Let me see if I can navigate your questions inline below.

On Tue, Oct 27, 2015 at 6:56 PM, Randall Leeds <randall@bleeds.info> wrote:

> On Tue, Oct 27, 2015 at 4:41 PM Robert Sanderson <azaroth42@gmail.com>
> wrote:
>
>>
>> Hi Randall, Jacob,
>>
>> There are three issues with the proposal that I can see.
>>
>> One concern about the proposal is that rdf:Statement reifies a single
>> triple, and an Annotation has significantly more flexibility.  For example,
>> there may be no body to an Annotation, which would result in a statement
>> with no subject.  Or for multiple bodies or multiple targets, there would
>> be multiple subjects or objects respectively.  That's not the intent of
>> rdf:Statement and related predicates, and I would be hard pressed to argue
>> that it was kosher to subclass it.
>>
>
> I did suggest that Annotation could actually have 1 or more rdf:Statement
> associated with it, rather than being an rdfs:Statement itself, but I was
> trying to squeeze it into a shape that would be backward compatible with
> many existing annotations, to make it more palatable.
>

IMO, annotations (and other structured data types) are always backwards
compatible, you just map one shape into another. This can be a lossy
process unless you're allowed to extend with additional vocabs as needed.
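
To make the "lossy mapping" point concrete, here is a minimal sketch. It assumes, following Rob's framing above, that the body plays the subject, the target the object, and the motivation the predicate. The dict keys and the Statement type are hypothetical simplifications, not the actual serialization.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    """A reified triple: exactly one subject, one predicate, one object."""
    subject: Optional[str]  # the annotation body, or None when there is no body
    predicate: str          # stand-in for the motivation
    obj: str                # the annotation target


def annotation_to_statements(annotation: dict) -> List[Statement]:
    """Map one annotation-like dict onto one statement per (body, target) pair."""
    bodies = annotation.get("bodies") or [None]  # bodiless: statement with no subject
    return [
        Statement(body, annotation["motivation"], target)
        for body in bodies
        for target in annotation["targets"]
    ]


def statements_to_annotations(statements: List[Statement]) -> List[dict]:
    """Map back. Each statement becomes its own annotation; the grouping is gone."""
    return [
        {
            "bodies": [] if s.subject is None else [s.subject],
            "targets": [s.obj],
            "motivation": s.predicate,
        }
        for s in statements
    ]


original = {
    "bodies": ["a comment", "a tag"],
    "targets": ["http://example.org/page"],
    "motivation": "bookmarking",
}
round_tripped = statements_to_annotations(annotation_to_statements(original))
print(len(round_tripped))  # 2 annotations come back where 1 went in
```

Recovering the original grouping would need an extra property linking the statements together, which is the kind of vocabulary extension meant above.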

More to the point though, since there is no semantic difference (that I can
see) between rdf:Statement or oa:Annotation (as a sub-class of
rdf:Statement) it's hard to predict exactly how an OWL-based reasoner is
going to treat it. As near as I can tell (and I'm not an expert here) one
of at least (there could be more permutations) three possible scenarios
might play out in the reasoner.

Scenario 1: It may simply ignore the sub-class, i.e., all "annotations" are
instances of rdf:Statement. In this world there simply are no annotations.

Scenario 2: It may ignore the super-class (and thereby ignore the rdf and
rdfs vocabularies altogether), i.e., all rdf-triples are instances of
oa:Annotation (essentially replacing rdf with oa). Since oa doesn't
replicate all of the rdf and rdfs vocabularies, I would think that this
essentially breaks the whole thing, because rdfs:Class is the linchpin upon
which the whole shooting match revolves, and that goes away. Your instance
of oa:Annotation is suddenly both an instance of rdfs:Class and not an
instance of rdfs:Class (after all, if it is an instance of rdfs:Class then
there isn't an rdf vocabulary in this case, as it gets replaced by the oa
vocabulary). I think this weirdly gets into a chicken/egg problem and
probably your reasoner loops itself straight to hell or large portions of
your data simply don't work anymore or disappear.

Scenario 3: The reasoner simply can't decide if all rdf-triples are
instances of oa:Annotation or if all instances of oa:Annotation are just
rdf:Statements and so the reasoner simply loops itself straight to hell.

RDF only really works when things are instances of classes of things it
already defines. An rdf-triple is an instance of an rdf:Statement and of no
other thing. The class rdf:Statement is itself an instance of rdfs:Class,
asserted through the RDF predicate rdf:type.

I think maybe you're trying to treat RDF as though it were UML, which it
isn't. Likewise, we often discuss everything about this RDF-based model as
though RDF was a serialization format, which it also isn't. On the whole
the group frequently asks the wrong questions about the model, i.e., "what
does this data do?" Data is data, it just kind of is. The correct question
is "what can I do with this data?"

> I'm still not sure I'm convinced that an annotation without a body is a
> useful thing. I know we debated it and decided to include it. I can't
> remember why. It would seem to me any such annotation could have a stub
> body. Or, more likely, that there is no annotation, there is only the
> production of a SpecificResource.
>

This proves my point about the questions. There are numerous use cases
surrounding highlighting, bookmarking, upvoting, etc. that nominally do not
have "bodies" as we traditionally think of them. I follow the rough notion
of "stub body" and frankly I'm of the opinion that if a youtube video can
be a body or an actionable edit can be a body then some actionable CSS can
also be a body (and so I'm somewhat sympathetic with an argument that
annotations without bodies actually should (and do) have bodies). However,
I'm not too sympathetic with the notion that bodiless annotations are "not
useful" or that they are not annotations at all. Just because one person's
set of use cases doesn't call for them doesn't mean that someone else's
doesn't. Since our standard is a Web standard then it needs to have a broad
range of features that not every implementer is going to use, which in turn
is going to complicate implementation and interoperability but that's the
price we have to pay for a generalized approach.


> You could say that bookmarking is an annotation with the bookmarking
> motivation that has no body. Or you could just say that you have a bookmark
> (the body) and it refers to the target. Whether you even need an annotation
> as a separate resource here is questionable to me.
>

But it's not questionable to others. The question is should the model be
inclusive of use cases (and communities) or exclusive? If the latter then
why is it in the W3C standards process? The beauty (and the high price) of
the Web (and Semantic Web) is that any community can develop data
vocabularies particular to their community's needs. The W3C standards are
supposed to be the backbone that supports those activities and makes it
possible for them to work together at various levels. If we discount certain
positions as not worthy of support then I'm not certain how we'll ever
develop a standard that makes that backbone possible.


>
>> Secondly, motivation and predicate are not really the same thing, as has
>> become clearer with the adoption of the role proposal to allow motivations
>> to be associated per body.  For example, if you had a single Annotation
>> with a motivation of bookmarking, and a body with a role of describing,
>> plus a second body with a role of tagging, the use of rdf:predicate seems
>> very difficult to work with.
>>
>>
> It seems easier to me, though difficult to put them all into a single
> annotation. Again, though, here I am confused about what the model is
> attempting to do. It seems like it's trying to double as a simple container
> when really the user could create a container and give it provenance and
> have it contain all the annotations, one which describes and another which
> tags. Or maybe you have a resource "Bookmark" which can have properties for
> tags and descriptions, and the predicate relates the bookmark and the
> annotation.
>

The model is providing an "annotation kind of" container and promulgates
certain metadata specific to the "annotation kind of" container that both
supports its findability and distinguishes it from other kinds of
containers.

I get that you have a need for a container with some provenance, but an
annotation would be a sub-class of *that container*. There are other kinds
of containers too, like collections. Collections are much more generic.
There's actually a formalization that defines them quite well:

∀y(∃x isGatheredInto(x,y)↔ Collection(y))

You can read all about it in the following paper: Wickett, K. M., Renear,
A. H., & Furner, J. (2011). Are collections sets? *Proceedings of the
74th ASIS&T Annual Meeting* (New Orleans, LA, 9-13 October 2011).
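
Read literally, the formula says that y is a collection exactly when at least one x is gathered into it. A minimal sketch of that reading follows; the function name and the pair-based encoding of isGatheredInto are assumptions made for illustration, not taken from the paper.

```python
from typing import Iterable, Tuple


def is_collection(y: str, gathered_into: Iterable[Tuple[str, str]]) -> bool:
    """Collection(y) holds if and only if some x satisfies isGatheredInto(x, y).

    `gathered_into` is a hypothetical encoding of the relation as
    (member, container) pairs.
    """
    return any(container == y for _member, container in gathered_into)


# Two annotations gathered into one container; nothing gathered into the other.
gathered = {("anno1", "shelf"), ("anno2", "shelf")}

print(is_collection("shelf", gathered))
print(is_collection("empty-box", gathered))
```

Under this reading a container that nothing has been gathered into does not count as a collection.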

The important thing is to avoid disenfranchising entire web-going
communities of practice simply because their needs are different.

> Why should we avoid instantiating a Bookmark and instead create an
> Annotation that refers to some tags, some text, and some link, and
> complicates the relationship between them?
>

See above, but more generally simply because we're the web annotation group
and we're modeling annotations.

In your example's case it just so happens that the annotation is a
bookmark. Could some other community develop a model particular to
bookmarks? Of course they could, but that's not our job. Our job is to
develop the model of annotations and allow enough affordances for those
communities who believe that bookmarking is one of the roles that an
annotation can play to serialize annotations motivated by bookmarking.


>> And finally ... we already have rdf:Statement (and the recommendations
>> against its use) and named graphs. If an Annotation were restricted to just
>> a single statement, I'm not sure that we would need a new specification :)
>>
>
> Thanks for making my point for me.
>

I think I missed the point where he made the point for you. The problem is
that your proposal doesn't actually create a sub-class of rdf:Statement
inasmuch as it simply makes an alias for it. The effects of aliasing
rdf:Statement are uncertain (but are probably calamitous if you don't alias
the rest of RDF). It isn't clear if oa:Annotation inherits the properties
of rdf:Statement or simply replaces it wholesale vis-a-vis OWL-based
reasoners. This undermines the firmament upon which the model stands for
the Semantic Web side of this working group (and the Semantic Web
community).

I feel like this is a problem for engineers in general and developers in
particular. You're reinventing the wheel by sub-classing. But we already
have plenty of wheels. Just pick one and stick with it. The container here
is the annotation-flavored one; if another container is required then this
may be the wrong standard for your use cases.


> SpecificResource and the selector vocabulary is great and I don't see
> anything that exists quite like that.
>

Here I'm in total agreement with you. I've suggested to Rob in the past
that the Specific Resource / Specifiers generalizes to a broader set of
Web/Semantic Web use cases. Perhaps what is really needed is a more
generalized Web/Semantic Web container specification that exploits and
develops that part of the vocabulary. What is the procedure for proposing
that we spin that part out to a different, more general working/community
group?


> But we have mechanisms to distribute statements with attribution. I don't
> understand what the model adds to that other than, it seems, a way to
> obfuscate the semantics and avoid actually creating triples that relate the
> body and target.
>

It's providing a framework where people who don't agree on the nature of
the relationship between body and target can still agree that body and
target are related and also provide their interpretation of that
relationship through motivation/role. After all, what you call a comment, I
(rather intractably) call a remark. It prevents a combinatorial explosion
of predicates by dumping that information into an attribute value bucket.
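
A rough sketch of the difference, assuming a simplified annotation with one body, one target and one motivation. The predicate names mirror hasBody, hasTarget and motivatedBy, but the class, the helper functions and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Annotation:
    id: str
    body: str
    target: str
    motivation: str  # community-supplied value, e.g. "commenting" or "remarking"


def generic_triples(a: Annotation) -> List[Triple]:
    """Container style: fixed predicates, the interpretation is an attribute value."""
    return [
        (a.id, "hasBody", a.body),
        (a.id, "hasTarget", a.target),
        (a.id, "motivatedBy", a.motivation),
    ]


def direct_triple(a: Annotation) -> Triple:
    """Direct style: every interpretation becomes its own predicate."""
    return (a.body, a.motivation, a.target)


def predicates(triples: List[Triple]) -> Set[str]:
    """Collect the distinct predicate names used by a list of triples."""
    return {predicate for _subject, predicate, _obj in triples}


annotations = [
    Annotation("a1", "nice point", "page#p1", "commenting"),
    Annotation("a2", "nice point", "page#p1", "remarking"),
    Annotation("a3", "read later", "page", "bookmarking"),
    Annotation("a4", "rdf", "page", "tagging"),
]

generic = predicates([t for a in annotations for t in generic_triples(a)])
direct = predicates([direct_triple(a) for a in annotations])
print(sorted(generic))
print(len(direct))
```

In the container style the set of predicates stays the same however many motivations communities invent; in the direct style it grows with every new interpretation.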

(Note that avoidance of these kinds of combinatorial explosions is the
hallmark of a quality ontology. Examples of ontologies that fail to do
this, e.g., Bibframe, Schema, etc., showcase where a failure to generalize
through the proper exploitation of metaproperties has occurred. In
Bibframe's case, it has many predicates particular to naming (identifier)
or other standards. Since we keep inventing more standards we can safely
expect that Bibframe will continue its expansion forever. We've avoided
that here by providing a framework where everyone agrees that the thing in
question is an annotation and other communities do the work of determining
its precise nature, e.g., is it a bookmark, comment, edit, etc. This is
great division of labor for vocab development but seems to be a major
complicating factor for JSON developers.)


> In a sense, it seems to me like the purpose of much of the model is to
> escape from having to model that which we want to model.
>

In a sense that is correct. We're trying to provide a model that gives us
agnostic annotation-flavored containers. The precise relationships between
the bodies and targets are left for individual communities of practice to
define through the motivation/role attribute value. Rather than mandate
what kinds of annotations there are from on-high, we just give some high
level examples of how they can express this information themselves.

I get that this makes life harder on JSON developers but I'm going to take
an extremely blunt tack and tell you that it's for your own good. The
end-users need to be the community that shapes the structure of the data:
what best captures the idiosyncrasies of their intentions and needs, not
what is most convenient for us to parse and serialize.

The outcomes of your proposal are completely unknown but the worst part is,
the unknown factor isn't one you need to worry about. You're asking another
community to pay the tab for changes just so that life is more convenient
for your community. That's the worst possible reason to make a proposal.

I apologize to everyone for the soapbox sermon but this working group
sometimes feels like a house divided. If we don't stop playing use case
trumpery and focus more broadly on the cost and benefits for all of the
communities involved, it's hard for me to see how this process will result
in something successful.
Received on Wednesday, 28 October 2015 17:27:02 UTC