Re: Data Model Assumptions from Robert Sanderson on 2015-08-18 (public-annotation@w3.org from August 2015)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 18 Aug 2015 11:28:37 -0700
To: Doug Schepers <schepers@w3.org>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CABevsUEXVtOxnq4b3VN7x8vPgYvxqY_tq-ujMoNi7rVGij=6dA@mail.gmail.com>
On Mon, Aug 17, 2015 at 10:16 PM, Doug Schepers <schepers@w3.org> wrote:

> My idea of the Data Model has always rested on the notion of objects with
> properties, which is informed by my JavaScript background.
>

I admit to being confused as to where this notion comes from.

* The charter explicitly states that the data model "will start from the
Open Annotation Data Model".

* That data model, in section 1.1 Aims of the Model, clearly states the
relationship to RDF and the approach taken:

A single, consistent model that can be used by all interested parties is
the goal of the standardization process. The number of RDF triples required
or bytes needed for serializations, while a consideration, is less
important than the coherency of the model. All efforts are made to keep the
implementation costs for both producers and consumers to a minimum. A
single method of fulfilling a use case is strongly preferred over multiple
methods, unless there are existing standards that need to be accommodated
or there is a significant cost associated with a method that is otherwise
necessary.


* This Working Group, at Doug's suggestion, removed all of the RDF / Linked
Data language from the specification, such as can be seen in the equivalent
section of our FPWD:

The Web Annotation Data Model is a single, consistent model that can be
used by all interested parties. All efforts have been made to keep the
implementation costs for both producers and consumers to a minimum. A
single method of fulfilling a use case is strongly preferred over multiple
methods, unless there are existing standards that need to be accommodated
or there is a significant cost associated with a method that is otherwise
necessary.


But it has not introduced any further model.

* There is a section on Principles in the model that lays out the abstract
data model. It asserts, without reference to RDF or linked data, the
fundamentals of the model.  It does not say there are objects and
properties either.

* To my recollection (and I'm happy to be proven wrong), we have never discussed
a model that is more abstract than, or even just different from, what is
already laid out in the annotation-model specification.  Is there
something that I'm missing or forgetting? Please do provide pointers to any
discussions or documents that suggest this object/property model.


> Others in the WG, especially those from the Open Annotation Community
> Group, seem to have an additional set of constraints on top of this
> object-property data model, as RDF or Linked Data.


As we currently do not have this notion of an object/property based data
model, I don't think we're adding constraints on top of it.


> The consequence of some combination of these additional constraints seems
> to impose a rigid syntactic/semantic object structure that makes it more
> difficult to express objects with flexible property specificity.


It leads to a consistent, coherent model where developers can be confident
that they can write code against a structure that will meet all of their
needs, rather than having to write many little tests to see which of the
myriad of possibilities each particular annotation is using.
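As a minimal Python sketch of that point, assuming the annotation has been parsed from JSON into dicts and always uses the nested body structure discussed in this thread (role on the body, text under `source.value`), one fixed access path serves every annotation with no tests for which shape was used. The accessor name is hypothetical.

```python
import json

# Assumption: every annotation uses the single nested shape
# {"body": {"role": ..., "source": {"value": ...}}}.
raw = '{"body": {"role": "commenting", "source": {"value": "A comment"}}}'


def body_role_and_text(annotation: dict) -> tuple[str, str]:
    """Hypothetical accessor: one fixed path, no checks on which shape was used."""
    body = annotation["body"]
    return body["role"], body["source"]["value"]


role, text = body_role_and_text(json.loads(raw))
print(role, "->", text)  # commenting -> A comment
```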


> This leads to an object structure with additional nesting and sets of
> properties that I don't personally find intuitive, and which I suspect
> other JavaScript developers won't either.
>

And there are a lot of JavaScript developers who are perfectly happy with
it too.  And a lot of non-JavaScript developers beyond that.


> Again, the example of the copy-edit use case, with roles/motivations on the
> body, seems to be difficult to express concisely or simply.
>

I understand that you don't find this structure simple:

{
  "body": {
    "role": "commenting",
    "source": {
      "value": "A comment"
    }
  }
}

Which is as complex as it gets, even with the most restrictive proposal
(mine).
Note that we have to allow it regardless of whether we also allow other
patterns, as for segments of external resources we would have:

{
  "body": {
    "role": "commenting",
    "selector": { ...},
    "source": "http://some.url/"
  }
}

If we use Tim's proposal to also allow role on Embedded Content when it is
used as a Body:

{
  "body": {
    "type": "Embedded",
    "role": "commenting",
    "value": "A comment"
  }
}

If there is something more simple and intuitive than even this, I strongly
invite you to suggest it.
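To illustrate what allowing several patterns costs a consumer, here is a hypothetical Python function that accepts all three body shapes from this message and normalizes them into one dict. The function name and output keys are assumptions for illustration, and the empty `{}` selector is only a placeholder for the elided selector, which is passed through untouched.

```python
def describe_body(annotation: dict) -> dict:
    """Normalize the three body shapes above into one dict (hypothetical helper).

    Each branch is one of the "little tests" a client needs once more than
    one pattern is allowed for the same information.
    """
    body = annotation["body"]
    result = {
        "role": body.get("role"),
        "text": None,
        "iri": None,
        "selector": body.get("selector"),
    }
    source = body.get("source")
    if isinstance(source, dict):
        # Pattern 1: embedded text nested under "source".
        result["text"] = source.get("value")
    elif isinstance(source, str):
        # Pattern 2: external resource identified by IRI, possibly with a selector.
        result["iri"] = source
    elif body.get("type") == "Embedded":
        # Pattern 3: role placed directly on Embedded Content.
        result["text"] = body.get("value")
    return result


nested = {"body": {"role": "commenting", "source": {"value": "A comment"}}}
# The empty dict stands in for the elided selector; it is a placeholder only.
external = {"body": {"role": "commenting", "selector": {}, "source": "http://some.url/"}}
embedded = {"body": {"type": "Embedded", "role": "commenting", "value": "A comment"}}

for a in (nested, external, embedded):
    print(describe_body(a))
```

Every additional permitted shape adds another branch that every consumer must write and test.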



> That said, structuring the annotation objects this way seems to add some
> ability to parse the annotation through an "RDF reasoner" to help make
> derivative assertions about the annotation body and target, with other
> annotations or data. I am not totally clear on this, but I'm open to the
> idea that this has some important effects.
>

You can parse the annotation with any of the many standards-based parsers,
in a large number of languages, including JavaScript.  We do not require
any reasoning or inference, even as simple as sub-classes /
sub-properties.  If we *did* require this, we would not have the current
role issue at all, as we would just use sub-properties of hasBody.  The
serialization in JSON-LD would then become:

{
  "comment": "A comment"
}

But clients would not know that comment was a body.  The number of roles
across different communities is prohibitively large to specify or take into
account in a non-reasoner-based system; hence the use of Motivations.
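A minimal Python sketch of the problem, assuming a client that does no inference: it can only find bodies under keys it already knows, so a community-specific sub-property such as `comment` is invisible to it. The `KNOWN_BODY_KEYS` set and `find_bodies` function are hypothetical.

```python
# Hypothetical: the only key this non-reasoning client was written to treat as a body.
KNOWN_BODY_KEYS = {"body"}


def find_bodies(annotation: dict) -> list:
    """Return every value stored under a key the client knows means 'body'."""
    return [value for key, value in annotation.items() if key in KNOWN_BODY_KEYS]


with_role = {"body": {"role": "commenting", "source": {"value": "A comment"}}}
sub_property = {"comment": "A comment"}  # would need reasoning to be read as a body

print(len(find_bodies(with_role)))     # 1
print(len(find_bodies(sub_property)))  # 0
```

Treating each community's role as its own key would mean growing that set without limit, whereas one `body` key carrying a role value keeps the lookup fixed.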



> The simple object-based data model I've described above is very much in
> line with that goal; it conveys the necessary information that would allow
> a large number of apps and services to model their data for lossless
> interchange, with a minimum of extra development work. Following a design
> principle like this creates a strong incentive towards, and prevents a
> disincentive against, adoption by vendors.
>

I look forward to seeing a proposal for a simple, intuitive and lossless
serialization format that is somehow significantly different from the above
structures.



> By contrast, inheriting a set of additional requirements from Linked
> Data/RDF increases the complexity of the model, both in the number and type
> of properties and in the rigidity of the structure of the data.


A predictable structure for data rather than a soup of triples that
developers must fish around in for information is actually a strong feature
of our work, not a bug.  We could very easily loosen the requirements and
make interoperability significantly harder for everyone.



> So, as a measure of the universality of appeal and ease of adoption,
> requiring Linked Data/RDF is an additional burden that should not be part
> of the simplest possible data model.
>

-1


> However, I'm not going so far as that, for two reasons:
> * There are many existing vendors who do want the features that are
> available (only?) through Linked Data/RDF
> * It's possible that some of these features may add significant value
> above and beyond what the minimum viable data model would include, and thus
> be a more tempting implementation target.
>
> If this is what we as a WG believe, then we should clearly identify and
> communicate what value is added by the addition of these design
> constraints, in a concise, concrete, and compelling explanation.


+1.

As Ivan, Stian, Jacob, Raphael and Benjamin have already said, there are
two primary drivers:

* External Integration
    We do not know how annotations will be used by different systems.  The
use by ebook readers, either online or not, by browsers, by different
communities, by existing and novel applications, will all have different
structures *in which* annotations are managed.  The advantage of keeping
protocol and model separate is that we explicitly allow interoperability
between those systems without mandating specific interactions between
client and server.

As an example: IDPF needs annotation collections with strong metadata, such
that those collections of annotations can be managed and even sold by
vendors.  We are not going to be able to meet all of those requirements,
nor should we expect to.  With a pure JSON format, this would not be
possible in a coherent way, other than what amounts to cut and pasting.


* Managed Extensibility
   Extensibility, as also brought up by Dinesh, would be a completely
chaotic free-for-all without some overarching framework that specifies how
the different communities and applications can add their own needed
features.  Imagine if everyone just added new HTML tags at will.  Without
the mapping to uniquely identified properties (in @context), there would be
no way to distinguish between two different communities using the same key
for different purposes.

As an example:  IIIF needs to be able to associate dynamic services with
the image resources either annotated, or used as the body of annotations,
to allow rich client interfaces to zoom and pan around those very high
resolution images.  Without being able to define (or in fact re-use) the
notion of a service associated with a resource, IIIF would either have to
require this to be part of the basic annotation model (which would be
inappropriate), simply throw the information into the JSON and hope it
doesn't collide with other "service" keys elsewhere, or not use the
model at all.


* And, in my opinion, building on the work of previous groups puts us in a
much stronger position for success than abandoning all previous work and
constructing an annotation-specific abstract model, vocabulary,
serialization and protocol, each of which requires documentation,
implementation and testing.  If there is, in fact, existing work that we
would be building on for this new model, please do let us know so it can be
evaluated.


* Further, there is difficulty in implementing all sorts of specifications,
for example HTTP. I don't expect that we'll abandon that specification,
however. Why not? I expect (please correct me if I'm wrong) it's because
there are implementations already available, so developers do not
need to worry about the details and can just do something like:

import requests
html = requests.get("http://cnn.com/").text

and get back a representation of that web page.

Before the end of this working group's process, there will be multiple
implementations with tests for the features in the specification.  At that
point, libraries will likewise be available to developers for Annotations.
Thus the issue of simplicity for developers will be solved
by having a rigorous and consistent model, with a well-implemented and
tested API that exposes the annotation's information to developers in a
useful and as-flexible-as-needed way.
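Returning to Managed Extensibility and the IIIF "service" example: the role of @context can be sketched in Python with a deliberately simplified expansion, in which each document's short keys are mapped to full IRIs through its own context, so two communities can both say "service" without colliding. This is not a JSON-LD processor (a real one handles prefixes, nested and remote contexts, and much more), and both context IRIs below are hypothetical placeholders, not the actual IIIF or any other community's vocabulary.

```python
def expand_keys(doc: dict) -> dict:
    """Replace each key with the IRI that the document's @context maps it to.

    Simplified sketch: only flat term -> IRI mappings in the document's own
    @context are handled; unmapped keys are kept unchanged.
    """
    context = doc.get("@context", {})
    return {context.get(key, key): value for key, value in doc.items() if key != "@context"}


# Two hypothetical communities that both chose the short key "service".
image_body = {
    "@context": {"service": "http://example.org/image-api#service"},
    "service": "http://example.org/images/1/info",
}
other_body = {
    "@context": {"service": "http://example.com/vendor#service"},
    "service": "subscription",
}

a = expand_keys(image_body)
b = expand_keys(other_body)
print(set(a) & set(b))  # set(): the expanded keys no longer collide
```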


Rob

-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Tuesday, 18 August 2015 18:29:12 UTC