Re: Data Model Assumptions from Benjamin Young on 2015-08-18 (public-annotation@w3.org from August 2015)

From: Benjamin Young <bigbluehat@hypothes.is>
Date: Tue, 18 Aug 2015 17:01:05 -0400
To: Doug Schepers <schepers@w3.org>
Cc: Robert Sanderson <azaroth42@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CAE3H5FJmQwE2y1kr0t52nU24CZJ4rjmLzGdjecfG4OEkQ6f1nw@mail.gmail.com>
On Tue, Aug 18, 2015 at 2:54 PM, Doug Schepers <schepers@w3.org> wrote:

> Hi, Rob–
>
> Perhaps it's my fault for not truly understanding the implications of
> using the OA Data Model.
>
> What I'm saying is that when we agreed to start with the OA Data Model, I
> thought:
>
> "We're going to be using an object-property model expressed in some
> JSON-like structure with some specific terms, and a vocabulary that's
> defined in @context metadata."
>
> While you thought:
>
> "We're going to be using an RDF graph structure expressed in JSON-LD, with
> specific terms and also specific constraints around how the data is
> structured."
>
> I'm not trying to turn this into a he-said-she-said, or introduce
> different concepts, or say that one approach is superior or inferior.
>
> I'm simply trying to explain the source of some of our fundamental
> disagreements. I'm sure several people had your understanding, and several
> people had mine. I'm hoping that by making that distinction explicit, we
> can move forward.
>

Regardless of the misunderstanding, confusion, etc., there seems to be a
need to put back some of the graph-related language into the "Aims of the
Model" section.

The model has never been an "object-property" model. JSON-LD doesn't
express one (despite what it looks like at times). It would be a change in
course (at this point) to redirect away from the graph-based model which
has been in use since the beginning and toward an "object-property" model.
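
To make the graph point a bit more concrete, here is a rough Python sketch
(not a real JSON-LD processor, and not anything from the spec). It walks a
nested JSON structure and lists the (subject, key, value) statements it
implies. The `to_triples` helper and the `_:anno` / `_:bN` node labels are
made up for illustration; a real processor would also map each key to an
IRI via @context, which this sketch skips.

```python
from itertools import count


def to_triples(node, subject="_:anno", ids=None):
    """Flatten a nested JSON-like dict into (subject, key, value) triples.

    Nested objects get a generated blank-node label (hypothetical naming).
    """
    ids = ids if ids is not None else count(1)
    triples = []
    for key, value in node.items():
        # Treat a single value and a list of values the same way.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                child = f"_:b{next(ids)}"
                triples.append((subject, key, child))
                triples.extend(to_triples(item, child, ids))
            else:
                triples.append((subject, key, item))
    return triples


# The first body structure from Rob's message quoted below.
annotation = {
    "body": {
        "role": "commenting",
        "source": {"value": "A comment"},
    }
}

for triple in to_triples(annotation):
    print(triple)
```

The nesting in the JSON is just one way of writing down those statements,
which is roughly what "graph-based" means in this context.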

If there are others on this list who got this impression--that the model is
*not* graph-based--please speak up.

We obviously need to do more to clear up that misconception while in tandem
solving this multiple-bodies problem.

Doug, if you could look through the "Aims of the Model" and more of the
model's introduction to see what could be made clearer to:
a) point out what's actually going on
b) not "scare" too many people ;)
...that'd be great.

We don't want this confusion to happen again, and the wider the audience
gets, the more likely it will be to happen, so your help here would be
super.


>
> I'm still hopeful that we can come up with a structure that fits both the
> JSON and JSON-LD mindset.
>

Given Rob's recent posts, I think we may be getting quite close.

At this point, we should probably surface some new threads with clear
examples for single user stories, and then pick them to pieces on their own
merits (ideally with some code ;) ).
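
As a starting point for that kind of thread, here is a small Python sketch
of what a consumer might do with the three body shapes in Rob's message
quoted below. The `describe_body` helper and its return keys (`role`,
`text`, `uri`) are invented for this example and are not part of any
proposal; the selector is left out because its contents are elided in the
original.

```python
def describe_body(annotation):
    """Reduce the body shapes discussed in this thread to one flat view.

    Hypothetical helper: returns the role plus either inline text or a URI.
    """
    body = annotation["body"]
    role = body.get("role")
    if "value" in body:
        # Embedded content carrying the role directly (Tim's proposal).
        return {"role": role, "text": body["value"], "uri": None}
    source = body.get("source")
    if isinstance(source, dict):
        # Role on a wrapper whose source is an inline resource.
        return {"role": role, "text": source.get("value"), "uri": None}
    # Role on a wrapper whose source is an external URL.
    return {"role": role, "text": None, "uri": source}


wrapped_inline = {"body": {"role": "commenting",
                           "source": {"value": "A comment"}}}
wrapped_external = {"body": {"role": "commenting",
                             "source": "http://some.url/"}}
embedded = {"body": {"type": "Embedded", "role": "commenting",
                     "value": "A comment"}}

for example in (wrapped_inline, wrapped_external, embedded):
    print(describe_body(example))
```

Each extra shape we allow adds a branch like these to every consumer, so
it's a handy way to compare proposals.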

Thanks for being here everyone,
Benjamin
--
Developer Advocate
http://hypothes.is/


>
> Regards–
> –Doug
>
> On 8/18/15 2:28 PM, Robert Sanderson wrote:
>
>>
>>
>> On Mon, Aug 17, 2015 at 10:16 PM, Doug Schepers <schepers@w3.org
>> <mailto:schepers@w3.org>> wrote:
>>
>>     My idea of the Data Model has always rested on the notion of objects
>>     with properties, which is informed by my JavaScript background.
>>
>>
>> I admit to being confused as to where this notion comes from.
>>
>> * The charter explicitly states that the data model "will start from the
>> Open Annotation Data Model".
>>
>> * That data model, in section 1.1 Aims of the Model, clearly states the
>> relationship to RDF and the approach taken:
>>
>>     A single, consistent model that can be used by all interested
>>     parties is the goal of the standardization process. The number of
>>     RDF triples required or bytes needed for serializations, while a
>>     consideration, is less important than the coherency of the model.
>>     All efforts are made to keep the implementation costs for both
>>     producers and consumers to a minimum. A single method of fulfilling
>>     a use case is strongly preferred over multiple methods, unless there
>>     are existing standards that need to be accommodated or there is a
>>     significant cost associated with a method that is otherwise necessary.
>>
>> * This Working Group, at Doug's suggestion, removed all of the RDF /
>> Linked Data language from the specification, such as can be seen in the
>> equivalent section of our FPWD:
>>
>>     The Web Annotation Data Model is a single, consistent model that can
>>     be used by all interested parties. All efforts have been made to
>>     keep the implementation costs for both producers and consumers to a
>>     minimum. A single method of fulfilling a use case is strongly
>>     preferred over multiple methods, unless there are existing standards
>>     that need to be accommodated or there is a significant cost
>>     associated with a method that is otherwise necessary.
>>
>>
>> But has not introduced any further model.
>>
>> * There is a section on Principles in the model that lays out the
>> abstract data model. It asserts, without reference to RDF or linked
>> data, the fundamentals of the model.  It does not say there are objects
>> and properties either.
>>
>> * To my recollection, and happy to be proven wrong, we have never
>> discussed a model that is more abstract than, or even just different
>> from, what is already laid out in the annotation-model specification.
>> Unless there is something that I'm missing or forgetting? Please do
>> provide pointers to any discussions or documents that suggest this
>> object/property model.
>>
>>
>>     Others in the WG, especially those from the Open Annotation
>>     Community Group, seem to have an additional set of constraints on
>>     top of this object-property data model, as RDF or Linked Data.
>>
>>
>> As we currently do not have this notion of an object/property based data
>> model, I don't think we're adding constraints on top of it.
>>
>>
>>     The consequence of some combination of these additional constraints
>>     seems to impose a rigid syntactic/semantic object structure that
>>     makes it more difficult to express objects with flexible property
>>     specificity.
>>
>>
>> It leads to a consistent, coherent model where developers can be
>> confident that they can write code against a structure that will meet
>> all of their needs, rather than having to write many little tests to see
>> which of the myriad of possibilities each particular annotation is using.
>>
>>     This leads to an object structure with additional nesting and sets
>>     of properties that I don't personally find intuitive, and which I
>>     suspect other JavaScript developers won't either.
>>
>>
>> And there are a lot of JavaScript developers who are perfectly happy
>> with it too.  And a lot of non-JavaScript developers beyond that.
>>
>>
>>     Again, the example of the copy-edit use case, with roles/motivations
>>     on the body, seems to be difficult to express concisely or simply.
>>
>>
>> I understand that you don't find this structure simple:
>>
>> {
>>    "body": {
>>      "role": "commenting",
>>      "source": {
>>        "value": "A comment"
>>      }
>>    }
>> }
>>
>> Which is as complex as it gets, even with the most restrictive proposal
>> (mine).
>> Note that we have to allow it regardless of whether we also allow other
>> patterns, as for external resources, with segments we would have:
>>
>> {
>>    "body": {
>>      "role": "commenting",
>>      "selector": { ...},
>>      "source": "http://some.url/"
>>    }
>> }
>>
>> If we use Tim's proposal to also allow role on Embedded Content when it
>> is used as a Body:
>>
>> {
>>    "body": {
>>      "type": "Embedded",
>>      "role": "commenting",
>>      "value": "A comment"
>>    }
>> }
>>
>> If there is something more simple and intuitive than even this, I
>> strongly invite you to suggest it.
>>
>>
>>     That said, structuring the annotation objects this way seems to add
>>     some ability to parse the annotation through an "RDF reasoner" to
>>     help make derivative assertions about the annotation body and
>>     target, with other annotations or data. I am not totally clear on
>>     this, but I'm open to the idea that this has some important effects.
>>
>>
>> You can parse the annotation with any of the many standards-based
>> parsers, in a large number of languages, including JavaScript.  We do
>> not require any reasoning or inference, even as simple as sub-classes /
>> sub-properties.  If we *did* require this, we would not have the current
>> role issue at all, as we would just use sub properties of hasBody.  The
>> serialization in JSON-LD would then become:
>>
>> {
>>    "comment": "A comment"
>> }
>>
>> But clients would not know that comment was a body.  The number of roles
>> across different communities is prohibitively large to specify or take
>> into account in a non-reasoner-based system, and hence the use of
>> Motivations.
>>
>>
>>     The simple object-based data model I've described above is very much
>>     in line with that goal; it conveys the necessary information that
>>     would allow a large number of apps and services to model their data
>>     for lossless interchange, with a minimum of extra development work.
>>     Following a design principle like this creates a strong incentive
>>     towards, and prevents a disincentive against, adoption by vendors.
>>
>>
>> I look forward to seeing a proposal of a simple, intuitive and lossless
>> serialization format that is somehow significantly different to the
>> above structures.
>>
>>     By contrast, inheriting a set of additional requirements from Linked
>>     Data/RDF increases the complexity of the model, both in the number
>>     and type of properties and in the rigidity of the structure of the
>>     data.
>>
>>
>> A predictable structure for data rather than a soup of triples that
>> developers must fish around in for information is actually a strong
>> feature of our work, not a bug.  We could very easily loosen the
>> requirements and make interoperability significantly harder for everyone.
>>
>>     So, as a measure of the universality of appeal and ease of adoption,
>>     requiring Linked Data/RDF is an additional burden that should not be
>>     part of the simplest possible data model.
>>
>>
>> -1
>>
>>     However, I'm not going so far as that, for two reasons:
>>     * There are many existing vendors who do want the features that are
>>     available (only?) through Linked Data/RDF
>>     * It's possible that some of these features may add significant
>>     value above and beyond what the minimum viable data model would
>>     include, and thus be a more tempting implementation target.
>>
>>     If this is what we as a WG believe, then we should clearly identify
>>     and communicate what value is added by the addition of these design
>>     constraints, in a concise, concrete, and compelling explanation.
>>
>>
>> +1.
>>
>> As Ivan, Stian, Jacob, Raphael and Benjamin have already said, there are
>> two primary drivers:
>>
>> * External Integration
>>      We do not know how annotations will be used by different systems.
>> The use by ebook readers, either online or not, by browsers, by
>> different communities, by existing and novel applications, will all have
>> different structures -in which- annotations are managed.  The advantage
>> of keeping protocol and model separate is that we explicitly allow
>> interoperability between those systems without mandating specific
>> interactions between client and server.
>>
>> As an example: IDPF needs annotation collections with strong metadata,
>> such that those collections of annotations can be managed and even sold
>> by vendors.  We are not going to be able to meet all of those
>> requirements, nor should we expect to.  With a pure JSON format, this
>> would not be possible in a coherent way, other than what amounts to cut
>> and pasting.
>>
>>
>> * Managed Extensibility
>>     Extensibility, as also brought up by Dinesh, would be a completely
>> chaotic free-for-all without some overarching framework that specifies
>> how the different communities and applications can add their own needed
>> features.  Imagine if everyone just added new HTML tags at will.
>> Without the mapping to uniquely identified properties (in @context),
>> there would be no way to distinguish between two different communities
>> using the same key for different purposes.
>>
>> As an example:  IIIF needs to be able to associate dynamic services with
>> the image resources either annotated, or used as the body of
>> annotations, to allow rich client interfaces to zoom and pan around
>> those very high resolution images.  Without being able to define (or in
>> fact re-use) the notion of a service associated with a resource, IIIF
>> would either require this to be part of the basic annotation model
>> (which would be inappropriate), to simply throw the information into the
>> JSON and hope it doesn't collide with other "service" keys elsewhere, or
>> to not use the model at all.
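
A rough Python sketch of the key-collision point above: two communities
both use the key "service", and a per-community mapping (standing in for
@context) resolves each to a distinct identifier. The `expand_keys` helper
and both example.org IRIs are hypothetical, and this is far simpler than
real JSON-LD expansion.

```python
def expand_keys(doc, context):
    """Replace each short key with the IRI the given context assigns to it.

    Keys without a mapping are left unchanged (a simplification).
    """
    return {context.get(key, key): value for key, value in doc.items()}


# Hypothetical contexts: same short key, different meanings.
image_context = {"service": "http://example.org/image-api#service"}
other_context = {"service": "http://example.org/other-vocab#service"}

doc = {"service": "http://example.org/some-endpoint"}

print(expand_keys(doc, image_context))
print(expand_keys(doc, other_context))
```

Without the mapping, both documents carry an indistinguishable "service"
key; with it, a consumer can tell which meaning was intended.
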
>>
>>
>> * And, in my opinion, building on the work of previous groups puts us in
>> a much stronger position for success than abandoning all previous work
>> and constructing an annotation-specific abstract model, vocabulary,
>> serialization and protocol, each of which requires documentation,
>> implementation and testing.  If there is, actually, existing work that
>> we would be building on for this new model, please do let us know so it
>> can be evaluated.
>>
>>
>> * Further, there is difficulty in implementing all sorts of
>> specifications, for example HTTP. I don't expect that we'll abandon that
>> specification, however. Why not? I expect, please correct me if I'm
>> wrong, it's because there are implementations already available that
>> mean developers do not need to worry about the details, and can just do
>> something like:
>>
>> import requests
>> html = requests.get("http://cnn.com/").text
>>
>> and get back a representation of that web page.
>>
>> Before the end of this working group's process, there will be multiple
>> implementations with tests for the features in the specification.  At
>> that point, libraries for developers will also be available for
>> Annotations.  Thus the simplicity-for-developers issue will be solved
>> by having a rigorous and consistent model, with a well
>> implemented and tested API that exposes the annotation's information to
>> developers in a useful and as-flexible-as-needed way.
>>
>>
>> Rob
>>
>> --
>> Rob Sanderson
>> Information Standards Advocate
>> Digital Library Systems and Services
>> Stanford, CA 94305
>>
>
>
Received on Tuesday, 18 August 2015 21:01:34 UTC