Re: Data Model Assumptions from Benjamin Young on 2015-08-18 (public-annotation@w3.org from August 2015)

From: Benjamin Young <bigbluehat@hypothes.is>
Date: Tue, 18 Aug 2015 12:02:47 -0400
To: Doug Schepers <schepers@w3.org>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CAE3H5FL5m2vvmtC0GT=iR_F3ceOjLJu0WriLp8v8QXT+o50Gpw@mail.gmail.com>
On Tue, Aug 18, 2015 at 1:16 AM, Doug Schepers <schepers@w3.org> wrote:

> Hi, folks–
>
> During a conversation with Rob, Frederick, and Ivan, we realized that we
> have different conceptions about what the core of the "data model" is,
> which has led to some of the misunderstandings about what is possible and
> desirable.
>
>
> My idea of the Data Model has always rested on the notion of objects with
> properties, which is informed by my JavaScript background.
>

This is perhaps the key point in the discussion. JSON-LD makes it *look*
like we have a Data Model based on "objects with properties" when in fact we
have a "graph-based" Data Model (and always have):
http://www.w3.org/TR/annotation-model/#h2_web-annotation-principles

JSON-LD is an encoding of a graph, and the same graph can have multiple (and
drastically different-looking) serializations that are all equivalent,
despite the way the "JSON tree" changes in each serialization. This happens
because JSON-LD is not an object tree but a graph encoding format.
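To make the "same graph, different trees" point concrete, here is a toy sketch. It is NOT a JSON-LD processor: it assumes no @context, an explicit "@id" on every node, and JSON keys used verbatim as predicate names, and the IRIs are made up. A real processor would expand against the @context and canonicalize before comparing. It only shows that an embedded tree and a flattened @graph can yield the identical set of triples.

```python
from typing import Any

# Toy graph reader, NOT a JSON-LD processor: assumes no @context, an explicit
# "@id" on every node, and uses JSON keys verbatim as predicate names.
Triple = tuple[str, str, str]


def to_triples(node: dict[str, Any], out: set[Triple]) -> str:
    """Add the triples for `node` (and any embedded nodes) to `out`; return its @id."""
    subject = node["@id"]
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Embedded node or bare {"@id": ...} reference: the edge is the same.
                obj = to_triples(v, out)
            else:
                # Literal value; repr() keeps the string "x" distinct from an IRI x.
                obj = repr(v)
            out.add((subject, key, obj))
    return subject


def graph_of(doc: dict[str, Any]) -> set[Triple]:
    """Collect every triple in a document, with or without a top-level @graph."""
    out: set[Triple] = set()
    for node in doc.get("@graph", [doc]):
        to_triples(node, out)
    return out


# The same (hypothetical) annotation as two different JSON trees.
embedded = {
    "@id": "http://example.org/anno1",
    "@type": "Annotation",
    "body": {"@id": "http://example.org/body1", "value": "Nice paragraph"},
    "target": {"@id": "http://example.com/page1"},
}
flattened = {
    "@graph": [
        {"@id": "http://example.org/body1", "value": "Nice paragraph"},
        {
            "@id": "http://example.org/anno1",
            "@type": "Annotation",
            "body": {"@id": "http://example.org/body1"},
            "target": {"@id": "http://example.com/page1"},
        },
    ]
}

print(embedded == flattened)                      # False: the trees differ
print(graph_of(embedded) == graph_of(flattened))  # True: the graphs match
```

Code that walks the JSON tree directly has to handle both shapes; code that works on the graph does not care which one it was given.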

So...zooming back out.

There's only so much that can be done for the "JSON only" community of
developers without explaining the core of the data model and its
graph-based foundation. There will be points, like this "multiple-bodies /
roles" conversation, where the model's expression will look "strange" to
"JSON only" developers. We should work hard with the JSON-LD community and
others to try to minimize that impact, but at some level it's unavoidable
without throwing out the underlying abstract data model, which would throw
this group's work back to the beginning.

I'm not sure we can put the genie back in the bottle at this point. The
"JSON only" object + properties approach could represent some of these
scenarios more simply, but at the loss of "meaning" and (most likely)
interoperability with the few out there who are implementing this data
model (and the serializations) right now, who, afaik, are the people
active on this mailing list.

I'll surface something separately that may be a way forward with regard to
the blank nodes conversation based on some JSON-LD research.

For now, everyone go re-read those principles, and start separate threads
(please :) ), if there are problems with the *abstract* model or some
reason to abandon it in favor of Something Completely Different. If a good
case can be made at that level, then I think we've got some ground to build
up from. If not, then let's see what the simplest, friendliest JSON-LD
encoding is that we can ask others to make for the multiple-bodies
situation.

Thanks, all!
Benjamin
--
Developer Advocate
http://hypothes.is/


>
> The way I've been thinking about the data model is as a set of objects with
> child objects and properties, where the properties are name-value pairs:
> * we have an Annotation object, with some properties like id, author,
> timestamp and other provenance properties, and a role/motivation;
> * the Annotation object also has one or more child objects of Body or Target
> type:
> ** the Body object has properties like id, type, format, language, and
> value/content
> ** the Target object has properties like id, type, source, and one or more
> Selector objects
> *** the Selector object has properties like id, type, value, and other
> type-specific properties
>
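For comparison, the nested object-property model in the list above maps naturally onto plain Python dataclasses. This is a minimal sketch: class and field names follow the list, while the example values (IRIs, author, timestamp, the FragmentSelector value) are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Selector:
    type: str
    value: str
    id: Optional[str] = None
    # Any other type-specific properties, kept as free-form name-value pairs.
    extra: dict[str, str] = field(default_factory=dict)


@dataclass
class Body:
    value: str
    id: Optional[str] = None
    type: Optional[str] = None
    format: Optional[str] = None
    language: Optional[str] = None


@dataclass
class Target:
    source: str
    id: Optional[str] = None
    type: Optional[str] = None
    selectors: list[Selector] = field(default_factory=list)


@dataclass
class Annotation:
    id: str
    author: str
    timestamp: str
    motivation: str
    bodies: list[Body] = field(default_factory=list)
    targets: list[Target] = field(default_factory=list)


anno = Annotation(
    id="http://example.org/anno1",
    author="http://example.org/users/doug",
    timestamp="2015-08-18T01:16:00Z",
    motivation="commenting",
    bodies=[Body(value="Nice paragraph", format="text/plain", language="en")],
    targets=[
        Target(
            source="http://example.com/image1",
            selectors=[Selector(type="FragmentSelector", value="xywh=100,100,300,300")],
        )
    ],
)

# asdict() recursively turns the nested dataclasses into plain dicts and lists,
# i.e. the JSON-style object tree.
tree = asdict(anno)
print(tree["targets"][0]["selectors"][0]["value"])  # xywh=100,100,300,300
```

In this view, adding or moving a property is just a change to a class; nothing outside the tree is involved.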
> Thus, it seems perfectly normal that we can add arbitrary properties, or
> even objects, to any of these objects, in order to add information about
> it, or to deliberately move properties from the parent object to one of the
> child objects to change where the level of specificity is defined.
>
> For example, as in the copy-edit use case, if there are two different
> types of Body, one for the replacement text and one for comments or
> explanation about a replacement, moving the role/motivation property from
> the Annotation object to the child Body objects seems reasonable.
>
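The copy-edit case described above is easy to write and consume in this object-property shape. A minimal sketch follows. The key names and the "editing"/"commenting" values are illustrative assumptions, not a proposed serialization.

```python
# Hypothetical copy-edit annotation in the plain object-property style:
# "motivation" has moved from the annotation onto each body.
copy_edit = {
    "id": "http://example.org/anno2",
    "target": "http://example.com/page1",
    "bodies": [
        {"motivation": "editing", "value": "their"},  # the replacement text
        {"motivation": "commenting", "value": "'there' should be 'their' here."},
    ],
}


def bodies_with_motivation(annotation: dict, motivation: str) -> list[str]:
    """Return the values of all bodies that carry the given motivation."""
    return [b["value"] for b in annotation["bodies"] if b.get("motivation") == motivation]


print(bodies_with_motivation(copy_edit, "editing"))  # ['their']
```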
> This relatively unstructured, self-contained object-property system was
> the full extent of my notion of the data model.
>
> The data model, of course, is separate from the serialization, which could
> be expressed as JSON, JSON-LD, HTML, Turtle, or whatever other format is
> desired.
>
>
> Others in the WG, especially those from the Open Annotation Community
> Group, seem to have an additional set of constraints on top of this
> object-property data model, as RDF or Linked Data. I can't claim to
> understand all the details, but it seems to consist of at least:
> * strong datatyping, with a URI-reference system to type definitions
> * a subject–predicate–object triple "grammar" for the objects and
> properties
> * unusual, but apparently optional, "predicate" names (e.g. "hasBody")
> * a requirement that each object (or subobject) be independently
> addressable on the Web
> * a notation that expresses each name-value property pair as an
> "assertion", where each assertion has a global scope not confined to the
> annotation
> * a peculiar behavior around lists (which I don't really understand)
>
> (Please correct me if I'm wrong.)
>
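One way to see what the "global scope" bullet above means in practice: in a graph, a property attached to a body is a statement about that body resource itself, not about "the body as used in this annotation". The sketch below is illustrative only. It assumes two hypothetical annotations that reuse the same body by IRI, and it uses shortened, made-up predicate names rather than the model's actual vocabulary.

```python
Triple = tuple[str, str, str]

BODY = "http://example.org/comment1"  # hypothetical body IRI shared by both annotations

# Annotation A uses the shared body as a comment; annotation B uses it as a tag.
anno_a: set[Triple] = {
    ("http://example.org/annoA", "hasBody", BODY),
    (BODY, "motivation", "commenting"),
}
anno_b: set[Triple] = {
    ("http://example.org/annoB", "hasBody", BODY),
    (BODY, "motivation", "tagging"),
}

# Publishing both means merging them: a graph is just the union of its triples.
merged = anno_a | anno_b


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# The body now "has" both motivations, and nothing records which annotation said which.
print(sorted(objects(merged, BODY, "motivation")))  # ['commenting', 'tagging']
```

This is the kind of collision behind the extra nesting discussed next: the per-annotation role has to hang off something that belongs to that annotation alone.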
> Some combination of these additional constraints seems to impose a rigid
> syntactic/semantic object structure that makes it more difficult to express
> objects with flexible property specificity. This leads to an object
> structure with additional nesting and sets of properties that I don't
> personally find intuitive, and which I suspect other JavaScript developers
> won't either.
>
> Again, the example of the copy-edit use case, with roles/motivations on
> the body, seems to be difficult to express concisely or simply.
>
> That said, structuring the annotation objects this way seems to add some
> ability to parse the annotation through an "RDF reasoner" to help make
> derivative assertions about the annotation body and target, with other
> annotations or data. I am not totally clear on this, but I'm open to the
> idea that this has some important effects.
>
>
> So, by all agreeing to use the Open Annotation Data Model as a starting
> point, we seem to have been agreeing to different fundamental
> understandings of what that data model consists of:
> 1) a nested object-property data model; or
> 2) an RDF triple data model, with all the concomitant constraints.
>
> I hope I've characterized it fairly, and that we can use this shared
> understanding to better discuss what we want and need. If not, I welcome a
> more accurate description of these two data models.
>
>
> With that as the (rough) basis, I'd like to extrapolate a bit.
>
>
> One could reasonably argue that the standardized interchange format
> between annotation applications should be the simplest common set of
> features, perhaps with some low-cost extras that fit nicely and which
> enhance the format in a way that enables the minimum viable product for the
> most prevalent apps. The simple object-based data model I've described
> above is very much in line with that goal; it conveys the necessary
> information that would allow a large number of apps and services to model
> their data for lossless interchange, with a minimum of extra development
> work. Following a design principle like this creates a strong incentive
> towards, and prevents a disincentive against, adoption by vendors.
>
> By contrast, inheriting a set of additional requirements from Linked
> Data/RDF increases the complexity of the model, both in the number and type
> of properties and in the rigidity of the structure of the data. So, as a
> measure of the universality of appeal and ease of adoption, requiring
> Linked Data/RDF is an additional burden that should not be part of the
> simplest possible data model.
>
> However, I'm not going so far as that, for two reasons:
> * There are many existing vendors who do want the features that are
> available (only?) through Linked Data/RDF
> * It's possible that some of these features may add significant value
> above and beyond what the minimum viable data model would include, and thus
> be a more tempting implementation target.
>
> If this is what we as a WG believe, then we should clearly identify and
> communicate what value is added by the addition of these design
> constraints, in a concise, concrete, and compelling explanation. I don't
> believe it's enough to cite conformance to some document of architectural
> principles without describing precisely how these benefits carry over to
> the level we're talking about.
>
> In addition, I think we should continue to strive to make the smallest
> possible impact on complexity of understanding (for Web developers) and
> implementation (for vendors). We've taken steps in that direction, and I'd
> like to see that continue.
>
>
> I feel like I'm probably in the minority on some of these views (within
> the WG, not necessarily in the wider developer community), so if anyone
> (inside the WG or outside of it) shares similar notions, I'd appreciate
> hearing from you.
>
> Regards–
> –Doug
>
>
Received on Tuesday, 18 August 2015 16:03:29 UTC