Data Model Assumptions

Hi, folks–

During a conversation with Rob, Frederick, and Ivan, we realized that we 
have different conceptions of what the core of the "data model" is, 
which has led to some misunderstandings about what is possible and 
desirable.


My idea of the Data Model has always rested on the notion of objects 
with properties, which is informed by my JavaScript background.

The way I've been thinking about the data model is as a set of objects 
with child objects and properties, where the properties are name-value 
pairs:
* we have an Annotation object, with some properties like id, author, 
timestamp and other provenance properties, and a role/motivation;
** the Annotation object also has one or more child objects of Body or 
Target type:
** the Body object has properties like id, type, format, language, and 
value/content
** the Target object has properties like id, type, source, and one or 
more Selector objects
*** the Selector object has properties like id, type, value, and other 
type-specific properties
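
To make this concrete, here's a rough sketch of the kind of object I 
have in mind, in plain JSON; the particular names and values are just 
illustrative, not a proposal for actual vocabulary:

    {
      "id": "anno1",
      "author": "doug",
      "timestamp": "2015-08-18T05:17:05Z",
      "motivation": "commenting",
      "body": {
        "type": "text",
        "format": "text/plain",
        "language": "en",
        "value": "This sentence is unclear."
      },
      "target": {
        "source": "http://example.org/page.html",
        "selector": {
          "type": "TextQuoteSelector",
          "exact": "the sentence in question"
        }
      }
    }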

Thus, it seems perfectly normal that we can add arbitrary properties, or 
even objects, to any of these objects in order to add information about 
them, or to deliberately move properties from the parent object to one 
of its child objects to change the level at which that information 
applies.

For example, as in the copy-edit use case, if there are two different 
types of Body, one for the replacement text and one for comments or 
explanation about a replacement, moving the role/motivation property 
from the Annotation object to the child Body objects seems reasonable.
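
In that spirit, the copy-edit case might look something like this in my 
mental model (again, the names, especially "role", are purely 
illustrative, not agreed vocabulary):

    {
      "id": "anno2",
      "author": "editor1",
      "target": {
        "source": "http://example.org/draft.html",
        "selector": { "type": "TextQuoteSelector", "exact": "teh" }
      },
      "body": [
        { "role": "editing", "value": "the" },
        { "role": "commenting", "value": "Typo: transposed letters." }
      ]
    }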

This relatively unstructured, self-contained object-property system was 
the full extent of my notion of the data model.

The data model, of course, is separate from the serialization, which 
could be expressed as JSON, JSON-LD, HTML, Turtle, or whatever other 
format is desired.


Others in the WG, especially those from the Open Annotation Community 
Group, seem to layer an additional set of constraints on top of this 
object-property data model, in the form of RDF or Linked Data. I can't 
claim to understand all the details, but that set seems to consist of at 
least:
* strong datatyping, with URI references to type definitions
* a subject–predicate–object triple "grammar" for the objects and properties
* unusual, but apparently optional, "predicate" names (e.g. "hasBody")
* a requirement that each object (or subobject) be independently 
addressable on the Web
* a notation that expresses each name-value property pair as an 
"assertion", where each assertion has a global scope not confined to the 
annotation
* a peculiar behavior around lists (which I don't really understand)

(Please correct me if I'm wrong.)
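
As a very rough illustration of what I think those constraints look like 
in practice (the exact term names and context URL are from memory, so 
take them with a grain of salt), the first example above might be 
serialized in OA-style JSON-LD, with a context mapping names to URIs, 
"hasBody"/"hasTarget"-style predicate names, and an identifier and type 
on each node, so that every property can stand on its own as a triple:

    {
      "@context": "http://www.w3.org/ns/oa.jsonld",
      "@id": "http://example.org/anno1",
      "@type": "oa:Annotation",
      "motivatedBy": "oa:commenting",
      "hasBody": {
        "@id": "http://example.org/body1",
        "@type": ["cnt:ContentAsText", "dctypes:Text"],
        "format": "text/plain",
        "chars": "This sentence is unclear."
      },
      "hasTarget": {
        "@id": "http://example.org/target1",
        "@type": "oa:SpecificResource",
        "hasSource": "http://example.org/page.html",
        "hasSelector": {
          "@type": "oa:TextQuoteSelector",
          "exact": "the sentence in question"
        }
      }
    }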

Some combination of these additional constraints seems to impose a rigid 
syntactic/semantic object structure that makes it more difficult to 
express objects with flexible property specificity. This leads to an 
object structure with additional nesting and sets of properties that I 
don't personally find intuitive, and which I suspect other JavaScript 
developers won't find intuitive either.

Again, the copy-edit use case, with roles/motivations on the body, seems 
difficult to express concisely or simply.
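
For instance, one shape I understand has been discussed (and I may well 
be mangling the details; "hasRole" here is a placeholder of my own, not 
a citation of any spec) wraps each Body in an intermediate node that 
carries the role:

    {
      "@id": "http://example.org/anno2",
      "@type": "oa:Annotation",
      "hasBody": [
        {
          "@type": "oa:SpecificResource",
          "hasRole": "oa:editing",
          "hasSource": { "@type": "cnt:ContentAsText", "chars": "the" }
        },
        {
          "@type": "oa:SpecificResource",
          "hasRole": "oa:commenting",
          "hasSource": { "@type": "cnt:ContentAsText",
                         "chars": "Typo: transposed letters." }
        }
      ],
      "hasTarget": "http://example.org/draft.html"
    }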

That said, structuring the annotation objects this way seems to add some 
ability to parse the annotation through an "RDF reasoner" to help make 
derivative assertions about the annotation body and target, with other 
annotations or data. I am not totally clear on this, but I'm open to the 
idea that this has some important effects.


So, in all agreeing to take the Open Annotation Data Model as our 
starting point, we seem to have been agreeing to different fundamental 
understandings of what that data model consists of:
1) a nested object-property data model; or
2) an RDF triple data model, with all the concomitant constraints.

I hope I've characterized it fairly, and that we can use this shared 
understanding to better discuss what we want and need. If not, I welcome 
a more accurate description of these two data models.


With that as the (rough) basis, I'd like to extrapolate a bit.


One could reasonably argue that the standardized interchange format 
between annotation applications should be the simplest common set of 
features, perhaps with some low-cost extras that fit naturally and 
enhance the format enough to enable a minimum viable product for the 
most prevalent apps. The simple object-based data model I've described 
above is very much in line with that goal; it conveys the necessary 
information that would allow a large number of apps and services to 
model their data for lossless interchange, with a minimum of extra 
development work. Following a design principle like this creates a 
strong incentive toward adoption by vendors, and avoids creating a 
disincentive against it.

By contrast, inheriting a set of additional requirements from Linked 
Data/RDF increases the complexity of the model, both in the number and 
type of properties and in the rigidity of the data's structure. So, 
measured by universality of appeal and ease of adoption, requiring 
Linked Data/RDF is an additional burden that should not be part of the 
simplest possible data model.

However, I'm not going quite that far, for two reasons:
* There are many existing vendors who do want the features that are 
available (only?) through Linked Data/RDF;
* It's possible that some of these features add significant value above 
and beyond what the minimum viable data model would include, and thus 
make the format a more tempting implementation target.

If this is what we as a WG believe, then we should clearly identify and 
communicate what value these design constraints add, in a concise, 
concrete, and compelling explanation. I don't believe it's enough to 
cite conformance to some document of architectural principles without 
describing precisely how those benefits apply at the level we're 
talking about.

In addition, I think we should continue to strive to make the smallest 
possible impact on complexity of understanding (for Web developers) and 
implementation (for vendors). We've taken steps in that direction, and 
I'd like to see that continue.


I feel like I'm probably in the minority on some of these views (within 
the WG, not necessarily in the wider developer community), so if anyone 
(inside the WG or outside of it) shares similar notions, I'd appreciate 
hearing from you.

Regards–
–Doug
