- From: Doug Schepers <schepers@w3.org>
- Date: Tue, 18 Aug 2015 01:16:50 -0400
- To: W3C Public Annotation List <public-annotation@w3.org>
Hi, folks–

During a conversation with Rob, Frederick, and Ivan, we realized that we have different conceptions about what the core of the "data model" is, which has led to some of the misunderstandings about what is possible and desirable.

My idea of the Data Model has always rested on the notion of objects with properties, which is informed by my JavaScript background.

The way I've been thinking about the data model is as a set of objects with child objects and properties, where the properties are name-value pairs:

* we have an Annotation object, with some properties like id, author, timestamp and other provenance properties, and a role/motivation;
* the Annotation object also has one or more child objects of Body or Target type:
** the Body object has properties like id, type, format, language, and value/content
** the Target object has properties like id, type, source, and one or more Selector objects
*** the Selector object has properties like id, type, value, and other type-specific properties

Thus, it seems perfectly normal that we can add arbitrary properties, or even objects, to any of these objects, in order to add information about it, or to deliberately move properties from the parent object to one of the child objects to change the level at which something is specified. For example, as in the copy-edit use case, if there are two different types of Body, one for the replacement text and one for comments or explanation about a replacement, moving the role/motivation property from the Annotation object to the child Body objects seems reasonable.

This relatively unstructured, self-contained object-property system was the full extent of my notion of the data model.

The data model, of course, is separate from the serialization, which could be expressed as JSON, JSON-LD, HTML, Turtle, or whatever other format is desired.

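To make the object-property view concrete, here is a rough TypeScript sketch of the structure described above. It is only an illustration: the interface names, the "bodies"/"targets"/"selectors" property names, the sample values, and the effectiveRole helper are all assumptions made for this example, not anything taken from the Open Annotation model or a WG draft.

```typescript
// Hypothetical shapes; property names follow the informal list above.
interface SelectorObject {
  id: string;
  type: string;
  value: string;
  [extra: string]: unknown; // other type-specific properties
}

interface BodyObject {
  id: string;
  type: string;
  format?: string;
  language?: string;
  value: string;
  role?: string; // role/motivation moved down from the Annotation
  [extra: string]: unknown; // arbitrary additional properties
}

interface TargetObject {
  id: string;
  type: string;
  source: string;
  selectors: SelectorObject[];
  [extra: string]: unknown;
}

interface AnnotationObject {
  id: string;
  author: string;
  timestamp: string;
  role?: string; // role/motivation for the annotation as a whole
  bodies: BodyObject[];
  targets: TargetObject[];
  [extra: string]: unknown;
}

// Copy-edit use case: one body carries the replacement text, the other
// carries a comment about it, and each body has its own role.
// All values below are placeholders.
const copyEdit: AnnotationObject = {
  id: "anno1",
  author: "example-author",
  timestamp: "2015-08-18T00:00:00Z",
  bodies: [
    { id: "body1", type: "text", value: "replacement text", role: "editing" },
    { id: "body2", type: "text", value: "why the change was made", role: "commenting" },
  ],
  targets: [
    {
      id: "target1",
      type: "text",
      source: "http://example.org/page",
      selectors: [{ id: "sel1", type: "quote", value: "original text" }],
    },
  ],
};

// A body-level role, when present, takes precedence over the annotation-level one.
function effectiveRole(annotation: AnnotationObject, body: BodyObject): string | undefined {
  return body.role !== undefined ? body.role : annotation.role;
}

const roles = copyEdit.bodies.map((body) => effectiveRole(copyEdit, body));

// The model is independent of serialization; JSON is just one option.
const roundTripped: AnnotationObject = JSON.parse(JSON.stringify(copyEdit));
```

In this sketch, moving the role from the annotation to its bodies changes only where the property sits, not the overall shape of the object.
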
Others in the WG, especially those from the Open Annotation Community Group, seem to have an additional set of constraints on top of this object-property data model, as RDF or Linked Data. I can't claim to understand all the details, but it seems to consist of at least:

* strong datatyping, with a URI-reference system to type definitions
* a subject–predicate–object triple "grammar" for the objects and properties
* unusual, but apparently optional, "predicate" names (e.g. "hasBody")
* a requirement that each object (or subobject) be independently addressable on the Web
* a notation that expresses each name-value property pair as an "assertion", where each assertion has a global scope not confined to the annotation
* a peculiar behavior around lists (which I don't really understand)

(Please correct me if I'm wrong.)

Some combination of these additional constraints seems to impose a rigid syntactic/semantic object structure that makes it more difficult to express objects with flexible property specificity. This leads to an object structure with additional nesting and sets of properties that I don't personally find intuitive, and which I suspect other JavaScript developers won't either. Again, the example of the copy-edit use case, with roles/motivations on the body, seems to be difficult to express concisely or simply.

That said, structuring the annotation objects this way seems to add some ability to parse the annotation through an "RDF reasoner" to help make derivative assertions about the annotation body and target, with other annotations or data. I am not totally clear on this, but I'm open to the idea that this has some important effects.

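As a rough illustration of the triple "grammar" mentioned in the list above, the following TypeScript sketch flattens a nested object into subject-predicate-object triples, using each object's id as the subject. It is a deliberate simplification under stated assumptions: the ModelObject type, the toTriples function, and the example URIs are hypothetical, the predicate names other than "hasBody" are chosen only for illustration, and real RDF additionally distinguishes URIs from literals and handles datatypes and lists, none of which is modeled here.

```typescript
// A triple is [subject, predicate, object], all as plain strings here.
type Triple = [string, string, string];

// Hypothetical nested-object shape: every object has an id, and every
// other property is a string, a child object, or an array of child objects.
type PropertyValue = string | ModelObject | ModelObject[];
interface ModelObject {
  id: string;
  [property: string]: PropertyValue;
}

// Walk the nested structure and emit one triple per name-value pair.
function toTriples(obj: ModelObject): Triple[] {
  const triples: Triple[] = [];
  for (const name of Object.keys(obj)) {
    if (name === "id") continue; // the id is the subject, not a property
    const value = obj[name];
    if (value === undefined) continue;
    if (typeof value === "string") {
      triples.push([obj.id, name, value]);
    } else {
      const children: ModelObject[] = Array.isArray(value) ? value : [value];
      for (const child of children) {
        // Link parent to child by the child's id, then describe the child.
        triples.push([obj.id, name, child.id]);
        triples.push(...toTriples(child));
      }
    }
  }
  return triples;
}

// Placeholder data; each object has its own URI so it can be a subject.
const annotation: ModelObject = {
  id: "http://example.org/anno1",
  hasBody: { id: "http://example.org/body1", value: "replacement text" },
  hasTarget: { id: "http://example.org/target1", hasSource: "http://example.org/page" },
};

const triples = toTriples(annotation);
for (const [subject, predicate, object] of triples) {
  console.log(subject, predicate, object);
}
```

Because each statement in this form names its subject explicitly, it no longer depends on its position inside the parent object, which may be one way to read the "global scope" and "independently addressable" points above.
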
So, by all agreeing to use the Open Annotation Data Model as a starting point, we seem to have been agreeing to different fundamental understandings of what that data model consists of:

1) a nested object-property data model; or
2) an RDF triple data model, with all the concomitant constraints.

I hope I've characterized it fairly, and that we can use this shared understanding to better discuss what we want and need. If not, I welcome a more accurate description of these two data models.

With that as the (rough) basis, I'd like to extrapolate a bit.

One could reasonably argue that the standardized interchange format between annotation applications should be the simplest common set of features, perhaps with some low-cost extras that fit nicely and which enhance the format in a way that enables the minimum viable product for the most prevalent apps.

The simple object-based data model I've described above is very much in line with that goal; it conveys the necessary information that would allow a large number of apps and services to model their data for lossless interchange, with a minimum of extra development work. Following a design principle like this creates a strong incentive for vendors to adopt the format, and avoids creating a disincentive.

By contrast, inheriting a set of additional requirements from Linked Data/RDF increases the complexity of the model, both in the number and type of properties and in the rigidity of the structure of the data.

So, as a measure of the universality of appeal and ease of adoption, requiring Linked Data/RDF is an additional burden that should not be part of the simplest possible data model.

However, I'm not going so far as that, for two reasons:

* There are many existing vendors who do want the features that are available (only?) through Linked Data/RDF
* It's possible that some of these features may add significant value above and beyond what the minimum viable data model would include, and thus be a more tempting implementation target.

If this is what we as a WG believe, then we should clearly identify and communicate what value is added by the addition of these design constraints, in a concise, concrete, and compelling explanation. I don't believe it's enough to cite conformance to some document of architectural principles without describing precisely how these benefits convey at the level we're talking about.

In addition, I think we should continue to strive to make the smallest possible impact on complexity of understanding (for Web developers) and implementation (for vendors). We've taken steps in that direction, and I'd like to see that continue.

I feel like I'm probably in the minority on some of these views (within the WG, not necessarily in the wider developer community), so if anyone (inside the WG or outside of it) shares similar notions, I'd appreciate hearing from you.

Regards–
–Doug
Received on Tuesday, 18 August 2015 05:17:05 UTC