- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Tue, 18 Aug 2015 11:28:37 -0700
- To: Doug Schepers <schepers@w3.org>
- Cc: W3C Public Annotation List <public-annotation@w3.org>
- Message-ID: <CABevsUEXVtOxnq4b3VN7x8vPgYvxqY_tq-ujMoNi7rVGij=6dA@mail.gmail.com>
On Mon, Aug 17, 2015 at 10:16 PM, Doug Schepers <schepers@w3.org> wrote:

> My idea of the Data Model has always rested on the notion of objects with
> properties, which is informed by my JavaScript background.

I admit to being confused as to where this notion comes from.

* The charter explicitly states that the data model "will start from the Open Annotation Data Model".

* That data model, in section 1.1 Aims of the Model, clearly states the relationship to RDF and the approach taken:

    A single, consistent model that can be used by all interested parties is the goal of the standardization process. The number of RDF triples required or bytes needed for serializations, while a consideration, is less important than the coherency of the model. All efforts are made to keep the implementation costs for both producers and consumers to a minimum. A single method of fulfilling a use case is strongly preferred over multiple methods, unless there are existing standards that need to be accommodated or there is a significant cost associated with a method that is otherwise necessary.

* This Working Group, at Doug's suggestion, removed all of the RDF / Linked Data language from the specification, such as can be seen in the equivalent section of our FPWD:

    The Web Annotation Data Model is a single, consistent model that can be used by all interested parties. All efforts have been made to keep the implementation costs for both producers and consumers to a minimum. A single method of fulfilling a use case is strongly preferred over multiple methods, unless there are existing standards that need to be accommodated or there is a significant cost associated with a method that is otherwise necessary.

  But has not introduced any further model.

* There is a section on Principles in the model that lays out the abstract data model. It asserts, without reference to RDF or linked data, the fundamentals of the model. It does not say there are objects and properties either.

* To my recollection, and happy to be proven wrong, we have never discussed a model that is more abstract than, or even just different from, what is already laid out in the annotation-model specification.

Unless there is something that I'm missing or forgetting? Please do provide pointers to any discussions or documents that suggest this object/property model.

> Others in the WG, especially those from the Open Annotation Community
> Group, seem to have an additional set of constraints on top of this
> object-property data model, as RDF or Linked Data.

As we currently do not have this notion of an object/property based data model, I don't think we're adding constraints on top of it.

> The consequence of some combination of these additional constraints seems
> to impose a rigid syntactic/semantic object structure that makes it more
> difficult to express objects with flexible property specificity.

It leads to a consistent, coherent model where developers can be confident that they can write code against a structure that will meet all of their needs, rather than having to write many little tests to see which of the myriad possibilities each particular annotation is using.

> This leads to an object structure with additional nesting and sets of
> properties that I don't personally find intuitive, and which I suspect
> other JavaScript developers won't either.

And there are a lot of JavaScript developers who are perfectly happy with it too. And a lot of non-JavaScript developers beyond that.

> Again, the example of the copy-edit use case, with roles/motivations on the
> body, seems to be difficult to express concisely or simply.

I understand that you don't find this structure simple:

    {
      "body": {
        "role": "commenting",
        "source": {
          "value": "A comment"
        }
      }
    }

Which is as complex as it gets, even with the most restrictive proposal (mine).

Note that we have to allow it regardless of whether we also allow other patterns, because for external resources with segments we would have:

    {
      "body": {
        "role": "commenting",
        "selector": { ...},
        "source": "http://some.url/"
      }
    }

If we use Tim's proposal to also allow role on Embedded Content when it is used as a Body:

    {
      "body": {
        "type": "Embedded",
        "role": "commenting",
        "value": "A comment"
      }
    }

If there is something more simple and intuitive than even this, I strongly invite you to suggest it.

> That said, structuring the annotation objects this way seems to add some
> ability to parse the annotation through an "RDF reasoner" to help make
> derivative assertions about the annotation body and target, with other
> annotations or data. I am not totally clear on this, but I'm open to the
> idea that this has some important effects.

You can parse the annotation with any of the many standards-based parsers, in a large number of languages, including JavaScript.

We do not require any reasoning or inference, even as simple as sub-classes / sub-properties. If we *did* require this, we would not have the current role issue at all, as we would just use sub-properties of hasBody. The serialization in JSON-LD would then become:

    {
      "comment": "A comment"
    }

But clients would not know that comment was a body. The number of roles across different communities is prohibitively large to specify or take into account in a non-reasoner-based system, and hence the use of Motivations.

> The simple object-based data model I've described above is very much in
> line with that goal; it conveys the necessary information that would allow
> a large number of apps and services to model their data for lossless
> interchange, with a minimum of extra development work. Following a design
> principle like this creates a strong incentive towards, and prevents a
> disincentive against, adoption by vendors.

I look forward to seeing a proposal of a simple, intuitive and lossless serialization format that is somehow significantly different to the above structures.

> By contrast, inheriting a set of additional requirements from Linked
> Data/RDF increases the complexity of the model, both in the number and type
> of properties and in the rigidity of the structure of the data.

A predictable structure for data rather than a soup of triples that developers must fish around in for information is actually a strong feature of our work, not a bug. We could very easily loosen the requirements and make interoperability significantly harder for everyone.

> So, as a measure of the universality of appeal and ease of adoption,
> requiring Linked Data/RDF is an additional burden that should not be part
> of the simplest possible data model.

-1

> However, I'm not going so far as that, for two reasons:
> * There are many existing vendors who do want the features that are
> available (only?) through Linked Data/RDF
> * It's possible that some of these features may add significant value
> above and beyond what the minimum viable data model would include, and thus
> be a more tempting implementation target.
>
> If this is what we as a WG believe, then we should clearly identify and
> communicate what value is added by the addition of these design
> constraints, in a concise, concrete, and compelling explanation.

+1. As Ivan, Stian, Jacob, Raphael and Benjamin have already said, there are two primary drivers:

* External Integration

We do not know how annotations will be used by different systems. The use by ebook readers, either online or not, by browsers, by different communities, by existing and novel applications, will all have different structures *in which* annotations are managed. The advantage of keeping protocol and model separate is that we explicitly allow interoperability between those systems without mandating specific interactions between client and server.

As an example: IDPF needs annotation collections with strong metadata, such that those collections of annotations can be managed and even sold by vendors. We are not going to be able to meet all of those requirements, nor should we expect to. With a pure JSON format, this would not be possible in a coherent way, other than what amounts to cutting and pasting.

* Managed Extensibility

Extensibility, as also brought up by Dinesh, would be a completely chaotic free-for-all without some overarching framework that specifies how the different communities and applications can add their own needed features. Imagine if everyone just added new HTML tags at will. Without the mapping to uniquely identified properties (in @context), there would be no way to distinguish between two different communities using the same key for different purposes.

As an example: IIIF needs to be able to associate dynamic services with the image resources either annotated, or used as the body of annotations, to allow rich client interfaces to zoom and pan around those very high resolution images. Without being able to define (or in fact re-use) the notion of a service associated with a resource, IIIF would have to either require this to be part of the basic annotation model (which would be inappropriate), simply throw the information into the JSON and hope it doesn't collide with other "service" keys elsewhere, or not use the model at all.

* And, in my opinion, building on the work of previous groups puts us in a much stronger position for success than abandoning all previous work and constructing an annotation specific abstract model, vocabulary, serialization and protocol, each of which requires documentation, implementation and testing. If there is, actually, existing work that we would be building on for this new model, please do let us know so it can be evaluated.

* Further, there is difficulty in implementing all sorts of specifications, for example HTTP.
I don't expect that we'll abandon that specification, however. Why not? I expect, please correct me if I'm wrong, it's because there are implementations already available that mean developers do not need to worry about the details, and can just do something like:

    import requests
    html = requests.get("http://cnn.com/").text

and get back a representation of that web page.

Before the end of this working group's process, there will be multiple implementations with tests for the features in the specification. At that point, libraries will be available to developers for Annotations as well.

Thus the simplicity for developers issue will be solved by having a rigorous and consistent model, with a well implemented and tested API that exposes the annotation's information to developers in a useful and as-flexible-as-needed way.

Rob

-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
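
For illustration only: the three body shapes quoted in the message (content wrapped in "source", an external "source" URL with a selector, and role placed directly on Embedded content) could be read by a consumer roughly as follows. This is a minimal sketch, not any specification's API; the function name `describe_body` and its (role, content) return shape are assumptions made up for this example.

```python
def describe_body(body):
    """Return (role, content) for one annotation body.

    Handles only the three body shapes quoted in the message; the
    function name and return shape are hypothetical.
    """
    role = body.get("role")
    source = body.get("source")
    if isinstance(source, dict):
        # Embedded content wrapped in "source": {"value": ...}
        return role, source.get("value")
    if isinstance(source, str):
        # External resource identified by URL (a selector may narrow it)
        return role, source
    # Role placed directly on embedded content ("type": "Embedded")
    return role, body.get("value")


annotation = {"body": {"role": "commenting", "source": {"value": "A comment"}}}
print(describe_body(annotation["body"]))
```

If only the first shape were permitted, the two fallback branches would be unnecessary, which is the "many little tests" trade-off the message describes.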
Received on Tuesday, 18 August 2015 18:29:12 UTC