- From: Jacob Jett <jjett2@illinois.edu>
- Date: Mon, 27 Oct 2014 12:04:41 -0500
- To: Robert Sanderson <azaroth42@gmail.com>
- Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>
- Message-ID: <CABzPtBLtvfpgGsZmrx9YV6=QrHwJsnHN+YJQvU58dCDYWkdMNg@mail.gmail.com>
Some examples of various kinds of composites:

During the pre-CG years of the standard's development, we gathered a number of use cases from scholars here at Illinois who work with digitized emblem books. One of their chief use cases was the desire to juxtapose emblems and annotate the juxtaposition, i.e., their (body) content was about two targets physically arranged in a specific way, and without that arrangement the content of the annotation no longer made sense. In a very real sense, the composite in this case is a new resource made up of other resources, with the intention that they be presented to the end user in a very specific arrangement. The annotation is not about the two resources that are necessary for the rendered end product; it is about that end product. To use a chemistry analogy, the annotation is about water, but if I only have identifiers for hydrogen and oxygen, then using the methodology outlined above my annotation would be:

{ "@type" : "rdf:List", "item" : [hydrogen, hydrogen, oxygen] }

I put it to you that [hydrogen, hydrogen, oxygen] is just not the same as water.

Another example is collections. Computer scientists frequently treat collections as though they were lists, but a great deal of scholarly effort has gone into working out what a collection actually is. Just to get started, see:

- Currall, J., Moss, M., & Stuart, S. (2004). What is a collection? *Archivaria*, *58*, 131-146.
- Lynch, C. (2002). Digital collections, digital libraries, and the digitization of cultural heritage information. *First Monday*, *7*(5).
- Palmer, C. L. (2004). Thematic research collections. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), *A Companion to Digital Humanities*. Oxford: Blackwell Publishing.
- Palmer, C. L., & Knutson, E. (2004). Metadata practices and implications for federated collections. *Proceedings of the 67th ASIS&T Annual Meeting* (Providence, RI, Nov. 12-17, 2004).
- Palmer, C. L., Knutson, E., Twidale, M., & Zavalina, O. (2006). Collection definition in federated digital resource development. *Proceedings of the 69th ASIS&T Annual Meeting* (Austin, TX, Nov. 3-8, 2006).
- Renear, A. H., Wickett, K. M., Urban, R. J., & Dubin, D. (2008a). The return of the trivial: Formalizing collection/item metadata relationships. *Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries* (Pittsburgh, PA, June 16-20, 2008).
- Renear, A. H., Wickett, K. M., Urban, R. J., Dubin, D., & Shreeves, S. (2008b). Collection/item metadata relationships. *Proceedings of the International Conference on Dublin Core and Metadata Applications* (Berlin, Germany, Sept. 22-26, 2008).

The fact of the matter is that a great many relationships exist among the entities gathered into a collection, and when I'm annotating a collection, it is very much the case that I intend to annotate the collection as a whole entity rather than the entities in it taken together. This is because a collection is much more than the mere sum of its parts; Palmer (above) calls this phenomenon 'contextual mass'. So again, while the parts of the collection are required pieces of the annotation's serialization, I need a better way to preserve the collection's identity than just typing it as a composite and listing out its contents.

Choice is also a huge issue. The main goal of Choice is to manage multiple representations of the exact same *abstract* entity. When we manage this with a list, we lose that entity. I have an example from the HathiTrust digital library. Let us say that I have a scholar who wants to annotate the content on some arbitrary page. In the HathiTrust context, that content can be delivered to the end user either as an image file (e.g., a .tif or .jpg) or as a text file containing the results of an OCR workflow. My scholar doesn't care about this distinction; she means to annotate the content on the page.
The target of the annotation is the abstract entity "the page". It just so happens that there are two ways of representing that abstract entity's content. This is directly related to a question Anna Gerber posed to the community group during our Boston face-to-face meeting years ago: "How do we annotate a work?" How do we annotate abstract, non-information resources? When I ask about the semantics of Choice, I'm trying to clarify whether or not the annotation's body/target is actually the right kind of entity. From my perspective, oa:Choice suffers from extreme semantic overload: it weakly acts as a surrogate for the abstract thing we actually mean to annotate, while also acting as a signpost telling the serialization agent that a list of options follows. What I really need is an abstract entity to act as the target/body, and a better method for communicating that there are multiple representations of that abstract thing's content.

I have further use cases for abstract entities. Let us say I have a scholar who is remarking on publication practices in the 19th century. His annotation concerns the specific typeface that appears in a book. How do I target the typeface? (This is essentially the mirror image of the highlighting issue. And for that matter, how do I target a highlight?)

These all have the same underlying issue: containers. RDF does not do containers particularly well. From what I can see, JSON-LD fares even worse, because the transformation from RDF into JSON-LD is the worst kind of lossy: almost no information about the relationships between the entities in the lists can be preserved. To use an LIS/XML example, it's like one-shotting a transformation from MARCXML into Simple Dublin Core. The semantics become ambiguous and overloaded. Now, since we are headed away from RDF, I wonder whether container entities are actually something we need to worry about.
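The emblem-juxtaposition case described earlier can be made concrete with a minimal Python sketch. It assumes hypothetical identifiers under http://example.org/ and hypothetical terms ex:Juxtaposition, ex:hasPart and ex:arrangement; none of these are Open Annotation vocabulary. It shows that two differently arranged juxtapositions of the same emblems become indistinguishable once each is reduced to a bare list of its parts, whereas modelling each as an identified resource keeps them apart.

```python
import json

# Hypothetical identifiers for two digitized emblems (illustrative only).
EMBLEM_A = "http://example.org/emblems/a"
EMBLEM_B = "http://example.org/emblems/b"


def identified_composite(composite_id, parts, arrangement):
    """Model the juxtaposition as a resource in its own right.

    'ex:Juxtaposition', 'ex:hasPart' and 'ex:arrangement' are hypothetical
    terms, not part of the Open Annotation vocabulary.
    """
    return {
        "@id": composite_id,
        "@type": "ex:Juxtaposition",
        "ex:hasPart": {"@list": list(parts)},
        "ex:arrangement": arrangement,
    }


def flatten_to_parts(composite):
    """Reduce a composite to a bare ordered list of its parts, dropping its identity."""
    return {"@type": "rdf:List", "item": list(composite["ex:hasPart"]["@list"])}


side_by_side = identified_composite(
    "http://example.org/juxtapositions/1", [EMBLEM_A, EMBLEM_B], "side-by-side")
stacked = identified_composite(
    "http://example.org/juxtapositions/2", [EMBLEM_A, EMBLEM_B], "stacked")

# Flattened, the two juxtapositions cannot be told apart...
print(flatten_to_parts(side_by_side) == flatten_to_parts(stacked))  # True
# ...but as identified resources they remain distinct annotation targets.
print(side_by_side == stacked)  # False
print(json.dumps(side_by_side, indent=2))
```

Giving the whole its own @id is what lets an annotation target the end product rather than its ingredients, while the parts stay recoverable from ex:hasPart. Which property names a given community would actually use is deliberately left open here.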
It seems to me that since what composes the body or target is some (possibly new) abstract entity, and not a composite or a choice, we could punt most of these issues out to implementers and (more importantly) their communities. For example, if I have an htrc:Page, then I annotate that and leverage the htrc namespace's semantics for what to do with "choices". Composites and choices (and all kinds of container and container-like entities) are probably community-specific and not necessarily generalizable to the annotation model. (The same might also be true of selectors.)

The other sobering thought I had is this: if we are moving away from RDF and into something JSON-specific, then why not move all the way to JSON and skip JSON-LD? Is this the intended end goal? If so, I can easily support it, but I think we need to be realistic about the goals of any conceptual modeling work, i.e., we're building a conceptual model of a JSON-LD serialization, with all of the benefits and limitations that entails. If the goal is smooth serialization in one specific serialization technology, then let's not overthink the semantics. Does that make sense?

Apologies for the length of this email, and also if it seems negative towards JSON-LD. That was not my intent. I'm trying to get a better grasp on what we should be doing on the conceptual modeling side of things. It may not help that serialization is not among my annotation use cases.

Regards,

Jacob

_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
The Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

On Mon, Oct 27, 2014 at 10:25 AM, Robert Sanderson <azaroth42@gmail.com> wrote:

> On Mon, Oct 27, 2014 at 8:17 AM, Jacob Jett <jgjett@gmail.com> wrote:
>
>> Ah, I see the serialization. So if I just have multiple bodies without an
>> explicit choice or list, I could serialize it as:
>>
>> {
>>   "@type" : "oa:Annotation",
>>   "oa:hasBody" : [body1, body2, body3]
>> }
>>
>> Is that correct?
>
> Yes, that's right :)
>
>> To clarify, we would retain the Choice and Composite types so that we can
>> manage the semantics of the array, is that correct? Or are we considering
>> jettisoning all of these types in favor of calling them all rdf:List?
>
> The semantic distinction between Choice and the other two is important
> (pick one, require all) but I'm less convinced we need to distinguish
> between Composite and List.
>
> Is there a use case when it is important *not* to have order? If the
> serialization and underlying model always has order, to me oa:Composite is
> going out of our way to include something that has no practical difference
> with oa:List.
>
>> How can I differentiate the semantics of various choice and composite use
>> cases, e.g., sometimes my composite is a collection of things and sometimes
>> it is an amalgamation (see my previous juxtaposition use case)?
>
> Can you give examples of both please, so we can compare?
>
> Thanks!
>
> Rob
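Rob's question about order turns partly on how JSON-LD itself treats arrays. In JSON-LD, a plain array of property values is an unordered set (unless the term's context declares "@container": "@list"), while an {"@list": [...]} object is converted to an rdf:first/rdf:rest chain that preserves order. The following Python sketch hand-rolls only those two cases to show the difference. It is not a JSON-LD processor: it does no context expansion, compact names such as oa:hasBody and the hypothetical ex: identifiers are treated as opaque strings, and blank-node labels are assigned deterministically so that graphs can be compared with ordinary set equality.

```python
from itertools import count

RDF_TYPE, RDF_FIRST, RDF_REST, RDF_NIL = "rdf:type", "rdf:first", "rdf:rest", "rdf:nil"


def value_triples(subject, prop, value, bnodes):
    """Triples for one property value: either an {"@list": [...]} or plain value(s)."""
    if isinstance(value, dict) and "@list" in value:
        items = value["@list"]
        if not items:
            return [(subject, prop, RDF_NIL)]
        # One blank node per list cell, linked by rdf:first / rdf:rest.
        nodes = [f"_:b{next(bnodes)}" for _ in items]
        triples = [(subject, prop, nodes[0])]
        for i, item in enumerate(items):
            rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
            triples.append((nodes[i], RDF_FIRST, item))
            triples.append((nodes[i], RDF_REST, rest))
        return triples
    # A plain array is just several values of the same property: no order survives.
    values = value if isinstance(value, list) else [value]
    return [(subject, prop, v) for v in values]


def to_graph(node):
    """Convert one flat node object (no @context, no nested nodes) into a set of triples."""
    # Deterministic labels _:b0, _:b1, ... make plain set comparison meaningful here;
    # real RDF graph comparison would need a blank-node isomorphism check.
    bnodes = count()
    subject = node["@id"]
    graph = set()
    for key, value in node.items():
        if key == "@id":
            continue
        prop = RDF_TYPE if key == "@type" else key
        graph.update(value_triples(subject, prop, value, bnodes))
    return graph


def annotation(bodies):
    """Hypothetical annotation node with the given oa:hasBody value."""
    return {"@id": "ex:anno1", "@type": "oa:Annotation", "oa:hasBody": bodies}


plain_a = to_graph(annotation(["ex:body1", "ex:body2", "ex:body3"]))
plain_b = to_graph(annotation(["ex:body3", "ex:body1", "ex:body2"]))
list_a = to_graph(annotation({"@list": ["ex:body1", "ex:body2", "ex:body3"]}))
list_b = to_graph(annotation({"@list": ["ex:body3", "ex:body1", "ex:body2"]}))

print(plain_a == plain_b)  # True: the plain array carries no order
print(list_a == list_b)    # False: @list keeps the order
```

In other words, under these assumptions a bare array like the [body1, body2, body3] serialization quoted above carries no order once it becomes RDF; order has to be requested explicitly with @list or a list-typed term in the context.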
Received on Monday, 27 October 2014 17:05:50 UTC