Re: In preparation to the F2F: Data Model status from Jacob Jett on 2014-10-27 (public-annotation@w3.org from October 2014)

From: Jacob Jett <jjett2@illinois.edu>
Date: Mon, 27 Oct 2014 12:04:41 -0500
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CABzPtBLtvfpgGsZmrx9YV6=QrHwJsnHN+YJQvU58dCDYWkdMNg@mail.gmail.com>
Some examples of various kinds of composites:

During the pre-cg years of the standard's development, we gathered a number
of use cases from scholars working with digitized emblem books here at
Illinois. One of their chief use cases was the desire to juxtapose emblems
and annotate the juxtaposition - i.e., their (body) content was about 2
targets physically arranged in a specific way without which arrangement the
content of the annotation no longer made sense. In a very real sense the
composite in this case is some new resource that is comprised of some other
resources with the intention that they be presented to the end user in a
very specific arrangement. In this case the annotation is not about the two
resources that are necessary for the rendered end product -- it is about that
end product.

To use a chemistry analogy, the annotation is about water but if I only
have identifiers for hydrogen and oxygen then using the methodology
outlined above my annotation would be:

{
"@type" : "rdf:List",
"item" : ["hydrogen", "hydrogen", "oxygen"]
}

I put it to you that [hydrogen, hydrogen, oxygen] is just not the same as
water.
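
A rough sketch of the distinction, using Python dictionaries as stand-ins
for the JSON: the IRIs are made up for illustration, and giving the
juxtaposition its own "@id" is only one assumed way of saying that the
arrangement is a resource in its own right.

```python
import json

# Hypothetical identifiers, for illustration only.
EMBLEM_A = "http://example.org/emblem/a"
EMBLEM_B = "http://example.org/emblem/b"


def list_target(parts):
    """Target typed only as a list: the arrangement has no identity of its own."""
    return {"@type": "rdf:List", "item": list(parts)}


def named_composite_target(iri, parts):
    """Target is a new resource with its own identifier; its parts are listed under it."""
    return {"@id": iri, "@type": "oa:Composite", "item": list(parts)}


flat = list_target([EMBLEM_A, EMBLEM_B])
whole = named_composite_target(
    "http://example.org/juxtaposition/1", [EMBLEM_A, EMBLEM_B]
)

print(json.dumps(whole, indent=2))
```

Both structures list the same parts; only the second gives the annotation
something other than the parts to be about.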

Another example is collections. Frequently computer scientists treat
collections as though they were lists, but a great deal of scholarly effort
has gone into showing that they are more than that (see the following just
to get started):

Currall, J., Moss, M., & Stuart, S. (2004). What is a collection?
*Archivaria, 58*, 131-146.

Lynch, C. (2002). Digital collections, digital libraries, and the
digitization of cultural heritage information. *First Monday, 7*(5).

Palmer, C. L. (2004). Thematic research collections. In Schreibman, S.,
Siemens, R., & Unsworth, J. (Eds.) *A Companion to Digital Humanities*.
Blackwell Publishing, Oxford.

Palmer, C. L., & Knutson, E. (2004). Metadata practices and implications
for federated collections. *Proceedings of the 67th ASIS&T Annual Meeting*
(Providence, RI, Nov. 12-17, 2004).

Palmer, C. L., Knutson, E., Twidale, M., & Zavalina, O. (2006).
Collection definition in federated digital resource development.
*Proceedings of the 69th ASIS&T Annual Meeting* (Austin, TX, Nov. 3-8, 2006).

Renear, A. H., Wickett, K. M., Urban, R. J., & Dubin, D. (2008a). The
return of the trivial: Formalizing collection/item metadata relationships.
*Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries
2008* (Pittsburgh, PA, June 16-20, 2008).

Renear, A. H., Wickett, K. M., Urban, R. J., Dubin, D., & Shreeves, S.
(2008b). Collection/Item metadata relationships. *Proceedings of the
International Conference on Dublin Core and Metadata Applications, 2008*
(Berlin, Germany, Sept. 22-26, 2008).

The fact of the matter is that there are a great many relationships that
exist between the entities that are gathered into a collection and it is
very much the case that when I'm annotating a collection, it is the
collection as a whole entity rather than the whole of the entities in it
that I intend to annotate. This is because a collection is much more than
the mere sum of its parts. Palmer (above) calls the phenomenon 'contextual
mass'. And so again, while the parts of the collection are required pieces
of the annotation's serialization, I need a better way to preserve the
collection's identity than just typing it as a composite and listing out its
contents.

Choice is also a huge issue. The main goal of choice is to manage multiple
representations of the exact same *abstract* entity. When we manage this
with a list we lose that entity. I have an example from the HathiTrust
digital library. Let us say that I have a scholar that wants to annotate
the content on some arbitrary page. Now in the HathiTrust context it is the
case that the content can be delivered to the end user as an image file
(e.g., a .tif or .jpg) or as a text file containing the results of an OCR
workflow. My scholar doesn't care about this distinction; she means to
annotate the content on the page. The target of the annotation is the
abstract entity of "the page". It just so happens that there are two ways
of representing that abstract entity's content.

This is directly related to a question Anna Gerber posed to the community
group during our Boston face-to-face meeting years ago -- "How do we
annotate a work?" How do we annotate abstract, non-information resources?

When I ask about the semantics of Choice I'm trying to clarify whether or
not the annotation's body/target is actually the right kind of entity. From
my perspective oa:Choice suffers from extreme semantic overload as it is
weakly acting as a surrogate for the actual abstract thing we mean to be
annotating while also acting as a signpost for the serialization agent
denoting that a list of options follows. What I really need is an abstract
entity to act as the target/body and a better method for communicating that
there are multiple representations of that abstract thing's content.
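
As a sketch of the two shapes being contrasted (Python dictionaries standing
in for the JSON): the IRIs, the "ex:representation" key and the "format" key
are assumptions made up for this illustration and are not part of any
published vocabulary; htrc:Page is used only as a placeholder type.

```python
# Hypothetical IRIs and keys, for illustration only.
PAGE = "http://example.org/htrc/volume1/page7"
IMAGE = {"@id": PAGE + ".jpg", "format": "image/jpeg"}
OCR_TEXT = {"@id": PAGE + ".txt", "format": "text/plain"}


def choice_target(default, others):
    """oa:Choice style: the target is the set of options; the page stays implicit."""
    return {"@type": "oa:Choice", "default": default, "item": list(others)}


def abstract_target(page_iri, representations):
    """The target is the abstract page; its representations hang off it."""
    return {
        "@id": page_iri,
        "@type": "htrc:Page",
        "ex:representation": list(representations),
    }


def pick(target, media_type):
    """Return the first representation with the requested media type, or None."""
    for rep in target.get("ex:representation", []):
        if rep["format"] == media_type:
            return rep
    return None


annotation = {
    "@type": "oa:Annotation",
    "hasTarget": abstract_target(PAGE, [IMAGE, OCR_TEXT]),
}
```

In the second shape the annotation points at the page, and choosing between
the image and the OCR text becomes a separate step (here the hypothetical
pick helper) rather than part of what the target is.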

I have further use cases for abstract entities. Let us say I have a scholar
who is remarking on publication practices in the 19th century. His
annotation regards the specific typefont that appears in a book. How do I
target the typefont? (This is essentially the mirror image of the
highlighting issue. And for that matter, how do I target a highlight?)

These all have the same underlying issue -- containers. RDF does not do
containers particularly well. From what I can see, JSON-LD fares even worse,
though, because the transformation from RDF into JSON-LD is the worst kind of
lossy. Almost no information about relationships between entities in the
lists can be preserved. To use an LIS - XML example, it's like one-shotting
a transformation from MARCXML into Simple Dublin Core. The semantics become
ambiguous and overloaded.

Now, since we are headed away from RDF, I wonder if container entities are
actually a thing we need to worry about. It seems to me that since what is
composing the body or target is some (possibly new) abstract entity and not
a composite or a choice, we could punt most of these issues out to the
implementers and (more importantly) their communities. E.g., if I have an
htrc:Page then I annotate that and leverage the htrc namespace's semantics
for what to do with "choices". Composites and choices (and all kinds of
container/container-like entities) are probably community specific and not
necessarily generalizable to the annotation model. (And this might also be
the case for selectors.)
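
One way this punting could look in practice, purely as a sketch: a client
keeps a registry of community-supplied resolvers keyed by type. The registry,
the "ex:representation" and "format" keys, and the rule of preferring OCR
text are all assumptions invented for this example, not anything defined by
the annotation model or by the htrc namespace.

```python
# Hypothetical registry: each community registers how its own types are
# resolved to something a client can render.
RESOLVERS = {}


def register(type_name):
    """Decorator that records a resolver function for one community type."""
    def wrap(func):
        RESOLVERS[type_name] = func
        return func
    return wrap


@register("htrc:Page")
def resolve_htrc_page(target):
    # Assumed community rule: prefer the OCR text, else the first representation.
    reps = target.get("ex:representation", [])
    for rep in reps:
        if rep.get("format") == "text/plain":
            return rep["@id"]
    return reps[0]["@id"] if reps else target["@id"]


def resolve(target):
    """Use the community's resolver if one exists; otherwise use the target's own @id."""
    handler = RESOLVERS.get(target.get("@type"))
    return handler(target) if handler else target["@id"]
```

The annotation model itself would then only need to say that a target has a
type and an identifier; what to do with "choices" lives with whoever defines
the type.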

The other sobering thought I had is that if we are moving away from RDF and
into something JSON specific, then why not just move all the way to JSON
and skip over JSON-LD? Is this the intended end goal?

If yes, then I can easily support it, but I think we need to be realistic
about the goals of any conceptual modeling work, i.e., we're building a
conceptual model of a JSON-LD serialization with all of the benefits and
limitations that has. If the goal is smooth serialization in one specific
serialization technology then let's not overthink the semantics. Does that
make sense?

Apologies for the length of this email and also if it seems negative
towards JSON-LD. That was not my intent. I'm trying to get a better grasp
on what we should be doing on the conceptual modeling side of things. It
may not help that serialization is not among my annotation use cases.

Regards,

Jacob


_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
The Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

On Mon, Oct 27, 2014 at 10:25 AM, Robert Sanderson <azaroth42@gmail.com>
wrote:

>
>
>
> On Mon, Oct 27, 2014 at 8:17 AM, Jacob Jett <jgjett@gmail.com> wrote:
>
>> Ah, I see the serialization. So if I just have multiple bodies without an
>> explicit choice or list I could serialize it as:
>>
>> {
>> "@type" : oa:Annotation,
>> "oa:hasBody" : [body1, body2, body3]
>> }
>> Is that correct?
>>
>
> Yes, that's right :)
>
>
>> To clarify we would retain the Choice and Composite types so that we can
>> manage the semantics of the array, is that correct? Or are we considering
>> jettisoning all of these types in favor of calling them all rdf:List?
>>
>
> The semantic distinction between Choice and the other two is important
> (pick one, require all) but I'm less convinced we need to distinguish
> between Composite and List.
>
> Is there a use case when it is important *not* to have order?  If the
> serialization and underlying model always has order, to me oa:Composite is
> going out of our way to include something that has no practical difference
> with oa:List.
>
>
>
>> How can I differentiate the semantics of various choice and composite use
>> cases, e.g., sometimes my composite is a collection of things and sometimes
>> it is an amalgamation (see my previous juxtaposition use case)?
>>
>
> Can you give examples of both please, so we can compare?
>
> Thanks!
>
> Rob
>
>
Received on Monday, 27 October 2014 17:05:50 UTC