Re: [web-annotation] Make Selectors available for the wide world?

I agree with @tilgovi and @BigBlueHat. +1 on 1 and 2.

IIRC I mentioned that I have a use case for selectors outside of the 
annotation domain. Having finally crawled out from my school work 
rock, let me see if I can coherently articulate what that use case is.

The HathiTrust Digital Library is the public-facing access point for 
the Google Digitization Efforts. It is a very large scale digital 
repository with over 14 million digitized books comprising several 
billion pages. One of the ongoing questions is how to provide some 
form of access (or pseudo-access) to the corpus for digital humanities 
researchers for the purposes of computational analysis (extracting 
features, modeling topics, etc.). Because two-thirds of the corpus 
remains under copyright, we've had to develop a 
specialized container called a workset[1]. The primary feature of the 
workset is that it provides a method for researchers to aggregate 
objects for analysis (see Figure 1). It also records a certain amount 
of metadata describing the aggregation as a whole (see Figure 2).

Figure 1: Basic HTRC Workset Model
![image](https://cloud.githubusercontent.com/assets/4933420/11307725/9f2fb850-8f7f-11e5-8f94-b1b5e88e763f.png)

Figure 2: Full HTRC Workset Model
![image](https://cloud.githubusercontent.com/assets/4933420/11307763/ceb72356-8f7f-11e5-9abf-30fa27bcce6c.png)

The nature of the HTRC's current architecture limits what can be 
gathered into worksets to just a notional thing called a volume (which 
is itself an aggregation of some pages with metadata describing the 
aggregation). Our scholarly users want us to move beyond notional 
volumes and provide them with tools that let them aggregate 
finer-grained objects of interest into their worksets. They want to 
gather together specific pages or features on pages rather than whole 
volumes, so that the data preparation overhead can be reduced and 
previous feature extraction work can be fully leveraged.

Specific Resources and Selectors (and the other specifiers) provide a 
very good way of doing exactly this. They would give us a method for 
selecting specific portions of one or more pages. For example, a 
scholar wants to analyze the text of a collection of poems: each poem 
is named as a specific resource, and the selectors provide the 
architecture with a relatively simple means of cherry-picking just the 
poem's text off of the page and feeding it into the analysis 
algorithm.
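
To make the poem example a little more concrete, here is a minimal sketch in Python. The `source`/`selector` pairing and the `start`/`end` offsets mirror the specific resource and text position selector from the annotation data model. Everything else is a hypothetical stand-in rather than part of the HTRC architecture: the page URL, the in-memory `pages` store, the `select_text` helper, and the idea of a workset as a plain list.

```python
from dataclasses import dataclass


@dataclass
class TextPositionSelector:
    start: int  # offset of the first selected character (inclusive)
    end: int    # offset just past the last selected character (exclusive)


@dataclass
class SpecificResource:
    source: str                      # identifier of the page being selected from
    selector: TextPositionSelector   # which part of that page is of interest


def select_text(resource: SpecificResource, pages: dict) -> str:
    """Return only the selected segment of the source page's text."""
    page_text = pages[resource.source]
    return page_text[resource.selector.start:resource.selector.end]


# Hypothetical page store: page identifier -> plain text of the page.
page_id = "http://example.org/volume1/page7"
page_text = "PAGE 7\nRoses are red,\nViolets are blue.\n-- footer --"
pages = {page_id: page_text}

# Name the poem as a specific resource; offsets are computed here only
# to keep the example self-checking.
poem = SpecificResource(
    source=page_id,
    selector=TextPositionSelector(
        start=page_text.index("Roses"),
        end=page_text.index("\n-- footer"),
    ),
)

# A workset, in this sketch, is just a list of specific resources.
workset = [poem]
texts = [select_text(resource, pages) for resource in workset]
```

Here `texts` holds only the poem's lines, without the page header or footer, which is what would be fed to the analysis algorithm.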

As you can see, this is not an *annotation* use case. So again, +1 to 
suggestions 1 and 2, and moreover -1 to any language that would tie 
specific resources and selectors/specifiers exclusively to 
annotations. Such language would put me in the awkward position of 
plagiarizing/reinventing specific resources and specifiers for the 
HTRC's workset context.

Regards,

Jacob

[1] Additional information on the workset data model can be found at: 
http://hdl.handle.net/2142/78149 




_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
The Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

-- 
GitHub Notification of comment by jjett
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/110#issuecomment-158482382 
using your GitHub account

Received on Friday, 20 November 2015 18:25:46 UTC