- From: Jacob via GitHub <sysbot+gh@w3.org>
- Date: Fri, 20 Nov 2015 18:25:27 +0000
- To: public-annotation@w3.org
Agree with @tilgovi and @BigBlueHat. +1 on 1 and 2.

IIRC I mentioned that I have a use case for selectors outside of the annotation domain. Having finally crawled out from under my school work rock, let me see if I can coherently articulate what that use case is.

The HathiTrust Digital Library is the public-facing access point for the Google Digitization Efforts. It is a very large-scale digital repository with over 14 million digitized books comprising several billion pages. One of the ongoing questions is how to provide some form of access (or pseudo-access) to the corpus for digital humanities researchers for the purposes of computational analysis (extracting features, modeling topics, etc.). Because two-thirds of the corpus remains in copyright, we've had to develop a specialized container called a workset[1]. The primary feature of the workset is that it provides a method for researchers to aggregate objects for analysis (see Figure 1). It also records a certain amount of metadata describing the aggregation as a whole (see Figure 2).

Figure 1: Basic HTRC Workset Model

![image](https://cloud.githubusercontent.com/assets/4933420/11307725/9f2fb850-8f7f-11e5-8f94-b1b5e88e763f.png)

Figure 2: Full HTRC Workset Model

![image](https://cloud.githubusercontent.com/assets/4933420/11307763/ceb72356-8f7f-11e5-9abf-30fa27bcce6c.png)

The nature of the HTRC's current architecture limits what can be gathered into worksets to just a notional thing called a volume (which is itself an aggregation of some pages with metadata describing the aggregation). Our scholarly users want us to move beyond notional volumes and provide them with tools that let them aggregate finer-grained objects of interest into their worksets. They want to gather together specific pages or features on pages rather than whole volumes, so that the data preparation overhead can be reduced and previous feature extraction work can be fully leveraged.

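A minimal sketch of what gathering a feature on a page (rather than a whole volume) could look like, assuming the page text is available as a plain string and the feature is addressed by character offsets, in the spirit of a text position selector. The class names, the `select_text` helper, the page identifier, and the page content are all hypothetical illustrations and are not part of the HTRC architecture or of any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPositionSelector:
    """Character offsets into a page's text (start inclusive, end exclusive)."""
    start: int
    end: int


@dataclass(frozen=True)
class SpecificResource:
    """A source page narrowed down to one region by a selector."""
    source: str  # hypothetical page identifier
    selector: TextPositionSelector


def select_text(resource: SpecificResource, pages: dict) -> str:
    """Return only the text that the selector picks out of the source page."""
    text = pages[resource.source]
    sel = resource.selector
    if not 0 <= sel.start <= sel.end <= len(text):
        raise ValueError("selector falls outside the page text")
    return text[sel.start:sel.end]


# Hypothetical page content: a header, a poem, and a footer.
poem_text = "Roses are red,\nViolets are blue."
page_id = "urn:example:volume1/page12"
pages = {page_id: "PAGE 12\n" + poem_text + "\nFOOTER"}

# Compute the offsets from the page itself rather than hard-coding them.
start = pages[page_id].index(poem_text)
poem = SpecificResource(
    source=page_id,
    selector=TextPositionSelector(start=start, end=start + len(poem_text)),
)

# A workset here is simply a list of specific resources.
workset = [poem]
texts = [select_text(resource, pages) for resource in workset]
```

In this sketch only the poem's text reaches the analysis step; the page header and footer are left behind, which is the kind of data preparation a volume-level workset would otherwise leave to the researcher.
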
Specific Resources and Selectors (and the other specifiers) provide a very, very good way of doing this exact thing. They would give us a method for selecting specific portions of pages. For example, suppose a scholar wants to analyze the text of a collection of poems: each poem is named as a specific resource, and the selectors give the architecture a relatively simple means of cherry-picking just the poem's text off of the page and feeding it into the analysis algorithm.

As you can see, this is not an *annotation* use case. So again, +1 to suggestions 1 and 2, but moreover -1 to any language that pegs specific resources and selectors/specifiers to something specific to annotations. That would put me in the awkward position of plagiarizing/reinventing specific resources and specifiers for the HTRC's workset context.

Regards,

Jacob

[1] Additional information on the workset data model can be found at: http://hdl.handle.net/2142/78149

_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
The Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

--
GitHub Notification of comment by jjett
Please view or discuss this issue at https://github.com/w3c/web-annotation/issues/110#issuecomment-158482382 using your GitHub account
Received on Friday, 20 November 2015 18:25:46 UTC