Re: Selection Filtering from Robert Sanderson on 2012-08-13 (public-openannotation@w3.org from August 2012)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Mon, 13 Aug 2012 15:27:23 -0600
To: t-cole3@illinois.edu
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CABevsUEGqBTnq=2Zruk3nPwNbCqy4fHdCtE03g4i=Y3vvMPUGw@mail.gmail.com>

To try and summarize:

We should allow HTTP URIs for Specific Resources, for example URIs with
fragments or ones that provide representations of segments of resources, in
the situation when there is only a Selector and no other specifiers.
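
As a rough illustration of the first kind of URI (a sketch only, in Python;
the helper name split_fragment_uri is made up for this example, and the
bar.jpg URI is borrowed from Tim's message quoted below), a URI with a
fragment already carries what we would otherwise write out as hasSource plus
a FragmentSelector value:

```python
from urllib.parse import urldefrag


def split_fragment_uri(uri: str) -> tuple[str, str]:
    """Split an HTTP URI with a fragment into (source, selector value).

    The source corresponds to oa:hasSource; the selector value is what a
    FragmentSelector's rdf:value would carry. Keeping the leading "#" mirrors
    the examples quoted in this thread and is an assumption of this sketch.
    """
    source, fragment = urldefrag(uri)
    if not fragment:
        raise ValueError(f"no fragment identifier in {uri!r}")
    return source, "#" + fragment


source, selector_value = split_fragment_uri(
    "http://myserver.net/bar.jpg#xywh=100,100,50,75"
)
print(source)          # http://myserver.net/bar.jpg
print(selector_value)  # #xywh=100,100,50,75
```

This only covers URIs with fragments. For URIs that return a representation
of a segment by some other means (query parameters, say), the Source and
Selector cannot be derived generically and would have to be stated by the
annotator.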

And, to clarify, it is to be used as the identifier for the specific
resource *with* a Selector, rather than in place of it.  The reasoning:
the source URI is not obfuscated because it's in hasSource, conflation isn't
really an issue because the description is also available, and it's
explicitly only for when there's a single selector.

Did I get that right?
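
Assuming I did, the rule could be sketched roughly as follows (Python, purely
illustrative; specific_resource_id and its parameters are hypothetical names,
not anything in the spec, and style is included only because it is still
under discussion):

```python
import uuid
from typing import Optional


def specific_resource_id(
    source: str,
    fragment: str,
    state: Optional[str] = None,
    context: Optional[str] = None,
    style: Optional[str] = None,
) -> str:
    """Pick an identifier for a Specific Resource (hypothetical helper).

    With only a Selector, reuse the HTTP URI (source plus fragment).
    As soon as any other specifier is present, mint a fresh UUID URN.
    """
    if state is None and context is None and style is None:
        return f"{source}#{fragment.lstrip('#')}"
    return uuid.uuid4().urn


# Selector only: the HTTP URI with the fragment is the identifier.
print(specific_resource_id("http://myserver.net/bar.jpg", "xywh=100,100,50,75"))
# http://myserver.net/bar.jpg#xywh=100,100,50,75
```

In both branches the Specific Resource would still carry hasSource and
hasSelector; only the choice of identifier changes.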

I think the only discussion towards this point was at the Boston meeting
(if at all) where it was thought that it would be easier to have a single
recommendation for all specific resources rather than a double-barreled one
where some can have an HTTP URI but all the rest of the cases get a URN.

And the zoom level issue is noted, but it seems like either quite a specific
problem (how to describe zoom level for an image server) or a use case
within a much larger problem (how to relate very similar resources
with different URIs, e.g. Hamlet, the Bible, dynamically generated images
from a single source).

Rob

On Fri, Aug 10, 2012 at 5:43 PM, Tim Cole <t-cole3@illinois.edu> wrote:

> I think we should consider oa:hasContext, so yes, I also vote to discuss
> this topic on next week's Community Group call.
>
> But in anticipation of this discussion, and before we layer another property
> on top of SpecificResource, I'd like to step back a bit and review again
> some of our assumptions about SpecificResource Targets (and Bodies) and
> their Identifiers. I apologize in advance for being long-winded and
> pedantic in this post, but presumably I've not understood something in the
> earlier discussions about this last week and prior, so I'm trying to go
> back to basics and work through it one more time. Better to do this over
> email than during the call next week.
>
> As I understand it, all SpecificResources in the OA universe must have an
> oa:hasSource (pointer to "the full, unqualified resource"). All
> SpecificResources also must have at least 1 of 3 (or possibly 4) other
> properties ("Specifiers ... that describe how it [the Source] should be
> refined"):
>
> A) oa:hasSelector -- To describe entities that are segments or
> components of resources
>
> B) oa:hasState -- Entities that are representations or versions
> of resources (e.g., retrieved via content negotiation or from an archive)
>
> C) oa:hasContext (proposed) -- Entities that are resources
> within a qualifying context
>
> The consensus seems to be emerging that hasStyle is better as a property
> of an annotation rather than as a property of a SpecificResource Target, so
> I'm leaving it out of this list for now -- although I'm not sure we've fully
> considered whether zoom-level (for example) or similar constraints are
> considerations about which we have to worry. These other properties are
> not mutually exclusive (hence at least 1 of 3). You can have a
> SpecificResource that is a segment (hasSelector) of the image/jpeg
> representation (hasState) of a resource. The end result is that in the OA
> data model, SpecificResources do a lot of heavy lifting and have a broad
> scope.
>
> So far pretty non-controversial, I think.
>
> Next we come to the identifier of the SpecificResource. We say that
> SpecificResources will typically be identified by URNs and our examples use
> UUID URNs exclusively (I think). Thus:
>
> <http://myserver.net/Anno1> a oa:Annotation ;
>     oa:hasBody <http://myserver.net/Body1> ;
>     oa:hasTarget <urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66> .
>
> <urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66> a oa:SpecificResource ;
>     oa:hasSelector <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> ;
>     oa:hasSource <http://myserver.net/TargetResource> .
>
> <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> a oa:FragmentSelector ;
>     rdf:value "#xywh=100,100,50,75" .
>
> We justify our recommendation to use URNs for SpecificResources by saying
> that "an HTTP URI would imply that the exact nature of the Specific
> Body/Target was available to retrieve by dereferencing the HTTP URI" and
> then implying that this is not commonplace. I think this is actually
> becoming increasingly commonplace, and with the goal that simple, more
> frequently employed annotation use cases should generate the simplest
> graphs with a minimum of generated IDs, I think we should reconsider our
> suggestion that URNs will typically be used for SpecificResource
> Identifiers. Fragment identifiers are not the only kinds of potential
> SpecificResource Identifiers to consider here, but I think our logic for
> discouraging fragment identifiers is part of the story. So in regard to not
> using URIs containing fragment identifiers we offer 3 specific reasons:
>
> 1. If two annotations target the same fragment of a resource
> but include different values for oa:hasState, oa:hasStyle (deprecated?), or
> oa:hasContext (added?) then an inconsistency is created if both
> SpecificResources share the same URI.
>
> 2. The source URI is obfuscated by the fragment identifier
>
> 3. Fragment URIs conflate the identity and the description of
> the segment of interest by including the description inline within the
> identity
>
> With regard to 2, if a URI containing a fragment identifier (or any other
> opaque URI for that matter) is used to identify a SpecificResource that is
> a segment of another resource, then the annotator must (or at least
> should?) provide oa:hasSource and oa:hasSelector. The presence of the
> hasSource makes clear the retrievable resource being annotated so as to
> facilitate collocation of annotations sharing a common target or body.
>
> With regard to 3, at this point I'm unconvinced this is a problem in the
> context of RDF (a point others have suggested).
>
> With regard to 1, my thinking is that 1 is backwards. In the absence of
> oa:hasState, oa:hasStyle, and/or oa:hasContext – i.e., in the simpler cases
> – a de-referenceable http: URI should be preferred, even if it contains
> fragment identifiers. In the absence of State, Style, Context, nothing is
> being said that is not universally true about the thing identified by the
> http: URI. There is no opportunity for confusion, but as soon as you add
> oa:hasState, oa:hasStyle, and/or oa:hasContext in addition to
> oa:hasSelector, you are talking about a different SpecificResource, one
> that is most likely best identified by a URN. My basic argument is that
> properly constructed URIs containing fragment identifiers, and in fact many
> URIs that don't contain fragment identifiers, intrinsically have Source and
> Selector properties, but not State, Style, or Context properties. So as
> soon as you add State, Style, or Context, you are talking about a different
> resource, and so need a different Identifier. (Conceivably you could have a
> URI which conflated Source, Selector and State – e.g., a service like
> djatoka that could be used not only to retrieve a region of an image, but
> to retrieve that region as a particular MIME type. In this case your
> SpecificResource would be identified by the djatoka URL AND would have all
> of hasSource, hasSelector, and hasState properties. It would be at least
> bad practice to leave hasState off – arguably not an error because of the
> RDF Open World assumption.)
>
> As an illustration, consider the case where
> http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 de-references to the
> 50x75 pixel segment of the source image resource
> http://myserver.net/bar.jpg with the upper left hand corner of the segment
> located at pixel (100,100) of the source image. Then:
>
> <http://myserver.net/Anno1> a oa:Annotation ;
>     oa:hasBody <http://myserver.net/Body1> ;
>     oa:hasTarget <http://myserver.net/foo/bar?x=100&y=100&w=50&h=75> .
>
> <http://myserver.net/foo/bar?x=100&y=100&w=50&h=75>
>     oa:hasSelector <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> ;
>     oa:hasSource <http://myserver.net/bar.jpg> .
>
> <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> a oa:FragmentSelector ;
>     rdf:value "#xywh=100,100,50,75" .
>
> While someone else could make an annotation just like this and add
> oa:hasContext, my contention is that if he or she did so without changing
> the Identifier that is the object of the oa:hasTarget to a URN, he or she
> would be wrong, i.e., would be making an untrue statement about
> http://myserver.net/foo/bar?x=100&y=100&w=50&h=75. The resource
> identified by http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 does not
> have hasContext as a universally true property. When hasContext is added
> you are talking about a different resource and so need to mint a new
> identifier, essentially a new identifier to identify your use of the
> resource in your annotation.
>
> Now, is http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 as useful in my
> triple store as http://myserver.net/bar.jpg? Probably not. But it's at
> least as useful as urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66, and since
> http://myserver.net/bar.jpg is still provided as the object of hasSource,
> your triple store is still well populated with regard to this annotation.
> (I also contend that because
> http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 is identifying a
> segment of http://myserver.net/bar.jpg, hasSource + hasSelector is more
> informative than dcterms:isPartOf, which would be an alternative way to
> express some of these relationships.)
>
> The natural extension of this argument is that
> http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 can equally well be
> replaced by http://myserver.net/bar.jpg#xywh=100,100,50,75.
>
> So, obviously I've missed something here from the previous discussions.
> And now several of you are probably going to tell me what it is I missed.
>
> Thanks,
>
> Tim Cole
> University of Illinois at UC
>
> From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com]
> Sent: Thursday, August 09, 2012 5:49 PM
> To: Robert Sanderson
> Cc: James Smith; public-openannotation
> Subject: Re: Selection Filtering
>
> +1 yes I would start discussing the topic
>
> On Thu, Aug 9, 2012 at 6:46 PM, Robert Sanderson <azaroth42@gmail.com>
> wrote:
>
> Yes, I see the (significant) advantages of the hasContext approach,
> both for intuitive-ness and ease of processing.  Perhaps we do just
> need both ways.
>
> A topic for next week's call?
>
> Rob
>
> On Thu, Aug 9, 2012 at 4:20 PM, Paolo Ciccarese
> <paolo.ciccarese@gmail.com> wrote:
> > Maybe we just need both ways?
> >
> > In your specific example I feel your first proposal makes more sense.
> > The zip file is the resource and the XHTML files are parts of it and they
> > don't seem to have an autonomous life.
> > If they do they would probably get a URI?
> >
> > But for classic HTML pages where Images have a URI I would go the other
> > way.
> >
> > (still brainstorming)
> >
> >
> >
> > On Thu, Aug 9, 2012 at 6:13 PM, Robert Sanderson <azaroth42@gmail.com>
> > wrote:
> >>
> >> Does the hasContext approach (eg climbing up rather than drilling
> >> down) deal with situations when there's really embedded content, or
> >> only resources that are referenced but have their own unique URIs?
> >>
> >> For example, the use case I have in mind is an ePub document
> >> (basically a zip file that contains HTML and related content) that has
> >> a URI, but the chapter xhtml files within it do not.  And similarly
> >> the images referenced from those chapters don't have their own URIs,
> >> they're just named bitstreams within the zip.
> >>
> >> I don't immediately see it, if it does.  And if not, how would we go
> >> about providing a solution for the use case?
> >>
> >> Rob
> >>
> >>
> >> On Thu, Aug 9, 2012 at 4:09 PM, Paolo Ciccarese
> >> <paolo.ciccarese@gmail.com> wrote:
> >> > I am dealing with the same use case right now.
> >> > I like Rob's first solution but I agree the image is buried in the
> >> > selector.
> >> > The oa:hasSelector {FileSel1,ImgSel1,Svg1} does not communicate the
> >> > message clearly.
> >> >
> >> > I would pick Jim's solution as it is simple and in line with the
> >> > discussions we had in the last weeks. Christian Morbidoni was
> >> > suggesting a similar approach in a previous email exchange:
> >> >
> >> > http://lists.w3.org/Archives/Public/public-openannotation/2012Jul/0038.html
> >> >
> >> > However, I am not sure if, with that, you can distinguish between the
> >> > fact that the image has been simply annotated within a context and the
> >> > fact that the annotation makes sense only within that context. In the
> >> > first case, ignoring the context is probably fine. In the second case,
> >> > it is probably not. Probably adding a subproperty could be enough but I
> >> > was wondering if that approach has the potential of scaling to more
> >> > complex filtering criteria.
> >> >
> >> > Paolo
> >> >
> >> >
> >> > On Thu, Aug 9, 2012 at 5:29 PM, James Smith <jgsmith@gmail.com>
> >> > wrote:
> >> >>
> >> >> I've been thinking about how to bite, assuming I'm thinking about the
> >> >> same problem. I've been considering how to specify that a particular
> >> >> annotation is about a resource when that resource is considered in the
> >> >> context of another resource.
> >> >>
> >> >> This can be done with an additional selector-like property:
> >> >> oax:hasContext. This could have the same range as oa:hasTarget, so
> >> >> anything that can be a target can be a context.
> >> >>
> >> >> For example, if I wanted to annotate an image as it is embedded in an
> >> >> html document, I could have the following triples:
> >> >>
> >> >> Anno1 a oa:Annotation ;
> >> >>   oa:hasTarget Spec1 .
> >> >> Spec1 a oa:SpecificResource ;
> >> >>   oa:hasSource IMG ;
> >> >>   oax:hasContext Sel1 .
> >> >> Sel1 a oa:SpecificResource ;
> >> >>   oa:hasSource HTML .
> >> >>
> >> >> I'm not sure how to interpret the oa:hasSelector
> >> >> {FileSel1,ImgSel1,Svg1} pieces of the second example to know how to
> >> >> transform them into a similar form as above.
> >> >>
> >> >> I like this form because I can ignore the oax:hasContext piece and
> >> >> still have a good chance at getting the annotation in the right place.
> >> >> For example, if I am annotating a video embedded on a particular page,
> >> >> all I have to add is the oax:hasContext piece to state that it is in
> >> >> the context of that resource, instead of annotating that resource and
> >> >> hoping I can select the video within that resource (and hoping that
> >> >> such a selection doesn't have to change due to edits in the embedding
> >> >> document).
> >> >>
> >> >> -- Jim
> >> >>
> >> >>
> >> >> On Aug 9, 2012, at 4:56 PM, Robert Sanderson <azaroth42@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > No one seems to be biting, so I'll throw out a proposal for a
> >> >> > solution (maybe)  :)
> >> >> >
> >> >> > Instead of considering the annotation to be on the lowest level
> >> >> > object and then climbing back up the hierarchy (annotate the image,
> >> >> > in the html), we can use the regular structure of annotating the
> >> >> > highest level resource and drilling down with Selectors to the most
> >> >> > appropriate part (annotate the html, select the image).
> >> >> >
> >> >> > This would work in all of the cases described, and often with just
> >> >> > FragmentSelector.
> >> >> > eg:
> >> >> >
> >> >> > Anno1 a oa:Annotation ;
> >> >> >    oa:hasTarget Spec1 .
> >> >> > Spec1 a oa:SpecificResource ;
> >> >> >    oa:hasSource HTML ;
> >> >> >    oa:hasSelector Sel1 .
> >> >> > Sel1 a oa:FragmentSelector ;     # oax:XpointerFragmentSelector ?
> >> >> >    rdf:value "xpointer(/xpath/to/img[@href='Img1'])" .
> >> >> >
> >> >> > Anno2 a oa:Annotation ;
> >> >> >    oa:hasTarget Spec2 .
> >> >> > Spec2 a oa:SpecificResource ;
> >> >> >    oa:hasSource ePub1 ;
> >> >> >    oa:hasSelector CompSel1 .
> >> >> > CompSel1 a oa:CompositeSelector ;
> >> >> >    oa:hasSelector FileSel1 ;    # select xhtml file in zip
> >> >> >    oa:hasSelector ImgSel1 ;     # select image in xhtml
> >> >> >    oa:hasSelector Svg1 .        # select SVG area of image
> >> >> > (...)
> >> >> >
> >> >> >
> >> >> > The main issue is that the URI of the component resource (eg the
> >> >> > image) is not easily accessible, if it has one.  In the ePub case,
> >> >> > it doesn't have its own HTTP URI, but in the regular web page it
> >> >> > does.
> >> >> >
> >> >> > Thoughts?
> >> >> >
> >> >> > Rob
> >> >> >
> >> >> >
> >> >> > On Wed, Aug 1, 2012 at 4:09 PM, Robert Sanderson
> >> >> > <azaroth42@gmail.com> wrote:
> >> >> >> Starting a new thread on this topic for ease of tracking :)
> >> >> >>
> >> >> >> In a couple of other threads, the desire to describe an
> >> >> >> annotation which targets a resource in some particular context was
> >> >> >> expressed.
> >> >> >> For example, to annotate an image only as it appears in a
> >> >> >> particular html page.
> >> >> >>
> >> >> >> The base requirement seems to me to be:
> >> >> >>    Annotate [part of] (resource) as it is used in (resource)
> >> >> >>
> >> >> >> This extends quickly to:
> >> >> >>    Annotate [part of] (resource) as it is used in [part of]
> >> >> >>    (resource)
> >> >> >> For example, annotate an image as it is used on page 4 of a PDF.
> >> >> >>
> >> >> >> This could mean arbitrary nesting, to allow for annotating an
> >> >> >> image in an html file in an ePub document.
> >> >> >> The same should be applicable for bodies as well as targets, in
> >> >> >> order to extract contents from container resources.
> >> >> >>
> >> >> >> Is there a requirement for differentiating between the resource,
> >> >> >> and the resource used in some container resource?
> >> >> >> For example, is it important to be able to annotate an image, but
> >> >> >> not have the annotation appear when that image is embedded within
> >> >> >> an HTML page?
> >> >> >> For annotating non-rendering resources (such as CSS, Javascript
> >> >> >> etc) it might be important?
> >> >> >>
> >> >> >> Is there a requirement for sets of container resources, or is it
> >> >> >> sufficient to simply create new annotations? For example, this
> >> >> >> image in these 3 HTML pages.
> >> >> >>
> >> >> >>
> >> >> >> A second application of filtering, that makes me very nervous, is:
> >> >> >>    Annotate all occurrences of (selection) in (set of resources)
> >> >> >>
> >> >> >> For example all occurrences of the word "annotate" in any textual
> >> >> >> resource, all occurrences of the top left pixel in JPEG images,
> >> >> >> all occurrences of the first line of text in all copies of
> >> >> >> Shakespeare's "Hamlet".
> >> >> >>
> >> >> >>
> >> >> >> Before we start thinking about approaches and solutions, it would
> >> >> >> be great to firmly scope what it is that we're trying to solve :)
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Rob
> >> >> >
> >> >>
> >> >>
> >> >
> >
> >
> >
> >
> > --
> > Dr. Paolo Ciccarese
> > http://www.paolociccarese.info/
> > Biomedical Informatics Research & Development
> > Instructor of Neurology at Harvard Medical School
> > Assistant in Neuroscience at Mass General Hospital
> > +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
> >
> > CONFIDENTIALITY NOTICE: This message is intended only for the
> > addressee(s), may contain information that is considered
> > to be sensitive or confidential and may not be forwarded or disclosed to
> > any other party without the permission of the sender.
> > If you have received this message in error, please notify the sender
> > immediately.
> >
>
Received on Monday, 13 August 2012 21:27:53 UTC