RE: Selection Filtering from Tim Cole on 2012-08-10 (public-openannotation@w3.org from August 2012)

From: Tim Cole <t-cole3@illinois.edu>
Date: Fri, 10 Aug 2012 18:43:17 -0500
To: 'Robert Sanderson' <azaroth42@gmail.com>
CC: 'public-openannotation' <public-openannotation@w3.org>
Message-ID: <0d7701cd7751$edd41d10$c97c5730$@illinois.edu>
I think we should consider oa:hasContext, so yes, I also vote to discuss
this topic on next week's Community Group call.

But in anticipation of this discussion, and before layering another property
on top of SpecificResource, I'd like to step back a bit and review some of
our assumptions about SpecificResource Targets (and Bodies) and their
Identifiers. I apologize in advance for being long-winded and pedantic in
this post, but presumably I've not understood something in the earlier
discussions about this last week and prior, so I'm trying to go back to
basics and work through it one more time. Better to do this over email than
during the call next week.

As I understand it, all SpecificResources in the OA universe must have an
oa:hasSource (a pointer to "the full, unqualified resource"). All
SpecificResources must also have at least 1 of 3 (or possibly 4) other
properties ("Specifiers ... that describe how it [the Source] should be
refined"):

A) oa:hasSelector -- Entities that are segments or components of
resources

B) oa:hasState -- Entities that are representations or versions of
resources (e.g., retrieved via content negotiation or from an archive)

C) oa:hasContext (proposed) -- Entities that are resources within a
qualifying context
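The requirement above (an oa:hasSource plus at least one specifier) can be stated as a small check. This is a minimal Python sketch, assuming a SpecificResource is represented as a plain dict mapping prefixed property names to lists of values; the representation and the function name are hypothetical, and oa:hasContext is included only because it is proposed in this thread.

```python
# Hypothetical representation: prefixed property name -> list of object values.
SpecificResource = dict[str, list[str]]

# The specifiers listed above; oa:hasContext is only a proposal here.
SPECIFIERS = ("oa:hasSelector", "oa:hasState", "oa:hasContext")


def is_valid_specific_resource(props: SpecificResource) -> bool:
    """True if props has an oa:hasSource and at least one specifier."""
    has_source = bool(props.get("oa:hasSource"))
    has_specifier = any(props.get(p) for p in SPECIFIERS)
    return has_source and has_specifier


target = {
    "oa:hasSource": ["http://myserver.net/TargetResource"],
    "oa:hasSelector": ["urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66"],
}
print(is_valid_specific_resource(target))  # True
print(is_valid_specific_resource({"oa:hasSource": ["http://myserver.net/TargetResource"]}))  # False
```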

A consensus seems to be emerging that hasStyle is better as a property of
an annotation than as a property of a SpecificResource Target, so I'm
leaving it out of this list for now, although I'm not sure we've fully
considered whether zoom level (for example) or similar constraints are
something we need to worry about. The remaining properties are not
mutually exclusive (hence "at least 1 of 3"). You can have a SpecificResource
that is a segment (hasSelector) of the image/jpeg representation (hasState)
of a resource. The end result is that in the OA data model,
SpecificResources do a lot of heavy lifting and have a broad scope.

So far pretty non-controversial, I think.

Next we come to the identifier of the SpecificResource. We say that
SpecificResources will typically be identified by URNs, and our examples use
UUID URNs exclusively (I think). Thus:

<http://myserver.net/Anno1> a oa:Annotation ;
    oa:hasBody <http://myserver.net/Body1> ;
    oa:hasTarget <urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66> .

<urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66> a oa:SpecificResource ;
    oa:hasSelector <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> ;
    oa:hasSource <http://myserver.net/TargetResource> .

<urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> a oa:FragmentSelector ;
    rdf:value "xywh=100,100,50,75" .

We justify our recommendation to use URNs for SpecificResources by saying
that "an HTTP URI would imply that the exact nature of the Specific
Body/Target was available to retrieve by dereferencing the HTTP URI", and
then implying that this is not commonplace. I think it is actually
becoming increasingly commonplace, and given the goal that simple, more
frequently employed annotation use cases should generate the simplest graphs
with a minimum of generated IDs, I think we should reconsider our suggestion
that URNs will typically be used for SpecificResource Identifiers. Fragment
identifiers are not the only kinds of potential SpecificResource Identifiers
to consider here, but I think our logic for discouraging fragment
identifiers is part of the story. Against URIs containing fragment
identifiers we offer 3 specific reasons:

1. If two annotations target the same fragment of a resource but
include different values for oa:hasState, oa:hasStyle (deprecated?), or
oa:hasContext (added?), then an inconsistency is created if both
SpecificResources share the same URI.

2. The source URI is obfuscated by the fragment identifier.

3. Fragment URIs conflate the identity and the description of the
segment of interest by including the description inline within the identity.
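Reason 1 can be made concrete by merging two graphs. This is a minimal Python sketch, assuming triples are plain tuples and that two hypothetical annotators both reuse the fragment URI as their SpecificResource identifier while asserting different oa:hasState values (the example.org URIs are invented for illustration). After the merge, the one fragment URI carries both states, and nothing records which annotation meant which.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

FRAG = "http://myserver.net/bar.jpg#xywh=100,100,50,75"

# Two independently published graphs (hypothetical URIs) that both use the
# fragment URI itself as the identifier of their SpecificResource target.
graph_a: set[Triple] = {
    ("http://example.org/annoA", "oa:hasTarget", FRAG),
    (FRAG, "oa:hasState", "http://example.org/stateA"),
}
graph_b: set[Triple] = {
    ("http://example.org/annoB", "oa:hasTarget", FRAG),
    (FRAG, "oa:hasState", "http://example.org/stateB"),
}


def multi_valued(graph: set[Triple], prop: str) -> dict[str, list[str]]:
    """Return subjects that end up with more than one value for prop."""
    values: dict[str, set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == prop:
            values[s].add(o)
    return {s: sorted(vs) for s, vs in values.items() if len(vs) > 1}


# With no blank nodes involved, merging RDF graphs is a set union of triples.
merged = graph_a | graph_b
print(multi_valued(merged, "oa:hasState"))
# {'http://myserver.net/bar.jpg#xywh=100,100,50,75': ['http://example.org/stateA', 'http://example.org/stateB']}
```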

With regard to 2, if a URI containing a fragment identifier (or any other
opaque URI, for that matter) is used to identify a SpecificResource that is a
segment of another resource, then the annotator must (or at least should?)
provide oa:hasSource and oa:hasSelector. The presence of hasSource makes
clear the retrievable resource being annotated, so as to facilitate
collocation of annotations sharing a common target or body.
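To illustrate how hasSource supports collocation even when the target identifier itself is opaque, here is a minimal Python sketch. It assumes each annotation's target is given as a dict of properties (a hypothetical representation; the annotation IDs other than Anno1 are invented, and selector values are written inline as strings for brevity). Annotations on different segments of bar.jpg end up grouped under the same source.

```python
from collections import defaultdict

# Hypothetical input: annotation ID -> properties of its SpecificResource target.
annotations = {
    "http://myserver.net/Anno1": {
        "oa:hasSource": ["http://myserver.net/bar.jpg"],
        "oa:hasSelector": ["xywh=100,100,50,75"],
    },
    "http://example.org/Anno2": {
        "oa:hasSource": ["http://myserver.net/bar.jpg"],
        "oa:hasSelector": ["xywh=0,0,10,10"],
    },
    "http://example.org/Anno3": {
        "oa:hasSource": ["http://myserver.net/other.jpg"],
        "oa:hasSelector": ["xywh=5,5,20,20"],
    },
}


def collocate_by_source(annos: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Group annotation IDs by the oa:hasSource of their target."""
    groups: dict[str, list[str]] = defaultdict(list)
    for anno_id, target in annos.items():
        for source in target.get("oa:hasSource", []):
            groups[source].append(anno_id)
    return dict(groups)


print(collocate_by_source(annotations)["http://myserver.net/bar.jpg"])
# ['http://myserver.net/Anno1', 'http://example.org/Anno2']
```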

With regard to 3, at this point I'm unconvinced this is a problem in the
context of RDF (a point others have suggested).

With regard to 1, my thinking is that 1 is backwards. In the absence of
oa:hasState, oa:hasStyle, and/or oa:hasContext (i.e., in the simpler cases),
a dereferenceable http: URI should be preferred, even if it contains a
fragment identifier. In the absence of State, Style, or Context, nothing is
being said that is not universally true about the thing identified by the
http: URI, so there is no opportunity for confusion. But as soon as you add
oa:hasState, oa:hasStyle, and/or oa:hasContext in addition to
oa:hasSelector, you are talking about a different SpecificResource, one that
is most likely best identified by a URN. My basic argument is that properly
constructed URIs containing fragment identifiers, and in fact many URIs that
don't contain fragment identifiers, intrinsically have Source and Selector
properties, but not State, Style, or Context properties. So as soon as you
add State, Style, or Context, you are talking about a different resource,
and so need a different Identifier. (Conceivably you could have a URI that
conflated Source, Selector and State, e.g., a service like djatoka that
could be used not only to retrieve a region of an image, but to retrieve
that region as a particular MIME type. In this case your SpecificResource
would be identified by the djatoka URL AND would have all of the hasSource,
hasSelector, and hasState properties. It would be at least bad practice to
leave hasState off, though arguably not an error because of the RDF Open
World assumption.)
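The claim that a fragment URI intrinsically carries Source and Selector can be checked mechanically: both fall out of the URI string, while nothing in it yields a State, Style, or Context. A minimal Python sketch using the standard library's urllib.parse.urldefrag; the helper name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def source_and_selector(uri: str) -> Tuple[str, Optional[str]]:
    """Split a URI into (oa:hasSource value, FragmentSelector rdf:value).

    The selector is None when the URI has no fragment identifier.
    """
    source, fragment = urldefrag(uri)
    return source, (fragment or None)


print(source_and_selector("http://myserver.net/bar.jpg#xywh=100,100,50,75"))
# ('http://myserver.net/bar.jpg', 'xywh=100,100,50,75')
print(source_and_selector("http://myserver.net/bar.jpg"))
# ('http://myserver.net/bar.jpg', None)
```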

As an illustration, consider the case where
http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 dereferences to the 50x75
pixel segment of the source image resource http://myserver.net/bar.jpg whose
upper left-hand corner is located at pixel (100,100) of the source
image. Then:

<http://myserver.net/Anno1> a oa:Annotation ;
    oa:hasBody <http://myserver.net/Body1> ;
    oa:hasTarget <http://myserver.net/foo/bar?x=100&y=100&w=50&h=75> .

<http://myserver.net/foo/bar?x=100&y=100&w=50&h=75>
    oa:hasSelector <urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> ;
    oa:hasSource <http://myserver.net/bar.jpg> .

<urn:uuid:cca2db40-e31c-11e1-9b23-0800200c9a66> a oa:FragmentSelector ;
    rdf:value "xywh=100,100,50,75" .

While someone else could make an annotation just like this and add
oa:hasContext, my contention is that if he or she did so without changing
the Identifier that is the object of oa:hasTarget to a URN, he or she
would be wrong, i.e., would be making an untrue statement about
http://myserver.net/foo/bar?x=100&y=100&w=50&h=75. The resource identified
by http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 does not have
hasContext as a universally true property. When hasContext is added you are
talking about a different resource and so need to mint a new identifier,
essentially one that identifies your use of the resource in your
annotation.
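The rule argued for above can be sketched as a choice of identifier. This is a minimal Python sketch, assuming the same hypothetical dict-of-lists representation of the target's properties: reuse the http: URI when only Source and Selector are present, and mint a UUID URN (via the standard uuid module) as soon as a State, Style, or Context qualifier is added. It does not handle the djatoka-style case above, where the URL itself already fixes the State.

```python
import uuid

# Qualifiers that, on this argument, are never intrinsic to an http: URI.
QUALIFIERS = ("oa:hasState", "oa:hasStyle", "oa:hasContext")


def choose_identifier(http_uri: str, props: dict[str, list[str]]) -> str:
    """Return http_uri unless a qualifier is present; then mint a fresh URN."""
    if any(props.get(q) for q in QUALIFIERS):
        return uuid.uuid4().urn  # a new 'urn:uuid:...' on every call
    return http_uri


seg = "http://myserver.net/foo/bar?x=100&y=100&w=50&h=75"
plain = {"oa:hasSource": ["http://myserver.net/bar.jpg"],
         "oa:hasSelector": ["xywh=100,100,50,75"]}
print(choose_identifier(seg, plain) == seg)  # True
with_context = {**plain, "oa:hasContext": ["http://example.org/page.html"]}
print(choose_identifier(seg, with_context).startswith("urn:uuid:"))  # True
```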

Now, is http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 as useful in my
triple store as http://myserver.net/bar.jpg? Probably not. But it's at least
as useful as urn:uuid:b9692250-e31c-11e1-9b23-0800200c9a66, and since
http://myserver.net/bar.jpg is still provided as the object of hasSource,
your triple store is still well populated with regard to this annotation. (I
also contend that, because http://myserver.net/foo/bar?x=100&y=100&w=50&h=75
identifies a segment of http://myserver.net/bar.jpg, hasSource +
hasSelector is more informative than dcterms:isPartOf, which would be an
alternative way to express some of these relationships.)

The natural extension of this argument is that
http://myserver.net/foo/bar?x=100&y=100&w=50&h=75 can equally well be
replaced by http://myserver.net/bar.jpg#xywh=100,100,50,75.
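For the illustrative service above, rewriting the query-based URL as a W3C Media Fragments xywh URL is mechanical. This is a minimal Python sketch, assuming the hypothetical x, y, w, h query parameters mean exactly what the illustration says (upper-left corner plus width and height, in pixels) and that the caller already knows the source image URL, since nothing in the query URL itself names it.

```python
from urllib.parse import parse_qs, urlsplit


def query_to_xywh(segment_url: str, source_url: str) -> str:
    """Rewrite a ?x=&y=&w=&h= segment URL as source_url#xywh=x,y,w,h."""
    params = parse_qs(urlsplit(segment_url).query)
    # int() rejects non-numeric values rather than copying them through.
    x, y, w, h = (int(params[k][0]) for k in ("x", "y", "w", "h"))
    return f"{source_url}#xywh={x},{y},{w},{h}"


print(query_to_xywh("http://myserver.net/foo/bar?x=100&y=100&w=50&h=75",
                    "http://myserver.net/bar.jpg"))
# http://myserver.net/bar.jpg#xywh=100,100,50,75
```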

So, obviously I've missed something here from the previous discussions, and
now several of you are probably going to tell me what it is.

Thanks,

Tim Cole
University of Illinois at UC

From: Paolo Ciccarese [mailto:paolo.ciccarese@gmail.com] 
Sent: Thursday, August 09, 2012 5:49 PM
To: Robert Sanderson
Cc: James Smith; public-openannotation
Subject: Re: Selection Filtering

+1 yes I would start discussing the topic

On Thu, Aug 9, 2012 at 6:46 PM, Robert Sanderson <azaroth42@gmail.com>
wrote:

Yes, I see the (significant) advantages of the hasContext approach,
both for intuitive-ness and ease of processing.  Perhaps we do just
need both ways.

A topic for next week's call?

Rob

On Thu, Aug 9, 2012 at 4:20 PM, Paolo Ciccarese
<paolo.ciccarese@gmail.com> wrote:
> Maybe we just need both ways?
>
> In your specific example I feel your first proposal makes more sense.
> The zip file is the resource and the XHTML files are parts of it; they
> don't seem to have an autonomous life.
> If they did, they would probably get a URI?
>
> But for classic HTML pages where images have a URI I would go the other
> way.
>
> (still brainstorming)
>
>
>
> On Thu, Aug 9, 2012 at 6:13 PM, Robert Sanderson <azaroth42@gmail.com>
> wrote:
>>
>> Does the hasContext approach (eg climbing up rather than drilling
>> down) deal with situations when there's really embedded content, or
>> only resources that are referenced but have their own unique URIs?
>>
>> For example, the use case I have in mind is an ePub document
>> (basically a zip file that contains HTML and related content) that has
>> a URI, but the chapter xhtml files within it do not.  And similarly
>> the images referenced from those chapters don't have their own URIs,
>> they're just named bitstreams within the zip.
>>
>> I don't immediately see it, if it does.  And if not, how would we go
>> about providing a solution for the use case?
>>
>> Rob
>>
>>
>> On Thu, Aug 9, 2012 at 4:09 PM, Paolo Ciccarese
>> <paolo.ciccarese@gmail.com> wrote:
>> > I am dealing with exactly the same use case now.
>> > I like Rob's first solution, but I agree the image is buried in the
>> > selector. The oa:hasSelector {FileSel1,ImgSel1,Svg1} does not
>> > communicate the message clearly.
>> >
>> > I would pick Jim's solution as it is simple and in line with the
>> > discussions we had in the last weeks. Christian Morbidoni was
>> > suggesting a similar approach in a previous email exchange:
>> >
>> > http://lists.w3.org/Archives/Public/public-openannotation/2012Jul/0038.html
>> >
>> > However, I am not sure if, with that, you can distinguish between the
>> > fact that the image has simply been annotated within a context and the
>> > fact that the annotation makes sense only within that context. In the
>> > first case, ignoring the context is probably fine. In the second case,
>> > it is probably not. Adding a subproperty could be enough, but I was
>> > wondering whether that approach has the potential to scale to more
>> > complex filtering criteria.
>> >
>> > Paolo
>> >
>> >
>> > On Thu, Aug 9, 2012 at 5:29 PM, James Smith <jgsmith@gmail.com> wrote:
>> >>
>> >> I've been thinking about how to bite, assuming I'm thinking about the
>> >> same problem. I've been considering how to specify that a particular
>> >> annotation is about a resource when that resource is considered in the
>> >> context of another resource.
>> >>
>> >> This can be done with an additional selector-like property:
>> >> oax:hasContext. This could have the same range as oa:hasTarget, so
>> >> anything that can be a target can be a context.
>> >>
>> >> For example, if I wanted to annotate an image as it is embedded in an
>> >> html document, I could have the following triples:
>> >>
>> >> Anno1 a oa:Annotation ;
>> >>   oa:hasTarget Spec1 .
>> >> Spec1 a oa:SpecificResource ;
>> >>   oa:hasSource IMG ;
>> >>   oax:hasContext Sel1 .
>> >> Sel1 a oa:SpecificResource ;
>> >>   oa:hasSource HTML .
>> >>
>> >> I'm not sure how to interpret the oa:hasSelector {FileSel1,ImgSel1,Svg1}
>> >> pieces of the second example to know how to transform them into a
>> >> similar form as above.
>> >>
>> >> I like this form because I can ignore the oax:hasContext piece and
>> >> still have a good chance of getting the annotation in the right place.
>> >> For example, if I am annotating a video embedded on a particular page,
>> >> all I have to add is the oax:hasContext piece to state that it is in
>> >> the context of that resource, instead of annotating that resource and
>> >> hoping I can select the video within that resource (and hoping that
>> >> such a selection doesn't have to change due to edits in the embedding
>> >> document).
>> >>
>> >> -- Jim
>> >>
>> >>
>> >> On Aug 9, 2012, at 4:56 PM, Robert Sanderson <azaroth42@gmail.com>
>> >> wrote:
>> >>
>> >> > No one seems to be biting, so I'll throw out a proposal for a
>> >> > solution (maybe) :)
>> >> >
>> >> > Instead of considering the annotation to be on the lowest level
>> >> > object and then climbing back up the hierarchy (annotate the image,
>> >> > in the html), we can use the regular structure of annotating the
>> >> > highest level resource and drilling down with Selectors to the most
>> >> > appropriate part (annotate the html, select the image).
>> >> >
>> >> > This would work in all of the cases described, and often with just
>> >> > FragmentSelector.
>> >> > eg:
>> >> >
>> >> > Anno1 a oa:Annotation ;
>> >> >    oa:hasTarget Spec1 .
>> >> > Spec1 a oa:SpecificResource ;
>> >> >    oa:hasSource HTML ;
>> >> >    oa:hasSelector Sel1 .
>> >> > Sel1 a oa:FragmentSelector ;     # oax:XpointerFragmentSelector ?
>> >> >    rdf:value "xpointer(/xpath/to/img[@src='Img1'])" .
>> >> >
>> >> > Anno2 a oa:Annotation ;
>> >> >    oa:hasTarget Spec2 .
>> >> > Spec2 a oa:SpecificResource ;
>> >> >    oa:hasSource ePub1 ;
>> >> >    oa:hasSelector CompSel1 .
>> >> > CompSel1 a oa:CompositeSelector ;
>> >> >    oa:hasSelector FileSel1 ;    # select xhtml file in zip
>> >> >    oa:hasSelector ImgSel1 ;     # select image in xhtml
>> >> >    oa:hasSelector Svg1 .        # select SVG area of image
>> >> > (...)
>> >> >
>> >> >
>> >> > The main issue is that the URI of the component resource (eg the
>> >> > image) is not easily accessible, if it has one. In the ePub case, it
>> >> > doesn't have its own HTTP URI, but in the regular web page it does.
>> >> >
>> >> > Thoughts?
>> >> >
>> >> > Rob
>> >> >
>> >> >
>> >> > On Wed, Aug 1, 2012 at 4:09 PM, Robert Sanderson
>> >> > <azaroth42@gmail.com>
>> >> > wrote:
>> >> >> Starting a new thread on this topic for ease of tracking :)
>> >> >>
>> >> >> In a couple of other threads, the desire was expressed to describe
>> >> >> an annotation which targets a resource in some particular context.
>> >> >> For example, to annotate an image only as it appears in a particular
>> >> >> html page.
>> >> >>
>> >> >> The base requirement seems to me to be:
>> >> >>    Annotate [part of] (resource) as it is used in (resource)
>> >> >>
>> >> >> This extends quickly to:
>> >> >>    Annotate [part of] (resource) as it is used in [part of] (resource)
>> >> >> For example, annotate an image as it is used on page 4 of a PDF.
>> >> >>
>> >> >> This could mean arbitrary nesting, to allow for annotating an image
>> >> >> in an html file in an ePub document.
>> >> >> The same should be applicable for bodies as well as targets, in order
>> >> >> to extract contents from container resources.
>> >> >>
>> >> >> Is there a requirement for differentiating between the resource and
>> >> >> the resource used in some container resource?
>> >> >> For example, is it important to be able to annotate an image, but not
>> >> >> have the annotation appear when that image is embedded within an HTML
>> >> >> page?
>> >> >> For annotating non-rendering resources (such as CSS, Javascript etc)
>> >> >> it might be important?
>> >> >>
>> >> >> Is there a requirement for sets of container resources, or is it
>> >> >> sufficient to simply create new annotations? For example, this image
>> >> >> in these 3 HTML pages.
>> >> >>
>> >> >>
>> >> >> A second application of filtering, which makes me very nervous, is:
>> >> >>    Annotate all occurrences of (selection) in (set of resources)
>> >> >>
>> >> >> For example, all occurrences of the word "annotate" in any textual
>> >> >> resource, all occurrences of the top left pixel in JPEG images, all
>> >> >> occurrences of the first line of text in all copies of Shakespeare's
>> >> >> "Hamlet".
>> >> >>
>> >> >>
>> >> >> Before we start thinking about approaches and solutions, it would be
>> >> >> great to firmly scope what it is that we're trying to solve :)
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Rob
>> >> >
>> >>
>> >>
>> >
>
>
>
>
> --
> Dr. Paolo Ciccarese
> http://www.paolociccarese.info/
> Biomedical Informatics Research & Development
> Instructor of Neurology at Harvard Medical School
> Assistant in Neuroscience at Mass General Hospital
> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>
> CONFIDENTIALITY NOTICE: This message is intended only for the addressee(s),
> may contain information that is considered
> to be sensitive or confidential and may not be forwarded or disclosed to any
> other party without the permission of the sender.
> If you have received this message in error, please notify the sender
> immediately.
>

Received on Friday, 10 August 2012 23:44:04 UTC