Re: Streamlining the OA Model from Sebastian Hellmann on 2012-08-01 (public-openannotation@w3.org from August 2012)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Wed, 01 Aug 2012 21:58:21 +0200
To: Robert Sanderson <azaroth42@gmail.com>
CC: Paolo Ciccarese <paolo.ciccarese@gmail.com>, public-openannotation <public-openannotation@w3.org>
Message-ID: <50198A5D.4060200@informatik.uni-leipzig.de>
Dear Robert,

On 01.08.2012 21:16, Robert Sanderson wrote:
> To be clear, you know that this breaks the IETF restrictions on URI
> fragments, right?
> You can't just invent new fragment schemes for existing MIME types;
> they MUST be specified in the MIME type registration document.
That restriction applies only to retrieval actions. Specifying senses for
URIs in RDF and OWL is *allowed* and even *encouraged*.
See http://www.w3.org/TR/rdf-concepts/#section-fragID
Technical question: Why are Media Fragments defined separately from the
MIME types?
> This is only one reason we didn't go this way, but it's certainly a big one!
> Others include that URIs should be opaque
This is a design decision. There have been two camps arguing over this
for years. The latest discussion was on the Wikidata list, IIRC.
> , the opportunity (if not likelihood) for collision,
Well, in theory you might as well never use Turing-complete languages
either. How can you make sure your programs will stop in finite time?
> the query-ability
Another design decision. Personally, I would always favor fewer triples,
which can be expanded on demand to achieve query-ability. If you calculate
the number of triples produced by the current Open Annotation spec for
part-of-speech tags, you will soon reach a very practical limit on how
much text you can handle.
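
To make the triple-count argument concrete, here is a minimal Python sketch that annotates every whitespace-separated token once in each of two simplified shapes and counts what comes out. Both shapes are assumptions for illustration: the OA-style one loosely follows the annotation, specific resource and TextPositionSelector pattern, the NIF-style one puts the offsets into an RFC 5147-style `#char=` URI, and neither lists every triple a real deployment would emit. The `urn:uuid:` identifiers, the `ex:TAG` placeholder and `http://example.org/doc` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def oa_style(doc: str, n: int, start: int, end: int, tag: str) -> List[Triple]:
    """Simplified OA-style shape: annotation -> specific resource -> selector."""
    # Three identifiers must be minted per annotation (hypothetical urn:uuid values).
    anno, target, sel = f"urn:uuid:a{n}", f"urn:uuid:t{n}", f"urn:uuid:s{n}"
    return [
        (anno, "rdf:type", "oa:Annotation"),
        (anno, "oa:hasBody", tag),
        (anno, "oa:hasTarget", target),
        (target, "rdf:type", "oa:SpecificResource"),
        (target, "oa:hasSource", doc),
        (target, "oa:hasSelector", sel),
        (sel, "rdf:type", "oa:TextPositionSelector"),
        (sel, "oa:start", str(start)),
        (sel, "oa:end", str(end)),
    ]


def nif_style(doc: str, start: int, end: int, tag: str) -> List[Triple]:
    """Simplified NIF-style shape: the offsets live in the subject URI itself."""
    s = f"{doc}#char={start},{end}"  # derived from the offsets, nothing minted
    return [
        (s, "rdf:type", "nif:String"),
        (s, "nif:beginIndex", str(start)),
        (s, "nif:endIndex", str(end)),
        (s, "nif:posTag", tag),
    ]


def count_triples(text: str, doc: str = "http://example.org/doc") -> Dict[str, int]:
    """Tag every whitespace-separated token once in each shape and count the output."""
    oa: List[Triple] = []
    nif: List[Triple] = []
    for n, m in enumerate(re.finditer(r"\S+", text)):
        tag = "ex:TAG"  # hypothetical placeholder for a real part-of-speech tag
        oa += oa_style(doc, n, m.start(), m.end(), tag)
        nif += nif_style(doc, m.start(), m.end(), tag)
    minted = {s for s, _, _ in oa if s.startswith("urn:uuid:")}
    return {"oa_triples": len(oa), "nif_triples": len(nif), "oa_minted_ids": len(minted)}


print(count_triples("The cat sat"))
# {'oa_triples': 27, 'nif_triples': 12, 'oa_minted_ids': 9}
```

In this toy model the gap grows linearly: 9 versus 4 triples per token, plus 3 minted identifiers per OA-style annotation, so a million-token corpus would yield 9 million versus 4 million triples.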

I am not saying that Open Annotation did anything wrong. I am just
saying that it is not suitable for every use case, which is true of every
technology:
http://www.slideshare.net/slideshow/embed_code/13309166?startSlide=10

NIF and Open Annotation complement each other nicely.
Sebastian
> and so forth, as per the
> media fragments discussion.
>
> Rob
>
> On Wed, Aug 1, 2012 at 9:31 AM, Sebastian Hellmann
> <hellmann@informatik.uni-leipzig.de> wrote:
>> Hello Paolo,
>> let's separate the issues.
>> Issue a) things you can represent with fragment selectors (expressivity)
>> Issue b) syntax
>>
>> a) I am well aware of your use case. Do you have a benchmark that I could
>> use for experiments? If you look at page 6 of
>> http://svn.aksw.org/papers/2012/NIF/EKAW_short_paper/public.pdf, you
>> can see that NIF hash URIs are designed for robustness and to withstand
>> changes made to Wikipedia. I am currently collecting a larger corpus, which
>> also includes HTML. Do you have data sets or pages that I could use?
>> b) has nothing to do with a). It is true, however, that current fragment
>> IDs are not designed to suit many use cases, but this is a shortcoming of
>> a) expressivity and not the fault of the syntax. Let's say you could encode
>> all the information of OA selectors in fragment-id syntax. What would be the
>> harm?
>>
>> I would really like to have a look into this. Is there a list of available
>> selectors? I found:
>> http://code.google.com/p/annotation-ontology/wiki/v2Selectors
>> But it only lists 4 classes of selectors, and none of them is very powerful.
>> Sebastian
>>
>> On 01.08.2012 17:12, Paolo Ciccarese wrote:
>>> Dear Sebastian,
>>> I produce annotations on web pages that I cannot control, and I work with
>>> the DOM. I mainly annotate scientific content with
>>> http://annotationframework.org
>>>
>>> One example of why counting and XPointer might not work is the fact
>>> that pages include sections like advertisements and news, which change
>>> often. There are even simpler examples, such as a document that displays
>>> today's date somewhere. These modifications can break selection and
>>> counting, which is why, three years ago, I started using different
>>> mechanisms that are less affected (though unfortunately not immune) by
>>> common changes to pages. At about the same time, the need emerged in the
>>> OAC community as well.
>>>
>>> In general, Selectors also make sense given the need to annotate
>>> media types other than HTML. For instance, Media Fragments fall short in
>>> many of the use cases already implemented by video annotation tools.
>>>
>>> Hope this helps,
>>> Paolo
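
Paolo's point that offsets break when unrelated parts of a page change can be illustrated with a minimal Python sketch comparing plain offset selection with re-anchoring a quote by its surrounding text. The quote-based variant is similar in spirit to the exact/prefix/suffix fields of the Open Annotation TextQuoteSelector, but it is not the mechanism used by annotationframework.org, which this thread does not describe; exact string matching, the function names and the example page are simplifying assumptions.

```python
from typing import Optional, Tuple


def select_by_offset(text: str, start: int, end: int) -> str:
    """Offset selection: cheap, but silently shifts if earlier text changes."""
    return text[start:end]


def select_by_quote(
    text: str, exact: str, prefix: str = "", suffix: str = ""
) -> Optional[Tuple[int, int]]:
    """Re-anchor a quote by its surrounding context; None if it is gone."""
    i = text.find(prefix + exact + suffix)
    if i == -1:
        return None
    start = i + len(prefix)
    return start, start + len(exact)


# Hypothetical page whose date line changes between two visits.
old = "Date: 2012-07-31. The p53 gene regulates apoptosis."
new = "Date: 2012-08-01 (Wed). The p53 gene regulates apoptosis."

start = old.index("p53")
print(select_by_offset(old, start, start + 3))  # p53
print(select_by_offset(new, start, start + 3))  # ". T" instead of "p53"
print(select_by_quote(new, "p53", prefix="The ", suffix=" gene"))  # (28, 31)
```

Neither approach is immune: if the quoted words themselves are edited, `select_by_quote` returns `None`, which at least signals the failure instead of silently selecting the wrong text.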
>>>
>>>
>>> On Wed, Aug 1, 2012 at 2:43 AM, Sebastian Hellmann <
>>> hellmann@informatik.uni-leipzig.de> wrote:
>>>
>>>> Dear Paolo,
>>>> Why wouldn't this work well? It is based on RFC 5147. Offsets work for
>>>> any string and therefore also for HTML source. Problems arise when you
>>>> interpret strings. Offsets do not work well for the DOM, of course, but
>>>> that is where one would rather use XPointer (W3C). I guess it also
>>>> wouldn't work well to use an OA text selector on an image, right?
>>>> With fragments, you definitely gain:
>>>> - compatibility with the web (which also means free implementations)
>>>> - fewer triples
>>>> - fewer generated UUIDs (if any at all)
>>>>
>>>> What do you gain when using selectors? I am not interested in
>>>> theoretical/modelling issues. For me, only the things that help you
>>>> succeed in a use case count.
>>>> A parser for URIs is very easy to implement, much easier in fact
>>>> than understanding and working with selectors.
>>>> Sebastian
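
The claim above that a URI parser is easy to build can be illustrated with a minimal Python sketch that handles only the RFC 5147 `char=` scheme and slices a string with it. The function names and example URIs are hypothetical. The sketch ignores the `line=` scheme, does not verify `length=`/`md5=` integrity checks or apply the RFC's rules on line endings and character encodings, and does not parse the `#offset_717_729` form used in the URI quoted in this thread.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urldefrag

# RFC 5147 char scheme: "char=" followed by a position or a range.
_CHAR = re.compile(r"char=([0-9]*)(,([0-9]*))?")


def parse_char_fragment(uri: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (start, end) for a 'char=' fragment, or None if there is none.

    end is None for an open range such as "char=10,"; a single position
    such as "char=10" is returned as the zero-width range (10, 10).
    """
    fragment = urldefrag(uri).fragment
    # Integrity checks after ';' (e.g. ";length=11") are not verified here.
    m = _CHAR.fullmatch(fragment.split(";", 1)[0])
    if m is None:
        return None
    left, comma, right = m.group(1), m.group(2), m.group(3)
    if comma is None:  # single position
        return (int(left), int(left)) if left else None
    if not left and not right:  # "char=," is not a valid range
        return None
    return (int(left) if left else 0, int(right) if right else None)


def select_text(text: str, uri: str) -> Optional[str]:
    """Slice text with the URI's fragment; positions fall between characters."""
    span = parse_char_fragment(uri)
    if span is None:
        return None
    start, end = span
    return text[start:end]  # end=None slices to the end of the text


print(select_text("Hello world", "http://example.org/doc.txt#char=6,11"))  # world
```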
>>>>
>>>>
>>>> On 31.07.2012 19:51, Paolo Ciccarese wrote:
>>>>
>>>>> Is the mechanism
>>>>> http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
>>>>> really working in general?
>>>>>
>>>>> In my experience it does not work with HTML pages in general. It would
>>>>> mean having lots of ways of composing URIs that would then need to be
>>>>> parsed. That is why we designed more complex selection mechanisms
>>>>> (http://www.openannotation.org/spec/core/#Selector)...
>>>>> and therefore more triples.
>>>>>
>>>>> Paolo
>>>>>
>>>>
>>>>
>>>> --
>>>> Dipl. Inf. Sebastian Hellmann
>>>> Department of Computer Science, University of Leipzig
>>>> Events:
>>>>     * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>>>>     * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>> Research Group: http://aksw.org
>>>>
>>>>
>>
>>


Received on Wednesday, 1 August 2012 19:58:50 UTC