RE: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

Hi Felix,



My comments are inline.



Best regards,

Mārcis ;o)



-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Tuesday, January 29, 2013 6:35 PM
To: Mārcis Pinnis
Cc: public-multilingualweb-lt@w3.org
Subject: Re: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup



Hi Mārcis,



Am 29.01.13 16:02, schrieb Mārcis Pinnis:

> Hi Felix,

>

> Could you please explain this in a bit more detail? I am not sure I follow your idea of how the stand-off mechanism won't be hierarchical and won't overlap anymore.

>

> So ... (I think it gets very confusing and difficult to follow what the current proposal is) if I understand correctly the ref and id places have been switched so that one could refer from a single span only to a single textAnalyticsAnnotation, right?



Correct.



>   Then if I understand correctly, the following would not resolve to having "University of London" as an organisation and "London" as a place, right? (probably not, therefore, I think I have lost the idea).

>

> <span tanRefs="id1">University of <span tanRefs="id2">London</span></span>



It would. The point is: ITS annotations marked as not inheriting (like terminology or disambig) always refer to the *text content* of an element (or attribute) - but excluding nested elements. Now, in your example you have separate annotations. So the one with tanRefs="id1" would refer to the text content "University of London", and the one with tanRefs="id2" would refer to "London".
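
To make this resolution concrete, here is a minimal Python sketch, not taken from any ITS implementation. It assumes the proposed (not finalized) names from this thread: a tanRefs attribute on the span and its:textAnalyticsAnnotations blocks carrying the id. The wrapper document and the resolve() helper are hypothetical. Nested text is included in the outer span so that the result matches the reading above ("University of London" for id1).

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical wrapper document; attribute and element names follow the
# proposal discussed in this thread, not a published schema.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p><span tanRefs="id1">University of <span tanRefs="id2">London</span></span></p>
  <its:textAnalyticsAnnotations id="id1">
    <its:textAnalyticsAnnotation its-tan-type="entity"
      its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation"/>
  </its:textAnalyticsAnnotations>
  <its:textAnalyticsAnnotations id="id2">
    <its:textAnalyticsAnnotation its-tan-type="entity"
      its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"/>
  </its:textAnalyticsAnnotations>
</doc>"""


def resolve(xml_text):
    """Return (span text, [class refs]) for every span with a tanRefs pointer."""
    root = ET.fromstring(xml_text)
    # Index the stand-off blocks by their id attribute.
    blocks = {
        block.get("id"): block
        for block in root.iter(f"{{{ITS_NS}}}textAnalyticsAnnotations")
    }
    resolved = []
    for span in root.iter("span"):
        ref = span.get("tanRefs")
        if ref is None or ref not in blocks:
            continue
        # Text of the span, including text inside nested spans.
        text = "".join(span.itertext())
        classes = [ann.get("its-tan-class-ref") for ann in blocks[ref]]
        resolved.append((text, classes))
    return resolved


for text, classes in resolve(DOC):
    print(text, classes)
```

Each span resolves to exactly one stand-off block, so the two nested spans give two independent annotations.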



Mārcis: OK, then I guess I do not understand what was meant by the following (see below; I marked it in red, in case you get the e-mail as HTML; if not, here is a quote from below):

Mārcis: I quote: „it would also mean - that is different from my proposal - that annotations would not be hierarchical and they would not overlap, since they always - both in the inline and standoff case - are anchored at the same span of text”

Mārcis: actually I am also a bit confused about overlapping – we keep mentioning it, but would that even be possible with the stand-off mark-up? For overlapping you need to specify a range (like the example given by Felix from NIF or TEI ... cannot remember right now, but that described ranges), but here we can only get nested annotations (hierarchical and also contradicting).
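
The distinction between nested and overlapping annotations can be sketched with plain character ranges. This is an illustration only: the half-open (start, end) offsets and the relation() helper are assumptions made for the example and are not part of the stand-off proposal, which anchors annotations at element spans rather than offsets.

```python
def relation(a, b):
    """Classify two half-open character ranges given as (start, end)."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"
    return "overlapping"


TEXT = "University of London"

# Nested: "London" lies fully inside "University of London".
# Well-formed nested spans can express this.
print(relation((0, 20), (14, 20)))

# Overlapping: "University of" and "of London" share "of" but neither
# contains the other. Well-formed element nesting cannot express this.
print(relation((0, 13), (11, 20)))
```

Element-anchored annotations can only produce the "nested" or "disjoint" cases; the "overlapping" case needs explicit ranges.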



>

> <its:textAnalyticsAnnotations id="id1">
>   <its:textAnalyticsAnnotation
>     its-tan-type="entity"
>     its-tan-ident-ref="http://dbpedia.org/resource/UniversityOfLondon"
>     its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation"
>     its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
> </its:textAnalyticsAnnotations>
> <its:textAnalyticsAnnotations id="id2">
>   <its:textAnalyticsAnnotation
>     its-tan-type="entity"
>     its-tan-ident-ref="http://dbpedia.org/resource/London"
>     its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
>     its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
> </its:textAnalyticsAnnotations>

>

> Also ... is there a reason why the ref and id places have been switched?



Yes - so that there is the same approach as for localization quality issue and provenance.



>   In the initial Felix proposal the textAnalyticsAnnotation elements could refer to one span in text (thus allowing actually easy cleanup of the content). This IMO makes content management slightly more difficult (or am I missing something)?



I'm not sure whether this is more difficult?



Mārcis: In the other scenario, where the id is in the span, you could remove the textAnalyticsAnnotation elements easily (or maybe even keep them apart in a physical stand-off solution). This is not possible if the span carries a "ref" attribute, because you would have to run through the whole document and remove all ref attributes that link to the section you want to delete.
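
A rough Python sketch of the cleanup difference between the two designs follows. Everything in it is hypothetical: the wrapper documents, the helper names, and in particular the "ref" attribute on the stand-off block, which only mirrors the informal sketch quoted further down in this thread.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
BLOCK = f"{{{ITS_NS}}}textAnalyticsAnnotations"

# Design A: the span points at the stand-off block (tanRefs in the span).
REF_IN_SPAN = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p><span tanRefs="id1">University of <span tanRefs="id2">London</span></span></p>
  <its:textAnalyticsAnnotations id="id1"/>
  <its:textAnalyticsAnnotations id="id2"/>
</doc>"""

# Design B: the stand-off block points at the span (hypothetical "ref").
ID_IN_SPAN = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p><span id="s1">University of <span id="s2">London</span></span></p>
  <its:textAnalyticsAnnotations ref="s1"/>
  <its:textAnalyticsAnnotations ref="s2"/>
</doc>"""


def remove_blocks(root, attr, value):
    """Remove stand-off blocks whose attribute equals value; return the count."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == BLOCK and child.get(attr) == value:
                parent.remove(child)
                removed += 1
    return removed


def cleanup_ref_in_span(root, block_id):
    """Design A: delete the block, then scan the content for dangling pointers."""
    remove_blocks(root, "id", block_id)
    touched = 0
    for element in root.iter():
        if element.get("tanRefs") == block_id:
            del element.attrib["tanRefs"]
            touched += 1
    return touched


def cleanup_id_in_span(root, span_id):
    """Design B: delete the block only; the content itself is left untouched."""
    return remove_blocks(root, "ref", span_id)
```

In design A the second pass has to visit the annotated content; in design B the span keeps its id and nothing in the content changes.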



Mārcis: I am not sure whether there is really a need for that, but where there are multi-layer multi-annotator solutions this might in some cases turn out to be handy...



Mārcis: This is, however, just a comment ... maybe there is a really justified reason why for LQI and Provenance it has been decided to have the refs in the span...?!



Best,



Felix



>

> Best regards,

> Mārcis ;o)

>

> -----Original Message-----

> From: Felix Sasaki [mailto:fsasaki@w3.org]

> Sent: Tuesday, January 29, 2013 4:39 PM

> To: public-multilingualweb-lt@w3.org

> Subject: Re: issue-68 from an annotation representation point of view,

> with potential implications for annotatorsRef and standoff markup

>

> Am 29.01.13 10:56, schrieb Tadej Štajner:

>> Hi, Felix, Phil,

>> maybe 'tanRefs' was misleading. The intention was to point to an

>> its:textAnalysisAnnotations element, which could in turn contain

>> several its:textAnalysisAnnotation elements that all describe

>> the same fragment.

> Thanks for the clarification, Tadej - that makes things clearer to me.

>

> I think it also means that we could - instead of "my" standoff proposal - have standoff markup for a joint terminology + disambiguation data category, to allow for both kinds of annotations to be represented for the same fragment. @Mārcis: it would also mean - that is different from my proposal - that annotations would not be hierarchical and they would not overlap, since they always - both in the inline and standoff case - are anchored at the same span of text.

>

> Best,

>

> Felix

>

>> Is this valid usage of the its:textAnalysisAnnotations, or was it

>> only meant to be a container for the individual rules? I was looking

>> at this example for inspiration:

>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/xml/EX-locQualityIssue-local-2.xml

>>

>>

>> Alternatively, having multiple values would also work equivalently,

>> then we could point to individual textAnalysisAnnotation statements.

>> -- Tadej

>>

>> On 29. 01. 2013 10:41, Felix Sasaki wrote:

>>> Thanks, Phil. Tadej, was the intention of its:tanRefs at

>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0212.html

>>>

>>> to have several pointers, e.g. allow for

>>> its:tanRefs="tan1 tan2 tan3"

>>> or just one, that is only "tan1"?

>>>

>>> Best,

>>>

>>> Felix

>>>

>>>

>>> Am 29.01.13 10:34, schrieb Phil Ritchie:

>>>> All

>>>>

>>>> @Felix: "But while doing that a question on the LQI/Provenance

>>>> implementers: is it a feature that you point to just one external

>>>> standoff unit, or an oversight, and could it be several ones?"

>>>>

>>>> My current thinking is that stand-off stores many annotations for

>>>> one segment. This is because if several segments are linked to one

>>>> stand-off block, then if one of those segments needs to have

>>>> another unique issue registered against it, you have to copy the

>>>> stand-off, add the unique annotation and change the reference id's

>>>> so that the link is between the segment with the additional

>>>> annotation and the copied stand-off.

>>>> Complex.

>>>>

>>>> Another argument for pointing to a single stand-off is that

>>>> although the "classification" attributes of the markup might be

>>>> identical (e.g.

>>>> loc-quality-issue-type="style" loc-quality-issue-severity="75")

>>>> each may have a different loc-quality-issue-comment to highlight

>>>> the specific nature of the error.

>>>>

>>>> Hmm. The benefit of the id being on the segment/element and the

>>>> idRefs being on the stand-off really comes into its own if you want

>>>> to have multiple annotations across many data categories for the

>>>> same segment/element.

>>>>

>>>> <span id="loaded">blah</span>

>>>>

>>>> <its:prov ref="loaded"...

>>>> <its:locQualityIssues ref="loaded"...

>>>> <its:textAnalysis ref="loaded"

>>>> (on the train, I know this is not valid markup.)

>>>>

>>>> Phil

>>>>

>>>>

>>>>

>>>> On 28 Jan 2013, at 19:57, "Felix Sasaki" <fsasaki@w3.org> wrote:

>>>>

>>>>> But while doing that a question on the LQI/Provenance implementers:

>>>>> is it a feature that you point to just one external standoff unit, or an

>>>>> oversight, and could it be several ones?

>>>>

>>>>

>>>>

>>>>

>>>

>>

>

Received on Tuesday, 29 January 2013 16:50:48 UTC