
RE: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

From: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
Date: Tue, 29 Jan 2013 17:02:05 +0200
To: Felix Sasaki <fsasaki@w3.org>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <AC6FD4BB9BB02540AC7322091A6C3B5472B0F01180@postal.Tilde.lv>
Hi Felix,

Could you please explain this in a bit more detail? I am not sure I follow your idea of how annotations in the stand-off mechanism would no longer be hierarchical and would no longer overlap.

So ... (I find it very confusing and difficult to follow what the current proposal is) if I understand correctly, the ref and id places have been switched so that a single span can refer to only a single textAnalyticsAnnotation, right? Then, if I understand correctly, the following would not resolve to "University of London" as an organisation and "London" as a place, right? (Probably not; therefore, I think I have lost the idea.)

<span tanRefs="id1">University of <span tanRefs="id2">London</span></span>

<its:textAnalyticsAnnotations id="id1">
<its:textAnalyticsAnnotation its-tan-type="entity" its-tan-ident-ref="http://dbpedia.org/resource/UniversityOfLondon" its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation" its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
</its:textAnalyticsAnnotations>
<its:textAnalyticsAnnotations id="id2">
<its:textAnalyticsAnnotation its-tan-type="entity" its-tan-ident-ref="http://dbpedia.org/resource/London" its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place" its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
</its:textAnalyticsAnnotations>
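
To make the question concrete, here is a minimal Python sketch of one possible reading of the markup above, in which each span's tanRefs value is looked up among the standoff blocks by id. Whether the proposal intends this resolution is exactly what is being asked, so this is an illustration and not the defined behaviour. The <doc> wrapper element and the function name resolve_tan_refs are assumptions added only so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# The markup from the message, wrapped in a hypothetical <doc> root so that it
# parses as a single well-formed document with the its: prefix declared.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
<span tanRefs="id1">University of <span tanRefs="id2">London</span></span>
<its:textAnalyticsAnnotations id="id1">
<its:textAnalyticsAnnotation its-tan-type="entity" its-tan-ident-ref="http://dbpedia.org/resource/UniversityOfLondon" its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation" its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
</its:textAnalyticsAnnotations>
<its:textAnalyticsAnnotations id="id2">
<its:textAnalyticsAnnotation its-tan-type="entity" its-tan-ident-ref="http://dbpedia.org/resource/London" its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place" its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
</its:textAnalyticsAnnotations>
</doc>"""


def resolve_tan_refs(xml_text):
    """Return (span text, class ref) pairs under the assumed lookup-by-id reading."""
    root = ET.fromstring(xml_text)
    # Index every standoff block by its id attribute.
    blocks = {
        block.get("id"): block
        for block in root.iter(f"{{{ITS_NS}}}textAnalyticsAnnotations")
    }
    resolved = []
    # root.iter visits spans in document order, so the outer span comes first.
    for span in root.iter("span"):
        text = "".join(span.itertext())
        # split() also covers the multi-valued case (tanRefs="tan1 tan2").
        for ref in span.get("tanRefs", "").split():
            block = blocks.get(ref)
            if block is None:
                continue  # dangling pointer: nothing to resolve
            for ann in block.findall(f"{{{ITS_NS}}}textAnalyticsAnnotation"):
                resolved.append((text, ann.get("its-tan-class-ref")))
    return resolved


for text, class_ref in resolve_tan_refs(DOC):
    print(text, "->", class_ref)
```

Under this reading the nested spans do yield two annotations, one per span, which is why it is unclear in what sense annotations would no longer be hierarchical.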

Also ... is there a reason why the ref and id places have been switched? In Felix's initial proposal, each textAnalyticsAnnotation element could refer to one span in the text, so cleaning up the content was easy: removing the standoff elements was enough. IMO the switched version makes content management slightly more difficult (or am I missing something)?
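
The cleanup concern can be sketched as follows in Python. This is only an illustration under stated assumptions: the <doc> wrapper, the attribute name ref on the standoff block (borrowed from Phil's sketch quoted further down), the function name cleanup, and the placement of standoff blocks as direct children of the root are all hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
STANDOFF = f"{{{ITS_NS}}}textAnalyticsAnnotations"

# Variant A: the pointer (tanRefs) sits on the content, the id on the standoff.
POINTER_ON_SPAN = (
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<span tanRefs="id1">London</span>'
    '<its:textAnalyticsAnnotations id="id1"/>'
    "</doc>"
)

# Variant B: the id sits on the content, the pointer (hypothetical "ref"
# attribute) on the standoff.
POINTER_ON_STANDOFF = (
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<span id="s1">London</span>'
    '<its:textAnalyticsAnnotations ref="s1"/>'
    "</doc>"
)


def cleanup(root, pointer_on_span):
    """Remove all standoff blocks; return how many content elements were edited."""
    # Assumes standoff blocks are direct children of the root element.
    for block in root.findall(STANDOFF):
        root.remove(block)
    touched = 0
    if pointer_on_span:
        # The pointers would now dangle, so the content has to be edited too.
        for span in root.iter("span"):
            if span.attrib.pop("tanRefs", None) is not None:
                touched += 1
    return touched


print(cleanup(ET.fromstring(POINTER_ON_SPAN), pointer_on_span=True))
print(cleanup(ET.fromstring(POINTER_ON_STANDOFF), pointer_on_span=False))
```

In variant B the content is left untouched apart from a plain id attribute, whereas variant A requires a second pass over the content to strip the pointers.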

Best regards,
Mārcis ;o)

-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Tuesday, January 29, 2013 4:39 PM
To: public-multilingualweb-lt@w3.org
Subject: Re: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

On 29.01.13 10:56, Tadej Štajner wrote:
> Hi, Felix, Phil,
> maybe 'tanRefs' was misleading. The intention was to point to an
> its:textAnalysisAnnotations element, which could in turn contain
> several its:textAnalysisAnnotation elements that all describe
> the same fragment.

Thanks for the clarification, Tadej - that makes things clearer to me.

I think it also means that we could, instead of "my" standoff proposal, have standoff markup for a joint terminology + disambiguation data category, to allow for both kinds of annotations to be represented for the same fragment. @Marcis: it would also mean (and this is a difference from my proposal) that annotations would not be hierarchical and would not overlap, since they are always anchored at the same span of text, in both the inline and the standoff case.

Best,

Felix

> Is this valid usage of the its:textAnalysisAnnotations, or was it only 
> meant to be a container for the individual rules? I was looking at 
> this example for inspiration:
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/xml/EX-locQualityIssue-local-2.xml 
>
>
> Alternatively, having multiple values would also work equivalently, 
> then we could point to individual textAnalysisAnnotation statements.
> -- Tadej
>
> On 29. 01. 2013 10:41, Felix Sasaki wrote:
>> Thanks, Phil. Tadej, was the intention of its:tanRefs at
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0212.html 
>>
>> to have several pointers, e.g. allow for
>> its:tanRefs="tan1 tan2 tan3"
>> or just one, that is only "tan1"?
>>
>> Best,
>>
>> Felix
>>
>>
>> On 29.01.13 10:34, Phil Ritchie wrote:
>>> All
>>>
>>> @Felix: "But while doing that a question on the LQI/Provenance
>>> implementers: is it a feature that you point to just one external
>>> standoff unit, or an oversight, and could it be several ones?"
>>>
>>> My current thinking is that stand-off stores many annotations for one
>>> segment. This is because if several segments are linked to one
>>> stand-off block, then if one of those segments needs to have another
>>> unique issue registered against it, you have to copy the stand-off, add
>>> the unique annotation and change the reference ids so that the link is
>>> between the segment with the additional annotation and the copied
>>> stand-off. Complex.
>>>
>>> Another argument for pointing to a single stand-off is that although
>>> the "classification" attributes of the markup might be identical (e.g.
>>> loc-quality-issue-type="style" loc-quality-issue-severity="75") each
>>> may have a different loc-quality-issue-comment to highlight the
>>> specific nature of the error.
>>>
>>> Hmm. The benefit of the id being on the segment/element and the idRefs
>>> being on the stand-off really comes into its own if you want to have
>>> multiple annotations across many data categories for the same
>>> segment/element.
>>>
>>> <span id="loaded">blah</span>
>>>
>>> <its:prov ref="loaded"...
>>> <its:locQualityIssues ref="loaded"...
>>> <its:textAnalysis ref="loaded"
>>> (on the train, I know this is not valid markup.)
>>>
>>> Phil
>>>
>>>
>>>
>>> On 28 Jan 2013, at 19:57, "Felix Sasaki" <fsasaki@w3.org> wrote:
>>>
>>>> But while doing that a question on the LQI/Provenance implementers:
>>>> is it a feature that you point to just one external standoff unit, or
>>>> an oversight, and could it be several ones?
>>>
>>>
>>>
>>>
>>
>>
>
>


Received on Tuesday, 29 January 2013 15:02:38 UTC
