
Re: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 29 Jan 2013 18:26:57 +0100
Message-ID: <51080661.1020300@w3.org>
To: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Hi Mārcis,

as stated at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0237.html
  - if we follow the "trust is important and the main comment is not 
about introducing multi-layer annotations to ITS2" reasoning, then we 
can close this discussion. Hence I'm providing a few more arguments in 
the 237 mail, please have a look.

Best,

Felix


Am 29.01.13 17:50, schrieb Mārcis Pinnis:
>
> Hi Felix,
>
> My comments are inline.
>
> Best regards,
>
> Mārcis ;o)
>
> -----Original Message-----
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Tuesday, January 29, 2013 6:35 PM
> To: Mārcis Pinnis
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: issue-68 from an annotation representation point of view, 
> with potential implications for annotatorsRef and standoff markup
>
> Hi Mārcis,
>
> Am 29.01.13 16:02, schrieb Mārcis Pinnis:
>
> > Hi Felix,
>
> >
>
> > Could you please explain this in a bit more detail? I am not sure I 
> > follow your idea of how the stand-off mechanism won't be hierarchical 
> > and won't overlap anymore.
>
> >
>
> > So ... (I think it gets very confusing and difficult to follow what 
> > the current proposal is) if I understand correctly, the ref and id 
> > places have been switched so that one could refer from a single span 
> > only to a single textAnalyticsAnnotation, right?
>
> Correct.
>
> >   Then if I understand correctly, the following would not resolve to 
> > having "University of London" as an organisation and "London" as a 
> > place, right? (probably not, therefore, I think I have lost the idea).
>
> >
>
> > <span tanRefs="id1">University of <span tanRefs="id2">London</span></span>
>
> It would. The point is: ITS annotations marked as not inheriting (like 
> terminology or disambig) always refer to an element (or attribute) *text 
> content* - but excluding nested elements. Now, in your example you 
> have separate annotations. So the one with tanRefs="id1" would refer 
> to text content "University of London", and the one with tanRefs="id2" 
> would refer to London.
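
To make the resolution concrete, here is a minimal Python sketch of how the nested example could be resolved under the reading given above, where each span's annotation applies to that span's full text content. The `<doc>` wrapper, the `resolve_annotations` helper, and the ITS namespace declaration on the wrapper are assumptions added for illustration; the attribute names are the draft syntax used in this thread, not necessarily final ITS 2.0.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical wrapper document around the markup quoted in this thread.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p><span tanRefs="id1">University of <span tanRefs="id2">London</span></span></p>
  <its:textAnalyticsAnnotations id="id1">
    <its:textAnalyticsAnnotation
        its-tan-type="entity"
        its-tan-ident-ref="http://dbpedia.org/resource/UniversityOfLondon"
        its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation"
        its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
  </its:textAnalyticsAnnotations>
  <its:textAnalyticsAnnotations id="id2">
    <its:textAnalyticsAnnotation
        its-tan-type="entity"
        its-tan-ident-ref="http://dbpedia.org/resource/London"
        its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
        its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
  </its:textAnalyticsAnnotations>
</doc>"""


def resolve_annotations(xml_text):
    """Pair each annotated span's text content with its stand-off class."""
    root = ET.fromstring(xml_text)
    # Index the stand-off containers by their id attribute.
    containers = {
        block.get("id"): block
        for block in root.iter(f"{{{ITS_NS}}}textAnalyticsAnnotations")
    }
    results = []
    for span in root.iter("span"):
        ref = span.get("tanRefs")
        if ref not in containers:
            continue
        # Text content of the span, including text inside nested spans.
        text = "".join(span.itertext())
        for ann in containers[ref].findall(f"{{{ITS_NS}}}textAnalyticsAnnotation"):
            results.append((text, ann.get("its-tan-class-ref")))
    return results


resolved = resolve_annotations(DOC)
for text, class_ref in resolved:
    print(text, "->", class_ref)
```

In this sketch the outer span resolves to the organisation and the inner span to the place, so the two annotations stay separate even though the spans are nested.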
>
> Mārcis: OK, then I guess I do not understand what was meant by the 
> following (see below, I marked it in red, if you get the e-mail as 
> HTML; if not, here is the quote):
>
> Mārcis: I quote: "it would also mean - that's a difference from my 
> proposal - that annotations would not be hierarchical and they would 
> not overlap, since they always - both in the inline and standoff case 
> - are anchored at the same span of text"
>
> Mārcis: actually I am also a bit confused about overlapping – we keep 
> mentioning it, but would that even be possible with the stand-off 
> mark-up? For overlapping you need to specify a range (like the example 
> given by Felix from NIF or TEI ... cannot remember right now, but that 
> described ranges), but here we can only get nested annotations 
> (hierarchical and also contradicting).
>
> >
>
> > <its:textAnalyticsAnnotations id="id1">
> >   <its:textAnalyticsAnnotation
> >     its-tan-type="entity"
> >     its-tan-ident-ref="http://dbpedia.org/resource/UniversityOfLondon"
> >     its-tan-class-ref="http://nerd.eurecom.fr/ontology#Organisation"
> >     its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
> > </its:textAnalyticsAnnotations>
> > <its:textAnalyticsAnnotations id="id2">
> >   <its:textAnalyticsAnnotation
> >     its-tan-type="entity"
> >     its-tan-ident-ref="http://dbpedia.org/resource/London"
> >     its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
> >     its-tan-confidence="0.7" annotatorsRef="tan|annotator-1"/>
> > </its:textAnalyticsAnnotations>
>
> >
>
> > Also ... is there a reason why the ref and id places have been switched?
>
> Yes - so that there is the same approach as for localization quality 
> issue and provenance.
>
> >   In the initial Felix proposal the textAnalyticsAnnotation elements 
> > could refer to one span in text (thus actually allowing easy cleanup 
> > of the content). This IMO makes content management slightly more 
> > difficult (or am I missing something)?
>
> I'm not sure whether this is more difficult?
>
> Mārcis: In the other scenario, where you had the id in the span, you 
> could easily remove the textAnalyticsAnnotation elements (or maybe 
> even keep them apart in a physically separate stand-off solution). 
> This is not possible if you have a "ref" attribute in the span, 
> because you would have to run through the whole document and remove 
> all ref attributes that link to the section you want to delete.
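
The cleanup cost described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the `<doc>` wrapper, the sample content, and the helper name `remove_standoff_block` are hypothetical, and `tanRefs` is treated as holding a single id, as in the proposal discussed above.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
BLOCK_TAG = f"{{{ITS_NS}}}textAnalyticsAnnotations"

# Hypothetical sample: two annotated spans, two (empty) stand-off blocks.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p><span tanRefs="id1">University of <span tanRefs="id2">London</span></span></p>
  <its:textAnalyticsAnnotations id="id1"/>
  <its:textAnalyticsAnnotations id="id2"/>
</doc>"""


def remove_standoff_block(root, block_id):
    """Delete one stand-off block and every tanRefs pointer to it.

    Returns the number of inline pointers that had to be stripped.
    """
    # ElementTree has no parent links, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Step 1: drop the stand-off block itself.
    for block in list(root.iter(BLOCK_TAG)):
        if block.get("id") == block_id:
            parent_of[block].remove(block)

    # Step 2: walk the whole document to strip the now-dangling pointers.
    stripped = 0
    for element in root.iter():
        if element.get("tanRefs") == block_id:
            del element.attrib["tanRefs"]
            stripped += 1
    return stripped


root = ET.fromstring(SAMPLE)
stripped_count = remove_standoff_block(root, "id2")
remaining_ids = [block.get("id") for block in root.iter(BLOCK_TAG)]
remaining_refs = [span.get("tanRefs") for span in root.iter("span")]
```

With the id on the span and the reference on the stand-off element, step 2 would not be needed: deleting the stand-off block would leave no dangling pointer in the content.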
>
> Mārcis: I am not sure whether there is really a need for that, but 
> where there are multi-layer multi-annotator solutions this might in 
> some cases turn out to be handy...
>
> Mārcis: This is, however, just a comment ... maybe there is a really 
> justified reason why for LQI and Provenance it has been decided to 
> have the refs in the span...?!
>
> Best,
>
> Felix
>
> >
>
> > Best regards,
>
> > Mārcis ;o)
>
> >
>
> > -----Original Message-----
>
> > From: Felix Sasaki [mailto:fsasaki@w3.org]
>
> > Sent: Tuesday, January 29, 2013 4:39 PM
>
> > To: public-multilingualweb-lt@w3.org
>
> > Subject: Re: issue-68 from an annotation representation point of view,
>
> > with potential implications for annotatorsRef and standoff markup
>
> >
>
> > Am 29.01.13 10:56, schrieb Tadej Štajner:
>
> >> Hi, Felix, Phil,
>
> >> maybe 'tanRefs' was misleading. The intention was to point to an
> >> its:textAnalysisAnnotations element, which could in turn contain
> >> several its:textAnalysisAnnotation elements that all describe
> >> the same fragment.
>
> > Thanks for the clarification, Tadej - that makes things clearer to me.
>
> >
>
> > I think it also means that we could - instead of "my" standoff
> > proposal - have standoff markup for a joint terminology + disambiguation 
> > data category, to allow for both kinds of annotations to be represented 
> > for the same fragment. @Mārcis: it would also mean - that's a difference 
> > from my proposal - that annotations would not be hierarchical and they 
> > would not overlap, since they always - both in the inline and standoff 
> > case - are anchored at the same span of text.
>
> >
>
> > Best,
>
> >
>
> > Felix
>
> >
>
> >> Is this valid usage of the its:textAnalysisAnnotations, or was it
>
> >> only meant to be a container for the individual rules? I was looking
>
> >> at this example for inspiration:
>
> >> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/xml/EX-locQualityIssue-local-2.xml
>
> >>
>
> >>
>
> >> Alternatively, having multiple values would also work equivalently,
>
> >> then we could point to individual textAnalysisAnnotation statements.
>
> >> -- Tadej
>
> >>
>
> >> On 29. 01. 2013 10:41, Felix Sasaki wrote:
>
> >>> Thanks, Phil. Tadej, was the intention of its:tanRefs at
>
> >>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0212.html
>
> >>>
>
> >>> to have several pointers, e.g. allow for
>
> >>> its:tanRefs="tan1 tan2 tan3"
>
> >>> or just one, that is only "tan1"?
>
> >>>
>
> >>> Best,
>
> >>>
>
> >>> Felix
>
> >>>
>
> >>>
>
> >>> Am 29.01.13 10:34, schrieb Phil Ritchie:
>
> >>>> All
>
> >>>>
>
> >>>> @Felix: "But while doing that a question on the LQI/Provenance
> >>>> implementers: is it a feature that you point to just one external
> >>>> standoff unit, or an oversight, and could it be several ones?"
>
> >>>>
>
> >>>> My current thinking is that stand-off stores many annotations for
>
> >>>> one segment. This is because if several segments are linked to one
>
> >>>> stand-off block, then if one of those segments needs to have
>
> >>>> another unique issue registered against it, you have to copy the
>
> >>>> stand-off, add the unique annotation and change the reference id's
>
> >>>> so that the link is between the segment with the additional
>
> >>>> annotation and the copied stand-off.
>
> >>>> Complex.
>
> >>>>
>
> >>>> Another argument for pointing to a single stand-off is that
>
> >>>> although the "classification" attributes of the markup might be
>
> >>>> identical (e.g.
>
> >>>> loc-quality-issue-type="style" loc-quality-issue-severity="75")
>
> >>>> each may have a different loc-quality-issue-comment to highlight
>
> >>>> the specific nature of the error.
>
> >>>>
>
> >>>> Hmm. The benefit of the id being on the segment/element and the
>
> >>>> idRefs being on the stand-off really comes into its own if you want
>
> >>>> to have multiple annotations across many data categories for the
>
> >>>> same segment/element.
>
> >>>>
>
> >>>> <span id="loaded">blah</span>
>
> >>>>
>
> >>>> <its:prov ref="loaded"...
>
> >>>> <its:locQualityIssues ref="loaded"...
>
> >>>> <its:textAnalysis ref="loaded"
>
> >>>> (on the train, I know this is not valid markup.)
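
Since the markup above is only a rough sketch, the idea can be illustrated with plain Python data instead: the content element carries just an id, and stand-off records from several data categories point back at it. The record layout, the extra "other" target, and the `annotations_for` helper are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Stand-off records as (data category, ref to the content element's id).
STANDOFF = [
    ("prov", "loaded"),
    ("locQualityIssues", "loaded"),
    ("textAnalysis", "loaded"),
    ("locQualityIssues", "other"),  # hypothetical second target
]


def annotations_for(standoff):
    """Group stand-off records by the element id they point to."""
    by_target = defaultdict(list)
    for category, ref in standoff:
        by_target[ref].append(category)
    return dict(by_target)


index = annotations_for(STANDOFF)
print(index["loaded"])
```

Adding another data category for the same element then only means adding a stand-off record; the element itself is not touched.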
>
> >>>>
>
> >>>> Phil
>
> >>>>
>
> >>>>
>
> >>>>
>
> >>>> On 28 Jan 2013, at 19:57, "Felix Sasaki" <fsasaki@w3.org> wrote:
>
> >>>>
>
> >>>>> But while doing that a question on the LQI/Provenance implementers:
> >>>>> is it a feature that you point to just one external standoff unit, or an
> >>>>> oversight, and could it be several ones?
>
> >>>>
>
> >>>>
>
> >>>>
>
> >>>>
>
> >>>
>
> >>
>
> >
>
Received on Tuesday, 29 January 2013 17:27:30 UTC
