Re: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 28 Jan 2013 20:56:56 +0100
Message-ID: <5106D808.7030303@w3.org>
To: Tadej Štajner <tadej.stajner@ijs.si>
CC: Mārcis Pinnis <marcis.pinnis@Tilde.lv>, Yves Savourel <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, Artūrs Vasiļevskis <arturs.vasilevskis@Tilde.lv>
Hi Tadej, all,

sorry for not giving detailed replies to other mails. Trying to bring 
together *some* loose ends here.

On 28.01.13 19:08, Tadej Štajner wrote:
> Hi, all, (long e-mail ahead, you can scroll to TL;DR)
> true - the current state is a local optimum that satisfies the 
> requirements. It would need some polish, better guidance and stricter 
> definitions, and possibly renaming disambigGranularity back to 
> disambigType.
>
> As an improvement, Felix's proposal makes some sense, since it makes 
> ITS2.0 capable of proper multi-layer annotation. If having these two 
> mechanisms for inline+standoff annotation is too complex to implement, 
> it would be an acceptable compromise to just have only the stand-off 
> and no inline (except for term="yes", maybe), but I'd vote in favor of 
> keeping the inline part.
>
> Also, the ref/id pointing could also be expressed the other way 
> around, pointing from fragment to the annotation. Instead of:
> <span id="dublin1">Dublin</span>
> ...
> <its:textAnalysisAnnotation its:tanType="entity" 
> its:tanIdentRef="http://dbpedia.org/resource/Dublin" ref="dublin1" />
>
> I would suggest same mechanism as in LQI, so we have some symmetry:
>
> <span its:tanRefs="tan1">Dublin</span>
> <its:textAnalysisAnnotations id="tan1">
>     <its:textAnalysisAnnotation its:tanType="entity" 
> its:tanIdentRef="http://dbpedia.org/resource/Dublin"/>
> </its:textAnalysisAnnotations>

In the above you use the name its:tanRefs. Does that imply that you 
assume references to several annotations?
@Yves, in reply to
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0206.html
"I don't see a difference between what the standoff markup of 
LQI/Provenance does and this standoff for Term+Disambiguation does."
I think the difference is how the external annotations are stored. In my 
example they sit in separate units, each pointing to the same ID. In 
Tadej's example you then also have the potential to point to several 
units. I think that is different from the current LQI/Provenance 
approach: there the idea is to add just one link relation. I'm not sure 
yet whether that difference is significant - I have to think about it.
But while I do that, a question to the LQI/Provenance implementers: is 
it a feature that you point to just one external standoff unit, or is it 
an oversight, and could it be several?
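
To make the two addressing directions concrete, here is a minimal Python sketch, not a proposal for the spec. It assumes the element and attribute names from the examples quoted above (its:textAnalysisAnnotation, its:tanIdentRef, ref, its:tanRefs), which are only proposals in this thread. The <doc> wrapper element, the second annotation on "dublin1" with its example.org URI, and the reading of its:tanRefs as a whitespace-separated list of IDs are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"  # ITS namespace URI


def q(name):
    """Return the ElementTree-qualified name for an ITS element or attribute."""
    return f"{{{ITS}}}{name}"


# Direction 1: separate annotation units point at the fragment via "ref".
# The second unit (example.org URI) is hypothetical, added to show that
# several units can target the same ID.
ANNOTATION_TO_FRAGMENT = f"""<doc xmlns:its="{ITS}">
  <span id="dublin1">Dublin</span>
  <its:textAnalysisAnnotation its:tanType="entity"
      its:tanIdentRef="http://dbpedia.org/resource/Dublin" ref="dublin1"/>
  <its:textAnalysisAnnotation its:tanType="lexical-concept"
      its:tanIdentRef="http://example.org/sense/dublin" ref="dublin1"/>
</doc>"""

# Direction 2: the fragment points at a standoff wrapper via its:tanRefs.
FRAGMENT_TO_ANNOTATION = f"""<doc xmlns:its="{ITS}">
  <span its:tanRefs="tan1">Dublin</span>
  <its:textAnalysisAnnotations id="tan1">
    <its:textAnalysisAnnotation its:tanType="entity"
        its:tanIdentRef="http://dbpedia.org/resource/Dublin"/>
  </its:textAnalysisAnnotations>
</doc>"""


def annotations_by_ref(root):
    """Map fragment ID -> identifier refs of all units whose ref targets it."""
    result = {}
    for ann in root.iter(q("textAnalysisAnnotation")):
        target = ann.get("ref")
        if target is not None:
            result.setdefault(target, []).append(ann.get(q("tanIdentRef")))
    return result


def annotations_by_tanrefs(root):
    """Map span text -> identifier refs reached through its:tanRefs.

    Assumption: its:tanRefs may hold several whitespace-separated IDs.
    """
    wrappers = {w.get("id"): w for w in root.iter(q("textAnalysisAnnotations"))}
    result = {}
    for span in root.iter("span"):
        refs = span.get(q("tanRefs"))
        if refs is None:
            continue
        idents = []
        for wrapper_id in refs.split():
            wrapper = wrappers.get(wrapper_id)
            if wrapper is None:
                continue  # dangling reference: skipped in this sketch
            for ann in wrapper.iter(q("textAnalysisAnnotation")):
                idents.append(ann.get(q("tanIdentRef")))
        result[span.text] = idents
    return result
```

In the first direction nothing stops several units from targeting one fragment. In the second, one reference already reaches several annotations through the wrapper, and several wrappers are reachable only if the attribute is allowed to hold a list.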

With regard to the below, the lowest effort would probably be "drop 
granularity", that is 2) below. To accommodate one part of Christian's 
comment at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0014.html
we could rename disambiguation to its-tan-*, and rewrite the 
disambiguation section.

If we then foresee that several annotations might occur, we could 
accommodate the LQI/Provenance standoff approach.

Since there have been many other mails on this, and I can't reply to 
them here: Mārcis, Yves, would that resolve your concerns and 
questions? Christian, I assume that Tadej's characterization 
"less-specific 'pointer to some meaning identifier' brother to 
Terminology." of disambiguation (or "tan") would not satisfy your 
concern - what would you propose?

Best,

Felix


>
> Secondly, I'll give another alternative (and orthogonal) proposal, 
> repeating what Pablo Mendes already hinted at in August: remember the 
> question of supporting the distinction between different 
> disambiguation types - entity, lexical concept, ontology concept, 
> which is now encoded in the 'disambigGranularity' attribute (relevant 
> discussion 
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0322.html).
>
> When trying to merge Terminology and Disambiguation, having that many 
> disambiguation types supported in the same way implies that we end up 
> with 16 or so attributes. After some discussion in Prague, we realized 
> that although we've established that a distinction between those types 
> exists and it is important, we couldn't come up with a use case where 
> having that information would make a difference in the actual workflows.
>
> Let me clarify:  if a consumer component cares about disambiguation, 
> it will try to resolve the disambigIdentRef identifier. By resolving 
> it, it is able to know what type/level/granularity of disambiguation 
> it's dealing with. By that reasoning, having this information explicit 
> is redundant, because the system already did its job. The question is, 
> is there a use case that justifies keeping the 'disambigGranularity'? 
> For instance, operating on the disambiguation values without actually 
> resolving them? Maybe filtering?
>
> So, we'd go from:
> <span
>           its-disambig-confidence="0.7"
>           its-disambig-class-ref="http://nerd.eurecom.fr/ontology#Place"
>           its-disambig-ident-ref="http://dbpedia.org/resource/Dublin"
>           its-disambig-granularity="entity">Dublin</span>
>       is the <span
>           its-disambig-source="Wordnet3.0"
>           its-disambig-ident="301467919"
>           its-disambig-granularity="lexical-concept"
>           its-disambig-confidence="0.5"
>           >capital</span> of Ireland.
>
> to:
> <span
>           its-disambig-confidence="0.7"
>           its-disambig-class-ref="http://nerd.eurecom.fr/ontology#Place"
>           its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
>       is the <span
>           its-disambig-source="Wordnet3.0"
>           its-disambig-ident="301467919"
>           its-disambig-confidence="0.5"
>           >capital</span> of Ireland.
>
> In this setting, ITS would just operate with references to identifiers 
> and wouldn't care about the type of that relationship. I understand 
> this is losing information, and it weakens the expressive power, but 
> I'm asking this because it might simplify a couple of solutions here. 
> Even though I proposed it initially, I wouldn't push something that 
> hasn't got any consumers behind it (the T in ITS doesn't stand for 
> Tadej.. :) ). It would also establish a clearer boundary between what 
> ITS covers and what other formats should cover.
>
> TL;DR
> In short, I see some scenarios that I'd be ok with:
> 1) If we keep 'granularity':
>     1a) We keep granularity in the form of its:tanType and go with 
> Felix's proposal in the form of its:tanType, and possibly inverting 
> the addressing so it's like LQI;
>     1b) We keep granularity, we keep current proposed Disambiguation 
> data model, possibly renaming 'disambigGranularity' back to 
> 'disambigType';
> 2) If we drop 'granularity', we probably wouldn't need the new 
> its:tan* model, and it would make sense to keep the rest of the 
> disambiguation data category as-is, and describing the three usage 
> scenarios only as best practices. Disambiguation would then serve as a 
> less-specific 'pointer to some meaning identifier' brother to Terminology.
>
> -- Tadej
>
> On 28. 01. 2013 16:42, Mārcis Pinnis wrote:
>> Hi Felix, all,
>>
>> I also do not have anything against leaving everything as is.
>> I however (as I made clear in my previous e-mail) don't think that the stand-off markup is a nice solution.
>>
>> Best regards,
>> Mārcis ;o)
>>
>> -----Original Message-----
>> From: Yves Savourel [mailto:ysavourel@enlaso.com]
>> Sent: Monday, January 28, 2013 5:31 PM
>> To: 'Felix Sasaki'; Mārcis Pinnis
>> Cc: public-multilingualweb-lt@w3.org; Artūrs Vasiļevskis
>> Subject: RE: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup
>>
>> Hi Felix, all,
>>
>>> Just a judgment from my side: I think at the moment we don't have
>>> consensus for
>>>
>>> - leaving everything as is (Dave's proposal)
>> I don't have anything against leaving things as is.
>> There is nothing really broken.
>>
>> It's just that having both data categories fused would be a bit nicer. But overall if there is no time to make that work, we can indeed just leave it as it is.
>>
>> cheers,
>> -yves
>>
>>
>
Received on Monday, 28 January 2013 19:57:28 UTC