
Re: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 29 Jan 2013 08:51:24 +0100
Message-ID: <51077F7C.8070809@w3.org>
To: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
CC: Tadej Štajner <tadej.stajner@ijs.si>, Yves Savourel <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, Artūrs Vasiļevskis <arturs.vasilevskis@Tilde.lv>
On 29.01.13 07:52, Mārcis Pinnis wrote:
>
> Hi Felix,
>
> If I understood correctly, the new proposal is to slightly change the 
> Disambiguation data category (by dropping granularity)
>

Hi Mārcis,

yes, the proposal below does that. However, it has the drawback that it
expresses no relation between terminology and disambiguation. That
brings us back to the original issue-68, which included deprecating
terminology. I assume that you would not agree with that, and would
continue to generate terminology markup? So in a sense we are back at
the start.

In a different sense we have made progress. At
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0042.html
your main concern about disambiguation was the granularities, and the
proposal below drops them. However, another concern may be the name
"disambiguation". I'm not sure about this, so I'm just asking you and
others interested in the issue.

Best,

Felix

> and leave Terminology as is? If yes, then I’m OK with that if everyone 
> else is.
>
> Best regards,
>
> Mārcis ;o)
>
> *From:*Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Monday, January 28, 2013 9:57 PM
> *To:* Tadej Štajner
> *Cc:* Mārcis Pinnis; Yves Savourel; public-multilingualweb-lt@w3.org; 
> Artūrs Vasiļevskis
> *Subject:* Re: issue-68 from an annotation representation point of 
> view, with potential implications for annotatorsRef and standoff markup
>
> Hi Tadej, all,
>
> sorry for not giving detailed replies to other mails. Trying to bring 
> together *some* loose ends here.
>
> On 28.01.13 19:08, Tadej Štajner wrote:
>
>     Hi, all, (long e-mail ahead, you can scroll to TL;DR)
>     true - the current state is a local optimum that satisfies the
>     requirements. It would need some polish, better guidance and
>     stricter definitions, and possibly renaming disambigGranularity
>     back to disambigType.
>
>     As an improvement, Felix's proposal makes some sense, since it
>     makes ITS2.0 capable of proper multi-layer annotation. If having
>     these two mechanisms for inline+standoff annotation is too complex
>     to implement, it would be an acceptable compromise to have only
>     the stand-off and no inline (except for term="yes", maybe), but
>     I'd vote in favor of keeping the inline part.
>
>     Also, the ref/id pointing could also be expressed the other way
>     around, pointing from fragment to the annotation. Instead of:
>     <span id="dublin1">Dublin</span>
>     ...
>     <its:textAnalysisAnnotation its:tanType="entity"
>     its:tanIdentRef="http://dbpedia.org/resource/Dublin"
>     ref="dublin1" />
>
>     I would suggest the same mechanism as in LQI, so we have some symmetry:
>
>     <span its:tanRefs="tan1">Dublin</span>
>     <its:textAnalysisAnnotations id="tan1">
>         <its:textAnalysisAnnotation its:tanType="entity"
>     its:tanIdentRef="http://dbpedia.org/resource/Dublin"/>
>     </its:textAnalysisAnnotations>
>
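A minimal sketch of how a consumer might follow the fragment-to-annotation
pointers in the second form above. The attribute and element names
(its:tanRefs, its:textAnalysisAnnotations, its:tanType, its:tanIdentRef)
are taken from the proposal in this thread. The <doc> wrapper, the helper
name resolve_annotations and the treatment of its:tanRefs as a
space-separated list of IDs are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"  # ITS namespace URI

# Hypothetical wrapper document around the markup proposed in the thread.
DOC = f"""<doc xmlns:its="{ITS}">
  <span its:tanRefs="tan1">Dublin</span>
  <its:textAnalysisAnnotations id="tan1">
    <its:textAnalysisAnnotation its:tanType="entity"
        its:tanIdentRef="http://dbpedia.org/resource/Dublin"/>
  </its:textAnalysisAnnotations>
</doc>"""


def resolve_annotations(xml_text):
    """Return (text, annotations) pairs for every element carrying its:tanRefs."""
    root = ET.fromstring(xml_text)
    # Index the standoff groups by their id attribute.
    groups = {
        group.get("id"): group
        for group in root.iter(f"{{{ITS}}}textAnalysisAnnotations")
    }
    resolved = []
    for element in root.iter():
        refs = element.get(f"{{{ITS}}}tanRefs")
        if refs is None:
            continue
        annotations = []
        # Assumption: several IDs may be listed, separated by whitespace.
        for ref in refs.split():
            group = groups.get(ref)
            if group is None:
                continue  # dangling pointer, skipped in this sketch
            for ann in group.findall(f"{{{ITS}}}textAnalysisAnnotation"):
                annotations.append({
                    "type": ann.get(f"{{{ITS}}}tanType"),
                    "identRef": ann.get(f"{{{ITS}}}tanIdentRef"),
                })
        resolved.append((element.text, annotations))
    return resolved


pairs = resolve_annotations(DOC)
```

With this direction of pointing, one lookup per fragment is enough, and a
group can hold several annotations for the same fragment.
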
>
> In the above you use the name its:tanRefs. Does that imply that you 
> assume references to several annotations?
> @Yves, in reply to
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0206.html
> "I don't see a difference between what the standoff markup of 
> LQI/Provenance does and this standoff for Term+Disambiguation does."
> I think the difference is how the external annotations are stored. In
> my example they are separate units, each pointing to the same ID. In
> Tadej's example you also have the potential to point to several units.
> I think that is different from the current LQI/Provenance approach,
> where the idea is to add just one link relation. I'm not sure yet
> whether that difference is significant - I have to think about it.
> In the meantime, a question for the LQI/Provenance implementers: is it
> a feature that you point to just one external standoff unit, or is it
> an oversight, so that it could be several?
>
> With regard to the options below, the lowest effort would probably be
> "drop granularity", that is 2) below. To accommodate one part of
> Christian's comment at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0014.html
> we could rename disambiguation to its-tan-*, and rewrite the 
> disambiguation section.
>
> If we then foresee that several annotations might occur, we could 
> accommodate the LQI/Provenance standoff approach.
>
> Since there have been many other mails on this, and I can't reply to 
> these here: Mārcis, Yves, would that resolve your concerns and 
> questions? Christian, I assume that Tadej's characterization 
> "less-specific 'pointer to some meaning identifier' brother to 
> Terminology." of disambiguation (or "tan") would not satisfy your 
> concern - what would you propose?
>
> Best,
>
> Felix
>
>
>
>
>     Secondly, I'll give another alternative (and orthogonal) proposal,
>     repeating what Pablo Mendes already hinted at in August: remember
>     the question of supporting the distinction between different
>     disambiguation types - entity, lexical concept, ontology concept,
>     which is now encoded in the 'disambigGranularity' attribute
>     (relevant discussion
>     http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0322.html).
>
>     When trying to merge Terminology and Disambiguation, having that
>     many disambiguation types supported in the same way implies that
>     we end up with 16 or so attributes. After some discussion in
>     Prague, we realized that although we've established that a
>     distinction between those types exists and it is important, we
>     couldn't come up with a use case where having that information
>     would make a difference in the actual workflows.
>
>     Let me clarify:  if a consumer component cares about
>     disambiguation, it will try to resolve the disambigIdentRef
>     identifier. By resolving it, it is able to know what
>     type/level/granularity of disambiguation it's dealing with. By
>     that reasoning, having this information explicit is redundant,
>     because the system already did its job. The question is, is there
>     a use case that justifies keeping the 'disambigGranularity'? For
>     instance, operating on the disambiguation values without actually
>     resolving them? Maybe filtering?
>
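To make the "filtering without resolving" case concrete, here is a rough
sketch of a consumer that selects annotations by the explicit granularity
value and never dereferences its-disambig-ident-ref. The markup is the
HTML example from this thread. The class DisambigCollector and the
function filter_by_granularity are hypothetical names, and the sketch
assumes annotated spans contain only text (no nested elements).

```python
from html.parser import HTMLParser

PREFIX = "its-disambig-"


class DisambigCollector(HTMLParser):
    """Collect its-disambig-* attributes and the text of each annotated element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._open = None  # record for the annotated element currently open

    def handle_starttag(self, tag, attrs):
        record = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if record:
            record["text"] = ""
            self._open = record

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        # Assumption: no nested elements, so any end tag closes the open record.
        if self._open is not None:
            self.records.append(self._open)
            self._open = None


def filter_by_granularity(html, granularity):
    """Keep annotations whose explicit granularity matches; nothing is resolved."""
    parser = DisambigCollector()
    parser.feed(html)
    parser.close()
    return [r for r in parser.records if r.get("granularity") == granularity]


SAMPLE = (
    '<p><span its-disambig-confidence="0.7" '
    'its-disambig-class-ref="http://nerd.eurecom.fr/ontology#Place" '
    'its-disambig-ident-ref="http://dbpedia.org/resource/Dublin" '
    'its-disambig-granularity="entity">Dublin</span> is the '
    '<span its-disambig-source="Wordnet3.0" its-disambig-ident="301467919" '
    'its-disambig-granularity="lexical-concept" '
    'its-disambig-confidence="0.5">capital</span> of Ireland.</p>'
)

entities = filter_by_granularity(SAMPLE, "entity")
```

If the granularity attribute is dropped, a filter like this would have to
resolve each identifier, or inspect its-disambig-class-ref, to make the
same selection.
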
>     So, we'd go from:
>     <span
>               its-disambig-confidence="0.7"
>               its-disambig-class-ref="http://nerd.eurecom.fr/ontology#Place"
>               its-disambig-ident-ref="http://dbpedia.org/resource/Dublin"
>               its-disambig-granularity="entity">Dublin</span>
>           is the <span
>               its-disambig-source="Wordnet3.0"
>               its-disambig-ident="301467919"
>               its-disambig-granularity="lexical-concept"
>               its-disambig-confidence="0.5"
>               >capital</span> of Ireland.
>
>     to:
>     <span
>               its-disambig-confidence="0.7"
>               its-disambig-class-ref="http://nerd.eurecom.fr/ontology#Place"
>               its-disambig-ident-ref="http://dbpedia.org/resource/Dublin"
>               >Dublin</span>
>           is the <span
>               its-disambig-source="Wordnet3.0"
>               its-disambig-ident="301467919"
>               its-disambig-confidence="0.5"
>               >capital</span> of Ireland.
>
>     In this setting, ITS would just operate with references to
>     identifiers and wouldn't care about the type of that relationship.
>     I understand this is losing information, and it weakens the
>     expressive power, but I'm asking this because it might simplify a
>     couple of solutions here. Even though I proposed it initially, I
>     wouldn't push something that hasn't got any consumers behind it
>     (the T in ITS doesn't stand for Tadej.. :) ). It would also
>     establish a clearer boundary between what ITS covers and what
>     other formats should cover.
>
>     TL;DR
>     In short, I see some scenarios that I'd be OK with:
>     1) If we keep 'granularity':
>         1a) We keep granularity in the form of its:tanType, go with
>     Felix's proposal, and possibly invert the addressing so it's
>     like LQI;
>         1b) We keep granularity, we keep current proposed
>     Disambiguation data model, possibly renaming 'disambigGranularity'
>     back to 'disambigType';
>     2) If we drop 'granularity', we probably wouldn't need the new
>     its:tan* model, and it would make sense to keep the rest of the
>     disambiguation data category as-is, and describe the three usage
>     scenarios only as best practices. Disambiguation would then serve
>     as a less-specific 'pointer to some meaning identifier' brother to
>     Terminology.
>
>     -- Tadej
>
>     On 28. 01. 2013 16:42, Mārcis Pinnis wrote:
>
>         Hi Felix, all,
>
>         I also do not have anything against leaving everything as is.
>         I however (as I made clear in my previous e-mail) don't think that the stand-off markup is a nice solution.
>
>         Best regards,
>         Mārcis ;o)
>
>         -----Original Message-----
>         From: Yves Savourel [mailto:ysavourel@enlaso.com]
>         Sent: Monday, January 28, 2013 5:31 PM
>         To: 'Felix Sasaki'; Mārcis Pinnis
>         Cc: public-multilingualweb-lt@w3.org; Artūrs Vasiļevskis
>         Subject: RE: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup
>
>         Hi Felix, all,
>
>             Just a judgment from my side: I think at the moment we don't have
>             consensus for
>
>             - leaving everything as is (Dave's proposal)
>
>         I don't have anything against leaving things as is.
>         There is nothing really broken.
>
>         It's just that having both data categories fused would be a bit nicer. But overall if there is no time to make that work, we can indeed just leave it as it is.
>
>         cheers,
>         -yves
>
Received on Tuesday, 29 January 2013 07:51:56 UTC
