Re: [ISSUE-109]: disambiguation ITS 2.0 requirements w.r.t Indian [Indic] languages [ACTION-418]

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Wed, 20 Feb 2013 20:35:37 +0100
Message-ID: <51252589.5020208@informatik.uni-leipzig.de>
To: Felix Sasaki <fsasaki@w3.org>
CC: Somnath Chandra <schandra@deity.gov.in>, Dave Lewis <dave.lewis@cs.tcd.ie>, slata@mit.gov.in, public-multilingualweb-lt-comments@w3.org, Manoj Jain <mjain@deity.gov.in>
Hi all,
just a quick comment on this. I am currently working on NIF version 
2.0. It is basically done, but a lot of documentation still needs to be 
revised, and the code needs to be cleaned up and updated.

The latest description can be found in a chapter of this Springer book:
The People's Web Meets NLP - Collaboratively Constructed Language 
Resources, Gurevych, Iryna; Kim, Jungi (Eds.) 2013, XXX, 394 p. 79 
illus., 54 in color.

But feel free to look at the preprint:
http://svn.aksw.org/papers/2012/PeoplesWeb/public_preprint.pdf

Regarding OLiA:
As I understand it, you are planning to construct an upper-level ontology 
for three Indic POS tag sets.
Please check whether OLiA already provides this. Maybe you 
can reuse it. If not, you are also welcome to contribute annotation 
models to OLiA at:
http://sourceforge.net/projects/olia/

The NLP2RDF mailing list for NIF and OLiA is here: 
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf

Once you add your ontologies to OLiA they will be automatically 
available for NIF.

Please email the NLP2RDF list if you need any help.

All the best,
Sebastian


On 12.02.2013 09:53, Felix Sasaki wrote:
> Thanks, Somnath. FYI, I have subscribed you to the 
> public-multilingualweb-lt-comments list, so that your mails get 
> through. At the moment I cannot give more direction than to suggest 
> that you start writing :) Would it be OK for you to join the working 
> group, to move things forward more easily?
>
> Best,
>
> Felix
>
> On 12.02.13 07:02, Somnath Chandra wrote:
>> Hello Felix, Dave and Others,
>> Thanks for your very encouraging response. Your ideas are appropriate 
>> for taking up the matter in a time-bound way. We have already mobilized 
>> the language technology researchers in India and will develop the 
>> best practices document quickly, in active collaboration 
>> with you all. While developing the best practices, we shall also 
>> capture the linguistic nuances of the different languages (22 
>> constitutionally recognized Indian languages) and their requirements.
>> Looking forward to your further direction.
>> With best regards,
>> Somnath
>> On 02/12/13, Felix Sasaki <fsasaki@w3.org> wrote:
>>>
>>> Hi Dave, all,
>>>
>>> On 11.02.13 16:00, Dave Lewis wrote:
>>>> Hi Somnath,
>>>> thank you for your response. Your input into best practices would 
>>>> be warmly welcomed by the group. I think this would split into two 
>>>> potential parts:
>>>> 1) producing a best practice for NIF usage in the context of 
>>>> typical workflows that would use ITS
>>>> 2) if during (1) the expressiveness of NIF was found wanting, you 
>>>> may want to engage directly with the NIF community.
>>>>
>>>> I'd ask Felix and Sebastian Hellmann to also comment on the best 
>>>> route to advancing ITS 2.0 best practice in this area. Felix, we 
>>>> don't currently have a stub for best practice in relation to NIF on 
>>>> the wiki, so should we start one?
>>>
>>> Mostly for Somnath et al.: As discussed in the group call today, 
>>> that is just a question of manpower. At
>>> http://www.w3.org/International/multilingualweb/lt/wiki/Main_Page#Draft_documents_and_time_line
>>> we have linked to potential "best practices" documents, see
>>> http://www.w3.org/International/multilingualweb/lt/wiki/Best_Practice_Documents
>>> Sure, we can add a NIF document here. We just need a volunteer to 
>>> write it. So Somnath, if that is of interest for you, let us know. 
>>> It might also make sense to involve Sebastian Hellmann - putting him 
>>> here in CC.
>>>
>>> Best,
>>>
>>> Felix
>>>
>>>>
>>>> Regards,
>>>> Dave
>>>>
>>>> On 07/02/2013 10:06, Somnath Chandra wrote:
>>>>> Hello Lewis,
>>>>> Thanks a lot for your feedback. We have studied the NIF encoding, 
>>>>> and the Indian language requirements for hierarchical annotation 
>>>>> need to be incorporated in detail in NIF.
>>>>>
>>>>> As defined in NIF Version 
>>>>> (http://nlp2rdf.org/nif-1-0#toc-part-of-speech-tags), part-of-speech 
>>>>> tags should make use of the Ontologies of Linguistic 
>>>>> Annotations (OLiA). OLiA connects local annotation tag sets with 
>>>>> a global reference ontology. It therefore allows the 
>>>>> specific part-of-speech tag to be kept at a fine granularity while 
>>>>> also providing a coarse-grained reference model.
>>>>>
>>>>> OLiA defines OLiA Annotation Models for morphology, morphosyntax 
>>>>> and syntax for multiple languages.
>>>>>
>>>>> However, there are three multilingual annotation models for 
>>>>> morphological, morphosyntactic and syntactic annotation of Indian 
>>>>> languages, i.e. the L-POSTS tagset, Baskaran et al. (2008); 
>>>>> AnnCorra, Bharati et al. (2006); and the IIIT tagset, IIT (2007).
>>>>>
>>>>>       We are in the process of defining a common POS tagset for Indic 
>>>>> languages, based on W3C Internationalization best practices. The 
>>>>> draft standard has been developed and is undergoing testing 
>>>>> and evaluation. Once finalized, the above three POS tagsets would 
>>>>> be replaced by this national standard, which may be incorporated 
>>>>> in NIF.
>>>>>
>>>>>      We will certainly participate actively in developing the 
>>>>> best practices for use of ITS with external NIF models with the 
>>>>> W3C team.
>>>>>
>>>>>      With regards,
>>>>>
>>>>> Dr. Somnath Chandra
>>>>> Scientist-E & Dy. Country Manager W3C India
>>>>> Dept. of Electronics & Information Technology
>>>>> Ministry of Communications & Information Technology
>>>>> Govt. of India
>>>>> Tel:+91-11-24364744,24301856
>>>>> Fax: +91-11-24363099
>>>>> e-mail :schandra@mit.gov.in
>>>>>
>>>>> On 02/04/13, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:
>>>>>>
>>>>>> Hi Somnath,
>>>>>> I wanted to follow up on this comment also. Do you have any 
>>>>>> comments on our response? Was it satisfactory? Unless we hear from 
>>>>>> you to the contrary, we will assume you are satisfied and aim to 
>>>>>> close ISSUE-109 on the 11th February.
>>>>>>
>>>>>> Kind Regards,
>>>>>> Dave
>>>>>>
>>>>>> On 28/01/2013 00:46, Dave Lewis wrote:
>>>>>>> Hi Somnath,
>>>>>>> I wanted to update you on the status of ISSUE-109, related to 
>>>>>>> disambiguation.
>>>>>>>
>>>>>>> We discussed this at the WG face to face meeting last week, see:
>>>>>>> http://www.w3.org/2013/01/23-mlw-lt-minutes.html#item37.
>>>>>>>
>>>>>>> The consensus was that hierarchical annotation for 
>>>>>>> disambiguation was difficult to achieve technically. As you 
>>>>>>> point out, the ITS override rules mean that any hierarchical 
>>>>>>> annotation has to be supported explicitly with special 
>>>>>>> attributes. However, doing this in a generic way is difficult  
>>>>>>> as we may also need to support multiple different annotations of 
>>>>>>> the same text, and therefore map sub-annotations to specific 
>>>>>>> parent ones.
>>>>>>>
>>>>>>> You may have seen that there has been extensive discussion on 
>>>>>>> potentially merging the terminology and disambiguation data 
>>>>>>> categories:
>>>>>>> http://www.w3.org/2013/01/24-mlw-lt-minutes.html#item03
>>>>>>> and
>>>>>>> https://www.w3.org/International/multilingualweb/lt/track/issues/68
>>>>>>>
>>>>>>> At the meeting we asked the experts involved in considering 
>>>>>>> technical solutions to this to also address your hierarchical 
>>>>>>> annotation requirement, but this yielded no usable technical 
>>>>>>> solution.
>>>>>>>
>>>>>>> We therefore propose to reject this suggested change.
>>>>>>>
>>>>>>> We would however point out  that the external NIF encoding (see 
>>>>>>> http://nlp2rdf.org/) would be better suited to capturing such 
>>>>>>> hierarchical annotations. We would therefore welcome your input 
>>>>>>> in formulating best practice for the use of ITS with external 
>>>>>>> NIF models.
>>>>>>>
>>>>>>> Please let us know if you are satisfied with this response.
>>>>>>>
>>>>>>> I look forward to hearing from you.
>>>>>>> Regards,
>>>>>>> Dave Lewis
>>>>>>>
>>>>>>>
>>>>>>> On 21/01/2013 15:02, Dave Lewis wrote:
>>>>>>>> Hi,
>>>>>>>> To speed the resolution of the different issues in your 
>>>>>>>> original post I've restricted ISSUE-84 to comments about the 
>>>>>>>> translate data category and raised two new issues:
>>>>>>>> ISSUE-108: locNote ITS 2.0 requirements w.r.t Indian [Indic] 
>>>>>>>> languages
>>>>>>>> ISSUE-109: disambiguation ITS 2.0 requirements w.r.t Indian 
>>>>>>>> [Indic] languages
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dave
>>>>>>>>
>>>>>>>> On 18/01/2013 12:46, Dr. David Filip wrote:
>>>>>>>>> Hi all, this comment is now associated with Issue-84
>>>>>>>>> Rgds
>>>>>>>>> dF
>>>>>>>>>
>>>>>>>>> Dr. David Filip
>>>>>>>>> =======================
>>>>>>>>> LRC | CNGL | LT-Web | CSIS
>>>>>>>>> University of Limerick, Ireland
>>>>>>>>> telephone: +353-6120-2781
>>>>>>>>> *cellphone: +353-86-0222-158*
>>>>>>>>> facsimile: +353-6120-2734
>>>>>>>>> mailto: david.filip@ul.ie <mailto:david.filip@ul.ie>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 16, 2013 at 8:52 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>>>>>>>>>
>>>>>>>>>     Forwarded on behalf of Somnath Chandra (by permission),
>>>>>>>>>     with CC to Somnath and Svaran Lata. The comments are not
>>>>>>>>>     yet in tracker. See also new comments (also not in tracker
>>>>>>>>>     yet) on the www-international list at
>>>>>>>>>     http://lists.w3.org/Archives/Public/www-international/2013JanMar/0065.html
>>>>>>>>>
>>>>>>>>>     If you have input for replying to the comments, please
>>>>>>>>>     provide it on our comments list (but feel free to put
>>>>>>>>>     others in CC to speed up the process).
>>>>>>>>>
>>>>>>>>>     Best,
>>>>>>>>>
>>>>>>>>>     Felix
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     -------- Original Message --------
>>>>>>>>>     Subject: Fwd: ITS 2.0 requirements w.r.t Indian languages
>>>>>>>>>     Date: Wed, 16 Jan 2013 13:46:52 +0530
>>>>>>>>>     From: Somnath Chandra <schandra@deity.gov.in>
>>>>>>>>>     To: Felix Sasaki <fsasaki@w3.org>
>>>>>>>>>     CC: slata <slata@mit.gov.in>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Dear Dr. Felix Sasaki,
>>>>>>>>>     W3C India has compiled the Indic Languages requirements
>>>>>>>>>     for ITS 2.0. Kindly find enclosed the draft
>>>>>>>>>     document developed for the purpose.
>>>>>>>>>     Submitted for kind perusal. Please feel free to contact me
>>>>>>>>>     for any further clarifications / discussions.
>>>>>>>>>     With best regards,
>>>>>>>>>     Somnath , W3C India
>>>>>>>>>     Dr. Somnath Chandra
>>>>>>>>>     Joint Director and Dy. Country Manager , W3C India
>>>>>>>>>     Dept. of Electronics & Information Technology
>>>>>>>>>     Ministry of Communications & Information Technology
>>>>>>>>>     Govt. of India
>>>>>>>>>     Tel:+91-11-24364744,24301811
>>>>>>>>>     Fax: +91-11-24363099
>>>>>>>>>     e-mail :schandra@mit.gov.in
>>>>>>>>>     -------- Original Message --------
>>>>>>>>>     From: *Prashant Verma *<vermaprashant1@gmail.com>
>>>>>>>>>     <mailto:vermaprashant1@gmail.com>
>>>>>>>>>     Date: Jan 10, 2013 2:20:32 PM
>>>>>>>>>     Subject: ITS 2.0 requirements w.r.t Indian languages
>>>>>>>>>     To: schandra@mit.gov.in <mailto:schandra@mit.gov.in>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     -- 
>>>>>>>>>
>>>>>>>>>     Prashant Verma I  Sr. Software Engineer
>>>>>>>>>     W3C India
>>>>>>>>>     New Delhi
>>>>>>>>>     Cell : +91-8800521042 <tel:%2B91-8800521042>
>>>>>>>>>     Website : http://www.w3cindia.in <http://www.w3cindia.in/>
>>>>>>>>>     --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>
>>>
>> --
>> Dr. Somnath Chandra
>> Scientist-E
>> Dept. of Electronics & Information Technology
>> Ministry of Communications & Information Technology
>> Govt. of India
>> Tel:+91-11-24364744,24301856
>> Fax: +91-11-24363099
>> e-mail :schandra@mit.gov.in
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Wednesday, 20 February 2013 19:36:18 GMT
