Re: [ISSUE-108] locNote ITS 2.0 requirements w.r.t Indian [Indic] languages [ACTION-417] from Dave Lewis on 2013-01-24 (public-multilingualweb-lt-comments@w3.org from January 2013)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Thu, 24 Jan 2013 14:50:03 +0000
To: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>, Somnath Chandra <schandra@deity.gov.in>
CC: slata <slata@mit.gov.in>
Message-ID: <51014A1B.9020700@cs.tcd.ie>
Hi Somnath,
I just wanted to update you on discussions we had at our WG face to face 
meeting yesterday.

The consensus was similar to my initial response: addressing 
part-of-speech (POS) encoding formally in the localizationNote is not 
feasible. Several issues were identified that motivate this consensus:

1) Although you proposed a specific POS categorisation, it was recognised 
that there are other possible categorisations, and therefore reaching 
consensus on an encoding that could be applied internationally would be 
extremely challenging within the scope of the WG.

2) There was a concern that if we support POS, there would then be calls 
to support other annotations, e.g. syntax annotation. It was also 
pointed out that the NIF model I mentioned already provides a fairly good 
encoding of POS and other syntax annotation information in an open form 
as external linked data.

3) It seems that some MT engines already integrate POS taggers, so there 
may not be a strong use case for defining an open annotation mechanism 
for this.

4) It was not clear how strong the use case for POS annotation is 
for human translators. Do they need such support, or are they able to 
easily identify POS directly from the text and its context?

We are therefore proposing not to change the specification in response 
to your comment. Is this satisfactory?

If you have further details on the use case for POS (e.g. in relation to 
points 3 and 4 above), we would happily accept and document them to 
help support possible future work in this area. Also, we'd very 
much encourage you to engage with the WG if you would like to contribute 
to a best practice document on this topic, for instance an article 
explaining approaches such as name-value pairs in the loc note attribute 
or linking to POS in a NIF repository.
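
As a rough illustration of the name-value approach, a consumer could 
read such notes along the lines sketched below. This is only a sketch 
under assumptions, not something defined by ITS 2.0: the "pos:N_NNN" 
form is taken from the example in my earlier mail, while the ";" 
separator for multiple pairs and the function and class names are 
hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


def parse_loc_note(note):
    """Split a note such as 'pos:N_NNN' into a {name: value} dict.

    Assumption: if a note carries several pairs, they are separated
    by ';'. ITS 2.0 itself does not define any such convention.
    """
    pairs = {}
    for part in note.split(";"):
        name, sep, value = part.partition(":")
        # Skip fragments that are not of the form name:value.
        if sep and name.strip():
            pairs[name.strip()] = value.strip()
    return pairs


class LocNoteCollector(HTMLParser):
    """Collect the parsed its-loc-note value of every start tag."""

    def __init__(self):
        super().__init__()
        self.notes = []

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr == "its-loc-note" and value:
                self.notes.append((tag, parse_loc_note(value)))


def collect_loc_notes(html):
    """Return a list of (tag, pairs) for all its-loc-note attributes."""
    collector = LocNoteCollector()
    collector.feed(html)
    collector.close()
    return collector.notes


if __name__ == "__main__":
    sample = '<p><span its-loc-note="pos:N_NNN">word</span></p>'
    print(collect_loc_notes(sample))
```

A note that does not follow the name:value form is simply returned as 
an empty set of pairs, so ordinary free-text localization notes are 
not misread as annotations.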

We look forward to your response,
Kind Regards,
Dave Lewis

On 23/01/2013 02:11, Dave Lewis wrote:
> Hi Somnath,
> In relation to localizationNote, you indicate that correct translation 
> of some Indic languages is sensitive to the part of speech annotation 
> and suggest some encodings that could be used with localizationNote to 
> provide such annotation.
>
> My personal feeling is that including such an encoding in the format 
> of this data category would add a large implementation burden for many 
> potential adopters who would not require it. However, we have seen with 
> ITS 1.0 that companies have used the value of this attribute to encode 
> their own name:value formats.
> e.g.
> <span its-loc-note="pos:N_NNN">????</span>
>
> Would this address your comment?
>
> This would not require a change to the specification, but could be 
> captured in a separate best practice document, perhaps specifically 
> targeting the use of ITS for Indic languages more generally, if you 
> were interested in contributing to this.
>
> Two other possibilities may exist.
> 1) An entirely new part-of-speech data category. This is currently 
> outside the scope of ITS 2.0, but we could collect requirements, as we 
> are starting to record such needs that are not covered by ITS 2.0 for 
> future activities.
>
> 2) I believe the NLP Interchange Format (NIF) can encode details such 
> as POS. ITS 2.0 has a mapping to NIF, so perhaps this POS information 
> could be usefully recorded externally to the document using NIF within 
> the context of this mapping. But I'd ask the NIF experts, Felix and 
> Sebastien, to comment on this possibility.
>
> Please let us know what you think.
> kind regards,
>
> Dave Lewis
>
> On 16/01/2013 08:52, Felix Sasaki wrote:
>> Forwarded on behalf of Somnath Chandra (by permission), with CC to 
>> Somnath and Svaran Lata. The comments are not yet in tracker. See also 
>> new comments (also not in tracker yet) on the www-international list at
>> http://lists.w3.org/Archives/Public/www-international/2013JanMar/0065.html
>>
>> If you have input for replying to the comments, please provide it on 
>> our comments list (but feel free to put others in CC to speed up the 
>> process).
>>
>> Best,
>>
>> Felix
>>
>>
>> -------- Original Message --------
>> Subject: 	Fwd: ITS 2.0 requirements w.r.t Indian languages
>> Date: 	Wed, 16 Jan 2013 13:46:52 +0530
>> From: 	Somnath Chandra <schandra@deity.gov.in>
>> To: 	Felix Sasaki <fsasaki@w3.org>
>> CC: 	slata <slata@mit.gov.in>
>>
>>
>>
>> Dear Dr. Felix Sasaki,
>> W3C India has compiled the Indic Languages requirements for ITS 2.0. 
>> Kindly find enclosed the draft document developed for the purpose.
>> Submitted for kind perusal. Please feel free to contact me for any 
>> further clarifications / discussions.
>> With best regards,
>> Somnath , W3C India
>> Dr. Somnath Chandra
>> Joint Director and Dy. Country Manager , W3C India
>> Dept. of Electronics & Information Technology
>> Ministry of Communications & Information Technology
>> Govt. of India
>> Tel:+91-11-24364744,24301811
>> Fax: +91-11-24363099
>> e-mail :schandra@mit.gov.in
>> -------- Original Message --------
>> From: Prashant Verma <vermaprashant1@gmail.com>
>> Date: Jan 10, 2013 2:20:32 PM
>> Subject: ITS 2.0 requirements w.r.t Indian languages
>> To: schandra@mit.gov.in
>>
>>
>> -- 
>>
>> Prashant Verma | Sr. Software Engineer
>> W3C India
>> New Delhi
>> Cell : +91-8800521042
>> Website : http://www.w3cindia.in
>> --
>>
>>
>
>
Received on Thursday, 24 January 2013 14:50:52 UTC