Re: [xliff] ITS scope with sm/em

From: Felix Sasaki <felix@sasakiatcf.com>
Date: Thu, 9 Oct 2014 20:19:52 +0200
Cc: XLIFF Main List <xliff@lists.oasis-open.org>, public-i18n-its-ig <public-i18n-its-ig@w3.org>
Message-Id: <08511715-A306-469A-8980-08CAB90F2C10@sasakiatcf.com>
To: Yves Savourel <ysavourel@enlaso.com>

Am 09.10.2014 um 14:18 schrieb Yves Savourel <ysavourel@enlaso.com>:

> Yes, something like the MT Confidence value is different, but that conversion can be described in the mapping itself (If I recall
> correctly). So an ITS processor has nothing 'special' to do: it just applies the rules.
> 
> I suppose we could have additional pre-processing steps for a case like <sm>/<em>. But that means you can't really use a 'pure' ITS
> processor to look at an XLIFF file because it would not know how to do the pre-processing.
> But that is probably acceptable, especially if we provide generic ways to do the transformation.
> 
> This said, I'm not 100% sure you can transform <sm>/<em> into <mrk>/</mrk> for all data categories: it would be ok for things like
> translate, domain, etc. But info like terminology, Text Analysis, LQI make sense only when set as a single content.

Sorry, I don't get this. Do you have small examples (e.g. one for translate, one for text analysis) of the difference?

- Felix

> 
> -ys
> 
> 
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Felix Sasaki
> Sent: Thursday, October 9, 2014 5:53 AM
> To: Yves Savourel
> Cc: Dr. David Filip; Estreen, Fredrik; XLIFF Main List; public-i18n-its-ig
> Subject: Re: [xliff] ITS scope with sm/em
> 
> Hi Yves, all,
> 
> Understood. Though: aren't there also other parts of the XLIFF/ITS mapping that require special handling from an ITS 1 or 2
> processor? E.g. MT confidence
> https://www.w3.org/International/its/wiki/XLIFF_2.0_Mapping#MT_Confidence_.28.3D.3D.3D.3D.3D.3D.3D.3D.3D.3DTO_REVIEW.29
> which requires a computation of values. In the other thread you just listed the types of processors:
> 
> "- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
> - An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
> - An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document." 
> 
> Couldn't we require in the mapping specification that, before a general ITS processor uses XLIFF+ITS content, it has to do the
> preprocessing described in this thread, the one for MT confidence, etc.?
> 
> Cheers,
> 
> Felix
> 
> Am 09.10.2014 um 13:33 schrieb Yves Savourel <ysavourel@enlaso.com>:
> 
>> Hi all,
>> 
>> Thanks for the input Fredrik and Felix.
>> 
>> I'm not worried about the XLIFF implementation of those cases: we have had working code for those for a long time (a good use
>> case is mrk with translate='yes|no').
>> 
>> I was thinking more about the ITS aspect of it.
>> 
>> From an ITS viewpoint, with something like this: <sm id='1' itx:domain='travel'/>...<em startRef='1'/> the scope of the domain is
>> an empty content (the content of <sm/>). There is nothing in ITS that allows distinct elements to be used to annotate a span.
>> 
>> Because, while on the XLIFF side the processing expectation is to treat the content between a given <sm/> and its corresponding
>> <em/> as a span, on the ITS side there are no semantics for such a construct.
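
To make the scope problem concrete, here is a minimal Python sketch (not part of the original thread). It assumes a hypothetical namespace URI for the itx prefix used above, and it models ITS local scope simply as "the text content of the element that carries the attribute". The helper name scoped_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the thread never gives a namespace for "itx"; this URI is invented.
ITX_NS = "urn:example:itx"
DOMAIN = "{%s}domain" % ITX_NS


def scoped_text(fragment):
    """Return the text content of every element carrying itx:domain.

    This is a simplified stand-in for ITS local scope: the annotation
    applies to the content of the element it is set on.
    """
    root = ET.fromstring(fragment)
    return ["".join(el.itertext()) for el in root.iter() if DOMAIN in el.attrib]


# Span written with the split <sm/>...<em/> form.
SM_EM = (
    "<source xmlns:itx='urn:example:itx'>Book a <sm id='1' itx:domain='travel'/>"
    "flight to Oslo<em startRef='1'/> today.</source>"
)

# The same span written with the <mrk>...</mrk> form.
MRK = (
    "<source xmlns:itx='urn:example:itx'>Book a <mrk id='1' itx:domain='travel'>"
    "flight to Oslo</mrk> today.</source>"
)

print(scoped_text(SM_EM))  # the annotated element <sm/> has no content
print(scoped_text(MRK))    # the annotated element <mrk> wraps the span
```

With the <sm/> form the annotated element is empty, so an element-based scope has nothing to apply to; with <mrk> the same annotation covers "flight to Oslo".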
>> 
>> Cheers,
>> -ys
>> 
>> 
>> From: Dr. David Filip [mailto:David.Filip@ul.ie]
>> Sent: Thursday, October 9, 2014 5:08 AM
>> To: Felix Sasaki
>> Cc: Estreen, Fredrik; Yves Savourel; XLIFF Main List; 
>> public-i18n-its-ig
>> Subject: Re: [xliff] ITS scope with sm/em
>> 
>> Felix, I like the algorithmic approach that is open to different implementations.
>> 
>> After all, ITS is a set of abstract categories that should not be restricted to hierarchically structured formats.
>> 
>> Now to your proposed algorithm.
>> 
>> Unlike native codes, annotations MUST have the opening and closing tag in the same unit.
>> So you will always be creating <mrk> nodes from <sm/> tags if you consider the whole <unit> content, which is the point.
>> 
>> Cheers
>> dF
>> 
>> 
>> Dr. David Filip
>> =======================
>> OASIS XLIFF TC Secretary, Editor, and Liaison Officer LRC | CNGL | 
>> CSIS University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> http://www.cngl.ie/profile/?i=452
>> mailto: david.filip@ul.ie
>> 
>> On Thu, Oct 9, 2014 at 3:49 AM, Felix Sasaki <felix@sasakiatcf.com> wrote:
>> I agree with Fredrik. Processing of overlapping hierarchies is a task that cannot be solved in general, and discarding
>> non-hierarchical structures is a good strategy for XML / HTML content.
>> 
>> 
>> If people don't want to specify an XSLT conversion we could also define the conversion process in an algorithmic way like this:
>> 
>> 0) set current content to whole content to be processed.
>> 1) is there an s tag in current content?
>>       Then output text before s tag and do 2)
>>       else just output all text in current content.
>> 2) has the s tag an e tag with corresponding id?
>>       Then create a mrk node
>>       set the content between s and e to new current content
>>       do 1)
>>       else discard s and go to 1)
>> 3) output rest of text
>> 
>> and say: you can implement this as XSLT (example given) or in different programming languages. That would have the benefit of
>> keeping the door open to a future non-XML, API-focused XLIFF.
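
The steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the mapping's normative conversion: it works on the serialized content of one unit as a string, it assumes the content has no other inline elements (such as <pc>) that a span crosses, and it leaves any <em/> without a matching <sm/> untouched because the steps do not say what to do with it. The names convert and find_em are invented for this sketch.

```python
import re

# An <sm .../> tag; group 1 captures its attributes so they can move to <mrk>.
SM_TAG = re.compile(r"<sm\b([^>]*?)/>")
# The id attribute inside the captured attribute string.
ID_ATTR = re.compile(r"""\sid=(['"])(.*?)\1""")


def find_em(content, ref_id):
    """Find the <em/> whose startRef equals ref_id, or return None."""
    pattern = r"""<em\b[^>]*\sstartRef=(['"])""" + re.escape(ref_id) + r"""\1[^>]*/>"""
    return re.search(pattern, content)


def convert(content):
    """Rewrite <sm/>...<em/> pairs as <mrk>...</mrk> (steps 0 to 3 above)."""
    out = []
    while True:
        sm = SM_TAG.search(content)            # step 1: is there an s tag?
        if sm is None:
            out.append(content)                # no: output the rest (step 3)
            return "".join(out)
        out.append(content[:sm.start()])       # output text before the s tag
        rest = content[sm.end():]
        attrs = sm.group(1).rstrip()
        id_match = ID_ATTR.search(attrs)
        em = find_em(rest, id_match.group(2)) if id_match else None
        if em is None:                         # step 2, else branch:
            content = rest                     # discard the s tag, back to 1
            continue
        inner = rest[:em.start()]              # content between s and e
        out.append("<mrk%s>%s</mrk>" % (attrs, convert(inner)))
        content = rest[em.end():]              # continue after the e tag


sample = (
    "Book a <sm id='1' itx:domain='travel'/>flight to "
    "<sm id='2' translate='no'/>Oslo<em startRef='2'/> now<em startRef='1'/>."
)
print(convert(sample))
```

Nested pairs become nested <mrk> elements, and an <sm/> whose <em/> is missing from the current content is dropped while its text is kept, for example convert("a<sm id='9'/>b") gives "ab".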
>> 
>> - Felix
>> 
>> Am 08.10.2014 um 18:41 schrieb Estreen, Fredrik <Fredrik.Estreen@lionbridge.com>:
>> 
>>> Hi Yves,
>>> 
>>>> Hi all,
>>>> 
>>>> Looking at the ITS mapping: In many cases we put the ITS information 
>>>> on a marker (<mrk> element).
>>>> 
>>>> But such an element can be represented by <sm/>...<em/> when it's 
>>>> overlapping another element.
>>>> In that case the normal ITS scope mechanism can't work because it 
>>>> applies to the empty content of <sm/>, not the content between <sm/> 
>>>> and the corresponding <em/>.
>>>> 
>>>> We can have provision for this in the XLIFF module. But I'm not sure 
>>>> it's doable in the ITS rules, especially with inheritance when there 
>>>> are nested annotations.
>>> 
>>> This is an interesting problem and I doubt it is solvable in a general way without additional steps. It might be solvable when
>>> the <sm/> and <em/> are in the same segment, but I doubt it is in the case where they start and end in different segments (i.e.
>>> different sibling trees).
>>> 
>>> One potentially workable solution would be to apply an XSLT transform on the XLIFF that merges all segments in each unit,
>>> discards any non-ITS-carrying marker (to reduce the risk of overlapping markers) and finally normalizes the remaining markers to
>>> the <mrk></mrk> spanning form. Since ITS information will likely be coming from and going to an XML source, there should not be
>>> any overlapping markers at that stage, as they would be difficult to represent in the source format. It is not guaranteed, but we
>>> could declare that ill-formed. ITS global rules could then be evaluated against the transformed version. Admittedly not the most
>>> beautiful solution, but I think it could work.
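
As a rough illustration of the first step only (merging the segments of a unit so that an <sm/> and its <em/> end up as siblings), here is a Python sketch rather than XSLT. It is a simplification: the sample unit omits the XLIFF namespace, only <source> content is merged, and the helper names inner_xml and merged_source are invented for this example.

```python
import xml.etree.ElementTree as ET


def inner_xml(el):
    """Serialize the content of el (text and child elements) without el itself."""
    # ET.tostring() on a child also emits the text that follows it (its tail).
    return (el.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in el)


def merged_source(unit):
    """Concatenate the <source> content of every segment/ignorable in a unit."""
    parts = []
    for child in unit:
        if child.tag in ("segment", "ignorable"):
            source = child.find("source")
            if source is not None:
                parts.append(inner_xml(source))
    return "".join(parts)


# The annotation starts in the first segment and ends in the second one.
UNIT = ET.fromstring(
    "<unit id='u1'>"
    "<segment><source>Click <sm id='1' translate='no'/>here. </source></segment>"
    "<segment><source>And then there<em startRef='1'/>.</source></segment>"
    "</unit>"
)

merged = merged_source(UNIT)
print(merged)
```

After the merge both tags sit in one flat piece of content, which is the precondition for normalizing them to the <mrk></mrk> form.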
>>> 
>>>> I vaguely recall that such a topic was discussed at some point in the ITS-WG, no?
>>>> Does anyone recall the outcome?
>>>> 
>>>> Cheers,
>>>> -ys
>>> 
>>> Regards,
>>> Fredrik Estreen
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that 
>>> generates this mail.  Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>> 
>> 
>> 
> 
> 
Received on Thursday, 9 October 2014 18:20:27 UTC
