
RE: [xliff] ITS scope with sm/em

From: Yves Savourel <ysavourel@enlaso.com>
Date: Thu, 9 Oct 2014 06:18:29 -0600
To: "'Felix Sasaki'" <felix@sasakiatcf.com>
CC: "'XLIFF Main List'" <xliff@lists.oasis-open.org>, "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
Message-ID: <004b01cfe3bb$22822de0$678689a0$@enlaso.com>
Yes, something like the MT Confidence value is different, but that conversion can be described in the mapping itself (If I recall
correctly). So an ITS processor has nothing 'special' to do: it just applies the rules.

I suppose we could have additional pre-processing steps for a case like <sm>/<em>. But that means you can't really use a 'pure' ITS
processor to look at an XLIFF file because it would not know how to do the pre-processing.
But that is probably acceptable, especially if we provide generic ways to do the transformation.

That said, I'm not 100% sure you can transform <sm>/<em> into <mrk>/</mrk> for all data categories: it would be OK for things like
Translate, Domain, etc. But information like Terminology, Text Analysis, and LQI makes sense only when set on a single, unbroken span of content.

-ys


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Felix Sasaki
Sent: Thursday, October 9, 2014 5:53 AM
To: Yves Savourel
Cc: Dr. David Filip; Estreen, Fredrik; XLIFF Main List; public-i18n-its-ig
Subject: Re: [xliff] ITS scope with sm/em

Hi Yves, all,

Understood. Though: aren't there also other parts of the XLIFF/ITS mapping that require special handling from an ITS 1.0 or 2.0
processor? E.g. MT Confidence
https://www.w3.org/International/its/wiki/XLIFF_2.0_Mapping#MT_Confidence_.28.3D.3D.3D.3D.3D.3D.3D.3D.3D.3DTO_REVIEW.29
which requires a computation of values. In the other thread you just listed the types of processors:

"- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
- An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
- An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document." 

Couldn't we require in the mapping specification that, before a general ITS processor uses XLIFF+ITS content, it has to do the
preprocessing described in this thread, the one for MT Confidence, etc.?

Cheers,

Felix

Am 09.10.2014 um 13:33 schrieb Yves Savourel <ysavourel@enlaso.com>:

> Hi all,
> 
> Thanks for the input Fredrik and Felix.
> 
> I'm not worried about the XLIFF implementation of those cases: we have had working code for those for a long time (a good use
> case is mrk with translate='yes|no').
> 
> I was thinking more about the ITS aspect of it.
> 
> From an ITS viewpoint, in something like this: <sm id='1' itx:domain='travel'/>...<em startRef='1'/> the scope of the domain is
> empty content (the content of <sm/>). Nothing in ITS allows using distinct elements to annotate a span.
> 
> This is because, while on the XLIFF side the processing expectation is to treat the content between a given <sm/> and its corresponding
> <em/> as a span, on the ITS side there are no semantics for such a construct.
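To make the mismatch concrete, here is a minimal Python sketch. It assumes a hypothetical <source> fragment, a made-up namespace URI for the itx prefix, and that both markers are direct children of the same parent; the helper names element_scope_text and span_text are illustrative only. It contrasts what a tree-based processor sees as the content of <sm/> with the span an XLIFF-aware processor would assemble:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the namespace URI bound to "itx" is a placeholder.
SEGMENT = (
    '<source xmlns:itx="urn:example:itx">'
    'Book a <sm id="1" itx:domain="travel"/>flight to Paris<em startRef="1"/> today.'
    '</source>'
)

def element_scope_text(elem):
    """Text inside the element itself, i.e. what an element-based scope covers."""
    return "".join(elem.itertext())

def span_text(parent, sm_id):
    """Text between <sm id=...> and its <em startRef=...> (the XLIFF reading).

    Assumes both markers are direct children of `parent`.
    """
    collecting = False
    parts = []
    for child in parent:
        if child.tag == "sm" and child.get("id") == sm_id:
            collecting = True
            parts.append(child.tail or "")
        elif child.tag == "em" and child.get("startRef") == sm_id:
            break
        elif collecting:
            # Any other inline element inside the span: keep its text and tail.
            parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
    return "".join(parts)

source = ET.fromstring(SEGMENT)
its_view = element_scope_text(source.find("sm"))   # content of <sm/> only
xliff_view = span_text(source, "1")                # content up to matching <em/>
```

Here its_view is the empty string while xliff_view is the text between the two markers, which is the gap described above.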
> 
> Cheers,
> -ys
> 
> 
> From: Dr. David Filip [mailto:David.Filip@ul.ie]
> Sent: Thursday, October 9, 2014 5:08 AM
> To: Felix Sasaki
> Cc: Estreen, Fredrik; Yves Savourel; XLIFF Main List; 
> public-i18n-its-ig
> Subject: Re: [xliff] ITS scope with sm/em
> 
> Felix, I like the algorithmic approach that is open to different implementations.
> 
> After all, ITS is a set of abstract categories that should not be restricted to hierarchically structured formats.
> 
> Now to your proposed algorithm.
> 
> Unlike native codes, annotations MUST have the opening and closing tag in the same unit.
> So you will always be creating <mrk> nodes from <sm/> tags if you consider the whole <unit> content, which is the point.
> 
> Cheers
> dF
> 
> 
> Dr. David Filip
> =======================
> OASIS XLIFF TC Secretary, Editor, and Liaison Officer LRC | CNGL | 
> CSIS University of Limerick, Ireland
> telephone: +353-6120-2781
> cellphone: +353-86-0222-158
> facsimile: +353-6120-2734
> http://www.cngl.ie/profile/?i=452
> mailto: david.filip@ul.ie
> 
> On Thu, Oct 9, 2014 at 3:49 AM, Felix Sasaki <felix@sasakiatcf.com> wrote:
> I agree with Fredrik. Processing of overlapping hierarchies is a task that cannot be solved in general, and discarding
> non-hierarchical structures is a good strategy for XML / HTML content.
> 
> 
> If people don't want to specify an XSLT conversion we could also define the conversion process in an algorithmic way like this:
> 
> 0) Set current content to the whole content to be processed.
> 1) Is there an s tag (<sm/>) in the current content?
>        Then output the text before the s tag and do 2),
>        else just output all text in the current content.
> 2) Does the s tag have an e tag (<em/>) with the corresponding id?
>        Then create a mrk node,
>        set the content between s and e as the new current content,
>        and do 1),
>        else discard s and go to 1).
> 3) Output the rest of the text.
> 
> and say: you can implement this as XSLT (example given) or in different programming languages. That would have the benefit of keeping
> the door open to a future non-XML, API-focused XLIFF.
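As one possible non-XSLT reading of the steps above, here is a minimal Python sketch. It assumes the inline content has already been tokenized into plain strings plus ("sm", id) and ("em", id) tuples; that token model and the function name to_mrk are hypothetical. It also assumes that a stray <em/> whose <sm/> lies outside the current content is discarded, which the steps do not state explicitly. Copying ITS attributes onto the generated <mrk> is left out for brevity.

```python
from xml.sax.saxutils import escape, quoteattr

def to_mrk(tokens):
    """Serialize tokens, turning matched sm/em pairs into nested <mrk> elements."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if isinstance(tok, str):
            out.append(escape(tok))          # step 1: output text as-is
            i += 1
        elif tok[0] == "sm":
            # Step 2: look for the corresponding e tag in the current content.
            end = next(
                (j for j in range(i + 1, len(tokens)) if tokens[j] == ("em", tok[1])),
                None,
            )
            if end is None:
                i += 1                       # no match: discard the s tag
            else:
                inner = to_mrk(tokens[i + 1:end])   # recurse on the enclosed content
                out.append(f"<mrk id={quoteattr(tok[1])}>{inner}</mrk>")
                i = end + 1
        else:
            i += 1                           # assumption: discard a stray e tag
    return "".join(out)

nested = to_mrk(["Go to ", ("sm", "1"), "the ", ("sm", "2"), "big",
                 ("em", "2"), " city", ("em", "1"), "."])
overlapping = to_mrk([("sm", "1"), "a", ("sm", "2"), "b",
                      ("em", "1"), "c", ("em", "2")])
```

With properly nested markers every pair becomes a <mrk>. With overlapping markers the inner one is dropped, which matches the "discard non-hierarchical structures" strategy.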
> 
> - Felix
> 
> Am 08.10.2014 um 18:41 schrieb Estreen, Fredrik <Fredrik.Estreen@lionbridge.com>:
> 
>> Hi Yves,
>> 
>>> Hi all,
>>> 
>>> Looking at the ITS mapping: In many cases we put the ITS information 
>>> on a marker (<mrk> element).
>>> 
>>> But such an element can be represented by <sm/>...<em/> when it's 
>>> overlapping another element.
>>> In that case the normal ITS scope mechanism can't work because it 
>>> applies to the empty content of <sm/>, not the content between <sm/> 
>>> and the corresponding <em/>.
>>> 
>>> We can have provision for this in the XLIFF module. But I'm not sure 
>>> it's doable in the ITS rules, especially with inheritance when there 
>>> are nested annotations.
>> 
>> This is an interesting problem and I doubt it is solvable in a general way without additional steps. It might be solvable when
>> the <sm/> and <em/> are in the same segment, but I doubt it is in the case where they start and end in different segments (i.e.
>> different sibling trees).
>> 
>> One potentially workable solution would be to apply an XSLT transform on the XLIFF that merges all segments in each unit,
>> discards any non-ITS-carrying marker (to reduce the risk of overlapping markers), and finally normalizes the remaining markers to the
>> <mrk></mrk> spanning form. Since ITS information will likely be coming from and going to an XML source, there should not be any
>> overlapping markers at that stage, as they would be difficult to represent in the source format. This is not guaranteed, but we could
>> declare such cases ill-formed. ITS global rules could then be evaluated against the transformed version. Admittedly not the most
>> beautiful solution, but I think it could work.
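The first two steps of that proposal (merge the segments, then drop markers that carry no ITS data) can be sketched outside XSLT as well. This is a rough Python illustration under stated assumptions: the token model (strings plus dicts with "kind", "id" and "attrs"), the "its:" attribute prefix, and the function names are all hypothetical, not taken from the mapping.

```python
from itertools import chain

def merge_segments(segments):
    """Concatenate the inline content of all segments in one unit."""
    return list(chain.from_iterable(segments))

def drop_non_its_markers(tokens, its_prefix="its:"):
    """Remove sm/em pairs whose <sm/> has no attribute with the ITS prefix."""
    keep_ids = {
        t["id"]
        for t in tokens
        if isinstance(t, dict)
        and t["kind"] == "sm"
        and any(name.startswith(its_prefix) for name in t["attrs"])
    }
    # Text is always kept; markers only if their id belongs to an ITS-carrying <sm/>.
    return [t for t in tokens if isinstance(t, str) or t["id"] in keep_ids]

# Marker 1 starts in the first segment and ends in the second;
# marker 2 overlaps it and carries no ITS information.
unit = [
    ["Visit ", {"kind": "sm", "id": "1", "attrs": {"its:domain": "travel"}}, "Paris "],
    [{"kind": "sm", "id": "2", "attrs": {"type": "comment"}}, "in ",
     {"kind": "em", "id": "1"}, "spring", {"kind": "em", "id": "2"}, "."],
]
cleaned = drop_non_its_markers(merge_segments(unit))
summary = [t if isinstance(t, str) else (t["kind"], t["id"]) for t in cleaned]
```

After merging, marker 1 has both ends in the same sequence, and removing marker 2 removes the overlap, so the remaining pair can be normalized to the <mrk></mrk> form.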
>> 
>>> I vaguely recall that such a topic was discussed at some point in the ITS-WG, no?
>>> Does anyone recall the outcome?
>>> 
>>> Cheers,
>>> -ys
>> 
>> Regards,
>> Fredrik Estreen
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe from this mail list, you must leave the OASIS TC that 
>> generates this mail.  Follow this link to all your TCs in OASIS at:
>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>> 
> 
> 


Received on Thursday, 9 October 2014 12:18:58 UTC
