- From: Felix Sasaki <felix@sasakiatcf.com>
- Date: Thu, 9 Oct 2014 20:19:52 +0200
- To: Yves Savourel <ysavourel@enlaso.com>
- Cc: XLIFF Main List <xliff@lists.oasis-open.org>, public-i18n-its-ig <public-i18n-its-ig@w3.org>
On 09.10.2014 at 14:18, Yves Savourel <ysavourel@enlaso.com> wrote:

> Yes, something like the MT Confidence value is different, but that conversion can be described in the mapping itself (if I recall correctly). So an ITS processor has nothing 'special' to do: it just applies the rules.
>
> I suppose we could have additional pre-processing steps for a case like <sm>/<em>. But that means you can't really use a 'pure' ITS processor to look at an XLIFF file because it would not know how to do the pre-processing.
> But that is probably acceptable, especially if we provide generic ways to do the transformation.
>
> This said, I'm not 100% sure you can transform <sm>/<em> into <mrk>/</mrk> for all data categories: it would be ok for things like translate, domain, etc. But info like terminology, Text Analysis, LQI make sense only when set as a single content.

Sorry, I don’t get this. Do you have small examples (e.g. one for translate, one for text analysis) of the difference?

- Felix

>
> -ys
>
>
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Felix Sasaki
> Sent: Thursday, October 9, 2014 5:53 AM
> To: Yves Savourel
> Cc: Dr. David Filip; Estreen, Fredrik; XLIFF Main List; public-i18n-its-ig
> Subject: Re: [xliff] ITS scope with sm/em
>
> Hi Yves, all,
>
> understand. Though: aren't there also other parts of the xliff/its mapping that require special handling from an ITS1 or 2 processor? E.g. mt confidence
> https://www.w3.org/International/its/wiki/XLIFF_2.0_Mapping#MT_Confidence_.28.3D.3D.3D.3D.3D.3D.3D.3D.3D.3DTO_REVIEW.29
> which requires a computation of values. In the other thread you just listed the types of processors:
>
> "- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
> - An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
> - An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document."
>
> Couldn't we require in the mapping specification that before a general ITS processor uses XLIFF+ITS content, it has to do the preprocessing described in this thread, the one for MT confidence, etc.?
>
> Cheers,
>
> Felix
>
> On 09.10.2014 at 13:33, Yves Savourel <ysavourel@enlaso.com> wrote:
>
>> Hi all,
>>
>> Thanks for the input Fredrik and Felix.
>>
>> I'm not worried about the XLIFF implementation of those cases: we have had working code for those for a long time (a good use case is mrk with translate='yes|no').
>>
>> I was thinking more about the ITS aspect of it.
>>
>> From an ITS viewpoint something like this: <sm id='1' itx:domain='travel'/>...<em startRef='1'/> the scope of the domain is an empty content (the content of <sm/>). There is nothing in ITS that allows using distinct elements to annotate a span.
>>
>> Because, while on the XLIFF side the processing expectation is to treat the content between a given <sm/> and its corresponding <em/> as a span, on the ITS side there is no semantics for such a construct.
>>
>> Cheers,
>> -ys
>>
>>
>> From: Dr. David Filip [mailto:David.Filip@ul.ie]
>> Sent: Thursday, October 9, 2014 5:08 AM
>> To: Felix Sasaki
>> Cc: Estreen, Fredrik; Yves Savourel; XLIFF Main List; public-i18n-its-ig
>> Subject: Re: [xliff] ITS scope with sm/em
>>
>> Felix, I like the algorithmic approach that is open to different implementations.
>>
>> After all ITS is a set of abstract categories that should not be restricted to hierarchical structured formats.
>>
>> Now to your proposed algorithm.
>>
>> Unlike native codes, annotations MUST have the opening and closing tag in the same unit.
>> So you will always be creating <mrk> nodes from <sm/> tags if you consider the whole <unit> content, which is the point.
>>
>> Cheers
>> dF
>>
>>
>> Dr. David Filip
>> =======================
>> OASIS XLIFF TC Secretary, Editor, and Liaison Officer
>> LRC | CNGL | CSIS
>> University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> http://www.cngl.ie/profile/?i=452
>> mailto: david.filip@ul.ie
>>
>> On Thu, Oct 9, 2014 at 3:49 AM, Felix Sasaki <felix@sasakiatcf.com> wrote:
>> I agree with Fredrik. Processing of overlapping hierarchies is a task that cannot be solved in general, and discarding non-hierarchical structures is a good strategy for XML / HTML content.
>>
>>
>> If people don't want to specify an XSLT conversion we could also define the conversion process in an algorithmic way like this:
>>
>> 0) Set current content to the whole content to be processed.
>> 1) Is there an s tag in current content?
>>    Then output text before s tag and do 2),
>>    else just output all text in current content.
>> 2) Has the s tag an e tag with corresponding id?
>>    Then create a mrk node,
>>    set the content between s and e to new current content,
>>    do 1),
>>    else discard s and go to 1).
>> 3) Output rest of text.
>>
>> and say: you can implement this as XSLT (example given) or in different programming languages. That would have the benefit of keeping the door open to a future non-XML, API-focused XLIFF.
>>
>> - Felix
>>
>> On 08.10.2014 at 18:41, Estreen, Fredrik <Fredrik.Estreen@lionbridge.com> wrote:
>>
>>> Hi Yves,
>>>
>>>> Hi all,
>>>>
>>>> Looking at the ITS mapping: in many cases we put the ITS information on a marker (<mrk> element).
>>>>
>>>> But such an element can be represented by <sm/>...<em/> when it's overlapping another element.
>>>> In that case the normal ITS scope mechanism can't work because it applies to the empty content of <sm/>, not the content between <sm/> and the corresponding <em/>.
>>>>
>>>> We can have provision for this in the XLIFF module.
>>>> But I'm not sure it's doable in the ITS rules, especially with inheritance when there are nested annotations.
>>>
>>> This is an interesting problem and I doubt it is solvable in a general way without additional steps. It might be solvable when the <sm/> and <em/> are in the same segment, but I doubt it is in the case where they start and end in different segments (i.e. different sibling trees).
>>>
>>> One potentially workable solution would be to apply an XSLT transform on the XLIFF that merges all segments in each unit, discards any non-ITS-carrying marker (to reduce the risk of overlapping markers), and finally normalizes the remaining markers to the <mrk></mrk> spanning form. Since ITS information will likely be coming from and going to an XML source, there should not be any overlapping markers at that stage, as they would be difficult to represent in the source format. It is not guaranteed, but we could declare that ill-formed. ITS global rules could then be evaluated against the transformed version. Admittedly not the most beautiful solution, but I think it could work.
>>>
>>>> I vaguely recall that such a topic was discussed at some point in the ITS-WG, no?
>>>> Does anyone recall the outcome?
>>>>
>>>> Cheers,
>>>> -ys
>>>
>>> Regards,
>>> Fredrik Estreen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that
>>> generates this mail. Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
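
For illustration, the algorithmic conversion quoted in this thread (steps 0 to 3, turning <sm/>...<em/> pairs into <mrk> spans and discarding whatever does not nest) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the unit content is modelled as a flat list of text strings and marker objects instead of real XLIFF XML, and the names Sm, Em, Mrk and to_mrk are hypothetical, not part of XLIFF, ITS, or any library.

```python
from dataclasses import dataclass, field


@dataclass
class Sm:
    """Hypothetical stand-in for <sm id='...' .../>."""
    id: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Em:
    """Hypothetical stand-in for <em startRef='...'/>."""
    start_ref: str


@dataclass
class Mrk:
    """Hypothetical stand-in for a spanning <mrk>...</mrk> node."""
    id: str
    attrs: dict
    content: list


def to_mrk(tokens):
    """Convert a flat list of text / Sm / Em tokens into nested Mrk nodes."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if isinstance(tok, Sm):
            # Step 2: look for an e tag with the corresponding id.
            end = next(
                (j for j in range(i + 1, len(tokens))
                 if isinstance(tokens[j], Em) and tokens[j].start_ref == tok.id),
                None,
            )
            if end is None:
                i += 1  # no matching e tag here: discard the s tag
                continue
            # The content between s and e becomes the new current content.
            out.append(Mrk(tok.id, tok.attrs, to_mrk(tokens[i + 1:end])))
            i = end + 1
        elif isinstance(tok, Em):
            i += 1  # e tag whose s tag was discarded: drop it (assumption)
        else:
            out.append(tok)  # steps 1 and 3: plain text is output as-is
            i += 1
    return out


nested = to_mrk(["Go to ", Sm("1", {"domain": "travel"}), "Paris", Em("1"), " now"])
overlap = to_mrk(["a", Sm("1"), "b", Sm("2"), "c", Em("1"), "d", Em("2")])
```

In the second call the markers overlap, so the inner <sm/> finds no matching <em/> inside the span of marker 1 and is discarded, in line with the "discard non-hierarchical structures" strategy discussed above. Dropping the leftover <em/> is an assumption of this sketch; the thread only states what to do with an unmatched s tag.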
Received on Thursday, 9 October 2014 18:20:27 UTC