- From: Yves Savourel <ysavourel@enlaso.com>
- Date: Thu, 9 Oct 2014 13:16:01 -0600
- To: "'Felix Sasaki'" <fsasaki@w3.org>
- CC: "XLIFF Main List" <xliff@lists.oasis-open.org>, "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
> Sorry, I don't get this.
> Do you have small examples (e.g. one for translate, one for text analysis)
> of the difference?

My understanding is that, to work with an ITS processor, we would change each span marked by <sm/>/<em/> into a set of <mrk>/</mrk> elements. At least this is what I read from your algorithm (not from Fredrik's option).

For example:

<sm id='1' translate='no'/>French <pc id='2'>Canadian<em startRef='1'/> hockey</pc>.

Would be changed to:

<mrk id='1' translate='no'>French </mrk><pc id='2'><mrk id='m1' translate='no'>Canadian</mrk> hockey</pc>.

This gives each character of the content the same properties as in the original.

But you can't do this for a data category whose attributes relate in a meaningful way to the annotated content as a whole. For example:

<sm id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'/>French <pc id='2'>Canadian<em startRef='1'/> hockey</pc>.

Cannot be split into two instances:

<mrk id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'>French </mrk><pc id='2'><mrk id='3' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'>Canadian</mrk> hockey</pc>.

That is: the term is not "French " or "Canadian", it is "French Canadian".

But maybe I didn't understand your algorithm correctly and it doesn't result in multiple <mrk>/</mrk> for a single <sm/>/<em/>.

Looking at Fredrik's note: there are various ways to reduce the <sm/>/<em/> in favor of well-formed <mrk> elements (joining all segments, prioritizing well-formedness of annotations over inline codes, etc.), but I'm not sure it can be an absolute solution: the markup may originate from the XLIFF tool and care little about well-formedness, and one can always have overlapping annotations. But maybe this is a restriction we can live with.
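A minimal sketch of the splitting described above, in Python, only to make the concern concrete. The token representation and the function name split_annotations are hypothetical (they are not part of XLIFF, ITS, or any existing tool), and generated ids such as "m1" are illustrative. Every text run that falls inside an <sm/>...<em/> span gets its own <mrk>, so one annotation can become several:

```python
from xml.sax.saxutils import escape, quoteattr


def split_annotations(tokens):
    """Wrap each text run inside an <sm/>...<em/> span in its own <mrk>.

    tokens is a hypothetical flat representation of inline content:
      ("sm", id, attrs)   start marker
      ("em", start_ref)   end marker
      ("open", name, id)  opening inline element such as <pc>
      ("close", name)     closing inline element
      ("text", string)    character data
    Returns the serialized content and the number of <mrk> pieces created.
    """
    active = {}   # sm id -> attributes of the spans currently open
    out = []
    pieces = 0
    for tok in tokens:
        kind = tok[0]
        if kind == "sm":
            active[tok[1]] = tok[2]
        elif kind == "em":
            active.pop(tok[1], None)
        elif kind == "open":
            out.append("<%s id=%s>" % (tok[1], quoteattr(tok[2])))
        elif kind == "close":
            out.append("</%s>" % tok[1])
        elif kind == "text":
            text = escape(tok[1])
            # Most recently opened span wraps first, so it ends up innermost.
            for attrs in reversed(list(active.values())):
                pieces += 1
                attr_str = "".join(
                    " %s=%s" % (k, quoteattr(v)) for k, v in attrs.items()
                )
                text = "<mrk id=%s%s>%s</mrk>" % (
                    quoteattr("m%d" % pieces), attr_str, text)
            out.append(text)
    return "".join(out), pieces


tokens = [
    ("sm", "1", {"translate": "no"}),
    ("text", "French "),
    ("open", "pc", "2"),
    ("text", "Canadian"),
    ("em", "1"),
    ("text", " hockey"),
    ("close", "pc"),
    ("text", "."),
]
result, pieces = split_annotations(tokens)
print(result)
print(pieces)  # 2: the single <sm/>...<em/> span became two <mrk> elements
```

With translate='no' the two pieces carry the same meaning as the original span. With type='term' and a ref, the same code would produce two separate term annotations, which is the problem.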
-ys

-----Original Message-----
From: Felix Sasaki [mailto:felix@sasakiatcf.com]
Sent: Thursday, October 9, 2014 12:20 PM
To: Yves Savourel
Cc: XLIFF Main List; public-i18n-its-ig
Subject: Re: [xliff] ITS scope with sm/em

On 09.10.2014 at 14:18, Yves Savourel <ysavourel@enlaso.com> wrote:

> Yes, something like the MT Confidence value is different, but that conversion can be described in the mapping itself (if I recall correctly). So an ITS processor has nothing 'special' to do: it just applies the rules.
>
> I suppose we could have additional pre-processing steps for a case like <sm>/<em>. But that means you can't really use a 'pure' ITS processor to look at an XLIFF file because it would not know how to do the pre-processing.
> But that is probably acceptable, especially if we provide generic ways to do the transformation.
>
> This said, I'm not 100% sure you can transform <sm>/<em> into <mrk>/</mrk> for all data categories: it would be ok for things like translate, domain, etc. But info like terminology, Text Analysis, LQI make sense only when set as a single content.

Sorry, I don't get this. Do you have small examples (e.g. one for translate, one for text analysis) of the difference?

- Felix

>
> -ys
>
>
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Felix Sasaki
> Sent: Thursday, October 9, 2014 5:53 AM
> To: Yves Savourel
> Cc: Dr. David Filip; Estreen, Fredrik; XLIFF Main List; public-i18n-its-ig
> Subject: Re: [xliff] ITS scope with sm/em
>
> Hi Yves, all,
>
> Understood. Though: aren't there also other parts of the XLIFF/ITS mapping that require special handling from an ITS 1 or 2 processor? E.g. MT confidence
> https://www.w3.org/International/its/wiki/XLIFF_2.0_Mapping#MT_Confidence_.28.3D.3D.3D.3D.3D.3D.3D.3D.3D.3DTO_REVIEW.29
> which requires a computation of values. In the other thread you just listed the types of processors:
>
> "- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
> - An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
> - An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document."
>
> Couldn't we require in the mapping specification that, before a general ITS processor uses XLIFF+ITS content, it has to do the preprocessing described in this thread, the one for MT confidence, etc.?
>
> Cheers,
>
> Felix
>
> On 09.10.2014 at 13:33, Yves Savourel <ysavourel@enlaso.com> wrote:
>
>> Hi all,
>>
>> Thanks for the input Fredrik and Felix.
>>
>> I'm not worried about the XLIFF implementation of those cases: we have had working code for those for a long time (a good use case is mrk with translate='yes|no').
>>
>> I was thinking more about the ITS aspect of it.
>>
>> From an ITS viewpoint, with something like this: <sm id='1' itx:domain='travel'/>...<em startRef='1'/> the scope of the domain is an empty content (the content of <sm/>). There is nothing in ITS that allows using distinct elements to annotate a span.
>>
>> Because, while on the XLIFF side the processing expectation is to treat the content between a given <sm/> and its corresponding <em/> as a span, on the ITS side there is no semantic for such a construct.
>>
>> Cheers,
>> -ys
>>
>>
>> From: Dr. David Filip [mailto:David.Filip@ul.ie]
>> Sent: Thursday, October 9, 2014 5:08 AM
>> To: Felix Sasaki
>> Cc: Estreen, Fredrik; Yves Savourel; XLIFF Main List; public-i18n-its-ig
>> Subject: Re: [xliff] ITS scope with sm/em
>>
>> Felix, I like the algorithmic approach that is open to different implementations.
>>
>> After all, ITS is a set of abstract categories that should not be restricted to hierarchical structured formats.
>>
>> Now to your proposed algorithm.
>>
>> Unlike native codes, annotations MUST have the opening and closing tag in the same unit.
>> So you will always be creating <mrk> nodes from <sm/> tags if you consider the whole <unit> content, which is the point.
>>
>> Cheers
>> dF
>>
>>
>> Dr. David Filip
>> =======================
>> OASIS XLIFF TC Secretary, Editor, and Liaison Officer
>> LRC | CNGL | CSIS
>> University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> http://www.cngl.ie/profile/?i=452
>> mailto: david.filip@ul.ie
>>
>> On Thu, Oct 9, 2014 at 3:49 AM, Felix Sasaki <felix@sasakiatcf.com> wrote:
>> I agree with Fredrik. Processing of overlapping hierarchies is a task that cannot be solved in general, and discarding non-hierarchical structures is a good strategy for XML / HTML content.
>>
>> If people don't want to specify an XSLT conversion, we could also define the conversion process in an algorithmic way like this:
>>
>> 0) set current content to whole content to be processed.
>> 1) is there an s tag in current content?
>>    Then output text before s tag and do 2)
>>    else just output all text in current content.
>> 2) has the s tag an e tag with corresponding id?
>>    Then create a mrk node
>>    set the content between s and e to new current content
>>    do 1)
>>    else discard s and go to 1)
>> 3) output rest of text
>>
>> and say: you can implement this as XSLT (example given) or in different programming languages. That would have the benefit of keeping the door open to a future non-XML, API-focused XLIFF.
>>
>> - Felix
>>
>> On 08.10.2014 at 18:41, Estreen, Fredrik <Fredrik.Estreen@lionbridge.com> wrote:
>>
>>> Hi Yves,
>>>
>>>> Hi all,
>>>>
>>>> Looking at the ITS mapping: in many cases we put the ITS information on a marker (<mrk> element).
>>>>
>>>> But such an element can be represented by <sm/>...<em/> when it's overlapping another element.
>>>>
>>>> In that case the normal ITS scope mechanism can't work because it applies to the empty content of <sm/>, not the content between <sm/> and the corresponding <em/>.
>>>>
>>>> We can have provision for this in the XLIFF module. But I'm not sure it's doable in the ITS rules, especially with inheritance when there are nested annotations.
>>>
>>> This is an interesting problem and I doubt it is solvable in a general way without additional steps. It might be solvable when the <sm/> and <em/> are in the same segment, but I doubt it is in the case where they start and end in different segments (i.e. different sibling trees).
>>>
>>> One potentially workable solution would be to apply an XSLT transform on the XLIFF that merges all segments in each unit, discards any non-ITS-carrying marker (to reduce the risk of overlapping markers), and finally normalizes the remaining markers to the <mrk></mrk> spanning form. Since ITS information will likely be coming from and going to an XML source, there should not be any overlapping markers at that stage, as they would be difficult to represent in the source format. It is not guaranteed, but we could declare that ill-formed. ITS global rules could then be evaluated against the transformed version. Admittedly not the most beautiful solution, but I think it could work.
>>>
>>>> I vaguely recall that such a topic was discussed at some point in the ITS-WG, no?
>>>> Does anyone recall the outcome?
>>>>
>>>> Cheers,
>>>> -ys
>>>
>>> Regards,
>>> Fredrik Estreen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail.
>>> Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
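
The conversion steps Felix outlines in the quoted thread (steps 0 to 3) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a reference implementation: it works on a string in which the segments of a unit are already joined, it recognizes the markers with regular expressions rather than a real XML parser, and the function names are hypothetical. Removing leftover <em/> tags at the end is an added assumption; the quoted steps only say to discard an unmatched s tag.

```python
import re

# <sm id='...' .../> with any further attributes captured verbatim.
SM = re.compile(r"<sm\s+id=(['\"])(?P<id>[^'\"]+)\1(?P<attrs>[^>]*?)/>")
# Any <em startRef='...'/>, used only to drop orphans at the end.
EM = re.compile(r"<em\s+startRef=(['\"])[^'\"]+\1\s*/>")


def _convert(content):
    # Step 1: is there an s tag in the current content?
    sm = SM.search(content)
    if sm is None:
        return content
    before, rest = content[:sm.start()], content[sm.end():]
    # Step 2: does the s tag have an e tag with the corresponding id?
    em = re.search(
        r"<em\s+startRef=(['\"])%s\1\s*/>" % re.escape(sm.group("id")), rest)
    if em is None:
        # No matching e tag: discard the s tag and continue with the rest.
        return before + _convert(rest)
    inner, after = rest[:em.start()], rest[em.end():]
    # Create a mrk node; the content between s and e becomes the new
    # current content (recursion handles nested spans).
    mrk = "<mrk id='%s'%s>%s</mrk>" % (
        sm.group("id"), sm.group("attrs").rstrip(), _convert(inner))
    # Step 3: output the rest of the text.
    return before + mrk + _convert(after)


def sm_em_to_mrk(content):
    """Turn <sm/>...<em/> pairs into <mrk> elements (step 0 starts here)."""
    # Assumption: <em/> tags left without a surviving <sm/> are dropped.
    return EM.sub("", _convert(content))


print(sm_em_to_mrk(
    "<sm id='1' translate='no'/>French Canadian<em startRef='1'/> hockey."))
# <mrk id='1' translate='no'>French Canadian</mrk> hockey.
```

Properly nested spans come out as nested <mrk> elements. When two spans overlap, the one that opens second has no e tag inside the content of the first, so it is discarded, which matches the "discard non-hierarchical structures" strategy mentioned in the thread.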
Received on Thursday, 9 October 2014 19:16:29 UTC