- From: Sergey Nozhenko <sergey.nozhenko@logrusglobal.com>
- Date: Tue, 23 Aug 2016 22:50:18 +0300
- To: Felix Sasaki <fsasaki@w3.org>
- CC: Serge Gladkoff <serge.gladkoff@gmail.com>, "public-i18n-its-ig@w3.org" <public-i18n-its-ig@w3.org>, Renat Bikmatov <renat.bikmatov@logrusglobal.com>
- Message-ID: <25b85751-2305-06f0-50c3-c7dc3cee071a@logrusglobal.com>
Hi,

sm and em elements may be nested in mrk and overlap it. For example:

<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0" srcLang="en"
       xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
  <file id="f1">
    <unit id="u1">
      <segment>
        <source><mrk id="m1" translate="no" type="term">Text1 <sm id="sm1" type="itsm:generic"
            itsm:taClassRef="http://example/ontology#Thing"
            itsm:taIdentRef="http://example.com/ref"/>Text2.</mrk></source>
      </segment>
      <segment>
        <source>Text4<em startRef="sm1"/> text5.</source>
      </segment>
    </unit>
  </file>
</xliff>

Serge

On 23.08.2016 19:48, Felix Sasaki wrote:
> Apologies for the late reply, Sergey, Serge and all.
>
> The issue is an XLIFF issue related to the annotations mechanism
> http://docs.oasis-open.org/xliff/xliff-core/v2.0/os/xliff-core-v2.0-os.html#annotations
> if an annotation in XLIFF is represented with sm and em, the
> application has to find the content relating to the annotation.
>
> I think this is doable, both for general XLIFF annotations (e.g. of
> terms) and ITS annotations. I updated my implementation with an XPath
> expression that has a larger search space than the previous one. The
> new expression searches for the corresponding em tag in the following
> nodes that have the same parent node type (e.g. all source elements or
> all target elements).
>
> It seems to work, see the more complex input
> https://github.com/fsasaki/its20-extractor/blob/master/sample/xliff21sample/inputfile.xml
> and output
> https://github.com/fsasaki/its20-extractor/blob/master/sample/xliff21sample/output-inline-annotation.xml
> and the adapted XPath at
> https://github.com/fsasaki/its20-extractor/commit/62428b4484df7a073be3c2c0033e2a389dc83350
> in tools/datacategories-2-xsl.xsl.
>
> I’m happy to work on this more if you give me more XLIFF annotation
> samples.
>
> Best,
>
> Felix
>
>
>> On 18.08.2016 at 11:32, Sergey Nozhenko
>> <sergey.nozhenko@logrusglobal.com
>> <mailto:sergey.nozhenko@logrusglobal.com>> wrote:
>>
>> How about this:
>>
>> <xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0" srcLang="en" trgLang="ru"
>>        xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
>>   <file id="f1">
>>     <unit id="u1">
>>       <segment>
>>         <source><sm id="sm1" type="itsm:generic" itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
>>             itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>Arizona</source>
>>         <target>Аризона</target>
>>       </segment>
>>       <segment>
>>         <source><em startRef="sm1"/> Yeah!</source>
>>         <target>Да!</target>
>>       </segment>
>>     </unit>
>>   </file>
>> </xliff>
>>
>> Serge
>>
>> *From:*Felix Sasaki <mailto:fsasaki@w3.org>
>> *Sent:*18 August 2016 8:18
>> *To:*Serge Gladkoff <mailto:serge.gladkoff@gmail.com>
>> *Cc:*public-i18n-its-ig@w3.org
>> <mailto:public-i18n-its-ig@w3.org>;Renat Bikmatov
>> <mailto:renat.bikmatov@logrusglobal.com>;Sergey Nozhenko
>> <mailto:sergey.nozhenko@logrusglobal.com>
>> *Subject:*Re: ITS rules for XLIFF 2.1
>>
>>
>>> On 17.08.2016 at 23:08, Serge Gladkoff
>>> <serge.gladkoff@gmail.com <mailto:serge.gladkoff@gmail.com>> wrote:
>>>
>>> Hello Felix,
>>> I am sorry to say this but our developers believe that this is a
>>> clear case where ITS hit rock-bottom, so to speak.
>>> The function of <sm>/<em> tags is to mark up the areas which cannot
>>> be annotated by one tag because this would result in an invalid XML
>>> file. This happens when the markup is conflicting with other tags.
>>> For example, with segmentation.
>>> In such cases inheritance does not work because the beginning of the
>>> unit may find itself inside one tag, and the end – inside another,
>>> and even on different levels.
>>
>> Indeed - that was exactly my point.
>>
>>> How could one describe ITS tags distribution in such cases?
>>
>> By keeping your ITS processor (including inheritance behavior) as is,
>> and then specifying additional processing for sm, as defined below. My
>> main point was that this does not change the behavior of a conformant
>> ITS processor. It is *additional* behavior.
>>
>>> Indeed, it is far from clear.
>>> I wouldn't call this “a small burden”.
>>
>> I implemented this as an additional behavior of my ITS processor. See
>> the changes to datacategories-2-xsl.xsl in
>> https://github.com/fsasaki/its20-extractor/commit/4816b29f8b7010f307c5dad98b1ab4aa92c4ae70
>> The change was 4 lines of code. I am happy to look at your code with
>> your developers, if that helps, to lower the burden.
>>
>> Best,
>>
>> Felix
>>
>>> Regards,
>>> Serge
>>> *From:*Felix Sasaki [mailto:fsasaki@w3.org]
>>> *Sent:*Tuesday, August 16, 2016 7:20 PM
>>> *To:*public-i18n-its-ig@w3.org <mailto:public-i18n-its-ig@w3.org>
>>> *Subject:*ITS rules for XLIFF 2.1
>>> Hi all,
>>> in the OASIS TC, currently the support of ITS in XLIFF 2.1 is being
>>> discussed.
>>> As part of the discussion an ITS rules file is developed. The file
>>> should allow general ITS processors to work with XLIFF 2.X
>>> documents. There is one issue: XLIFF has elements "sm" and "em"
>>> which are empty markers. (ITS or any other) information then relates
>>> to the content between the start and end marker.
>>> Below is a mail I had sent to the XLIFF list to find a workaround.
>>> This would put a (small) burden on ITS processors, to deal with the
>>> sm / em elements. See below, I tried this with my general XSLT
>>> implementation. What do people think on this, esp. implementers?
>>> Best,
>>> Felix
>>>
>>>
>>> Begin forwarded message:
>>> *From: *Felix Sasaki <felix@sasakiatcf.com <mailto:felix@sasakiatcf.com>>
>>> *Subject: Implementation of XLIFF 2.1 - ITS module*
>>> *Date: *12.
>>> August 2016 at 11:51:14 CEST
>>> *To: *XLIFF Main List <xliff@lists.oasis-open.org
>>> <mailto:xliff@lists.oasis-open.org>>
>>> Hi all,
>>> I started an ITS module implementation relying on my generic ITS
>>> processor. See the processed files here
>>> https://github.com/fsasaki/its20-extractor/tree/master/sample/xliff21sample
>>> external-rules.xml contains the rules, currently only for text
>>> analytics. inputfile.xml is an XLIFF 2.1 input file, currently with
>>> ITS Text Analytics information. The output is as a list of XPath
>>> expressions in nodelist-with-its-information.xml and as inline
>>> annotations in output-inline-annotation.xml
>>> The output shows one issue which we had discussed before, see below,
>>> taken from output-inline-annotation.xml
>>> <source>
>>>   <itsAnn xmlns=""/>
>>>   <sm id="sm1"
>>>       type="itsm:generic"
>>>       itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
>>>       itsm:taIdentRef="http://dbpedia.org/resource/Arizona">
>>>     <itsAnn xmlns="">
>>>       <elem>
>>>         <taClassRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
>>>             xmlns:its="http://www.w3.org/2005/11/its"
>>>             xmlns:datc="http://example.com/datacats"
>>>             itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"/>
>>>         <taIdentRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
>>>             xmlns:its="http://www.w3.org/2005/11/its"
>>>             xmlns:datc="http://example.com/datacats"
>>>             itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>
>>>       </elem>
>>>     </itsAnn>
>>>   </sm>Arizona<em startRef="sm1">
>>>     <itsAnn xmlns=""/>
>>>   </em>
>>> </source>
>>> With the ITS rules file, "sm" is annotated to have the text
>>> analytics information. But it is actually the content between sm and
>>> em that should be annotated. I don’t know how to resolve this. Maybe
>>> we should add to the ITS module the constraint that extends general
>>> ITS processors: if the selected element is XLIFF sm, apply the ITS
>>> information to the next em which corresponds to sm, via the startRef
>>> attribute.
>>> This would be a small burden on the ITS processors, but
>>> would greatly simplify the creation of the ITS/XLIFF rules file.
>>> Thoughts?
>>> Best,
>>> Felix
>
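
For illustration, the additional sm/em processing discussed in this thread can be sketched in Python. This is only a minimal sketch under stated assumptions, not the XPath/XSLT from its20-extractor: the function name annotated_spans, the helper _events, the SAMPLE string (adapted from the mrk example above) and the use of xml.etree.ElementTree are all choices made here. It walks every source element of a unit in document order, opens a span at each sm, and closes it at the em whose startRef matches, so the span may cross mrk and segment boundaries.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"

# Hypothetical sample, adapted from the mrk example in this thread.
SAMPLE = """<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0"
  srcLang="en" xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
 <file id="f1"><unit id="u1">
  <segment><source><mrk id="m1" translate="no" type="term">Text1 <sm id="sm1"
    type="itsm:generic" itsm:taIdentRef="http://example.com/ref"/>Text2.</mrk></source></segment>
  <segment><source>Text4<em startRef="sm1"/> text5.</source></segment>
 </unit></file>
</xliff>"""


def _events(elem):
    """Yield ('text', str) and ('sm' | 'em', element) in document order."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        local = child.tag.split("}")[-1]  # drop the namespace part
        if local in ("sm", "em"):
            yield (local, child)
        else:
            # mrk and other inline elements: descend into their content
            yield from _events(child)
        if child.tail:
            yield ("text", child.tail)


def annotated_spans(xliff_text, container="source"):
    """Return (unit id, sm id, sm attributes, text between sm and em).

    Only elements named `container` (e.g. all source elements or all
    target elements) inside the same unit are searched.
    """
    root = ET.fromstring(xliff_text)
    spans = []
    for unit in root.iter(f"{{{XLF}}}unit"):
        open_markers = {}  # sm id -> (attributes, collected text pieces)
        for holder in unit.iter(f"{{{XLF}}}{container}"):
            for kind, payload in _events(holder):
                if kind == "text":
                    # Text belongs to every span that is currently open.
                    for _, pieces in open_markers.values():
                        pieces.append(payload)
                elif kind == "sm":
                    open_markers[payload.get("id")] = (dict(payload.attrib), [])
                else:  # em: close the span named by startRef, if open
                    ref = payload.get("startRef")
                    if ref in open_markers:
                        attrs, pieces = open_markers.pop(ref)
                        spans.append((unit.get("id"), ref, attrs, "".join(pieces)))
    return spans


spans = annotated_spans(SAMPLE)
```

For the sample above, the single span carries the text "Text2.Text4" together with the itsm:taIdentRef value from sm, i.e. the ITS information ends up on the content between the markers rather than on the empty sm element itself.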
Received on Tuesday, 23 August 2016 19:51:43 UTC