
ITS rules for XLIFF 2.1

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 16 Aug 2016 18:19:40 +0200
Message-Id: <A6390DE1-AB61-43FB-BBB4-707B7E0E904A@w3.org>
To: public-i18n-its-ig@w3.org
Hi all,

In the OASIS TC, support for ITS in XLIFF 2.1 is currently being discussed.

As part of that discussion, an ITS rules file is being developed. The file should allow general ITS processors to work with XLIFF 2.X documents. There is one issue: XLIFF has the elements "sm" and "em", which are empty markers. Any information (ITS or otherwise) attached to them relates to the content between the start marker and the end marker, not to the marker elements themselves.

Below is a mail I sent to the XLIFF list in search of a workaround. The workaround would put a (small) burden on ITS processors, which would have to deal with the sm / em elements. As described below, I tried this with my general XSLT implementation. What do people think about this, especially implementers?



> Begin forwarded message:
> From: Felix Sasaki <felix@sasakiatcf.com>
> Subject: Implementation of XLIFF 2.1 - ITS module
> Date: 12 August 2016 at 11:51:14 CEST
> To: XLIFF Main List <xliff@lists.oasis-open.org>
> Hi all,
> I started an ITS module implementation relying on my generic ITS processor. See the processed files here
> https://github.com/fsasaki/its20-extractor/tree/master/sample/xliff21sample
> external-rules.xml contains the rules, currently only for text analytics. inputfile.xml is an XLIFF 2.1 input file, currently with ITS Text Analytics information. The output is provided as a list of XPath expressions in nodelist-with-its-information.xml and as inline annotations in output-inline-annotation.xml.
> The output shows an issue that we have discussed before. See the excerpt below, taken from output-inline-annotation.xml:
> <source>
>                <itsAnn xmlns=""/>
>                <sm id="sm1"
>                    type="itsm:generic"
>                    itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
>                    itsm:taIdentRef="http://dbpedia.org/resource/Arizona">
>                   <itsAnn xmlns="">
>                      <elem>
>                         <taClassRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
>                                            xmlns:its="http://www.w3.org/2005/11/its"
>                                            xmlns:datc="http://example.com/datacats"
>                                            itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"/>
>                         <taIdentRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
>                                            xmlns:its="http://www.w3.org/2005/11/its"
>                                            xmlns:datc="http://example.com/datacats"
>                                            itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>
>                      </elem>
>                   </itsAnn>
>                </sm>Arizona<em startRef="sm1">
>                   <itsAnn xmlns=""/>
>                </em>
>             </source>
> With the ITS rules file, "sm" is annotated as carrying the text analytics information. But it is actually the content between sm and em that should be annotated. I don't know how to resolve this. Maybe we should add a constraint to the ITS module that extends general ITS processors: if the selected element is an XLIFF sm, apply the ITS information to the content up to the next em that corresponds to that sm via the startRef attribute. This would be a small burden on ITS processors, but would greatly simplify the creation of the ITS/XLIFF rules file.
> Thoughts?
> Best,
> Felix
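
The constraint proposed in the forwarded message could be sketched roughly as follows. This is only an illustration in Python, not the XSLT implementation mentioned above. The function name `annotated_spans`, the `itsm` namespace URI, and the sample sentence are assumptions made for the example; only the sm/em markup, the `startRef` attribute, and the two text analytics attributes are taken from the excerpt.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
# Assumption: namespace URI for the ITS module prefix "itsm".
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"

# Made-up sample; only the sm/em markup mirrors the excerpt above.
SAMPLE = f"""<xliff xmlns="{XLF}" xmlns:itsm="{ITSM}" version="2.1" srcLang="en">
  <file id="f1"><unit id="u1"><segment>
    <source><sm id="sm1" type="itsm:generic"
      itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
      itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>Arizona<em startRef="sm1"/> is a state.</source>
  </segment></unit></file>
</xliff>"""


def annotated_spans(root):
    """Map each sm id to its itsm:* attributes and the text up to the matching em."""
    open_spans = {}  # sm id -> text chunks collected while the span is open
    result = {}

    def add_text(text):
        # Text in document order belongs to every span that is currently open.
        if text:
            for chunks in open_spans.values():
                chunks.append(text)

    def walk(elem):
        if elem.tag == f"{{{XLF}}}sm":
            # Start marker: remember its ITS attributes and open a span.
            its = {k: v for k, v in elem.attrib.items()
                   if k.startswith(f"{{{ITSM}}}")}
            result[elem.get("id")] = {"its": its, "text": ""}
            open_spans[elem.get("id")] = []
        elif elem.tag == f"{{{XLF}}}em":
            # End marker: close the span named by startRef, if it is open.
            chunks = open_spans.pop(elem.get("startRef"), None)
            if chunks is not None:
                result[elem.get("startRef")]["text"] = "".join(chunks)
        else:
            add_text(elem.text)
            for child in elem:
                walk(child)
                add_text(child.tail)  # text following the child element

    walk(root)
    return result


spans = annotated_spans(ET.fromstring(SAMPLE))
print(spans["sm1"]["text"])  # prints: Arizona
```

The walk is in document order rather than limited to sibling elements, so a span would still be collected if the sm and its em sat at different nesting levels. A real processor would also need to decide what to do with an sm that has no matching em; here such a span simply keeps an empty text.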
Received on Tuesday, 16 August 2016 16:19:44 UTC