RE: ITS rules for XLIFF 2.1

Re-sending just in case Sergey's email address is not in the list of allowed senders:

 

From: Sergey Nozhenko [mailto:sergey.nozhenko@logrusglobal.com] 
Sent: Thursday, August 18, 2016 12:32 PM
To: Felix Sasaki; Serge Gladkoff
Cc: public-i18n-its-ig@w3.org; Renat Bikmatov
Subject: RE: ITS rules for XLIFF 2.1

 

How about this:

 

<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0"
 srcLang="en" trgLang="ru"
 xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
 <file id="f1">
  <unit id="u1">
   <segment>
    <source><sm id="sm1" type="itsm:generic"
      itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
      itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>Arizona</source>
    <target>Аризона</target>
   </segment>
   <segment>
    <source><em startRef="sm1"/> Yeah!</source>
    <target>Да!</target>
   </segment>
  </unit>
 </file>
</xliff>
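To make the proposal concrete, here is a minimal Python sketch of how a processor might resolve the span marked by this sm. It walks the unit's source elements in document order, starts collecting text at the sm with the given id, and stops at the em whose startRef matches, so the span may cross segment boundaries. The helper names `events` and `resolve_sm` are hypothetical, and treating only source content (never target) as part of the span is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the example above
XLF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"

UNIT = """<unit xmlns="urn:oasis:names:tc:xliff:document:2.0"
      xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1" id="u1">
 <segment>
  <source><sm id="sm1" type="itsm:generic"
      itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
      itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>Arizona</source>
  <target>Аризона</target>
 </segment>
 <segment>
  <source><em startRef="sm1"/> Yeah!</source>
  <target>Да!</target>
 </segment>
</unit>"""


def events(elem):
    """Yield ("start", element) and ("text", string) events in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def resolve_sm(unit, sm_id):
    """Return (itsm attributes of the sm, source text up to the matching em).

    Hypothetical helper: only <source> content is walked, segment after
    segment, so the span may start in one segment and end in a later one.
    """
    attrs, parts, inside = None, [], False
    for source in unit.iter(f"{{{XLF}}}source"):
        for kind, value in events(source):
            if kind == "start" and value.tag == f"{{{XLF}}}sm" and value.get("id") == sm_id:
                inside = True
                # Keep only attributes in the ITS module namespace, without prefix
                attrs = {k.split("}", 1)[1]: v for k, v in value.attrib.items()
                         if k.startswith(f"{{{ITSM}}}")}
            elif kind == "start" and value.tag == f"{{{XLF}}}em" and value.get("startRef") == sm_id:
                return attrs, "".join(parts)
            elif kind == "text" and inside:
                parts.append(value)
    raise ValueError(f"no em with startRef={sm_id!r} found after the sm")
```

Here `resolve_sm(ET.fromstring(UNIT), "sm1")` returns the two itsm attributes together with the text "Arizona", even though the closing em sits in the second segment.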

 

Serge

 

From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Thursday, August 18, 2016 8:18 AM
To: Serge Gladkoff [mailto:serge.gladkoff@gmail.com] 
Cc: public-i18n-its-ig@w3.org; Renat Bikmatov
[mailto:renat.bikmatov@logrusglobal.com]; Sergey Nozhenko
[mailto:sergey.nozhenko@logrusglobal.com] 
Subject: Re: ITS rules for XLIFF 2.1

 

 

On 17.08.2016 at 23:08, Serge Gladkoff <serge.gladkoff@gmail.com> wrote:

 

Hello Felix,

 

I am sorry to say this, but our developers believe that this is a clear case
where ITS has hit rock bottom, so to speak.

 

The function of the <sm>/<em> tags is to mark up spans that cannot be
annotated with a single element because that would make the XML file
invalid. This happens when the markup conflicts with other tags, for example
with segmentation.

 

In such cases inheritance does not work, because the beginning of the
annotated span may end up inside one element and its end inside another,
possibly even at a different nesting level.
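A small Python sketch of why element-scoped annotation cannot express this case, using a reduced version of the cross-segment example with the ITS attributes omitted for brevity: the sm element is empty, the text it marks is its tail, and the nearest element containing both markers is the whole unit, which also holds text that follows the em. The helper `ancestors` is hypothetical; only standard xml.etree.ElementTree calls are used.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"

# Reduced cross-segment case: sm in the first segment, em in the second
UNIT = """<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="u1">
 <segment><source><sm id="sm1"/>Arizona</source></segment>
 <segment><source><em startRef="sm1"/> Yeah!</source></segment>
</unit>"""

unit = ET.fromstring(UNIT)
sm = unit.find(f".//{{{XLF}}}sm")
em = unit.find(f".//{{{XLF}}}em")
# ElementTree has no parent pointers, so build a child -> parent map
parents = {child: parent for parent in unit.iter() for child in parent}


def ancestors(elem):
    """Hypothetical helper: ancestors of elem, nearest first."""
    chain = []
    while elem in parents:
        elem = parents[elem]
        chain.append(elem)
    return chain


# An element-scoped annotation (inherited or not) reaches at most the
# element's own subtree; <sm> is empty, so it covers no text at all.
text_under_sm = "".join(sm.itertext())

# The intended text is the tail of <sm>, which lies outside the element.
intended = sm.tail

# The nearest element containing both markers is the <unit> itself, whose
# subtree also contains text located after the <em>.
common = next(a for a in ancestors(sm) if a in ancestors(em))
text_under_common = "".join(common.itertext())
```

Annotating sm covers nothing, while annotating the smallest common ancestor over-annotates, which is the gap the thread is discussing.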

 

Indeed, that was exactly my point. 





 

How could one describe the distribution of ITS tags in such cases? 

 

By keeping your ITS processor (including its inheritance behavior) as is, and
then specifying additional processing for sm, as defined below. My main point
was that this does not change the behavior of a conformant ITS processor. It
is *additional* behavior. 





Indeed, it is far from clear.

 

I wouldn't call this "a small burden".

 

I implemented this as additional behavior of my ITS processor. See the
changes to datacategories-2-xsl.xsl in

https://github.com/fsasaki/its20-extractor/commit/4816b29f8b7010f307c5dad98b1ab4aa92c4ae70

The change was four lines of code. I am happy to go through your code with
your developers to lower the burden, if that helps.

 

Best,

 

Felix





 

Regards,

Serge

 

 

 

From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Tuesday, August 16, 2016 7:20 PM
To: public-i18n-its-ig@w3.org
Subject: ITS rules for XLIFF 2.1

 

Hi all,

 

in the OASIS XLIFF TC, support for ITS in XLIFF 2.1 is currently being
discussed.

 

As part of the discussion, an ITS rules file is being developed. The file
should allow general ITS processors to work with XLIFF 2.x documents. There
is one issue: XLIFF has the elements "sm" and "em", which are empty start and
end markers. Any information attached to them (ITS or other) relates to the
content between the start marker and the end marker.

 

Below is a mail I sent to the XLIFF list proposing a workaround. It would put
a (small) burden on ITS processors, which would have to deal with the sm / em
elements. As described below, I tried this with my general XSLT
implementation. What do people think about this, especially implementers?

 

Best,

 

Felix 

 

 

 

Begin forwarded message:

 

From: Felix Sasaki <felix@sasakiatcf.com>

Subject: Implementation of XLIFF 2.1 - ITS module

Date: 12 August 2016 at 11:51:14 CEST

To: XLIFF Main List <xliff@lists.oasis-open.org>

 

Hi all,

 

I have started an ITS module implementation that relies on my generic ITS
processor. The processed files are here:

https://github.com/fsasaki/its20-extractor/tree/master/sample/xliff21sample

external-rules.xml contains the rules, currently only for Text Analytics.
inputfile.xml is an XLIFF 2.1 input file, currently with ITS Text Analytics
information. The output is provided both as a list of XPath expressions in
nodelist-with-its-information.xml and as inline annotations in
output-inline-annotation.xml.

 

The output shows an issue that we discussed before. The following excerpt is
taken from output-inline-annotation.xml:

 

<source>
               <itsAnn xmlns=""/>
               <sm id="sm1"
                   type="itsm:generic"
                   itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
                   itsm:taIdentRef="http://dbpedia.org/resource/Arizona">
                  <itsAnn xmlns="">
                     <elem>
                        <taClassRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
                                           xmlns:its="http://www.w3.org/2005/11/its"
                                           xmlns:datc="http://example.com/datacats"
                                           itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"/>
                        <taIdentRefPointer xmlns:xlf2="urn:oasis:names:tc:xliff:document:2.0"
                                           xmlns:its="http://www.w3.org/2005/11/its"
                                           xmlns:datc="http://example.com/datacats"
                                           itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>
                     </elem>
                  </itsAnn>
               </sm>Arizona<em startRef="sm1">
                  <itsAnn xmlns=""/>
               </em>
            </source>

 

With the ITS rules file, "sm" is annotated with the text analytics
information. But it is actually the content between sm and em that should be
annotated. I don't know how to resolve this. Maybe we should add to the ITS
module a constraint that extends general ITS processors: if the selected
element is an XLIFF sm, apply the ITS information to the content up to the
next em that corresponds to the sm via its startRef attribute. This would be
a small burden on ITS processors, but would greatly simplify the creation of
the ITS/XLIFF rules file. 
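The proposed constraint can be sketched as a post-processing pass layered on an unmodified processor, so the processor's own behavior stays as is. In this Python sketch, `generic_its_pass` is a hypothetical stand-in for a general ITS processor applying the rules file (it simply reports every element carrying `itsm:` attributes), and `sm_post_pass` is the additional step: annotations selected on an sm are re-targeted to the text up to the em whose startRef matches, while all other annotations pass through unchanged. For brevity it assumes that the sm and its em are siblings inside one source element, as in the excerpt above; markers in different segments would need a document-order walk instead.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"

SAMPLE = """<source xmlns="urn:oasis:names:tc:xliff:document:2.0"
 xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1"><sm id="sm1" type="itsm:generic"
 itsm:taClassRef="http://nerd.eurecom.fr/ontology#Place"
 itsm:taIdentRef="http://dbpedia.org/resource/Arizona"/>Arizona<em startRef="sm1"/></source>"""


def generic_its_pass(root):
    """Hypothetical stand-in for an unchanged ITS processor: report every
    element carrying itsm:* attributes together with those attributes."""
    result = []
    for elem in root.iter():
        info = {k.split("}", 1)[1]: v for k, v in elem.attrib.items()
                if k.startswith(f"{{{ITSM}}}")}
        if info:
            result.append((elem, info))
    return result


def text_until_em(parent, sm):
    """Text between sm and the sibling em whose startRef equals sm's id."""
    children = list(parent)
    parts = [sm.tail or ""]
    for sib in children[children.index(sm) + 1:]:
        if sib.tag == f"{{{XLF}}}em" and sib.get("startRef") == sm.get("id"):
            return "".join(parts)
        # Inline elements between the markers contribute their text and tail
        parts.append("".join(sib.itertext()))
        parts.append(sib.tail or "")
    raise ValueError(f"no sibling em closes sm {sm.get('id')!r}")


def sm_post_pass(root, annotations):
    """Additional behavior: re-target annotations on <sm> to the marked span;
    every other annotation is passed through unchanged."""
    parents = {child: parent for parent in root.iter() for child in parent}
    out = []
    for elem, info in annotations:
        if elem.tag == f"{{{XLF}}}sm":
            out.append((text_until_em(parents[elem], elem), info))
        else:
            out.append((elem, info))
    return out
```

On this sample, the generic pass reports the Text Analytics attributes on the empty sm element, and the extra pass turns that into an annotation on the text "Arizona".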

 

Thoughts?

 

Best,

 

Felix

 

Received on Thursday, 18 August 2016 11:02:16 UTC