RE: ACTION-54: Try to come up with example of xliff+its test format / output from Yves Savourel on 2014-11-06 (public-i18n-its-ig@w3.org from November 2014)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Thu, 6 Nov 2014 08:30:28 -0700
To: "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
Message-ID: <004d01cff9d6$9812f310$c838d930$@enlaso.com>
Hi Felix,

I have nothing against this, but is there a reason for using offset? I suppose it would make the output more compact when you have
large chunks of text.

Using offset would make the creation of the output slightly more complicated (at least from a Java parser viewpoint). 

Cheers,
-yves


-----Original Message-----
From: Felix Sasaki [mailto:felix@sasakiatcf.com] 
Sent: Thursday, November 6, 2014 5:24 AM
To: Yves Savourel
Cc: public-i18n-its-ig
Subject: Re: ACTION-54: Try to come up with example of xliff+its test format / output

Hi Yves, all,

thanks, this looks good. One suggestion. Currently you are copying strings from the input in the path description, in case of text
nodes; e.g.

> /xliff/file/unit/source/"DATA "

maybe one could also work with character offsets, e.g.
/xliff/file/unit/source/#char=0,5
And say that a tool that generates the output should preserve white space when generating the offsets?
The syntax
#char=0,5
is not important, just a way to identify the offsets.
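
A minimal sketch of the offset idea in Python, under some assumptions that are not settled in this thread: offsets are counted over the concatenated text content of <source>, the start is inclusive and the end exclusive, and white space is left exactly as parsed. The function names and the `#char=start,end` rendering are illustrative only.

```python
import xml.etree.ElementTree as ET

# <source> content of unit u2 from the test file quoted below.
SOURCE = (
    '<source xmlns="urn:oasis:names:tc:xliff:document:2.0">Text '
    '<mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text </mrk>'
    'DATA </mrk> text.</source>'
)


def text_node_offsets(source):
    """Return (start, end, text) for each text node, in document order."""
    chunks = []

    def walk(elem):
        if elem.text:
            chunks.append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                chunks.append(child.tail)

    walk(source)
    result, pos = [], 0
    for chunk in chunks:
        # No stripping or normalizing: offsets depend on preserved white space.
        result.append((pos, pos + len(chunk), chunk))
        pos += len(chunk)
    return result


def offset_paths(source, base="/xliff/file/unit/source/"):
    """Render each text node as a path with a hypothetical #char=start,end suffix."""
    return [f"{base}#char={start},{end}"
            for start, end, _ in text_node_offsets(source)]


for path in offset_paths(ET.fromstring(SOURCE)):
    print(path)
```

With this convention the five text nodes of that source get consecutive, non-overlapping ranges, so two identical strings such as the two "DATA " nodes stay distinguishable.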

Cheers,

- Felix

Am 04.11.2014 um 04:19 schrieb Yves Savourel <ysavourel@enlaso.com>:

> Hi all,
> 
> Following up on this action item:
> 
> The initial thought was to use a text file with two columns:
> - The first one with XLIFF's fragment identifier.
> - The second with the ITS data for the given element.
> 
> But I realized since that not all locations with ITS data will have an 
> ID, so it may be better to use something different for the first column, closer to what we have with the ITS test output.
> 
> The first column would be the 'path' of the element, up to the unit, 
> then, depending on the type of node, some additional
> information: fragId for the markers, quoted text for the text. Because 
> XLIFF may have overlapping markers, we need to also represent the text nodes as they may show inherited information.
> 
> For example for the Translate data category the following file:
> 
> <?xml version="1.0"?>
> <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.1" srcLang="en"
> xmlns:xits="urn:oasis:names:tc:xliff:xits:2.1">
> <file id="f1" translate="no">
>  <unit id="u1">
>   <segment>
>    <source>Source 1.</source>
>   </segment>
>  </unit>
>  <unit id="u2" translate="yes">
>   <segment>
>    <source>Text <mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text </mrk>DATA </mrk> text.</source>
>   </segment>
>  </unit>
>  <unit id="u3" translate="yes">
>   <segment>
>    <source><sm id="m1" translate="yes"/>Text <sm id="m2" translate="no"/>DATA <em startRef="m1"/>DATA <em startRef="m2"/>text.</source>
>   </segment>
>  </unit>
> </file>
> </xliff>
> 
> Would result in the following output:
> 
> /xliff	translate=yes
> /xliff/file	translate=no
> /xliff/file/unit	translate=no
> /xliff/file/unit/source/"Source 1."	translate=no
> /xliff/file/unit	translate=yes
> /xliff/file/unit/source/"Text "	translate=yes
> /xliff/file/unit/source/{START:/f=f1/u=u2/m1}	translate=no
> /xliff/file/unit/source/"DATA "	translate=no
> /xliff/file/unit/source/{START:/f=f1/u=u2/m2}	translate=yes
> /xliff/file/unit/source/"text "	translate=yes
> /xliff/file/unit/source/{END:/f=f1/u=u2/m2}	translate=no
> /xliff/file/unit/source/"DATA "	translate=no
> /xliff/file/unit/source/{END:/f=f1/u=u2/m1}	translate=yes
> /xliff/file/unit/source/" text."	translate=yes
> /xliff/file/unit	translate=yes
> /xliff/file/unit/source/{START:/f=f1/u=u3/m1}	translate=yes
> /xliff/file/unit/source/"Text "	translate=yes
> /xliff/file/unit/source/{START:/f=f1/u=u3/m2}	translate=no
> /xliff/file/unit/source/"DATA "	translate=no
> /xliff/file/unit/source/{END:/f=f1/u=u3/m1}	translate=no
> /xliff/file/unit/source/"DATA "	translate=no
> /xliff/file/unit/source/{END:/f=f1/u=u3/m2}	translate=yes
> /xliff/file/unit/source/"text."	translate=yes
> 
> The start markers would show the metadata for the node, the end 
> markers would show the metadata that applies after the marker is closed (or both start and end can show the metadata for the span they
> denote: it doesn't really matter).
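
A rough Python sketch of how the output above could be derived for the Translate data category. It is not how Lynx does it; it assumes units sit directly under <file> (no groups), treats only mrk/sm/em as markers, and takes the value of the most recently opened marker that is still open and carries a translate attribute. All function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"
BASE = "/xliff/file/unit/source/"


def walk_source(source, unit_value, prefix, emit):
    active = []  # open markers as (id, translate-or-None), in opening order

    def current():
        # Most recently opened marker with an explicit value wins.
        for _, value in reversed(active):
            if value is not None:
                return value
        return unit_value

    def open_marker(elem):
        marker_id = elem.get("id")
        active.append((marker_id, elem.get("translate")))
        emit(BASE + "{START:" + prefix + marker_id + "}", current())

    def close_marker(marker_id):
        active[:] = [m for m in active if m[0] != marker_id]
        # The end marker reports the value that applies after closing.
        emit(BASE + "{END:" + prefix + marker_id + "}", current())

    def text(value):
        if value:
            emit(BASE + '"' + value + '"', current())

    def walk(elem):
        text(elem.text)
        for child in elem:
            if child.tag == NS + "mrk":
                open_marker(child)
                walk(child)
                close_marker(child.get("id"))
            elif child.tag == NS + "sm":
                open_marker(child)
            elif child.tag == NS + "em":
                close_marker(child.get("startRef"))
            else:
                walk(child)
            text(child.tail)

    walk(source)


def translate_report(xliff_text):
    """Return the tab-separated 'path<TAB>translate=value' lines."""
    lines = []

    def emit(path, value):
        lines.append(path + "\ttranslate=" + value)

    root = ET.fromstring(xliff_text)
    root_value = root.get("translate", "yes")  # ITS default is yes
    emit("/xliff", root_value)
    for file_el in root.findall(NS + "file"):
        file_value = file_el.get("translate", root_value)
        emit("/xliff/file", file_value)
        for unit in file_el.findall(NS + "unit"):
            unit_value = unit.get("translate", file_value)
            emit("/xliff/file/unit", unit_value)
            prefix = "/f=" + file_el.get("id") + "/u=" + unit.get("id") + "/"
            for source in unit.iter(NS + "source"):
                walk_source(source, unit_value, prefix, emit)
    return lines


SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.1" srcLang="en">
<file id="f1" translate="no">
 <unit id="u1"><segment><source>Source 1.</source></segment></unit>
 <unit id="u2" translate="yes"><segment>
  <source>Text <mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text </mrk>DATA </mrk> text.</source>
 </segment></unit>
 <unit id="u3" translate="yes"><segment>
  <source><sm id="m1" translate="yes"/>Text <sm id="m2" translate="no"/>DATA <em startRef="m1"/>DATA <em startRef="m2"/>text.</source>
 </segment></unit>
</file>
</xliff>"""

print("\n".join(translate_report(SAMPLE)))
```

Keeping the open markers in a list rather than a stack is what lets the overlapping sm/em case in u3 work: closing m1 while m2 is still open simply removes m1 and leaves m2's value in effect.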
> 
> This is just something to start with, feedback and better ideas are welcome.
> 
> In the spirit of implementing things early and often, I've implemented a new command in the Lynx tool that creates the test file.
> You can do for example:
> 
> C:/>lynx -its translate myFile.xlf
> 
> This will generate myFile.xlf.txt with the test results (and output 
> them on the console). Just type -its ? to get the list of the data categories currently supported.
> 
> The latest version of Lynx is here:
> http://okapi.opentag.com/snapshots/okapi-xliffLib_all-platforms_1.1-SNAPSHOT.zip
> 
> Cheers,
> -yves
> 
> 
> 
> 
>
Received on Thursday, 6 November 2014 15:31:01 UTC