Re: ACTION-54: Try to come up with example of xliff+its test format / output from Felix Sasaki on 2014-11-06 (public-i18n-its-ig@w3.org from November 2014)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 6 Nov 2014 17:14:06 +0100
To: Yves Savourel <ysavourel@enlaso.com>
Cc: public-i18n-its-ig <public-i18n-its-ig@w3.org>
Message-Id: <CB22B329-C840-4112-B38C-018FBA70DB97@w3.org>
On 06.11.2014 at 16:30, Yves Savourel <ysavourel@enlaso.com> wrote:

> Hi Felix,
> 
> I have nothing against this, but is there a reason for using offset? I suppose it would make the output more compact when you have
> large chunks of text.

Yes, that is one motivation. The other is that one needs the offset for the NIF conversion which we have done for ITS only, and it would be nice to have that for ITS+XLIFF too.

> 
> Using offset would make the creation of the output slightly more complicated (at least from a Java parser viewpoint). 

Understood. I hope to get a 2nd implementation, at least for "Translate", soon ...

Best,

Felix

> 
> Cheers,
> -yves
> 
> 
> -----Original Message-----
> From: Felix Sasaki [mailto:felix@sasakiatcf.com] 
> Sent: Thursday, November 6, 2014 5:24 AM
> To: Yves Savourel
> Cc: public-i18n-its-ig
> Subject: Re: ACTION-54: Try to come up with example of xliff+its test format / output
> 
> Hi Yves, all,
> 
> thanks, this looks good. One suggestion. Currently you are copying strings from the input in the path description, in case of text
> nodes; e.g.
> 
>> /xliff/file/unit/source/"DATA "
> 
> maybe one could also work with character offsets, e.g.
> /xliff/file/unit/source/#char=0,5
> And say that a tool that generates the output should preserve white space when generating the offsets?
> The syntax
> #char=0,5
> is not important, just a way to identify the offsets.
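A minimal sketch of the offset idea, not taken from any existing tool. It assumes offsets are counted over the concatenated text content of <source>, start inclusive and end exclusive, with white space preserved. The function name text_run_offsets is made up for illustration, and the snippet is parsed without the XLIFF namespace to keep it short:

```python
import xml.etree.ElementTree as ET


def text_run_offsets(source):
    """Return (start, end, text) for each text run of an element, in document order.

    Assumption: offsets count characters of the concatenated text content,
    start inclusive and end exclusive, with white space kept as-is.
    """
    runs = []
    pos = 0

    def add(value):
        nonlocal pos
        if value:
            runs.append((pos, pos + len(value), value))
            pos += len(value)

    def walk(elem):
        add(elem.text)           # text before the first child
        for child in elem:
            walk(child)          # text inside the child marker
            add(child.tail)      # text following the child marker

    walk(source)
    return runs


source = ET.fromstring(
    '<source>Text <mrk id="m1" translate="no">DATA '
    '<mrk id="m2" translate="yes">text </mrk>DATA </mrk> text.</source>'
)
for start, end, _ in text_run_offsets(source):
    print(f"/xliff/file/unit/source/#char={start},{end}")
```

Because every character counts, a tool that normalizes white space before computing the offsets would produce different numbers, which is why the generator needs to preserve it.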
> 
> Cheers,
> 
> - Felix
> 
> On 04.11.2014 at 04:19, Yves Savourel <ysavourel@enlaso.com> wrote:
> 
>> Hi all,
>> 
>> Following up on this action item:
>> 
>> The initial thought was to use a text file with two columns:
>> - The first one with XLIFF's fragment identifier.
>> - The second with the ITS data for the given element.
>> 
>> But I realized since that not all locations with ITS data will have an 
>> ID, so it may be better to use something different for the first column, closer to what we have with the ITS test output.
>> 
>> The first column would be the 'path' of the element, up to the unit, 
>> then, depending on the type of node, some additional
>> information: fragId for the markers, quoted text for the text. Because 
>> XLIFF may have overlapping markers, we need to also represent the text nodes as they may show inherited information.
>> 
>> For example for the Translate data category the following file:
>> 
>> <?xml version="1.0"?>
>> <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.1" srcLang="en"
>> xmlns:xits="urn:oasis:names:tc:xliff:xits:2.1">
>> <file id="f1" translate="no">
>> <unit id="u1">
>>  <segment>
>>   <source>Source 1.</source>
>>  </segment>
>> </unit>
>> <unit id="u2" translate="yes">
>>  <segment>
>>   <source>Text <mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text </mrk>DATA </mrk> text.</source>
>>  </segment>
>> </unit>
>> <unit id="u3" translate="yes">
>>  <segment>
>>   <source><sm id="m1" translate="yes"/>Text <sm id="m2" translate="no"/>DATA <em startRef="m1"/>DATA <em startRef="m2"/>text.</source>
>>  </segment>
>> </unit>
>> </file>
>> </xliff>
>> 
>> Would result in the following output:
>> 
>> /xliff	translate=yes
>> /xliff/file	translate=no
>> /xliff/file/unit	translate=no
>> /xliff/file/unit/source/"Source 1."	translate=no
>> /xliff/file/unit	translate=yes
>> /xliff/file/unit/source/"Text "	translate=yes
>> /xliff/file/unit/source/{START:/f=f1/u=u2/m1}	translate=no
>> /xliff/file/unit/source/"DATA "	translate=no
>> /xliff/file/unit/source/{START:/f=f1/u=u2/m2}	translate=yes
>> /xliff/file/unit/source/"text "	translate=yes
>> /xliff/file/unit/source/{END:/f=f1/u=u2/m2}	translate=no
>> /xliff/file/unit/source/"DATA "	translate=no
>> /xliff/file/unit/source/{END:/f=f1/u=u2/m1}	translate=yes
>> /xliff/file/unit/source/" text."	translate=yes
>> /xliff/file/unit	translate=yes
>> /xliff/file/unit/source/{START:/f=f1/u=u3/m1}	translate=yes
>> /xliff/file/unit/source/"Text "	translate=yes
>> /xliff/file/unit/source/{START:/f=f1/u=u3/m2}	translate=no
>> /xliff/file/unit/source/"DATA "	translate=no
>> /xliff/file/unit/source/{END:/f=f1/u=u3/m1}	translate=no
>> /xliff/file/unit/source/"DATA "	translate=no
>> /xliff/file/unit/source/{END:/f=f1/u=u3/m2}	translate=yes
>> /xliff/file/unit/source/"text."	translate=yes
>> 
>> The start markers would show the metadata for the node, the end 
>> markers would show the metadata that applies after the marker is closed (or both start and end can show the metadata for the span they
>> denote: it doesn't really matter).
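The rule described above (start markers show the marker's own value, end markers show the value that applies once the marker is closed) can be sketched as follows. This is only an illustration under stated assumptions, not how Lynx does it: the function translate_lines and its parameters are hypothetical, the snippet is parsed without the XLIFF namespace, only <mrk>, <sm> and <em> are handled, and the value inherited from the unit is passed in by the caller.

```python
import xml.etree.ElementTree as ET

PATH = "/xliff/file/unit/source/"


def translate_lines(source, file_id, unit_id, inherited):
    """Return 'path<TAB>translate=value' lines for one <source> element."""
    open_markers = []  # (marker id, explicit translate value or None), oldest first
    lines = []

    def current():
        # The most recently opened marker that sets a value wins;
        # otherwise fall back to the value inherited from the unit.
        for _, value in reversed(open_markers):
            if value is not None:
                return value
        return inherited

    def emit(label):
        lines.append(f"{PATH}{label}\ttranslate={current()}")

    def marker(kind, marker_id):
        return f"{{{kind}:/f={file_id}/u={unit_id}/{marker_id}}}"

    def close(marker_id):
        # Drop the marker first so the END line shows the value after it.
        open_markers[:] = [m for m in open_markers if m[0] != marker_id]
        emit(marker("END", marker_id))

    def emit_text(value):
        if value:
            emit(f'"{value}"')

    def walk(elem):
        emit_text(elem.text)
        for child in elem:
            if child.tag in ("mrk", "sm"):
                open_markers.append((child.get("id"), child.get("translate")))
                emit(marker("START", child.get("id")))
            if child.tag == "mrk":
                walk(child)
                close(child.get("id"))
            elif child.tag == "em":
                close(child.get("startRef"))
            emit_text(child.tail)

    walk(source)
    return lines


u3 = ET.fromstring(
    '<source><sm id="m1" translate="yes"/>Text <sm id="m2" translate="no"/>'
    'DATA <em startRef="m1"/>DATA <em startRef="m2"/>text.</source>'
)
u3_lines = translate_lines(u3, "f1", "u3", "yes")
print("\n".join(u3_lines))
```

Keeping a flat list of open markers instead of a stack is what lets the same code cover both the nested <mrk> case and the overlapping <sm>/<em> case.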
>> 
>> This is just something to start with, feedback and better ideas are welcome.
>> 
>> In the spirit of implementing things early and often, I've implemented a new command in the Lynx tool that creates the test file.
>> You can do for example:
>> 
>> C:/>lynx -its translate myFile.xlf
>> 
>> This will generate myFile.xlf.txt with the test results (and output 
>> them on the console). Just type -its ? to get the list of the data categories currently supported.
>> 
>> The latest version of Lynx is here:
>> http://okapi.opentag.com/snapshots/okapi-xliffLib_all-platforms_1.1-SNAPSHOT.zip
>> 
>> Cheers,
>> -yves
>> 
>> 
>> 
>> 
>> 
> 
> 
>
Received on Thursday, 6 November 2014 16:14:36 UTC