Re: ACTION-54: Try to come up with example of xliff+its test format / output from Felix Sasaki on 2014-11-06 (public-i18n-its-ig@w3.org from November 2014)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 6 Nov 2014 17:20:03 +0100
To: Yves Savourel <ysavourel@enlaso.com>
Cc: public-i18n-its-ig <public-i18n-its-ig@w3.org>
Message-Id: <E46517CD-CF50-48A4-BABE-A4B8D52AF34E@w3.org>
On 06.11.2014 at 17:17, Yves Savourel <ysavourel@enlaso.com> wrote:

>> The other is that one needs the offset for the NIF conversion 
>> which we have done for ITS only, and it would be nice to have 
>> that for ITS+XLIFF too.
> 
> Hum... not sure I understand: why would you want to convert the output file of a test into NIF?

Sorry, I was not clear: I would not convert the output of the test to NIF, but generating NIF would be similar to the test output file generation anyway.

- Felix

> 
> But the first reason may be good enough.
> 
> -ys
> 
> -----Original Message-----
> From: Felix Sasaki [mailto:fsasaki@w3.org] 
> Sent: Thursday, November 6, 2014 9:14 AM
> To: Yves Savourel
> Cc: public-i18n-its-ig
> Subject: Re: ACTION-54: Try to come up with example of xliff+its test format / output
> 
> 
> On 06.11.2014 at 16:30, Yves Savourel <ysavourel@enlaso.com> wrote:
> 
>> Hi Felix,
>> 
>> I have nothing against this, but is there a reason for using offset? I 
>> suppose it would make the output more compact when you have large chunks of text.
> 
> Yes, that is one motivation. The other is that one needs the offset for the NIF conversion which we have done for ITS only, and it
> would be nice to have that for ITS+XLIFF too.
> 
>> 
>> Using offset would make the creation of the output slightly more complicated (at least from a Java parser viewpoint). 
> 
> Understood. I hope to get a second implementation, at least for "Translate", soon ...
> 
> Best,
> 
> Felix
> 
>> 
>> Cheers,
>> -yves
>> 
>> 
>> -----Original Message-----
>> From: Felix Sasaki [mailto:felix@sasakiatcf.com]
>> Sent: Thursday, November 6, 2014 5:24 AM
>> To: Yves Savourel
>> Cc: public-i18n-its-ig
>> Subject: Re: ACTION-54: Try to come up with example of xliff+its test 
>> format / output
>> 
>> Hi Yves, all,
>> 
>> thanks, this looks good. One suggestion. Currently you are copying 
>> strings from the input in the path description, in case of text nodes; e.g.
>> 
>>> /xliff/file/unit/source/"DATA "
>> 
>> maybe one could also work with character offsets, e.g.
>> /xliff/file/unit/source/#char=0,5
>> And say that a tool that generates the output should preserve white space when generating the offsets?
>> The syntax
>> #char=0,5
>> is not important, just a way to identify the offsets.
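As a minimal sketch of the offset idea: assuming offsets are counted over the concatenated text of one <source> element, with an inclusive start and an exclusive end (the thread does not fix either point), the text nodes of unit u2 from the example could be mapped as follows. The function names are hypothetical, and #char= is only the placeholder syntax suggested above.

```python
def text_offsets(text_nodes):
    """Return (start, end) character offsets for consecutive text nodes.

    Assumption: offsets run over the concatenated text of one <source>
    element; start is inclusive and end is exclusive. White space is
    counted exactly as it appears in the input.
    """
    offsets = []
    position = 0
    for text in text_nodes:
        end = position + len(text)
        offsets.append((position, end))
        position = end
    return offsets


def offset_path(base_path, start, end):
    """Build a path row using the placeholder #char=start,end syntax."""
    return f"{base_path}/#char={start},{end}"


# Text nodes of unit u2, in document order, white space preserved.
nodes = ["Text ", "DATA ", "text ", "DATA ", " text."]
for start, end in text_offsets(nodes):
    print(offset_path("/xliff/file/unit/source", start, end))
# /xliff/file/unit/source/#char=0,5
# /xliff/file/unit/source/#char=5,10
# /xliff/file/unit/source/#char=10,15
# /xliff/file/unit/source/#char=15,20
# /xliff/file/unit/source/#char=20,26
```

With offsets, the two identical "DATA " nodes become distinguishable, which the quoted-text form does not guarantee.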
>> 
>> Cheers,
>> 
>> - Felix
>> 
>> On 04.11.2014 at 04:19, Yves Savourel <ysavourel@enlaso.com> wrote:
>> 
>>> Hi all,
>>> 
>>> Following up on this action item:
>>> 
>>> The initial thought was to use a text file with two columns:
>>> - The first one with XLIFF's fragment identifier.
>>> - The second with the ITS data for the given element.
>>> 
>>> But I have since realized that not all locations with ITS data will have 
>>> an ID, so it may be better to use something different for the first column, closer to what we have with the ITS test output.
>>> 
>>> The first column would be the 'path' of the element, up to the unit, 
>>> then, depending on the type of node, some additional
>>> information: fragId for the markers, quoted text for the text. 
>>> Because XLIFF may have overlapping markers, we need to also represent the text nodes as they may show inherited information.
>>> 
>>> For example for the Translate data category the following file:
>>> 
>>> <?xml version="1.0"?>
>>> <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.1" srcLang="en"
>>> xmlns:xits="urn:oasis:names:tc:xliff:xits:2.1">
>>> <file id="f1" translate="no">
>>> <unit id="u1">
>>> <segment>
>>>  <source>Source 1.</source>
>>> </segment>
>>> </unit>
>>> <unit id="u2" translate="yes">
>>> <segment>
>>>  <source>Text <mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text </mrk>DATA </mrk> text.</source>
>>> </segment>
>>> </unit>
>>> <unit id="u3" translate="yes">
>>> <segment>
>>>  <source><sm id="m1" translate="yes"/>Text <sm id="m2" translate="no"/>DATA <em startRef="m1"/>DATA <em startRef="m2"/>text.</source>
>>> </segment>
>>> </unit>
>>> </file>
>>> </xliff>
>>> 
>>> Would result in the following output:
>>> 
>>> /xliff	translate=yes
>>> /xliff/file	translate=no
>>> /xliff/file/unit	translate=no
>>> /xliff/file/unit/source/"Source 1."	translate=no
>>> /xliff/file/unit	translate=yes
>>> /xliff/file/unit/source/"Text "	translate=yes
>>> /xliff/file/unit/source/{START:/f=f1/u=u2/m1}	translate=no
>>> /xliff/file/unit/source/"DATA "	translate=no
>>> /xliff/file/unit/source/{START:/f=f1/u=u2/m2}	translate=yes
>>> /xliff/file/unit/source/"text "	translate=yes
>>> /xliff/file/unit/source/{END:/f=f1/u=u2/m2}	translate=no
>>> /xliff/file/unit/source/"DATA "	translate=no
>>> /xliff/file/unit/source/{END:/f=f1/u=u2/m1}	translate=yes
>>> /xliff/file/unit/source/" text."	translate=yes
>>> /xliff/file/unit	translate=yes
>>> /xliff/file/unit/source/{START:/f=f1/u=u3/m1}	translate=yes
>>> /xliff/file/unit/source/"Text "	translate=yes
>>> /xliff/file/unit/source/{START:/f=f1/u=u3/m2}	translate=no
>>> /xliff/file/unit/source/"DATA "	translate=no
>>> /xliff/file/unit/source/{END:/f=f1/u=u3/m1}	translate=no
>>> /xliff/file/unit/source/"DATA "	translate=no
>>> /xliff/file/unit/source/{END:/f=f1/u=u3/m2}	translate=yes
>>> /xliff/file/unit/source/"text."	translate=yes
>>> 
>>> The start markers would show the metadata for the node, and the end 
>>> markers would show the metadata that applies after the marker is closed (or 
>>> both start and end can show the metadata for the span they
>>> denote: it doesn't really matter).
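The marker convention just described can be sketched for the Translate data category as below. This is only an illustration under stated assumptions, not how Lynx does it: it handles well-nested <mrk> elements only (not <sm>/<em> pairs), the inherited value for the unit is passed in by the caller, and the name translate_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "/xliff/file/unit/source/"


def translate_rows(source, file_id, unit_id, inherited):
    """List (path, translate) rows for one <source> element.

    Assumes every child element is a well-nested <mrk>; <sm>/<em> pairs
    are not handled in this sketch.
    """
    rows = []

    def walk(element, current):
        # Text before the first child inherits the current value.
        if element.text:
            rows.append((f'{BASE}"{element.text}"', current))
        for marker in element:
            # A marker overrides the value only if it sets translate.
            value = marker.get("translate", current)
            ref = f"/f={file_id}/u={unit_id}/{marker.get('id')}"
            rows.append((BASE + "{START:" + ref + "}", value))
            walk(marker, value)
            # The END row shows the value that applies after the marker.
            rows.append((BASE + "{END:" + ref + "}", current))
            if marker.tail:
                rows.append((f'{BASE}"{marker.tail}"', current))

    walk(source, inherited)
    return rows


# The <source> of unit u2 from the example file.
source = ET.fromstring(
    '<source xmlns="urn:oasis:names:tc:xliff:document:2.0">Text '
    '<mrk id="m1" translate="no">DATA <mrk id="m2" translate="yes">text '
    '</mrk>DATA </mrk> text.</source>'
)
for path, value in translate_rows(source, "f1", "u2", "yes"):
    print(f"{path}\ttranslate={value}")
```

For this input the printed rows correspond to the nine source rows listed for unit u2 in the output above.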
>>> 
>>> This is just something to start with; feedback and better ideas are welcome.
>>> 
>>> In the spirit of implementing things early and often, I've implemented a new command in the Lynx tool that creates the test file.
>>> You can do for example:
>>> 
>>> C:/>lynx -its translate myFile.xlf
>>> 
>>> This will generate myFile.xlf.txt with the test results (and output 
>>> them on the console). Just type -its ? to get the list of the data categories currently supported.
>>> 
>>> The latest version of Lynx is here:
>>> http://okapi.opentag.com/snapshots/okapi-xliffLib_all-platforms_1.1-SNAPSHOT.zip
>>> 
>>> Cheers,
>>> -yves
>>> 
Received on Thursday, 6 November 2014 16:20:36 UTC